# AmesHousing Project - Regression & Classification

## Warning: Only run one of regression or classification
### Avoid running both together

### Project Description:
##### Regression: Predicting the type and the price of a house based on area, location, number of bedrooms and other features...
##### Classification: Predict the type of the house (Apartment, Villa, Bungalow, etc...)
##### Type of ML Training/Algorithm: Supervised - Classification, Regression
##### Dataset: Ames Housing Dataset(Kaggle)( [Dataset](https://www.kaggle.com/datasets/prevek18/ames-housing-dataset) )

### Libraries being used:
- Pandas: Tabular data handling (.csv in this project)
- NumPy: For numerical operations.
- PyTorch: For the training (Supports Regression and also classification)
- scikit-learn: Needed for PyTorch compatibility: For StandardScaler, LabelEncoder, OneHotEncoder, etc... 
- Matplotlib: Visualizations like Line/bar/histogram charts.
- Seaborn: For visualizations like correlation heatmaps, pair plots(EDA)

### Installing all the necessary libraries for the project.

In [5]:
!pip install pandas numpy matplotlib seaborn scikit-learn torch


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


### Importing the necessary libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.metrics import mean_squared_error, accuracy_score

import torch
import torch.nn as nn
import torch.optim as optim

### Loading the DataSet

In [2]:
df=pd.read_csv('AmesHousing.csv')

### Inspecting the DataSet

In [3]:
print(df.shape)
print(df.head(10))
print(df.tail(10))
print(df.info())
print(df.describe())

(2930, 82)
   Order        PID  MS SubClass MS Zoning  Lot Frontage  Lot Area Street  \
0      1  526301100           20        RL         141.0     31770   Pave   
1      2  526350040           20        RH          80.0     11622   Pave   
2      3  526351010           20        RL          81.0     14267   Pave   
3      4  526353030           20        RL          93.0     11160   Pave   
4      5  527105010           60        RL          74.0     13830   Pave   
5      6  527105030           60        RL          78.0      9978   Pave   
6      7  527127150          120        RL          41.0      4920   Pave   
7      8  527145080          120        RL          43.0      5005   Pave   
8      9  527146030          120        RL          39.0      5389   Pave   
9     10  527162130           60        RL          60.0      7500   Pave   

  Alley Lot Shape Land Contour  ... Pool Area Pool QC  Fence Misc Feature  \
0   NaN       IR1          Lvl  ...         0     NaN    NaN    

### Dropping irrelevent columns or NaNs 

In [4]:
df = df.drop(columns=[
    'Order', 'PID', 'Alley', 'Pool QC', 'Fence', 'Misc Feature', 'Utilities',
    'Fireplace Qu', 'Garage Yr Blt', 'Garage Cond',
    'Condition 2', 'BsmtFin SF 2', 'Low Qual Fin SF',
    '3Ssn Porch', 'Screen Porch', 'Pool Area', 'Misc Val','MS SubClass',
    'Roof Matl','Exterior 2nd','BsmtFin Type 2','Functional','Enclosed Porch',
    'Garage Qual',"Mo Sold"
])

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2930 entries, 0 to 2929
Data columns (total 57 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   MS Zoning       2930 non-null   object 
 1   Lot Frontage    2440 non-null   float64
 2   Lot Area        2930 non-null   int64  
 3   Street          2930 non-null   object 
 4   Lot Shape       2930 non-null   object 
 5   Land Contour    2930 non-null   object 
 6   Lot Config      2930 non-null   object 
 7   Land Slope      2930 non-null   object 
 8   Neighborhood    2930 non-null   object 
 9   Condition 1     2930 non-null   object 
 10  Bldg Type       2930 non-null   object 
 11  House Style     2930 non-null   object 
 12  Overall Qual    2930 non-null   int64  
 13  Overall Cond    2930 non-null   int64  
 14  Year Built      2930 non-null   int64  
 15  Year Remod/Add  2930 non-null   int64  
 16  Roof Style      2930 non-null   object 
 17  Exterior 1st    2930 non-null   o

### Filling in the missing values

In [6]:
df = df.fillna(df.median(numeric_only=True)) #For numerical
df = df.fillna("None") #For Categorical

### Encoding categorical features

In [7]:
categorical_cols = df.select_dtypes(include='object').columns
df = pd.get_dummies(df, columns=categorical_cols) #One-Hot encoding

### Defining target and features

#### For Regression: 

In [None]:
x = df.drop(columns='SalePrice')
y = df['SalePrice']

#### For classification

##### Create price category

In [8]:
df['PriceCategory']=pd.cut(df['SalePrice'],
                          bins=[0,150000, 300000, np.inf],
                          labels=['Low','Medium','High'])

X = df.drop(columns=['SalePrice','PriceCategory'])
y = df['PriceCategory']

##### Encode Labels

In [9]:
le = LabelEncoder()
y = le.fit_transform(y) #Converts to 0,1,2

### Train-Test Split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)

### Scale Features

In [11]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

### Conversion to PyTorch Tensors

#### Selecting the hardware (CUDA/CPU)

In [12]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if not torch.cuda.is_available():
    print("Using CPU")
else:
    print("Using CUDA GPU")

Using CUDA GPU


#### Converting

In [15]:
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).to(device)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)

#### Regression

In [None]:
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).view(-1,1).to(device)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).view(-1,1).to(device)

#### Classification

In [16]:
y_train_tensor = torch.tensor(y_train, dtype=torch.long).to(device)
y_test_tensor = torch.tensor(y_test, dtype=torch.long).to(device)

### Defining the PyTorch Model

#### Regression

In [7]:
class RegressionModel(nn.Module):
    def __init__(self, input_dim):
        super(RegressionModel, self).__init__()
        self.fc=nn.Sequential(
            nn.Linear(input_dim,128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64,1)
        )

    def forward(self , x):
        return self.fc(x)

SyntaxError: incomplete input (1975267751.py, line 4)

#### Classification

In [17]:
class ClassificationModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(ClassificationModel, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, output_dim)
        )

    def forward(self, x):
        return self.fc(x)

### Training the Model

#### Regression

In [None]:
model = RegressionModel(X_train.shape[1]).to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 100
for epoch in range(epochs):
    model.train()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    optimizer.zero_grad()
    optimizer.step()

    if (epoch+1)&10 ==0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

#### Classification

In [18]:
num_classes = len(np.unique(y_train))
model = ClassificationModel(input_dim=X_train.shape[1], output_dim=num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 100
for epoch in range(epochs):
    model.train()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    optimizer.zero_grad()
    optimizer.step()

    if (epoch+1)&10 ==0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

Epoch [1/100], Loss: 1.0721
Epoch [4/100], Loss: 1.0721
Epoch [5/100], Loss: 1.0721
Epoch [16/100], Loss: 1.0721
Epoch [17/100], Loss: 1.0721
Epoch [20/100], Loss: 1.0721
Epoch [21/100], Loss: 1.0721
Epoch [32/100], Loss: 1.0721
Epoch [33/100], Loss: 1.0721
Epoch [36/100], Loss: 1.0721
Epoch [37/100], Loss: 1.0721
Epoch [48/100], Loss: 1.0721
Epoch [49/100], Loss: 1.0721
Epoch [52/100], Loss: 1.0721
Epoch [53/100], Loss: 1.0721
Epoch [64/100], Loss: 1.0721
Epoch [65/100], Loss: 1.0721
Epoch [68/100], Loss: 1.0721
Epoch [69/100], Loss: 1.0721
Epoch [80/100], Loss: 1.0721
Epoch [81/100], Loss: 1.0721
Epoch [84/100], Loss: 1.0721
Epoch [85/100], Loss: 1.0721
Epoch [96/100], Loss: 1.0721
Epoch [97/100], Loss: 1.0721
Epoch [100/100], Loss: 1.0721


### Evaluating the model

#### Regression

In [None]:
model.eval()
preds = model(X_test_tensor).detach().cpu().numpy()
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE:", rmse)

#### Classification

In [19]:
model.eval()
outputs = model(X_test_tensor)
_, predicted = torch.max(outputs, 1)
acc = accuracy_score(y_test, predicted.cpu().numpy())
print("Accuracy:", acc)

Accuracy: 0.31569965870307165


### Saving or exporting the model

In [20]:
torch.save(model.state_dict(), 'house_model.pth')

### Loading the saved model in order to make predictions

In [24]:
class HousePriceModel(nn.Module):
    def __init__(self, input_dim):
        super(HousePriceModel, self).__init__()
        self.fc = nn.Sequential(   # <--- match this name
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 3)
        )

    def forward(self, x):
        return self.fc(x)

In [25]:
input_size = X_train.shape[1]

In [26]:
model = HousePriceModel(input_size)
model.load_state_dict(torch.load('house_model.pth'))
model.eval() # Set to evaluation mode

HousePriceModel(
  (fc): Sequential(
    (0): Linear(in_features=221, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=3, bias=True)
  )
)

In [28]:
# Example input: match exact preprocessing pipeline used in training
# For example, let's use the first row of X_test (after preprocessing)
sample_input = X_test[0].reshape(1, -1)  # shape: (1, input_dim)

In [29]:
input_tensor = torch.tensor(sample_input, dtype=torch.float32)

### Regression

In [32]:
# Make sure this matches your regression model (output_dim = 1)
with torch.no_grad():
    output = model(input_tensor)
    predicted_price = output.item()

print(f"Predicted Sale Price: ${predicted_price:.2f}")

### Classification

In [34]:
with torch.no_grad():
    output = model(input_tensor)
    predicted_class = torch.argmax(output, dim=1).item()

# Optional: Map predicted_class to label
class_labels = {0: "Low", 1: "Medium", 2: "High"}
print(f"Predicted Price Category: {class_labels[predicted_class]}")

Predicted Price Category: High


## Manual Input

In [81]:
# Recreate the same model architecture as when training
class HousePriceModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(HousePriceModel, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, output_dim)
        )

    def forward(self, x):
        return self.fc(x)

In [82]:
input_size = X_test.shape[1]
output_size = 3  # Low, Medium, High
model = HousePriceModel(input_size, output_size).to(device)
model.load_state_dict(torch.load('house_model.pth'))
model.eval()

HousePriceModel(
  (fc): Sequential(
    (0): Linear(in_features=221, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=3, bias=True)
  )
)

In [128]:
# Example: Pick any already preprocessed row from X_test
manual_input = X_test[238].reshape(1, -1)  # Use any index (not based on y_test!)

In [129]:
# Convert to tensor
input_tensor = torch.tensor(manual_input).float().to(device)

In [130]:
# Make prediction
with torch.no_grad():
    output = model(input_tensor)
    predicted_class = torch.argmax(output, dim=1).item()

In [131]:
# Mapping label indices
class_labels = ['Low', 'Medium', 'High']
print("Predicted House Price Category:", class_labels[predicted_class])

Predicted House Price Category: High
