# Lab-6 : Self-Practice

In this week self-practice, you will implement a neural network model for a regression problem. You will use the [*admission*](./Admission_Predict.csv) dataset attached, used in the previous lab



## 1. Load the dataset and do all the necessary preprocessing

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('Admission_Predict.csv')

df = df.drop(['Serial No.'], axis=1)

df.describe()

Unnamed: 0,GRE Score,TOEFL Score,University Rating,SOP,LOR,CGPA,Research,Chance of Admit
count,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0
mean,316.8075,107.41,3.0875,3.4,3.4525,8.598925,0.5475,0.72435
std,11.473646,6.069514,1.143728,1.006869,0.898478,0.596317,0.498362,0.142609
min,290.0,92.0,1.0,1.0,1.0,6.8,0.0,0.34
25%,308.0,103.0,2.0,2.5,3.0,8.17,0.0,0.64
50%,317.0,107.0,3.0,3.5,3.5,8.61,1.0,0.73
75%,325.0,112.0,4.0,4.0,4.0,9.0625,1.0,0.83
max,340.0,120.0,5.0,5.0,5.0,9.92,1.0,0.97


In [6]:
X = df.iloc[:, :-1].values
y  = df.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

## 2. Create custom pytorch `Dataset`

You should create a class `CustomDataset` that inherits  the abstract class `torch.utils.data.Dataset` from pytorch. 

> **Note** You should overwrite `__getitem__`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite `__len__`, which is expected to return the size of the dataset by many `~torch.utils.data.Sampler` implementations and the default options of `~torch.utils.data.DataLoader`.

#### Split your dataset into train and test data loaders
You can create a `CustomDataset` instance with the entire dataframe and use [`random_split`](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split) to split it into training and testing datasets. And then, create test and train dataloader. Or you can split using `train_test_split` from sklearn and past the splitted sets to your Custom dataset class. 

Create train and test dataloader with `batch_size = 32` each complete the following function

In [7]:
import torch
from torch.utils.data import Dataset, DataLoader, random_split

class CustumData(Dataset):
    def __init__(self, X, y):
        super().__init__()
        #initaliaze the variable
        self.X = torch.tensor(X).float()
        self.y = torch.tensor(y).float()

    def __len__(self):
        # return the len of the dataset
        return len(self.X)
    
    def __getitem__(self, idx):
        # return a tuple samples and labels with the corresponding index idx
        return self.X[idx, :], self.y[idx]

In [9]:
# Create the datasets
train_dataset = CustumData(X_train, y_train)
test_dataset = CustumData(X_test, y_test) 

# Create the dataloaders
train_dataloader = DataLoader(train_dataset, batch_size = 32, shuffle = True)
test_dataloader = DataLoader(test_dataset, batch_size = 32)

In [10]:
data, label = next(iter(train_dataloader))
label.shape

torch.Size([32])

## 3. Create the model

Using `nn`, Create a neural network with 1 hidden layers of size 100, each must be followed by a `leaky_relu` activation function and define the forward function

In [11]:
import torch.nn as nn
import torch.nn.functional as F

# complete the code
class Net(nn.Module):
    def __init__(self, n_hidden_unit = 100):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(7, n_hidden_unit)
        self.fc2 = nn.Linear(n_hidden_unit, 1)

    def forward(self, x):
        x = F.leaky_relu(self.fc1(x))
        x = F.leaky_relu(self.fc2(x))
        return x

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net(n_hidden_unit = 100).to(device)
#model = nn.Sequential(nn.Linear(7, 1)).to(device)

# 4. Training loop

Define the appropriate loss function and the training loop for the training and the testing dataloader (as done in the lab). Use SGD optimizer with learning rate 0.01 and momentum 0.5

Print the final loss on the test data

In [130]:
epochs = 20
lr = 0.01
momentum = 0.5
log_interval = 2

criterion = nn.MSELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [131]:
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data).squeeze()
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

In [132]:
def test(model, device, test_dataloader):
    model.eval()
    test_loss = 0
    with torch.no_grad():
        for data, target in test_dataloader:
            data, target = data.to(device), target.to(device)
            output = model(data).squeeze()
            test_loss += criterion(output, target).item()  # sum up batch loss
    
    test_loss /= len(test_dataloader.dataset)
    print(f'\nTest set: Average loss: {test_loss}\n')

In [133]:
for epoch in range(1, epochs + 1):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)


Test set: Average loss: 0.00019697147654369475


Test set: Average loss: 0.0001965737814316526


Test set: Average loss: 0.00020355077576823534


Test set: Average loss: 0.00019874173012794926


Test set: Average loss: 0.00019545836839824916


Test set: Average loss: 0.00019848575320793315


Test set: Average loss: 0.00019582920940592886


Test set: Average loss: 0.0001927310193423182


Test set: Average loss: 0.00019821891037281604


Test set: Average loss: 0.00019936430617235602


Test set: Average loss: 0.0001977500505745411


Test set: Average loss: 0.00020020983938593418


Test set: Average loss: 0.00019390174420550466


Test set: Average loss: 0.00019871890835929663


Test set: Average loss: 0.00019451639091130346


Test set: Average loss: 0.00019887235685018824


Test set: Average loss: 0.0002030903473496437


Test set: Average loss: 0.0001960262467036955


Test set: Average loss: 0.0001961533824214712


Test set: Average loss: 0.00019913600553991273



In [134]:
y_pred_nn = model(torch.tensor(X_test).float()).detach().numpy()

## 5. Compare your Neural network model to a Linear Regression
Train a simple linear regression model on the training set and print MSE on the testing set (`X_test`). Also print the MSE on the test set using the your neural model. 

> Compare the results (which performs best) and justify why

In [135]:
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

lr = LinearRegression()

lr.fit(X_train, y_train)

y_pred_lr = lr.predict(X_test)

mse_lr = mean_squared_error(y_test, y_pred_lr)
mse_nn = mean_squared_error(y_test, y_pred_nn)

print(f"Linear Regression MSE: {mse_lr}")
print(f"Neural Network MSE: {mse_nn}")

Linear Regression MSE: 0.004617003377285012
Neural Network MSE: 0.006021902921991413
