We will practice on the housing dataset (sklearn.datasets.fetch_california_housing).

Your task:
1. Create a Dataset for loading the data
2. Wrap it in a DataLoader
3. Write a network architecture that predicts housing prices. The network should include BatchNorm and Dropout layers (or NOT include them, but then the choice must be justified)
4. Compare the convergence of Adam, RMSProp, and SGD, and draw a conclusion about model quality

The train-test split must be done with sklearn, using random_state=13 and test_size=0.25.

In [1]:
import math
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from PIL import Image
from torchvision import transforms, datasets
from sklearn.metrics import r2_score

import torch.nn.functional as F
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.optim import Adam, SGD, RMSprop

import warnings
warnings.filterwarnings('ignore')

In [2]:
clf = fetch_california_housing(as_frame=True)
clf.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])

In [3]:
print(clf.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block group
        - HouseAge      median house age in block group
        - AveRooms      average number of rooms per household
        - AveBedrms     average number of bedrooms per household
        - Population    block group population
        - AveOccup      average number of household members
        - Latitude      block group latitude
        - Longitude     block group longitude

    :Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived

In [4]:
data = clf.frame
data.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,MedHouseVal
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,4.526
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,3.585
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,3.521
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,3.413
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,3.422


In [5]:
X = data.drop(columns=['MedHouseVal'])
y = data['MedHouseVal']
X.shape, y.shape

((20640, 8), (20640,))

In [6]:
# NOTE: the assignment specifies random_state=13; this cell uses random_state=1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((15480, 8), (5160, 8), (15480,), (5160,))

In [7]:
scaler = MinMaxScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

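# Convert the scaled NumPy arrays and the pandas targets to float32 tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train.values, dtype=torch.float32)

X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test.values, dtype=torch.float32)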

1. Create a Dataset for loading the data

In [8]:
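class Data(torch.utils.data.Dataset):
    """Pairs a feature tensor with a target tensor, sample by sample."""

    def __init__(self, data, target):
        self.X = data
        self.y = target

    def __len__(self):
        # Number of samples in the dataset
        return len(self.X)

    def __getitem__(self, idx):
        # One (features, target) pair
        return self.X[idx], self.y[idx]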

2. Wrap it in a DataLoader

In [9]:
train_data = Data(X_train, y_train)
train_loader = DataLoader(train_data,
                          batch_size=128,
                          shuffle=True,
                          drop_last=True,
                          num_workers=0
                         )

test_data = Data(X_test, y_test)
# Evaluation should see every test sample, in a fixed order
test_loader = DataLoader(test_data,
                         batch_size=128,
                         shuffle=False,
                         drop_last=False,
                         num_workers=0
                        )

In [10]:
print(f'data: {train_data[0][0]}\ntarget: {train_data[0][1]}')

data: tensor([0.1995, 0.6078, 0.0335, 0.0217, 0.1387, 0.0017, 0.6440, 0.3003])
target: 1.034000039100647


3. Write a network architecture that predicts housing prices. The network should include BatchNorm and Dropout layers (or NOT include them, but then the choice must be justified)

In [11]:
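class Perceptron(nn.Module):
    """A single linear layer followed by an activation function."""

    def __init__(self, input_dim, output_dim, activation='relu'):
        super().__init__()
        assert activation in ['relu', 'sigmoid'], 'Activation func should be "relu" or "sigmoid"!'
        # Device placement is left to the parent module instead of being set per layer
        self.fc = nn.Linear(input_dim, output_dim)
        # Look the function up once instead of calling eval() on every forward pass
        self.activation = getattr(F, activation)

    def forward(self, x):
        return self.activation(self.fc(x))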

In [12]:
class FeedForward(nn.Module):
    """Three hidden blocks (Perceptron -> BatchNorm -> Dropout) and an output layer."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()

        self.fc1 = Perceptron(input_dim, 4*hidden_dim)
        self.dp1 = nn.Dropout(0.4)
        self.bn1 = nn.BatchNorm1d(4*hidden_dim)

        self.fc2 = Perceptron(4*hidden_dim, 2*hidden_dim)
        self.dp2 = nn.Dropout(0.3)
        self.bn2 = nn.BatchNorm1d(2*hidden_dim)

        self.fc3 = Perceptron(2*hidden_dim, hidden_dim)
        self.dp3 = nn.Dropout(0.2)
        self.bn3 = nn.BatchNorm1d(hidden_dim)

        # The output layer uses Perceptron's default ReLU, so predictions are non-negative
        self.fc4 = Perceptron(hidden_dim, 1)

        # Move all parameters and buffers to the GPU once, instead of calling .cuda() per layer
        self.cuda()

    def forward(self, x):
        x = self.dp1(self.bn1(self.fc1(x)))
        x = self.dp2(self.bn2(self.fc2(x)))
        x = self.dp3(self.bn3(self.fc3(x)))
        return self.fc4(x)

4. Compare the convergence of Adam, RMSProp, and SGD, and draw a conclusion about model quality

In [13]:
net = FeedForward(8, 1024)

optimizer = SGD(net.parameters(), lr=0.005)
criterion = nn.L1Loss()

In [14]:
epochs = 200

net.train()
metrics_train = []
metrics_test = []

for ep in range(epochs):
    running_loss, running_items = 0.0, 0.0
    for i, data in enumerate(train_loader):
        inputs, labels = data[0].cuda(), data[1].cuda()
        
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs.squeeze(), labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        running_items += len(labels)
        
        train_res = net(inputs)
        metrics_train.append(r2_score(labels.cpu().detach().numpy(), train_res.cpu().detach().numpy().reshape(-1)))
        
    if (ep + 1)%20 == 0:  
        net.eval()

        print(f'Epoch [{ep + 1}/{epochs}] | ' \
              f'Step [{i + 1}/{len(train_loader)}] | ' \
              f'Loss: {running_loss / running_items:.3f} | ' \
              f'Train R2: {sum(metrics_train) / len(metrics_train):.3f} | ', end='')

        running_loss, running_items = 0.0, 0.0
        metrics_train = []

        with torch.no_grad():  # gradients are not needed for evaluation
            for i, data in enumerate(test_loader):
                test_res = net(data[0].cuda())
                metrics_test.append(r2_score(data[1].numpy(), test_res.cpu().numpy().reshape(-1)))
        print(f'Test R2: {sum(metrics_test) / len(metrics_test):.3f}')
        metrics_test = []
        net.train()
print('Training is finished!')

Epoch [20/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.310 | Test R2: 0.512
Epoch [40/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.575 | Test R2: 0.594
Epoch [60/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.655 | Test R2: 0.631
Epoch [80/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.671 | Test R2: 0.660
Epoch [100/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.676 | Test R2: 0.662
Epoch [120/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.679 | Test R2: 0.654
Epoch [140/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.683 | Test R2: 0.664
Epoch [160/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.684 | Test R2: 0.668
Epoch [180/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.686 | Test R2: 0.668
Epoch [200/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.689 | Test R2: 0.678
Training is finished!


With the SGD optimizer, the metrics converge by epoch 200.
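A note on the Loss column, as a minimal sketch: `criterion` already returns the mean over the batch, so dividing the sum of those means by the number of samples scales the printed value down by the batch size (128 here). The loss values below are made up for illustration; only the arithmetic matters.

```python
# Hypothetical per-batch mean losses (illustrative values, not taken from the run above)
batch_size = 128
batch_mean_losses = [0.52, 0.48, 0.50, 0.50]

running_loss = 0.0
running_items = 0
for loss_value in batch_mean_losses:
    running_loss += loss_value    # mirrors `running_loss += loss.item()`
    running_items += batch_size   # mirrors `running_items += len(labels)`

# What the training loop prints: the mean loss divided by the batch size
reported = running_loss / running_items

# Per-sample mean loss: weight every batch mean by its batch size first
per_sample = sum(l * batch_size for l in batch_mean_losses) / running_items

print(f'reported: {reported:.4f}, per-sample mean: {per_sample:.4f}')
```

Read this way, a printed Loss of 0.004 corresponds to a mean batch loss of about 0.004 * 128, or roughly 0.5.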

In [15]:
net = FeedForward(8, 512)

optimizer = Adam(net.parameters(), lr=0.001)
criterion = nn.MSELoss()

In [16]:
epochs = 200

net.train()
metrics_train = []
metrics_test = []

for ep in range(epochs):
    running_loss, running_items = 0.0, 0.0
    for i, data in enumerate(train_loader):
        inputs, labels = data[0].cuda(), data[1].cuda()
        
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs.squeeze(), labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        running_items += len(labels)
        
        train_res = net(inputs)
        metrics_train.append(r2_score(labels.cpu().detach().numpy(), train_res.cpu().detach().numpy().reshape(-1)))
        
    if (ep + 1)%20 == 0:  
        net.eval()

        print(f'Epoch [{ep + 1}/{epochs}] | ' \
              f'Step [{i + 1}/{len(train_loader)}] | ' \
              f'Loss: {running_loss / running_items:.3f} | ' \
              f'Train R2: {sum(metrics_train) / len(metrics_train):.3f} | ', end='')

        running_loss, running_items = 0.0, 0.0
        metrics_train = []

        with torch.no_grad():  # gradients are not needed for evaluation
            for i, data in enumerate(test_loader):
                test_res = net(data[0].cuda())
                metrics_test.append(r2_score(data[1].numpy(), test_res.cpu().numpy().reshape(-1)))
        print(f'Test R2: {sum(metrics_test) / len(metrics_test):.3f}')
        metrics_test = []
        net.train()
print('Training is finished!')

Epoch [20/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.628 | Test R2: 0.707
Epoch [40/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.730 | Test R2: 0.707
Epoch [60/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.755 | Test R2: 0.754
Epoch [80/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.765 | Test R2: 0.669
Epoch [100/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.769 | Test R2: 0.779
Epoch [120/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.784 | Test R2: -11754.423
Epoch [140/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.793 | Test R2: -67.210
Epoch [160/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.797 | Test R2: -17.778
Epoch [180/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.805 | Test R2: 0.622
Epoch [200/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.806 | Test R2: -16374.604
Training is finished!


With the Adam optimizer, the test metric behaves strangely: R2 is reasonable through epoch 100 (0.67 to 0.78), but from epoch 120 onward the only sane value is 0.622 at epoch 180. Why did this happen?
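One thing worth checking before blaming the optimizer: the loop averages R2 over batches of 128, and R2 has no lower bound, so a single extreme prediction in one batch can dominate the average. This is especially true when the targets in that batch happen to lie close together, because the denominator of R2 is then tiny. The sketch below uses made-up numbers and a hand-written `r2` helper (the usual 1 - SS_res / SS_tot) to show the effect. It does not prove that this is what happened in the run above.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Two hypothetical batches of targets and predictions (illustrative values)
y_a, p_a = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
# In the second batch the targets are close together and one prediction is far off
y_b, p_b = [2.0, 2.1, 2.0, 2.1], [2.0, 2.1, 2.0, 50.0]

# What the training loop reports: the average of per-batch scores
mean_of_batch_r2 = (r2(y_a, p_a) + r2(y_b, p_b)) / 2
# A single score over all samples at once
r2_all = r2(y_a + y_b, p_a + p_b)

print(f'batch A: {r2(y_a, p_a):.3f}')
print(f'batch B: {r2(y_b, p_b):.3f}')
print(f'mean of per-batch R2: {mean_of_batch_r2:.3f}')
print(f'R2 over all samples:  {r2_all:.3f}')
```

Both scores are hurt by the outlier, but the per-batch average is far more negative than the score computed over all samples together.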

In [17]:
net = FeedForward(8, 1024)

optimizer = RMSprop(net.parameters(), lr=0.005)
criterion = nn.MSELoss()

In [18]:
epochs = 200

net.train()
metrics_train = []

for ep in range(epochs):
    running_loss, running_items = 0.0, 0.0
    for i, data in enumerate(train_loader):
        inputs, labels = data[0].cuda(), data[1].cuda()
        
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs.squeeze(), labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        running_items += len(labels)
        
        train_res = net(inputs)
        metrics_train.append(r2_score(labels.cpu().detach().numpy(), train_res.cpu().detach().numpy().reshape(-1)))
        
    if (ep + 1)%20 == 0:  
        net.eval()

        print(f'Epoch [{ep + 1}/{epochs}] | ' \
              f'Step [{i + 1}/{len(train_loader)}] | ' \
              f'Loss: {running_loss / running_items:.3f} | ' \
              f'Train R2: {sum(metrics_train) / len(metrics_train):.3f} | ', end='')

        running_loss, running_items = 0.0, 0.0
        metrics_train = []

        metrics_test = []
        with torch.no_grad():  # gradients are not needed for evaluation
            for i, data in enumerate(test_loader):
                test_res = net(data[0].cuda())
                metrics_test.append(r2_score(data[1].numpy(), test_res.cpu().numpy().reshape(-1)))
        print(f'Test R2: {sum(metrics_test) / len(metrics_test):.3f}')
        net.train()
print('Training is finished!')

Epoch [20/200] | Step [120/120] | Loss: 0.004 | Train R2: 0.421 | Test R2: -43624.476
Epoch [40/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.663 | Test R2: -11.708
Epoch [60/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.754 | Test R2: 0.719
Epoch [80/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.766 | Test R2: -1433446.335
Epoch [100/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.773 | Test R2: -3370933.260
Epoch [120/200] | Step [120/120] | Loss: 0.003 | Train R2: 0.776 | Test R2: 0.709
Epoch [140/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.779 | Test R2: -61011456846.788
Epoch [160/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.795 | Test R2: -20780077788.041
Epoch [180/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.802 | Test R2: -103195589310.460
Epoch [200/200] | Step [120/120] | Loss: 0.002 | Train R2: 0.808 | Test R2: -624176.109
Training is finished!


From time to time the network produces very strange predictions, just as it did with Adam.
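The three runs above also differ in more than the optimizer: SGD was trained with L1Loss and the other two with MSELoss, and the Adam run used hidden_dim=512 instead of 1024. A like-for-like comparison could be sketched as follows, under stated assumptions: the data is synthetic, `make_model` is a small stand-in network rather than the FeedForward class, and `r2`, `evaluate`, `demo_train_loader`, and `demo_test_loader` are hypothetical names introduced here. The test score is computed once over the whole test set, in eval mode.

```python
import torch
import torch.nn as nn
from torch.optim import SGD, Adam, RMSprop
from torch.utils.data import DataLoader, TensorDataset

def r2(y_true, y_pred):
    """R2 over all samples at once: 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return (1.0 - ss_res / ss_tot).item()

def evaluate(model, loader, device):
    """Collect predictions for the whole loader in eval mode, then score once."""
    model.eval()
    preds, targets = [], []
    with torch.no_grad():
        for inputs, labels in loader:
            preds.append(model(inputs.to(device)).squeeze(1).cpu())
            targets.append(labels)
    model.train()
    return r2(torch.cat(targets), torch.cat(preds))

def make_model():
    # Small stand-in network (hypothetical), not the FeedForward class above
    return nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic stand-in data with 8 features (assumption: swap in the real tensors)
torch.manual_seed(0)
X_demo = torch.rand(512, 8)
y_demo = X_demo @ torch.linspace(0.1, 0.8, 8) + 0.1 * torch.randn(512)
demo_train_loader = DataLoader(TensorDataset(X_demo[:384], y_demo[:384]),
                               batch_size=64, shuffle=True)
demo_test_loader = DataLoader(TensorDataset(X_demo[384:], y_demo[384:]), batch_size=64)

# Learning rates copied from the cells above
optimizers = {
    'SGD': lambda params: SGD(params, lr=0.005),
    'Adam': lambda params: Adam(params, lr=0.001),
    'RMSprop': lambda params: RMSprop(params, lr=0.005),
}

results = {}
for name, make_optimizer in optimizers.items():
    torch.manual_seed(13)        # same seed before every run
    model = make_model().to(device)
    optimizer = make_optimizer(model.parameters())
    criterion = nn.MSELoss()     # same loss for every optimizer
    for epoch in range(20):
        for inputs, labels in demo_train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs).squeeze(1), labels)
            loss.backward()
            optimizer.step()
    results[name] = evaluate(model, demo_test_loader, device)

for name, score in results.items():
    print(f'{name}: test R2 = {score:.3f}')
```

Holding the architecture, loss, and seed fixed means that any remaining difference in the scores is more plausibly due to the optimizer and its learning rate.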