# Abalone
In this notebook I apply SVR (Support Vector Regression) to the Abalone dataset from the UCI repository. More information: https://archive.ics.uci.edu/dataset/1/abalone

Abalones are gastropod molluscs. Given some external characteristics of these organisms, the task in this dataset is to predict their age, which is determined from the number of rings in their shell (`Rings`).

First, I load the dataset and inspect its values.

In [1]:
!pip install ucimlrepo

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.7-py3-none-any.whl (8.0 kB)
Collecting certifi>=2020.12.5
  Downloading certifi-2024.8.30-py3-none-any.whl (167 kB)
Installing collected packages: certifi, ucimlrepo
  Attempting uninstall: certifi
    Found existing installation: certifi 2020.6.20
    Uninstalling certifi-2020.6.20:
      Successfully uninstalled certifi-2020.6.20
Successfully installed certifi-2024.8.30 ucimlrepo-0.0.7


In [20]:
from ucimlrepo import fetch_ucirepo

# fetch dataset
abalone = fetch_ucirepo(id=1)

# data (as pandas dataframes)
X = abalone.data.features
y = abalone.data.targets

# metadata
print(abalone.metadata)

# variable information
print(abalone.variables)

{'uci_id': 1, 'name': 'Abalone', 'repository_url': 'https://archive.ics.uci.edu/dataset/1/abalone', 'data_url': 'https://archive.ics.uci.edu/static/public/1/data.csv', 'abstract': 'Predict the age of abalone from physical measurements', 'area': 'Biology', 'tasks': ['Classification', 'Regression'], 'characteristics': ['Tabular'], 'num_instances': 4177, 'num_features': 8, 'feature_types': ['Categorical', 'Integer', 'Real'], 'demographics': [], 'target_col': ['Rings'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 1994, 'last_updated': 'Mon Aug 28 2023', 'dataset_doi': '10.24432/C55C7W', 'creators': ['Warwick Nash', 'Tracy Sellers', 'Simon Talbot', 'Andrew Cawthorn', 'Wes Ford'], 'intro_paper': None, 'additional_info': {'summary': 'Predicting the age of abalone from physical measurements.  The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- 

In [21]:
import numpy as np
import pandas as pd

X.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole_weight,Shucked_weight,Viscera_weight,Shell_weight
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055


In [3]:
X.describe()

Unnamed: 0,Length,Diameter,Height,Whole_weight,Shucked_weight,Viscera_weight,Shell_weight
count,4177.0,4177.0,4177.0,4177.0,4177.0,4177.0,4177.0
mean,0.523992,0.407881,0.139516,0.828742,0.359367,0.180594,0.238831
std,0.120093,0.09924,0.041827,0.490389,0.221963,0.109614,0.139203
min,0.075,0.055,0.0,0.002,0.001,0.0005,0.0015
25%,0.45,0.35,0.115,0.4415,0.186,0.0935,0.13
50%,0.545,0.425,0.14,0.7995,0.336,0.171,0.234
75%,0.615,0.48,0.165,1.153,0.502,0.253,0.329
max,0.815,0.65,1.13,2.8255,1.488,0.76,1.005


In [4]:
y.describe()

Unnamed: 0,Rings
count,4177.0
mean,9.933684
std,3.224169
min,1.0
25%,8.0
50%,9.0
75%,11.0
max,29.0


We have 8 independent variables: 1 categorical and 7 numerical. The numerical variables all take small values (every maximum in the summary above is below 3), so I consider normalization unnecessary. The dataset also appears to have no missing values: every column has a count of 4177, and the metadata reports `'has_missing_values': 'no'`.

The categorical variable (`Sex`) takes 3 values (`M`, `F`, `I`). So that the models can use it, I apply one-hot encoding, replacing it with 3 new binary variables:

In [22]:
from sklearn.preprocessing import OneHotEncoder

one_hot = OneHotEncoder(sparse_output=False)
one_hot_encoded = one_hot.fit_transform(X[['Sex']])

one_hot_encoded_df = pd.DataFrame(one_hot_encoded, columns=one_hot.get_feature_names_out(['Sex']))

X = pd.concat([X.drop('Sex', axis=1), one_hot_encoded_df], axis=1)

In [23]:
X

Unnamed: 0,Length,Diameter,Height,Whole_weight,Shucked_weight,Viscera_weight,Shell_weight,Sex_F,Sex_I,Sex_M
0,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.1500,0.0,0.0,1.0
1,0.350,0.265,0.090,0.2255,0.0995,0.0485,0.0700,0.0,0.0,1.0
2,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.2100,1.0,0.0,0.0
3,0.440,0.365,0.125,0.5160,0.2155,0.1140,0.1550,0.0,0.0,1.0
4,0.330,0.255,0.080,0.2050,0.0895,0.0395,0.0550,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...
4172,0.565,0.450,0.165,0.8870,0.3700,0.2390,0.2490,1.0,0.0,0.0
4173,0.590,0.440,0.135,0.9660,0.4390,0.2145,0.2605,0.0,0.0,1.0
4174,0.600,0.475,0.205,1.1760,0.5255,0.2875,0.3080,0.0,0.0,1.0
4175,0.625,0.485,0.150,1.0945,0.5310,0.2610,0.2960,1.0,0.0,0.0


I split the data 60-40 into train and test sets. For hyperparameter tuning, the train set is then split further into train and validation sets (80-20 ratio).

In [24]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0) # 60-40 split

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0) # 80-20 split
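As a sanity check on the two nested splits, the resulting subset sizes can be worked out by hand. This is a minimal sketch, not part of the original analysis: `split_sizes` is a hypothetical helper, and it assumes that a fractional `test_size` is rounded up, which is how I understand `train_test_split` to behave.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test share is rounded up, which is how I understand
    scikit-learn's train_test_split to treat a float test_size.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_total = 4177                                   # num_instances from the metadata
n_trainval, n_test = split_sizes(n_total, 0.4)   # 60-40 split
n_train, n_val = split_sizes(n_trainval, 0.2)    # 80-20 split of the remainder

print(n_train, n_val, n_test)
print(round(n_train / n_total, 2), round(n_val / n_total, 2), round(n_test / n_total, 2))
```

Under that assumption the final proportions are roughly 48% train, 12% validation and 40% test, so every validation MAE is computed on only about 500 samples.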

The evaluation metric I use is MAE (Mean Absolute Error). As a baseline, I use Linear Regression:

In [27]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

model = LinearRegression()
model.fit(X_train, y_train)

y_pred_val, y_pred_test = model.predict(X_val), model.predict(X_test)

print(f'LR MAE valid: {mean_absolute_error(y_val, y_pred_val)}')
print(f'LR MAE test: {mean_absolute_error(y_test, y_pred_test)}')

LR MAE valid: 1.6003532212342173
LR MAE test: 1.5801734637195477
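To make the metric concrete: MAE is the average absolute difference between the true and predicted ring counts, so the baseline above is off by about 1.6 rings on average. The sketch below spells this out in plain Python. The function name and the sample values are made up for illustration.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical ring counts and predictions, for illustration only
rings_true = [9, 10, 7, 15]
rings_pred = [8.5, 11.0, 7.0, 12.5]

mae = mean_absolute_error_manual(rings_true, rings_pred)
print(mae)  # (0.5 + 1.0 + 0.0 + 2.5) / 4 = 1.0
```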


In [7]:
# Convert the single-column target DataFrames to 1D arrays, as SVR expects
y_train = y_train.to_numpy().ravel()
y_val = y_val.to_numpy().ravel()
y_test = y_test.to_numpy().ravel()

### Linear Kernel
I start with a linear kernel, searching for good values of `C` and `ε`. I run successive grid searches, each time making the parameter grid finer around the best values from the previous iteration.
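The coarse-to-fine idea can be sketched independently of any model. In the snippet below, `refine` is a hypothetical helper (not part of scikit-learn) that builds a narrower grid around a value favoured by the previous pass. The centre values `C=100` and `epsilon=0.001` are only an assumed outcome of a coarse pass.

```python
from itertools import product

def refine(center, factors=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Return candidate values spaced around `center` (hypothetical helper)."""
    # round() removes floating-point noise such as 110.00000000000001
    return [round(center * f, 12) for f in factors]

coarse = [0.001, 0.01, 0.1, 1, 10, 100, 1000]    # powers of ten for the first pass
coarse_pairs = list(product(coarse, coarse))     # every (C, epsilon) combination

# Suppose the coarse pass favoured C=100 and epsilon=0.001
fine_C = refine(100)
fine_eps = refine(0.001)
fine_pairs = list(product(fine_C, fine_eps))

print(len(coarse_pairs), len(fine_pairs))
print(fine_C)
print(fine_eps)
```

The coarse pass costs 49 fits and each refinement another 25, far fewer than a single fine grid over the whole range would need.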

In [8]:
from sklearn.svm import SVR

In [10]:
for C in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
    for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        model = SVR(kernel='linear', C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=0.001, e=0.001: 2.1570572487813178
MAE C=0.001, e=0.01: 2.1565606907809434
MAE C=0.001, e=0.1: 2.159283173968401
MAE C=0.001, e=1: 2.1764408290795005
MAE C=0.001, e=10: 3.898008368791076
MAE C=0.001, e=100: 4.661354581673307
MAE C=0.001, e=1000: 4.661354581673307
MAE C=0.01, e=0.001: 1.9024013753490252
MAE C=0.01, e=0.01: 1.9020410602549198
MAE C=0.01, e=0.1: 1.8989658326845749
MAE C=0.01, e=1: 1.9038252912386644
MAE C=0.01, e=10: 3.9482111779505917
MAE C=0.01, e=100: 4.661354581673307
MAE C=0.01, e=1000: 4.661354581673307
MAE C=0.1, e=0.001: 1.7635140409450651
MAE C=0.1, e=0.01: 1.7634821350726453
MAE C=0.1, e=0.1: 1.7614088600254982
MAE C=0.1, e=1: 1.781788488915088
MAE C=0.1, e=10: 4.292204685647806
MAE C=0.1, e=100: 4.661354581673307
MAE C=0.1, e=1000: 4.661354581673307
MAE C=1, e=0.001: 1.5904348371226193
MAE C=1, e=0.01: 1.5897457929935355
MAE C=1, e=0.1: 1.5903344187523234
MAE C=1, e=1: 1.5946565660972096
MAE C=1, e=10: 4.7089000758255875
MAE C=1, e=100: 4.661354581673307
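The identical value 4.6614 for every `e >= 100` row above is consistent with how the ε-insensitive loss used by SVR works: residuals smaller than ε cost nothing. Ring counts only range from 1 to 29, so a tube of half-width 100 contains every target regardless of the features, and the fit can collapse to a constant. This is my reading of the output, not something the notebook verifies. The sketch below illustrates the loss itself.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss that ignores residuals up to epsilon and grows linearly beyond it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Ring counts in this dataset range from 1 to 29 (see y.describe()).
# The constant prediction of 15 is a hypothetical value for illustration.
extremes = [1, 29]
constant_prediction = 15

for eps in [0.1, 1, 10, 100]:
    losses = [epsilon_insensitive_loss(y, constant_prediction, eps) for y in extremes]
    print(eps, losses)
```

For `eps=100` both losses are zero, so nothing in the objective rewards using the features. Smaller values of ε penalize the same constant prediction.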


In [18]:
# 2nd iter
# MAE C=100, e=0.001: 1.5619476322090373
for C in [80, 90, 100, 110, 120]:
    for epsilon in [0.0008, 0.0009, 0.001, 0.0011, 0.0012]:
        model = SVR(kernel='linear', C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=80, e=0.0008: 1.5620523493866971
MAE C=80, e=0.0009: 1.562087740137242
MAE C=80, e=0.001: 1.5620770438263487
MAE C=80, e=0.0011: 1.562060813299506
MAE C=80, e=0.0012: 1.5620519423285324
MAE C=90, e=0.0008: 1.5619515935616686
MAE C=90, e=0.0009: 1.5619526652844797
MAE C=90, e=0.001: 1.561956919175167
MAE C=90, e=0.0011: 1.5619634009001886
MAE C=90, e=0.0012: 1.561965208196525
MAE C=100, e=0.0008: 1.5618941083041329
MAE C=100, e=0.0009: 1.5619820980868226
MAE C=100, e=0.001: 1.5619476322090373
MAE C=100, e=0.0011: 1.561878991010998
MAE C=100, e=0.0012: 1.5619597603072213
MAE C=110, e=0.0008: 1.561893861411909
MAE C=110, e=0.0009: 1.5619295324945353
MAE C=110, e=0.001: 1.5619029125017678
MAE C=110, e=0.0011: 1.5618394375433067
MAE C=110, e=0.0012: 1.5618252276073257
MAE C=120, e=0.0008: 1.5619372044916984
MAE C=120, e=0.0009: 1.561938816586101
MAE C=120, e=0.001: 1.561890267853295
MAE C=120, e=0.0011: 1.5619018982666288
MAE C=120, e=0.0012: 1.5618772597339803


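For the linear kernel, the validation MAE settles at about 1.5619. I keep `C=100, e=0.0008`; the lowest value in the output above is at `C=110, e=0.0012` (1.5618), but the differences across this grid are confined to the fourth decimal place.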
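Reading the minimum off a 25-row printout is error-prone, so the results could instead be collected in a dictionary and the minimum taken programmatically. A minimal sketch, using only the `C=100` and `C=110` rows copied from the output above:

```python
# Validation MAE values copied from the fine-grid output above (C=100 and C=110 rows only)
results = {
    (100, 0.0008): 1.5618941083041329,
    (100, 0.0009): 1.5619820980868226,
    (100, 0.001): 1.5619476322090373,
    (100, 0.0011): 1.561878991010998,
    (100, 0.0012): 1.5619597603072213,
    (110, 0.0008): 1.561893861411909,
    (110, 0.0009): 1.5619295324945353,
    (110, 0.001): 1.5619029125017678,
    (110, 0.0011): 1.5618394375433067,
    (110, 0.0012): 1.5618252276073257,
}

best_params = min(results, key=results.get)   # key with the smallest MAE
best_C, best_epsilon = best_params
spread = max(results.values()) - min(results.values())

print(best_C, best_epsilon, results[best_params])
print(f"spread across these rows: {spread:.6f}")
```

The spread across these rows is well below 0.001, so any of these settings is practically equivalent on a validation set of this size.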

### Polynomial kernels
I continue with polynomial kernels of degree n = 2, 3, 5, 7, 9, following the same procedure as for the linear kernel.
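For reference, the polynomial kernel has the form `(gamma * <x, z> + coef0) ** degree`, which is the form documented for scikit-learn's `poly` kernel. The sketch below evaluates it in plain Python. It sets `gamma=1.0` for simplicity, which is an assumption and differs from the library default of `gamma='scale'`. The two feature vectors are made up.

```python
def poly_kernel(x, z, degree, gamma=1.0, coef0=0.0):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

# Two hypothetical feature vectors, for illustration only
x = [0.5, 0.4]
z = [0.6, 0.5]

for degree in [2, 3, 5, 7, 9]:
    print(degree, poly_kernel(x, z, degree))
```

With `coef0=0`, which is the `SVR` default and is not changed in the cells that follow, the kernel contains only terms of exactly the given degree. This may be part of why very high degrees behave poorly here.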

#### 2nd order

In [19]:
for C in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
    for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        model = SVR(kernel='poly', degree=2, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=0.001, e=0.001: 2.072919115055273
MAE C=0.001, e=0.01: 2.072724423525785
MAE C=0.001, e=0.1: 2.0731387230436744
MAE C=0.001, e=1: 2.0886804410149753
MAE C=0.001, e=10: 3.9035284951740725
MAE C=0.001, e=100: 4.661354581673307
MAE C=0.001, e=1000: 4.661354581673307
MAE C=0.01, e=0.001: 1.9047358976512963
MAE C=0.01, e=0.01: 1.904968380261799
MAE C=0.01, e=0.1: 1.9038609756519176
MAE C=0.01, e=1: 1.9160726152836065
MAE C=0.01, e=10: 4.003412441780577
MAE C=0.01, e=100: 4.661354581673307
MAE C=0.01, e=1000: 4.661354581673307
MAE C=0.1, e=0.001: 1.6862538286359952
MAE C=0.1, e=0.01: 1.6860245438095918
MAE C=0.1, e=0.1: 1.684976298719853
MAE C=0.1, e=1: 1.7012872477124308
MAE C=0.1, e=10: 4.539648709798297
MAE C=0.1, e=100: 4.661354581673307
MAE C=0.1, e=1000: 4.661354581673307
MAE C=1, e=0.001: 1.5928442712754132
MAE C=1, e=0.01: 1.5924352319134032
MAE C=1, e=0.1: 1.591977088374134
MAE C=1, e=1: 1.597569793202993
MAE C=1, e=10: 3.8741939057457815
MAE C=1, e=100: 4.661354581673307
MAE 

In [11]:
# MAE C=1000, e=0.001: 1.4665373503791301
for C in [800, 900, 1000, 1100, 1200]:
    for epsilon in [0.0008, 0.0009, 0.001, 0.0011, 0.0012]:
        model = SVR(kernel='poly', degree=2, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=800, e=0.0008: 1.465614980344231
MAE C=800, e=0.0009: 1.46559475159077
MAE C=800, e=0.001: 1.465601555088892
MAE C=800, e=0.0011: 1.4656141534653153
MAE C=800, e=0.0012: 1.4656414813601837
MAE C=900, e=0.0008: 1.466414828750074
MAE C=900, e=0.0009: 1.4664124757906176
MAE C=900, e=0.001: 1.466426457708526
MAE C=900, e=0.0011: 1.466431844247424
MAE C=900, e=0.0012: 1.4664398632355646
MAE C=1000, e=0.0008: 1.4664881866505406
MAE C=1000, e=0.0009: 1.4665305420667663
MAE C=1000, e=0.001: 1.4665373503791301
MAE C=1000, e=0.0011: 1.4665138265060167
MAE C=1000, e=0.0012: 1.4665177818792232
MAE C=1100, e=0.0008: 1.4666839032987529
MAE C=1100, e=0.0009: 1.4666615967094858
MAE C=1100, e=0.001: 1.4667390470247816
MAE C=1100, e=0.0011: 1.466734101666552
MAE C=1100, e=0.0012: 1.466686591849971
MAE C=1200, e=0.0008: 1.467019241649725
MAE C=1200, e=0.0009: 1.467055197854712
MAE C=1200, e=0.001: 1.466953696468887
MAE C=1200, e=0.0011: 1.4670091058297228
MAE C=1200, e=0.0012: 1.4670201037464234


For the 2nd-degree polynomial kernel I keep `C=900, e=0.0008`, with MAE about 1.466. The `C=800` rows in the output above are marginally lower (about 1.4656), a difference of less than 0.001.

#### 3rd order

In [22]:
for C in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
    for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        model = SVR(kernel='poly', degree=3, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=0.001, e=0.001: 2.044066047946673
MAE C=0.001, e=0.01: 2.0439815173701557
MAE C=0.001, e=0.1: 2.044975261691228
MAE C=0.001, e=1: 2.0474756871553694
MAE C=0.001, e=10: 3.9259201864306448
MAE C=0.001, e=100: 4.661354581673307
MAE C=0.001, e=1000: 4.661354581673307
MAE C=0.01, e=0.001: 1.8382525859409116
MAE C=0.01, e=0.01: 1.8379457903632734
MAE C=0.01, e=0.1: 1.8381554297144695
MAE C=0.01, e=1: 1.8611574458398201
MAE C=0.01, e=10: 4.12276804297121
MAE C=0.01, e=100: 4.661354581673307
MAE C=0.01, e=1000: 4.661354581673307
MAE C=0.1, e=0.001: 1.6835523608625387
MAE C=0.1, e=0.01: 1.683999021073484
MAE C=0.1, e=0.1: 1.6850869817506375
MAE C=0.1, e=1: 1.6992812277393066
MAE C=0.1, e=10: 4.416847397415194
MAE C=0.1, e=100: 4.661354581673307
MAE C=0.1, e=1000: 4.661354581673307
MAE C=1, e=0.001: 1.576443444406799
MAE C=1, e=0.01: 1.576371389069583
MAE C=1, e=0.1: 1.577621935724324
MAE C=1, e=1: 1.5827730827614077
MAE C=1, e=10: 3.748097163854953
MAE C=1, e=100: 4.661354581673307
MAE C=

In [12]:
# MAE C=100, e=1: 1.4635615582911914
for C in [80, 90, 100, 110, 120]:
    for epsilon in [0.8, 0.9, 1.0, 1.1, 1.2]:
        model = SVR(kernel='poly', degree=3, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=80, e=0.8: 1.4583771353515365
MAE C=80, e=0.9: 1.4589307920061736
MAE C=80, e=1.0: 1.4637752208730643
MAE C=80, e=1.1: 1.4686781083288756
MAE C=80, e=1.2: 1.4705496652033412
MAE C=90, e=0.8: 1.4566831155882947
MAE C=90, e=0.9: 1.4580851123150385
MAE C=90, e=1.0: 1.463882114484078
MAE C=90, e=1.1: 1.4684441292052175
MAE C=90, e=1.2: 1.470646812746906
MAE C=100, e=0.8: 1.4557067204429222
MAE C=100, e=0.9: 1.4584353454173737
MAE C=100, e=1.0: 1.4635615582911914
MAE C=100, e=1.1: 1.469328434737403
MAE C=100, e=1.2: 1.470266645626286
MAE C=110, e=0.8: 1.4544027813145606
MAE C=110, e=0.9: 1.4588079957211575
MAE C=110, e=1.0: 1.4638939985127837
MAE C=110, e=1.1: 1.4673565101482529
MAE C=110, e=1.2: 1.4703552087023652
MAE C=120, e=0.8: 1.4537272708958957
MAE C=120, e=0.9: 1.4596948100082126
MAE C=120, e=1.0: 1.4640240995155636
MAE C=120, e=1.1: 1.4661004680295575
MAE C=120, e=1.2: 1.4703360553178038


In [19]:
# MAE C=120, e=0.8: 1.4537272708958957
for C in [120, 130, 140, 150]:
    for epsilon in [0.4, 0.5, 0.6, 0.8]:
        model = SVR(kernel='poly', degree=3, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=120, e=0.4: 1.4572831387326242
MAE C=120, e=0.5: 1.455261360252594
MAE C=120, e=0.6: 1.4484495143775113
MAE C=120, e=0.8: 1.4537272708958957
MAE C=130, e=0.4: 1.4574008864971293
MAE C=130, e=0.5: 1.4554376609384745
MAE C=130, e=0.6: 1.4481969041962466
MAE C=130, e=0.8: 1.454067080264884
MAE C=140, e=0.4: 1.456874048420204
MAE C=140, e=0.5: 1.4552796029200876
MAE C=140, e=0.6: 1.4481612763052871
MAE C=140, e=0.8: 1.4548389031671662
MAE C=150, e=0.4: 1.4574792987645122
MAE C=150, e=0.5: 1.4553591533395158
MAE C=150, e=0.6: 1.4483500778123322
MAE C=150, e=0.8: 1.4550709579528283


For the 3rd-degree polynomial kernel, the best model is `C=140, e=0.6` with MAE 1.448.

#### 5th order

In [10]:
for C in [0.001, 0.01, 0.1, 1, 10]: # avoid larger values, no better results and longer execution times
    for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        model = SVR(kernel='poly', degree=5, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=0.001, e=0.001: 1.9719052819950396
MAE C=0.001, e=0.01: 1.9718026252689334
MAE C=0.001, e=0.1: 1.9706609675050093
MAE C=0.001, e=1: 1.9714927797352952
MAE C=0.001, e=10: 4.075080425914788
MAE C=0.001, e=100: 4.661354581673307
MAE C=0.001, e=1000: 4.661354581673307
MAE C=0.01, e=0.001: 1.8185690432470059
MAE C=0.01, e=0.01: 1.8178590521634426
MAE C=0.01, e=0.1: 1.8233661397455516
MAE C=0.01, e=1: 1.8316512597003725
MAE C=0.01, e=10: 4.153077706604875
MAE C=0.01, e=100: 4.661354581673307
MAE C=0.01, e=1000: 4.661354581673307
MAE C=0.1, e=0.001: 1.6595721539354014
MAE C=0.1, e=0.01: 1.6581182629800177
MAE C=0.1, e=0.1: 1.653231316039142
MAE C=0.1, e=1: 1.6763821576132283
MAE C=0.1, e=10: 3.9126570692840654
MAE C=0.1, e=100: 4.661354581673307
MAE C=0.1, e=1000: 4.661354581673307
MAE C=1, e=0.001: 1.5458203597723326
MAE C=1, e=0.01: 1.545093936456148
MAE C=1, e=0.1: 1.5473383763510566
MAE C=1, e=1: 1.552791103487906
MAE C=1, e=10: 3.676000180365137
MAE C=1, e=100: 4.661354581673307
MA

In [21]:
# MAE C=10, e=0.01: 1.4775486700583578
for C in [8, 9, 10, 11, 12]:
    for epsilon in [0.008, 0.009, 0.01, 0.011, 0.012]:
        model = SVR(kernel='poly', degree=5, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=8, e=0.008: 1.477059165831463
MAE C=8, e=0.009: 1.4771977934872487
MAE C=8, e=0.01: 1.4772469000298685
MAE C=8, e=0.011: 1.4772205682178867
MAE C=8, e=0.012: 1.4772211266554225
MAE C=9, e=0.008: 1.4760918703459176
MAE C=9, e=0.009: 1.4759261444858751
MAE C=9, e=0.01: 1.4758706184814916
MAE C=9, e=0.011: 1.4757898296685465
MAE C=9, e=0.012: 1.4758707493260559
MAE C=10, e=0.008: 1.4776493438560114
MAE C=10, e=0.009: 1.4775345778843598
MAE C=10, e=0.01: 1.4775486700583578
MAE C=10, e=0.011: 1.4775557542635926
MAE C=10, e=0.012: 1.4775322907315138
MAE C=11, e=0.008: 1.4785495132624507
MAE C=11, e=0.009: 1.4783599650563273
MAE C=11, e=0.01: 1.478231006048066
MAE C=11, e=0.011: 1.4779528998700486
MAE C=11, e=0.012: 1.477781268440112
MAE C=12, e=0.008: 1.4786794631373743
MAE C=12, e=0.009: 1.4788076011215692
MAE C=12, e=0.01: 1.4790599948491099
MAE C=12, e=0.011: 1.4792426480413396
MAE C=12, e=0.012: 1.4793784405889612


For the 5th-degree polynomial kernel, the best model is `C=9, e=0.011` with MAE 1.4758.

#### Higher order

In [22]:
# degree 7
for C in [0.001, 0.01, 0.1, 1, 10]: # avoid larger values, no better results and longer execution times
    for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        model = SVR(kernel='poly', degree=7, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=0.001, e=0.001: 1.9537870025752304
MAE C=0.001, e=0.01: 1.95358211935082
MAE C=0.001, e=0.1: 1.953846788737504
MAE C=0.001, e=1: 1.9586181817813162
MAE C=0.001, e=10: 4.075718887204106
MAE C=0.001, e=100: 4.661354581673307
MAE C=0.001, e=1000: 4.661354581673307
MAE C=0.01, e=0.001: 1.769568030772976
MAE C=0.01, e=0.01: 1.7699711965952674
MAE C=0.01, e=0.1: 1.7694376850797555
MAE C=0.01, e=1: 1.780180610176134
MAE C=0.01, e=10: 4.155508006904249
MAE C=0.01, e=100: 4.661354581673307
MAE C=0.01, e=1000: 4.661354581673307
MAE C=0.1, e=0.001: 1.6265957403986115
MAE C=0.1, e=0.01: 1.626939878231873
MAE C=0.1, e=0.1: 1.625463234528994
MAE C=0.1, e=1: 1.643107156600543
MAE C=0.1, e=10: 3.835537652199051
MAE C=0.1, e=100: 4.661354581673307
MAE C=0.1, e=1000: 4.661354581673307
MAE C=1, e=0.001: 1.5787977037482122
MAE C=1, e=0.01: 1.5762599395417078
MAE C=1, e=0.1: 1.5750833715825623
MAE C=1, e=1: 1.5640779938764626
MAE C=1, e=10: 3.831940982646093
MAE C=1, e=100: 4.661354581673307
MAE C=1,

In [None]:
# degree 9
for C in [0.001, 0.01, 0.1, 1]: # avoid larger values, no better results and longer execution times
    for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        model = SVR(kernel='poly', degree=9, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=0.001, e=0.001: 1.891929550173022
MAE C=0.001, e=0.01: 1.8917666785308331
MAE C=0.001, e=0.1: 1.8880342701431556
MAE C=0.001, e=1: 1.9113881984663712
MAE C=0.001, e=10: 4.183779607139776
MAE C=0.001, e=100: 4.661354581673307
MAE C=0.001, e=1000: 4.661354581673307
MAE C=0.01, e=0.001: 1.8007091794464958
MAE C=0.01, e=0.01: 1.7996377997016664
MAE C=0.01, e=0.1: 1.7954665216086814
MAE C=0.01, e=1: 1.7870230072283935
MAE C=0.01, e=10: 4.026151043886346
MAE C=0.01, e=100: 4.661354581673307
MAE C=0.01, e=1000: 4.661354581673307
MAE C=0.1, e=0.001: 1.7448306451304907
MAE C=0.1, e=0.01: 1.7457691458508091
MAE C=0.1, e=0.1: 1.744884558941778
MAE C=0.1, e=1: 1.7398351396590293
MAE C=0.1, e=10: 4.033416758637063
MAE C=0.1, e=100: 4.661354581673307
MAE C=0.1, e=1000: 4.661354581673307
MAE C=1, e=0.001: 1.8177306961569868
MAE C=1, e=0.01: 1.8257106542879225


In [None]:
# degree 13
for C in [0.001, 0.01, 0.1]: # avoid larger values, no better results and longer execution times
    for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        model = SVR(kernel='poly', degree=13, C=C, epsilon=epsilon)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_val)

        print(f'MAE C={C}, e={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE C=0.001, e=0.001: 2.6633308199285453
MAE C=0.001, e=0.01: 2.659623402814195


In general, we observe that performance degrades for polynomial kernels of high degree. Before moving on to RBF, I evaluate on the test set the best models selected by validation MAE.

In [10]:
def train_and_test(model):
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    
    return mean_absolute_error(y_test, y_pred)

# Linear
best_linear = SVR(kernel='linear', C=100, epsilon=0.0008)
print(f'Linear C=100, e=0.0008 MAE: {train_and_test(best_linear)}')

# 2nd order
best_poly2 = SVR(kernel='poly', degree=2, C=900, epsilon=0.0008)
print(f'Best SVR poly deg 2 MAE: {train_and_test(best_poly2)}')

# 3rd order
best_poly3 = SVR(kernel='poly', degree=3, C=140, epsilon=0.6)
print(f'Best SVR poly deg 3 MAE: {train_and_test(best_poly3)}')

# 5th order
best_poly5 = SVR(kernel='poly', degree=5, C=9, epsilon=0.011)
print(f'Best SVR poly deg 5 MAE: {train_and_test(best_poly5)}')

# 7th order
best_poly7 = SVR(kernel='poly', degree=7, C=1, epsilon=1)
print(f'Best SVR poly deg 7 MAE: {train_and_test(best_poly7)}')

# 9th order
best_poly9 = SVR(kernel='poly', degree=9, C=0.1, epsilon=0.001)
print(f'Best SVR poly deg 9 MAE: {train_and_test(best_poly9)}')


Linear C=100, e=0.0008 MAE: 1.5637257485418539
Best SVR poly deg 2 MAE: 1.4677292666212247
Best SVR poly deg 3 MAE: 1.4788224770012222
Best SVR poly deg 5 MAE: 1.5121951820347814
Best SVR poly deg 7 MAE: 1.5967220691316726
Best SVR poly deg 9 MAE: 1.8810544839368584


The test MAE values are slightly higher than the validation ones, but the relative performance of the models is similar, with the 2nd-degree polynomial kernel achieving the lowest MAE.

### RBF kernel
I follow a similar tuning strategy as for the polynomial kernels, except that here the grid search becomes 3D with the addition of `gamma`.
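To see what `gamma` controls, recall that the RBF kernel is `exp(-gamma * ||x - z||^2)`: a larger `gamma` makes the similarity between two points decay faster with distance. The sketch below evaluates this for two made-up feature vectors and also counts the fits in the first 3D grid.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two hypothetical feature vectors, for illustration only
x = [0.5, 0.4]
z = [0.6, 0.5]

for gamma in [100, 10, 1, 0.1, 0.01, 0.001]:
    print(gamma, rbf_kernel(x, z, gamma))

# Size of the 3D grid: 6 gamma values x 7 C values x 7 epsilon values
n_fits = 6 * 7 * 7
print(n_fits)
```

With very small `gamma`, nearby and distant points look almost equally similar, while with very large `gamma` each point is similar only to itself. The useful range therefore tends to lie in between.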

In [12]:
for gamma in [100, 10, 1, 0.1, 0.01, 0.001]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
        for epsilon in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
            model = SVR(kernel='rbf', C=C, epsilon=epsilon, gamma=gamma)
            model.fit(X_train, y_train)

            y_pred = model.predict(X_val)

            print(f'MAE γ={gamma}, C={C}, ε={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE γ=100, C=0.001, ε=0.001: 2.37570492519549
MAE γ=100, C=0.001, ε=0.01: 2.375743322683693
MAE γ=100, C=0.001, ε=0.1: 2.3734411265025503
MAE γ=100, C=0.001, ε=1: 2.375365865215194
MAE γ=100, C=0.001, ε=10: 3.893136880364179
MAE γ=100, C=0.001, ε=100: 4.661354581673307
MAE γ=100, C=0.001, ε=1000: 4.661354581673307
MAE γ=100, C=0.01, ε=0.001: 2.2428638389360382
MAE γ=100, C=0.01, ε=0.01: 2.2430278730942534
MAE γ=100, C=0.01, ε=0.1: 2.243489003729362
MAE γ=100, C=0.01, ε=1: 2.2397087810915264
MAE γ=100, C=0.01, ε=10: 3.899515316655219
MAE γ=100, C=0.01, ε=100: 4.661354581673307
MAE γ=100, C=0.01, ε=1000: 4.661354581673307
MAE γ=100, C=0.1, ε=0.001: 1.8270351654806443
MAE γ=100, C=0.1, ε=0.01: 1.8265554926350587
MAE γ=100, C=0.1, ε=0.1: 1.8309014512073412
MAE γ=100, C=0.1, ε=1: 1.8762600322614167
MAE γ=100, C=0.1, ε=10: 3.9641998310000743
MAE γ=100, C=0.1, ε=100: 4.661354581673307
MAE γ=100, C=0.1, ε=1000: 4.661354581673307
MAE γ=100, C=1, ε=0.001: 1.5592681983514385
MAE γ=100, C=1, ε=0.0

MAE γ=0.1, C=1000, ε=1: 1.4858124233261447
MAE γ=0.1, C=1000, ε=10: 4.792332450809889
MAE γ=0.1, C=1000, ε=100: 4.661354581673307
MAE γ=0.1, C=1000, ε=1000: 4.661354581673307
MAE γ=0.01, C=0.001, ε=0.001: 2.385082190460316
MAE γ=0.01, C=0.001, ε=0.01: 2.3848655817307005
MAE γ=0.01, C=0.001, ε=0.1: 2.3834313187824936
MAE γ=0.01, C=0.001, ε=1: 2.3853671566578747
MAE γ=0.01, C=0.001, ε=10: 3.8925379485145046
MAE γ=0.01, C=0.001, ε=100: 4.661354581673307
MAE γ=0.01, C=0.001, ε=1000: 4.661354581673307
MAE γ=0.01, C=0.01, ε=0.001: 2.3368908625994687
MAE γ=0.01, C=0.01, ε=0.01: 2.336803515515761
MAE γ=0.01, C=0.01, ε=0.1: 2.3347115941994376
MAE γ=0.01, C=0.01, ε=1: 2.3397819272804554
MAE γ=0.01, C=0.01, ε=10: 3.8935069751848808
MAE γ=0.01, C=0.01, ε=100: 4.661354581673307
MAE γ=0.01, C=0.01, ε=1000: 4.661354581673307
MAE γ=0.01, C=0.1, ε=0.001: 2.056365836591094
MAE γ=0.01, C=0.1, ε=0.01: 2.055630373010765
MAE γ=0.01, C=0.1, ε=0.1: 2.0562219547507774
MAE γ=0.01, C=0.1, ε=1: 2.0844011525093435

We get very good values with `gamma=10`, `C=10` and `ε < 0.1`. I will try values close to these.

In [14]:
for gamma in [2, 5, 10, 20, 40, 60]:
    for C in [2, 5, 10, 20, 40, 60]:
        for epsilon in [0.01]:
            model = SVR(kernel='rbf', C=C, epsilon=epsilon, gamma=gamma)
            model.fit(X_train, y_train)

            y_pred = model.predict(X_val)

            print(f'MAE γ={gamma}, C={C}, ε={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE γ=2, C=2, ε=0.01: 1.5009682022214137
MAE γ=2, C=5, ε=0.01: 1.4715604946205183
MAE γ=2, C=10, ε=0.01: 1.463062003315973
MAE γ=2, C=20, ε=0.01: 1.4591666428694419
MAE γ=2, C=40, ε=0.01: 1.4562132755878048
MAE γ=2, C=60, ε=0.01: 1.456733097570816
MAE γ=5, C=2, ε=0.01: 1.4664641912055285
MAE γ=5, C=5, ε=0.01: 1.4453148250039114
MAE γ=5, C=10, ε=0.01: 1.4376234497057114
MAE γ=5, C=20, ε=0.01: 1.4361414243491317
MAE γ=5, C=40, ε=0.01: 1.4549109757662113
MAE γ=5, C=60, ε=0.01: 1.4653732923604998
MAE γ=10, C=2, ε=0.01: 1.450488288898706
MAE γ=10, C=5, ε=0.01: 1.4356736182185088
MAE γ=10, C=10, ε=0.01: 1.4366624584009677
MAE γ=10, C=20, ε=0.01: 1.4562381851023356
MAE γ=10, C=40, ε=0.01: 1.4763275480344609
MAE γ=10, C=60, ε=0.01: 1.4946660876978233
MAE γ=20, C=2, ε=0.01: 1.452744643440108
MAE γ=20, C=5, ε=0.01: 1.4520442461444456
MAE γ=20, C=10, ε=0.01: 1.4714417891086782
MAE γ=20, C=20, ε=0.01: 1.4978818338987703
MAE γ=20, C=40, ε=0.01: 1.542815120111768
MAE γ=20, C=60, ε=0.01: 1.5692446740

I run one more round to see whether I can get better results:

In [15]:
for gamma in [5, 8, 10, 12, 15]:
    for C in [1, 3, 5, 8, 10]:
        for epsilon in [0.01]:
            model = SVR(kernel='rbf', C=C, epsilon=epsilon, gamma=gamma)
            model.fit(X_train, y_train)

            y_pred = model.predict(X_val)

            print(f'MAE γ={gamma}, C={C}, ε={epsilon}: {mean_absolute_error(y_val, y_pred)}')

MAE γ=5, C=1, ε=0.01: 1.5032342086496462
MAE γ=5, C=3, ε=0.01: 1.4554145649615295
MAE γ=5, C=5, ε=0.01: 1.4453148250039114
MAE γ=5, C=8, ε=0.01: 1.4411486973793608
MAE γ=5, C=10, ε=0.01: 1.4376234497057114
MAE γ=8, C=1, ε=0.01: 1.4861371274907065
MAE γ=8, C=3, ε=0.01: 1.441337328347913
MAE γ=8, C=5, ε=0.01: 1.4330185099298325
MAE γ=8, C=8, ε=0.01: 1.4336225499554778
MAE γ=8, C=10, ε=0.01: 1.4341127594102792
MAE γ=10, C=1, ε=0.01: 1.4771957371862992
MAE γ=10, C=3, ε=0.01: 1.440596196927387
MAE γ=10, C=5, ε=0.01: 1.4356736182185088
MAE γ=10, C=8, ε=0.01: 1.4357853421056652
MAE γ=10, C=10, ε=0.01: 1.4366624584009677
MAE γ=12, C=1, ε=0.01: 1.47634024119857
MAE γ=12, C=3, ε=0.01: 1.4412979069391312
MAE γ=12, C=5, ε=0.01: 1.4362805008787831
MAE γ=12, C=8, ε=0.01: 1.4407726324534516
MAE γ=12, C=10, ε=0.01: 1.4450737503426125
MAE γ=15, C=1, ε=0.01: 1.4756071844788787
MAE γ=15, C=3, ε=0.01: 1.4408495519025182
MAE γ=15, C=5, ε=0.01: 1.4423341914451537
MAE γ=15, C=8, ε=0.01: 1.4516309450002103
MA

We observe good results for values of gamma close to 10. Specifically, the model with `γ=8, C=5, ε=0.01` performed best, with MAE about 1.433. I evaluate this model on the test data to compare it with the polynomial and linear models:

In [18]:
best_rbf = SVR(kernel='rbf', gamma=8, C=5, epsilon=0.01)
print(f'Best SVR RBF MAE: {train_and_test(best_rbf)}')

Best SVR RBF MAE: 1.4878350432291223


Its test MAE (about 1.488) is slightly higher than that of the degree-2 and degree-3 polynomial kernels (about 1.468 and 1.479).
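To put all test results side by side, the values printed in the cells above can be gathered and sorted. This is only a convenience sketch over numbers already reported in this notebook. The dictionary keys are labels I chose.

```python
# Test-set MAE values copied from the outputs above
test_mae = {
    "Linear Regression (baseline)": 1.5801734637195477,
    "SVR linear": 1.5637257485418539,
    "SVR poly deg 2": 1.4677292666212247,
    "SVR poly deg 3": 1.4788224770012222,
    "SVR poly deg 5": 1.5121951820347814,
    "SVR poly deg 7": 1.5967220691316726,
    "SVR poly deg 9": 1.8810544839368584,
    "SVR RBF": 1.4878350432291223,
}

ranking = sorted(test_mae.items(), key=lambda item: item[1])
baseline = test_mae["Linear Regression (baseline)"]

for name, mae in ranking:
    print(f"{name:30s} {mae:.4f}  ({mae - baseline:+.4f} vs baseline)")
```

The three best models (poly degree 2, poly degree 3, RBF) lie within about 0.02 MAE of each other and improve on the linear baseline by roughly 0.1 rings.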