# Multi-Layer Perceptron Regressor

> For more information about the MLP regressor, see the [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html?highlight=multi%20layer%20perceptron%20regressor).

## Import libraries

In [None]:
# reload modules before executing user code
%load_ext autoreload
# reload all modules every time before executing Python code
%autoreload 2
# render plots in notebook
%matplotlib inline

In [None]:
# data wrangling
import pandas as pd
import numpy as np
from dslectures.core import *
from pathlib import Path

# data viz
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.image as mpimg

sns.set(color_codes=True)
sns.set_palette(sns.color_palette("muted"))

# ml magic
# scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score


from pandas.api.types import is_object_dtype, is_numeric_dtype

## Load the data

We also make use of the `pathlib` library to handle our filepaths:

In [None]:
datapath = Path('../data')
%ls {datapath}

[1m[36mprocessed[m[m/ [1m[36mraw[m[m/


In [None]:
train_data = pd.read_csv(datapath/'raw/train.csv')
# transpose the first rows so that every column is visible
train_data.head().T

Unnamed: 0,0,1,2,3,4
artist_name,Netherfriends,Maxo Kream,Drebae,TWICE,Lisa Howard
track_id,7luDJV4DDZmjH3QDdCqpcO,1F09jtMfeYNkUj6piQjXwM,2tR2VS8rAndKb0C5hS3IpD,1U3cHXWaa0FqlUWLLBL7Kz,1vAP99gg1HzuUUITxJPdyn
track_name,Money Everyday,ATW,Trust in Me (feat. Allura),BRAND NEW GIRL,Cheeseburger In Paradise
acousticness,0.0722,0.155,0.239,0.016,0.281
danceability,0.79,0.677,0.661,0.674,0.607
duration_ms,186410,149967,231732,213956,228556
energy,0.384,0.747,0.625,0.965,0.771
instrumentalness,1.72e-06,0,0,9.59e-05,5.29e-06
key,7,6,6,7,2
liveness,0.1,0.239,0.0971,0.145,0.221


In [None]:
test_data = pd.read_csv(datapath/'raw/test.csv')
# transpose the first row so that every column is visible
test_data.head(1).T

Unnamed: 0,0
artist_name,Supriya Lohith
track_id,7rrY55kdGxRC65rfeeJkG0
track_name,Nade Nade
acousticness,0.485
danceability,0.764
duration_ms,192000
energy,0.605
instrumentalness,1.19e-05
key,2
liveness,0.0915


---

#### Exercise #1

Even though you may be told a dataset has been cleaned and prepared for training a model, you should always perform some sanity checks! 

* Check that `train_data` is free of missing values
* Check that all columns are numerical

---

In [None]:
train_data.isnull().sum()

artist_name         0
track_id            0
track_name          0
acousticness        0
danceability        0
duration_ms         0
energy              0
instrumentalness    0
key                 0
liveness            0
loudness            0
mode                0
speechiness         0
tempo               0
time_signature      0
valence             0
popularity          0
collection_date     0
dtype: int64

In [None]:
train_data.dtypes

artist_name          object
track_id             object
track_name           object
acousticness        float64
danceability        float64
duration_ms           int64
energy              float64
instrumentalness    float64
key                   int64
liveness            float64
loudness            float64
mode                  int64
speechiness         float64
tempo               float64
time_signature        int64
valence             float64
popularity            int64
collection_date      object
dtype: object
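
The two checks from the exercise can also be wrapped in a small helper. The sketch below is only an illustration: `sanity_check` is a hypothetical name, and the toy frame uses made-up values standing in for `train_data`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def sanity_check(df):
    """Return (columns with missing values, non-numeric columns)."""
    missing = [col for col in df.columns if df[col].isnull().any()]
    non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
    return missing, non_numeric


# Toy frame standing in for train_data (values are made up for illustration)
toy = pd.DataFrame({
    "artist_name": ["A", "B", "C"],
    "energy": [0.4, None, 0.6],
    "popularity": [2, 41, 9],
})

missing_cols, non_numeric_cols = sanity_check(toy)
print(missing_cols)      # columns containing at least one missing value
print(non_numeric_cols)  # columns that would need encoding or dropping
```

On the real `train_data`, the outputs above suggest the first list would be empty and the second would contain the four object columns.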

In [None]:
def getting_rid_features(data):
    """Drop the identifier and date columns, which are not used as features (modifies data in place)."""
    data.drop(columns=["artist_name", "track_id", "track_name", "collection_date"], inplace=True)
    return data

In [None]:
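def convert_strings_to_categories(data):
    """Convert every object (string) column to the pandas category dtype (modifies data in place)."""
    for col in data.columns:
        if is_object_dtype(data[col]):
            data[col] = data[col].astype('category')
    return data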

In [None]:
train_data.to_csv(datapath/'processed/train_data_processed_schab9.csv', index=False)

## Select a performance measure

Before we can train any model, we need to decide which performance measure to optimise. For regression problems the Root Mean Square Error (RMSE) is often used: it is the square root of the average squared prediction error, so it gives a higher weight to large errors. When the errors are roughly normally distributed around zero, the RMSE behaves like their _**standard deviation**_. For example, in that case an RMSE of 20 would mean that about 68% of the predictions fall within 20 popularity points of the actual value, and about 95% fall within 40 points.

> Note: In general, lower values of RMSE indicate a better fit to the data.

Mathematically, the formula for RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^m \left(\hat{y}_i - y_i\right)^2}$$

*One possibility to calculate the deviation*

where $m$ is the number of instances in the dataset you are measuring the RMSE on, $\hat{y}_i$ is the model's prediction for the $i^{th}$ instance, and $y_i$ is the actual label. Let's create a simple function that uses scikit-learn's [mean_squared_error function](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) (which returns the mean squared error, i.e. RMSE$^2$):

In [None]:
def rmse(y, yhat):
    """A utility function to calculate the Root Mean Square Error (RMSE).
    
    Args:
        y (array): Actual values for target.
        yhat (array): Predicted values for target.
        
    Returns:
        rmse (double): The RMSE.
    """
    return np.sqrt(mean_squared_error(y, yhat))
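
As a quick check of the formula, the following sketch compares `rmse` with a by-hand calculation. The four target values and predictions are made up for illustration, and the function is repeated without its docstring so that the snippet runs on its own.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def rmse(y, yhat):
    return np.sqrt(mean_squared_error(y, yhat))


# made-up targets and predictions
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 35.0])

# errors: 2, -2, 3, -5 -> squared: 4, 4, 9, 25 -> mean: 10.5
mse_by_hand = np.mean((y_pred - y_true) ** 2)
rmse_by_hand = np.sqrt(mse_by_hand)

print(mse_by_hand, rmse_by_hand, rmse(y_true, y_pred))
```

The single error of 5 contributes 25 of the 42 squared-error units, which illustrates how the RMSE weights large errors more heavily.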

### The Multi-Layer Perceptron Regressor


In [None]:
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.base import *

In [None]:
getting_rid_features(train_data)
train_data.head().T

Unnamed: 0,0,1,2,3,4
acousticness,0.0722,0.155,0.239,0.016,0.281
danceability,0.79,0.677,0.661,0.674,0.607
duration_ms,186410.0,149967.0,231732.0,213956.0,228556.0
energy,0.384,0.747,0.625,0.965,0.771
instrumentalness,2e-06,0.0,0.0,9.6e-05,5e-06
key,7.0,6.0,6.0,7.0,2.0
liveness,0.1,0.239,0.0971,0.145,0.221
loudness,-11.201,-5.68,-8.09,-3.754,-5.106
mode,1.0,0.0,0.0,1.0,1.0
speechiness,0.231,0.318,0.101,0.0696,0.112


### Arrange data into a feature matrix and target vector

In [None]:
train_data.head()

Unnamed: 0,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,popularity
0,0.0722,0.79,186410,0.384,2e-06,7,0.1,-11.201,1,0.231,156.077,4,0.203,2
1,0.155,0.677,149967,0.747,0.0,6,0.239,-5.68,0,0.318,153.921,4,0.371,41
2,0.239,0.661,231732,0.625,0.0,6,0.0971,-8.09,0,0.101,132.899,4,0.385,9
3,0.016,0.674,213956,0.965,9.6e-05,7,0.145,-3.754,1,0.0696,157.041,4,0.75,41
4,0.281,0.607,228556,0.771,5e-06,2,0.221,-5.106,1,0.112,138.087,4,0.496,19


In [None]:
train_data.describe()

Unnamed: 0,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,popularity
count,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0,117293.0
mean,0.342946,0.581336,212535.0,0.568661,0.224272,5.232247,0.194912,-9.987972,0.608101,0.112156,119.464675,3.878654,0.439534,24.152592
std,0.345778,0.19021,122677.4,0.260323,0.360387,3.602673,0.167941,6.549731,0.488176,0.12463,30.154314,0.514728,0.259093,19.664392
min,0.0,0.0,3203.0,0.0,0.0,0.0,0.0,-60.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0316,0.459,163814.0,0.395,0.0,2.0,0.0975,-11.923,0.0,0.0389,96.014,4.0,0.223,7.0
50%,0.204,0.605,201813.0,0.602,0.000149,5.0,0.124,-7.989,1.0,0.0559,120.025,4.0,0.42,21.0
75%,0.638,0.727,241013.0,0.775,0.444,8.0,0.236,-5.69,1.0,0.129,139.646,4.0,0.638,38.0
max,0.996,0.996,5610020.0,1.0,1.0,11.0,0.999,1.806,1.0,0.966,249.983,5.0,1.0,100.0


In [None]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 117293 entries, 0 to 117292
Data columns (total 14 columns):
acousticness        117293 non-null float64
danceability        117293 non-null float64
duration_ms         117293 non-null int64
energy              117293 non-null float64
instrumentalness    117293 non-null float64
key                 117293 non-null int64
liveness            117293 non-null float64
loudness            117293 non-null float64
mode                117293 non-null int64
speechiness         117293 non-null float64
tempo               117293 non-null float64
time_signature      117293 non-null int64
valence             117293 non-null float64
popularity          117293 non-null int64
dtypes: float64(9), int64(5)
memory usage: 12.5 MB


In [None]:
y = train_data['popularity']
X = train_data.drop(columns='popularity')


In [None]:
#X, y = make_regression(n_samples=10000, random_state=1)#n_samples default = 100
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1)
print(f' We have {len(X_train)} train rows + {len(X_valid)} valid rows')
model = MLPRegressor(random_state=1, max_iter=500)
model.fit(X_train, y_train)
#y_pred_test = model.predict(test_data[:2])
#y_pred_test
model.predict(X_valid[:2])

 We have 93834 train rows + 23459 valid rows


array([ 8.41427276, 19.03573542])

In [None]:
X.shape

(117293, 13)

`model.get_params()` returns a dictionary of all parameters for this estimator:

In [None]:
model.get_params(deep=True)

{'activation': 'relu',
 'alpha': 0.0001,
 'batch_size': 'auto',
 'beta_1': 0.9,
 'beta_2': 0.999,
 'early_stopping': False,
 'epsilon': 1e-08,
 'hidden_layer_sizes': (100,),
 'learning_rate': 'constant',
 'learning_rate_init': 0.001,
 'max_fun': 15000,
 'max_iter': 500,
 'momentum': 0.9,
 'n_iter_no_change': 10,
 'nesterovs_momentum': True,
 'power_t': 0.5,
 'random_state': 1,
 'shuffle': True,
 'solver': 'adam',
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': False,
 'warm_start': False}
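
The counterpart of `get_params` is `set_params`, which changes hyperparameters on an existing estimator. The sketch below uses a fresh, unfitted estimator (`model_sketch` is a hypothetical name), and the new values are arbitrary examples rather than tuned settings.

```python
from sklearn.neural_network import MLPRegressor

# fresh estimator with the same settings as above; nothing is fitted here
model_sketch = MLPRegressor(random_state=1, max_iter=500)
params = model_sketch.get_params()
print(params["hidden_layer_sizes"], params["solver"])

# set_params updates hyperparameters in place and returns the estimator;
# the values below are arbitrary examples, not recommendations
model_sketch.set_params(hidden_layer_sizes=(50, 50), early_stopping=True)
print(model_sketch.get_params()["hidden_layer_sizes"])
```

Changed parameters only take effect the next time `fit` is called.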

### Evaluate the predictions  

`model.score()` returns the coefficient of determination $R^2$ of the prediction.  

The coefficient $R^2$ is defined as $1 - u/v$, where

$u$ is the residual sum of squares $\sum(y_{true} - y_{pred})^2$  

and $v$ is the total sum of squares $\sum(y_{true} - \bar{y}_{true})^2$.  

The best possible score is 1.0. A model that always predicts the mean of $y_{true}$ scores 0.0, and the score can be negative for a model that does worse than that.
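
The definition can be worked through by hand. The sketch below uses the first five `popularity` values shown earlier as $y_{true}$; the two sets of constant predictions are made up, and `r2_by_hand` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_by_hand(y_true, y_pred):
    u = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    v = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - u / v


# first five popularity values from train_data.head()
y_true = np.array([2.0, 41.0, 9.0, 41.0, 19.0])

mean_pred = np.full_like(y_true, y_true.mean())  # always predict the mean
bad_pred = np.full_like(y_true, 60.0)            # always predict 60

r2_mean = r2_by_hand(y_true, mean_pred)
r2_bad = r2_by_hand(y_true, bad_pred)
print(r2_mean)  # predicting the mean gives u == v, so the score is 0
print(r2_bad)   # a worse constant gives u > v, so the score is negative
```

The by-hand values agree with `r2_score(y_true, y_pred)` from `sklearn.metrics`, which computes the same quantity.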

In [None]:
model.score(X_valid, y_valid)

-0.299125312513981
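
A negative score means the model does worse on the validation set than always predicting the mean popularity. One possible contributor, stated here as an assumption rather than something verified on this dataset, is that the features live on very different scales (`duration_ms` is in the hundreds of thousands while most other features lie between 0 and 1), and the scikit-learn documentation notes that multi-layer perceptrons are sensitive to feature scaling. The sketch below shows how a `StandardScaler` can be chained in front of the regressor. It runs on synthetic data, so the printed score says nothing about the track data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500

# synthetic stand-ins: one feature in [0, 1], one on a much larger scale
small = rng.uniform(0, 1, n)
large = rng.uniform(100_000, 300_000, n)
X_demo = np.column_stack([small, large])
y_demo = 5 * small + large / 100_000 + rng.normal(0, 0.1, n)

X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

# the scaler is fitted on the training split only, then applied before the MLP
scaled_model = make_pipeline(StandardScaler(), MLPRegressor(random_state=1, max_iter=500))
scaled_model.fit(X_tr, y_tr)

demo_score = scaled_model.score(X_va, y_va)
print(demo_score)
```

Using a pipeline keeps the scaling statistics tied to the training data, so the same transformation is applied automatically in `predict` and `score`.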

In [None]:
train_data.shape

(117293, 14)

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
# sort=True orders the columns alphabetically, as in the output below.
data_merged = pd.concat([train_data, test_data], sort=True)


In [None]:
data_merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 130326 entries, 0 to 13032
Data columns (total 18 columns):
acousticness        130326 non-null float64
artist_name         13033 non-null object
collection_date     13033 non-null object
danceability        130326 non-null float64
duration_ms         130326 non-null int64
energy              130326 non-null float64
instrumentalness    130326 non-null float64
key                 130326 non-null int64
liveness            130326 non-null float64
loudness            130326 non-null float64
mode                130326 non-null int64
popularity          117293 non-null float64
speechiness         130326 non-null float64
tempo               130326 non-null float64
time_signature      130326 non-null int64
track_id            13033 non-null object
track_name          13033 non-null object
valence             130326 non-null float64
dtypes: float64(10), int64(4), object(4)
memory usage: 18.9+ MB


In [None]:
data_merged.shape

(130326, 18)

In [None]:
y = data_merged['popularity']
X = data_merged.drop(columns='popularity')

In [None]:
#X, y = make_regression(n_samples=1000, random_state=1)#n_samples default = 100
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1)
print(f' We have {len(X_train)} train rows + {len(X_valid)} valid rows')
model = MLPRegressor(random_state=1, max_iter=500)
model.fit(X_train, y_train)
#y_pred_test = model.predict(test_data[:2])
#y_pred_test
model.predict(X[:2])

 We have 104260 train rows + 26066 valid rows


ValueError: could not convert string to float: "The O'Neill Brothers Group"
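
The error arises because `data_merged` still contains the string columns that came in with `test_data` (for example `artist_name`), and the rows from `test_data` have no `popularity` label. One way to obtain a frame the regressor can accept is to keep only labelled rows and numeric columns. The sketch below is a minimal illustration on a toy frame with made-up structure; `merged`, `X_num` and `y_num` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for data_merged: two labelled rows and one unlabelled row
merged = pd.DataFrame({
    "energy": [0.384, 0.747, 0.605],
    "artist_name": [np.nan, np.nan, "Supriya Lohith"],
    "popularity": [2.0, 41.0, np.nan],
})

# keep only rows that have a target value
labelled = merged[merged["popularity"].notna()]

# keep only numeric feature columns
X_num = labelled.drop(columns="popularity").select_dtypes(include="number")
y_num = labelled["popularity"]

print(list(X_num.columns), len(X_num))
```

After this filtering, the features contain no strings and the target contains no missing values, which are the two conditions the failing cell violates.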

In [None]:
data_merged[117294:130326]

Unnamed: 0,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,popularity
1,0.99500,0.234,131000,0.0637,0.91800,7,0.2120,-26.841,0,0.0433,70.251,4,0.2010,
2,0.78200,0.591,192697,0.5580,0.00000,5,0.6500,-6.492,0,0.1200,79.961,4,0.3970,
3,0.07290,0.660,215381,0.6480,0.00000,2,0.1920,-9.691,1,0.0480,129.045,4,0.8360,
4,0.98600,0.472,494418,0.1140,0.53400,3,0.1030,-20.791,1,0.0613,69.240,3,0.0345,
5,0.00155,0.831,239518,0.5870,0.01810,1,0.0855,-7.439,1,0.1150,125.991,4,0.7860,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13028,0.05220,0.766,144000,0.5760,0.00000,7,0.1130,-9.302,1,0.0719,149.955,4,0.6860,
13029,0.99400,0.432,147347,0.1440,0.89300,10,0.1040,-25.868,1,0.0389,106.833,4,0.8860,
13030,0.92600,0.304,102582,0.1850,0.92300,3,0.1570,-31.639,0,0.0467,172.967,4,0.8850,
13031,0.02160,0.632,200250,0.4980,0.00000,10,0.1130,-6.136,0,0.0377,160.224,4,0.5320,


In [None]:
y_pred_test = model.predict(test_data[:2])

ValueError: X has 13 features, but MLPRegressor is expecting 100 features as input.
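
The message reports 100 expected features, which is what `make_regression` produces by default, so the model in memory may have been fitted on the commented-out synthetic data rather than on the 13 track features; that reading is an assumption. Independently of the cause, it helps to select the same feature columns for prediction that were used for fitting and to compare their count with the fitted model's `n_features_in_`. The sketch below uses toy frames with made-up values, and `demo_model` and `feature_cols` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
feature_cols = ["energy", "valence"]

# toy training frame: two features plus a target
train = pd.DataFrame(rng.uniform(0, 1, (50, 2)), columns=feature_cols)
train["popularity"] = 50 * train["energy"] + 10 * train["valence"]

# toy test frame: an extra string column and a different column order
test = pd.DataFrame({
    "artist_name": ["x", "y", "z"],
    "valence": [0.2, 0.5, 0.9],
    "energy": [0.4, 0.6, 0.8],
})

demo_model = MLPRegressor(random_state=1, max_iter=200)
demo_model.fit(train[feature_cols], train["popularity"])

# select exactly the training columns, in the training order
X_test = test[feature_cols]
assert X_test.shape[1] == demo_model.n_features_in_

y_pred = demo_model.predict(X_test)
print(len(y_pred))
```

Predicting for every row of the test frame, rather than for a two-row slice, also yields an array with one value per test row, which is presumably what the submission file expects.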

In [None]:
submission = pd.read_csv(datapath/'submitted/sample_submission.csv')
submission.shape

In [None]:
submission['popularity'] = y_pred_test
submission.to_csv(datapath/'submitted/first_submission_schab9.csv', index=False)