# Training an MLP

In this example, we create, train, and score an MLP regressor that predicts the CO concentration from the other columns of an air quality dataset.

Dataset Information:

The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located in the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of responses from field-deployed air quality chemical sensor devices. Ground-truth hourly averaged concentrations for CO, non-methane hydrocarbons, benzene, total nitrogen oxides (NOx), and nitrogen dioxide (NO2) were provided by a co-located certified reference analyzer. Evidence of cross-sensitivities, as well as both concept and sensor drift, is present, as described in De Vito et al., Sens. and Act. B, Vol. 129, 2, 2008 (citation required), and may affect the sensors' concentration estimation capabilities. Missing values are tagged with the value -200.
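The file name used below suggests the data were cleaned beforehand, but the notebook does not show how. As a minimal sketch of one common approach (an assumption, not necessarily the author's method), the -200 sentinel can be converted to `NaN` and incomplete rows dropped. The toy values here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw data; the values are illustrative only.
raw = pd.DataFrame({
    "CO": [2.9, -200.0, 6.9, 6.1],
    "T": [9.8, 10.3, -200.0, 9.6],
    "RH": [67.6, 64.2, 69.3, 67.8],
})

# Replace the -200 sentinel with NaN so pandas treats it as missing.
marked = raw.replace(-200, np.nan)

# Count missing values per column, then drop every row that has any.
missing_per_column = marked.isna().sum()
clean = marked.dropna().reset_index(drop=True)

print(missing_per_column)
print(clean)
```

Dropping rows is the simplest option; imputing the missing values instead would keep more rows at the cost of introducing estimated readings.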


In [3]:
# import some libraries
import pandas as pd
import numpy as np

Import the dataset and separate the attribute columns from the output

In [11]:
# Load data. The first row of the CSV file holds the column names.
dataset = pd.read_csv("AirQuality_clean2.csv")

# Separate the target (CO concentration) from the attributes
target = dataset[['CO']]

print(target)

attributes = dataset.loc[:, dataset.columns != 'CO']
print(attributes)

       CO
0     2.9
1     4.8
2     6.9
3     6.1
4     3.9
...   ...
7389  2.7
7390  2.5
7391  1.5
7392  1.6
7393  1.2

[7394 rows x 1 columns]
      PT08.S1  PT08.S2  PT08.S3  PT08.S4  PT08.S5     T    RH      AH
0        1383     1020     1008     1719     1104   9.8  67.6  0.8185
1        1581     1319      799     2083     1409  10.3  64.2  0.8065
2        1776     1488      702     2333     1704   9.7  69.3  0.8319
3        1640     1404      743     2191     1654   9.6  67.8  0.8133
4        1313     1076      957     1707     1285   9.1  64.0  0.7419
...       ...      ...      ...      ...      ...   ...   ...     ...
7389     1248     1018      599     1289     1167  19.9  33.0  0.7608
7390     1180      894      636     1200     1372  17.5  40.7  0.8073
7391     1102      812      693     1178     1042  16.4  46.6  0.8642
7392     1116      803      696     1173     1055  15.5  49.0  0.8579
7393     1100      769      722     1147     1049  14.3  52.5  0.8497

[7394 rows x 8 columns]

Split dataset

In [12]:
# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    attributes, target, test_size=0.33, random_state=50)

# print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Train an MLP

In [16]:
# Train one model on the raw data to establish a reference.
# In this case, we will train an MLP.
from sklearn.neural_network import MLPRegressor

# Initialize the MLPRegressor
regr = MLPRegressor(hidden_layer_sizes=(100, 50, 10), max_iter=1000,
                    activation='relu', solver='adam', random_state=1)

# Fit the network to the training data
regr.fit(X_train, y_train.values.ravel())

# Predict y for X_test
y_pred = regr.predict(X_test)
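The model above is trained on raw data as a reference, and the notebook does not show what it is compared against. One common candidate is the same network trained on standardized features, since MLPs are sensitive to feature scale. The sketch below assumes that comparison and uses synthetic stand-in data (not the air quality file) so that it runs on its own. The names `scaled_regr` and `scaled_score` are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: features on a large scale, similar in magnitude
# to raw sensor readings, and a target that depends on them linearly.
rng = np.random.default_rng(0)
X = rng.normal(loc=1000.0, scale=200.0, size=(500, 3))
y = 0.004 * X[:, 0] - 0.002 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=50)

# The pipeline fits the scaler on the training split only and applies the
# same transformation at prediction time, so no test data leaks into it.
scaled_regr = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100, 50, 10), max_iter=1000,
                 activation='relu', solver='adam', random_state=1),
)
scaled_regr.fit(X_tr, y_tr)

# score() on a pipeline delegates to the final regressor (R^2).
scaled_score = scaled_regr.score(X_te, y_te)
print(scaled_score)
```

On the real data, `X_train` and `y_train.values.ravel()` would take the place of the synthetic arrays, and the resulting score could be compared with the raw-data reference.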

Compute the R² score

In [17]:
regr.score(X_test, y_test)

0.6901872795820116
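For a regressor, `score` returns the coefficient of determination R², not a classification accuracy. The sketch below shows what that number measures by computing it by hand and checking it against `sklearn.metrics.r2_score`. The true values are the first five CO readings printed earlier; the predictions are made up for illustration and are not output from the model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# First five CO values from the notebook and hypothetical predictions.
y_true = np.array([2.9, 4.8, 6.9, 6.1, 3.9])
y_hat = np.array([3.1, 4.5, 6.0, 6.4, 4.2])

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The library function gives the same value.
r2_sklearn = r2_score(y_true, y_hat)

# Error metrics expressed in the units of the target are often easier
# to interpret alongside R^2.
mae = mean_absolute_error(y_true, y_hat)
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))

print(r2_manual, r2_sklearn, mae, rmse)
```

An R² of 1 means perfect predictions, 0 means no better than always predicting the mean of the test targets, and negative values are possible for models that do worse than that.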