## Multilayer perceptron

Data source for this exercise: https://www.openml.org/search?type=data&sort=runs&status=active&id=6

P. W. Frey and D. J. Slate. "Letter Recognition Using Holland-style Adaptive Classifiers". Machine Learning 6(2), 1991

This exercise aims to predict letters of the alphabet from numerical features computed from black-and-white images of the letters, using a neural network.

In [2]:
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_openml
from sklearn.compose import make_column_selector as selector
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

In [3]:
X, y = fetch_openml("letter", version=1, as_frame=True, return_X_y=True)

In [4]:
X

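|  | x-box | y-box | width | high | onpix | x-bar | y-bar | x2bar | y2bar | xybar | x2ybr | xy2br | x-ege | xegvy | y-ege | yegvx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 4 | 4 | 3 | 2 | 7 | 8 | 2 | 9 | 11 | 7 | 7 | 1 | 8 | 5 | 6 |
| 1 | 4 | 7 | 5 | 5 | 5 | 5 | 9 | 6 | 4 | 8 | 7 | 9 | 2 | 9 | 7 | 10 |
| 2 | 7 | 10 | 8 | 7 | 4 | 8 | 8 | 5 | 10 | 11 | 2 | 8 | 2 | 5 | 5 | 10 |
| 3 | 4 | 9 | 5 | 7 | 4 | 7 | 7 | 13 | 1 | 7 | 6 | 8 | 3 | 8 | 0 | 8 |
| 4 | 6 | 7 | 8 | 5 | 4 | 7 | 6 | 3 | 7 | 10 | 7 | 9 | 3 | 8 | 3 | 7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 5 | 10 | 5 | 8 | 3 | 4 | 10 | 7 | 8 | 12 | 10 | 9 | 2 | 9 | 2 | 6 |
| 19996 | 4 | 7 | 6 | 5 | 3 | 7 | 8 | 2 | 10 | 12 | 6 | 8 | 1 | 9 | 6 | 8 |
| 19997 | 4 | 8 | 4 | 6 | 4 | 7 | 8 | 7 | 4 | 10 | 7 | 6 | 3 | 9 | 3 | 7 |
| 19998 | 4 | 11 | 4 | 8 | 3 | 0 | 2 | 4 | 6 | 1 | 0 | 7 | 0 | 8 | 0 | 8 |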


For preprocessing we build two pipelines and a `ColumnTransformer`, which together standardize the data to zero mean and unit standard deviation. The categorical pipeline would one-hot encode categorical variables if there were any, but in this dataset every column turned out to be numerical. It is included in the pipeline purely for practice.

In [5]:
# Numerical columns: fill missing values with the median, then standardize.
numerical_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="median")),
           ("scaler", StandardScaler())]
)

# Categorical columns: fill missing values with a placeholder, then one-hot encode.
categorical_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="constant", fill_value="not_available")),
           ("ohe", OneHotEncoder(handle_unknown="ignore"))]
)

# Route each column to the matching transformer based on its dtype.
preprocessor = ColumnTransformer(
    transformers=[("numerical", numerical_transformer, selector(dtype_exclude="category")),
                  ("categorical", categorical_transformer, selector(dtype_include="category"))],
    n_jobs=-1
)
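To make the standardization step concrete, here is a minimal plain-Python sketch of what scaling a single column to zero mean and unit standard deviation involves. The helper `standardize` is hypothetical and written only for illustration. It uses the population standard deviation, which is what `StandardScaler` uses, and the sample values are the first five `x-box` entries from the table above.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Scale one numeric column to zero mean and unit standard deviation."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation (ddof=0)
    if std == 0:
        # A constant column cannot be rescaled; centring it gives all zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


# First five values of the "x-box" column (illustrative subset only)
x_box = [2, 4, 7, 4, 6]
scaled = standardize(x_box)
print([round(value, 3) for value in scaled])
```

In the real pipeline the mean and standard deviation are estimated from the data passed to `fit`, so during cross-validation they come from each training fold only.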

In [6]:
mlp = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", MLPClassifier(max_iter=400, hidden_layer_sizes = [200,200]))]
)
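As a rough sense of the model's size, the sketch below counts the trainable parameters of a fully connected network with this layout. It assumes 16 inputs (the 16 numerical columns, with no extra one-hot columns) and 26 outputs (one per letter). The helper `mlp_parameter_count` is hypothetical and not part of scikit-learn.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between two consecutive layers
        total += n_out         # one bias per unit in the receiving layer
    return total


# Assumed layout: 16 inputs, two hidden layers of 200 units, 26 output classes
layer_sizes = [16, 200, 200, 26]
n_params = mlp_parameter_count(layer_sizes)
print(n_params)  # 48826
```

Most of these parameters sit in the 200-by-200 connection between the two hidden layers.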

Next, we fit the model and cross-validate its accuracy.

In [7]:
mlp.fit(X,y)

In [8]:
# Default settings: 5-fold stratified cross-validation, scored by accuracy.
np.mean(cross_val_score(mlp, X, y))

0.9638
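A single mean hides how much the score varies between folds. The sketch below shows one way to look at per-fold accuracies with an explicit, seeded splitter. It is an illustration under assumptions rather than the setup used above: the data is a small synthetic stand-in from `make_classification` so that it runs offline, the network is much smaller, and the shuffled `StratifiedKFold` and the `random_state` values are choices made here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the letter dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=16, n_informative=8, n_classes=3, random_state=0
)

# Small demo model: scaling followed by a one-hidden-layer MLP
demo_model = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("classifier", MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)),
    ]
)

# Explicit splitter so the folds are reproducible
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(demo_model, X_demo, y_demo, cv=cv, scoring="accuracy")

print("per-fold accuracy:", np.round(scores, 4))
print(f"mean: {scores.mean():.4f}, std: {scores.std():.4f}")
```

The same pattern should carry over to the `mlp` pipeline and the letter data by passing them in place of the demo objects, at the cost of a much longer run time.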