# Ejercicio 6

Los archivos Segment_Train.csv y Segment_Test.csv contienen información referida a regiones de 3x3 píxeles pertenecientes a 7 imágenes distintas. Cada una corresponde a uno de los siguientes tipos de superficie: ladrillo, cielo, follaje, cemento, ventana, camino y pasto.  

Cada región de 3x3 ha sido caracterizada por 19 atributos numéricos: centroides, densidades, intensidades, colores, etc.  

El atributo 20 corresponde al número de imagen de la cual fue extraída la región (1: ladrillo, 2: cemento, 3: follaje, 4: pasto, 5: camino, 6: cielo, 7: ventana).  

Entrene una red neuronal multiperceptrón para que dada una región de 3x3, representada a través de los 19 atributos indicados, sea capaz de identificar a cuál de las 7 imágenes corresponde.  

Utilice los ejemplos del archivo Segment_Train.csv para entrenar y los del archivo Segment_Test.csv para testear.  
Realice al menos 10 ejecuciones independientes de la configuración seleccionada para respaldar sus afirmaciones referidas a la performance del modelo.  

Fuente: https://archive.ics.uci.edu/ml/datasets/Image+Segmentation


In [9]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

DATA_PATH="./Data/"

data = pd.read_csv(DATA_PATH + 'segment_train.csv', header=0, names=[
    "label",
    "REGION-CENTROID-COL", "REGION-CENTROID-ROW", "REGION-PIXEL-COUNT",
    "SHORT-LINE-DENSITY-5", "SHORT-LINE-DENSITY-2",
    "VEDGE-MEAN", "VEDGE-SD", "HEDGE-MEAN", "HEDGE-SD",
    "INTENSITY-MEAN", "RAWRED-MEAN", "RAWBLUE-MEAN", "RAWGREEN-MEAN",
    "EXRED-MEAN", "EXBLUE-MEAN", "EXGREEN-MEAN",
    "VALUE-MEAN", "SATURATION-MEAN", "HUE-MEAN"
])
X_train = data.drop('label', axis=1)
T_train = data['label']

binarizer = preprocessing.LabelBinarizer()
T_train = binarizer.fit_transform(T_train)

X_train=np.array(X_train)
scaler = preprocessing.StandardScaler()
X_train = scaler.fit_transform(X_train)

data = pd.read_csv(DATA_PATH + 'segment_test.csv', header=0, names=[
    "label",
    "REGION-CENTROID-COL", "REGION-CENTROID-ROW", "REGION-PIXEL-COUNT",
    "SHORT-LINE-DENSITY-5", "SHORT-LINE-DENSITY-2",
    "VEDGE-MEAN", "VEDGE-SD", "HEDGE-MEAN", "HEDGE-SD",
    "INTENSITY-MEAN", "RAWRED-MEAN", "RAWBLUE-MEAN", "RAWGREEN-MEAN",
    "EXRED-MEAN", "EXBLUE-MEAN", "EXGREEN-MEAN",
    "VALUE-MEAN", "SATURATION-MEAN", "HUE-MEAN"
])
X_test = data.drop('label', axis=1)
T_test = data['label']

binarizer = preprocessing.LabelBinarizer()
T_test = binarizer.fit_transform(T_test)

X_test=np.array(X_test)
scaler = preprocessing.StandardScaler()
X_test = scaler.fit_transform(X_test)



In [10]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

train_scores = []
test_scores = []
train_effectiveness = []
test_effectiveness = []

for i in range(10):
  clf = MLPClassifier(
    solver='adam',
    learning_rate_init=0.05,
    hidden_layer_sizes=(20,20),
    random_state=i,
    max_iter=2500,
    batch_size=200,
    tol=1.0e-05,
    n_iter_no_change=30,
    early_stopping=True,
    validation_fraction=0.2,
    activation='tanh'
  )
  clf.fit(X_train, T_train)

  # Entrenamiento
  Y_pred_train = clf.predict(X_train)
  score_train = clf.score(X_train, T_train)
  Y_it_train = binarizer.inverse_transform(T_train)
  Y_pred_it_train = binarizer.inverse_transform(Y_pred_train)
  efectividad_train = 100 * (Y_pred_it_train == Y_it_train).sum() / len(Y_it_train)
  train_scores.append(score_train)
  train_effectiveness.append(efectividad_train)

  # Test
  Y_pred_test = clf.predict(X_test)
  score_test = clf.score(X_test, T_test)
  Y_it_test = binarizer.inverse_transform(T_test)
  Y_pred_it_test = binarizer.inverse_transform(Y_pred_test)
  efectividad_test = 100 * (Y_pred_it_test == Y_it_test).sum() / len(Y_it_test)
  test_scores.append(score_test)
  test_effectiveness.append(efectividad_test)

print("Entrenamiento:")
print("Efectividad media: %6.2f%% ± %6.2f%%" % (np.mean(train_effectiveness), np.std(train_effectiveness)))
print("Score medio: %6.2f ± %6.2f" % (np.mean(train_scores), np.std(train_scores)))
print("\nTest:")
print("Efectividad media: %6.2f%% ± %6.2f%%" % (np.mean(test_effectiveness), np.std(test_effectiveness)))
print("Score medio: %6.2f ± %6.2f" % (np.mean(test_scores), np.std(test_scores)))



Entrenamiento:
Efectividad media:  92.05% ±   3.03%
Score medio:   0.91 ±   0.03

Test:
Efectividad media:  85.50% ±   2.06%
Score medio:   0.83 ±   0.03
