## Introduction to Machine Learning
### (Ranking) Ordinal Classification

This example shows how to recast the problem of estimating the price
of a house as a ranking task with 4 ordinal categories.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 

The *datasets* package provides test datasets and utilities to download or generate training datasets.

In [None]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
from sklearn.datasets import load_boston

In [None]:
boston_data = load_boston()

In [None]:
boston_data.keys()

In [None]:
boston_df = pd.DataFrame(boston_data['data'], columns=boston_data['feature_names'])
boston_df['target'] = boston_data['target']

In [None]:
boston_df.head()

We split the DataFrame built from the `boston_data` dictionary into the features and the target variable. The models trained below are **AdaBoost classifiers** fitted on a discretized version of the target.

In [None]:
features = boston_df.drop(columns='target')
target = boston_df.target

In [None]:
plt.hist(target)

We **discretize** the target to turn the problem into a multi-class one with ordinal categories. `pd.cut` with `bins=4` splits the price range into four equal-width intervals. The numeric prefix in each label lets us sort the categories later.

In [None]:
target = pd.cut(target, bins=4, labels=['1_pequeña','2_media','3_grande','4_lujo'])

In [None]:
target.value_counts()
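
Equal-width bins can leave some categories almost empty when the target is skewed. The sketch below uses a small made-up price series (an assumption, not the Boston data) to compare `pd.cut` with `pd.qcut`, which builds equal-frequency bins instead. It only illustrates the trade-off and does not change the notebook's approach.

```python
import pandas as pd

# Hypothetical toy prices, skewed towards the low end (not the Boston data).
prices = pd.Series([5, 6, 7, 8, 9, 10, 12, 50])
labels = ['1_pequeña', '2_media', '3_grande', '4_lujo']

# Equal-width bins: the price range is split into 4 intervals of equal length.
equal_width = pd.cut(prices, bins=4, labels=labels)

# Equal-frequency bins: each interval holds roughly the same number of rows.
equal_freq = pd.qcut(prices, q=4, labels=labels)

# Count rows per category, keeping the ordinal order of the labels.
width_counts = equal_width.value_counts().sort_index()
freq_counts = equal_freq.value_counts().sort_index()
print(width_counts)
print(freq_counts)
```

With equal-width bins the single expensive house stretches the range, so the middle categories stay empty. This is why it is worth checking `value_counts()` before training.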

In [None]:
target = target.astype(str)

In [None]:
target

___

### Ordinal Classification as a Pure Multi-class Problem

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_score, accuracy_score
from sklearn.metrics import confusion_matrix
from scipy import stats

In [None]:
train_x, test_x, train_y, test_y = train_test_split(features,
                                                    target,
                                                    train_size=0.6,
                                                    stratify=target,
                                                    random_state=11
                                                    )

In [None]:
train_x

In [None]:
boost = AdaBoostClassifier()

In [None]:
_ = boost.fit(train_x, train_y)

In [None]:
pred_y = boost.predict(test_x)

In [None]:
accuracy_score(test_y, pred_y)

In [None]:
boost.classes_

In [None]:
# Transposed so that rows are predicted classes and columns are actual classes.
data_conf_matrix = confusion_matrix(test_y, pred_y, labels=boost.classes_).T
pd.DataFrame(data_conf_matrix, index=boost.classes_, columns=boost.classes_)
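
Because the matrix above is transposed, its orientation is easy to misread. The following sketch uses three made-up samples (an assumption for illustration) to show that scikit-learn puts the actual classes on the rows, so the transpose puts the predicted classes there.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = ['1_pequeña', '2_media']

# Hypothetical toy labels: one '1_pequeña' sample is misclassified as '2_media'.
y_true = ['1_pequeña', '1_pequeña', '2_media']
y_pred = ['1_pequeña', '2_media', '2_media']

# scikit-learn convention: rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# After transposing: rows are predicted classes, columns are actual classes.
cm_t = pd.DataFrame(cm.T, index=labels, columns=labels)
cm_t.index.name = 'predicted'
cm_t.columns.name = 'actual'
print(cm_t)
```

Naming the index and the columns, as done here, keeps the orientation visible in the rendered table.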

In [None]:
precision_score(test_y, pred_y, average='micro')

In [None]:
precision_score(test_y, pred_y, average='macro')

In [None]:
precision_score(test_y, pred_y, average=None)
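
The three averaging modes used above combine the per-class precisions differently. As a minimal sketch on made-up labels (an assumption for illustration): `average=None` returns one value per class, `'macro'` is their unweighted mean, and `'micro'` pools all predictions, which for single-label multi-class problems equals the accuracy.

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical toy labels with an imbalanced class distribution.
y_true = ['1_pequeña'] * 4 + ['2_media'] * 2
y_pred = ['1_pequeña', '1_pequeña', '1_pequeña', '2_media', '2_media', '1_pequeña']

per_class = precision_score(y_true, y_pred, average=None)  # one value per class
macro = precision_score(y_true, y_pred, average='macro')   # unweighted mean of per_class
micro = precision_score(y_true, y_pred, average='micro')   # pooled over all samples
acc = accuracy_score(y_true, y_pred)

print(per_class, macro, micro, acc)
```

Macro precision weights rare categories such as `4_lujo` as much as frequent ones, so it is usually the more informative number when the bins are imbalanced.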

In [None]:
tau, _ = stats.kendalltau(test_y, pred_y)
tau
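
Kendall's tau measures how well the predicted order agrees with the actual order, so it needs the labels to be comparable. One way to make that order explicit, rather than depending on how the label strings sort, is to map each label to an integer rank first. The sketch below does this on made-up labels; the `rank` mapping is a helper introduced here, not part of the notebook.

```python
import pandas as pd
from scipy import stats

order = ['1_pequeña', '2_media', '3_grande', '4_lujo']
# Hypothetical helper: integer rank of each ordinal label.
rank = {label: i for i, label in enumerate(order)}

# Hypothetical toy labels.
y_true = pd.Series(['1_pequeña', '2_media', '3_grande', '4_lujo', '2_media'])
y_pred = pd.Series(['1_pequeña', '3_grande', '3_grande', '4_lujo', '1_pequeña'])

# Correlation between the actual and the predicted ranks.
tau_codes, _ = stats.kendalltau(y_true.map(rank), y_pred.map(rank))
print(tau_codes)
```

Unlike accuracy, this score rewards predictions that land in a neighbouring category over predictions that are far off in the order.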

____

### Ordinal Classification with Binary Decomposition

Each binary classification problem indicates whether the category is greater than a given ordinal level. With 4 categories this gives 3 binary problems.

In [None]:
ordinal_train_y = pd.DataFrame({
    '1:>pequeña' : train_y > '1_pequeña',
    '2:>media'  : train_y > '2_media',
    '3:>grande'   : train_y > '3_grande',
})
ordinal_train_y
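
The comparisons above work because the numeric prefix makes the label strings sort in ordinal order. A possible alternative, sketched here on made-up labels, is to encode the labels as an ordered categorical and compare integer codes. The names `order` and `codes` are assumptions introduced for this example.

```python
import pandas as pd

order = ['1_pequeña', '2_media', '3_grande', '4_lujo']

# Hypothetical toy target.
y = pd.Series(['2_media', '4_lujo', '1_pequeña', '3_grande'])

# Integer code of each label according to the explicit category order.
codes = pd.Categorical(y, categories=order, ordered=True).codes

# K categories give K - 1 binary targets: "is the category greater than level k?"
binary_targets = pd.DataFrame(
    {f'>{label}': codes > k for k, label in enumerate(order[:-1])},
    index=y.index,
)
print(binary_targets)
```

Each row has as many `True` values as categories lie below its label, so the row sum recovers the integer code.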

Loop to fit one model per binary problem on the training set and store its predicted probabilities on the test set.

In [None]:
ordinal_models = dict()
ordinal_preds = dict()
for iclass in ordinal_train_y.columns:
    # Binary target for this threshold: True if the category is above it.
    i_target = ordinal_train_y[iclass]
    print("___")
    print(iclass)
    print(i_target)
    boost = AdaBoostClassifier(n_estimators=3)
    ordinal_models[iclass] = boost.fit(train_x, i_target)
    print(boost.classes_)
    # Column of predict_proba that corresponds to the True class.
    i_label = np.where(boost.classes_ == True)[0][0]
    print(i_label)
    # Estimated probability that each test sample is above the threshold.
    ordinal_preds[iclass] = boost.predict_proba(test_x)[:, i_label]

In [None]:
pd.DataFrame(ordinal_preds)

We recover the probability of each category from the "greater than" probabilities.  
The prediction is the category with the highest probability.
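
Written out, the recomposition is a difference of consecutive "greater than" probabilities. The notation below is introduced here for illustration: $K$ is the number of categories (4 in this example), $k$ indexes them in ordinal order, and $P(y > k)$ is the output of the $k$-th binary model.

```latex
\begin{aligned}
P(y = 1) &= 1 - P(y > 1) \\
P(y = k) &= P(y > k-1) - P(y > k), \quad 1 < k < K \\
P(y = K) &= P(y > K-1) \\
\hat{y} &= \arg\max_{k} P(y = k)
\end{aligned}
```

The terms telescope, so the four values always add up to 1.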

In [None]:
prob_peq = 1 - ordinal_preds['1:>pequeña']
prob_media = ordinal_preds['1:>pequeña'] - ordinal_preds['2:>media']
prob_grande = ordinal_preds['2:>media'] - ordinal_preds['3:>grande']
prob_lujo = ordinal_preds['3:>grande']

cum_pred = pd.DataFrame({'1_pequeña': prob_peq,
              '2_media': prob_media,
              '3_grande': prob_grande,
              '4_lujo': prob_lujo}, index=test_y.index)
cum_pred
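
Since each binary model is trained independently, nothing forces $P(y > k)$ to decrease with $k$, and some of the differences above can come out negative. The sketch below uses made-up model outputs (the column names `gt1`, `gt2`, `gt3` are hypothetical) to show one simple repair: clip at zero and renormalize. This is an optional post-processing step, not part of the notebook's method, and it does not change which category `idxmax` selects.

```python
import pandas as pd

# Hypothetical outputs of three binary models for two samples:
# estimates of P(y > k) for k = 1, 2, 3.
gt = pd.DataFrame({
    'gt1': [0.9, 0.4],
    'gt2': [0.6, 0.5],  # second row is not monotone: 0.5 > 0.4
    'gt3': [0.2, 0.1],
})

# Same recomposition as in the notebook cell above.
probs = pd.DataFrame({
    '1_pequeña': 1 - gt['gt1'],
    '2_media': gt['gt1'] - gt['gt2'],
    '3_grande': gt['gt2'] - gt['gt3'],
    '4_lujo': gt['gt3'],
})

# Rows sum to 1 by construction, but individual entries can be negative.
fixed = probs.clip(lower=0)                    # remove negative "probabilities"
fixed = fixed.div(fixed.sum(axis=1), axis=0)   # rescale each row to sum to 1
print(probs)
print(fixed)
```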

In [None]:
ordinal_pred_y = cum_pred.idxmax(axis=1)
ordinal_pred_y

In [None]:
pd.concat([test_y, ordinal_pred_y], axis=1)

In [None]:
labels = sorted(list(test_y.unique()))
labels

In [None]:
ordinal_pred_y.unique()

In [None]:
# Transposed so that rows are predicted classes and columns are actual classes.
data_conf_matrix = confusion_matrix(test_y, ordinal_pred_y, labels=labels).T
pd.DataFrame(data_conf_matrix, index=labels, columns=labels)

In [None]:
precision_score(test_y, ordinal_pred_y, average='micro')

In [None]:
precision_score(test_y, ordinal_pred_y, average='macro')

In [None]:
precision_score(test_y, ordinal_pred_y, average=None)

In [None]:
tau, _ = stats.kendalltau(test_y, ordinal_pred_y)
tau