### Student performance classification

The model predicts a student's performance from features related to grades, automatic source-code assessment tools, and enrollment.

The performance classes are defined by these thresholds:

- 0 - Low performance (grade between 0.0 and 2.9)
- 1 - Medium performance (grade between 3.0 and 4.0)
- 2 - High performance (grade between 4.1 and 5.0)
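The thresholds above can be written as a small function. This is a minimal sketch. It assumes final grades are on the 0.0-5.0 scale and recorded with one decimal, so the only boundaries that matter are 2.9/3.0 and 4.0/4.1. `performance_class` is a hypothetical helper and is not part of the notebook.

```python
def performance_class(grade: float) -> int:
    """Map a final grade (0.0-5.0 scale) to the performance class 0, 1 or 2.

    Assumes one-decimal grades, as in the thresholds listed above.
    """
    if not 0.0 <= grade <= 5.0:
        raise ValueError(f"grade out of range: {grade}")
    if grade < 3.0:       # 0.0-2.9: low performance
        return 0
    if grade <= 4.0:      # 3.0-4.0: medium performance
        return 1
    return 2              # 4.1-5.0: high performance


grades = [2.9, 3.0, 4.0, 4.1]
print([performance_class(g) for g in grades])  # [0, 1, 1, 2]
```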

#### Data dictionary

| Variable                  | Type              | Description |
|---------------------------|-------------------|-------------|
| curso                     | string            | Course code |
| nombres                   | string            | Student's first name(s) |
| apellidos                 | string            | Student's last name(s) |
| correo_electronico        | string            | Student's email address |
| lab_1                     | numeric (decimal) | Grade for lab 1 |
| tiempo_entrega_lab_1      | numeric (decimal) | Submission time for lab 1 (hours) |
| intentos_lab_1            | numeric (integer) | Total number of attempts on lab 1 |
| resultado_lab_1           | numeric (integer) | INGInious result for lab 1 (0 = not submitted; 1 = Failed; 2 = Overflow; 3 = Success) |
| lab_2                     | numeric (decimal) | Grade for lab 2 |
| tiempo_entrega_lab_2      | numeric (decimal) | Submission time for lab 2 (hours) |
| cantidad_intentos_lab_2   | numeric (integer) | Total number of attempts on lab 2 |
| resultado_lab_2           | numeric (integer) | INGInious result for lab 2 (0 = not submitted; 1 = Failed; 2 = Overflow; 3 = Success) |
| lab_3                     | numeric (decimal) | Grade for lab 3 |
| periodo                   | numeric (integer) | Academic period (0 = 2022; 1 = before 2022) |
| tipo_matricula            | numeric (integer) | Enrollment type (0 = withdrew; 1 = regular; 2 = repeating the course) |
| grade                     | numeric (integer) | Final grade as a performance class (0 = low; 1 = medium; 2 = high performance) |
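The coded integer columns can be decoded with the labels listed in the table, for example when inspecting rows by hand. The sketch below uses pandas `Series.map`. The dictionary names and the `decode` helper are hypothetical. Only the columns that are actually present in the DataFrame get decoded.

```python
import pandas as pd

# Code tables taken from the data dictionary above
RESULTADO_LAB = {0: "Not submitted", 1: "Failed", 2: "Overflow", 3: "Success"}
TIPO_MATRICULA = {0: "Withdrew", 1: "Regular", 2: "Repeating"}
GRADE = {0: "Low", 1: "Medium", 2: "High"}

CODED_COLUMNS = {
    "resultado_lab_1": RESULTADO_LAB,
    "resultado_lab_2": RESULTADO_LAB,
    "tipo_matricula": TIPO_MATRICULA,
    "grade": GRADE,
}


def decode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the coded columns replaced by readable labels."""
    out = df.copy()
    for col, codes in CODED_COLUMNS.items():
        if col in out.columns:
            out[col] = out[col].map(codes)
    return out


sample = pd.DataFrame({"lab_1": [4.4, 4.6], "resultado_lab_1": [3, 0], "grade": [2, 0]})
decoded = decode(sample)
print(decoded)
```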

#### Importing the libraries

In [1]:
# Import the libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt

# Classification models: Naive Bayes, SVM, decision tree, random forest,
# logistic regression, K-NN, MLP and gradient boosting
# Naive Bayes
from sklearn.naive_bayes import GaussianNB
# SVM
from sklearn.svm import SVC
# Decision tree
from sklearn.tree import DecisionTreeClassifier
# Random forest
from sklearn.ensemble import RandomForestClassifier
# Logistic regression
from sklearn.linear_model import LogisticRegression
# K-NN
from sklearn.neighbors import KNeighborsClassifier
# MLP
from sklearn.neural_network import MLPClassifier
# Gradient Boosting Classifier
from sklearn.ensemble import GradientBoostingClassifier

# Splits the data into training and test sets
from sklearn.model_selection import train_test_split
# Builds the confusion matrix
from sklearn.metrics import confusion_matrix
# Builds the classification report
from sklearn.metrics import classification_report
# Model metrics
from sklearn.metrics import precision_score, recall_score, f1_score

# Standardizes the features (removes the mean and scales to unit variance)
from sklearn.preprocessing import StandardScaler
# Grid search over hyperparameters
from sklearn.model_selection import GridSearchCV

#### Loading the data into a DataFrame

In [4]:
# Load the records into a DataFrame (the CSV file uses ';' as separator)
data = pd.read_csv("data/classification_data.csv", sep=";")

data

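|     | lab_1 | tiempo_entrega_lab_1 | intentos_lab_1 | resultado_lab_1 | lab_2 | lab_3 | exam_1 | grade |
|-----|-------|----------------------|----------------|-----------------|-------|-------|--------|-------|
| 0   | 4.4   | 0.63                 | 2              | 3               | 4.4   | 4.9   | 3.5    | 2     |
| 1   | 3.3   | 0.33                 | 1              | 3               | 3.3   | 3.7   | 3.7    | 1     |
| 2   | 4.7   | 0.78                 | 3              | 3               | 4.3   | 4.8   | 4.1    | 2     |
| 3   | 4.3   | 0.28                 | 1              | 3               | 4.0   | 4.9   | 3.4    | 2     |
| 4   | 4.2   | 0.50                 | 10             | 3               | 3.1   | 4.9   | 3.2    | 1     |
| ... | ...   | ...                  | ...            | ...             | ...   | ...   | ...    | ...   |
| 463 | 2.2   | 3.89                 | 6              | 3               | 3.0   | 3.5   | 4.1    | 1     |
| 464 | 4.6   | 0.35                 | 1              | 0               | 0.0   | 0.0   | 4.3    | 0     |
| 465 | 4.6   | 0.48                 | 1              | 0               | 5.0   | 0.0   | 4.6    | 1     |
| 466 | 4.6   | 0.00                 | 0              | 0               | 0.0   | 0.0   | 3.7    | 0     |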


#### Data preprocessing

In [5]:
# Count the NaN values in each column
print('Columna         Cantidad NaN')
print(data.isnull().sum(axis = 0))
print(data.shape)

# Drop the records with NA values (left commented out: no NaN values were found)
#data = data.dropna()

Columna         Cantidad NaN
lab_1                   0
tiempo_entrega_lab_1    0
intentos_lab_1          0
resultado_lab_1         0
lab_2                   0
lab_3                   0
exam_1                  0
grade                   0
dtype: int64
(468, 8)


In [6]:
# Count the records in each grade class
data.groupby('grade').size()

grade
0    162
1    200
2    106
dtype: int64

In [7]:
# Oversample classes 0 and 2 (sampling with replacement) to 200 records each, the size of class 1

from sklearn.utils import resample

df_bajo = data[data['grade'] == 0]
df_medio = data[data['grade'] == 1]
df_alto = data[data['grade'] == 2]

data_resample_bajo = resample(df_bajo,
                replace = True,
                n_samples = 200,
                random_state = 1)

data_resample_alto = resample(df_alto,
                replace = True,
                n_samples = 200,
                random_state = 1)

data2 = pd.concat([data_resample_bajo, df_medio, data_resample_alto])

data2['grade'].value_counts()


0    200
1    200
2    200
Name: grade, dtype: int64
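The oversampling above runs before `train_test_split`. As a result, copies of the same student record can end up in both the training and the test sets, which tends to inflate the test scores reported below. One common alternative is to split first and then oversample only the training portion. The sketch below assumes a `grade` target column like the one in this notebook. The `upsample_train_only` helper and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def upsample_train_only(df, target="grade", test_size=0.2, random_state=1):
    """Hypothetical helper: split first, then upsample the training part only,
    so duplicated rows can never appear in the test set."""
    train_df, test_df = train_test_split(
        df, test_size=test_size, stratify=df[target], random_state=random_state)
    n_max = train_df[target].value_counts().max()
    parts = [
        # Sample with replacement up to the size of the largest class
        resample(group, replace=True, n_samples=n_max, random_state=random_state)
        if len(group) < n_max else group
        for _, group in train_df.groupby(target)
    ]
    return pd.concat(parts), test_df


# Toy data with the same class sizes as the notebook (162 / 200 / 106)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "lab_1": rng.uniform(0, 5, 468).round(1),
    "grade": np.repeat([0, 1, 2], [162, 200, 106]),
})

train_bal, test_df = upsample_train_only(toy)
print(train_bal["grade"].value_counts().sort_index())
print(len(test_df))
```

`resample` keeps the original index labels of the duplicated rows, so the training and test indices stay disjoint. This makes the absence of leakage easy to check.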

In [8]:
# Descriptive statistics of the resampled DataFrame
data2.describe()

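|       | lab_1    | tiempo_entrega_lab_1 | intentos_lab_1 | resultado_lab_1 | lab_2   | lab_3    | exam_1   | grade    |
|-------|----------|----------------------|----------------|-----------------|---------|----------|----------|----------|
| count | 600.0    | 600.0                | 600.0          | 600.0           | 600.0   | 600.0    | 600.0    | 600.0    |
| mean  | 2.954    | 2.202933             | 3.06           | 2.621667        | 3.2935  | 4.044833 | 3.036667 | 1.0      |
| std   | 1.854422 | 4.200492             | 4.514092       | 0.888733        | 1.83559 | 1.590888 | 1.528206 | 0.817178 |
| min   | 0.0      | 0.0                  | 0.0            | 0.0             | 0.0     | 0.0      | 0.0      | 0.0      |
| 25%   | 1.0      | 0.3                  | 1.0            | 3.0             | 1.6     | 4.0      | 1.9      | 0.0      |
| 50%   | 3.55     | 0.48                 | 1.0            | 3.0             | 4.0     | 4.9      | 3.7      | 1.0      |
| 75%   | 4.7      | 1.0                  | 3.0            | 3.0             | 4.9     | 5.0      | 4.4      | 2.0      |
| max   | 5.0      | 22.08                | 39.0           | 3.0             | 5.0     | 5.0      | 4.9      | 2.0      |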


In [9]:
# Data type of each column
data2.dtypes

lab_1                   float64
tiempo_entrega_lab_1    float64
intentos_lab_1            int64
resultado_lab_1           int64
lab_2                   float64
lab_3                   float64
exam_1                  float64
grade                     int64
dtype: object

#### Creating the training and test sets

In [22]:
# Features used by the models
features = ['lab_1','tiempo_entrega_lab_1','intentos_lab_1']

# Feature matrix
X = data2[features]
# Target variable
y = data2['grade'].values

# Split the data: 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8, random_state= 1)

#### Suppressing warnings

In [42]:
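import warnings

# Ignore all warnings in this process (for example the K-NN warnings that
# point at scipy's stats.mode). The original cell only silenced a warning
# raised inside its own `with` block, so it had no lasting effect.
# Warnings raised inside parallel workers (n_jobs=-1) may still be displayed.
warnings.filterwarnings("ignore")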

#### Best features (ELI5)

In [24]:
# Create the model
dtc = DecisionTreeClassifier()

# Train the model
dtc.fit(X_train, y_train)

pred = dtc.predict(X_test)

# Feature weights with ELI5
from eli5 import show_weights

show_weights(dtc, feature_names = features)

| Weight | Feature              |
|--------|----------------------|
| 0.5254 | lab_1                |
| 0.3873 | tiempo_entrega_lab_1 |
| 0.0873 | intentos_lab_1       |
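The same kind of numbers can also be read without the eli5 dependency. For a decision tree, eli5 appears to report the model's impurity-based `feature_importances_`: the weights above sum to 1, as those importances do. scikit-learn exposes these importances directly. The sketch below runs on hypothetical synthetic data from `make_classification`, standing in for the notebook's three features. It also computes permutation importance on the held-out split as a second, model-agnostic view.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 3 features, 3 classes
X, y = make_classification(n_samples=600, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=1)
feature_names = ["lab_1", "tiempo_entrega_lab_1", "intentos_lab_1"]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8,
                                                    random_state=1)

dtc = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

# Impurity-based importances, sorted from most to least important
ranked = sorted(zip(feature_names, dtc.feature_importances_), key=lambda t: -t[1])
for name, weight in ranked:
    print(f"{weight:.4f}  {name}")

# Permutation importance: drop in test accuracy when one feature is shuffled
perm = permutation_importance(dtc, X_test, y_test, n_repeats=10, random_state=1)
for name, mean, std in zip(feature_names, perm.importances_mean, perm.importances_std):
    print(f"{name}: {mean:.4f} +/- {std:.4f}")
```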


---

### Prediction without hyperparameter tuning

In [25]:
# Features used by the models
features = ['lab_1','tiempo_entrega_lab_1','intentos_lab_1']
# Feature matrix
X = data2[features]
# Target variable
y = data2['grade'].values

# Split the data: 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    train_size = 0.8,
                                                    random_state= 1)

#### Naive Bayes

In [26]:
# Create the model
nb = GaussianNB()

# Train the model
nb.fit(X_train, y_train)

pred = nb.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[30  3  8]
 [14  7 11]
 [ 1  7 39]]
              precision    recall  f1-score   support

           0       0.67      0.73      0.70        41
           1       0.41      0.22      0.29        32
           2       0.67      0.83      0.74        47

    accuracy                           0.63       120
   macro avg       0.58      0.59      0.58       120
weighted avg       0.60      0.63      0.61       120

Precisión:  0.6
Recall:  0.63
F1-Score:  0.61
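With `average='weighted'`, each class's score is weighted by its support, that is, its number of true samples. The NumPy sketch below recomputes the Naive Bayes figures from the confusion matrix printed above. As in scikit-learn, rows are true classes and columns are predicted classes. The sketch also shows why the weighted recall always equals the accuracy: weighting each class's recall by its support leaves the total of correct predictions divided by the number of samples.

```python
import numpy as np

# Confusion matrix printed above for the untuned Naive Bayes model
cm = np.array([[30,  3,  8],
               [14,  7, 11],
               [ 1,  7, 39]])

tp = np.diag(cm)            # correct predictions per class
support = cm.sum(axis=1)    # true samples per class (row sums)
predicted = cm.sum(axis=0)  # predicted samples per class (column sums)

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)

weights = support / support.sum()
weighted_precision = (weights * precision).sum()
weighted_recall = (weights * recall).sum()
weighted_f1 = (weights * f1).sum()
accuracy = tp.sum() / cm.sum()

print(round(weighted_precision, 2), round(weighted_recall, 2), round(weighted_f1, 2))  # 0.6 0.63 0.61
print(round(accuracy, 2))  # 0.63
```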


#### SVC

In [27]:
# Create the model
svm = SVC()

# Train the model
svm.fit(X_train, y_train)

pred = svm.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[21 13  7]
 [ 5 17 10]
 [ 0  2 45]]
              precision    recall  f1-score   support

           0       0.81      0.51      0.63        41
           1       0.53      0.53      0.53        32
           2       0.73      0.96      0.83        47

    accuracy                           0.69       120
   macro avg       0.69      0.67      0.66       120
weighted avg       0.70      0.69      0.68       120

Precisión:  0.7
Recall:  0.69
F1-Score:  0.68


#### Decision Tree

In [28]:
# Create the model
dtc = DecisionTreeClassifier()

# Train the model
dtc.fit(X_train, y_train)

pred = dtc.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[37  3  1]
 [ 8 20  4]
 [ 0  3 44]]
              precision    recall  f1-score   support

           0       0.82      0.90      0.86        41
           1       0.77      0.62      0.69        32
           2       0.90      0.94      0.92        47

    accuracy                           0.84       120
   macro avg       0.83      0.82      0.82       120
weighted avg       0.84      0.84      0.84       120

Precisión:  0.84
Recall:  0.84
F1-Score:  0.84


#### Random Forest

In [29]:
# Create the model
rf = RandomForestClassifier()

# Train the model
rf.fit(X_train, y_train)

pred = rf.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[38  2  1]
 [ 7 20  5]
 [ 0  2 45]]
              precision    recall  f1-score   support

           0       0.84      0.93      0.88        41
           1       0.83      0.62      0.71        32
           2       0.88      0.96      0.92        47

    accuracy                           0.86       120
   macro avg       0.85      0.84      0.84       120
weighted avg       0.86      0.86      0.85       120

Precisión:  0.86
Recall:  0.86
F1-Score:  0.85


#### Logistic Regression

In [30]:
# Create the model
lr = LogisticRegression()

# Train the model
lr.fit(X_train, y_train)

pred = lr.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[24  9  8]
 [ 7 16  9]
 [ 0  1 46]]
              precision    recall  f1-score   support

           0       0.77      0.59      0.67        41
           1       0.62      0.50      0.55        32
           2       0.73      0.98      0.84        47

    accuracy                           0.72       120
   macro avg       0.71      0.69      0.68       120
weighted avg       0.71      0.72      0.70       120

Precisión:  0.71
Recall:  0.72
F1-Score:  0.7


#### K-NN

In [33]:
# Create the model
knn = KNeighborsClassifier()

# Train the model
knn.fit(X_train, y_train)

pred = knn.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[32  6  3]
 [11 11 10]
 [ 0  6 41]]
              precision    recall  f1-score   support

           0       0.74      0.78      0.76        41
           1       0.48      0.34      0.40        32
           2       0.76      0.87      0.81        47

    accuracy                           0.70       120
   macro avg       0.66      0.67      0.66       120
weighted avg       0.68      0.70      0.68       120

Precisión:  0.68
Recall:  0.7
F1-Score:  0.68




#### MLP

In [34]:
# Create the model
mlp = MLPClassifier()

# Train the model
mlp.fit(X_train, y_train)

pred = mlp.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[32  7  2]
 [ 6 18  8]
 [ 0  3 44]]
              precision    recall  f1-score   support

           0       0.84      0.78      0.81        41
           1       0.64      0.56      0.60        32
           2       0.81      0.94      0.87        47

    accuracy                           0.78       120
   macro avg       0.77      0.76      0.76       120
weighted avg       0.78      0.78      0.78       120

Precisión:  0.78
Recall:  0.78
F1-Score:  0.78




#### Gradient Boosting Classifier

In [35]:
# Create the model
gbc = GradientBoostingClassifier()

# Train the model
gbc.fit(X_train, y_train)

pred = gbc.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

[[33  7  1]
 [ 4 21  7]
 [ 0  0 47]]
              precision    recall  f1-score   support

           0       0.89      0.80      0.85        41
           1       0.75      0.66      0.70        32
           2       0.85      1.00      0.92        47

    accuracy                           0.84       120
   macro avg       0.83      0.82      0.82       120
weighted avg       0.84      0.84      0.84       120

Precisión:  0.84
Recall:  0.84
F1-Score:  0.84


---

### Prediction with hyperparameter tuning (grid search)

#### Naive Bayes

In [36]:
# Standardize the features using statistics from the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

nb = GaussianNB()

# Hyperparameter grid
grid = {
    'var_smoothing': np.logspace(0,-9, num=100)
}

grid_search = GridSearchCV(estimator = nb,
                           param_grid = grid,
                           cv= 10,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
nb = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
nb.fit(X_train, y_train)

pred = nb.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))



Fitting 10 folds for each of 100 candidates, totalling 1000 fits
Best Parameters (GridSearch): GaussianNB(var_smoothing=0.008111308307896872)
-----------------------------------------------------------
[[30  3  8]
 [14  7 11]
 [ 1  7 39]]
              precision    recall  f1-score   support

           0       0.67      0.73      0.70        41
           1       0.41      0.22      0.29        32
           2       0.67      0.83      0.74        47

    accuracy                           0.63       120
   macro avg       0.58      0.59      0.58       120
weighted avg       0.60      0.63      0.61       120

Precisión:  0.6
Recall:  0.63
F1-Score:  0.61
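In scikit-learn's `GaussianNB`, `var_smoothing` is multiplied by the largest per-feature variance of the training data. The result, stored in `epsilon_`, is added to every per-class variance. The features were standardized before the search, so that largest variance is about 1 and `epsilon_` ends up roughly equal to the `var_smoothing` value found above. The sketch below demonstrates this on hypothetical random data; only the `var_smoothing` value is taken from the notebook.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical data: 3 features on different scales, 3 classes
X = rng.normal(loc=[3.0, 2.0, 3.0], scale=[1.8, 4.2, 4.5], size=(480, 3))
y = rng.integers(0, 3, size=480)

X_std = StandardScaler().fit_transform(X)

# Best value reported by the grid search above
nb = GaussianNB(var_smoothing=0.008111308307896872).fit(X_std, y)

# epsilon_ = var_smoothing * (largest per-feature variance of the training data)
print(nb.epsilon_)
print(0.008111308307896872 * np.var(X_std, axis=0).max())
```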


#### SVC

In [37]:
# Standardize the features (when the cells run in order, X_train and X_test
# are already standardized, so this step changes almost nothing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

svm = SVC()

# Hyperparameter grid
gamma =  [0.1, 1.0, 10, 100]
C = [0.1, 1.0, 10, 100]
kernel = ['rbf','linear']

grid = dict(gamma = gamma,
            C = C,
            kernel = kernel)

grid_search = GridSearchCV(estimator = svm,
                           param_grid = grid,
                           cv= 10,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
svm = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
svm.fit(X_train, y_train)

pred = svm.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

Fitting 10 folds for each of 32 candidates, totalling 320 fits
Best Parameters (GridSearch): SVC(gamma=100)
-----------------------------------------------------------
[[30  8  3]
 [ 5 19  8]
 [ 2  4 41]]
              precision    recall  f1-score   support

           0       0.81      0.73      0.77        41
           1       0.61      0.59      0.60        32
           2       0.79      0.87      0.83        47

    accuracy                           0.75       120
   macro avg       0.74      0.73      0.73       120
weighted avg       0.75      0.75      0.75       120

Precisión:  0.75
Recall:  0.75
F1-Score:  0.75
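In the cells of this section, the scaler is fit on the whole training set before `GridSearchCV` splits that set into folds. Each validation fold has therefore already contributed to the mean and variance. One way to keep scaling inside every fold is a scikit-learn `Pipeline`. The sketch below uses the same SVC grid on hypothetical synthetic data. In a pipeline, grid keys are prefixed with the step name and a double underscore.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 3 features, 3 classes
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8,
                                                    random_state=1)

# The scaler is re-fit on the training part of every CV fold
pipe = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])

# Same grid as above; keys use the "<step>__<parameter>" convention
grid = {
    "svm__gamma": [0.1, 1.0, 10, 100],
    "svm__C": [0.1, 1.0, 10, 100],
    "svm__kernel": ["rbf", "linear"],
}

search = GridSearchCV(pipe, param_grid=grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.score(X_test, y_test), 2))  # test accuracy of the refit pipeline
```

Because the pipeline is refit as a whole, `search.best_estimator_` carries its own fitted scaler. New data can then be passed to it unscaled.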


#### Decision Tree

In [38]:
# Standardize the features (when the cells run in order, X_train and X_test
# are already standardized, so this step changes almost nothing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

dt = DecisionTreeClassifier()

# Hyperparameter grid
max_depth = [2, 3, 5, 10, 20]
min_samples_leaf =  [5, 10, 20, 50, 100]
criterion = ["gini", "entropy"]

grid = dict(max_depth = max_depth,
            min_samples_leaf = min_samples_leaf,
            criterion = criterion)

grid_search = GridSearchCV(estimator = dt,
                           param_grid = grid,
                           cv= 10,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
dtc = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
dtc.fit(X_train, y_train)

pred = dtc.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

Fitting 10 folds for each of 50 candidates, totalling 500 fits
Best Parameters (GridSearch): DecisionTreeClassifier(criterion='entropy', max_depth=20, min_samples_leaf=5)
-----------------------------------------------------------
[[33  5  3]
 [10 16  6]
 [ 1  1 45]]
              precision    recall  f1-score   support

           0       0.75      0.80      0.78        41
           1       0.73      0.50      0.59        32
           2       0.83      0.96      0.89        47

    accuracy                           0.78       120
   macro avg       0.77      0.75      0.75       120
weighted avg       0.78      0.78      0.77       120

Precisión:  0.78
Recall:  0.78
F1-Score:  0.77


#### Random Forest

In [39]:
# Standardize the features (when the cells run in order, X_train and X_test
# are already standardized, so this step changes almost nothing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

rf = RandomForestClassifier()

# Hyperparameter grid
bootstrap = [True, False]
max_depth = [10, 20, 50, 100, None]
max_features = ['sqrt', 'log2', None]
min_samples_leaf = [1, 2, 4]
min_samples_split = [2, 5, 10]
n_estimators = [5, 20, 50, 100]

grid = dict(bootstrap = bootstrap,
            max_depth = max_depth,
            max_features = max_features,
            min_samples_leaf = min_samples_leaf,
            min_samples_split = min_samples_split,
            n_estimators = n_estimators)

grid_search = GridSearchCV(estimator = rf,
                           param_grid = grid,
                           cv= 10,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
rf = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

Fitting 10 folds for each of 1080 candidates, totalling 10800 fits
Best Parameters (GridSearch): RandomForestClassifier(max_depth=100, max_features='sqrt', min_samples_split=5,
                       n_estimators=50)
-----------------------------------------------------------
[[36  3  2]
 [ 8 18  6]
 [ 0  0 47]]
              precision    recall  f1-score   support

           0       0.82      0.88      0.85        41
           1       0.86      0.56      0.68        32
           2       0.85      1.00      0.92        47

    accuracy                           0.84       120
   macro avg       0.84      0.81      0.82       120
weighted avg       0.84      0.84      0.83       120

Precisión:  0.84
Recall:  0.84
F1-Score:  0.83


#### Logistic Regression

In [40]:
# Standardize the features (when the cells run in order, X_train and X_test
# are already standardized, so this step changes almost nothing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

lr = LogisticRegression()

# Hyperparameter grid
solver = ['lbfgs','newton-cg','liblinear']
penalty = ['l2']
C = [100, 10, 1.0, 0.1, 0.01]
max_iter = [100, 1000,2500, 5000]

grid = dict(solver = solver,
            penalty = penalty,
            C = C,
            max_iter = max_iter)

grid_search = GridSearchCV(estimator = lr,
                           param_grid = grid,
                           cv= 10,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
lr = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
lr.fit(X_train, y_train)

pred = lr.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

Fitting 10 folds for each of 60 candidates, totalling 600 fits
Best Parameters (GridSearch): LogisticRegression(C=100, solver='liblinear')
-----------------------------------------------------------
[[27  6  8]
 [11 11 10]
 [ 0  5 42]]
              precision    recall  f1-score   support

           0       0.71      0.66      0.68        41
           1       0.50      0.34      0.41        32
           2       0.70      0.89      0.79        47

    accuracy                           0.67       120
   macro avg       0.64      0.63      0.63       120
weighted avg       0.65      0.67      0.65       120

Precisión:  0.65
Recall:  0.67
F1-Score:  0.65


#### K-NN

In [43]:
# Standardize the features (when the cells run in order, X_train and X_test
# are already standardized, so this step changes almost nothing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier()

# Hyperparameter grid
n_neighbors = [1, 3, 5, 10]
weights = ['uniform','distance']
algorithm = ['auto','ball_tree','kd_tree','brute']

grid = dict(n_neighbors = n_neighbors,
            weights = weights,
            algorithm = algorithm)

grid_search = GridSearchCV(estimator = knn,
                           param_grid = grid,
                           cv= 10,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
# (stored in knn, so the tuned logistic regression in lr is not overwritten)
knn = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
knn.fit(X_train, y_train)

pred = knn.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

Fitting 10 folds for each of 32 candidates, totalling 320 fits



Best Parameters (GridSearch): KNeighborsClassifier(n_neighbors=10, weights='distance')
-----------------------------------------------------------
[[38  2  1]
 [11 14  7]
 [ 0  2 45]]
              precision    recall  f1-score   support

           0       0.78      0.93      0.84        41
           1       0.78      0.44      0.56        32
           2       0.85      0.96      0.90        47

    accuracy                           0.81       120
   macro avg       0.80      0.77      0.77       120
weighted avg       0.80      0.81      0.79       120

Precisión:  0.8
Recall:  0.81
F1-Score:  0.79




#### MLP

In [44]:
# Standardize the features (when the cells run in order, X_train and X_test
# are already standardized, so this step changes almost nothing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

mlp = MLPClassifier(max_iter=150)

# Hyperparameter grid
hidden_layer_sizes = [(10,),(20,),(50,),(100,)]
activation = ['tanh', 'relu']
solver = ['sgd', 'adam']
alpha = [0.0001, 0.05]
learning_rate =  ['constant','adaptive']

grid = dict(hidden_layer_sizes = hidden_layer_sizes,
            activation = activation,
            solver = solver,
            alpha = alpha,
            learning_rate = learning_rate)

grid_search = GridSearchCV(estimator = mlp,
                           param_grid = grid,
                           cv= 3,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
# (stored in mlp, so the tuned logistic regression in lr is not overwritten)
mlp = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
mlp.fit(X_train, y_train)

pred = mlp.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

Fitting 3 folds for each of 64 candidates, totalling 192 fits




Best Parameters (GridSearch): MLPClassifier(hidden_layer_sizes=(50,), max_iter=150)
-----------------------------------------------------------
[[25  9  7]
 [ 6 18  8]
 [ 0  2 45]]
              precision    recall  f1-score   support

           0       0.81      0.61      0.69        41
           1       0.62      0.56      0.59        32
           2       0.75      0.96      0.84        47

    accuracy                           0.73       120
   macro avg       0.73      0.71      0.71       120
weighted avg       0.73      0.73      0.72       120

Precisión:  0.73
Recall:  0.73
F1-Score:  0.72




#### Gradient Boosting Classifier

In [45]:
# Standardize the features (when the cells run in order, X_train and X_test
# are already standardized, so this step changes almost nothing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

gbc = GradientBoostingClassifier()

# Hyperparameter grid
# loss is not tuned: the 'exponential' loss only supports binary classification
#loss = ['log_loss', 'exponential']
learning_rate = [0.01, 0.05, 0.1, 0.15, 0.2]
criterion = ['friedman_mse', 'squared_error']
max_depth = [3,5,8]
max_features = ['log2','sqrt']

grid = dict(#loss = loss,
            learning_rate = learning_rate,
            criterion = criterion,
            max_depth = max_depth,
            max_features = max_features)

grid_search = GridSearchCV(estimator = gbc,
                           param_grid = grid,
                           cv= 10,
                           verbose=1,
                           n_jobs=-1,
                           scoring = "accuracy")

searchResults = grid_search.fit(X_train, y_train.ravel())

# Extract the best model and evaluate it
bestModel = searchResults.best_estimator_

print("Best Parameters (GridSearch):", bestModel)
print("-----------------------------------------------------------")

# Keep the estimator with the best hyperparameters
gbc = bestModel

# Train the model with the best parameters
# (GridSearchCV's default refit=True has already done this once)
gbc.fit(X_train, y_train)

pred = gbc.predict(X_test)

# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, pred))
# Print the classification report (per-class precision, recall, F1-score and accuracy)
print(classification_report(y_test, pred))

# Support-weighted precision, recall and F1-score
print("Precisión: ", round(precision_score(y_test, pred, average='weighted'), 2))
print("Recall: ", round(recall_score(y_test, pred, average='weighted'),2))
print("F1-Score: ", round(f1_score(y_test, pred, average='weighted'),2))

Fitting 10 folds for each of 60 candidates, totalling 600 fits
Best Parameters (GridSearch): GradientBoostingClassifier(learning_rate=0.2, max_features='log2')
-----------------------------------------------------------
[[35  5  1]
 [ 4 22  6]
 [ 0  0 47]]
              precision    recall  f1-score   support

           0       0.90      0.85      0.88        41
           1       0.81      0.69      0.75        32
           2       0.87      1.00      0.93        47

    accuracy                           0.87       120
   macro avg       0.86      0.85      0.85       120
weighted avg       0.86      0.87      0.86       120

Precisión:  0.86
Recall:  0.87
F1-Score:  0.86
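The weighted F1-scores printed in this notebook can be gathered in one table to compare the runs with and without grid search. The small pandas sketch below copies its numbers from the outputs above. Two cautions apply. Several models are trained without a fixed `random_state`, and everything is measured on a single 120-row test split. Small differences between rows may therefore not be meaningful.

```python
import pandas as pd

# Weighted F1-scores copied from the outputs above
results = pd.DataFrame(
    {
        "untuned": [0.61, 0.68, 0.84, 0.85, 0.70, 0.68, 0.78, 0.84],
        "grid_search": [0.61, 0.75, 0.77, 0.83, 0.65, 0.79, 0.72, 0.86],
    },
    index=["Naive Bayes", "SVC", "Decision Tree", "Random Forest",
           "Logistic Regression", "K-NN", "MLP", "Gradient Boosting"],
)

# Change in weighted F1 after tuning (positive = tuning helped)
results["change"] = (results["grid_search"] - results["untuned"]).round(2)

print(results.sort_values("grid_search", ascending=False))
```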
