## Graded Weekly Practice

You will develop a binary classification model using **logistic regression** to determine whether or not a patient has a heart condition.

The variable to predict is "target", which takes the values 1 and 0, where 1 means the patient has a heart condition.
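As a minimal sketch of what the model computes: logistic regression passes a weighted sum of the features through the sigmoid function and assigns label 1 when the resulting probability reaches 0.5. The weights, bias, and feature values below are hypothetical, chosen only for illustration, and are not fitted to the heart dataset.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(x, weights, bias, threshold=0.5):
    """Return (probability of class 1, hard 0/1 label) for one feature vector."""
    probability = sigmoid(np.dot(weights, x) + bias)
    return probability, int(probability >= threshold)

# Hypothetical weights, bias, and two features (not fitted to the heart dataset)
weights = np.array([0.8, -1.2])
bias = 0.1
x = np.array([1.5, 0.5])

probability, label = predict_label(x, weights, bias)
print(probability, label)
```

The linear score here is 0.7, which the sigmoid maps to a probability of about 0.67, so the hard label is 1.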

The features X are the following:

- age
- sex
- chest pain type (4 values)
- resting blood pressure
- serum cholesterol in mg/dl
- fasting blood sugar > 120 mg/dl
- resting electrocardiographic results (values 0,1,2)
- maximum heart rate achieved
- exercise induced angina
- oldpeak = ST depression induced by exercise relative to rest
- the slope of the peak exercise ST segment
- number of major vessels (0-3) colored by fluoroscopy
- thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
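Several of the attributes above (chest pain type, resting ECG result, slope, thal) are category codes rather than quantities. One possible way to handle them is one-hot encoding with `pandas.get_dummies`, sketched here on five rows copied from the dataset. Whether this improves the model is an assumption to verify; the exercise does not prescribe it.

```python
import pandas as pd

# First five rows of the dataset, restricted to three columns for brevity
sample = pd.DataFrame({
    "age": [63, 37, 41, 56, 57],
    "cp": [3, 2, 1, 1, 0],
    "thal": [1, 2, 2, 2, 2],
})

# Expand each category code into its own 0/1 indicator column
encoded = pd.get_dummies(sample, columns=["cp", "thal"], dtype=int)

print(encoded.columns.tolist())
```

Each row ends up with exactly one active `cp_*` column and one active `thal_*` column, so the model no longer treats the codes as ordered magnitudes.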

The dataset contains 303 records from a hospital in Cleveland. You are encouraged to read the following [paper](https://www.researchgate.net/publication/309210947_Heart_Disease_prediction_using_Machine_learning_and_Data_Mining_Technique/link/5805eb0f08ae03256b75d9a1/download) to better understand the background of the problem.

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import normalize

In [64]:
data = pd.read_csv("data/heart.csv")
data

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
299,45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
300,68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
301,57,1,0,130,131,0,1,115,1,1.2,1,1,3,0


In [65]:
# Compute the correlation matrix to get a sense of how the features interact
corr = data.corr()

# Render the correlation matrix with the pandas Styler
# (Styler.set_precision was removed in pandas 2.0; format(precision=2) replaces it)
corr.style.background_gradient(cmap='plasma').format(precision=2)

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
age,1.0,-0.1,-0.07,0.28,0.21,0.12,-0.12,-0.4,0.1,0.21,-0.17,0.28,0.07,-0.23
sex,-0.1,1.0,-0.05,-0.06,-0.2,0.05,-0.06,-0.04,0.14,0.1,-0.03,0.12,0.21,-0.28
cp,-0.07,-0.05,1.0,0.05,-0.08,0.09,0.04,0.3,-0.39,-0.15,0.12,-0.18,-0.16,0.43
trestbps,0.28,-0.06,0.05,1.0,0.12,0.18,-0.11,-0.05,0.07,0.19,-0.12,0.1,0.06,-0.14
chol,0.21,-0.2,-0.08,0.12,1.0,0.01,-0.15,-0.01,0.07,0.05,-0.0,0.07,0.1,-0.09
fbs,0.12,0.05,0.09,0.18,0.01,1.0,-0.08,-0.01,0.03,0.01,-0.06,0.14,-0.03,-0.03
restecg,-0.12,-0.06,0.04,-0.11,-0.15,-0.08,1.0,0.04,-0.07,-0.06,0.09,-0.07,-0.01,0.14
thalach,-0.4,-0.04,0.3,-0.05,-0.01,-0.01,0.04,1.0,-0.38,-0.34,0.39,-0.21,-0.1,0.42
exang,0.1,0.14,-0.39,0.07,0.07,0.03,-0.07,-0.38,1.0,0.29,-0.26,0.12,0.21,-0.44
oldpeak,0.21,0.1,-0.15,0.19,0.05,0.01,-0.06,-0.34,0.29,1.0,-0.58,0.22,0.21,-0.43
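To read the matrix with the prediction task in mind, it can help to rank the features by the absolute value of their correlation with `target`. The sketch below copies the `target` column from the rows displayed above; the rows for `slope`, `ca`, and `thal` are not shown in the output, so they are left out. On the real frame the same series would come from `corr['target'].drop('target')`.

```python
import pandas as pd

# "target" column of the correlation matrix, copied from the rows displayed above
target_corr = pd.Series({
    "age": -0.23, "sex": -0.28, "cp": 0.43, "trestbps": -0.14, "chol": -0.09,
    "fbs": -0.03, "restecg": 0.14, "thalach": 0.42, "exang": -0.44, "oldpeak": -0.43,
})

# Rank features by the strength of their linear association with the target
ranked = target_corr.abs().sort_values(ascending=False)
print(ranked)
```

Among the displayed rows, `exang` has the strongest linear association with the target and `fbs` the weakest. Correlation only captures linear, pairwise relationships, so this ranking is a starting point rather than a feature selection rule.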


In [66]:
# EXERCISE - Feature Engineering
# In this section, include transformations, scaling, and removal or creation of variables as you see fit.

# Drop two features before training
data = data.drop(columns=['trestbps', 'exang'])

# Feature matrix (every column except 'target') and label vector
X = data.drop(columns='target').to_numpy()
y = data['target'].to_numpy()

data

Unnamed: 0,age,sex,cp,chol,fbs,restecg,thalach,oldpeak,slope,ca,thal,target
0,63,1,3,233,1,0,150,2.3,0,0,1,1
1,37,1,2,250,0,1,187,3.5,0,0,2,1
2,41,0,1,204,0,0,172,1.4,2,0,2,1
3,56,1,1,236,0,1,178,0.8,2,0,2,1
4,57,0,0,354,0,1,163,0.6,2,0,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...
298,57,0,0,241,0,1,123,0.2,1,0,3,0
299,45,1,3,264,0,1,132,1.2,1,0,3,0
300,68,1,0,193,1,1,141,3.4,1,2,3,0
301,57,1,0,131,0,1,115,1.2,1,1,3,0
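The exercise also mentions scaling. One way to add it without leaking information between folds is to put the scaler and the classifier in a single scikit-learn pipeline, so the scaler is refit on the training part of every fold. The sketch below uses synthetic data from `make_classification` as a stand-in for the heart dataset (an assumption made so the block is self-contained); `X_demo`, `y_demo`, and `scaled_model` are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data: 303 rows, 11 features, binary target
X_demo, y_demo = make_classification(n_samples=303, n_features=11, random_state=0)

# The scaler is refit on the training part of every fold, so no test-fold
# statistics leak into the preprocessing
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

demo_scores = cross_val_score(scaled_model, X_demo, y_demo, cv=10)
print(demo_scores.mean())
```

With the real data, `X` and `y` from the cell above would replace `X_demo` and `y_demo`.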


In [67]:
# EXERCISE - Train the Model & Resampling
# Use scikit-learn to train the logistic regression model.
# Since this is the only dataset you have, remember to use the best resampling technique (see notebook #2).

# The target is binary, so multi_class='multinomial' is not required; note that this
# parameter is deprecated in scikit-learn 1.5 and later.
model = LogisticRegression(multi_class='multinomial', max_iter=1000)
scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())

# Values before the transformations
#Accuracy per fold: [0.87096774 0.80645161 0.83870968 0.86666667 0.9        0.76666667
# 0.86666667 0.83333333 0.7        0.73333333]
#Mean accuracy: 0.8182795698924732

Accuracy per fold: [0.93548387 0.77419355 0.90322581 0.9        0.9        0.7
 0.83333333 0.93333333 0.7        0.76666667]
Mean accuracy: 0.8346236559139785
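When `cv` is an integer and the estimator is a classifier, `cross_val_score` splits the data with stratified folds, so each test fold keeps roughly the overall class ratio. The sketch below makes that visible by building the splitter explicitly on labels with the same class counts as this dataset (138 zeros and 165 ones). The shuffling and `random_state=0` are illustrative choices, not part of the original run.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Labels with the same class counts as the dataset: 138 zeros and 165 ones
y_demo = np.array([0] * 138 + [1] * 165)
X_demo = np.zeros((len(y_demo), 1))  # placeholder features; only the labels drive the split

splitter = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_counts = []
for _, test_idx in splitter.split(X_demo, y_demo):
    # Count how many patients of each class land in this test fold
    fold_counts.append(np.bincount(y_demo[test_idx], minlength=2))

fold_counts = np.array(fold_counts)
print(fold_counts)
```

Every test fold receives 13 or 14 patients of class 0 and 16 or 17 of class 1. A splitter like this can be passed directly as `cv=splitter` to `cross_val_score`.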


In [68]:
# EXERCISE - Metrics and Evaluation
# Implement the confusion matrix and compute all the metrics from notebook #4 using classification_report
# and accuracy_score. Review the result and iterate over the whole notebook until you are satisfied with
# the results.
# ** BENCHMARKING AGAINST YOUR CLASSMATES IS ALLOWED ** you can post your metrics in the WA group to
# compare results.

# Fit on the full dataset and predict on the same rows (these are training-set metrics)
model.fit(X,y)
y_prima = model.predict(X)

# Not displayed: a notebook cell only echoes its last expression
accuracy_score(y, y_prima)

print(classification_report(y, y_prima ))


# Values before the transformations
#         precision recall  f1-score   support
#
#           0       0.89      0.76      0.82       138
#           1       0.82      0.92      0.87       165
#
#    accuracy                           0.85       303
#   macro avg       0.86      0.84      0.84       303
#weighted avg       0.85      0.85      0.85       303


              precision    recall  f1-score   support

           0       0.88      0.80      0.84       138
           1       0.84      0.91      0.87       165

    accuracy                           0.86       303
   macro avg       0.86      0.85      0.86       303
weighted avg       0.86      0.86      0.86       303
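The report above is computed on the same rows the model was fitted on, so it tends to be optimistic. A possible alternative, sketched here on synthetic stand-in data (an assumption, so the block runs on its own), is to collect out-of-fold predictions with `cross_val_predict` and build the confusion matrix the exercise asks for from those.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the heart data (assumption: 303 rows, 11 features)
X_demo, y_demo = make_classification(n_samples=303, n_features=11, random_state=0)

# Each prediction comes from a model that never saw that row during training
y_oof = cross_val_predict(LogisticRegression(max_iter=1000), X_demo, y_demo, cv=10)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_demo, y_oof)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many are found
print(cm)
print(precision, recall)

# Same values computed by scikit-learn, for comparison
print(precision_score(y_demo, y_oof), recall_score(y_demo, y_oof))
```

With the real data, `X` and `y` would replace the synthetic arrays, and `classification_report(y, y_oof)` would give the out-of-fold version of the table above.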



In [69]:
# Should return 1
print("Has a heart condition:", model.predict([[63,1,3,233,1,0,150,2.3,0,0,1]]))

# Should return 0
print("Has a heart condition:", model.predict([[57,1,0,131,0,1,115,1.2,1,1,3]]))

Has a heart condition: [1]
Has a heart condition: [0]
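`predict` returns only the hard 0/1 label. If the confidence behind each prediction matters, `predict_proba` returns the estimated probability of each class. The sketch below shows the call on synthetic stand-in data (an assumption, so the block runs without the CSV file); `clf` is an illustrative name.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the heart data (assumption: 303 rows, 11 features)
X_demo, y_demo = make_classification(n_samples=303, n_features=11, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# One row per patient: [P(class 0), P(class 1)]
probabilities = clf.predict_proba(X_demo[:2])
labels = clf.predict(X_demo[:2])

print(probabilities)
print(labels)
```

Each row of probabilities sums to 1, and the predicted label is the class with the larger probability.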


In [124]:
# Welcome to the world of machine learning!