# Diabetes Classification - Data Exploration

[Jose R. Zapata](https://joserzapata.github.io)
- https://joserzapata.github.io
- https://twitter.com/joserzapata
- https://www.linkedin.com/in/jose-ricardo-zapata-gonzalez/       


## Introduction


Analyze the factors related to hospital readmission, as well as other patient outcomes, in order to classify the outcome of each patient's hospital encounter.

The target has 3 possible outcomes:

1. No readmission.

2. A readmission in less than 30 days (this situation is not good, because the treatment may not have been appropriate).

3. A readmission in more than 30 days (this is also not good, though less so than the previous one; here the reason could be the state of the patient).


## Main Objective

> **How effective was the treatment received in hospital?**

## Principal References

### Paper

Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, “Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records,” BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.

https://www.hindawi.com/journals/bmri/2014/781670/

### Dataset

https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008#

### Data description

https://www.hindawi.com/journals/bmri/2014/781670/tab1/

# Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

# Load Dataset

In [2]:
diabetic_df = pd.read_csv("../data/interim/diabetes_clean.csv")

In [3]:
diabetic_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53216 entries, 0 to 53215
Data columns (total 43 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   race                      53216 non-null  object
 1   gender                    53216 non-null  object
 2   age                       53216 non-null  object
 3   admission_type_id         53216 non-null  int64 
 4   discharge_disposition_id  53216 non-null  int64 
 5   admission_source_id       53216 non-null  int64 
 6   time_in_hospital          53216 non-null  int64 
 7   medical_specialty         53216 non-null  object
 8   num_lab_procedures        53216 non-null  int64 
 9   num_procedures            53216 non-null  int64 
 10  num_medications           53216 non-null  int64 
 11  number_outpatient         53216 non-null  int64 
 12  number_emergency          53216 non-null  int64 
 13  number_inpatient          53216 non-null  int64 
 14  diag_1                

# Exploratory Data Analysis

It is important to convert the features that are represented by numeric IDs into categorical data.




In [4]:
diabetic_df["admission_type_id"] = diabetic_df["admission_type_id"].astype('category')
diabetic_df["discharge_disposition_id"] = diabetic_df["discharge_disposition_id"].astype('category')
diabetic_df["admission_source_id"] = diabetic_df["admission_source_id"].astype('category')

Before the specific data exploration and feature engineering, I will use the data profiling tool:

`pandas-profiling` - https://pandas-profiling.github.io/pandas-profiling/docs/master/index.html


In [5]:
# import the pandas-profiling library
from pandas_profiling import ProfileReport

In [6]:
#profile = ProfileReport(diabetic_df, title="Diabetes data Profiling Report 1",
#                        explorative=True, samples=None, correlations=None,
#                        missing_diagrams=None, duplicates=None, interactions=None)

In [7]:
#profile.to_notebook_iframe()

In [8]:
# save the data profiling report
#profile.to_file("../reports/Diabetes_data_profiling_univariate_report_1.html")

## Data profiling analysis

### Univariate analysis and Transformations

- race: 77.7% of the people in the dataset are Caucasian
- gender: the split is balanced
- age: categorical, in 10-year ranges; around 47.3% of the people are over 60 years old and 65.4% are over 50
- admission_type_id: 50% of the admissions are Emergency
- discharge_disposition_id: 65.2% of the people were discharged to
- admission_source_id: 51.8% of the people came from the Emergency Room
- time_in_hospital: skew = 1.22, median = 3 days, max = 14 days, Q1 = 2, IQR = 4, std = 2.91
- medical_specialty: HIGH CARDINALITY, 68 distinct values; 47.3% unknown, 16% Internal Medicine, 6.9% General
- num_lab_procedures: median = 44, IQR = 26, skew = -0.21, std = 19.9, max = 121
- num_procedures: ZEROS, max = 6, skew = 1.19, median = 1, mean = 1.46, std = 1.77, IQR = 2
- num_medications: min = 1, max = 81, median = 14, skew = 1.44, std = 8.4
- number_outpatient: ZEROS, median = 0, skew = 9.16, max = 36, std = 1.02
- number_inpatient: ZEROS, median = 0, skew = 5.71, max = 12, std = 0.59
- diag_1: HIGH CARDINALITY, 671 distinct values; most common: 414 (7.7%)
- diag_2: HIGH CARDINALITY, 697 distinct values; most common: 250 (7.7%)
- diag_3: HIGH CARDINALITY, 724 distinct values; most common: 250 (14.2%)
- max_glu_serum: 95.6% None
- A1Cresult: 81.2% None
- metformin: 78% No
- repaglinide: 98.8% No
- nateglinide: 99.3% No
- chlorpropamide: 99.9% No
- glimepiride: 94.7% No
- acetohexamide: 99.9% No
- glipizide: 87.4% No
- glyburide: 89.1% No
- tolbutamide: 99.9% No
- pioglitazone: 92.5% No
- rosiglitazone: 93.5% No
- acarbose: 99.7% No
- miglitol: 99.9% No
- troglitazone: 99.9% No
- tolazamide: 99.9% No
- insulin: 49.8% No
- glyburide-metformin: 99.3% No
- glipizide-metformin: 99.9% No
- metformin-rosiglitazone: 99.9% No
- metformin-pioglitazone: 99.9% No
- change: balanced
- diabetesMed: 75.2% Yes
- **readmitted: output variable, imbalanced: 77.4% No, 18.5% >30, 4% <30**

**Variable Warnings**

variable | description | warning
--- | --- | ---
medical_specialty | has a high cardinality: 68 distinct values | High cardinality
diag_1 | has a high cardinality: 671 distinct values | High cardinality
diag_2 | has a high cardinality: 697 distinct values | High cardinality
diag_3 | has a high cardinality: 724 distinct values | High cardinality
number_emergency | is highly skewed (γ1 = 27.22616943) | Skewed
num_procedures | has 22882 (43.0%) zeros | Zeros
number_outpatient | has 46621 (87.6%) zeros | Zeros
number_emergency | has 49784 (93.6%) zeros | Zeros
number_inpatient | has 47199 (88.7%) zeros | Zeros

### Feature Engineering 1

#### diag_1, diag_2 and diag_3
The variables diag_1, diag_2 and diag_3 have a lot of unique values, so I am going to aggregate their levels into fewer categories in order to reduce the complexity. Based on [Table 2](https://www.hindawi.com/journals/bmri/2014/781670/tab2/) of the research report [Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records](https://www.hindawi.com/journals/bmri/2014/781670/), the levels of all three variables are converted into 9 categories.
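The Table 2 grouping can also be written as a plain per-code function, which makes its boundaries easy to unit-test. This is a minimal sketch: `icd9_group` is a hypothetical helper, not part of any library. It assumes that the single codes 785-788 stand for their whole 3-digit categories (so 786.05 counts as Respiratory) and uses exclusive upper bounds so that decimal codes such as 459.81 stay inside 390-459. Everything Table 2 lists under "Other" (e.g. 001-139, 240-279 without 250, 780, 790-799 and the E/V codes) falls through to `'Others'`.

```python
import math

def icd9_group(code) -> str:
    """Return the Table 2 group for one raw ICD-9 code (hypothetical helper)."""
    code = str(code).strip()
    if code.startswith(('E', 'V')):
        return 'Others'                 # external causes / supplementary codes
    try:
        value = float(code)
    except ValueError:
        return 'Others'                 # unparseable, e.g. '?' for a missing code
    if not math.isfinite(value):
        return 'Others'                 # 'nan' / 'inf' strings
    if code.startswith('250'):
        return 'Diabetes'               # all 250.xx codes
    # (lower bound, exclusive upper bound, group); the exclusive upper bound keeps
    # decimal codes such as 459.81 inside the 390-459 range
    ranges = [
        (390, 460, 'Circulatory'), (460, 520, 'Respiratory'),
        (520, 580, 'Digestive'), (580, 630, 'Genitourinary'),
        (710, 740, 'Musculoskeletal'), (800, 1000, 'Injury'),
        (140, 240, 'Neoplasms'),
    ]
    for low, high, group in ranges:
        if low <= value < high:
            return group
    # Single 3-digit categories that Table 2 adds to some groups
    extra = {785: 'Circulatory', 786: 'Respiratory',
             787: 'Digestive', 788: 'Genitourinary'}
    return extra.get(int(value), 'Others')
```

Applied to the raw string columns it would be used as, for example, `diabetic_df['diag_1'].map(icd9_group)`.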



In [9]:
diag_cols = ['diag_1','diag_2','diag_3']
df_copy = diabetic_df[diag_cols].copy()

In [10]:
diag_cols = ['diag_1','diag_2','diag_3']
df_copy = diabetic_df[diag_cols].copy()
for col in diag_cols:
    df_copy[col] = df_copy[col].str.replace('E','-')
    df_copy[col] = df_copy[col].str.replace('V','-')
    condition = df_copy[col].str.contains('250')
    df_copy.loc[condition,col] = '250'

df_copy[diag_cols] = df_copy[diag_cols].astype(float)

In [11]:
# Group the ICD-9 codes following Table 2 of Strack et al. (2014).
# After the previous cell, E/V codes are negative numbers and every 250.xx code is 250.
def in_range(codes, low, high):
    # Upper bound is exclusive so that decimal codes such as 459.81 stay in 390-459
    return (codes >= low) & (codes < high)

groups = ['Circulatory', 'Respiratory', 'Digestive', 'Injury',
          'Musculoskeletal', 'Genitourinary', 'Neoplasms', 'Diabetes']

for col in diag_cols:
    codes = df_copy[col]
    conditions = [
        in_range(codes, 390, 460) | in_range(codes, 785, 786),
        in_range(codes, 460, 520) | in_range(codes, 786, 787),
        in_range(codes, 520, 580) | in_range(codes, 787, 788),
        in_range(codes, 800, 1000),
        in_range(codes, 710, 740),
        in_range(codes, 580, 630) | in_range(codes, 788, 789),
        in_range(codes, 140, 240),
        codes == 250,
    ]
    # Codes outside these 8 groups (E/V codes, 240-279 without 250,
    # 780-784, 790-799, ...) are grouped as 'Others', as in Table 2
    df_copy[col] = np.select(conditions, groups, default='Others')

In [12]:
diabetic_df[diag_cols] = df_copy.copy()
del df_copy

In [13]:
# Number of unique values per diag columns
diabetic_df[diag_cols].describe().loc['unique']

diag_1    9
diag_2    9
diag_3    9
Name: unique, dtype: object

#### medical_specialty

This variable has high cardinality

In [14]:
px.histogram(diabetic_df, x="medical_specialty", title = "medical_specialty").update_xaxes(categoryorder= "total descending")

What does the interquartile distribution of the `medical_specialty` counts look like, leaving out the unknown value?

In [15]:
# Count patients per specialty, leaving out the 'Unknow' placeholder value
medi_no_unknown = diabetic_df["medical_specialty"].value_counts().drop('Unknow')
print(f'Number of records with a known medical specialty = {medi_no_unknown.sum()}')
px.box(medi_no_unknown, title="medical_specialty")

Number of records with a known medical specialty = 28045


About 75% of the medical specialties have fewer than 240 patients each (240 patients is about 0.86% of the 28,045 records with a known specialty),
so I am going to replace every specialty with 240 patients or fewer with the value 'Other'.
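Instead of reading the cut-off from the box plot, the threshold can be derived directly as the 75th percentile of the category counts. This is a minimal sketch with a hypothetical helper, `group_rare_categories`; with pandas' default linear interpolation, the computed percentile may differ slightly from the 240 read from the plot. Placeholder categories such as `'Unknow'` can be passed in `exclude` so they neither affect the percentile nor get merged.

```python
import pandas as pd

def group_rare_categories(series: pd.Series, other: str = 'Other',
                          exclude: tuple = ()) -> pd.Series:
    """Replace categories whose count is at or below the 75th percentile of the
    category counts with `other` (hypothetical helper).

    Categories listed in `exclude` are left out of the percentile computation
    and are always kept.
    """
    counts = series.value_counts()
    threshold = counts.drop(list(exclude), errors='ignore').quantile(0.75)
    keep = set(counts[counts > threshold].index) | set(exclude)
    # Keep frequent (and excluded) categories, replace everything else
    return series.where(series.isin(keep), other)
```

Usage on this dataset would look like `group_rare_categories(diabetic_df['medical_specialty'], exclude=('Unknow',))`.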

In [16]:
ms_counts = diabetic_df["medical_specialty"].value_counts()
top = ms_counts[ms_counts>240]
print(f'Number of top specialties = {len(top)}')

Number of top specialties = 18


In [17]:
# Replace every specialty outside the top list with 'Other'
diabetic_df.loc[~diabetic_df['medical_specialty'].isin(list(top.index)),'medical_specialty']='Other'

In [18]:
diabetic_df["medical_specialty"].unique()

array(['Other', 'Unknow', 'InternalMedicine', 'Family/GeneralPractice',
       'Cardiology', 'Surgery-General', 'Orthopedics', 'Gastroenterology',
       'Nephrology', 'Orthopedics-Reconstructive',
       'Surgery-Cardiovascular/Thoracic', 'Pulmonology', 'Psychiatry',
       'Emergency/Trauma', 'Surgery-Neuro', 'ObstetricsandGynecology',
       'Urology', 'Surgery-Vascular', 'Radiologist'], dtype=object)

#### Glucose Serum test
A blood glucose test is used to find out whether blood sugar levels are in the healthy range. It is often used to help diagnose and monitor diabetes.

- '>200': indicates diabetes
- '>300': indicates diabetes
- 'Norm': normal
- 'None': the test was not taken

The values will be replaced with numeric values:

- '>200': 1
- '>300': 1
- 'Norm': 0
- 'None': -99


In [19]:
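# pandas >= 2.0 parses the string 'None' as NaN by default, so missing tests are also set to -99
diabetic_df["max_glu_serum"] = (
    diabetic_df["max_glu_serum"]
    .replace({'>200': 1, '>300': 1, 'Norm': 0, 'None': -99})
    .fillna(-99)
    .astype('int64')
)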

#### A1C result
The A1C test is a blood test that provides information about the average level of blood glucose, also called blood sugar, over the past 3 months. The values will be replaced with numeric values:

- '>7': 1
- '>8': 1
- 'Norm': 0 = normal
- 'None': -99 = the test was not taken

In [20]:
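# pandas >= 2.0 parses the string 'None' as NaN by default, so missing tests are also set to -99
diabetic_df["A1Cresult"] = (
    diabetic_df["A1Cresult"]
    .replace({'>7': 1, '>8': 1, 'Norm': 0, 'None': -99})
    .fillna(-99)
    .astype('int64')
)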

### Encoding categorical nominal features

Simple value encoding for binary categorical data:

In [21]:

# Assign the result back instead of calling replace(inplace=True) on a single column,
# which relies on chained assignment and stops working under pandas Copy-on-Write
diabetic_df['change'] = diabetic_df['change'].replace({'Ch': 1, 'No': 0})
diabetic_df['gender'] = diabetic_df['gender'].replace({'Male': 1, 'Female': 0})
diabetic_df['diabetesMed'] = diabetic_df['diabetesMed'].replace({'Yes': 1, 'No': 0})

### Medications

Number of medication changes: the original dataset contains 23 features for 23 drugs (or drug combinations), each indicating whether that medication was changed during the patient's current hospital stay. The cleaned dataset used here keeps 20 of them (listed in `keys` below).

The research [What are Predictors of Medication Change and Hospital Readmission in Diabetic Patients?](https://www.ischool.berkeley.edu/projects/2017/what-are-predictors-medication-change-and-hospital-readmission-diabetic-patients) found that medication changes for diabetic patients upon admission are associated with lower readmission rates.

A new variable, `numchange`, is created to count the total number of medication changes made for each patient.

In [22]:
keys = ['metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride', 'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone','metformin-rosiglitazone', 'glipizide-metformin', 'troglitazone', 'tolbutamide', 'acetohexamide']

In [23]:
# A medication counts as changed when its value is neither 'No' nor 'Steady' (i.e. 'Up' or 'Down')
changed = ~diabetic_df[keys].isin(['No', 'Steady'])
diabetic_df['numchange'] = changed.sum(axis=1)

diabetic_df['numchange'].value_counts()

0    40768
1    11721
2      666
3       58
4        3
Name: numchange, dtype: int64

To simplify these variables for the model, each of the 20 drug columns is encoded as a numeric binary variable that indicates whether the drug was prescribed at all, regardless of any dose change:

- ('No', 0)
- ('Steady', 1)
- ('Up', 1)
- ('Down', 1) 

In [24]:
# Encode every drug column at once and assign the result back
diabetic_df[keys] = diabetic_df[keys].replace({'No': 0, 'Steady': 1, 'Up': 1, 'Down': 1})

### Numerical data with too many zeros
The columns num_procedures, number_outpatient, number_emergency and number_inpatient contain mostly zeros.

This information is relevant, so I do not transform these columns in this step.

### Outcome Variable
The main question is

> How effective was the treatment received in hospital?

- A readmission in less than 30 days (this situation is not good, **because the treatment may not have been appropriate**).
- A readmission in more than 30 days (this is also not good, though less so than the previous one; however, **the reason could be the state of the patient**).

So my goal is to classify patients with a readmission in less than 30 days, and I am combining the readmissions in more than 30 days (`>30`) with the patients with no readmission.

I am going to encode the output variable as:

- `1` if readmission in less than 30 days `<30`
- `0` if No readmission or readmission in more than 30 days `>30`


In [25]:
diabetic_df["readmitted"] = diabetic_df["readmitted"].replace({'<30': 1, '>30': 0, 'NO': 0})

In [26]:
diabetic_df["readmitted"].value_counts()

0    51063
1     2153
Name: readmitted, dtype: int64

**The variable is still imbalanced** (about 4% positive cases); I am going to balance the output after the variable analysis.

Based on the paper [Imbalanced Data Problem Solving in Classification of Diabetes Patients](https://ph02.tci-thaijo.org/index.php/gskku/article/view/145226):

SMOTE obtained the best results on this dataset, compared against oversampling, undersampling and hybrid methods.
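The core idea of SMOTE can be sketched in a few lines of NumPy: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is an illustration only, under stated assumptions: `smote` is a hypothetical helper, it expects purely numeric features, and it picks base points at random instead of generating a fixed number of samples per minority point as in the original algorithm. In practice, the `SMOTE` class (or `SMOTENC` for mixed numeric and categorical data) from the imbalanced-learn package (`imblearn.over_sampling`) would normally be used. Resampling should also be applied to the training split only, so that synthetic points do not leak into the evaluation data.

```python
import numpy as np

def smote(X_minority: np.ndarray, n_synthetic: int, k: int = 5,
          seed: int = 0) -> np.ndarray:
    """Create n_synthetic new minority samples by SMOTE-style interpolation
    (hypothetical helper, numeric features only)."""
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Squared Euclidean distances between all pairs of minority samples
    sq_norms = (X ** 2).sum(axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    np.fill_diagonal(dist2, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dist2, axis=1)[:, :k]   # k nearest minority neighbours

    # For each synthetic sample: a random base point and one of its k neighbours
    base = rng.integers(0, n, size=n_synthetic)
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Place the new point at a random position on the segment base -> neighbour
    gap = rng.random((n_synthetic, 1))
    return X[base] + gap * (X[nb] - X[base])
```

With hypothetical training arrays `X_train` and `y_train`, it would be called as `smote(X_train[y_train == 1], n_synthetic=...)`; the categorical columns of this dataset would first need a numeric encoding.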

### Check for duplicates

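After the data transformations, new duplicate records can appear. With `keep=False`, `drop_duplicates` removes every copy of a duplicated row, so both records of the duplicated pair are dropped (53,216 → 53,214 rows).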

In [27]:
diabetic_df.duplicated().sum()

1

In [28]:
diabetic_df.drop_duplicates(keep=False,inplace=True, ignore_index=True) 
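The effect of the `keep` argument can be checked on a toy frame (illustrative data, not from the dataset): `keep='first'` keeps one copy of each duplicated row, while `keep=False` removes all copies.

```python
import pandas as pd

toy = pd.DataFrame({'age': ['[60-70)', '[60-70)', '[70-80)'],
                    'readmitted': [0, 0, 1]})

print(toy.duplicated().sum())                  # 1: the second row repeats the first
print(len(toy.drop_duplicates(keep='first')))  # 2: one copy of the pair is kept
print(len(toy.drop_duplicates(keep=False)))    # 1: both copies of the pair are dropped
```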

In [29]:
#profile = ProfileReport(diabetic_df, title="Diabetes data Profiling Report 2",
#                        explorative=True, samples=None, correlations=None,
#                        missing_diagrams=None, duplicates=None, interactions=None)
#profile.to_notebook_iframe()

In [30]:
#profile.to_file("../reports/Diabetes_data_profiling_univariate_report_2.html")

In [31]:
diabetic_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53214 entries, 0 to 53213
Data columns (total 44 columns):
 #   Column                    Non-Null Count  Dtype   
---  ------                    --------------  -----   
 0   race                      53214 non-null  object  
 1   gender                    53214 non-null  int64   
 2   age                       53214 non-null  object  
 3   admission_type_id         53214 non-null  category
 4   discharge_disposition_id  53214 non-null  category
 5   admission_source_id       53214 non-null  category
 6   time_in_hospital          53214 non-null  int64   
 7   medical_specialty         53214 non-null  object  
 8   num_lab_procedures        53214 non-null  int64   
 9   num_procedures            53214 non-null  int64   
 10  num_medications           53214 non-null  int64   
 11  number_outpatient         53214 non-null  int64   
 12  number_emergency          53214 non-null  int64   
 13  number_inpatient          53214 non-null  int6

## Bivariate analysis

### Target vs Categorical data

In [32]:
import pandas as pd
from scipy.stats import chi2_contingency

class ChiSquare:
    """Chi-squared test of independence between a feature and the target."""

    def __init__(self, dataframe):
        self.df = dataframe
        self.p = None      # p-value
        self.chi2 = None   # chi-squared test statistic
        self.dof = None    # degrees of freedom

        self.dfObserved = None
        self.dfExpected = None

    def _print_chisquare_result(self, colX, alpha):
        if self.p < alpha:
            result = "{0} is IMPORTANT for Prediction".format(colX)
        else:
            result = "{0} is NOT an important predictor. (Discard {0} from model)".format(colX)
        print(result)

    def TestIndependence(self, colX, colY, alpha=0.05):
        X = self.df[colX].astype(str)
        Y = self.df[colY].astype(str)

        # Contingency table of observed counts (target values x feature values)
        self.dfObserved = pd.crosstab(Y, X)
        chi2, p, dof, expected = chi2_contingency(self.dfObserved.values)
        self.p = p
        self.chi2 = chi2
        self.dof = dof

        # Counts expected if the feature and the target were independent
        self.dfExpected = pd.DataFrame(expected, columns=self.dfObserved.columns,
                                       index=self.dfObserved.index)

        self._print_chisquare_result(colX, alpha)
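The chi-squared test of independence compares the observed counts of the contingency table with the counts expected if feature and target were independent. The sketch below reproduces `scipy.stats.chi2_contingency` by hand on an illustrative 2×3 table (toy numbers, not from the dataset). For 2×2 tables (one degree of freedom), `chi2_contingency` applies Yates' continuity correction by default, so a manual computation only matches there with `correction=False`. Also note that a small p-value only says that feature and target are not independent; it does not measure how strong the association is, and with about 53,000 rows even weak associations tend to reach significance.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative contingency table: rows = target (readmitted 0/1), columns = 3 categories
observed = np.array([[30, 45, 25],
                     [10,  5, 15]])

row_totals = observed.sum(axis=1, keepdims=True)     # R_i
col_totals = observed.sum(axis=0, keepdims=True)     # C_j
expected = row_totals * col_totals / observed.sum()  # E_ij = R_i * C_j / N

# Chi-squared statistic: sum over all cells of (O_ij - E_ij)^2 / E_ij
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2, p, dof, expected_scipy = chi2_contingency(observed)
print(f'manual chi2 = {chi2_manual:.4f}, scipy chi2 = {chi2:.4f}, dof = {dof}, p = {p:.4f}')
```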

In [33]:
#Initialize ChiSquare Class
cT = ChiSquare(diabetic_df)

In [34]:
diabetic_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53214 entries, 0 to 53213
Data columns (total 44 columns):
 #   Column                    Non-Null Count  Dtype   
---  ------                    --------------  -----   
 0   race                      53214 non-null  object  
 1   gender                    53214 non-null  int64   
 2   age                       53214 non-null  object  
 3   admission_type_id         53214 non-null  category
 4   discharge_disposition_id  53214 non-null  category
 5   admission_source_id       53214 non-null  category
 6   time_in_hospital          53214 non-null  int64   
 7   medical_specialty         53214 non-null  object  
 8   num_lab_procedures        53214 non-null  int64   
 9   num_procedures            53214 non-null  int64   
 10  num_medications           53214 non-null  int64   
 11  number_outpatient         53214 non-null  int64   
 12  number_emergency          53214 non-null  int64   
 13  number_inpatient          53214 non-null  int6

In [35]:
#categorical variables
testCol = ['race', 'age', 'admission_type_id',
       'discharge_disposition_id', 'admission_source_id', 
       'medical_specialty', 'diag_1', 'diag_2', 'diag_3']

In [36]:
for var in testCol:
    cT.TestIndependence(colX=var,colY='readmitted')

race is IMPORTANT for Prediction
age is IMPORTANT for Prediction
admission_type_id is IMPORTANT for Prediction
discharge_disposition_id is IMPORTANT for Prediction
admission_source_id is IMPORTANT for Prediction
medical_specialty is IMPORTANT for Prediction
diag_1 is IMPORTANT for Prediction
diag_2 is IMPORTANT for Prediction
diag_3 is NOT an important predictor. (Discard diag_3 from model)


Based on the chi-squared results, I am going to delete the diag_3 variable.

In [37]:
px.histogram(diabetic_df, y = 'diag_3', color='readmitted', barmode='group' , title= "diag_3 vs readmitted")

In [38]:
diabetic_df.drop('diag_3',axis=1,inplace=True)

In [39]:
diabetic_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53214 entries, 0 to 53213
Data columns (total 43 columns):
 #   Column                    Non-Null Count  Dtype   
---  ------                    --------------  -----   
 0   race                      53214 non-null  object  
 1   gender                    53214 non-null  int64   
 2   age                       53214 non-null  object  
 3   admission_type_id         53214 non-null  category
 4   discharge_disposition_id  53214 non-null  category
 5   admission_source_id       53214 non-null  category
 6   time_in_hospital          53214 non-null  int64   
 7   medical_specialty         53214 non-null  object  
 8   num_lab_procedures        53214 non-null  int64   
 9   num_procedures            53214 non-null  int64   
 10  num_medications           53214 non-null  int64   
 11  number_outpatient         53214 non-null  int64   
 12  number_emergency          53214 non-null  int64   
 13  number_inpatient          53214 non-null  int6

### Target vs Numeric Columns

In [40]:
newdf = diabetic_df.select_dtypes(include=['int64'])
newdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53214 entries, 0 to 53213
Data columns (total 35 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   gender                   53214 non-null  int64
 1   time_in_hospital         53214 non-null  int64
 2   num_lab_procedures       53214 non-null  int64
 3   num_procedures           53214 non-null  int64
 4   num_medications          53214 non-null  int64
 5   number_outpatient        53214 non-null  int64
 6   number_emergency         53214 non-null  int64
 7   number_inpatient         53214 non-null  int64
 8   number_diagnoses         53214 non-null  int64
 9   max_glu_serum            53214 non-null  int64
 10  A1Cresult                53214 non-null  int64
 11  metformin                53214 non-null  int64
 12  repaglinide              53214 non-null  int64
 13  nateglinide              53214 non-null  int64
 14  chlorpropamide           53214 non-null  int64
 15  gl

In [41]:
newdf_corr = newdf.corr()

In [42]:
px.imshow(newdf_corr,  width=800, height=800)

# Save imbalanced data

In [43]:
diabetic_df.to_csv("../data/processed/data_imbalanced.csv",index=False)

## Outliers

## Feature Engineering 
### Variable Transformation
**Used when it is necessary to:**
- Change the scale of the variables (normalization, scaling, etc.); this does not change the shape of the data distribution
- Turn non-linear relationships into linear ones (e.g., change a logarithmic scale into a linear one)
- Use a symmetric distribution instead of a skewed (asymmetric) one: for a right-skewed distribution, take the square root, cube root or logarithm of the variable; for a left-skewed distribution, take the square, cube or exponential of the variable
- Convert continuous variables into categorical ones
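The rule for right-skewed variables can be checked numerically. This is a minimal sketch on illustrative counts (not taken from the dataset): each transform compresses the long right tail, and the sample skewness from pandas' `Series.skew()` drops accordingly. `np.log1p` (log(1 + x)) is used instead of a plain logarithm because count columns such as `number_outpatient` contain zeros.

```python
import numpy as np
import pandas as pd

# Illustrative right-skewed counts (not taken from the dataset)
counts = pd.Series([1, 1, 1, 2, 2, 3, 4, 6, 10, 100])

transforms = {
    'original': counts,
    'sqrt': np.sqrt(counts),
    'cube root': np.cbrt(counts),
    'log1p': np.log1p(counts),   # log(1 + x) is defined for zero counts
}
skews = {name: s.skew() for name, s in transforms.items()}
for name, value in skews.items():
    print(f'{name:>10}: skew = {value:.2f}')
```

The same kind of call, e.g. `np.log1p(diabetic_df['number_outpatient'])`, could be applied to the skewed count columns flagged in the profiling report.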

The transformation types are:

#### Normalization

#### Scaling

#### Logarithmic

#### Square root / cube root

#### Binning: numeric to categorical

### Variable Creation
These can be:
#### Variables derived from other variables

# Help and References

- Correction to: Hospital Readmission of Patients with Diabetes - https://link.springer.com/article/10.1007/s11892-018-0989-1

- Centers for Disease Control and Prevention, Diabetes Atlas - https://gis.cdc.gov/grasp/diabetes/DiabetesAtlas.html

- https://medium.com/@joserzapata/paso-a-paso-en-un-proyecto-machine-learning-bcdd0939d387
- [a-complete-machine-learning-walk-through-in-python-part-one](https://towardsdatascience.com/a-complete-machine-learning-walk-through-in-python-part-one-c62152f39420)

- https://www.kaggle.com/vignesh1609/readmission-classification-model

- https://www.kaggle.com/kavyarall/predicting-effective-treatments/
