# Predictive model. User application 👫🏻

*Building a predictive model for the Attrition variable. The data are reduced according to the relative importance of each feature, so that the user has fewer fields to select and fill in. The model is then exported for use by the end user.*

## Index 📎

1. Importing libraries
2. Importing the data
3. Data modelling
4. User model
5. Exporting the model and the data

## 1. Importing libraries 📚

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score,f1_score,precision_score,recall_score
from sklearn.metrics import mean_squared_error
from sklearn.tree import plot_tree

## 2. Importing the data 📉

In [3]:
data = pd.read_csv("Data/Attrition_modeldata.csv")
data.drop("Unnamed: 0",axis=1, inplace=True)

In [4]:
data.head()

| | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EnvironmentSatisfaction | Gender | ... | PerformanceRating | RelationshipSatisfaction | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 2 | Female | ... | 3 | 1 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
| 1 | 49 | No | Travel_Frequently | 279 | Research & Development | 8 | 1 | Life Sciences | 3 | Male | ... | 4 | 4 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
| 2 | 37 | Yes | Travel_Rarely | 1373 | Research & Development | 2 | 2 | Other | 4 | Male | ... | 3 | 2 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
| 3 | 33 | No | Travel_Frequently | 1392 | Research & Development | 3 | 4 | Life Sciences | 4 | Female | ... | 3 | 3 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
| 4 | 27 | No | Travel_Rarely | 591 | Research & Development | 2 | 1 | Medical | 1 | Male | ... | 3 | 4 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |


In [5]:
data.columns

Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department',
       'DistanceFromHome', 'Education', 'EducationField',
       'EnvironmentSatisfaction', 'Gender', 'HourlyRate', 'JobInvolvement',
       'JobLevel', 'JobRole', 'JobSatisfaction', 'MaritalStatus',
       'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'OverTime',
       'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction',
       'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear',
       'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole',
       'YearsSinceLastPromotion', 'YearsWithCurrManager'],
      dtype='object')

## 3. Data modelling 🤗

*We start from the Random Forest model, since in the previous notebook ("2. Modelo predictivo.Attrition") we concluded that it gave us the best reliability and recall.*

Taking into account how much each variable contributes to the prediction, we remove the columns with the least influence. This reduces the number of fields the end user or client has to fill in on the web page to predict the likelihood or rate of attrition in their workforce.
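The importance ranking itself comes from the previous notebook and is not reproduced here. As a minimal sketch of how such a ranking can be obtained, the example below fits a Random Forest on synthetic data and reads its `feature_importances_` attribute. The frame `toy`, its column names and the choice to flag the two lowest-ranked columns are assumptions made for illustration, not the values used in this project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (hypothetical): only "signal" drives the target.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "signal": rng.normal(size=500),
    "noise_a": rng.normal(size=500),
    "noise_b": rng.normal(size=500),
})
target = (toy["signal"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(toy, target)

# One importance value per column; the values sum to 1 and a higher value
# means the column contributed more to the forest's splits.
importances = pd.Series(model.feature_importances_, index=toy.columns)
ranking = importances.sort_values(ascending=False)
print(ranking)

# Illustrative choice: treat the two lowest-ranked columns as drop candidates.
least_useful = ranking.tail(2).index.tolist()
print(least_useful)
```

On the real data the same ranking would be computed on the fully encoded feature matrix, and the cut-off is a judgment call between model quality and form length.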

In [6]:
data2 = data.copy()  # independent copy, so that `data` is not modified by the steps below

In [7]:
# columns with the least influence on the prediction
cols_to_drop = ["EducationField", "DailyRate", "HourlyRate", "MonthlyRate",
                "YearsSinceLastPromotion", "YearsWithCurrManager", "TotalWorkingYears",
                "JobInvolvement", "PerformanceRating", "JobLevel",
                "TrainingTimesLastYear", "StockOptionLevel"]
data2.drop(cols_to_drop, axis=1, inplace=True)

In [8]:
data2.columns

Index(['Age', 'Attrition', 'BusinessTravel', 'Department', 'DistanceFromHome',
       'Education', 'EnvironmentSatisfaction', 'Gender', 'JobRole',
       'JobSatisfaction', 'MaritalStatus', 'MonthlyIncome',
       'NumCompaniesWorked', 'OverTime', 'PercentSalaryHike',
       'RelationshipSatisfaction', 'WorkLifeBalance', 'YearsAtCompany',
       'YearsInCurrentRole'],
      dtype='object')

Before building the predictive model, we convert the categorical columns that have only two unique values to numeric codes "manually"; the remaining categorical columns are encoded with get_dummies.

*For more detail on this process, see the Jupyter notebook "2. Modelo predictivo.Attrition".*
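The two encoding styles can be compared on a small example. This is a sketch on a made-up three-row frame (`toy` and its values are hypothetical), and it uses `Series.map` for the binary column, which is one of several equivalent ways to apply a dictionary.

```python
import pandas as pd

# Hypothetical three-row frame with one binary and one multi-valued category.
toy = pd.DataFrame({
    "OverTime": ["Yes", "No", "Yes"],
    "MaritalStatus": ["Single", "Married", "Divorced"],
    "Age": [41, 49, 37],
})

# Binary column: an explicit mapping to 0/1 keeps it as a single column.
toy["OverTime"] = toy["OverTime"].map({"Yes": 1, "No": 0})

# Multi-valued column: get_dummies adds one indicator column per category
# and leaves the numeric columns untouched.
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())
```

The manual mapping avoids creating two redundant indicator columns (`OverTime_Yes`, `OverTime_No`) for a variable that a single 0/1 column already describes.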

- Attrition

In [9]:
valores= {"Yes":1, "No":0} # define the new values
data2["Attrition"] = data2["Attrition"].apply(lambda x: valores[x]) # replace the original labels with the new values

- Gender

In [10]:
valores2= {"Female":1, "Male":0} # define the new values
data2["Gender"] = data2["Gender"].apply(lambda x: valores2[x]) # replace the original labels with the new values

- OverTime

In [11]:
valores3= {"Yes":1, "No":0} # define the new values
data2["OverTime"] = data2["OverTime"].apply(lambda x: valores3[x]) # replace the original labels with the new values

In [12]:
data2.head()

| | Age | Attrition | BusinessTravel | Department | DistanceFromHome | Education | EnvironmentSatisfaction | Gender | JobRole | JobSatisfaction | MaritalStatus | MonthlyIncome | NumCompaniesWorked | OverTime | PercentSalaryHike | RelationshipSatisfaction | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | 1 | Travel_Rarely | Sales | 1 | 2 | 2 | 1 | Sales Executive | 4 | Single | 5993 | 8 | 1 | 11 | 1 | 1 | 6 | 4 |
| 1 | 49 | 0 | Travel_Frequently | Research & Development | 8 | 1 | 3 | 0 | Research Scientist | 2 | Married | 5130 | 1 | 0 | 23 | 4 | 3 | 10 | 7 |
| 2 | 37 | 1 | Travel_Rarely | Research & Development | 2 | 2 | 4 | 0 | Laboratory Technician | 3 | Single | 2090 | 6 | 1 | 15 | 2 | 3 | 0 | 0 |
| 3 | 33 | 0 | Travel_Frequently | Research & Development | 3 | 4 | 4 | 1 | Research Scientist | 3 | Married | 2909 | 1 | 1 | 11 | 3 | 3 | 8 | 7 |
| 4 | 27 | 0 | Travel_Rarely | Research & Development | 2 | 1 | 1 | 0 | Laboratory Technician | 2 | Married | 3468 | 9 | 0 | 12 | 4 | 3 | 2 | 2 |


- Remaining columns

In [13]:
data2 = pd.get_dummies(data2)
data2.shape

(1470, 33)

In [14]:
# new columns generated by get_dummies
data2.columns

Index(['Age', 'Attrition', 'DistanceFromHome', 'Education',
       'EnvironmentSatisfaction', 'Gender', 'JobSatisfaction', 'MonthlyIncome',
       'NumCompaniesWorked', 'OverTime', 'PercentSalaryHike',
       'RelationshipSatisfaction', 'WorkLifeBalance', 'YearsAtCompany',
       'YearsInCurrentRole', 'BusinessTravel_Non-Travel',
       'BusinessTravel_Travel_Frequently', 'BusinessTravel_Travel_Rarely',
       'Department_Human Resources', 'Department_Research & Development',
       'Department_Sales', 'JobRole_Healthcare Representative',
       'JobRole_Human Resources', 'JobRole_Laboratory Technician',
       'JobRole_Manager', 'JobRole_Manufacturing Director',
       'JobRole_Research Director', 'JobRole_Research Scientist',
       'JobRole_Sales Executive', 'JobRole_Sales Representative',
       'MaritalStatus_Divorced', 'MaritalStatus_Married',
       'MaritalStatus_Single'],
      dtype='object')

## 4. User model 💁🏻

In [15]:
# define our X and y variables again
X= data2.drop("Attrition", axis=1)
y= data2["Attrition"]

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)

In [17]:
rf= RandomForestClassifier() # create the model
rf.fit(X_train,y_train) # train the model

RandomForestClassifier()

In [18]:
# predict
y_predict = rf.predict(X_test) # test set
y_predict_train = rf.predict(X_train) # training set

In [19]:
# error metrics; scikit-learn expects the arguments as (y_true, y_pred)
randomforest = {
        "Accuracy Test": round(accuracy_score(y_test, y_predict), 3),
        "MSE Test": round(mean_squared_error(y_test, y_predict), 3),
        "Precision score": round(precision_score(y_test, y_predict), 3),
        "Recall score": round(recall_score(y_test, y_predict), 3),
        "F1 score": round(f1_score(y_test, y_predict), 3)}
pd.Series(randomforest)

Accuracy Test      0.867
MSE Test           0.133
Precision score    0.500
Recall score       0.051
F1 score           0.093
dtype: float64
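
scikit-learn's metric functions take the true labels first and the predictions second. For binary labels, accuracy, MSE and F1 give the same value in either order, but precision and recall trade places. A minimal sketch on made-up labels (the two lists below are illustrative, not taken from this dataset):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: 4 real positives; the model flags 2 rows, 1 of them correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Documented order: (y_true, y_pred).
precision = precision_score(y_true, y_pred)  # 1 correct out of 2 flagged
recall = recall_score(y_true, y_pred)        # 1 found out of 4 real positives

# Swapped order: each score now reports the other quantity.
swapped_precision = precision_score(y_pred, y_true)
swapped_recall = recall_score(y_pred, y_true)

print(precision, recall)
print(swapped_precision, swapped_recall)

# F1 is symmetric in precision and recall, so the order does not change it.
print(f1_score(y_true, y_pred), f1_score(y_pred, y_true))
```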

## 5. Exporting the model and the data 🚀

In [27]:
import pickle
# the context manager closes the file once the model has been written
with open("Streamlit/Modelo_usuario.pkl", "wb") as f:
    pickle.dump(rf, f)

In [25]:
# export the new data
data2.to_csv("Data/Modelo_usuario.csv")
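
The consuming application is not part of this notebook. As a rough sketch of how an exported model could be used for a single user submission, the example below trains a toy forest, round-trips it through a pickle file, and aligns one encoded input row to the training columns with `reindex`. All data values, the temporary file name `toy_model.pkl` and the variable names are hypothetical. The alignment step matters because `get_dummies` applied to a single row only creates indicator columns for the categories present in that row.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy training frame (hypothetical values), encoded like the notebook's data.
train = pd.get_dummies(pd.DataFrame({
    "Age": [41, 49, 37, 33, 27, 52],
    "OverTime": [1, 0, 1, 1, 0, 0],
    "MaritalStatus": ["Single", "Married", "Single", "Married", "Married", "Divorced"],
}))
labels = [1, 0, 1, 0, 0, 0]
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(train, labels)

# Round trip through a pickle file, as a consuming app would do.
path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

# One user-submitted row: encode it, then align it to the training columns so
# that categories absent from this row become all-zero indicator columns.
user_row = pd.DataFrame({"Age": [30], "OverTime": [1], "MaritalStatus": ["Single"]})
user_X = pd.get_dummies(user_row).reindex(columns=train.columns, fill_value=0)

# Estimated probability of class 1 (attrition) for this row.
attrition_probability = loaded.predict_proba(user_X)[0, 1]
print(attrition_probability)
```

Pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.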