# Installing and Importing Packages and Loading the Data

In [None]:
# Install the pydotplus package, which provides a Python interface to Graphviz's DOT language.
# We will use this package later on to plot the feature importance.
!pip install pydotplus

In [None]:
##Importing the packages
#Data processing packages
import numpy as np 
import pandas as pd 

#Visualization packages
import matplotlib.pyplot as plt
import seaborn as sns 

#Machine Learning packages
from sklearn.svm import SVC,NuSVC
#from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB,MultinomialNB
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis, LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix

import os
print(os.listdir("../input"))

In [None]:
# Importing the Dataset
dataset = pd.read_csv("../input/WA_Fn-UseC_-HR-Employee-Attrition.csv")
dataset.head()

In [None]:
# Shuffle the Dataset.
dataset = dataset.sample(frac=1,random_state=4)

In [None]:
# Check if our Dataset has any nan values
dataset.isnull().values.any()

In [None]:
sns.countplot(x='Age', hue='Attrition', data=dataset).set_title("Attrition by age")

In [None]:
sns.countplot(x='Department', hue='Attrition', data=dataset).set_title("Attrition distribution across departments")

In [None]:
sns.countplot(x='DistanceFromHome', hue='Attrition', data=dataset).set_title("Attrition distribution by distance from home to company")

In [None]:
sns.countplot(x='YearsAtCompany', hue='Attrition', data=dataset).set_title("Attrition distribution by years at the company")

In [None]:
# Plot a bar chart of the Attrition class counts in the company
sns.countplot(x = 'Attrition',data= dataset,palette = "Set2").set_title('Attrition')

***************************

# Cleaning the Data and Data visualization

First, we drop the unnecessary and uninformative features: **EmployeeCount**, **EmployeeNumber**, **Over18**, and **StandardHours**.

In [None]:
ToDrop = ["EmployeeCount", "EmployeeNumber", "Over18", "StandardHours"]
dataset = dataset.drop(ToDrop, axis=1)

#### WARNING

To Check: **NumCompaniesWorked** 

Now we divide the dataset into three parts so that we can visualize the correlations among the continuous features:

**DisData:** the discrete features

**ContData:** the continuous features

**Y:** the target values

In [None]:
Y = dataset.loc[:, "Attrition"]

In [None]:
X = dataset.loc[:, dataset.columns !="Attrition"]

Continuous = ["Age", "DailyRate", "DistanceFromHome", "HourlyRate", "MonthlyIncome", "MonthlyRate", 
              "PercentSalaryHike","TotalWorkingYears", "TrainingTimesLastYear", "YearsAtCompany", "YearsInCurrentRole", 
              "YearsSinceLastPromotion", "YearsWithCurrManager"]

ContData = X[Continuous].copy()
ContData.head()

In [None]:
Discreet = ['BusinessTravel', 'Department', 'Education', 'EducationField', 'EnvironmentSatisfaction',
            'Gender', 'JobInvolvement', 'JobLevel', 'JobRole','JobSatisfaction', 'MaritalStatus',
            'NumCompaniesWorked', 'OverTime', 'PerformanceRating', 'RelationshipSatisfaction', 
            'StockOptionLevel', 'WorkLifeBalance']

DisData = X[Discreet].copy()
DisData.head()

In [None]:
plt.figure(figsize =(15,8))
sns.heatmap(ContData.corr(),annot=True,cmap='viridis')
plt.show()

In [None]:
Test = DisData["Department"]
Test1 = pd.get_dummies(Test, drop_first= True)
Test1

************************

# Preparing our datasets

**First, we convert the categorical values to numerical ones with `pd.get_dummies` and separate the features from the target value.**

In [None]:
X, Y = dataset.loc[:, dataset.columns !="Attrition"], dataset.loc[:, "Attrition"]
X = pd.get_dummies(X, drop_first= True) # X has 44 columns
Y = pd.get_dummies(Y, drop_first= True)

In [None]:
X.head()

In [None]:
# Flatten the Y column into a 1-D array
Y = np.ravel(Y)
Y.shape

# Feature Selection
**There are many methods for selecting features, such as:**

## Univariate feature selection
Univariate feature selection works by selecting the best features based on univariate statistical tests.

### SelectKBest (k_best), SelectPercentile (percentile), SelectFpr (fpr), SelectFdr (fdr), SelectFwe (fwe) ...

* **SelectKBest:** Removes all but the k highest-scoring features.

* **SelectPercentile:** Keeps a given percentage of the features (the percentile is the percentage of features to keep).

* **SelectFpr (false positive rate):** Selects the features whose p-values are below alpha, based on an FPR test. The FPR test controls the total amount of false detections.

* **SelectFdr (false discovery rate):** Selects the p-values for an estimated false discovery rate.

* **SelectFwe (family-wise error):** Selects the p-values corresponding to the family-wise error rate.

==> These objects take as input a **scoring function** that returns univariate scores and p-values (or only scores for SelectKBest and SelectPercentile).

Examples of scoring functions are **chi2, f_classif, mutual_info_classif**.

==> The scoring function takes two arrays, X and y, and returns either a pair of arrays **(scores, pvalues)** or a single array of scores. The default is f_classif, which only works with classification tasks. After fitting, the selector exposes these **attributes**:
    
   * **scores_ : array-like, shape=(n_features,)** ==> Scores of features.

   * **pvalues_ : array-like, shape=(n_features,)**  ==> p-values of feature scores, None if score_func returned scores only.
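As a minimal sketch of how a selector and its scoring function fit together, the example below runs `SelectKBest` with `f_classif` on synthetic data. The dataset from `make_classification` and the names `X_demo`, `y_demo` and `selector` are assumptions made for illustration only; they are not part of this notebook's data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (illustration only): 8 features, 3 of them informative.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0,
    random_state=0,
)

# Fit the selector: it calls f_classif(X, y) and stores the results.
selector = SelectKBest(score_func=f_classif, k=3).fit(X_demo, y_demo)

print(selector.scores_.shape)               # one score per feature
print(selector.pvalues_.shape)              # one p-value per feature
print(selector.get_support(indices=True))   # indices of the kept features

# Keep only the k best columns.
X_selected = selector.transform(X_demo)
print(X_selected.shape)
```

`GenericUnivariateSelect(f_classif, mode='k_best', param=3)` should select the same columns; it is the configurable wrapper used in the functions of this notebook.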



#### Chi2: based on the chi-squared statistical test
The null hypothesis of the chi2 test is that "**two categorical variables are independent**". A **higher value of the chi2** statistic is therefore stronger evidence that "**the two categorical variables are dependent**", which makes the feature more useful for classification.

In other words, the chi2 test measures the dependence between each feature and the target value, and SelectKBest then keeps the k features with the highest chi2 values.
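To make this concrete, here is a small sketch on made-up binary data: one feature copies the target and the other is drawn independently of it. The arrays `dependent`, `independent`, `X_demo` and `y_demo` are hypothetical. Note that scikit-learn's `chi2` expects non-negative feature values.

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)

# Hypothetical binary target.
y_demo = rng.integers(0, 2, size=500)

# One feature that copies the target, one drawn independently of it.
dependent = y_demo.copy()
independent = rng.integers(0, 2, size=500)
X_demo = np.column_stack([dependent, independent])

# chi2 returns one score and one p-value per feature.
scores, pvalues = chi2(X_demo, y_demo)
print("chi2 scores:", scores)
print("p-values   :", pvalues)
```

The dependent feature should receive a much larger chi2 score (and a much smaller p-value) than the independent one, which is why a selector based on chi2 would keep it first.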

In [None]:
#importing the necessary libraries
from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import chi2

# printing the indexes of the selected features
#V = GenericUnivariateSelect(chi2, 'k_best', param=40).fit(X,Y).get_support(indices=True)
#X.iloc[:,V].columns

In [None]:
# Build the function that selects the best k features based on chi2 statistical test
def selectbest_chi2(X,Y,k):
    #Selecting the best k features
    X_new = GenericUnivariateSelect(chi2, 'k_best', param=k).fit_transform(X, Y)
    ## X_new = GenericUnivariateSelect(chi2, 'percentile', param=k).fit_transform(X, Y)
    return X_new

#### f_classif: It computes the ANOVA (Analysis of Variance) F-value for the provided sample.

The F-test is named after its test statistic, F. The F-statistic is simply a ratio of two variances. Variance is a measure of dispersion, i.e. how far the data are scattered from the mean; larger values represent greater dispersion.

The F-statistic is this ratio:

**F = variation between sample means / variation within the samples**

In other words, f_classif ranks the features by their F-value: it prioritizes the features whose class means differ widely relative to the spread within each class.
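The ratio can be checked by hand on a toy feature. The sketch below uses six hypothetical values split into two classes, computes the between-group and within-group variation directly, and compares the result with `f_classif`. The names `values`, `labels`, `f_manual` and `f_sklearn` are illustrative.

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Hypothetical toy feature measured in two classes.
values = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
labels = np.array([0, 0, 0, 1, 1, 1])

groups = [values[labels == c] for c in np.unique(labels)]
grand_mean = values.mean()
k, n = len(groups), len(values)

# Variation between sample means: class means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation within the samples: points around their own class mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Each sum of squares is divided by its degrees of freedom.
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

# f_classif expects a 2-D feature matrix, hence the reshape.
f_sklearn, _ = f_classif(values.reshape(-1, 1), labels)
print(f_manual, f_sklearn[0])
```

Here the class means (2 and 7) are far apart compared with the spread inside each class, so the F-value is large and the feature would rank highly.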

In [None]:
#importing the necessary libraries
from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import f_classif

In [None]:
# Build the function that selects the best k features based on F statistical test.
def selectbest_fclassif(X,Y,k):
    #Selecting the best k features
    X_new = GenericUnivariateSelect(f_classif, 'k_best', param=k).fit_transform(X, Y)
    return X_new

#### mutual_info_classif: based on mutual information
Mutual information (MI) between two random variables is a non-negative value that measures the dependency between the variables. It is equal to zero if and only if the two random variables are independent, and higher values mean higher dependency.

So, mutual_info_classif estimates the MI between each feature and the target, and the selector keeps the features with the highest MI, i.e. those that depend most strongly on the target.
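A short sketch on synthetic data shows what the score measures: `mutual_info_classif` estimates the mutual information between each feature and the target. The feature that tracks the target and the pure-noise feature below are both hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

# Hypothetical binary target.
y_demo = rng.integers(0, 2, size=1000)

# A feature that tracks the target (plus some noise) and a pure-noise feature.
informative = y_demo + rng.normal(scale=0.3, size=1000)
noise = rng.normal(size=1000)
X_demo = np.column_stack([informative, noise])

# One estimated MI value per feature; random_state fixes the estimator's jitter.
mi = mutual_info_classif(X_demo, y_demo, random_state=0)
print("Estimated MI per feature:", mi)
```

The informative feature should get a clearly higher estimate than the noise feature, whose estimate should be close to zero.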

In [None]:
#importing the necessary libraries
from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import mutual_info_classif

In [None]:
# Build the function that selects the best k features based on the mutual information between each feature and the target.
def selectbest_mutualinfoclassif (X,Y,k):
    #Selecting the best k features
    X_new = GenericUnivariateSelect(mutual_info_classif, 'k_best', param=k).fit_transform(X, Y)
    return X_new

********************************

## Recursive feature elimination
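Recursive feature elimination (RFE) is a feature selection method that fits a model, removes the weakest feature (or features), and repeats until the specified number of features is reached. The function below uses RFECV, which also picks the number of features by cross-validation.

**Note:** This function doesn't work with all models: the estimator must expose feature weights through a `coef_` or `feature_importances_` attribute.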
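Before the cross-validated version, here is a minimal sketch of plain RFE with a fixed target of 4 features. The synthetic dataset and the choice of `LogisticRegression` as the estimator are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustration only): 10 features, 4 informative.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=0,
    random_state=0,
)

# Drop one feature per iteration (step=1) until 4 remain.
rfe = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=4,
    step=1,
)
rfe.fit(X_demo, y_demo)

print(rfe.support_)   # boolean mask of the kept features
print(rfe.ranking_)   # rank 1 = kept; larger ranks were eliminated earlier
```

`LogisticRegression` works here because it exposes `coef_`; an estimator with neither `coef_` nor `feature_importances_` gives RFE no weights to rank the features by.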

In [None]:
def recursive_elimination(model,X,Y):
    from sklearn.model_selection import StratifiedKFold
    from sklearn.feature_selection import RFECV

    # The "accuracy" scoring is proportional to the number of correct
    # classifications
    rfecv = RFECV(estimator=model, step=1, cv=StratifiedKFold(2),scoring='accuracy')
    rfecv.fit(X, Y)

    print("Optimal number of features : %d" % rfecv.n_features_)

    # Mean cross-validation score for each number of features.
    # Recent scikit-learn versions expose it in cv_results_; older ones used grid_scores_.
    if hasattr(rfecv, "cv_results_"):
        scores = rfecv.cv_results_["mean_test_score"]
    else:
        scores = rfecv.grid_scores_

    # Plot number of features VS. cross-validation scores
    plt.figure()
    plt.xlabel("Number of features selected")
    plt.ylabel("Cross validation score (accuracy)")
    plt.plot(range(1, len(scores) + 1), scores)
    plt.show()

**********************

# Feature Ranking via the models
With this function we can see the feature importances for tree-based models.

With the **Plotly** package we can plot the feature importances.

In [None]:
# Import statements required for Plotly 
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls

We build a function that shows which features in our dataset the model gave the most importance to.

In [None]:
# Ranking function for tree-based models only (models that expose feature_importances_)
# Note: it labels the points with X.columns, so the model must have been trained on
# exactly the columns of X, in the same order.
def Ranking(model,title):
    trace = go.Scatter(
        y = model.feature_importances_,
        x = X.columns.values,
        mode='markers',
        marker=dict(
            sizemode = 'diameter',
            sizeref = 1,
            size = 13,
            color = model.feature_importances_,
            colorscale='Portland',
            showscale=True
        ),
        text = X.columns.values
    )
    data = [trace]

    layout= go.Layout(
        title= title,
        autosize= True,
        hovermode= 'closest',
        xaxis= dict(
            ticklen= 5,
            showgrid=False,
            zeroline=False,
            showline=False
        ),
        yaxis=dict(
            title= 'Feature Importance',
            showgrid=False,
            zeroline=False,
            ticklen= 5,
            gridwidth= 2
        ),
        showlegend= False
    )
    fig = go.Figure(data=data, layout=layout)
    py.iplot(fig,filename='scatter2010')

*********************

# Let's make some predictions!! Then compare the different models

The function below standardizes the features so that each one has zero mean and unit variance.

In [None]:
def scaling(X_new):
    #feature scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_new = sc.fit_transform(X_new)
    return X_new
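The `scaling` function above standardizes whatever array it is given; in this notebook it is applied to the full dataset before the split. A common alternative, sketched here on synthetic data as an assumption rather than as this notebook's method, fits the scaler on the training part only and reuses its statistics on the test part, so that no information from the test set enters the preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 3 features on an arbitrary scale.
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Learn the mean and standard deviation from the training data only...
scaler = StandardScaler().fit(X_tr)

# ...then apply the same transformation to both parts.
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0))  # approximately 0 for every feature
print(X_tr_scaled.std(axis=0))   # approximately 1 for every feature
```

The test set is then standardized with the training statistics, so its means are close to, but not exactly, zero.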

After scaling the data, we use the function below to split our dataset into training and testing sets.

In [None]:
def splitting(X_new,Y): 
    # Splitting the dataset into the Training set and Test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X_new, Y, test_size = 0.25, random_state = 0)
    return X_train, X_test, y_train, y_test

Below we build a function that takes as input the training set, the test set and the model used. It trains the model on the training set, makes predictions on the test set, and returns the results together with some values that tell us about the performance of our model.

These values are:
* **Accuracy** = (number of points classified correctly) / (total number of points in the test set)
* **ROC AUC score:** the area under the Receiver Operating Characteristic curve (ROC AUC), computed from prediction scores.
* **Precision** = TruePositive / (TruePositive + FalsePositive)
* **Recall** = TruePositive / (TruePositive + FalseNegative)
* **F1_score** = 2 * (Precision * Recall) / (Precision + Recall)
* **Mean squared error:** the average squared difference between the estimated values and the true values. It is always non-negative, and values close to zero are better.
* **Time of execution**
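The metrics listed above are all available in `sklearn.metrics`. The sketch below evaluates them on a small hand-made pair of label vectors; `y_true` and `y_hat` are hypothetical and contain 3 true positives, 3 true negatives, 1 false positive and 1 false negative.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical labels: 1 = attrition, 0 = no attrition.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_hat)    # (TP + TN) / total
precision = precision_score(y_true, y_hat)  # TP / (TP + FP)
recall = recall_score(y_true, y_hat)        # TP / (TP + FN)
f1 = f1_score(y_true, y_hat)                # harmonic mean of the two
auc = roc_auc_score(y_true, y_hat)          # here computed from hard labels
mse = mean_squared_error(y_true, y_hat)     # share of wrong labels for 0/1 data

print(accuracy, precision, recall, f1, auc, mse)
```

Passing hard 0/1 predictions to `roc_auc_score` is a simplification; the score is more informative when it is computed from predicted probabilities or decision scores.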

In [None]:
#Function to Train and Test Machine Learning Model
# Note: y_test is not a parameter; the function reads the global y_test
# produced by the latest call to splitting().
def train_test_ml_model(X_train,y_train,X_test,model,Model):
    import time
    from sklearn.metrics import confusion_matrix

    start = time.time()
    model.fit(X_train,y_train) #Train the Model
    y_pred = model.predict(X_test) #Use the Model for prediction
    end = time.time()

    # Test the Model: cm = [[TN, FP], [FN, TP]] (rows: true labels, columns: predictions)
    cm = confusion_matrix(y_test,y_pred)
    TN, FP, FN, TP = cm.ravel()

    # Test the model accuracy (in percent)
    accuracy = round(100*np.trace(cm)/np.sum(cm),1)

    # Test the RocAUC of the model
    #from sklearn.metrics import roc_auc_score
    #roc_value = roc_auc_score(y_test, y_pred)

    #Precision,Recall,F1_score for the positive class (Attrition = Yes)
    Precision = TP / (TP + FP)
    Recall = TP / (TP + FN)
    F1 = 2*(Precision*Recall)/(Precision+Recall)

    #Mean square error
    #from sklearn.metrics import mean_squared_error
    #MSE = mean_squared_error(y_test, y_pred)

    #Plot/Display the results
    cm_plot(cm,Model)

    print('The Wrong predicted points are:')
    Wrong_Prediction(X_test,y_test,y_pred)

    Results = pd.DataFrame(columns=['accuracy','Precision = TP/(TP+FP)','Recall = TP/(TP+FN)',
                                    'F1_score','Time of execution'], index=[Model])
    results = {'accuracy':accuracy,'Precision = TP/(TP+FP)':Precision,
         'Recall = TP/(TP+FN)':Recall,'F1_score':F1,'Time of execution':end - start}
    Results.loc[Model] = pd.Series(results)
    return (Results)

We make a function to plot the confusion matrix. It's a square matrix that contains 4 blocks:
* **True Positive (TP):** real and predicted values are both YES_Attrition
* **True Negative (TN):** real and predicted values are both NO_Attrition
* **False Negative (FN):** real values are YES_Attrition and predicted ones are NO_Attrition
* **False Positive (FP):** real values are NO_Attrition and predicted ones are YES_Attrition
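In scikit-learn, `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, so for 0/1 labels the layout is `[[TN, FP], [FN, TP]]`. A small sketch with hypothetical labels (1 = YES_Attrition, 0 = NO_Attrition):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 3 real positives, 7 real negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_hat = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_hat)
print(cm)

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print("TN =", tn, "FP =", fp, "FN =", fn, "TP =", tp)
```

Unpacking the four counts by name avoids indexing mistakes such as reading `cm[0][0]` as TP when it is actually TN.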

In [None]:
#Function to plot Confusion Matrix
def cm_plot(cm,Model):
    plt.clf()
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.viridis)
    classNames = ['Negative','Positive']
    plt.title('Comparison of Prediction Result for '+ Model)
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    tick_marks = np.arange(len(classNames))
    plt.xticks(tick_marks, classNames, rotation=45)
    plt.yticks(tick_marks, classNames)
    s = [['TN','FP'], ['FN', 'TP']]
    for i in range(2):
        for j in range(2):
            plt.text(j,i, str(s[i][j])+" = "+str(cm[i][j]))
    plt.show()

The function below gives us the indices of the wrongly predicted values.

In [None]:
def Wrong_Prediction (X_test,y_test,y_pred):
    WIndex = []
    for i in range(len(y_test)):
        if y_test[i] != y_pred[i] :
            WIndex.append(i)
    print (WIndex)
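The same indices can also be obtained without an explicit loop. This is a minimal sketch, where `wrong_prediction_indices` is a hypothetical helper name and the inputs are assumed to be 1-D label sequences of equal length:

```python
import numpy as np

def wrong_prediction_indices(y_true, y_pred):
    """Return the positions where the prediction differs from the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Element-wise comparison gives a boolean mask; flatnonzero lists its True positions.
    return np.flatnonzero(y_true != y_pred)

print(wrong_prediction_indices([1, 0, 1, 0], [1, 1, 1, 1]))
```

Returning the array instead of printing it makes it possible to reuse the indices, for example to look up the corresponding rows of the test set.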

**Now, let's test some models**

First of all, we have to split our data into train data and test data

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)

## Logistic Regression

In this model we will use the different scoring functions of univariate feature selection and compare their performance; in the next models we will only use the scoring function that outperforms the others here.

In [None]:
from sklearn.linear_model import LogisticRegression  #Import package related to Model
Model = "LogisticRegression"
model= LogisticRegression(C=0.35000000000000003, solver="newton-cg", max_iter=200) #Create the Model

#### Selectbest_chi2

In [None]:
X_new = selectbest_chi2(X,Y,40)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)
Results = pd.DataFrame()
Results = train_test_ml_model(X_train,y_train,X_test,model,Model)

After getting our model results, and to improve its performance, we now use the recursive feature elimination method, which gives us the optimal number of features.

In [None]:
recursive_elimination(model,X_train,y_train)

Now, we train our model with the new number of features

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)

In [None]:
X_new = selectbest_chi2(X,Y,37)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)
results_logistic = pd.DataFrame()
results_logistic = train_test_ml_model(X_train,y_train,X_test,model,Model)

Results = pd.concat([Results, results_logistic])  # stack both result rows
Results

==> as we suspected, the performance of our model has improved

We will do the same steps in the other models

## Random Forest
A Random Forest is a bagging meta-estimator: it trains many decision trees on bootstrap samples of the data and aggregates their predictions.

In [None]:
from sklearn.ensemble import RandomForestClassifier  #Import package related to Model
Model = "RandomForestClassifier"
model= RandomForestClassifier(n_estimators=100, bootstrap = True, max_features = 'sqrt') #Create the Model

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,40)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)
results = pd.DataFrame()
results = train_test_ml_model(X_train,y_train,X_test,model,Model)


In [None]:
recursive_elimination(model,X_train,y_train)

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,25)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)
results_forest = pd.DataFrame()
results_forest = train_test_ml_model(X_train,y_train,X_test,model,Model)

results = pd.concat([results, results_forest])  # stack both result rows
results

Shown below is an Interactive Plotly diagram of the various feature importances for the Random Forest.

In [None]:
title= 'Random Forest Feature Importance'
Ranking(model,title)

## Kernel SVM

Linear Kernel

In [None]:
from sklearn import svm  #Import package related to Model
Model = "LinearSVM"
model = svm.SVC(kernel='linear', C=1.0)  # Create the Model

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,40)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)

results = pd.DataFrame()
results = train_test_ml_model(X_train,y_train,X_test,model,Model)

In [None]:
recursive_elimination(model,X_train,y_train)

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,27)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)
results_linearSVM = pd.DataFrame()
results_linearSVM = train_test_ml_model(X_train,y_train,X_test,model,Model)

results = pd.concat([results, results_linearSVM])  # stack both result rows
results

## XGBoost

XGBoost builds decision trees sequentially (one after the other) using gradient boosting: each new tree is fitted to correct the errors made by the trees built so far.
The related AdaBoost technique instead assigns a weight to each point; these weights are initially equal, and if a point is predicted wrongly, its weight is increased.
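XGBoost belongs to the gradient boosting family, where each new tree is fitted to the errors left by the previous trees. The sketch below illustrates that sequential idea for squared loss on synthetic data, using scikit-learn regression trees as a stand-in. It is an assumption-laden illustration, not XGBoost's actual implementation, and all names in it are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression problem (illustration only).
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
# Start from a constant prediction: the mean of the target.
prediction = np.full_like(y_demo, y_demo.mean())
mse_per_round = []

for _ in range(50):
    # For squared loss, the negative gradient is simply the residual.
    residuals = y_demo - prediction
    # Fit a small tree to what the current ensemble still gets wrong.
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X_demo, residuals)
    # Add a shrunken version of the tree's correction to the ensemble.
    prediction += learning_rate * tree.predict(X_demo)
    mse_per_round.append(np.mean((y_demo - prediction) ** 2))

print("Training MSE after round 1 :", mse_per_round[0])
print("Training MSE after round 50:", mse_per_round[-1])
```

The training error shrinks round after round because every tree only has to model the part of the signal that the earlier trees missed.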

In [None]:
from xgboost import XGBClassifier  #Import package related to Model
Model = "XGBClassifier"
model=XGBClassifier() #Create the Model

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,40)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)

results = pd.DataFrame()
results = train_test_ml_model(X_train,y_train,X_test,model,Model)

In [None]:
recursive_elimination(model,X_train,y_train)

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,30)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)
results_XGBC = pd.DataFrame()
results_XGBC = train_test_ml_model(X_train,y_train,X_test,model,Model)

results = pd.concat([results, results_XGBC])  # stack both result rows
results

Shown below is an Interactive Plotly diagram of the various feature importances for the XGBoost.

In [None]:
title= 'XGBoost Feature Importance'
Ranking(model,title)

## Let's make some combinations!!

**First Combination**: LogisticRegression, RandomForestClassifier and SVM

In [None]:
import matplotlib.gridspec as gridspec
import itertools
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions

# Initializing Classifiers
clf1 = LogisticRegression(C=0.35000000000000003, solver="newton-cg", max_iter=200) #Create the First Model
clf2 = RandomForestClassifier(n_estimators=100, bootstrap = True, max_features = 'sqrt') #Create the Second Model
clf3 = svm.SVC(kernel='linear', C=1.0, probability=True)  # Create the Third Model

eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[3, 1.5, 2], voting='soft')


model = eclf
Model = "Combination1"

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,40)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)

results_Combination1 = pd.DataFrame()
results_Combination1 = train_test_ml_model(X_train,y_train,X_test,model,Model)
results_Combination1

**Second Combination**: LogisticRegression, XGBoost and SVM

In [None]:
import matplotlib.gridspec as gridspec
import itertools
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier  #Import package related to Model
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions

# Initializing Classifiers
clf1 = LogisticRegression(C=0.35000000000000003, solver="newton-cg", max_iter=200) #Create the First Model
clf2 = XGBClassifier()  #Create the Second Model
clf3 = svm.SVC(kernel='linear', C=1.0, probability=True)  # Create the Third Model

eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[3, 1.5, 2], voting='soft')


model = eclf
Model = "LR+XGB+SVM"

In [None]:
X_train, X_test, y_train, y_test = splitting(X,Y)
X_new = selectbest_chi2(X,Y,40)
X_new = scaling(X_new)
X_train, X_test, y_train, y_test = splitting(X_new,Y)

results_Combination2 = pd.DataFrame()
results_Combination2 = train_test_ml_model(X_train,y_train,X_test,model,Model)
results_Combination2

**Below is the table that summarizes the results of all the models**

In [None]:
# Stack the result rows of all the models into one table
T = [results_logistic,results_linearSVM,results_forest,results_XGBC,results_Combination1,results_Combination2]
Results = pd.concat(T)

Results

*******************************

### Decision Tree

### Random Forest

### XGBoost

### AdaBoost

# Remarks
### * We can use **ensemble learning**, which means combining different models into one model whose result is the mean of the other models' results, in order to get:
    
    **1**. Better accuracy (lower error)
    
    **2**. Higher consistency (less overfitting)
    
    **3**. Lower bias and variance errors
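The averaging idea above can be sketched with soft voting, where the predicted class probabilities of several models are combined with weights. The example uses scikit-learn's `VotingClassifier` on synthetic data as a stand-in for the mlxtend `EnsembleVoteClassifier` used earlier; the dataset and the variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data (illustration only).
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Soft voting averages the predicted probabilities, weighted per model.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        # probability=True is needed so that SVC can output probabilities.
        ("svm", SVC(kernel="linear", probability=True, random_state=0)),
    ],
    voting="soft",
    weights=[3, 1.5, 2],
)
voter.fit(X_tr, y_tr)

proba = voter.predict_proba(X_te)  # weighted mean of the three models' probabilities
test_accuracy = voter.score(X_te, y_te)
print("Ensemble test accuracy:", test_accuracy)
```

The weights mirror the `[3, 1.5, 2]` used in the combinations of this notebook; they are one possible choice, not tuned values.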

The methods to apply ensemble learning are:
### * Bagging: train each model on a random subset (bootstrap sample) of the train set, then aggregate the models' predictions
### * Boosting: also trains models on subsets of the train set, but sequentially: points that were predicted wrongly are given more emphasis (e.g. copied into the next subset) before the next model is run
### * GetDummies, Label Encoder and One Hot Encoder
   Label Encoder replaces each category with a number (0, 1, 2, 3, ...) ==> this imposes an artificial order on the categories and might give them different, misleading weights. That's a problem.
       
   To correct this problem we use the OneHot encoder, which replaces each category with a set of binary (0/1) columns.
       
   ==> So, we can simply use GetDummies
       
   GetDummies = Label Encoder + OneHot Encoder
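The difference between the encoders can be seen on a tiny column. In this sketch the `departments` series and its values are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column.
departments = pd.Series(["Sales", "HR", "R&D", "Sales"], name="Department")

# LabelEncoder: one integer per category, assigned in sorted order
# (HR -> 0, R&D -> 1, Sales -> 2), which implies an order that does not exist.
label_codes = LabelEncoder().fit_transform(departments)
print(label_codes)

# get_dummies: one binary column per category, with no implied order.
one_hot = pd.get_dummies(departments)
print(one_hot)

# drop_first=True removes the first column, which is redundant because
# it can be deduced from the remaining ones.
one_hot_dropped = pd.get_dummies(departments, drop_first=True)
print(one_hot_dropped)
```

With `drop_first=True`, as used in this notebook, a row of all zeros stands for the dropped category (here `HR`).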
           
### * For other information on **mlxtend** check: http://joss.theoj.org/papers/10.21105/joss.00638

*******************

# Najed's Remarks

* Decision tree: the feature importances do not match the tree well! Find out why.

Try to plot the tree with only the 4 or 5 most important features!

* Add more comments! You have to explain chi2 in more detail!

* Absenteeism: try to classify people by cause! Find the main cause!
Add more comments, especially on the correspondence between the number and the cause of absence!

