### HR Turnover Analytics

This model predicts HR turnover, i.e. whether an employee will leave, from an employee profile such as the following:

```python
# Sample dict for a prediction request
data = {'satisfaction_level': 0.38,
        'last_evaluation': 0.53,
        'number_project': 2,
        'average_montly_hours': 157,
        'time_spend_company': 3,
        'Work_accident': 0,
        'promotion_last_5years': 0,
        'sales': 'support',
        'salary': 'low'}
```
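Because the prediction code looks up every training feature by name, a request that is missing a key or carries a wrongly typed value fails late. The following is a minimal sketch of checking a request up front. It assumes a request must contain exactly the nine keys shown above, with `sales` and `salary` as strings and everything else numeric. The helper `validate_request` and the `REQUIRED_KEYS` / `CATEGORICAL_KEYS` sets are hypothetical and not part of the original app.

```python
# Hypothetical request validation; not part of the original app.
REQUIRED_KEYS = {
    'satisfaction_level', 'last_evaluation', 'number_project',
    'average_montly_hours', 'time_spend_company', 'Work_accident',
    'promotion_last_5years', 'sales', 'salary',
}
# Keys whose values are category labels (strings); all others must be numeric
CATEGORICAL_KEYS = {'sales', 'salary'}


def validate_request(data):
    """Return a list of problems found in a prediction request (empty if valid)."""
    problems = []
    provided = set(data)
    missing = REQUIRED_KEYS - provided
    extra = provided - REQUIRED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    # Check types only for the expected keys that are present, in a stable order
    for key in sorted(REQUIRED_KEYS & provided):
        value = data[key]
        if key in CATEGORICAL_KEYS:
            if not isinstance(value, str):
                problems.append(f"{key} must be a string")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be numeric")
    return problems


sample = {'satisfaction_level': 0.38, 'last_evaluation': 0.53,
          'number_project': 2, 'average_montly_hours': 157,
          'time_spend_company': 3, 'Work_accident': 0,
          'promotion_last_5years': 0, 'sales': 'support', 'salary': 'low'}
print(validate_request(sample))  # []
```

Returning a list of messages rather than raising on the first error lets a client see every problem in one response.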

Our example concerns a large company that wants to understand why some of its best and most experienced employees are leaving prematurely. The company also wants to predict which valuable employees will leave next, using an employee profile like the one above.

We use LightGBM's `LGBMClassifier`. The LightGBM documentation describes the framework as follows:

LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages:

  - Faster training speed and higher efficiency.
  - Lower memory usage.
  - Better accuracy.
  - Support of parallel and GPU learning.
  - Capable of handling large-scale data.

```python
import pickle

import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.preprocessing import OneHotEncoder

# Load the data and save the feature column names (every column except the target 'left')
df = pd.read_csv('data.csv')
features = df.drop(columns='left').columns

# Fit a OneHotEncoder on the categorical columns
# (sparse_output and get_feature_names_out need scikit-learn >= 1.2;
#  older releases used sparse and get_feature_names)
columns_to_fit = ['sales', 'salary']
enc = OneHotEncoder(sparse_output=False).fit(df.loc[:, columns_to_fit])

# Transform the categorical columns, swap them into df and keep readable column names
column_names = enc.get_feature_names_out(columns_to_fit)
encoded_variables = pd.DataFrame(enc.transform(df.loc[:, columns_to_fit]), columns=column_names)
df = df.drop(columns=columns_to_fit)
df = pd.concat([df, encoded_variables], axis=1)

# Fit the model
X, y = df.drop(columns='left'), df.loc[:, 'left']
clf = LGBMClassifier().fit(X, y)
```
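To make the encoding step concrete, here is a pure-Python sketch of what the fitted encoder does to one row. It assumes scikit-learn's defaults: `categories='auto'` (each column's categories are its sorted unique training values), `handle_unknown='error'` (an unseen value raises), and output names of the form `<column>_<category>`. The category lists below are an illustrative subset, not the dataset's full sets; after fitting, the real ones are in `enc.categories_`. The helper names are hypothetical.

```python
def one_hot_columns(columns, categories):
    """Output column names in the <column>_<category> form used by get_feature_names_out."""
    return [f"{col}_{cat}" for col in columns for cat in categories[col]]


def one_hot_row(row, columns, categories):
    """0/1 values for one row; raises on unseen categories, like handle_unknown='error'."""
    encoded = []
    for col in columns:
        if row[col] not in categories[col]:
            raise ValueError(f"unknown {col!r} value: {row[col]!r}")
        encoded.extend(1.0 if cat == row[col] else 0.0 for cat in categories[col])
    return encoded


columns = ['sales', 'salary']
# Illustrative subsets, sorted the way categories='auto' sorts them
categories = {'sales': sorted(['support', 'sales', 'technical']),
              'salary': sorted(['low', 'medium', 'high'])}

names = one_hot_columns(columns, categories)
values = one_hot_row({'sales': 'support', 'salary': 'low'}, columns, categories)
print(dict(zip(names, values)))
# {'sales_sales': 0.0, 'sales_support': 1.0, 'sales_technical': 0.0,
#  'salary_high': 0.0, 'salary_low': 1.0, 'salary_medium': 0.0}
```

Note that sorting puts `high` before `low` and `medium` for `salary`, so the column order is alphabetical rather than the natural low/medium/high ordering.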


### Save the fitted objects to the file system

```python
# Save the feature column names (their order defines the prediction input order)
with open('../app/data/features.pickle', 'wb') as f:
    pickle.dump(features, f)

# Save the fitted one-hot encoder
with open('../app/data/encoder.pickle', 'wb') as f:
    pickle.dump(enc, f)

# Save the trained model
with open('../app/data/model.pickle', 'wb') as f:
    pickle.dump(clf, f)
```
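Writing three separate pickles means the feature list, encoder and model can drift out of sync if only one file is replaced. One alternative, sketched here under the assumption that a single file suits the app, is to pickle all artifacts together in one dict and check the round trip. The helpers `save_bundle` / `load_bundle`, the file name `artifacts.pickle`, and the placeholder objects are hypothetical stand-ins for `features`, `enc` and `clf`.

```python
import os
import pickle
import tempfile


def save_bundle(path, **artifacts):
    """Pickle several named objects together in one file."""
    with open(path, 'wb') as f:
        pickle.dump(artifacts, f)


def load_bundle(path):
    """Load the dict written by save_bundle. Only unpickle files you trust."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Placeholder objects standing in for the real features, enc and clf
features = ['satisfaction_level', 'last_evaluation', 'sales', 'salary']
encoder_stub = {'sales': ['sales', 'support'], 'salary': ['high', 'low', 'medium']}
model_stub = {'name': 'placeholder model'}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'artifacts.pickle')  # hypothetical file name
    save_bundle(path, features=features, encoder=encoder_stub, model=model_stub)
    bundle = load_bundle(path)

print(sorted(bundle))                   # ['encoder', 'features', 'model']
print(bundle['features'] == features)   # True
```

The trade-off is that any change to one artifact rewrites the whole file, which is usually fine here because the encoder, feature list and model are produced by the same training run.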

### Testing the pickled objects loaded from the file system

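```python
import pickle

import numpy as np
import pandas as pd

# Load the pickled objects
with open('../app/data/model.pickle', 'rb') as f:
    clf = pickle.load(f)
with open('../app/data/encoder.pickle', 'rb') as f:
    enc = pickle.load(f)
with open('../app/data/features.pickle', 'rb') as f:
    features = pickle.load(f)

# Sample dict for a prediction request
data = {'satisfaction_level': 0.38,
        'last_evaluation': 0.53,
        'number_project': 2,
        'average_montly_hours': 157,
        'time_spend_company': 3,
        'Work_accident': 0,
        'promotion_last_5years': 0,
        'sales': 'support',
        'salary': 'low'}

# Extract the values in the same column order used during training
to_predict = [data[feature] for feature in features]

# One-hot encode the last two values ('sales' and 'salary' are the last two feature columns);
# a one-row DataFrame keeps the column names the encoder was fitted with
categorical = pd.DataFrame([to_predict[-2:]], columns=features[-2:])
encoded_features = list(enc.transform(categorical)[0])

to_predict = np.array(to_predict[:-2] + encoded_features)

# Create and return the prediction
prediction = clf.predict(to_predict.reshape(1, -1))

prediction[0]
```

For this sample profile the model returns `1`, the positive class of the `left` target, meaning this employee is predicted to leave.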
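The cell above relies on `sales` and `salary` being the last two entries of `features`. A sketch that looks up the categorical columns by name removes that positional assumption while producing the same layout as training: numeric columns in `features` order, followed by each categorical column's one-hot block. The helper `build_feature_vector` is hypothetical, and the category lists are an illustrative subset; in the app they would come from `enc.categories_`, in the order the encoder was fitted on.

```python
def build_feature_vector(data, features, categories):
    """Assemble the model input: numeric features in training order, then one-hot blocks.

    `categories` maps each categorical column to its sorted category list,
    in the same column order the encoder was fitted on.
    """
    # Every feature that is not categorical is passed through as a float
    numeric = [float(data[name]) for name in features if name not in categories]
    encoded = []
    for column, values in categories.items():
        if data[column] not in values:
            raise ValueError(f"unknown {column!r} value: {data[column]!r}")
        encoded.extend(1.0 if v == data[column] else 0.0 for v in values)
    return numeric + encoded


features = ['satisfaction_level', 'last_evaluation', 'number_project',
            'average_montly_hours', 'time_spend_company', 'Work_accident',
            'promotion_last_5years', 'sales', 'salary']
categories = {'sales': ['sales', 'support', 'technical'],  # illustrative subset
              'salary': ['high', 'low', 'medium']}
data = {'satisfaction_level': 0.38, 'last_evaluation': 0.53,
        'number_project': 2, 'average_montly_hours': 157,
        'time_spend_company': 3, 'Work_accident': 0,
        'promotion_last_5years': 0, 'sales': 'support', 'salary': 'low'}

vector = build_feature_vector(data, features, categories)
print(vector)
# [0.38, 0.53, 2.0, 157.0, 3.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

The resulting list could then be reshaped to one row and passed to `clf.predict`, exactly as in the cell above.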