# Business Problem: 
***Gain insights from the INX Future Inc. dataset to find out why the employees' Performance Index is below expectations and what can be done to improve the current situation.***

# Objective: 
- In this notebook we use our trained machine learning model to predict values for PerformanceRating.
- The input is 'INX_Future_Inc_Employee_Performance_Processed_Data.xlsx'.

**Steps in Predict_model**

Step 1 : Import the libraries

Step 2 : Import the Processed data-set

Step 3 : Split the Processed data-set 

Step 4 : Import the Trained Model and Predict PerformanceRating (for X_test)

Step 5 : Concat Predicted and y_test

## Step 1 : Import the libraries

In [1]:
# Import the libraries
import pandas as pd  # pandas is for data manipulation and analysis

# Import Different Models 
from sklearn.linear_model import LogisticRegression
from sklearn import svm, tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
import xgboost
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
import joblib 

## Step 2 : Import the Processed data-set

In [2]:
#pd.set_option('display.height', 500)
#pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
# read_excel has no `index` argument (that belongs to to_excel), so it is dropped here
df = pd.read_excel('INX_Future_Inc_Employee_Performance_Processed010_Data.xlsx')
df.head()

Unnamed: 0,EmpNumber,SalaryHike_NewCat,Env_Satis_NewCat,EmpEnvironmentSatisfaction,EmpLastSalaryHikePercent,YearsSinceLastPromotion,EmpWorkLifeBalance,ExperienceYearsInCurrentRole,EmpJobRole,EmpHourlyRate,EmpDepartment_Development,Age,PerformanceRating
0,E1001000,1,2,4,12,0,2,7,13,55,0,32,3
1,E1001006,1,2,4,12,1,3,7,13,42,0,47,3
2,E1001007,2,2,4,21,1,3,13,13,48,0,40,4
3,E1001009,1,1,2,15,12,2,6,8,73,0,41,3
4,E1001010,1,1,1,14,2,3,2,13,84,0,60,3


## Step 3 : Split the Processed data-set

In [3]:
# Separate the features (X) from the target (y)
target_name = 'PerformanceRating'
X = df.drop(target_name, axis=1)
y = df[target_name]

In [4]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as the test set; random_state=0 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0, stratify=None)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(960, 12)
(240, 12)
(960,)
(240,)
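
The later steps rely on `X_test` and `y_test` keeping the row labels of the original frame, so that predictions can be lined up with the true ratings. The sketch below illustrates that behaviour on a small made-up frame; `toy`, `X_toy`, `y_toy` and the values in them are hypothetical stand-ins, not the INX data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the processed data set
toy = pd.DataFrame({
    "EmpNumber": [f"E{i}" for i in range(10)],
    "Age": range(30, 40),
    "PerformanceRating": [2, 3, 3, 4, 3, 3, 2, 3, 4, 3],
})
X_toy = toy.drop("PerformanceRating", axis=1)
y_toy = toy["PerformanceRating"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=0
)

# The held-out rows keep their original index labels,
# and the feature and target splits share the same labels.
print(X_te.index.tolist())
print(X_te.index.equals(y_te.index))
```

Because the labels survive the split, any object that carries the same index can be concatenated with `y_test` row by row.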


In [5]:
# Set the EmpNumber column aside: it identifies the employee and is not a model feature
X_test_EmpNumber = X_test.iloc[:,0:1]
X_test_EmpNumber

Unnamed: 0,EmpNumber
198,E1001351
328,E1001582
45,E1001086
263,E1001468
621,E1002113
...,...
1127,E100863
55,E1001104
500,E1001897
687,E1002227


In [6]:
# Drop EmpNumber so that X_test holds only the model features
X_test = X_test.iloc[:,1:]

In [7]:
X_test.head()

Unnamed: 0,SalaryHike_NewCat,Env_Satis_NewCat,EmpEnvironmentSatisfaction,EmpLastSalaryHikePercent,YearsSinceLastPromotion,EmpWorkLifeBalance,ExperienceYearsInCurrentRole,EmpJobRole,EmpHourlyRate,EmpDepartment_Development,Age
198,1,2,4,11,7,2,8,10,51,0,36
328,1,2,3,15,0,4,7,13,56,0,32
45,1,2,3,17,1,3,2,13,93,0,34
263,1,2,3,12,0,3,0,14,76,0,32
621,2,1,2,20,0,2,2,14,77,0,38


## Step 4 : Import the Trained Model and Predict PerformanceRating (for X_test)

In [20]:
# Load the trained model from the file (joblib.load deserializes the object saved with joblib.dump)
Xbgboostc_from_joblib = joblib.load('Xbgboost_Classifier_INX_performace_predict.pkl')
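
For readers who have not seen the saving side, the following is a minimal sketch of a `joblib.dump` / `joblib.load` round trip. It assumes a small `DecisionTreeClassifier` as a stand-in for the XGBoost classifier, and the file name `toy_model.pkl` and the toy arrays are made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; a decision tree stands in for the real classifier
features_toy = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
ratings_toy = np.array([2, 3, 4, 3])
clf = DecisionTreeClassifier(random_state=0).fit(features_toy, ratings_toy)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.pkl")
    joblib.dump(clf, path)        # serialize the fitted model to disk
    restored = joblib.load(path)  # read it back as a new object

# The restored model should give the same predictions as the original
same = np.array_equal(clf.predict(features_toy), restored.predict(features_toy))
print(same)
```

Loading a saved model generally requires the library that defines its class to be installed in the loading environment.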

In [21]:
# Use the loaded model to make predictions 
Xbgboostc_from_joblib.predict(X_test) 

array([3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 2, 3, 2, 3, 3, 2, 3,
       3, 3, 3, 3, 4, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 4,
       2, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 4, 3, 3,
       2, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4,
       3, 3, 2, 2, 3, 2, 2, 3, 3, 3, 3, 3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 3,
       3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 4, 2, 3, 2, 3, 3, 4,
       3, 4, 3, 4, 2, 3, 2, 3, 3, 3, 4, 3, 3, 2, 3, 3, 3, 3, 3, 3, 4, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 4, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3,
       3, 3, 2, 3, 3, 4, 2, 3, 2, 3, 3, 3, 4, 3, 3, 2, 3, 3, 3, 3, 3, 3,
       4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 2, 4, 4, 3, 3, 3, 3,
       2, 3, 3, 3, 3, 3, 2, 2, 4, 2, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3],
      dtype=int64)
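
Step 1 imports `accuracy_score` and `classification_report`, although the notebook does not call them. As a hedged sketch of how the predictions could be scored against the true ratings, the example below uses two short made-up arrays (`y_true_toy`, `y_pred_toy`) in place of `y_test` and the model output.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ratings, not taken from the INX data
y_true_toy = np.array([3, 3, 2, 4, 3, 2, 3, 4])
y_pred_toy = np.array([3, 3, 2, 3, 3, 3, 3, 4])

# Fraction of rows where the prediction equals the true rating
acc = accuracy_score(y_true_toy, y_pred_toy)
print(f"accuracy: {acc:.2f}")

# Per-class precision, recall and F1
print(classification_report(y_true_toy, y_pred_toy, zero_division=0))
```

In the notebook the same two calls would take `y_test` and the array returned by `predict(X_test)`.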

## Step 5 : Concat Predicted and y_test

In [23]:
performanceRatings_Predictions = pd.DataFrame(Xbgboostc_from_joblib.predict(X_test) )
performanceRatings_Predictions.head()

Unnamed: 0,0
0,3
1,3
2,3
3,3
4,3


In [24]:
# Give the predictions the same row labels as the test set so they can be aligned with y_test
performanceRatings_Predictions = performanceRatings_Predictions.set_index(X_test_EmpNumber.index)
performanceRatings_Predictions.head()

Unnamed: 0,0
198,3
328,3
45,3
263,3
621,3
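
The two cells above first build the predictions and then attach the test-set labels. One possible alternative, sketched here under the assumption that a named Series is acceptable, is to pass the index at construction time. `toy_index` and `toy_preds` are hypothetical values used only to keep the example self-contained.

```python
import numpy as np
import pandas as pd

toy_index = pd.Index([198, 328, 45])  # hypothetical test-row labels
toy_preds = np.array([3, 3, 2])       # hypothetical model output

# Attach the labels and a column name in one step
pred_series = pd.Series(
    toy_preds, index=toy_index, name="Predicted PerformanceRating"
)
print(pred_series)
```

A named Series also carries its column name into `pd.concat`, provided `ignore_index` is left at its default.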


In [25]:
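# Join on the shared row labels; ignore_index=True numbers the columns 0, 1, 2 (renamed in the next cell)
Predictions_DataFrame = pd.concat([X_test_EmpNumber, performanceRatings_Predictions, y_test], axis=1, join='inner', ignore_index=True)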
Predictions_DataFrame.head()

Unnamed: 0,0,1,2
198,E1001351,3,3
328,E1001582,3,3
45,E1001086,3,3
263,E1001468,3,3
621,E1002113,3,3


In [26]:
Predictions_DataFrame = Predictions_DataFrame.rename(columns={1: 'Predicted PerformanceRating',0:'EmpNumber',2:'PerformanceRating'})

In [27]:
Predictions_DataFrame

Unnamed: 0,EmpNumber,Predicted PerformanceRating,PerformanceRating
198,E1001351,3,3
328,E1001582,3,3
45,E1001086,3,3
263,E1001468,3,3
621,E1002113,3,3
...,...,...,...
1127,E100863,3,3
55,E1001104,3,3
500,E1001897,3,3
687,E1002227,3,3
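
Once predicted and actual ratings sit side by side, they can be summarised rather than read row by row. The sketch below computes the share of matching rows and a cross-tabulation; `toy_results` is a made-up frame with the same column names as `Predictions_DataFrame`, and its values are illustrative only.

```python
import pandas as pd

# Hypothetical frame mirroring the columns of Predictions_DataFrame
toy_results = pd.DataFrame({
    "EmpNumber": ["E1", "E2", "E3", "E4", "E5"],
    "Predicted PerformanceRating": [3, 3, 2, 4, 3],
    "PerformanceRating": [3, 2, 2, 4, 3],
})

# Share of employees whose predicted rating equals the actual rating
match_rate = (
    toy_results["Predicted PerformanceRating"] == toy_results["PerformanceRating"]
).mean()
print(match_rate)

# Rows are actual ratings, columns are predicted ratings
confusion = pd.crosstab(
    toy_results["PerformanceRating"],
    toy_results["Predicted PerformanceRating"],
    rownames=["Actual"],
    colnames=["Predicted"],
)
print(confusion)
```

Off-diagonal cells of the cross-tabulation show which ratings the model confuses with each other.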
