# **Formula Based Attrition Model**
In this notebook, we take the base attrition formula from the previous notebook and build a formula-based model: a program that takes OCEAN personality results as input and returns the probability of attrition (turnover) as a percentage.<br>
As usual, let us import the libraries we will use:

In [76]:
import numpy as np
import pandas as pd

First, define the variables we need for our formula:

In [2]:
# Formula variables
m = [-0.0422, -0.2138, -0.0714, -0.1462, 0.3032]  # Mean correlation per trait, used as weights
minres = -0.4736  # Equals the sum of the negative weights in m
maxres = 0.3032   # Equals the sum of the positive weights in m
divisor = 3.74
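One way to read `minres` and `maxres`, assuming every OCEAN score lies between 0 and 1 (an assumption; the score range is not stated in this notebook): the weighted sum is smallest when only the negatively weighted traits score 1, and largest when only the positively weighted trait scores 1. The short check below, with the hypothetical names `lowest` and `highest`, shows that the constants match that reading.

```python
# Weights copied from the cell above.
m = [-0.0422, -0.2138, -0.0714, -0.1462, 0.3032]

# Assumption: each trait score is in [0, 1].
lowest = round(sum(w for w in m if w < 0), 4)   # negative-weight traits at 1, the rest at 0
highest = round(sum(w for w in m if w > 0), 4)  # positive-weight traits at 1, the rest at 0

print(lowest, highest)  # -0.4736 0.3032
```

Under the same assumption, the normalised value stays between 0 and 1 before it is divided by `divisor`.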

Now we can build a function-based model using the formula:<br>
AttRiskFinal = ((O*Omean + C*Cmean + A*Amean + E*Emean + N*Nmean) - (-0.4736)) / (0.3032 - (-0.4736)) / 3.74

In [77]:
# The input is a list of lists. Each inner list holds an ID number
# followed by the five OCEAN results, in that order.
def PredAttRisk(results):
    for row in results:
        # Weighted sum of the five trait scores
        weighted = m[0]*row[1] + m[1]*row[2] + m[2]*row[3] + m[3]*row[4] + m[4]*row[5]
        # Min-max normalisation, then scaling by the divisor
        risk = ((weighted - minres) / (maxres - minres)) / divisor
        risk = round(risk, 2)
        # The x100 transforms the prediction into a percentage; the outer round avoids float noise
        print('Risk for ID# {} is {}%.'.format(row[0], round(risk * 100, 1)))
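Since `numpy` is already imported, the same calculation can also be written in vectorised form so that it returns values instead of printing them. This is only a sketch: `pred_att_risk_array` is a hypothetical name, and it assumes the IDs are integers and that each row follows the same `[id, O, C, E, A, N]` layout as the function above.

```python
import numpy as np

# Constants copied from the notebook.
m = np.array([-0.0422, -0.2138, -0.0714, -0.1462, 0.3032])
minres, maxres, divisor = -0.4736, 0.3032, 3.74

def pred_att_risk_array(results):
    """Hypothetical helper: return (ids, risks) instead of printing."""
    data = np.asarray(results, dtype=float)
    ids = data[:, 0].astype(int)   # assumes integer IDs
    scores = data[:, 1:6]          # the five trait scores per row
    weighted = scores @ m          # weighted sum for every row at once
    risks = (weighted - minres) / (maxres - minres) / divisor
    return ids, np.round(risks, 2)

ids, risks = pred_att_risk_array([
    [1, 0.803, 0.886, 0.496, 0.753, 0.426],
    [5, 0.462, 0.368, 0.425, 0.526, 0.942],
])
for i, r in zip(ids, risks):
    print(f'Risk for ID# {i} is {r * 100:.1f}%.')
```

Returning the values makes it easier to store or plot the predictions later, at the cost of converting the IDs through a float array.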

In [33]:
# Input for a single-record test: [id, O, C, E, A, N]
test1 = [[1, 0.803, 0.886, 0.496, 0.753, 0.426]]

In [34]:
# Testing with 1 list
PredAttRisk(test1)

Risk for ID# 1 is 8.0%.
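To see where the 8% comes from, the calculation for this record can be broken into its three steps. The intermediate names (`weighted`, `normalised`, `risk`) are introduced here for illustration only; the constants are the ones defined earlier in the notebook.

```python
m = [-0.0422, -0.2138, -0.0714, -0.1462, 0.3032]
minres, maxres, divisor = -0.4736, 0.3032, 3.74

scores = [0.803, 0.886, 0.496, 0.753, 0.426]  # record 1 without its ID

weighted = sum(w * s for w, s in zip(m, scores))      # roughly -0.2397
normalised = (weighted - minres) / (maxres - minres)  # roughly 0.3012
risk = normalised / divisor                           # roughly 0.0805

print(round(weighted, 4), round(normalised, 4), round(risk, 4))
```

Rounding `risk` to two decimals gives 0.08, which the function reports as 8.0%.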


In [35]:
# Now let us try it with a few more lists at the same time
test2 = [[1, 0.803, 0.886, 0.496, 0.753, 0.426],
         [2, 0.503, 0.766, 0.855, 0.621, 0.519],
         [3, 0.731, 0.432, 0.631, 0.859, 0.622],
         [4, 0.600, 0.616, 0.716, 0.636, 0.563],
         [5, 0.462, 0.368, 0.425, 0.526, 0.942],
        ]

In [36]:
PredAttRisk(test2)

Risk for ID# 1 is 8.0%.
Risk for ID# 2 is 10.0%.
Risk for ID# 3 is 13.0%.
Risk for ID# 4 is 12.0%.
Risk for ID# 5 is 19.0%.


In [39]:
#Testing with a dataframe
df1 = pd.read_csv('DATA_OCEAN/big_five_scores.csv')
df1.head()

Unnamed: 0,case_id,country,age,sex,agreeable_score,extraversion_score,openness_score,conscientiousness_score,neuroticism_score
0,1,South Afri,24,1,0.753333,0.496667,0.803333,0.886667,0.426667
1,3,UK,24,2,0.733333,0.68,0.786667,0.746667,0.59
2,4,USA,36,2,0.88,0.77,0.86,0.896667,0.296667
3,5,UK,19,1,0.69,0.616667,0.716667,0.636667,0.563333
4,6,UK,17,1,0.6,0.713333,0.646667,0.633333,0.513333


In [79]:
# Build [id, O, C, E, A, N] rows from the first 15 records of the dataframe
test3 = []
for ind in df1.head(15).index:
    test3.append([df1.loc[ind, 'case_id'],
                  df1.loc[ind, 'openness_score'],
                  df1.loc[ind, 'conscientiousness_score'],
                  df1.loc[ind, 'extraversion_score'],
                  df1.loc[ind, 'agreeable_score'],
                  df1.loc[ind, 'neuroticism_score']])

In [80]:
PredAttRisk(test3)

Risk for ID# 1 is 8.0%.
Risk for ID# 3 is 10.0%.
Risk for ID# 4 is 5.0%.
Risk for ID# 5 is 11.0%.
Risk for ID# 6 is 11.0%.
Risk for ID# 7 is 13.0%.
Risk for ID# 8 is 9.0%.
Risk for ID# 9 is 10.0%.
Risk for ID# 10 is 13.0%.
Risk for ID# 11 is 12.0%.
Risk for ID# 12 is 11.0%.
Risk for ID# 13 is 4.0%.
Risk for ID# 14 is 12.0%.
Risk for ID# 15 is 11.0%.
Risk for ID# 16 is 16.0%.
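As an alternative to building a list of lists, the risk could be computed directly as a new dataframe column. The sketch below is not part of the original workflow: the column name `att_risk` and the `weights` series are hypothetical, the trait-to-weight pairing follows the order used in the loop above (O, C, E, A, N), and two rows copied from the `df1.head()` output stand in for the CSV file.

```python
import pandas as pd

# Weights keyed by column name, paired in the order used by the loop above.
weights = pd.Series({
    'openness_score': -0.0422,
    'conscientiousness_score': -0.2138,
    'extraversion_score': -0.0714,
    'agreeable_score': -0.1462,
    'neuroticism_score': 0.3032,
})
minres, maxres, divisor = -0.4736, 0.3032, 3.74

# Two rows copied from the df1.head() output, standing in for the CSV.
sample = pd.DataFrame({
    'case_id': [1, 3],
    'agreeable_score': [0.753333, 0.733333],
    'extraversion_score': [0.496667, 0.68],
    'openness_score': [0.803333, 0.786667],
    'conscientiousness_score': [0.886667, 0.746667],
    'neuroticism_score': [0.426667, 0.59],
})

# Select the trait columns in the weights' order, then take the row-wise weighted sum.
weighted = sample[weights.index].dot(weights)
sample['att_risk'] = ((weighted - minres) / (maxres - minres) / divisor).round(2)
print(sample[['case_id', 'att_risk']])
```

Keying the weights by column name means the result does not depend on the column order in the file.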
