## Case Study: Titan Insurance Company

The Titan Insurance Company has just installed a new incentive payment scheme for its life policy sales force. It wants an early view of the success or failure of the new scheme. Indications are that the sales force is selling more policies, but sales always vary in an unpredictable pattern from month to month, and it is not clear that the scheme has made a significant difference.

Life insurance companies typically measure the monthly output of a salesperson as the total sum assured for the policies sold by that person during the month. For example, suppose salesperson X has, in the month, sold seven policies for which the sums assured are £1000, £2500, £3000, £5000, £5000, £10000 and £35000. X's output for the month is the total of these sums assured, £61,500. Under Titan's new scheme, the sales force receive low regular salaries but are paid large bonuses related to their output (i.e. to the total sum assured of the policies they sell). The scheme is expensive for the company, but Titan is looking for sales increases that more than compensate. The agreement with the sales force is that if the scheme does not at least break even for the company, it will be abandoned after six months.
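The output measure is simply a sum of sums assured. A minimal sketch, assuming the seven policies include two of £5000 each (which is what the stated total of £61,500 requires); `monthly_output` is a hypothetical helper, not part of the case study:

```python
# Sums assured (in £) for the seven policies sold by salesperson X in the month
sums_assured = [1000, 2500, 3000, 5000, 5000, 10000, 35000]


def monthly_output(sums):
    """Hypothetical helper: output = total sum assured of the policies sold."""
    return sum(sums)


print(len(sums_assured), monthly_output(sums_assured))  # 7 61500
```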

The scheme has now been in operation for four months. It has settled down after fluctuations in the first two months due to the changeover.

To test the effectiveness of the scheme, Titan have taken a random sample of 30 salespeople, measured their output in the penultimate month before the changeover, and then measured it in the fourth month after the changeover (they deliberately chose months not too close to the changeover). The outputs of the salespeople are shown in Table 1.

In [2]:
import pandas as pd
import os
import seaborn as sns
import numpy as np
from scipy.stats import ttest_ind, ttest_1samp, wilcoxon, levene, shapiro
from statsmodels.stats.power import ttest_power

In [3]:
"""
* The table 1 given in the project is saved as csv & imported using dataframes 
* The following assumes 'Jupyter Notebook' is run where the dataset 'Titan Insurance Case Study Dataset.csv' resides
* Configure os path accordingly if your current working directory is different
"""
df_titan_insurance = pd.read_csv(os.path.join('', 'Titan Insurance Case Study Dataset.csv'),index_col=0)
df_titan_insurance.head(10)

| SALESPERSON | Old Scheme (in thousands) | New Scheme (in thousands) |
|---|---|---|
| 1 | 57 | 62 |
| 2 | 103 | 122 |
| 3 | 59 | 54 |
| 4 | 75 | 82 |
| 5 | 84 | 84 |
| 6 | 73 | 86 |
| 7 | 35 | 32 |
| 8 | 110 | 104 |
| 9 | 44 | 38 |
| 10 | 82 | 107 |


## 1. Find the mean of the old scheme and new scheme columns. (5 points)

In [4]:
old_scheme_mean = df_titan_insurance['Old Scheme (in thousands)'].mean()
new_scheme_mean = df_titan_insurance['New Scheme (in thousands)'].mean()
print("The Old Scheme Mean is : {0} \nThe New Scheme Mean is : {1}".format(old_scheme_mean,new_scheme_mean))
df_titan_insurance.describe()

The Old Scheme Mean is : 68.03333333333333 
The New Scheme Mean is : 72.03333333333333


| | Old Scheme (in thousands) | New Scheme (in thousands) |
|---|---|---|
| count | 30.0 | 30.0 |
| mean | 68.033333 | 72.033333 |
| std | 20.45598 | 24.062395 |
| min | 28.0 | 32.0 |
| 25% | 54.0 | 55.0 |
| 50% | 67.0 | 74.0 |
| 75% | 81.5 | 85.75 |
| max | 110.0 | 122.0 |


## 2. Use a test at the 5% significance level to determine the p-value and check whether the new scheme has significantly raised outputs. (10 points)

In [5]:
df_old_scheme = df_titan_insurance['Old Scheme (in thousands)']
df_new_scheme = df_titan_insurance['New Scheme (in thousands)']

# Shapiro-Wilk test of normality - H0: the data are normally distributed | H1: the data are not normally distributed
def accept_or_reject(p_value, xfor):
    """Print the decision for a test at the 5% significance level."""
    if p_value < 0.05:
        print('Rejecting Null Hypothesis for {0} as p_value {1} is less than 0.05'.format(xfor, p_value))
    else:
        print('Failing to reject Null Hypothesis for {0} as p_value {1} is not less than 0.05'.format(xfor, p_value))

# Old Scheme (shapiro returns the W statistic and the p-value)
w_old, p_val_old = shapiro(df_old_scheme)
accept_or_reject(p_val_old, "Old Scheme")

# New Scheme
w_new, p_val_new = shapiro(df_new_scheme)
accept_or_reject(p_val_new, "New Scheme")

# Output: both p-values > 0.05, so there is no evidence against normality in either column

Failing to reject Null Hypothesis for Old Scheme as p_value 0.9813673496246338 is not less than 0.05
Failing to reject Null Hypothesis for New Scheme as p_value 0.5057376623153687 is not less than 0.05
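The checks above test each column separately, but the paired t-test used for this question relies on the differences new - old being approximately normal. A minimal sketch of that check follows; `difference_normality` is a hypothetical helper, and the example values are only the first 10 salespeople shown in Table 1, so the result is illustrative. On the full data you would call it with `df_old_scheme` and `df_new_scheme`.

```python
import numpy as np
from scipy.stats import shapiro


def difference_normality(old, new, alpha=0.05):
    """Shapiro-Wilk test on the paired differences new - old.

    Returns (differences, W statistic, p-value, True if normality is not rejected at alpha).
    """
    diff = np.asarray(new, dtype=float) - np.asarray(old, dtype=float)
    w_stat, p_value = shapiro(diff)
    return diff, w_stat, p_value, p_value >= alpha


# First 10 rows of Table 1 (outputs in thousands of pounds), for illustration only
old_10 = [57, 103, 59, 75, 84, 73, 35, 110, 44, 82]
new_10 = [62, 122, 54, 82, 84, 86, 32, 104, 38, 107]

diff, w_stat, p_value, looks_normal = difference_normality(old_10, new_10)
print(diff)
print(w_stat, p_value, looks_normal)
```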


In [6]:
# Levene test of equal variances - H0: the variances are equal | H1: the variances are not equal
display(levene(df_old_scheme, df_new_scheme))
# Output: p-value > 0.05, so there is no evidence that the two variances differ

LeveneResult(statistic=1.063061539437244, pvalue=0.30679836081811235)
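Levene's test only matters when choosing between the pooled-variance (Student) and Welch versions of the independent two-sample t-test. A sketch using SciPy's `ttest_ind_from_stats` on the means and standard deviations printed in this notebook suggests that, with equal group sizes of 30, both versions give the same t statistic and differ only in degrees of freedom (58 versus roughly 56.5 for Welch). Neither is the primary test here, since the two columns are paired.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics printed earlier in the notebook (outputs in thousands of pounds)
OLD_MEAN, OLD_STD = 68.03333333333333, 20.455980212074454
NEW_MEAN, NEW_STD = 72.03333333333333, 24.06239494677769
N = 30  # salespeople per column

# Student t-test: pooled variance, which Levene's p-value of about 0.31 does not contradict
student = ttest_ind_from_stats(OLD_MEAN, OLD_STD, N, NEW_MEAN, NEW_STD, N, equal_var=True)
# Welch t-test: no equal-variance assumption
welch = ttest_ind_from_stats(OLD_MEAN, OLD_STD, N, NEW_MEAN, NEW_STD, N, equal_var=False)

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)
```

With equal sample sizes the pooled and Welch standard errors coincide algebraically, so only the reference distribution changes.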

In [10]:
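# Independent two-sample test: shown for comparison only, because it ignores the pairing
print(ttest_ind(df_old_scheme, df_new_scheme))

# Paired test (the same 30 salespeople in both columns): one-sample t-test of old - new against 0
t_stat_1samp, p_value_1samp = ttest_1samp(df_old_scheme - df_new_scheme, 0)
print(ttest_1samp(df_old_scheme - df_new_scheme, 0))

# H1 is one-sided (new > old) and t < 0 lies in that direction, so halve the two-sided p-value
p_value_one_sided = p_value_1samp / 2
print('One-sided p-value (paired test): {0:.4f}'.format(p_value_one_sided))
print('p >= 0.05: the new scheme has not significantly raised outputs at the 5% level'
      if p_value_one_sided >= 0.05 else 'p < 0.05: the new scheme has significantly raised outputs')

Ttest_indResult(statistic=-0.6937067608923764, pvalue=0.49063515686248105)
Ttest_1sampResult(statistic=-1.5559143823544377, pvalue=0.13057553961337662)
One-sided p-value (paired test): 0.0653
p >= 0.05: the new scheme has not significantly raised outputs at the 5% level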
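The p-values printed by `ttest_ind` and `ttest_1samp` are two-sided, while the question asks whether outputs were raised, which is a one-sided alternative. A minimal sketch that turns a t statistic into a one-sided p-value with `scipy.stats.t` (the helper name `one_sided_p` is hypothetical); recent SciPy versions can also do this directly through the `alternative` argument, e.g. `ttest_rel(df_new_scheme, df_old_scheme, alternative='greater')`.

```python
from scipy.stats import t


def one_sided_p(t_stat, df, alternative):
    """One-sided p-value of a t statistic with df degrees of freedom.

    alternative='less'    -> P(T <= t_stat)
    alternative='greater' -> P(T >= t_stat)
    """
    if alternative == 'less':
        return t.cdf(t_stat, df)
    if alternative == 'greater':
        return t.sf(t_stat, df)
    raise ValueError("alternative must be 'less' or 'greater'")


# Paired statistic printed above: differences old - new over n = 30 salespeople, so df = 29.
# "New scheme raised outputs" means the mean of old - new is below 0, hence 'less'.
p_paired = one_sided_p(-1.5559143823544377, df=29, alternative='less')
print(round(p_paired, 4))  # 0.0653
```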


## 3. What conclusion does the test (p-value) lead to? (2.5 points)

In [37]:
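# H0 (null hypothesis): the new scheme has not raised mean output (μ_new - μ_old <= 0)
# H1 (alternative hypothesis): the new scheme has raised mean output (μ_new - μ_old > 0)
# The paired p-value is 0.1306 two-sided, about 0.065 one-sided; both are > 0.05, so we fail to reject H0.
# At the 5% level the data do not show a significant rise in outputs (which is not proof that the scheme has no effect).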
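Written out with differences taken as new minus old (in thousands), the paired test behind this conclusion is:

$$
d_i = \text{new}_i - \text{old}_i, \qquad H_0: \mu_d \le 0, \qquad H_1: \mu_d > 0
$$

$$
t = \frac{\bar d}{s_d / \sqrt{n}} = \frac{4.0}{s_d / \sqrt{30}} \approx 1.556, \qquad p = P(T_{29} \ge 1.556) \approx 0.065 > 0.05
$$

Here $t$ is the printed paired statistic with its sign reversed, because the notebook computed the differences as old minus new.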

## 4. Suppose it has been calculated that, for Titan to break even, the average output must increase by £5000 under the new scheme compared with the old scheme. If this figure is the alternative hypothesis, what is:

2. The probability of a Type I error? (2.5 points)
3. The p-value of the hypothesis test if we test for a difference of £5000? (10 points)
4. The power of the test? (5 points)
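In symbols, with $\delta = \mu_{new} - \mu_{old}$ in thousands and the two-sample standard error used in this notebook (about 5.766), the Type I error probability and the power of the one-sided test of $H_0: \delta \le 0$ to detect the break-even increase $\delta = 5$ are:

$$
P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha = 0.05
$$

$$
\text{Power} = P\left(T > t_{0.95,\,58} \mid \delta = 5\right), \qquad T \sim t_{58}(\lambda), \qquad \lambda = \frac{5}{\mathrm{SE}} \approx \frac{5}{5.766} \approx 0.867, \qquad t_{0.95,\,58} \approx 1.672
$$

where $t_{58}(\lambda)$ is the noncentral t distribution with 58 degrees of freedom. This follows the two-sample framing used here; a paired analysis would use the standard error of the differences instead.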

In [23]:
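# Differences are new - old in thousands, so the break-even increase of £5000 is d = 5
# H0: μ_new - μ_old >= 5
# H1: μ_new - μ_old < 5
# Probability of a Type I error = significance level alpha = 0.05

# Standard Error = sqrt[(s1*s1)/n1 + (s2*s2)/n2]
# t_statistic = [(x2 - x1) - d] / Standard Error, where x1, s1, n1 refer to the old scheme and x2, s2, n2 to the new scheme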

old_scheme_std = df_titan_insurance['Old Scheme (in thousands)'].std()
new_scheme_std = df_titan_insurance['New Scheme (in thousands)'].std()
n1 = df_titan_insurance['Old Scheme (in thousands)'].count()
n2 = df_titan_insurance['New Scheme (in thousands)'].count()

print(old_scheme_std,new_scheme_std)

std_error = np.sqrt(((old_scheme_std * old_scheme_std)/n1)+((new_scheme_std*new_scheme_std)/n2))

print('The standard error is : {0}'.format(std_error))

tstat = (((new_scheme_mean - old_scheme_mean)-5)/std_error)

print('The t_statistic value is : {0}'.format(tstat))
# p-value = P(T < -0.1734) with n1 + n2 - 2 = 58 degrees of freedom, which is approximately 0.43
# The p-value is > 0.05, so we do not reject the null hypothesis: the data cannot rule out an average increase of £5000

20.455980212074454 24.06239494677769
The standard error is : 5.766125148981461
The t_statistic value is : -0.1734266902230941
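The p-value $P(T < t)$ for this statistic can be computed directly rather than read from a table. A minimal sketch with `scipy.stats.t`, using the summary statistics printed in this notebook. The second half is a variant labelled as an assumption: since the same salespeople appear in both columns, it uses a paired standard error recovered from the paired statistic t = -1.5559 printed in question 2 (SE = mean difference / |t|).

```python
import math

from scipy.stats import t

# Summary statistics printed in this notebook (outputs in thousands of pounds)
old_mean, old_std = 68.03333333333333, 20.455980212074454
new_mean, new_std = 72.03333333333333, 24.06239494677769
n = 30   # salespeople per column
d0 = 5   # break-even increase of £5000, in thousands

# Two-sample version, matching the cell above; H1: difference < 5, so use the lower tail
se_two_sample = math.sqrt(old_std**2 / n + new_std**2 / n)
t_two_sample = ((new_mean - old_mean) - d0) / se_two_sample
p_two_sample = t.cdf(t_two_sample, df=2 * n - 2)

# Paired variant (assumption: SE of the differences = mean difference / |paired t| from question 2)
se_paired = (new_mean - old_mean) / 1.5559143823544377
t_paired = ((new_mean - old_mean) - d0) / se_paired
p_paired = t.cdf(t_paired, df=n - 1)

print(round(t_two_sample, 4), round(p_two_sample, 4))
print(round(t_paired, 4), round(p_paired, 4))
```

In both framings the p-value is well above 0.05, so H0 (an increase of at least £5000) is not rejected.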


In [25]:
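# Calculating the power of the test against the break-even alternative (an increase of 5 thousand)
from statsmodels.stats.power import TTestIndPower
# Cohen's d = 5 / pooled standard deviation, using the sample (ddof=1) standard deviations computed above
pooled_std = np.sqrt(((n1 - 1) * old_scheme_std**2 + (n2 - 1) * new_scheme_std**2) / (n1 + n2 - 2))
effect_size = 5 / pooled_std
print(effect_size)  # approximately 0.224
# nobs1 is the size of one group (30); one-sided test of new > old at alpha = 0.05
print(TTestIndPower().power(effect_size=effect_size, nobs1=n1, alpha=0.05, ratio=n2 / n1, alternative='larger'))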
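The power calculation above is based on a pooled (two-group) standard deviation. Because the same 30 salespeople are measured twice, a paired analysis may be more appropriate. Below is a minimal sketch of the power of the one-sided paired test against the break-even increase of 5 (thousand), computed from SciPy's noncentral t distribution. The helper `paired_power` is hypothetical, and the standard deviation of the differences is recovered from the paired statistic printed in question 2 rather than from the raw data (an assumption). To our understanding, statsmodels' `ttest_power(5 / sd_diff, nobs=30, alpha=0.05, alternative='larger')` evaluates the same quantity.

```python
import math

from scipy.stats import nct, t


def paired_power(delta, sd_diff, n, alpha=0.05):
    """Power of a one-sided paired t-test (H1: mean difference > 0)
    when the true mean difference is delta."""
    df = n - 1
    noncentrality = delta / (sd_diff / math.sqrt(n))
    t_crit = t.ppf(1 - alpha, df)               # reject H0 when T > t_crit
    return nct.sf(t_crit, df, noncentrality)    # P(T > t_crit) under the alternative


# Assumption: sd of the paired differences recovered from the printed paired statistic,
# |t| = 1.5559... for a mean difference of 4.0 thousand over n = 30 salespeople
N = 30
sd_diff = 4.0 / 1.5559143823544377 * math.sqrt(N)

print(round(sd_diff, 2))                    # about 14.08
print(round(paired_power(5, sd_diff, N), 3))
```

With these figures the paired power comes out near 0.6, noticeably higher than the two-group calculation, because pairing removes much of the between-salesperson variation (the differences have a standard deviation of about 14 thousand, against a pooled standard deviation of about 22 thousand).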
