The Titan Insurance Company has just installed a new incentive payment scheme for its life policy sales force. It wants an early view of the success or failure of the new scheme. Indications are that the sales force is selling more policies, but sales always vary unpredictably from month to month, and it is not clear that the scheme has made a significant difference.
Life insurance companies typically measure the monthly output of a salesperson as the total sum assured for the policies sold by that person during the month. For example, suppose salesperson X has, in the month, sold seven policies for which the sums assured are £1000, £2500, £3000, £5000, £5000, £10000 and £35000. X's output for the month is the total of these sums assured, £61,500. Under Titan's new scheme, the sales force receives low regular salaries but is paid large bonuses related to output (i.e. to the total sum assured of the policies each person sells). The scheme is expensive for the company, but it is looking for sales increases that more than compensate for the cost. The agreement with the sales force is that if the scheme does not at least break even for the company, it will be abandoned after six months.
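Output is simply a sum of sums assured. A minimal Python sketch of the example, assuming the seven sums assured are £1000, £2500, £3000, £5000, £5000, £10000 and £35000 (the second £5000 policy is inferred from the stated count of seven and the £61,500 total):

```python
# Sums assured (in £) for salesperson X's seven policies; the second
# £5000 policy is an assumption inferred from the stated £61,500 total.
sums_assured = [1000, 2500, 3000, 5000, 5000, 10000, 35000]

# Monthly output is the total sum assured of all policies sold that month.
output = sum(sums_assured)
print(f"Policies sold: {len(sums_assured)}, monthly output: £{output:,}")
```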
The scheme has now been in operation for four months. It has settled down after fluctuations in the first two months due to the changeover.
To test the effectiveness of the scheme, Titan have taken a random sample of 30 salespeople, measured their output in the penultimate month before the changeover, and then measured it again in the fourth month after the changeover (months not too close to the changeover were chosen deliberately). The outputs of the salespeople are shown in Table 1.

Table 1. Monthly output per salesperson (total sum assured, £ thousands)

SALESPERSON   Old Scheme (in thousands)   New Scheme (in thousands)
1             57                          62
2             103                         122
3             59                          54
4             75                          82
5             84                          84
6             73                          86
7             35                          32
8             110                         104
9             44                          38
10            82                          107
11            67                          84
12            64                          85
13            78                          99
14            53                          39
15            41                          34
16            39                          58
17            80                          73
18            87                          53
19            73                          66
20            65                          78
21            28                          41
22            62                          71
23            49                          38
24            84                          95
25            63                          81
26            77                          58
27            67                          75
28            101                         94
29            91                          100
30            50                          68
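If `TitanSales.csv` is not at hand, the Table 1 values can be entered directly. This is a sketch, not part of the original notebook; it assumes the CSV uses these two column names and a `SALESPERSON` index, as the notebook's `df.columns` output suggests.

```python
import pandas as pd

# Monthly outputs in £ thousands, transcribed from Table 1 (salespeople 1-30).
old = [57, 103, 59, 75, 84, 73, 35, 110, 44, 82, 67, 64, 78, 53, 41,
       39, 80, 87, 73, 65, 28, 62, 49, 84, 63, 77, 67, 101, 91, 50]
new = [62, 122, 54, 82, 84, 86, 32, 104, 38, 107, 84, 85, 99, 39, 34,
       58, 73, 53, 66, 78, 41, 71, 38, 95, 81, 58, 75, 94, 100, 68]

# Assumed to mirror TitanSales.csv read with index_col="SALESPERSON".
df = pd.DataFrame(
    {"Old Scheme (in thousands)": old, "New Scheme (in thousands)": new},
    index=pd.RangeIndex(1, 31, name="SALESPERSON"),
)
print(df.head())
```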



In [66]:
import numpy as np
import pandas as pd

df = pd.read_csv("TitanSales.csv" , index_col="SALESPERSON")
df

SALESPERSON,Old Scheme (in thousands),New Scheme (in thousands)
1,57,62
2,103,122
3,59,54
4,75,82
5,84,84
6,73,86
7,35,32
8,110,104
9,44,38
10,82,107


In [67]:
df.columns

Index(['Old Scheme (in thousands)', 'New Scheme (in thousands)'], dtype='object')

In [68]:
df.dtypes

Old Scheme (in thousands)    int64
New Scheme (in thousands)    int64
dtype: object

In [69]:
oldscheme = df["Old Scheme (in thousands)"]
newscheme = df["New Scheme (in thousands)"]


Find the mean of the old scheme and new scheme columns. (5 points)

In [70]:
oldmean = oldscheme.mean()
print("Mean of Old Scheme is " , oldmean)

Mean of Old Scheme is  68.03333333333333


In [71]:
newmean = newscheme.mean()
print("Mean of New Scheme is " , newmean)

Mean of New Scheme is  72.03333333333333
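Because the same 30 salespeople are measured twice, the difference of the two means equals the mean of the per-salesperson differences, which is the quantity the paired tests work with. A quick check with the Table 1 values entered inline (a sketch, not part of the original notebook):

```python
import numpy as np

# Table 1 outputs in £ thousands (salespeople 1-30).
old = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82, 67, 64, 78, 53, 41,
                39, 80, 87, 73, 65, 28, 62, 49, 84, 63, 77, 67, 101, 91, 50])
new = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107, 84, 85, 99, 39, 34,
                58, 73, 53, 66, 78, 41, 71, 38, 95, 81, 58, 75, 94, 100, 68])

# Difference of the two column means ...
mean_gap = new.mean() - old.mean()
# ... equals the mean of the per-salesperson differences New - Old.
diffs = new - old
print(f"{mean_gap:.4f} {diffs.mean():.4f}")  # both 4.0000, i.e. £4000
```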


In [72]:
# Shapiro-Wilk test of normality (H0: the data come from a normal distribution)
from scipy.stats import ttest_1samp, ttest_ind, mannwhitneyu, levene, shapiro, ttest_rel, wilcoxon
from statsmodels.stats.power import ttest_power

w_stat, p_value = shapiro(oldscheme)
print(p_value)

0.9813658595085144


In [73]:
w_stat, p_value = shapiro(newscheme)
print(p_value)

0.5057420134544373


###### The p-values for both oldscheme and newscheme are greater than 0.05, so we do not reject the null hypothesis of normality: the data in both columns are consistent with a normal distribution.
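For a paired t-test, the normality assumption applies to the differences (New - Old) rather than to each column separately. A minimal sketch of that check, with the Table 1 values typed in directly; the printed p-value is read the same way as the two above.

```python
import numpy as np
from scipy.stats import shapiro

# Table 1 outputs in £ thousands (salespeople 1-30).
old = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82, 67, 64, 78, 53, 41,
                39, 80, 87, 73, 65, 28, 62, 49, 84, 63, 77, 67, 101, 91, 50])
new = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107, 84, 85, 99, 39, 34,
                58, 73, 53, 66, 78, 41, 71, 38, 95, 81, 58, 75, 94, 100, 68])

# The paired t-test assumes the differences are roughly normal,
# so Shapiro-Wilk is applied to them (H0: normally distributed).
diffs = new - old
w_stat, p_value = shapiro(diffs)
print(f"W = {w_stat:.4f}, p = {p_value:.4f}")
```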

Use a test at the five percent significance level to determine the p-value and check whether the new scheme has significantly raised outputs. (10 points)

Null Hypothesis: The new scheme has not raised mean output (mean of New - Old ≤ 0)

Alternate Hypothesis: The new scheme has raised mean output (mean of New - Old > 0)
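For the one-sided alternative that the new scheme raised mean output, the paired t statistic can be worked by hand from the n = 30 differences d_i = New_i - Old_i in Table 1 (in £ thousands; the totals 120 and 5750 are computed from the table):

$$
\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{120}{30} = 4, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar d\right)^2} = \sqrt{\frac{5750}{29}} \approx 14.08
$$

$$
t = \frac{\bar d}{s_d/\sqrt{n}} \approx \frac{4}{14.08/\sqrt{30}} \approx 1.556, \qquad \nu = n - 1 = 29
$$

With 29 degrees of freedom this gives a one-sided p-value of about 0.065. `ttest_rel(oldscheme, newscheme)` differences Old - New, so it reports the same statistic with a negative sign.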

In [74]:
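t_statistic, p_value = ttest_rel(oldscheme , newscheme)
print(p_value/2)
# ttest_rel returns a two-sided p-value; halving it gives the one-sided p-value for
# H1: new > old only because t < 0 here (ttest_rel differences old - new)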

0.06528776980668831
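With SciPy 1.6 or later, `ttest_rel` accepts an `alternative` argument, so the one-sided test can be requested directly instead of halving the two-sided p-value (halving is only valid when t has the sign the alternative predicts). A sketch with the Table 1 data inline:

```python
import numpy as np
from scipy.stats import ttest_rel

# Table 1 outputs in £ thousands (salespeople 1-30).
old = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82, 67, 64, 78, 53, 41,
                39, 80, 87, 73, 65, 28, 62, 49, 84, 63, 77, 67, 101, 91, 50])
new = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107, 84, 85, 99, 39, 34,
                58, 73, 53, 66, 78, 41, 71, 38, 95, 81, 58, 75, 94, 100, 68])

# alternative="greater" tests H1: mean(new - old) > 0 directly,
# so no halving or sign check on a two-sided p-value is needed.
t_statistic, p_value = ttest_rel(new, old, alternative="greater")
print(f"t = {t_statistic:.3f}, one-sided p = {p_value:.4f}")
```

Passing `new` first makes the differences New - Old, so a positive t points toward the alternative.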


##### The one-sided p-value (0.065) is greater than 0.05, so we fail to reject the null hypothesis. At the 5% significance level there is not enough evidence that the new scheme has raised mean output.

Suppose it has been calculated that, for Titan to break even, the average output must increase by £5000 under the new scheme compared with the old scheme. If this figure is taken as the alternative hypothesis, what is:

a) The probability of a Type I error? (2.5 points)
b) The p-value of the hypothesis test if we test for a difference of £5000? (10 points)
c) The power of the test? (5 points)

Q> Probability of a Type I error
##### The probability of a Type I error is the significance level, α = 0.05.
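To make α concrete, a small simulation sketch: it generates many samples of 30 paired differences for which H0 is exactly true and counts how often the one-sided t-test at the 5% level rejects. The normal distribution, the standard deviation of 14 (close to the s.d. of the Titan differences) and the seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(seed=0)
n, n_sims, alpha = 30, 4000, 0.05

# Assumption: differences are normal with mean 0 (so H0 is true) and s.d. 14.
samples = rng.normal(loc=0.0, scale=14.0, size=(n_sims, n))

# One-sided t-test of H1: mean > 0, run on every simulated sample (row) at once.
result = ttest_1samp(samples, popmean=0.0, axis=1, alternative="greater")

# Share of samples in which the true H0 is rejected at the 5% level.
type1_rate = np.mean(result.pvalue < alpha)
print(f"Estimated Type I error rate: {type1_rate:.3f}")
```

The estimate should land close to 0.05, with the small gap being Monte Carlo noise.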

Q> What is the p-value of the hypothesis test if we test for a difference of £5000? (10 points)

M2 = Mean output (total sum assured) per salesperson AFTER the changeover.
M1 = Mean output (total sum assured) per salesperson BEFORE the changeover.

MD = M2 - M1
Null Hypothesis: MD ≤ £5000 (5 in the data's units of £ thousands)
Alternate Hypothesis: MD > £5000
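Working in £ thousands, the test statistic for H0: MD ≤ 5 can be checked by hand from the Table 1 differences (mean 4, sample s.d. ≈ 14.08):

$$
t = \frac{\bar d - 5}{s_d/\sqrt{n}} \approx \frac{4 - 5}{14.08/\sqrt{30}} \approx -0.389, \qquad \nu = 29
$$

$$
p = P(T_{29} \ge -0.389) = 1 - P(T_{29} \ge 0.389) \approx 1 - 0.350 = 0.650
$$

Because t is negative, the upper-tail p-value for MD > 5 is the complement of the halved two-sided value.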



In [75]:
m1 = oldmean
m2 = newmean
df['Difference'] = newscheme - oldscheme
diff = df["Difference"]
md = diff.mean()
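t_statistic, p_value = ttest_1samp(diff, 5)   # 5 = £5000, as the data are in thousands
print("P Value is " ,p_value/2)
# halving the two-sided p-value gives P(T >= |t|); because t < 0 here, that is the
# one-sided p-value for MD < 5, and the p-value for H1: MD > 5 is 1 - p_value/2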

P Value is  0.3500667456306643
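A sketch of the same test with the one-sided alternative stated explicitly (SciPy 1.6 or later), using the Table 1 differences inline. It returns the upper-tail p-value for MD > 5, which is 1 - 0.350 ≈ 0.650:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Table 1 outputs in £ thousands (salespeople 1-30).
old = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82, 67, 64, 78, 53, 41,
                39, 80, 87, 73, 65, 28, 62, 49, 84, 63, 77, 67, 101, 91, 50])
new = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107, 84, 85, 99, 39, 34,
                58, 73, 53, 66, 78, 41, 71, 38, 95, 81, 58, 75, 94, 100, 68])
diffs = new - old  # in £ thousands

# H0: MD <= 5 vs H1: MD > 5 (5 = £5000), requesting the upper tail explicitly.
t_statistic, p_value = ttest_1samp(diffs, popmean=5, alternative="greater")
print(f"t = {t_statistic:.3f}, one-sided p = {p_value:.3f}")
```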


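##### The halved p-value printed above (0.350) is the one-sided p-value for MD < £5000, because the t statistic is negative (the observed mean increase, £4000, is below £5000). For the stated alternative MD > £5000, the one-sided p-value is 1 - 0.350 ≈ 0.650. Either way p > 0.05, so we do not reject the null hypothesis: the data give no evidence that the mean increase in output exceeds the £5000 Titan needs to break even.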

Q> Power of the test (5 points)

In [76]:
from statsmodels.stats.power import ttest_power


In [77]:
effect_size2 = ((newmean-oldmean) - 5) / diff.std()
effect_size2
# 5 is the hypothesised mean difference (£5000)
# newmean - oldmean (= 4) is the observed mean difference (£4000)

-0.07101745039685363

In [78]:
nobs = diff.count()
alpha = 0.05
# effect_size2 rounded to two decimals (-0.07)
print("Power of Test is : " , ttest_power(round(effect_size2, 2), nobs=nobs, alpha=alpha, alternative='larger'))

Power of Test is :  0.021720239625820925
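`ttest_power` computes power from the noncentral t distribution. The sketch below reproduces that calculation by hand for the effect size of -0.07 used above, then applies it to one common textbook reading of part (c): H0 of no mean change against the break-even alternative of a £5000 increase (5 in data units), standardised by the sample s.d. of the differences. That reading is an assumption about the question's intent, not the method used above, and `one_sided_power` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import ttest_power


def one_sided_power(effect_size, nobs, alpha):
    """Hypothetical helper: power of a one-sided ('larger') one-sample or
    paired t-test, computed from the noncentral t distribution."""
    df = nobs - 1
    t_crit = stats.t.isf(alpha, df)              # reject H0 when T > t_crit
    noncentrality = effect_size * np.sqrt(nobs)
    return stats.nct.sf(t_crit, df, noncentrality)


# 1) Reproduce the calculation above (effect size -0.07, n = 30).
manual = one_sided_power(-0.07, nobs=30, alpha=0.05)
library = ttest_power(-0.07, nobs=30, alpha=0.05, alternative="larger")

# 2) Assumed reading of part (c): H0 mean difference 0 vs
#    H1 mean difference 5 (= £5000), standardised by the sample s.d.
old = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82, 67, 64, 78, 53, 41,
                39, 80, 87, 73, 65, 28, 62, 49, 84, 63, 77, 67, 101, 91, 50])
new = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107, 84, 85, 99, 39, 34,
                58, 73, 53, 66, 78, 41, 71, 38, 95, 81, 58, 75, 94, 100, 68])
diffs = new - old
effect_break_even = 5 / diffs.std(ddof=1)
power_break_even = one_sided_power(effect_break_even, nobs=len(diffs), alpha=0.05)

print(f"manual = {manual:.4f}, statsmodels = {library:.4f}")
print(f"effect size = {effect_break_even:.3f}, power = {power_break_even:.3f}")
```

Under this reading, power is the probability that the one-sided paired test at α = 0.05 detects a true mean increase of £5000 with 30 salespeople.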
