# Case Study on Testing of Hypothesis

Background:
    
A company has started investing in digital marketing as a new way to promote its products. It collected sales data and decided to carry out a study on it.

In [56]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy.stats import ttest_rel
from scipy.stats import ttest_ind
import warnings
warnings.filterwarnings("ignore")

In [57]:
# Reading the data
ReadFile = pd.read_csv('Sales_add.csv')
ReadFile

Unnamed: 0,Month,Region,Manager,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Month-1,Region - A,Manager - A,132921,270390
1,Month-2,Region - A,Manager - C,149559,223334
2,Month-3,Region - B,Manager - A,146278,244243
3,Month-4,Region - B,Manager - B,152167,231808
4,Month-5,Region - C,Manager - B,159525,258402
5,Month-6,Region - A,Manager - B,137163,256948
6,Month-7,Region - C,Manager - C,130625,222106
7,Month-8,Region - A,Manager - A,131140,230637
8,Month-9,Region - B,Manager - C,171259,226261
9,Month-10,Region - C,Manager - B,141956,193735


In [58]:
# Check for Null values
ReadFile.isnull().sum()

Month                             0
Region                            0
Manager                           0
Sales_before_digital_add(in $)    0
Sales_After_digital_add(in $)     0
dtype: int64

# 1. The company wishes to clarify whether there is any increase in sales after stepping into digital marketing

Hypothesis:
The company wishes to clarify whether there is any increase in sales after stepping into digital marketing.
    
H0: Mean sales after stepping into digital marketing are not higher than mean sales before.
    
Ha: Mean sales after stepping into digital marketing are higher than mean sales before.

In [59]:
# Set alpha value as 0.05
alpha=.05

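# Perform paired/dependent t-test to check whether there is an increase in sales
# Note: ttest_rel returns a two-sided p-value by default; with the argument order
# (before, after), a negative t-score means mean sales before < mean sales after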
BSales = ReadFile['Sales_before_digital_add(in $)']
ASales = ReadFile['Sales_After_digital_add(in $)']

t_score,p_value = ttest_rel(BSales, ASales)
print('Results are:', 'T-Score:', t_score, '  P Value:', p_value)

Results are: T-Score: -12.09070525287017   P Value: 6.336667004575778e-11


Inference:

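The reported P-Value (6.336667004575778e-11) is two-sided. Because the T-Score is negative (mean sales before are lower than mean sales after), the one-sided P-Value for Ha is half of it, about 3.17e-11. This is far below alpha (0.05), so we reject the null hypothesis (H0) in favor of the alternate hypothesis (Ha).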

Conclusion: 

According to hypothesis testing we can conclude that there is a significant increase in sales after stepping into digital marketing.
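
To make the mechanics of the paired test concrete, here is a minimal sketch that computes the paired t-statistic by hand using only the standard library. It assumes just the ten rows displayed in the table above, so its values will differ from the notebook's output, which comes from the full file. The function name `paired_t_statistic` is illustrative and not part of the original notebook.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on equal-length samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The paired test works on per-pair differences (same order as ttest_rel(before, after))
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Assumption: only the ten rows displayed above, copied from the table
before = [132921, 149559, 146278, 152167, 159525,
          137163, 130625, 131140, 171259, 141956]
after = [270390, 223334, 244243, 231808, 258402,
         256948, 222106, 230637, 226261, 193735]

t_stat, dof = paired_t_statistic(before, after)
print("t =", t_stat, "df =", dof)
```

A negative t-statistic here carries the same meaning as in the notebook: sales before are lower on average than sales after.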

# 2. The company needs to check whether there is any dependency between the features “Region” and “Manager”

Hypothesis: The company needs to check whether there is any dependency between the features “Region” and “Manager”.

H0: There is no dependency between the features "Region" and "Manager".

Ha: There is a dependency between the features "Region" and "Manager".

In [60]:
# The categorical variables here are "Region" and "Manager"
# To compute the Chi-square test statistic, we need a contingency table,
# built with the crosstab function from pandas (margins=True adds the "All" totals)
ReadFile1 = pd.crosstab(ReadFile.Region, ReadFile.Manager, margins=True)
ReadFile1

Region,Manager - A,Manager - B,Manager - C,All
Region - A,4,3,3,10
Region - B,4,1,2,7
Region - C,1,3,1,5
All,9,7,6,22


In [61]:
# Observed values from the contingency table
observed_values = ReadFile1.values
print('Observed values:-\n', observed_values)

Observed values:-
 [[ 4  3  3 10]
 [ 4  1  2  7]
 [ 1  3  1  5]
 [ 9  7  6 22]]


In [62]:
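# Chi-Square test statistic
# Note: ReadFile1 includes the "All" margins, so scipy treats it as a 4x4 table.
# The statistic is unaffected (in the margin cells observed equals expected), but the
# degrees of freedom (9) and p-value returned here are not those of the 3x3 test.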
value = stats.chi2_contingency(ReadFile1)
value

(3.050566893424036,
 0.962256341757093,
 9,
 array([[ 4.09090909,  3.18181818,  2.72727273, 10.        ],
        [ 2.86363636,  2.22727273,  1.90909091,  7.        ],
        [ 2.04545455,  1.59090909,  1.36363636,  5.        ],
        [ 9.        ,  7.        ,  6.        , 22.        ]]))

In [63]:
# Expected values
expected_value = value[3]
expected_value

array([[ 4.09090909,  3.18181818,  2.72727273, 10.        ],
       [ 2.86363636,  2.22727273,  1.90909091,  7.        ],
       [ 2.04545455,  1.59090909,  1.36363636,  5.        ],
       [ 9.        ,  7.        ,  6.        , 22.        ]])
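
As a rough illustration of where these expected values come from, the sketch below rebuilds them from the row and column totals using only the standard library. The counts are copied from the contingency table above with the "All" margins left out; the variable names are illustrative.

```python
# Observed 3x3 counts copied from the contingency table (margins excluded)
observed = [
    [4, 3, 3],  # Region - A
    [4, 1, 2],  # Region - B
    [1, 3, 1],  # Region - C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence: expected = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(v, 4) for v in row])
```

Each expected count depends only on the margins, which is why the "All" row and column of the array above reproduce the observed totals exactly.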

In [64]:
# Calculating degrees of freedom from the 3x3 table (the "All" margins are excluded)
alpha=0.05
from scipy.stats import chi2

no_of_rows = len(ReadFile1.iloc[0:3,0])
no_of_columns = len(ReadFile1.iloc[0,0:3])
ddof = (no_of_rows - 1) * (no_of_columns - 1)
print('Degrees of Freedom:', ddof)

Degrees of Freedom: 4


In [67]:
# Calculating Chi-Square Statistics and critical value
# chi_square holds one partial sum of (O - E)^2 / E per column of the table
chi_square = sum([(o - e)**2./e for o, e in zip(observed_values, expected_value)])
# Add all three manager columns; the "All" column contributes zero
chi_square_statistics = chi_square[0] + chi_square[1] + chi_square[2]
print('Chi-Square statistics:', round(chi_square_statistics, 4))

critical_value = chi2.ppf(q = 1 - alpha, df = ddof)
print('Critical_value:', critical_value)

Chi-Square statistics: 3.0506
Critical_value: 9.487729036781154


In [68]:
# Calculating P-Value
p_value = 1 - chi2.cdf(x = chi_square_statistics, df = ddof)
print('P-Value:', round(p_value, 4))
print('Significance Level:', alpha)
print('Degrees of freedom:', ddof)

P-Value: 0.5494
Significance Level: 0.05
Degrees of freedom: 4


Inference:

Since the P-Value (0.5494) is greater than alpha (0.05), and the Chi-Square statistic (3.0506) is below the critical value (9.4877), we fail to reject the null hypothesis (H0).

Conclusion: 

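According to our hypothesis testing, we find no evidence of a dependency between "Region" and "Manager". Note that every expected count in the 3x3 table is below 5, so the chi-square approximation is rough for a sample of this size.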
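
For readers who want to check the test end to end without scipy, here is a minimal standard-library sketch under stated assumptions. It takes the 3x3 counts from the contingency table (margins excluded) and uses a closed-form upper-tail formula that is valid only for an even number of degrees of freedom, which holds here (4). The helper names `chi_square_independence` and `chi2_sf_even_dof` are hypothetical and not part of the notebook.

```python
import math

def chi_square_independence(observed):
    """Return (statistic, dof) for a table of observed counts without margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence for every cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-square probability; this closed form needs an even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form used here requires a positive even dof")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))

observed = [[4, 3, 3], [4, 1, 2], [1, 3, 1]]
statistic, dof = chi_square_independence(observed)
p_value = chi2_sf_even_dof(statistic, dof)
print("statistic:", round(statistic, 4), "dof:", dof, "p-value:", round(p_value, 4))
```

The statistic comes out at about 3.05, in agreement with the first value returned by `stats.chi2_contingency` above, and the p-value stays well above alpha (0.05).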

                                                    -- The End -- 