# Case Study on Testing of Hypothesis
A company has started investing in digital marketing as a new way to promote its products.
It collected sales data for this purpose and decided to carry out a study on it.

● The company wishes to clarify whether there is any increase in sales after
stepping into digital marketing.

● The company needs to check whether there is any dependency between the
features “Region” and “Manager”.

Help the company carry out its study using the data provided.



In [1]:
# import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from scipy.stats import ttest_ind
from scipy.stats import chi2_contingency

In [2]:
# Read the dataset into a pandas DataFrame
df = pd.read_csv('Sales_add.csv')
df

,Month,Region,Manager,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Month-1,Region - A,Manager - A,132921,270390
1,Month-2,Region - A,Manager - C,149559,223334
2,Month-3,Region - B,Manager - A,146278,244243
3,Month-4,Region - B,Manager - B,152167,231808
4,Month-5,Region - C,Manager - B,159525,258402
5,Month-6,Region - A,Manager - B,137163,256948
6,Month-7,Region - C,Manager - C,130625,222106
7,Month-8,Region - A,Manager - A,131140,230637
8,Month-9,Region - B,Manager - C,171259,226261
9,Month-10,Region - C,Manager - B,141956,193735


# Descriptive Analytics

In [3]:
df.shape

(22, 5)

The dataset has 22 rows and 5 columns.

In [4]:
df.describe()

,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
count,22.0,22.0
mean,149239.954545,231123.727273
std,14844.042921,25556.777061
min,130263.0,187305.0
25%,138087.75,214960.75
50%,147444.0,229986.5
75%,157627.5,250909.0
max,178939.0,276279.0


The sample mean of monthly sales rose from $149,239.95 to $231,123.73 after the introduction of digital marketing. Whether this increase is statistically significant is tested next.
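As a quick arithmetic check of the figures above, the relative increase in mean sales can be computed from the two means reported by `df.describe()`. This is only a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
# Means copied from the df.describe() output above
mean_before = 149239.954545
mean_after = 231123.727273

# Absolute and relative change in mean monthly sales
abs_increase = mean_after - mean_before
pct_increase = 100 * abs_increase / mean_before

print(f"Absolute increase: ${abs_increase:,.2f}")
print(f"Relative increase: {pct_increase:.1f}%")
```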

# To clarify whether there is any increase in sales after stepping into digital marketing.

H0 : Mean sales after digital marketing are less than or equal to mean sales before digital marketing.

Ha : Mean sales after digital marketing are greater than mean sales before digital marketing.
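In symbols, the hypotheses and the pooled two-sample statistic that `ttest_ind(a, b)` computes by default (`equal_var=True`) can be written as below. The symbols are introduced here for illustration: sample 1 is the first argument, sample 2 the second, and the statistic has $n_1 + n_2 - 2$ degrees of freedom.

```latex
H_0:\ \mu_{\text{after}} \le \mu_{\text{before}}, \qquad
H_a:\ \mu_{\text{after}} > \mu_{\text{before}}

t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}}, \qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```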


In [5]:
# Double brackets select one-column DataFrames, so ttest_ind returns p as a length-1 array
sales_before = df[['Sales_before_digital_add(in $)']]
sales_after = df[['Sales_After_digital_add(in $)']]

In [6]:
alpha = 0.05
# Independent two-sample t-test; the p-value returned is two-sided
_, p = ttest_ind(sales_before, sales_after)
print("p value :", p)

p value : [2.61436801e-16]


In [7]:
if p < alpha:
    print("Rejecting H0")
else:
    print("Failing to reject H0")

Rejecting H0
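The two-sided p-value reported above can be cross-checked from the summary statistics alone, and the same call can produce the one-sided p-value that matches Ha. The sketch below assumes SciPy 1.6 or later (for the `alternative` argument) and uses the means and standard deviations from `df.describe()`; the variable names are illustrative.

```python
from scipy.stats import ttest_ind_from_stats

n = 22  # number of months, from df.shape
mean_before, std_before = 149239.954545, 14844.042921
mean_after, std_after = 231123.727273, 25556.777061

# Two-sided pooled test, matching the default behaviour of ttest_ind
t_two, p_two = ttest_ind_from_stats(
    mean_before, std_before, n, mean_after, std_after, n
)

# One-sided test of Ha: mean sales before < mean sales after
t_one, p_one = ttest_ind_from_stats(
    mean_before, std_before, n, mean_after, std_after, n, alternative="less"
)

print("two-sided p:", p_two)
print("one-sided p:", p_one)
```

Because the observed difference lies in the direction of Ha, the one-sided p-value is half the two-sided one.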


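The t-test rejects H0 at the 5% level, so we conclude that sales increased after stepping into digital marketing. The p-value printed above is two-sided; because mean sales after are higher than mean sales before, the one-sided p-value for Ha is half of it and is still far below alpha. Since each row records both figures for the same month, the two samples are arguably paired rather than independent, so a paired t-test would be a natural alternative.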
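Each row holds a before and an after figure for the same month, so a paired, one-sided test is one way to match Ha more closely. The sketch below is an illustration rather than the notebook's method: it uses `scipy.stats.ttest_rel` with `alternative="greater"` (SciPy 1.6 or later) and only the ten rows displayed in the table above, since the full 22-row file is not reproduced here.

```python
import numpy as np
from scipy.stats import ttest_rel

# Only the ten rows shown in the table above; the full dataset has 22 months
before = np.array([132921, 149559, 146278, 152167, 159525,
                   137163, 130625, 131140, 171259, 141956])
after = np.array([270390, 223334, 244243, 231808, 258402,
                  256948, 222106, 230637, 226261, 193735])

# Paired, one-sided test of Ha: mean(after - before) > 0
t_stat, p_value = ttest_rel(after, before, alternative="greater")

print("t statistic:", t_stat)
print("one-sided p value:", p_value)
```

With the full data, the same call would take the two sales columns of `df` as one-dimensional Series.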

# To check whether there is any dependency between the features “Region” and “Manager”.

H0 : The features “Region” and “Manager” are independent.

Ha : The features “Region” and “Manager” are dependent.
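For reference, the chi-square test of independence compares observed cell counts $O_{ij}$ with the counts $E_{ij}$ expected under H0. The notation below is introduced for illustration: $R_i$ and $C_j$ are row and column totals, $N$ is the number of observations, and $r$ and $c$ are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i\, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{dof} = (r - 1)(c - 1)
```

For a table with three regions and three managers this gives $(3 - 1)(3 - 1) = 4$ degrees of freedom.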

In [8]:
alpha = 0.05
# normalize='index' converts counts to row proportions (each region's row sums to 1)
crosstab = pd.crosstab(df['Region'], df['Manager'], normalize='index')
crosstab

Region,Manager - A,Manager - B,Manager - C
Region - A,0.4,0.3,0.3
Region - B,0.571429,0.142857,0.285714
Region - C,0.2,0.6,0.2


In [10]:
c, p, dof, expected = chi2_contingency(crosstab)
print('chi square statistic:',c)
print('pvalue:',p)
print('degrees of freedom:',dof)
print('expected value: ', expected)

chi square statistic: 0.5097129666190809
pvalue: 0.9725485584250712
degrees of freedom: 4
expected value:  [[0.39047619 0.34761905 0.26190476]
 [0.39047619 0.34761905 0.26190476]
 [0.39047619 0.34761905 0.26190476]]


In [11]:
if p < alpha:
    print("Rejecting H0")
else:
    print("Failing to reject H0")

Failing to reject H0


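Here the p-value (0.97) is greater than alpha (0.05), so we fail to reject H0: the data provide no evidence that "Region" and "Manager" are dependent. Note that the table passed to `chi2_contingency` holds row proportions (`normalize='index'`), whereas the test is defined for observed counts, so the statistic and p-value above should be read with caution.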
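`chi2_contingency` is defined for observed frequencies rather than proportions, so the test can be rerun on counts. The sketch below reconstructs the counts from the row proportions shown earlier, assuming the 22 rows split into 10, 7 and 5 per region (the only split consistent with those proportions). The array and variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts reconstructed from the row proportions (assumed row totals: 10, 7, 5)
# Rows: Region A, B, C; columns: Manager A, B, C
observed = np.array([[4, 3, 3],
                     [4, 1, 2],
                     [1, 3, 1]])

chi2, p, dof, expected = chi2_contingency(observed)

print("chi square statistic:", chi2)
print("pvalue:", p)
print("degrees of freedom:", dof)

# Rule-of-thumb check: the approximation is weak when expected counts are below 5
print("cells with expected count < 5:", int((expected < 5).sum()), "of", expected.size)

alpha = 0.05
print("Rejecting H0" if p < alpha else "Failing to reject H0")
```

With the full data, the same count table should come directly from `pd.crosstab(df['Region'], df['Manager'])` without the `normalize` argument. Because every expected count here is small, an exact or simulation-based test may be worth considering.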