# XYZ Company has offices in four different zones. The company wishes to investigate the following:

> The mean sales generated by each zone.

> Total sales generated by all the zones for each month.

> Check whether all the zones generate the same amount of sales.

Help the company carry out this study using the data provided.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.stats import weightstats as stests

In [4]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
dataset = pd.read_csv(r'D:\Sibina\ICT academy\Case Studies\Case Study#05\Sales_data_zone_wise.csv')
dataset.head()

Unnamed: 0,Month,Zone - A,Zone - B,Zone - C,Zone - D
0,Month - 1,1483525,1748451,1523308,2267260
1,Month - 2,1238428,1707421,2212113,1994341
2,Month - 3,1860771,2091194,1282374,1241600
3,Month - 4,1871571,1759617,2290580,2252681
4,Month - 5,1244922,1606010,1818334,1326062


In [5]:
dataset.shape

(29, 5)

In [6]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Month     29 non-null     object
 1   Zone - A  29 non-null     int64 
 2   Zone - B  29 non-null     int64 
 3   Zone - C  29 non-null     int64 
 4   Zone - D  29 non-null     int64 
dtypes: int64(4), object(1)
memory usage: 1.3+ KB


# 1. The mean sales generated by each zone.

In [7]:
d=dataset.describe()
d[1:2]

Unnamed: 0,Zone - A,Zone - B,Zone - C,Zone - D
mean,1540493.0,1755560.0,1772871.0,1842927.0


Inference:
Zone - A has the lowest mean monthly sales (about 1.54 million), roughly 215,000 below the next-lowest zone, Zone - B.

Zone - D has the highest mean monthly sales (about 1.84 million).
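The per-zone means can also be computed directly, without going through `describe()`. The following is a minimal sketch, assuming the same column names as above. The `sample` frame is an illustrative subset holding only the five months shown by `dataset.head()`, so its means differ from the 29-month figures; on the full data, `dataset` would take its place.

```python
import pandas as pd

# First five months copied from the dataset.head() output (illustrative subset only)
sample = pd.DataFrame({
    "Month": ["Month - 1", "Month - 2", "Month - 3", "Month - 4", "Month - 5"],
    "Zone - A": [1483525, 1238428, 1860771, 1871571, 1244922],
    "Zone - B": [1748451, 1707421, 2091194, 1759617, 1606010],
    "Zone - C": [1523308, 2212113, 1282374, 2290580, 1818334],
    "Zone - D": [2267260, 1994341, 1241600, 2252681, 1326062],
})

zone_cols = ["Zone - A", "Zone - B", "Zone - C", "Zone - D"]

# Column-wise mean of the zone columns only (the Month column is excluded)
zone_means = sample[zone_cols].mean()

# Labels of the zones with the lowest and highest mean in this subset
lowest_zone = zone_means.idxmin()
highest_zone = zone_means.idxmax()

print(zone_means)
print("Lowest:", lowest_zone, "| Highest:", highest_zone)
```

Selecting the zone columns explicitly avoids depending on how pandas treats the non-numeric `Month` column.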

# 2. Total sales generated by all the zones for each month.

In [8]:
dataset["Monthly Sales(Sum of all zones)"] = dataset['Zone - A']+dataset['Zone - B']+dataset['Zone - C']+dataset['Zone - D']
dataset

Unnamed: 0,Month,Zone - A,Zone - B,Zone - C,Zone - D,Monthly Sales(Sum of all zones)
0,Month - 1,1483525,1748451,1523308,2267260,7022544
1,Month - 2,1238428,1707421,2212113,1994341,7152303
2,Month - 3,1860771,2091194,1282374,1241600,6475939
3,Month - 4,1871571,1759617,2290580,2252681,8174449
4,Month - 5,1244922,1606010,1818334,1326062,5995328
5,Month - 6,1534390,1573128,1751825,2292044,7151387
6,Month - 7,1820196,1992031,1786826,1688055,7287108
7,Month - 8,1625696,1665534,2161754,2363315,7816299
8,Month - 9,1652644,1873402,1755290,1422059,6703395
9,Month - 10,1852450,1913059,1754314,1608387,7128210
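The same monthly total can be written as a row-wise sum, which keeps working if a zone column is added later. This is a sketch under the assumption that every zone column name starts with `Zone`; the `sample` frame holds only the first three months from the table above.

```python
import pandas as pd

# First three months copied from the table above (illustrative subset only)
sample = pd.DataFrame({
    "Month": ["Month - 1", "Month - 2", "Month - 3"],
    "Zone - A": [1483525, 1238428, 1860771],
    "Zone - B": [1748451, 1707421, 2091194],
    "Zone - C": [1523308, 2212113, 1282374],
    "Zone - D": [2267260, 1994341, 1241600],
})

# Pick up every zone column by name instead of listing each one by hand
zone_cols = [col for col in sample.columns if col.startswith("Zone")]

# axis=1 sums across the columns of each row, giving one total per month
sample["Monthly Sales(Sum of all zones)"] = sample[zone_cols].sum(axis=1)

print(sample[["Month", "Monthly Sales(Sum of all zones)"]])
```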


# 3. Check whether all the zones generate the same amount of sales.

To check this, a one-way ANOVA test can be used.

Null hypothesis (Ho): The mean sales are the same in all four zones.

Alternative hypothesis (Ha): The mean sales of at least one zone differ from the others.

Let the significance level be 5%.
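For reference, the one-way ANOVA compares the variability between the zone means with the variability within each zone. With $k = 4$ zones and $n_i = 29$ monthly observations per zone (so $N = 116$ observations in total), the standard form of the test statistic is:

```latex
F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  = \frac{\displaystyle\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2 \,/\, (k - 1)}
         {\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \,/\, (N - k)}
```

Here $\bar{x}_i$ is the mean of zone $i$ and $\bar{x}$ is the grand mean. The numerator has $k - 1 = 3$ degrees of freedom and the denominator has $N - k = 112$. Under Ho the statistic follows an $F(3, 112)$ distribution, and Ho is rejected at the 5% level when $F$ exceeds the upper 5% point of that distribution.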

In [10]:
alpha = 0.05
k = 4                  # number of zones (groups)
n = len(dataset)       # observations per zone (29 months)
dfn = k - 1            # between-groups degrees of freedom
dfd = k * n - k        # within-groups degrees of freedom: N - k = 116 - 4 = 112
# Upper-tail critical value: Ho is rejected when F exceeds the (1 - alpha) quantile
critical_value = stats.f.ppf(1 - alpha, dfn, dfd)

In [11]:
statistic, pvalue = stats.f_oneway(dataset['Zone - A'], dataset['Zone - B'], dataset['Zone - C'], dataset['Zone - D'])
if pvalue < alpha:
    print('We Reject Null Hypothesis(Ho); The Zones generate different amount of sales')
else:
    print('We fail to reject Null Hypothesis(Ho); no evidence that the Zones generate different amount of sales')

We Reject Null Hypothesis(Ho); The Zones generate different amount of sales


In [12]:
print(f'Critical value(F)= {critical_value:.2f}')
print('Test Statistic Value(F)=', statistic)
print("Level of Significance=", alpha)
print('p_value=', pvalue)

Critical value(F)= 2.69
Test Statistic Value(F)= 5.672056106843581
Level of Significance= 0.05
p_value= 0.0011827601694503335
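To show where the test statistic and its degrees of freedom come from, the F value can be computed by hand. The sketch below uses only the standard library; `one_way_f` is a hypothetical helper written for illustration, not a SciPy function, and the data are the five months shown by `dataset.head()`, so the resulting F differs from the 29-month value reported above.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, dfn, dfd) for a one-way ANOVA over a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    dfn = k - 1          # between-groups degrees of freedom
    dfd = n_total - k    # within-groups degrees of freedom
    return (ss_between / dfn) / (ss_within / dfd), dfn, dfd


# First five months per zone, copied from the dataset.head() output (illustrative subset)
zones = [
    [1483525, 1238428, 1860771, 1871571, 1244922],  # Zone - A
    [1748451, 1707421, 2091194, 1759617, 1606010],  # Zone - B
    [1523308, 2212113, 1282374, 2290580, 1818334],  # Zone - C
    [2267260, 1994341, 1241600, 2252681, 1326062],  # Zone - D
]

f_stat, dfn, dfd = one_way_f(zones)
print("F =", f_stat, "with degrees of freedom", (dfn, dfd))
```

Comparing the statistic with the upper-tail critical value and comparing the p-value with the significance level are equivalent decision rules, so both should lead to the same conclusion.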


In [13]:
data2=pd.DataFrame(dataset.drop(['Month','Monthly Sales(Sum of all zones)'],axis=1).mean())
data2.columns=['Mean Sales']
data2

Unnamed: 0,Mean Sales
Zone - A,1540493.0
Zone - B,1755560.0
Zone - C,1772871.0
Zone - D,1842927.0


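Inference: Since the p-value (about 0.0012) is below the 5% significance level, the null hypothesis is rejected: the mean sales of the four zones are not all equal. The ANOVA result alone does not identify which zones differ; the table above shows Zone - A with the lowest mean and Zone - D with the highest.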
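A significant ANOVA result says that at least one zone mean differs, not which one. One possible follow-up, which is not part of the analysis above, is a pairwise comparison such as Tukey's HSD. The sketch below assumes SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available, and uses only the five months shown by `dataset.head()`, so its output says nothing about the full 29-month data.

```python
from scipy import stats

# First five months per zone, copied from the dataset.head() output (illustrative subset)
zone_a = [1483525, 1238428, 1860771, 1871571, 1244922]
zone_b = [1748451, 1707421, 2091194, 1759617, 1606010]
zone_c = [1523308, 2212113, 1282374, 2290580, 1818334]
zone_d = [2267260, 1994341, 1241600, 2252681, 1326062]

# Pairwise comparison of all zone means with family-wise error control
result = stats.tukey_hsd(zone_a, zone_b, zone_c, zone_d)

# result.pvalue[i, j] is the adjusted p-value for zone i versus zone j
print(result)
```

On the full data, the four `dataset['Zone - ...']` columns would be passed in place of the lists.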