# Case Study #05 - ANOVA
    XYZ Company has offices in four different zones. The company wishes to investigate the following:
        ● The mean sales generated by each zone.
        ● The total sales generated by all the zones for each month.
        ● Whether all the zones generate the same amount of sales.
    Help the company carry out this study using the data provided.

In [21]:
import numpy as np
import pandas as pd
import scipy.stats as stats

In [22]:
#reading the data_set
data = pd.read_csv('Sales_data_zone_wise.csv')
data.head()

Unnamed: 0,Month,Zone - A,Zone - B,Zone - C,Zone - D
0,Month - 1,1483525,1748451,1523308,2267260
1,Month - 2,1238428,1707421,2212113,1994341
2,Month - 3,1860771,2091194,1282374,1241600
3,Month - 4,1871571,1759617,2290580,2252681
4,Month - 5,1244922,1606010,1818334,1326062


In [23]:
# Preliminary inspection: column types and non-null counts
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Month     29 non-null     object
 1   Zone - A  29 non-null     int64 
 2   Zone - B  29 non-null     int64 
 3   Zone - C  29 non-null     int64 
 4   Zone - D  29 non-null     int64 
dtypes: int64(4), object(1)
memory usage: 1.3+ KB


In [24]:
data.describe()

Unnamed: 0,Zone - A,Zone - B,Zone - C,Zone - D
count,29.0,29.0,29.0,29.0
mean,1540493.0,1755560.0,1772871.0,1842927.0
std,261940.1,168389.9,333193.7,375016.5
min,1128185.0,1527574.0,1237722.0,1234311.0
25%,1305972.0,1606010.0,1523308.0,1520406.0
50%,1534390.0,1740365.0,1767047.0,1854412.0
75%,1820196.0,1875658.0,2098463.0,2180416.0
max,2004480.0,2091194.0,2290580.0,2364132.0


## 1. Mean sales generated by each zone 

In [25]:
#mean sales in zone A
print('The average sales in Zone A is: ',data['Zone - A'].mean())
#mean sales in zone B
print('The average sales in Zone B is: ',data['Zone - B'].mean())
#mean sales in zone C
print('The average sales in Zone C is: ',data['Zone - C'].mean())
#mean sales in zone D
print('The average sales in Zone D is: ',data['Zone - D'].mean())

The average sales in Zone A is:  1540493.1379310344
The average sales in Zone B is:  1755559.5862068965
The average sales in Zone C is:  1772871.0344827587
The average sales in Zone D is:  1842926.7586206896
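
The four near-identical `print` calls above can be collapsed into a single call, since `DataFrame.mean()` returns the mean of every selected column at once. The sketch below is a minimal illustration that rebuilds only the five rows shown by `data.head()`; the name `sample` is hypothetical, and the values it prints therefore differ from the 29-month means above.

```python
import pandas as pd

# Only the five rows shown by data.head(); the full file has 29 months.
sample = pd.DataFrame({
    'Month': ['Month - 1', 'Month - 2', 'Month - 3', 'Month - 4', 'Month - 5'],
    'Zone - A': [1483525, 1238428, 1860771, 1871571, 1244922],
    'Zone - B': [1748451, 1707421, 2091194, 1759617, 1606010],
    'Zone - C': [1523308, 2212113, 1282374, 2290580, 1818334],
    'Zone - D': [2267260, 1994341, 1241600, 2252681, 1326062],
})
zones = ['Zone - A', 'Zone - B', 'Zone - C', 'Zone - D']

# One call returns a Series of column means, indexed by zone name
zone_means = sample[zones].mean()

for zone, mean_sales in zone_means.items():
    print(f'The average sales in {zone} is: {mean_sales:.2f}')
```

On the real dataset the same pattern would be `data[zones].mean()`, which avoids repeating the column name in four places.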


## 2. Total sales generated by all the zones for each month.

In [32]:
zones = ['Zone - A', 'Zone - B', 'Zone - C', 'Zone - D']
# Add a column holding the total sales across all four zones for each month
data['Total_Sales'] = data[zones].sum(axis=1)
# Total sales generated in each month, shown with a 1-based index
data[['Month', 'Total_Sales']].set_index(pd.Index(range(1, len(data) + 1)))

Unnamed: 0,Month,Total_Sales
1,Month - 1,7022544
2,Month - 2,7152303
3,Month - 3,6475939
4,Month - 4,8174449
5,Month - 5,5995328
6,Month - 6,7151387
7,Month - 7,7287108
8,Month - 8,7816299
9,Month - 9,6703395
10,Month - 10,7128210


## 3. Checking whether all zones generate the same amount of sales
    Using one-way ANOVA:
    Level of significance, alpha = 0.05
    Null hypothesis, H0: The mean sales are the same in all 4 zones.
    Alternative hypothesis, Ha: The mean sales of at least one zone differ from the others.

In [29]:
f_value, p_value = stats.f_oneway(data['Zone - A'], data['Zone - B'], data['Zone - C'], data['Zone - D'])
print('f value is:',f_value)
print('p value is:',p_value)

f value is: 5.672056106843581
p value is: 0.0011827601694503335
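
For reference, the F statistic reported by `f_oneway` is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it by hand with NumPy and compares the result with SciPy. It uses only the five rows shown by `data.head()` (the variable `groups` is a hypothetical name), so its F and p values will not match the 29-month result above.

```python
import numpy as np
from scipy import stats

# Only the five months shown by data.head(), one array per zone (A, B, C, D)
groups = [
    np.array([1483525, 1238428, 1860771, 1871571, 1244922], dtype=float),
    np.array([1748451, 1707421, 2091194, 1759617, 1606010], dtype=float),
    np.array([1523308, 2212113, 1282374, 2290580, 1818334], dtype=float),
    np.array([2267260, 1994341, 1241600, 2252681, 1326062], dtype=float),
]

k = len(groups)                         # number of groups (zones)
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups and within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k

# F = mean square between / mean square within
f_manual = (ss_between / df_between) / (ss_within / df_within)
# p-value is the upper tail of the F distribution
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*groups)
print('manual F:', f_manual, '| scipy F:', f_scipy)
print('manual p:', p_manual, '| scipy p:', p_scipy)
```

A large F means the zone means are spread out relative to the month-to-month variation inside each zone, which is what drives the p-value down.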


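#### Summary:
    ● The p-value (about 0.0012) is smaller than the significance level of 0.05, so the null hypothesis is rejected.
    ● The zones do not all generate the same mean sales; at least one zone's mean differs from the others.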
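
The decision rule behind this summary can be written out explicitly. The sketch below plugs in the F and p values printed above and assumes 29 observations per zone (as reported by `data.info()`) to obtain the degrees of freedom; the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05
f_value = 5.672056106843581       # F statistic printed above
p_value = 0.0011827601694503335   # p-value printed above

k = 4                 # number of zones
n_per_zone = 29       # months per zone, assumed from data.info()
df_between = k - 1
df_within = k * n_per_zone - k

# Critical F value: reject H0 when the observed F exceeds it
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)

# Equivalent decision via the p-value
reject_h0 = p_value < alpha

print('Critical F at alpha = 0.05:', f_crit)
print('Observed F exceeds critical F:', f_value > f_crit)
print('Reject H0:', reject_h0)
```

Note that one-way ANOVA only indicates that some difference exists among the zone means; it does not say which zones differ.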