# Hypothesis Test - I

## Concept session

## Demo - 6.3: Right Tailed Hypothesis Test

According to the US Department of Agriculture, the average size of farms increased in 2019 compared to 2018. In 2018, the mean farm size was 445.51 acres; in 2019, the average size was 446.92 acres. Suppose an agribusiness researcher believes that the average farm size in 2019 is higher than the 2018 average of 445.51 acres.

To test this notion, data analyst Maya randomly selected 35 farms in the United States and ascertained the average size of each state from county records.

Use a 5% level of significance to test her hypothesis. Assume that the number of acres per farm is normally distributed in the population.

In [29]:
import numpy as np
import statistics as st
from scipy.stats import norm
import pandas as pd
import math

### Establish the null and alternate hypothesis

In [None]:
# H0: the average size of a US farm in 2019 is equal to 445.51 acres
# Ha: the average size of a US farm in 2019 is greater than 445.51 acres

In [20]:
farm_df.columns

Index(['State', '2018_Number_of_farms', '2019_Number_of_farms',
       '2018_Land_in_farms(in1,000acres)', '2019_Land_in_farms(in1,000acres)',
       ' 2018_Average_farm_size(acres)', ' 2019_Average_farm_size(acres)'],
      dtype='object')

### Read data from the source

In [5]:
farm_df=pd.read_csv("DS1_C5_S6_Hypothesis_I_Concept_FarmSize_Data.csv")
farm_df.head()

Unnamed: 0,State,2018_Number_of_farms,2019_Number_of_farms,"2018_Land_in_farms(in1,000acres)","2019_Land_in_farms(in1,000acres)",2018_Average_farm_size(acres),2019_Average_farm_size(acres)
0,Alabama,39700,38800,8500,8300,214,214
1,Alaska,1000,1050,850,850,850,810
2,Arizona,19200,19000,26200,26200,1365,1379
3,Arkansas,42500,42300,13900,14000,327,331
4,California,69400,69900,24300,24300,350,348


### Determine the appropriate statistical test

In [10]:

mean_p=445.51
std_p=466.827


### Set the value of alpha

In [1]:
# a 5% significance level is given
alpha=0.05

### Establish the decision rule

i. If p-value < alpha: reject the null hypothesis (H0)  
ii. If z-statistic > z-critical: reject the null hypothesis (H0), because the rejection region of a right-tailed test lies entirely in the upper tail
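
The critical-value form of the decision rule depends on which tail holds the rejection region. A minimal sketch of that idea follows; the helper name `reject_null` and the `tail` labels are hypothetical, introduced only for illustration.

```python
from scipy.stats import norm

def reject_null(z_stat, alpha, tail):
    """Return True when z_stat falls in the rejection region.

    `tail` is a hypothetical label used by this sketch:
    "right", "left", or "two".
    """
    if tail == "right":
        # reject when z is above the (1 - alpha) quantile
        return bool(z_stat > norm.ppf(1 - alpha))
    if tail == "left":
        # reject when z is below the alpha quantile (a negative number)
        return bool(z_stat < norm.ppf(alpha))
    if tail == "two":
        # reject when |z| is above the (1 - alpha/2) quantile
        return bool(abs(z_stat) > norm.ppf(1 - alpha / 2))
    raise ValueError("tail must be 'right', 'left', or 'two'")

print(reject_null(2.0, 0.05, "right"))   # z above the critical value of about 1.645
print(reject_null(-0.5, 0.05, "right"))  # a negative z cannot reject in a right-tailed test
```

The p-value rule (reject when p-value < alpha) always leads to the same decision, provided the p-value is computed for the same tail.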


### Gather the sample data

### Calculate sample mean

In [7]:
mean_s=st.mean(farm_df[' 2019_Average_farm_size(acres)'])


In [8]:
mean_s

444

### Analyze the data

In [11]:
n=50
z_statistics=(mean_s-mean_p)/(std_p/math.sqrt(n))
z_statistics

-0.02287209693508902

In [12]:
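p_value=norm.sf(z_statistics)   # right-tailed test: area to the right of z, so no abs()
p_value

0.509123851004264

In [15]:
# right-tailed test: the critical value is the (1 - alpha) quantile
z_critical=norm.ppf(1-alpha)
z_critical

1.6448536269514722

### Reach a statistical conclusion

To find the p-value associated with a z-score in Python, we can use the scipy.stats.norm.sf() function, which returns the area to the right of its argument. For a right-tailed test, pass the z-statistic directly:

scipy.stats.norm.sf(x)

Wrapping the argument in abs() returns the smaller tail area instead, which is appropriate only as half of a two-tailed p-value.

p_value = 0.509123851004264 > alpha = 0.05: fail to reject the null hypothesis

z_statistics = -0.02287209693508902 < z_critical = 1.6448536269514722: fail to reject the null hypothesis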
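
As a cross-check, the sketch below recomputes the test from the numbers used in this demo and compares the right-tail area `norm.sf(z)` with the smaller-tail area `norm.sf(abs(z))`. The variable names `p_right` and `p_small_tail` are introduced here for illustration only.

```python
import math
from scipy.stats import norm

mean_p, std_p = 445.51, 466.827   # population mean and standard deviation from this demo
mean_s, n = 444, 50               # sample mean and sample size used in this demo
alpha = 0.05

z = (mean_s - mean_p) / (std_p / math.sqrt(n))

p_right = norm.sf(z)             # area to the right of z: the right-tailed p-value
p_small_tail = norm.sf(abs(z))   # area in the smaller tail, whichever side z is on

# For a negative z the two areas differ and add up to 1
print(round(z, 4), round(p_right, 4), round(p_small_tail, 4))
print("reject H0:", p_right < alpha)
```

Both areas are far above alpha here, so the decision is the same either way, but only `norm.sf(z)` is the p-value of a right-tailed test.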

### Make a business decision

## Demo - 6.4: Left tailed Hypothesis Test

A survey was conducted among managing directors of manufacturing plants in Glasgow, who gave ratings on a 1-5 Likert scale. The mean survey response was 4.30, with a population standard deviation of 0.574. U.S. supply chain analysts believe that American manufacturing managers would not rate as highly, and they conduct a hypothesis test to check their theory. Determine whether U.S. managers rate significantly lower than the mean of 4.30 ascertained in the U.K., at a 10% significance level. Use the following ratings from U.S. managers for the test.
![](rating.png)


### Establish the null and alternate hypothesis

ho: the average rating is equal to 4.3   
ha: the average rating is less than 4.3

### Determine the appropriate statistical test

In [17]:
p_mean=4.30
p_sd=0.574
s_size=32 # sample size


### Set the value of alpha

In [48]:
# confidence level =90%
# significance level = 100 -90=10% =0.1
alpha=0.1

### Establish the decision rule

i. If p-value < α: reject the null hypothesis (H0)  
ii. If z-statistic < z-critical (a negative value): reject the null hypothesis (H0), because the rejection region of a left-tailed test lies entirely in the lower tail

### Calculate sample  mean

In [49]:
sample_ratings=[3,4,5,5,4,5,5,4,4,4,4,4,4,4,4,5,4,4,4,3,4,4,4,3,5,4,4,5,4,4,4,5]
s_mean1=st.mean(sample_ratings)
s_mean1

4.15625

### Analyze the data

In [50]:
z_statistics=(s_mean1-p_mean)/(p_sd/math.sqrt(s_size))
z_statistics

-1.4166773490671232

In [61]:
p_value=norm.cdf(z_statistics)   # left-tailed test: area to the left of z (equals norm.sf(abs(z)) only because z is negative here)
p_value

0.07828864121333116

In [53]:
z_critical=norm.ppf(alpha)
z_critical

-1.2815515655446004

In [56]:
p_value<alpha

True

In [18]:
z_statistics<z_critical   


True

### Reach a statistical conclusion

p_value < alpha, which leads to rejection of the null hypothesis  
z_statistics < z_critical, which also leads to rejection of the null hypothesis
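
The whole left-tailed test can be condensed into one cell. This is a minimal sketch that reuses the ratings and population parameters from this demo and takes the sample size from the data rather than hard-coding it.

```python
import math
import statistics as st
from scipy.stats import norm

sample_ratings = [3,4,5,5,4,5,5,4,4,4,4,4,4,4,4,5,4,4,4,3,4,4,4,3,5,4,4,5,4,4,4,5]
p_mean, p_sd, alpha = 4.30, 0.574, 0.10

n = len(sample_ratings)                      # 32 ratings
s_mean = st.mean(sample_ratings)
z = (s_mean - p_mean) / (p_sd / math.sqrt(n))

p_value = norm.cdf(z)         # left-tailed p-value: area to the left of z
z_critical = norm.ppf(alpha)  # lower-tail critical value (negative)

print("z =", round(z, 4), "p =", round(p_value, 4))
print("reject H0:", z < z_critical and p_value < alpha)
```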


### Make the business decision

## Demo - 6.5: Hypothesis Test with Two Samples

A random sample of the annual salary of 33 advertising managers is selected from the United States. The advertising managers are contacted by telephone and asked about their annual salary. A similar random sample was selected for 35 sales managers.

Christopher, a business analyst, tests whether there is a difference between the average wage of an advertising manager and a sales manager. Use the 5% significance level for the test.

### Read data from the source

In [20]:
wages_df=pd.read_csv("DS1_C5_S6_Hypothesis_I_Concept_Wages_Data.csv")

In [21]:
wages_df

Unnamed: 0,Advertising Manager,Sales Manager
0,74.256,71.492
1,96.234,67.814
2,89.807,56.47
3,93.261,72.401
4,103.03,71.804
5,74.195,46.394
6,75.932,54.449
7,80.742,59.676
8,39.672,63.369
9,45.652,43.649


### Calculate Sample statistic

In [22]:
am1=wages_df.iloc[:33,0]
am1

0      74.256
1      96.234
2      89.807
3      93.261
4     103.030
5      74.195
6      75.932
7      80.742
8      39.672
9      45.652
10     93.083
11     63.384
12     57.791
13     65.145
14     96.767
15     77.242
16     67.056
17     64.276
18     74.194
19     65.360
20     73.904
21     54.270
22     59.045
23     68.508
24     71.115
25     67.574
26     59.621
27     62.483
28     69.319
29     35.394
30     86.741
31     57.351
32     56.780
Name: Advertising Manager, dtype: float64

### Establish the null and alternate hypothesis

H0: μ1 = μ2  
Ha: μ1 ≠ μ2

Here μ1 is the mean wage of advertising managers and μ2 is the mean wage of sales managers.

### Determine the appropriate statistical test

In [26]:
s_mean1=st.mean(am1)
print('\nmean of advertising manager is ',s_mean1)

s_sd1=st.stdev(am1)
print('\nstandard deviation of advertising manager is ',s_sd1)

s_var1=s_sd1**2
print('\nvariance of advertising manager is ',s_var1)

n=len(am1)
print('\nsize is ',n)


mean of advertising manager is  70.27830303030304

standard deviation of advertising manager is  16.179630804434414

variance of advertising manager is  261.780452967803

size is  33


In [27]:
s_mean2=st.mean(wages_df['Sales Manager'])
print('\nmean of sales manager is ',s_mean2)

s_sd2=st.stdev(wages_df['Sales Manager'])
print('\nstandard deviation of Sales manager is ',s_sd2)

s_var2=s_sd2**2
print('\nvariance of Sales Manager is ',s_var2)

n2=len(wages_df['Sales Manager'])
print('\nsize is ',n2)



mean of sales manager is  61.52468571428572

standard deviation of Sales manager is  13.298461504938054

variance of Sales Manager is  176.84907839831928

size is  35


### Set the value of alpha

In [28]:
# 5% significance level is given
alpha=0.05

### Establish the decision rule

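i. If p-value < α: reject the null hypothesis (H0). The p-value in this demo is two-tailed (the tail area is doubled), so it is compared with α, not α/2.  
ii. If |z-statistic| > z-critical, where z-critical is the (1 - α/2) quantile: reject the null hypothesis (H0)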


### Construct a 95% confidence interval to estimate the difference in the mean between the two departments.
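
A minimal sketch of the interval, assuming the large-sample z-interval for a difference in means, (x̄1 - x̄2) ± z·sqrt(s1²/n1 + s2²/n2), and plugging in the summary statistics printed earlier in this demo. The names `diff`, `se`, `lower`, and `upper` are introduced here for illustration.

```python
import math
from scipy.stats import norm

# Summary statistics as printed earlier in this demo
mean1, var1, n1 = 70.27830303030304, 261.780452967803, 33     # advertising managers
mean2, var2, n2 = 61.52468571428572, 176.84907839831928, 35   # sales managers
confidence = 0.95

diff = mean1 - mean2
se = math.sqrt(var1 / n1 + var2 / n2)          # standard error of the difference
z_crit = norm.ppf(1 - (1 - confidence) / 2)    # about 1.96 for 95%

lower, upper = diff - z_crit * se, diff + z_crit * se
print(f"difference = {diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

If the interval excludes 0, that agrees with rejecting H0: μ1 = μ2 in a two-tailed test at the 5% level.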

### Analyze the data

In [32]:
z_statistics=(s_mean1-s_mean2)/(math.sqrt(s_var1/n+s_var2/n2))
z_statistics

2.429165013110487

In [33]:
p_value=norm.sf(abs(z_statistics))*2
p_value

0.015133642738636223

In [34]:
z_critical=norm.ppf(1-alpha/2)
z_critical

1.959963984540054

### Reach a statistical conclusion

i. If p-value < α: reject the null hypothesis (H0), since the p-value above is already two-tailed  
ii. If |z-statistic| > z-critical: reject the null hypothesis (H0)

From our study:


In [38]:
print(p_value<alpha)                  # two-tailed p-value is below alpha, so H0 is rejected
print(abs(z_statistics)>z_critical)   # |z| exceeds the critical value, so H0 is rejected

True
True
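
The two-sample calculation can also be wrapped in a small reusable function. This is a sketch only: the name `two_sample_z` is hypothetical, and it assumes large independent samples so that sample variances can stand in for population variances.

```python
import math
from scipy.stats import norm

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Return (z, two-tailed p-value) for H0: mu1 = mu2.

    Hypothetical helper for illustration; assumes large, independent samples.
    """
    se = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2) / se
    p_two_tailed = 2 * norm.sf(abs(z))   # both tails, so compare with alpha directly
    return z, p_two_tailed

# Summary statistics printed earlier in this demo
z, p = two_sample_z(70.27830303030304, 261.780452967803, 33,
                    61.52468571428572, 176.84907839831928, 35)
print(round(z, 4), round(p, 4), p < 0.05)
```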


### Make a business decision

# Learning Consolidation

## Hypothesis Test with Type II Error

The New York Stock Exchange recently reported that the average age of a female shareholder is 44 years. Evan, a stockbroker, compares this figure with a randomly selected sample of 68 female shareholders from Chicago. Suppose the average age in the sample is 45.1 years, with a population standard deviation of 8.7 years.

Test whether Evan's sample data differ significantly enough from the 44-year figure released by the New York Stock Exchange to declare that Chicago female shareholders differ in age from female shareholders in general. Use alpha=0.05. If no significant difference is noted, what is the probability of committing a Type II error if the average age of a female Chicago shareholder is actually 45 years?

### Establish the null and alternate hypothesis

### Determine the appropriate statistical test

### Set the value of alpha

### Establish the decision rule

### Analyze the data

### Reach a statistical conclusion

### Make a business decision

### Analyze the data assuming the average age of a female shareholder is 45
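
One way to approach this exercise is sketched below. It assumes a two-tailed z-test of H0: μ = 44 against Ha: μ ≠ 44; the hypotheses are not written out in the exercise, so this reading is an assumption. The Type II error probability (beta) is the chance that the sample mean lands in the non-rejection region when the true mean is 45.

```python
import math
from scipy.stats import norm

mu0, mu_true = 44, 45            # hypothesized mean and assumed true mean
x_bar, sigma, n = 45.1, 8.7, 68  # sample mean, population sd, sample size
alpha = 0.05

se = sigma / math.sqrt(n)
z_stat = (x_bar - mu0) / se
z_crit = norm.ppf(1 - alpha / 2)
reject = abs(z_stat) > z_crit    # two-tailed decision (assumed test form)

# Non-rejection region expressed in terms of the sample mean
lower = mu0 - z_crit * se
upper = mu0 + z_crit * se

# beta: probability the sample mean falls in that region when mu = mu_true
beta = norm.cdf(upper, loc=mu_true, scale=se) - norm.cdf(lower, loc=mu_true, scale=se)

print("reject H0:", reject)
print("beta =", round(beta, 4), "power =", round(1 - beta, 4))
```

Under these assumptions the test does not reject H0, and beta comes out large because a true mean of 45 is only about one standard error away from 44.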