In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import scipy.stats as stats
from scipy.stats import ttest_1samp, ttest_ind
import statsmodels.stats.api as sm

#### The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A shingles and 31 for B shingles.

In [2]:
a_b_data = pd.read_csv('A+&+B+shingles.csv')
a_b_data.head()

      A     B
0  0.44  0.14
1  0.61  0.15
2  0.47  0.31
3  0.30  0.16
4  0.15  0.37


In [3]:
a_b_data.isnull().sum()

A    0
B    5
dtype: int64

In [4]:
row, col = a_b_data.shape
print('Total Number of Rows:', row, '\n''Total Number of columns:',col)

Total Number of Rows: 36 
Total Number of columns: 2


#### 3.1 Do you think there is evidence that mean moisture contents in both types of shingles are within the permissible limits? State your conclusions clearly showing all steps.

In [5]:
a_b_data.describe().T

   count      mean       std   min     25%   50%     75%   max
A   36.0  0.316667  0.135731  0.13  0.2075  0.29  0.3925  0.72
B   31.0  0.273548  0.137296  0.10  0.1600  0.23  0.4000  0.58


#### Step 1: Define Null and alternate hypothesis for sample A


Testing whether the mean moisture content is greater than the permissible limit

The null hypothesis states that the mean moisture content of sample A is less than or equal to the permissible limit, 𝜇 ≤ 0.35

The alternative hypothesis states that the mean moisture content of sample A is greater than the permissible limit, 𝜇 > 0.35

𝐻0 : 𝜇 ≤ 0.35

𝐻𝐴 : 𝜇 > 0.35

#### Step 2: Decide the significance level

In [6]:
α = 0.05

#### Step 3: Identify the test statistic

We have two samples (A and B) of unequal size, and the population standard deviation is unknown. Sample A has n = 36 observations (n > 30), so we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a one-sample, one-tailed (upper-tail) test on sample A.
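
Written out for sample A, the one-sample statistic takes the form below. The symbols $\bar{x}$, $s$, $n$ and $\mu_0$ are notation introduced here for illustration, and the numbers are the sample mean and standard deviation reported by `describe()` above:

$$
t_{STAT} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{0.316667 - 0.35}{0.135731 / \sqrt{36}} \approx -1.4735, \qquad df = n - 1 = 35
$$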

In [7]:
a_b_data.mean()

A    0.316667
B    0.273548
dtype: float64

In [8]:
a_b_data.std()

A    0.135731
B    0.137296
dtype: float64

#### Step 4: Calculate the p - value and test statistic

In [9]:
n_A = 36
𝜇 = 0.35
mean_A = Xbar_A =  0.316667
std_A = s_A = 0.135731
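
The values typed in above can also be read directly from the column, which avoids transcription slips and handles the missing entries in B. The following is a minimal sketch: `summary_stats` is a hypothetical helper, and the small series is made-up illustration data, not the shingles file.

```python
import numpy as np
import pandas as pd

def summary_stats(series):
    """Return (n, mean, sample std) for a column, ignoring missing values."""
    clean = series.dropna()
    n = clean.shape[0]
    mean = clean.mean()
    std = clean.std(ddof=1)  # sample standard deviation, the pandas default
    return n, mean, std

# Made-up illustration data (not the shingles measurements).
demo = pd.Series([0.1, 0.2, 0.3, np.nan])
n_demo, mean_demo, std_demo = summary_stats(demo)
print(n_demo, round(mean_demo, 4), round(std_demo, 4))
```

With the notebook's data this would be called as `summary_stats(a_b_data['A'])`.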

**Test statistics**

In [10]:
# t = (sample mean - hypothesised mean) / (s / sqrt(n)); sqrt(36) = 6
t_test_A = (mean_A - 𝜇) * np.sqrt(n_A) / s_A
t_test_A

-1.4734880020039638

**P Value**

In [11]:
# Upper-tail p-value with n_A - 1 = 35 degrees of freedom
P_value_A = (1 - stats.t.cdf(t_test_A, n_A - 1))
P_value_A

0.9252214399379082
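
The upper-tail probability computed as `1 - cdf` can also be obtained from the survival function `stats.t.sf`, which is the same quantity and tends to be more accurate numerically when the tail is very small. A short sketch, with `upper_tail_p` as a hypothetical helper name and the inputs copied from the cells above:

```python
import scipy.stats as stats

def upper_tail_p(t_stat, df):
    """P(T > t_stat) for a t distribution with df degrees of freedom."""
    return stats.t.sf(t_stat, df)

# Test statistic and degrees of freedom taken from the cells above.
p_sf = upper_tail_p(-1.4734880020039638, 35)
p_cdf = 1 - stats.t.cdf(-1.4734880020039638, 35)
print(p_sf, p_cdf)
```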

**If the P value is greater than α, we fail to reject the null hypothesis**

In [12]:
P_value_A > α
# Fail to reject H0

True

### OR

In [13]:
t_statistic, p_value_A = ttest_1samp(a_b_data['A'],0.35,  alternative='greater')
print('tstat',t_statistic)    
print('P Value',p_value_A)

tstat -1.4735046253382782
P Value 0.9252236685509249


**If the p-value is greater than alpha, you fail to reject the null hypothesis. If it is less than alpha, you reject the null hypothesis.**

In [14]:
print ("one-sample t-test p-value =", p_value_A)

if (p_value_A) > α:
    print('We do not have enough evidence to reject the null hypothesis in favour of alternative hypothesis')
    # Failing to reject H0 only means no evidence of exceeding the limit
    print('There is no evidence that the mean moisture content exceeds the permissible limit in sample A.')
else:
    print('We have enough evidence to reject the null hypothesis in favour of alternative hypothesis')
    print('We conclude that the mean moisture content exceeds the permissible limit in sample A.')

one-sample t-test p-value = 0.9252236685509249
We do not have enough evidence to reject the null hypothesis in favour of alternative hypothesis
There is no evidence that the mean moisture content exceeds the permissible limit in sample A.


#### Step 1: Define Null and alternate hypothesis for sample B

Testing whether the mean moisture content is greater than the permissible limit

The null hypothesis states that the mean moisture content of sample B is less than or equal to the permissible limit, 𝜇 ≤ 0.35

The alternative hypothesis states that the mean moisture content of sample B is greater than the permissible limit, 𝜇 > 0.35

𝐻0 : 𝜇 ≤ 0.35

𝐻𝐴 : 𝜇 > 0.35

#### Step 2: Decide the significance level

α = 0.05

#### Step 3: Identify the test statistic

We have two samples (A and B) of unequal size, and the population standard deviation is unknown. Sample B has n = 31 observations (n > 30), so we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a one-sample, one-tailed (upper-tail) test on sample B.

#### Step 4: Calculate the p - value and test statistic

In [15]:
n_B = 31
𝜇 = 0.35
mean_B = Xbar_B = 0.273548
std_B = s_B = 0.137296

**Test statistics**

In [16]:
t_test_B = (mean_B - 𝜇)*(np.sqrt(31))/s_B
t_test_B

-3.100357774932122

**P Value**

In [17]:
P_value_B = (1 - stats.t.cdf(t_test_B,30))
P_value_B

0.9979096635457595

### OR

In [18]:
t_statistic, p_value_B = ttest_1samp(a_b_data['B'].dropna(),0.35, alternative='greater')
print('tstat',t_statistic)    
print('P Value',p_value_B)

tstat -3.1003313069986995
P Value 0.9979095225996808


#### Step 5: Decide whether to reject or fail to reject the null hypothesis

**If the P value is greater than α, we fail to reject the null hypothesis**

In [19]:
P_value_B > 𝛼
# Fail to reject H0

True

In [20]:
print ("one-sample t-test p-value =", p_value_B)

if (p_value_B) < 𝛼:
    print('We have enough evidence to reject the null hypothesis in favour of alternative hypothesis')
    print('We conclude that the mean moisture content exceeds the permissible limit in sample B.')
else:
    print('We do not have enough evidence to reject the null hypothesis in favour of alternative hypothesis')
    # Failing to reject H0 only means no evidence of exceeding the limit
    print('There is no evidence that the mean moisture content exceeds the permissible limit in sample B.')

one-sample t-test p-value = 0.9979095225996808
We do not have enough evidence to reject the null hypothesis in favour of alternative hypothesis
There is no evidence that the mean moisture content exceeds the permissible limit in sample B.


#### 3.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis and conduct the test of the hypothesis. What assumption do you need to check before the test for equality of means is performed?

#### Step 1: Define Null and alternate hypothesis

In testing whether the means for shingles A and shingles B are the same, the null hypothesis states that the mean of shingles A equals the mean of shingles B. The alternative hypothesis states that the means are different.

H0: 𝜇(A) = 𝜇(B)
    
HA: 𝜇(A) ≠ 𝜇(B)

#### Step 2: Decide the significance level

𝛼 = 0.05 

#### Step 3: Identify the test statistic

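We have two independent samples of unequal size (36 and 31 observations, both n > 30), and the population standard deviations are unknown. So we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a two-sample test. `ttest_ind` pools the variances by default, so the assumption to check is that the two populations have equal variances; the sample standard deviations (0.135731 for A and 0.137296 for B) are close, which is consistent with that assumption.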
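
The equal-variance assumption behind the pooled two-sample test can also be checked formally. One common approach, not used in the original notebook, is Levene's test, falling back to Welch's t-test (`equal_var=False`) when the variances appear to differ. The sketch below works under those assumptions: `compare_means` is a hypothetical helper, and the arrays are synthetic stand-ins for the A and B columns.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

def compare_means(a, b, alpha=0.05):
    """Check equal variances with Levene's test, then run the matching t-test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a[~np.isnan(a)]  # drop missing values, as nan_policy='omit' would
    b = b[~np.isnan(b)]
    _, p_levene = levene(a, b, center='median')
    equal_var = bool(p_levene >= alpha)  # True: no evidence of unequal variances
    t_stat, p_value = ttest_ind(a, b, equal_var=equal_var)
    return {'p_levene': p_levene, 'equal_var': equal_var,
            't_stat': t_stat, 'p_value': p_value}

# Synthetic data for illustration only (not the shingles measurements).
rng = np.random.default_rng(0)
a_demo = rng.normal(0.32, 0.14, size=36)
b_demo = np.append(rng.normal(0.27, 0.14, size=31), [np.nan] * 5)
result = compare_means(a_demo, b_demo)
print(result)
```

With the notebook's data this would be called as `compare_means(a_b_data['A'], a_b_data['B'])`.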

#### Step 4: Calculate the P_value and test statistic

In [21]:
t_statistic, p_value_A_B  = ttest_ind(a_b_data['A'],a_b_data['B'],nan_policy='omit')
print('tstat',t_statistic)    
print('P_Value',p_value_A_B)

tstat 1.2896282719661123
P_Value 0.2017496571835306


#### Step 5: Decide whether to reject or fail to reject the null hypothesis

In [22]:
print ("two-sample t-test P_value=", p_value_A_B)

if p_value_A_B < 𝛼:
    print('We have enough evidence to reject the null hypothesis in favour of alternative hypothesis')
    print('We conclude that the means for shingles A and shingles B are different')
else:
    print('We do not have enough evidence to reject the null hypothesis in favour of alternative hypothesis')
    # Failing to reject H0 does not prove the means are equal
    print('There is no evidence that the means for shingles A and shingles B are different')

two-sample t-test P_value= 0.2017496571835306
We do not have enough evidence to reject the null hypothesis in favour of alternative hypothesis
There is no evidence that the means for shingles A and shingles B are different
