# A & B Shingles Case Study - Hypothesis Testing

An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of moisture the shingles contain when they are packaged. Customers may feel that they have purchased a product lacking in quality if they find moisture and wet shingles inside the packaging. In some cases, excessive moisture can cause the granules attached to the shingles for texture and colouring purposes to fall off, resulting in appearance problems. To monitor the amount of moisture present, the company conducts moisture tests. A shingle is weighed and then dried. The shingle is then reweighed, and based on the amount of moisture taken out of the product, the pounds of moisture per 100 square feet are calculated. The company claims that the mean moisture content cannot be greater than 0.35 pound per 100 square feet.
The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A shingles and 31 for B shingles.

For the A shingles, the null and alternative hypotheses to test whether the population mean moisture content is less than 0.35 pound per 100 square feet are given by:<br>
     H0: μ <= 0.35<br>
     H1: μ > 0.35<br>

For the B shingles, the null and alternative hypotheses to test whether the population mean moisture content is less than 0.35 pound per 100 square feet are given by:<br>
     H0: μ <= 0.35<br>
     H1: μ > 0.35<br>


# Load the necessary libraries.

In [11]:
# Import required Libraries/packages/Modules 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import scipy.stats as stats
from scipy.stats import ttest_1samp, ttest_ind
import statsmodels.stats.api as sm

# Import the dataset and Load the dataset.

In [2]:
# Define Shingles_df as the variable for the A & B shingles dataset and import the dataset
Shingles_df = pd.read_csv('A & B shingles-1.csv')

In [3]:
Shingles_df.head()

,A,B
0,0.44,0.14
1,0.61,0.15
2,0.47,0.31
3,0.3,0.16
4,0.15,0.37


In [4]:
Shingles_df.tail(8)

,A,B
28,0.36,0.36
29,0.29,0.22
30,0.27,0.39
31,0.4,
32,0.29,
33,0.43,
34,0.34,
35,0.37,


In [5]:
Shingles_df.isnull().sum()

A    0
B    5
dtype: int64

In [6]:
Shingles_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       36 non-null     float64
 1   B       31 non-null     float64
dtypes: float64(2)
memory usage: 704.0 bytes


In [7]:
Shingles_df.describe()

,A,B
count,36.0,31.0
mean,0.316667,0.273548
std,0.135731,0.137296
min,0.13,0.1
25%,0.2075,0.16
50%,0.29,0.23
75%,0.3925,0.4
max,0.72,0.58


## a) For the A shingles, form the null and alternate hypothesis to test whether the population mean moisture content is less than 0.35 pound per 100 square feet.

* **In testing, the company would like to show that the mean moisture content is less than 0.35 pound per 100 square feet.**
* The company claims that the mean moisture content is at most 0.35 lbs/100 sqft.
    - The null hypothesis is the claim or the status quo. It is rejected only under strong evidence.
    - The null hypothesis states that the mean moisture content is:
         $ \mu \le 0.35 $ pound per 100 square feet
    - The alternative hypothesis states that
         $ \mu > 0.35 $ pound per 100 square feet
* $$ H_0: \mu \le 0.35 $$
  $$ H_a: \mu > 0.35 $$ 
  
* **The test for the population mean is done using a one-sample t-test. Since the alternative is a greater-than type alternative, the rejection region is in the right tail. If the p-value of the test is less than 0.05, the null hypothesis is rejected.**
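As a rough sketch of what the one-sample t-test computes, the statistic and the right-tailed p-value can be reproduced from summary statistics alone. The figures below are the rounded values reported by `Shingles_df.describe()` for column A; the variable names (`sample_mean`, `sample_std`, `t_manual`, `p_right`) are illustrative and not part of the original notebook.

```python
import math
from scipy import stats

# Rounded summary statistics for A shingles, taken from Shingles_df.describe()
n = 36
sample_mean = 0.316667
sample_std = 0.135731   # sample standard deviation (ddof=1)
mu_0 = 0.35             # hypothesised population mean

# t = (sample mean - hypothesised mean) / standard error of the mean
standard_error = sample_std / math.sqrt(n)
t_manual = (sample_mean - mu_0) / standard_error

# Right-tailed p-value: P(T >= t) for a t-distribution with n - 1 degrees of freedom
p_right = stats.t.sf(t_manual, df=n - 1)

print(f"t = {t_manual:.4f}, right-tailed p = {p_right:.4f}")
```

Because the inputs are rounded, the result should agree with `stats.ttest_1samp` on the full data only to about four decimal places.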

## Test the hypothesis formed above

In [8]:
t_statistic, p_value = stats.ttest_1samp(Shingles_df['A'], 0.35)  # ttest_1samp returns a two-sided p-value by default

print(' The test statistic is {}'.format(t_statistic),'\n')

# Convert the two-sided p-value to a right-tailed one, since the alternative (mu > 0.35)
# lies in the right tail: the right-tail area is p/2 when t > 0 and 1 - p/2 when t <= 0.
p_value_greater = p_value/2 if t_statistic > 0 else 1 - (p_value/2)

if p_value_greater>0.05:
    print('The p-value is {} which is greater than the level of significance, hence we fail to reject the Null hypothesis'.format(p_value_greater))
else:
    print('The p-value is {} which is less than the level of significance, hence we reject the Null hypothesis'.format(p_value_greater))

 The test statistic is -1.4735046253382782 

The p-value is 0.9252236685509249 which is greater than the level of significance, hence we fail to reject the Null hypothesis


**Conclusion: At the 5% level of significance, we fail to reject the null hypothesis that the mean moisture content of A shingles is at most 0.35 lbs/100 sqft.**
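The right-tailed p-value can also be requested from SciPy directly instead of being derived from the two-sided one. The sketch below assumes SciPy 1.6 or later, where `ttest_1samp` accepts an `alternative` argument. The `sample` array holds made-up values for illustration only, not the case-study measurements.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for illustration only (not the case-study data)
sample = np.array([0.31, 0.42, 0.28, 0.39, 0.35, 0.25, 0.47, 0.33])
mu_0 = 0.35

# Two-sided test followed by a manual conversion to the right tail
t_stat, p_two_sided = stats.ttest_1samp(sample, mu_0)
p_manual = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

# Same test, with SciPy computing the right-tailed p-value itself
result = stats.ttest_1samp(sample, mu_0, alternative='greater')

print('manual conversion :', p_manual)
print("alternative='greater':", result.pvalue)
```

Both routes should give the same p-value; passing `alternative='greater'` avoids having to reason about the sign of the test statistic.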

## b) For the B shingles, form the null and alternate hypothesis to test whether the population mean moisture content is less than 0.35 pound per 100 square feet.

* The null hypothesis states that the mean moisture content is:
     $ \mu \le 0.35 $ pound per 100 square feet
* The alternative hypothesis states that
     $ \mu > 0.35 $ pound per 100 square feet
* $$ H_0: \mu \le 0.35 $$
  $$ H_a: \mu > 0.35 $$ 


## Test the hypothesis formed above

In [9]:
t_statistic, p_value = stats.ttest_1samp(Shingles_df['B'].dropna(), 0.35)  # ttest_1samp returns a two-sided p-value by default

print(' The test statistic is {}'.format(t_statistic),'\n')

# Convert the two-sided p-value to a right-tailed one, since the alternative (mu > 0.35)
# lies in the right tail: the right-tail area is p/2 when t > 0 and 1 - p/2 when t <= 0.
p_value_greater = p_value/2 if t_statistic > 0 else 1 - (p_value/2)

if p_value_greater>0.05:
    print('The p-value is {} which is greater than the level of significance, hence we fail to reject the Null hypothesis'.format(p_value_greater))
else:
    print('The p-value is {} which is less than the level of significance, hence we reject the Null hypothesis'.format(p_value_greater))

 The test statistic is -3.1003313069986995 

The p-value is 0.9979095225996808 which is greater than the level of significance, hence we fail to reject the Null hypothesis


**Conclusion: At the 5% level of significance, we fail to reject the null hypothesis that the mean moisture content of B shingles is at most 0.35 lbs/100 sqft.**

# 3.1	Do you think that the population means for shingles A and B are equal? Form the hypothesis and conduct the test of the hypothesis. What assumption do you need to check before the test for equality of means is performed?

* The null hypothesis states that the population means of shingles A and B are equal. $ \mu_A = \mu_B $

* The alternative hypothesis states that the population means of shingles A and B are not equal. $ \mu_A \neq \mu_B $

$$ H_0:  \mu_A = \mu_B $$
$$ H_a:  \mu_A \neq \mu_B $$

### Assumptions for t-test

1. The first assumption concerns the scale of measurement: the data collected must follow a continuous or ordinal scale.
2. The second assumption is that of a simple random sample: the data is collected from a randomly selected portion of the total population.
3. The third assumption is that the moisture content in both populations follows a normal distribution.
4. The fourth assumption is that a reasonably large sample size is used.
5. The final assumption is homogeneity of variance. Homogeneous, or equal, variance exists when the standard deviations of the samples are approximately equal.
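The homogeneity-of-variance assumption can be checked before choosing `equal_var` in `ttest_ind`. The following is a minimal sketch using Levene's test (`scipy.stats.levene`). The arrays `sample_a` and `sample_b` are synthetic stand-ins generated for illustration, not the case-study data, and the 0.05 cut-off is an assumed convention.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two samples (illustrative only, not the case-study data)
sample_a = rng.normal(loc=0.32, scale=0.14, size=36)
sample_b = rng.normal(loc=0.27, scale=0.14, size=31)

# Levene's test: the null hypothesis is that both groups have equal variance
levene_stat, levene_p = stats.levene(sample_a, sample_b)

# Use the pooled-variance t-test only if equal variances are not rejected;
# otherwise fall back to Welch's t-test (equal_var=False)
equal_var = bool(levene_p > 0.05)
t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=equal_var)

print(f"Levene p-value: {levene_p:.4f}, equal_var={equal_var}")
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```

With the real data, `Shingles_df['A']` and `Shingles_df['B'].dropna()` would take the place of the synthetic arrays.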

In [10]:
# Independent two-sample t-test with pooled variance; NaN values in column B are omitted
t_statistic, p_value = ttest_ind(Shingles_df['A'], Shingles_df['B'], axis=0, equal_var=True, nan_policy='omit')

print('Two-sample t-test \nt statistics: {} p value: {}'.format(t_statistic,p_value),'\n')

alpha_value = 0.05 # Level of significance
print('Level of significance: %.2f' %alpha_value,'\n')
print('Our two-sample t-test p-value',p_value,'\n')

if p_value < alpha_value:
    print('We have evidence to reject the null hypothesis since p-value < level of significance')
else:
    print('We have no evidence to reject the null hypothesis since p-value > level of significance')

Two-sample t-test 
t statistics: 1.2896282719661123 p value: 0.2017496571835306 

Level of significance: 0.05 

Our two-sample t-test p-value 0.2017496571835306 

We have no evidence to reject the null hypothesis since p-value > level of significance
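As a cross-check, the same pooled-variance test can be sketched from summary statistics with `scipy.stats.ttest_ind_from_stats`. The means, standard deviations and counts below are the rounded values from `Shingles_df.describe()`. The Welch variant (`equal_var=False`) is shown only for comparison and is not part of the original analysis.

```python
from scipy import stats

# Rounded summary statistics from Shingles_df.describe()
mean_a, std_a, n_a = 0.316667, 0.135731, 36
mean_b, std_b, n_b = 0.273548, 0.137296, 31

# Pooled-variance (Student) two-sample t-test, matching equal_var=True above
pooled = stats.ttest_ind_from_stats(mean_a, std_a, n_a,
                                    mean_b, std_b, n_b,
                                    equal_var=True)

# Welch's t-test, which does not assume equal variances (comparison only)
welch = stats.ttest_ind_from_stats(mean_a, std_a, n_a,
                                   mean_b, std_b, n_b,
                                   equal_var=False)

print(f"Pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch : t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

Because the two sample standard deviations are nearly equal, the pooled and Welch results should be close to each other here.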


# 3.2	What assumption about the population distribution is needed in order to conduct the hypothesis tests above?

* **Both the one-sample and the two-sample t-tests above assume that the moisture content in each population is approximately normally distributed.**
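One common way to probe the normality assumption is the Shapiro-Wilk test (`scipy.stats.shapiro`). The sketch below runs it on a synthetic sample generated for illustration, not the case-study data. The 0.05 threshold is an assumed convention.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic sample for illustration only (not the case-study data)
sample = rng.normal(loc=0.30, scale=0.14, size=36)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution
shapiro_stat, shapiro_p = stats.shapiro(sample)

# A large p-value means normality is not rejected; it does not prove normality
normality_not_rejected = bool(shapiro_p > 0.05)

print(f"W = {shapiro_stat:.4f}, p = {shapiro_p:.4f}, "
      f"normality not rejected: {normality_not_rejected}")
```

With the real data, the test would be run separately on `Shingles_df['A']` and on `Shingles_df['B'].dropna()`.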

# Thank you