# Problem

Suppose you're trying to measure the selenium concentration in your tap water, and obtain the following values (in mg/L) for each day: <br>
    
| day | selenium |
|---|---|
| 1 | 0.051 |
| 2 | 0.0505 |
| 3 | 0.049 |
| 4 | 0.0516 |
| 5 | 0.052 |
| 6 | 0.0508 |
| 7 | 0.0506 |

    
The maximum level for safe drinking water is 0.05 mg/L. Using this limit as the hypothesized mean and a significance level (alpha) of 0.05, does the selenium level in the tap water exceed the legal limit?

## Solution:

In [1]:
import pandas as pd
from scipy.stats import ttest_1samp

To start, we will build a dataframe and fill it with the data given:

In [3]:
df = pd.DataFrame(columns=['day', 'selenium'])

In [4]:
# DataFrame.append was removed in pandas 2.0, so build all rows in one call
df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5, 6, 7],
    'selenium': [0.051, 0.0505, 0.049, 0.0516, 0.052, 0.0508, 0.0506],
})

In [5]:
df

| | day | selenium |
|---|---|---|
| 0 | 1 | 0.051 |
| 1 | 2 | 0.0505 |
| 2 | 3 | 0.049 |
| 3 | 4 | 0.0516 |
| 4 | 5 | 0.052 |
| 5 | 6 | 0.0508 |
| 6 | 7 | 0.0506 |


### Step 1: Stating the Hypotheses:

Null Hypothesis (H0): Population mean (µ) <= Hypothesized mean (µ0) <br>
Alternate Hypothesis (H1): Population mean (µ) > Hypothesized mean (µ0) <br>
<br>
Therefore: <br>
    H0: µ <= 0.05 mg/L <br>
    H1: µ > 0.05 mg/L

We want to know if the selenium levels exceed the legal limit, so we are only interested in whether, statistically speaking, the mean selenium level is greater than 0.05 mg/L. To test this hypothesis, we will use a one-sided, one-sample t-test.

### Step 2: Compute the Test Statistic:

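The t-test statistic is calculated as follows:

$$t = \frac{Z}{s} = \frac{\bar{X} - \mu}{\frac{\hat{\sigma}}{\sqrt{n}}}$$

where $Z = \bar{X} - \mu$ is the difference between the sample mean and the hypothesized mean, $\hat{\sigma}$ is the sample standard deviation, $n$ is the sample size, and $s = \hat{\sigma}/\sqrt{n}$ is the standard error.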
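To make the formula concrete, here is a minimal sketch that evaluates it directly with the standard library, using the seven readings from the problem statement. The variable names (`mu_0`, `standard_error`, `t_manual`) are illustrative choices, not part of the original notebook.

```python
import math
import statistics

# Selenium readings (mg/L) copied from the problem statement
selenium = [0.051, 0.0505, 0.049, 0.0516, 0.052, 0.0508, 0.0506]
mu_0 = 0.05  # hypothesized mean: the legal limit

n = len(selenium)
sample_mean = statistics.mean(selenium)
sample_sd = statistics.stdev(selenium)       # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)    # s in the formula above

t_manual = (sample_mean - mu_0) / standard_error
print("t Statistic (by hand):", round(t_manual, 4))
```

The result should agree with the t statistic that `ttest_1samp` reports in Step 3.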

### Step 3: Find the T-critical statistic and the p-value 

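`ttest_1samp` performs a two-sided test by default. Because our test is one-sided and the t statistic is positive (the sample mean lies on the side of the alternative hypothesis), we divide the reported p-value by 2. If the t statistic were negative, the one-sided p-value would instead be 1 minus half the two-sided value.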

In [6]:
x =  df['selenium']
tscore, pvalue = ttest_1samp(x, popmean=0.05)
print("t Statistic: ", tscore)  
print("P Value: ", pvalue/2)

t Statistic:  2.173499949434694
P Value:  0.03635505933982123
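As a cross-check on the halving step, the sketch below asks SciPy for the one-sided p-value directly. It assumes a SciPy version in which `ttest_1samp` accepts the `alternative` keyword (to my knowledge, 1.6 or later); on older versions the manual division above is the way to go.

```python
from scipy.stats import ttest_1samp

# Selenium readings (mg/L) copied from the problem statement
selenium = [0.051, 0.0505, 0.049, 0.0516, 0.052, 0.0508, 0.0506]

# alternative='greater' tests H1: mean > popmean directly
one_sided = ttest_1samp(selenium, popmean=0.05, alternative='greater')
two_sided = ttest_1samp(selenium, popmean=0.05)

print("t Statistic: ", one_sided.statistic)
print("One-sided P Value: ", one_sided.pvalue)
print("Two-sided P Value / 2: ", two_sided.pvalue / 2)
```

Both printed p-values should be the same here because the t statistic is positive.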


### Step 4: Decide whether to reject or fail to reject the null hypothesis

The p-value is 0.036, which is less than the significance level of 0.05, so we reject the null hypothesis. We conclude that the mean selenium level in the tap water **exceeds** the legal limit.
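The same decision can be reached by comparing the t statistic with the critical value, which Step 3 mentions but does not compute. The following is a minimal sketch, assuming a significance level of 0.05 and taking the t statistic (rounded) from the output above; the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05          # significance level
n = 7                 # number of daily readings
t_observed = 2.1735   # t statistic from Step 3, rounded

# One-sided critical value: the point with alpha probability in the upper tail
t_critical = stats.t.ppf(1 - alpha, df=n - 1)

reject_null = t_observed > t_critical
print("t critical: ", round(t_critical, 3))
print("Reject H0: ", reject_null)
```

With 6 degrees of freedom the critical value is about 1.943, so the observed statistic falls in the rejection region, matching the p-value decision.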