In [50]:
import pandas as pd
import numpy as np


data = pd.read_csv('data/data.csv', sep = ',')

## <font color = green>One-tailed test</font>
***

### Problem

A famous soft drink manufacturer claims that a 350 ml can of its main product contains **at most** **37 grams of sugar**. This claim implies that the average amount of sugar in a can of soda must be **less than or equal to 37 g**.

A suspicious consumer with knowledge of statistical inference decides to test the manufacturer's claim and randomly selects, from a set of different establishments, **a sample of 25 cans** of the soft drink in question. Using suitable equipment, the consumer measured the amount of sugar in each of the 25 cans in the sample.

**Assuming that this population is approximately normally distributed and considering a significance level of 5%, is it possible to accept the manufacturer's claim as valid?**



### Constructing the Student's $t$ table
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html

In [51]:
import pandas as pd
from scipy.stats import t as t_student

tabela_t_student = pd.DataFrame(
    [], 
    index=[i for i in range(1, 31)],
    columns = [i / 100 for i in range(10, 0, -1)]
)

for index in tabela_t_student.index:
    for column in tabela_t_student.columns:
        tabela_t_student.loc[index, column] = t_student.ppf(1 - float(column) / 2, index)

index=[('Degrees of freedom(n - 1)', i) for i in range(1, 31)]
tabela_t_student.index = pd.MultiIndex.from_tuples(index)

columns = [("{0:0.3f}".format(i / 100), "{0:0.3f}".format((i / 100) / 2)) for i in range(10, 0, -1)]
tabela_t_student.columns = pd.MultiIndex.from_tuples(columns)

tabela_t_student.rename_axis(['Two-tailed', 'One-tail'], axis=1, inplace = True)

tabela_t_student

| | Two-tailed | 0.100 | 0.090 | 0.080 | 0.070 | 0.060 | 0.050 | 0.040 | 0.030 | 0.020 | 0.010 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | One-tail | 0.050 | 0.045 | 0.040 | 0.035 | 0.030 | 0.025 | 0.020 | 0.015 | 0.010 | 0.005 |
| Degrees of freedom (n - 1) | 1 | 6.31375 | 7.02637 | 7.91582 | 9.05789 | 10.5789 | 12.7062 | 15.8945 | 21.2049 | 31.8205 | 63.6567 |
| Degrees of freedom (n - 1) | 2 | 2.91999 | 3.10398 | 3.31976 | 3.57825 | 3.89643 | 4.30265 | 4.84873 | 5.64278 | 6.96456 | 9.92484 |
| Degrees of freedom (n - 1) | 3 | 2.35336 | 2.47081 | 2.60543 | 2.7626 | 2.95051 | 3.18245 | 3.48191 | 3.89605 | 4.5407 | 5.84091 |
| Degrees of freedom (n - 1) | 4 | 2.13185 | 2.2261 | 2.33287 | 2.45589 | 2.60076 | 2.77645 | 2.99853 | 3.29763 | 3.74695 | 4.60409 |
| Degrees of freedom (n - 1) | 5 | 2.01505 | 2.09784 | 2.19096 | 2.29739 | 2.42158 | 2.57058 | 2.75651 | 3.00287 | 3.36493 | 4.03214 |
| Degrees of freedom (n - 1) | 6 | 1.94318 | 2.0192 | 2.10431 | 2.20106 | 2.31326 | 2.44691 | 2.61224 | 2.82893 | 3.14267 | 3.70743 |
| Degrees of freedom (n - 1) | 7 | 1.89458 | 1.96615 | 2.04601 | 2.13645 | 2.24088 | 2.36462 | 2.51675 | 2.71457 | 2.99795 | 3.49948 |
| Degrees of freedom (n - 1) | 8 | 1.85955 | 1.92799 | 2.00415 | 2.09017 | 2.18915 | 2.306 | 2.44898 | 2.63381 | 2.89646 | 3.35539 |
| Degrees of freedom (n - 1) | 9 | 1.83311 | 1.89922 | 1.97265 | 2.05539 | 2.15038 | 2.26216 | 2.39844 | 2.5738 | 2.82144 | 3.24984 |
| Degrees of freedom (n - 1) | 10 | 1.81246 | 1.87677 | 1.9481 | 2.02833 | 2.12023 | 2.22814 | 2.35931 | 2.52748 | 2.76377 | 3.16927 |


<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img004.png' width='250px'>

Each cell in the table above is the $t$ value that leaves the area (probability) shown in the `One-tail` header in the upper tail of the $t$ distribution.
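
As a quick check of how to read the table, the sketch below recovers one cell with `scipy.stats.t.ppf` and then uses `scipy.stats.t.sf` to confirm that the value leaves the stated area in the upper tail. The names `df_example` and `one_tail_area`, and the choice of 24 degrees of freedom with a one-tail area of 0.05, are illustrative assumptions only.

```python
from scipy.stats import t as t_student

df_example = 24        # illustrative degrees of freedom (n - 1)
one_tail_area = 0.05   # illustrative upper-tail probability

# t value that has `one_tail_area` of probability to its right
t_critical = t_student.ppf(1 - one_tail_area, df_example)

# Survival function: area to the right of t_critical
upper_tail = t_student.sf(t_critical, df_example)

print(round(t_critical, 5))
print(round(upper_tail, 5))
```

The first printed value should match the cell in row 24 under the one-tail column 0.050.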

---

**One-tailed tests** check a parameter against a floor or a ceiling: they evaluate the maximum or minimum value expected for the parameter under study and the chance that the sample statistic falls below or above a given limit.

<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img008.png' width='700px'>
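
The difference between the two kinds of one-tailed test can be sketched in code. The helper `one_tailed_p_value` below is hypothetical (it is not part of SciPy), and the statistic 2.0 with 10 degrees of freedom is an arbitrary illustration: an upper-tailed test measures the area to the right of the statistic, a lower-tailed test the area to its left.

```python
from scipy.stats import t as t_student


def one_tailed_p_value(t_stat, df, tail):
    """Hypothetical helper: one-tailed p-value for a t statistic.

    tail='upper' tests against a ceiling (H1: mu > mu_0);
    tail='lower' tests against a floor (H1: mu < mu_0).
    """
    if tail == 'upper':
        return t_student.sf(t_stat, df)   # area to the right of t_stat
    if tail == 'lower':
        return t_student.cdf(t_stat, df)  # area to the left of t_stat
    raise ValueError("tail must be 'upper' or 'lower'")


# Arbitrary illustrative values, not taken from the problem
p_upper = one_tailed_p_value(2.0, 10, 'upper')
p_lower = one_tailed_p_value(2.0, 10, 'lower')
print(p_upper, p_lower)
```

The two areas are complementary, so the same statistic that looks extreme in one direction is unremarkable in the other.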

### Problem data

In [52]:
sample = [37.27, 36.42, 34.84, 34.60, 37.49, 
           36.53, 35.49, 36.90, 34.52, 37.30, 
           34.99, 36.55, 36.29, 36.06, 37.42, 
           34.47, 36.70, 35.86, 36.80, 36.92, 
           37.04, 36.39, 37.32, 36.64, 35.45]

In [53]:
sample = pd.DataFrame(sample, columns=['Sample'])
sample

| | Sample |
|---|---|
| 0 | 37.27 |
| 1 | 36.42 |
| 2 | 34.84 |
| 3 | 34.6 |
| 4 | 37.49 |
| 5 | 36.53 |
| 6 | 35.49 |
| 7 | 36.9 |
| 8 | 34.52 |
| 9 | 37.3 |


In [54]:
sample_average = sample['Sample'].mean()  # select the column by label, not by position
sample_average

36.250400000000006

In [55]:
sample_std = sample['Sample'].std()  # sample standard deviation (ddof=1 by default)
sample_std

0.9667535018469453

In [56]:
mean = 37                      # hypothesized mean under H0 (mu_0), in grams
significance = 0.05            # alpha
confidence = 1 - significance
n = 25                         # sample size
degrees_of_freedom = n - 1

### **Step 1** - formulating the hypotheses $H_0$ and $H_1$

#### <font color = 'red'> Remember, the null hypothesis always contains the equality claim </font>

### $H_0: \mu \leq 37$

### $H_1: \mu > 37$

---

### **Step 2** - choosing the appropriate sampling distribution
<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img003.png' width=70%>

### Is the sample size greater than 30?
#### Answer: No

### Can we say that the population is approximately normally distributed?
#### Answer: Yes

### Is the population standard deviation known?
#### Answer: No
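
The three questions above can be read as a small decision rule. The sketch below is only an approximation of that reasoning: `choose_distribution` is a hypothetical helper, and both the cutoff `n >= 30` and the branch outcomes follow the common textbook rule of thumb, which is an assumption rather than a transcription of the flowchart image.

```python
def choose_distribution(n, population_is_normal, sigma_known):
    """Hypothetical helper mirroring the three questions above.

    Returns 'z', 't', or None when neither distribution is justified.
    """
    if n >= 30:
        # Large sample: the normal (z) distribution is the usual choice
        return 'z'
    if not population_is_normal:
        # Small sample from a non-normal population: neither z nor t applies
        return None
    # Small sample from an approximately normal population:
    # z if the population standard deviation is known, t otherwise
    return 'z' if sigma_known else 't'


# Answers given in this problem: n = 25, normal population, sigma unknown
print(choose_distribution(n=25, population_is_normal=True, sigma_known=False))
```

With the answers given for this problem, the rule selects the Student's $t$ distribution.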

### **Step 3** - setting the significance level of the test ($\alpha$)

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html

In [57]:
from scipy.stats import t as t_student

In [58]:
tabela_t_student[22:25]

| | Two-tailed | 0.100 | 0.090 | 0.080 | 0.070 | 0.060 | 0.050 | 0.040 | 0.030 | 0.020 | 0.010 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | One-tail | 0.050 | 0.045 | 0.040 | 0.035 | 0.030 | 0.025 | 0.020 | 0.015 | 0.010 | 0.005 |
| Degrees of freedom (n - 1) | 23 | 1.71387 | 1.76991 | 1.83157 | 1.90031 | 1.97825 | 2.06866 | 2.17696 | 2.31323 | 2.49987 | 2.80734 |
| Degrees of freedom (n - 1) | 24 | 1.71088 | 1.76667 | 1.82805 | 1.89646 | 1.97399 | 2.0639 | 2.17154 | 2.30691 | 2.49216 | 2.79694 |
| Degrees of freedom (n - 1) | 25 | 1.70814 | 1.76371 | 1.82483 | 1.89293 | 1.9701 | 2.05954 | 2.16659 | 2.30113 | 2.48511 | 2.78744 |


### Obtaining $t_{\alpha}$

In [59]:
t_alpha = t_student.ppf(confidence, degrees_of_freedom)
t_alpha

1.7108820799094275

![Acceptance region](https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img009.png)

---

### **Step 4** - computing the test statistic and comparing it with the acceptance and rejection regions of the test

# $$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}$$

In [60]:
t = (sample_average - mean) / (sample_std / np.sqrt(n))
t

-3.876893119952045

![Test statistic](https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img010.png)

---

### **Step 5** - accepting or rejecting the null hypothesis

<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img013.png' width=90%>

### <font color = 'red'> Critical value criterion </font>

> ### Upper one-tailed test
> ### Reject $H_0$ if $t \geq t_{\alpha}$

In [61]:
t >= t_alpha

False

### <font color = 'green'> Conclusion: At the 5% significance level we cannot reject $H_0$, so the sample provides no evidence against the manufacturer's claim. </font>

### <font color = 'red'> $ p $ value criterion </font>

> ### Upper one-tailed test
> ### Reject $H_0$ if $p \leq \alpha$

In [62]:
t

-3.876893119952045

In [63]:
p_value = t_student.sf(t, df=degrees_of_freedom)  # upper-tail area beyond the test statistic
p_value

0.999640617030382

In [65]:
p_value <= significance

False

https://www.statsmodels.org/dev/generated/statsmodels.stats.weightstats.DescrStatsW.html

In [66]:
from statsmodels.stats.weightstats import DescrStatsW

In [68]:
test = DescrStatsW(sample)

In [71]:
# alternative='larger' tests H1: mean > value (upper one-tailed test)
[t], [p_value], df = test.ttest_mean(value=mean, alternative='larger')
print(t)
print(p_value)
print(df)

-3.87689311995208
0.9996406170303819
24.0


In [72]:
p_value <= significance

False
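
As an independent cross-check, the same upper one-tailed test can be sketched with `scipy.stats.ttest_1samp`. This assumes SciPy 1.6 or later, where the `alternative` parameter is available. The list name `sugar_grams` is introduced here only for illustration; its values are the 25 measurements from the problem.

```python
from scipy.stats import ttest_1samp

# The 25 sugar measurements (in grams) from the problem
sugar_grams = [37.27, 36.42, 34.84, 34.60, 37.49,
               36.53, 35.49, 36.90, 34.52, 37.30,
               34.99, 36.55, 36.29, 36.06, 37.42,
               34.47, 36.70, 35.86, 36.80, 36.92,
               37.04, 36.39, 37.32, 36.64, 35.45]

# H0: mu <= 37 against H1: mu > 37 (upper one-tailed test)
result = ttest_1samp(sugar_grams, popmean=37, alternative='greater')

print(result.statistic)
print(result.pvalue)
```

The statistic and p-value should agree with the manual calculation and with `DescrStatsW.ttest_mean` up to floating-point rounding.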

---