# Single-tailed test

In [1]:
```python
import pandas as pd
from scipy.stats import t as t_student
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW
```

## <font color='red'>Problem</font>

Our winery is now producing 350 ml bottles of its main product, which, according to the manufacturer, contains at most 37 grams of sugar. This claim means that the mean amount of sugar in a bottle should be less than or equal to 37 g.

A skeptical consumer with a background in statistical inference decides to test the manufacturer's claim and randomly selects a sample of 25 bottles from several different stores. Using the appropriate equipment, the consumer measured the sugar content of all 25 bottles.

Assuming that this population is approximately normally distributed and using a 5% significance level, can the manufacturer's claim be accepted as valid?

## Student's t-distribution

### Building the table

In [2]:
```python
# Empty table: rows are degrees of freedom 1..30, columns are two-tailed alphas 0.10..0.01
t_student_table = pd.DataFrame(
    index=range(1, 31),
    columns=[i / 100 for i in range(10, 0, -1)],
    dtype=float,
)

# Each cell is the t value that leaves alpha / 2 in the upper tail
for index in t_student_table.index:
    for column in t_student_table.columns:
        t_student_table.loc[index, column] = t_student.ppf(1 - column / 2, index)

# Label the rows with the degrees of freedom
index = [('Degrees of freedom (n - 1)', i) for i in range(1, 31)]
t_student_table.index = pd.MultiIndex.from_tuples(index)

# Label the columns with the two-tailed alpha and the equivalent single-tailed alpha
columns = [('{0:0.3f}'.format(i / 100),
            '{0:0.3f}'.format((i / 100) / 2))
           for i in range(10, 0, -1)]
t_student_table.columns = pd.MultiIndex.from_tuples(columns)

t_student_table.rename_axis(['Two-tailed', 'Single-tailed'],
                            axis=1, inplace=True)

t_student_table
```

Column headers: two-tailed $\alpha$ / single-tailed $\alpha$ (rows 1 to 10 shown).

| Degrees of freedom (n - 1) | 0.100 / 0.050 | 0.090 / 0.045 | 0.080 / 0.040 | 0.070 / 0.035 | 0.060 / 0.030 | 0.050 / 0.025 | 0.040 / 0.020 | 0.030 / 0.015 | 0.020 / 0.010 | 0.010 / 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6.31375 | 7.02637 | 7.91582 | 9.05789 | 10.5789 | 12.7062 | 15.8945 | 21.2049 | 31.8205 | 63.6567 |
| 2 | 2.91999 | 3.10398 | 3.31976 | 3.57825 | 3.89643 | 4.30265 | 4.84873 | 5.64278 | 6.96456 | 9.92484 |
| 3 | 2.35336 | 2.47081 | 2.60543 | 2.7626 | 2.95051 | 3.18245 | 3.48191 | 3.89605 | 4.5407 | 5.84091 |
| 4 | 2.13185 | 2.2261 | 2.33287 | 2.45589 | 2.60076 | 2.77645 | 2.99853 | 3.29763 | 3.74695 | 4.60409 |
| 5 | 2.01505 | 2.09784 | 2.19096 | 2.29739 | 2.42158 | 2.57058 | 2.75651 | 3.00287 | 3.36493 | 4.03214 |
| 6 | 1.94318 | 2.0192 | 2.10431 | 2.20106 | 2.31326 | 2.44691 | 2.61224 | 2.82893 | 3.14267 | 3.70743 |
| 7 | 1.89458 | 1.96615 | 2.04601 | 2.13645 | 2.24088 | 2.36462 | 2.51675 | 2.71457 | 2.99795 | 3.49948 |
| 8 | 1.85955 | 1.92799 | 2.00415 | 2.09017 | 2.18915 | 2.306 | 2.44898 | 2.63381 | 2.89646 | 3.35539 |
| 9 | 1.83311 | 1.89922 | 1.97265 | 2.05539 | 2.15038 | 2.26216 | 2.39844 | 2.5738 | 2.82144 | 3.24984 |
| 10 | 1.81246 | 1.87677 | 1.9481 | 2.02833 | 2.12023 | 2.22814 | 2.35931 | 2.52748 | 2.76377 | 3.16927 |


<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img004.png' width='250px'>

The cells in the table above are the values of $t$ for a given area (probability) in the upper tail of the $t$ distribution.
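Put differently, for a single-tailed area $\alpha$ and $\nu$ degrees of freedom, each cell holds the $t$ such that $P(T \geq t) = \alpha$. A minimal sketch, assuming only NumPy and SciPy, that rebuilds the same grid in one vectorized call (broadcasting the degrees of freedom against the areas with SciPy's inverse survival function `isf`) and checks the round trip with `sf`:

```python
import numpy as np
from scipy.stats import t as t_student

# Degrees of freedom 1..30 as a column, so they broadcast against the areas
dfs = np.arange(1, 31).reshape(-1, 1)
two_tailed = np.arange(10, 0, -1) / 100   # 0.10, 0.09, ..., 0.01
single_tailed = two_tailed / 2            # area in the upper tail only

# isf(q, df) is the inverse survival function: the t with P(T >= t) = q
table = t_student.isf(single_tailed, dfs)  # shape (30, 10)

# Row df = 24 (index 23), column two-tailed 0.10 / single-tailed 0.05 (index 0)
critical = table[23, 0]
print(f"{critical:.5f}")                    # 1.71088

# Round trip: the upper-tail area beyond the critical value is the single-tailed alpha
print(f"{t_student.sf(critical, 24):.3f}")  # 0.050
```

Using `isf(alpha, df)` is equivalent to the `ppf(1 - alpha, df)` call in the loop above; the vectorized form just avoids filling the table cell by cell.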

**Single-tailed tests** check variables against a floor or a ceiling: they assess the maximum or minimum values expected for the parameters under study and the chance that the sample statistics fall below or above a given limit.
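The direction of the alternative hypothesis decides which tail holds the rejection region. As a hedged sketch (the helper `rejection_rule` is a hypothetical name used only for illustration, not part of SciPy), here are the three cases for $\alpha = 0.05$ and 24 degrees of freedom, the setting used later in this problem:

```python
from scipy.stats import t as t_student

def rejection_rule(alpha, df, alternative):
    """Hypothetical helper: describe the rejection region of a one-sample t test."""
    if alternative == "greater":      # H1: mu > mu0 -> all of alpha in the upper tail
        return f"reject H0 if t >= {t_student.ppf(1 - alpha, df):.5f}"
    if alternative == "less":         # H1: mu < mu0 -> all of alpha in the lower tail
        return f"reject H0 if t <= {t_student.ppf(alpha, df):.5f}"
    if alternative == "two-sided":    # H1: mu != mu0 -> alpha split over both tails
        return f"reject H0 if |t| >= {t_student.ppf(1 - alpha / 2, df):.5f}"
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: {rejection_rule(0.05, 24, alt)}")
```

The single-tailed critical value (1.71088) is smaller in magnitude than the two-tailed one (2.06390) because the whole 5% sits in one tail instead of 2.5% in each.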

<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img008.png' width='700px'>

## Problem data

In [3]:
```python
sample = [37.27, 36.42, 34.84, 34.60, 37.49,
          36.53, 35.49, 36.90, 34.52, 37.30,
          34.99, 36.55, 36.29, 36.06, 37.42,
          34.47, 36.70, 35.86, 36.80, 36.92,
          37.04, 36.39, 37.32, 36.64, 35.45]

sample_df = pd.DataFrame(sample, columns=['Sample'])
sample_df
```

First 10 rows shown:

| | Sample |
|---|---|
| 0 | 37.27 |
| 1 | 36.42 |
| 2 | 34.84 |
| 3 | 34.6 |
| 4 | 37.49 |
| 5 | 36.53 |
| 6 | 35.49 |
| 7 | 36.9 |
| 8 | 34.52 |
| 9 | 37.3 |


In [4]:
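```python
# Sample mean and sample standard deviation (one value per column, as a pandas Series)
sample_mean = sample_df.mean()

sample_std = sample_df.std()
```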
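Because `sample_df` is a DataFrame, `mean()` and `std()` return one value per column (a Series), which is why later outputs print as `Sample   ...`. Note also that pandas' `std` divides by $n - 1$ (`ddof=1`) by default, whereas NumPy's `np.std` defaults to `ddof=0`. A minimal sketch computing the same scalars with NumPy, assuming the `sample` list from the problem data:

```python
import numpy as np

sample = [37.27, 36.42, 34.84, 34.60, 37.49,
          36.53, 35.49, 36.90, 34.52, 37.30,
          34.99, 36.55, 36.29, 36.06, 37.42,
          34.47, 36.70, 35.86, 36.80, 36.92,
          37.04, 36.39, 37.32, 36.64, 35.45]

x = np.asarray(sample)
sample_mean = x.mean()
# ddof=1 gives the sample standard deviation s, matching pandas' default
sample_std = x.std(ddof=1)
# ddof=0 (NumPy's default) divides by n and slightly understates s
population_std = x.std()

print(round(sample_mean, 4))          # 36.2504
print(sample_std > population_std)    # True
```

The $t$ statistic needs $s$ (the `ddof=1` version), so mixing up the two defaults would change the result.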

In [5]:
```python
# Values from the problem statement
mean = 37                       # hypothesized mean mu_0 (grams of sugar)
significance = 0.05             # alpha
confidence = 1 - significance
n = 25                          # sample size
degrees_of_freedom = n - 1
```

## Step 1: Formulating the null and alternative hypotheses

#### <font color='red'>Remember: the null hypothesis always contains the claim of equality</font>

### $H_0: \mu \leq 37$

### $H_1: \mu > 37$

## Step 2: Choosing the appropriate sampling distribution

<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img003.png' width=70%>

## Step 3: Setting the significance level of the test ($\alpha$)

In [6]:
```python
t_student_table[22:25]
```

Column headers: two-tailed $\alpha$ / single-tailed $\alpha$.

| Degrees of freedom (n - 1) | 0.100 / 0.050 | 0.090 / 0.045 | 0.080 / 0.040 | 0.070 / 0.035 | 0.060 / 0.030 | 0.050 / 0.025 | 0.040 / 0.020 | 0.030 / 0.015 | 0.020 / 0.010 | 0.010 / 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 1.71387 | 1.76991 | 1.83157 | 1.90031 | 1.97825 | 2.06866 | 2.17696 | 2.31323 | 2.49987 | 2.80734 |
| 24 | 1.71088 | 1.76667 | 1.82805 | 1.89646 | 1.97399 | 2.0639 | 2.17154 | 2.30691 | 2.49216 | 2.79694 |
| 25 | 1.70814 | 1.76371 | 1.82483 | 1.89293 | 1.9701 | 2.05954 | 2.16659 | 2.30113 | 2.48511 | 2.78744 |


- Look at the row for 24 degrees of freedom ($n - 1 = 25 - 1$) and the single-tailed $\alpha = 0.050$ column.
- The critical value in this case is 1.71088.

## Getting $t_{\alpha}$

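Without using `t_student_table`, the critical value comes directly from the percent point function (`ppf`, the inverse CDF) evaluated at the confidence level $1 - \alpha$: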

In [7]:
```python
t_alpha = t_student.ppf(confidence, degrees_of_freedom)

t_alpha
```

```text
1.7108820799094275
```

![Região de Aceitação](https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img009.png)

## Step 4: Calculating the test statistic and comparing it with the acceptance and rejection regions of the test

$$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}$$
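As a worked check of the formula, here it is with only Python's standard library (`statistics` and `math`), so every intermediate quantity, including the standard error $s/\sqrt{n}$, is a plain number. This is an illustrative alternative to the pandas computation, not a different method:

```python
import math
import statistics

sample = [37.27, 36.42, 34.84, 34.60, 37.49,
          36.53, 35.49, 36.90, 34.52, 37.30,
          34.99, 36.55, 36.29, 36.06, 37.42,
          34.47, 36.70, 35.86, 36.80, 36.92,
          37.04, 36.39, 37.32, 36.64, 35.45]

mu_0 = 37                             # claimed maximum mean sugar content (g)
n = len(sample)                       # 25 bottles

x_bar = statistics.mean(sample)       # sample mean
s = statistics.stdev(sample)          # sample standard deviation (n - 1 denominator)
standard_error = s / math.sqrt(n)     # estimated standard error of the mean

t = (x_bar - mu_0) / standard_error
print(f"x_bar = {x_bar:.4f}, s = {s:.4f}, SE = {standard_error:.4f}, t = {t:.4f}")
```

It should agree with the pandas computation in this step, $t \approx -3.8769$: the sample mean is about 0.75 g below $\mu_0$, which is almost four standard errors.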

In [8]:
```python
t = (sample_mean - mean) / (sample_std / np.sqrt(n))
t
```

```text
Sample   -3.876893
dtype: float64
```

![Estatística-Teste](https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img010.png)

Since $t$ falls in the acceptance region, we do not reject the null hypothesis.

## Step 5: Acceptance or rejection of the null hypothesis

<img src='https://caelum-online-public.s3.amazonaws.com/1229-estatistica-parte3/01/img013.png' width=90%>

### <font color='red'>Critical value criterion</font>

> ### Upper-tailed test
> ### Reject $H_0$ if $t \geq t_{\alpha}$
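The critical value rule can be written as a small function. `reject_upper_tailed` is a hypothetical helper name used only here, and the $t$ value plugged in is the one computed in Step 4:

```python
from scipy.stats import t as t_student

def reject_upper_tailed(t_stat, alpha, df):
    """Critical value criterion for an upper-tailed t test: reject H0 if t >= t_alpha."""
    t_alpha = t_student.ppf(1 - alpha, df)
    return bool(t_stat >= t_alpha)

# Test statistic from Step 4 with the problem's alpha and degrees of freedom
print(reject_upper_tailed(-3.876893, alpha=0.05, df=24))  # False
```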

In [9]:
```python
t >= t_alpha
```

```text
Sample    False
dtype: bool
```

As the figure shows, $t$ lies in the acceptance region: $t < t_{\alpha}$, so we do not reject the null hypothesis.
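The same boundary can be expressed on the scale of the data. Rearranging $t \geq t_{\alpha}$ gives $\bar{x} \geq \mu_0 + t_{\alpha}\,\frac{s}{\sqrt{n}}$, so the test would reject only if the sample mean reached that critical mean. A minimal sketch with NumPy and SciPy, reusing the problem's data (the name `critical_mean` is introduced here for illustration):

```python
import numpy as np
from scipy.stats import t as t_student

sample = np.array([37.27, 36.42, 34.84, 34.60, 37.49,
                   36.53, 35.49, 36.90, 34.52, 37.30,
                   34.99, 36.55, 36.29, 36.06, 37.42,
                   34.47, 36.70, 35.86, 36.80, 36.92,
                   37.04, 36.39, 37.32, 36.64, 35.45])
mu_0, alpha = 37, 0.05
n = sample.size
s = sample.std(ddof=1)                 # sample standard deviation

t_alpha = t_student.ppf(1 - alpha, n - 1)
# Smallest sample mean that would lead to rejecting H0
critical_mean = mu_0 + t_alpha * s / np.sqrt(n)

print(f"reject H0 only if x_bar >= {critical_mean:.3f} g")
print(f"observed x_bar = {sample.mean():.3f} g")
```

The observed mean (about 36.25 g) is well below the critical mean (about 37.33 g), which is the same conclusion seen from the $t$ scale.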

### <font color='green'> Conclusion: With a 95% confidence level, we cannot reject $H_0$; that is, the manufacturer's claim can be accepted as valid. </font>

### <font color='red'>$p$-value criterion</font>

> ### Upper-tailed test
> ### Reject $H_0$ if $p \leq \alpha$
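For an upper-tailed test the $p$-value is the area to the right of the observed statistic, $p = P(T \geq t)$, which SciPy's survival function `sf` returns. Because $t$ here is strongly negative, almost all of the distribution lies to its right, hence a $p$-value close to 1. A sketch showing that both criteria give the same decision:

```python
from scipy.stats import t as t_student

t_stat, alpha, df = -3.876893, 0.05, 24

p_value = t_student.sf(t_stat, df)          # P(T >= t_stat), upper tail
t_alpha = t_student.ppf(1 - alpha, df)      # critical value

# p <= alpha exactly when t_stat >= t_alpha, because sf is decreasing in t
print(round(p_value, 5))                    # 0.99964
print(p_value <= alpha, t_stat >= t_alpha)  # False False
```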

In [10]:
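```python
# Upper-tailed test: the p-value is the area to the right of t, P(T >= t)
p_value = t_student.sf(t, df=degrees_of_freedom)

p_value
```

```text
array([0.99964062])
```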

In [11]:
```python
p_value <= significance
```

```text
array([False])
```

Since the $p$-value ($\approx 0.9996$) is greater than $\alpha = 0.05$, we do not reject the null hypothesis.

## A simpler way to run the single-tailed test

In [12]:
```python
test = DescrStatsW(sample_df)
```

In [13]:
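```python
# alternative='larger' tests H1: mu > 37; returns (t statistic, p-value, degrees of freedom)
test.ttest_mean(value=mean, alternative='larger')
```

```text
(array([-3.87689312]), array([0.99964062]), 24.0)
```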

In [14]:
```python
t, p_value, df = test.ttest_mean(value=mean, alternative='larger')
```

In [15]:
```python
p_value
```

```text
array([0.99964062])
```
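As a cross-check with a different library, SciPy's `scipy.stats.ttest_1samp` runs the same one-sample test. Its `alternative='greater'` argument (available in SciPy 1.6 and later) corresponds to statsmodels' `alternative='larger'`, i.e. $H_1: \mu > 37$. A minimal sketch:

```python
from scipy.stats import ttest_1samp

sample = [37.27, 36.42, 34.84, 34.60, 37.49,
          36.53, 35.49, 36.90, 34.52, 37.30,
          34.99, 36.55, 36.29, 36.06, 37.42,
          34.47, 36.70, 35.86, 36.80, 36.92,
          37.04, 36.39, 37.32, 36.64, 35.45]

# H0: mu <= 37 vs H1: mu > 37 (upper-tailed)
result = ttest_1samp(sample, popmean=37, alternative="greater")
print(result.statistic, result.pvalue)

alpha = 0.05
print("reject H0" if result.pvalue <= alpha else "do not reject H0")  # do not reject H0
```

Passing a plain 1-D list also yields scalar results instead of the one-element arrays produced by the DataFrame input above.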