## Lab | Inferential statistics - T-test & P-value

### Instructions
1. One-tailed t-test - In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file files_for_lab/ttest_machine.xlsx. Assume that the conditions for conducting the t-test are met. Does the data provide sufficient evidence that one machine is better than the other?

2. Matched Pairs Test - In this challenge we will compare dependent samples of data describing our Pokemon (file files_for_lab/pokemon.csv). Our goal is to see whether there is a significant difference between each Pokemon's defense and attack scores. Our null hypothesis is that the defense and attack scores are equal. Compare the two columns to see if there is a statistically significant difference between them and comment on your result.
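A matched pairs test is a one-sample t-test on the per-row differences. The sketch below illustrates this with only the Attack and Defense values of the first five rows of the Pokemon table as toy data, so its numbers say nothing about the full dataset. The names `diff` and `t_manual` are illustrative.

```python
import numpy as np
from scipy.stats import ttest_rel

# Toy data: Attack and Defense of the first five rows of the Pokemon table
attack = np.array([49, 62, 82, 100, 52], dtype=float)
defense = np.array([49, 63, 83, 123, 43], dtype=float)

# Paired test by hand: t = mean(d) / (sd(d) / sqrt(n)), with d = attack - defense
diff = attack - defense
n = diff.size
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# The same test through SciPy
t_scipy, p_scipy = ttest_rel(attack, defense)

print("manual t:", t_manual)
print("scipy t:", t_scipy, "p-value:", p_scipy)
```

Both t values should agree, which confirms that `ttest_rel` compares the rows pairwise and does not treat the two columns as independent samples.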

#### Inferential statistics - ANOVA
Note: The following lab is divided in 2 sections.

#### Part 1
In this activity, we will look at another example. Your task is to understand the problem and write down all the steps to set up ANOVA. After the next lesson, we will ask you to solve this problem using Python. Here are the steps that you would need to work on:

- Null hypothesis
- Alternate hypothesis
- Level of significance
- Test statistic
- P-value
- F table
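To make the test statistic and p-value steps concrete, here is a minimal sketch of a one-way ANOVA computed by hand and checked against `scipy.stats.f_oneway`. The three groups are made-up numbers chosen for illustration, not values from the lab files.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical groups (made-up values for illustration only)
groups = [
    np.array([300.0, 320.0, 340.0]),
    np.array([400.0, 410.0, 450.0]),
    np.array([500.0, 480.0, 530.0]),
]

k = len(groups)                      # number of groups
all_values = np.concatenate(groups)
n_total = all_values.size            # total number of observations
grand_mean = all_values.mean()

# Between-group and within-group sums of squares
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
# P-value: upper tail of the F distribution with (k - 1, N - k) degrees of freedom
p_manual = f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = f_oneway(*groups)
print("manual:", f_manual, p_manual)
print("scipy: ", f_scipy, p_scipy)
```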

#### Context
In this challenge, we will return to the Pokemon dataset (file files_for_lab/pokemon.csv). We want to understand whether there are significant differences in the Total value among the various Pokemon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many Pokemon types, which makes this a good use case for ANOVA.

1. First, obtain the unique values of the Pokemon types.
2. Second, create a list named pokemon_totals containing the Total values of each unique Pokemon type.
3. Third, run the ANOVA test on pokemon_totals.
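The three steps above could be sketched as follows. The small `toy` frame is a stand-in for pokemon.csv with illustrative values only, and using `scipy.stats.f_oneway` is one possible way to run the test, not necessarily the one the lab expects.

```python
import pandas as pd
from scipy.stats import f_oneway

# Toy stand-in for pokemon.csv (illustrative rows, not the full dataset)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Grass", "Fire", "Fire", "Fire",
               "Water", "Water", "Water"],
    "Total": [318, 405, 525, 309, 405, 534, 314, 405, 530],
})

# 1. Unique values of the Pokemon types
unique_types = toy["Type 1"].unique()

# 2. One Series of Total values per unique type
pokemon_totals = [toy.loc[toy["Type 1"] == t, "Total"] for t in unique_types]

# 3. One-way ANOVA across all groups
f_stat, p_val = f_oneway(*pokemon_totals)
print("F:", f_stat, "p-value:", p_val)
```

With the real file, `toy` would be replaced by the DataFrame read from pokemon.csv and the rest would stay the same.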

- State the null hypothesis
- State the alternate hypothesis
- What is the significance level?
- What are the degrees of freedom of the model, the error terms, and the total DoF?
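As a sketch of the degrees-of-freedom and F-table steps, the snippet below assumes 18 distinct `Type 1` categories and 800 rows, which is consistent with the ANOVA table printed later in this notebook. The critical value comes from `scipy.stats.f.ppf` in place of a printed F table.

```python
from scipy.stats import f

# Assumptions: 18 Pokemon types (groups) and 800 observations in total
k = 18
n_total = 800

df_model = k - 1          # between-groups degrees of freedom
df_error = n_total - k    # within-groups (residual) degrees of freedom
df_total = n_total - 1    # total degrees of freedom

alpha = 0.05
# Critical F value: reject H0 when the observed F exceeds it
f_crit = f.ppf(1 - alpha, df_model, df_error)

print("df model:", df_model, "df error:", df_error, "df total:", df_total)
print("critical F at alpha = 0.05:", f_crit)
```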

#### Part 2
What conclusions can you draw from the experiment and why?
Interpret the ANOVA test result. Is the difference significant?

In [3]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# Part 1

In [5]:
pack_data = pd.read_csv('/Users/leozinho.air/Desktop/Ironhack/class_27/lab-t-tests-p-values/files_for_lab/ttest_machine.txt', sep = ' ')
pack_data

Unnamed: 0,New_machine,Old_machine
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5
5,42.8,43.5
6,43.2,43.1
7,42.3,41.7
8,41.8,44.0
9,42.7,44.1


In [6]:
# H0 -> New machine mean >= Old machine mean
# H1 -> New machine mean <  Old machine mean (the new machine packs faster)


# Importing Libraries
from scipy.stats import ttest_ind


# Perform independent t-test (two-sided by default)
t_statistic, p_value = ttest_ind(pack_data['New_machine'], pack_data['Old_machine'])

# Specify the significance level (e.g., 0.05)
alpha = 0.05

# Print the two-sided p-value and t-statistic
print("p-value:", p_value)
print("t-statistic:", t_statistic)

# One-tailed p-value: halve the two-sided p-value when t points in the direction of H1
p_one_tailed = p_value / 2 if t_statistic < 0 else 1 - p_value / 2

# Check if the one-tailed p-value is less than alpha
if p_one_tailed < alpha:
    print(f"Since the one-tailed p-value ({p_one_tailed:.4f}) is lower than the significance level ({alpha}), I reject the null hypothesis. The new machine packs significantly faster on average.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence that the new machine is faster.")


p-value: 0.0032111425007745158
t-statistic: -3.3972307061176026
Since the one-tailed p-value (0.0016) is lower than the significance level (0.05), I reject the null hypothesis. The new machine packs significantly faster on average.
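Since the exercise asks for a one-tailed test, SciPy can also run it directly. The sketch below assumes SciPy 1.6 or newer, where `ttest_ind` accepts the `alternative` parameter. It rebuilds the table shown above as a DataFrame so the block runs on its own, and the names `t_less` and `p_less` are illustrative.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Packing times in seconds, copied from the table above
pack_data = pd.DataFrame({
    "New_machine": [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7],
    "Old_machine": [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1],
})

# H1: mean(New_machine) < mean(Old_machine), i.e. the new machine is faster
t_less, p_less = ttest_ind(
    pack_data["New_machine"], pack_data["Old_machine"], alternative="less"
)

# Two-sided test for comparison
t_two, p_two = ttest_ind(pack_data["New_machine"], pack_data["Old_machine"])

print("one-tailed p-value:", p_less)
print("two-sided p-value: ", p_two)
```

Because the t-statistic is negative, the one-tailed p-value is half the two-sided one.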


In [7]:
# Part 2 - Pokemon dataset


In [8]:
pokemon = pd.read_csv('/Users/leozinho.air/Desktop/Ironhack/class_27/lab-t-tests-p-values/files_for_lab/pokemon.txt')
pokemon


Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


In [11]:
# Are Pokemon attack and defense scores statistically different?
# H0 -> mean(Attack - Defense) = 0
# H1 -> mean(Attack - Defense) != 0
attack = pokemon['Attack']
defense = pokemon['Defense']

from scipy.stats import ttest_rel


# Perform the paired (matched pairs) t-test on the two Pokemon columns
t_statistic, p_value = ttest_rel(attack, defense)

# Specify the significance level (e.g., 0.05)
alpha = 0.05

# Print the p-value and t-statistic
print("p-value:", p_value)
print("t-statistic:", t_statistic)

# Check if p-value is less than alpha for a two-tailed test
if p_value < alpha:
    print(f"Since the p-value ({p_value:.4f}) is lower than the significance level ({alpha}), I reject the null hypothesis. There is a significant difference between Pokemon attack and defense scores.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence of a significant difference.")


In [12]:
# In this challenge, we will return to the Pokemon dataset.
# We want to understand whether there are significant differences in the Total value among the various Pokemon types, i.e. Grass vs Poison vs Fire vs Dragon...
# There are many Pokemon types, which makes this a good use case for ANOVA.
# (file files_for_lab/pokemon.csv) First let's obtain the unique values of the Pokemon types.
# Second we will create a list named pokemon_totals to contain the Total values of each unique Pokemon type.
# Third we run the ANOVA test on pokemon_totals.

# Keep only the columns needed; rename 'Type 1' so it is a valid name in the model formula
pokemon_totals = pokemon[['Type 1', 'Total']].rename(columns={'Type 1': 'Type_1'})

# Importing Libraries
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Model Fitting
model = ols('Total ~ C(Type_1)', data=pokemon_totals).fit()

# ANOVA Table
table = sm.stats.anova_lm(model)
print(table)
print('\n')

# Extracting the p-value of the Type_1 factor from the ANOVA table
p_value_anova = table.loc['C(Type_1)', 'PR(>F)']

# Significance level (alpha)
alpha = 0.05

# Check if the p-value of the F-test is less than alpha
if p_value_anova < alpha:
    print(f"Since the p-value ({p_value_anova:.9f}) is lower than the significance level ({alpha:.2f}), I reject the null hypothesis. At least one Pokemon type has a mean Total that differs significantly from the others.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence of a significant difference.")

              df        sum_sq       mean_sq         F        PR(>F)
C(Type_1)   17.0  1.053322e+06  61960.121260  4.638767  2.077215e-09
Residual   782.0  1.044519e+07  13357.022421       NaN           NaN


Since the p-value (0.000000002) is lower than the significance level (0.05), I reject the null hypothesis. At least one Pokemon type has a mean Total that differs significantly from the others.
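To help interpret the table, the F statistic and its p-value can be recomputed from the printed sums of squares and degrees of freedom. This is a rough check that uses the rounded values from the table above, so the last digits may differ slightly from the statsmodels output.

```python
from scipy.stats import f

# Rounded values copied from the ANOVA table above
ss_model, df_model = 1.053322e6, 17
ss_resid, df_resid = 1.044519e7, 782

# Mean squares are sums of squares divided by their degrees of freedom
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

# F is the ratio of the mean squares; the p-value is the upper tail of F(17, 782)
f_stat = ms_model / ms_resid
p_value = f.sf(f_stat, df_model, df_resid)

print("F:", f_stat)
print("p-value:", p_value)
```

An F value well above 1 means the variation between types is large relative to the variation within types, which is why the p-value is so small.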
