# T-tests and P-values

In statistics, a t-test is used to test whether the means of two data samples differ significantly. Two common types of two-sample t-test are:

* **Student's t-test** (a.k.a. independent or uncorrelated t-test). This test compares samples drawn from **two independent populations** (e.g. test scores of students in two different classes). `scipy` provides the [`ttest_ind`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_ind.html) function to conduct Student's t-test.

* **Paired t-test** (a.k.a. dependent or correlated t-test). This test compares two sets of measurements taken on **the same population** (e.g. scores of the same students in one class on two different tests). `scipy` provides the [`ttest_rel`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_rel.html) function to conduct a paired t-test.

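Both functions return a test statistic and a number called the **p-value**. The p-value is the probability of observing a difference at least as extreme as the one in the data if the null hypothesis (no difference between the means) were true. If the p-value is below 0.05, we conventionally reject the null hypothesis and call the difference significant. If the p-value is between 0.05 and 0.1, the evidence is weaker: we may still reject the null hypothesis at the 10% level, but with less confidence. If the p-value is above 0.1, we do not reject the null hypothesis.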
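As a minimal sketch of the difference between the two tests, the snippet below runs `ttest_ind` on two separate groups and `ttest_rel` on two measurements of the same group. All scores and variable names (`class_a`, `class_b`, `test_1`, `test_2`) are invented purely for illustration and do not come from any dataset in this lab.

```python
import numpy as np
from scipy import stats

# Hypothetical scores, invented for illustration only.
class_a = np.array([72, 85, 78, 90, 66, 81, 77, 88])
class_b = np.array([68, 80, 75, 83, 70, 74, 79, 72])

# Independent samples: two different classes, no pairing between rows.
t_ind, p_ind = stats.ttest_ind(class_a, class_b)

# Paired samples: the same eight students measured on two tests,
# so row i of test_1 and row i of test_2 belong to the same student.
test_1 = class_a
test_2 = np.array([75, 88, 80, 93, 70, 85, 78, 92])
t_rel, p_rel = stats.ttest_rel(test_1, test_2)

print(f"independent: t = {t_ind:.3f}, p = {p_ind:.4f}")
print(f"paired:      t = {t_rel:.3f}, p = {p_rel:.4f}")
```

The paired test works on the per-student differences, which is why it can detect a small but consistent change that an independent test on the same numbers might miss.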

Read more about the t-test in [this article](http://b.link/test50) and [this Quora](http://b.link/unpaired97). Make sure you understand when to use which type of t-test. 

In [8]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

pd.set_option('display.max_columns', None)


### One-tailed t-test
In a packing plant, a machine packs cartons with jars. A new machine is expected to pack faster on average than the machine currently in use. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file files_for_lab/ttest_machine.xlsx. Assuming the conditions for a t-test are met, does the data provide sufficient evidence that the new machine packs faster than the old one?

#### Import dataset

In [9]:
df = pd.read_csv('ttest_machine.txt', delimiter=' ')
df

Unnamed: 0,New_machine,Old_machine
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5
5,42.8,43.5
6,43.2,43.1
7,42.3,41.7
8,41.8,44.0
9,42.7,44.1


In [10]:
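# H0: the mean packing time of the new machine is greater than or equal to that of the old machine
# H1: the mean packing time of the new machine is less than that of the old machine (the new machine is faster)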



In [11]:
# calculate the mean and sample standard deviation (ddof=1) of the packing times for each machine
mean_new = np.mean(df['New_machine'])
std_new = np.std(df['New_machine'], ddof=1)
mean_old = np.mean(df['Old_machine'])
std_old = np.std(df['Old_machine'], ddof=1)

In [12]:
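# calculate the t-value and one-tailed p-value of the t-test
n = len(df)  # number of observations per machine
dof = 2 * n - 2  # degrees of freedom for two samples of size n (named dof so the DataFrame df is not overwritten)
se = np.sqrt((std_new**2/n) + (std_old**2/n))
t = (mean_new - mean_old) / se
p = stats.t.cdf(t, dof)  # lower-tail probability, because H1 is mean_new < mean_old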

In [13]:
alpha = 0.05

# A faster machine has a shorter packing time, so H1 corresponds to a negative t
if p < alpha and t < 0:
    print("The data provides sufficient evidence to reject the null hypothesis.")
    print("There is evidence to suggest that the new machine packs cartons faster than the old machine.")
else:
    print("The data does not provide sufficient evidence to reject the null hypothesis.")
    print("There is not enough evidence to suggest that the new machine packs cartons faster than the old machine.")

The data provides sufficient evidence to reject the null hypothesis.
There is evidence to suggest that the new machine packs cartons faster than the old machine.
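A manual calculation like the one above can be cross-checked against `scipy`. The sketch below retypes the ten packing times per machine from the table shown earlier (the array names `new_machine` and `old_machine` are ours) and assumes SciPy 1.6 or newer, where `ttest_ind` accepts the `alternative` keyword. Because both samples have the same size, the pooled standard error used by the default `ttest_ind` equals the unpooled formula used here, so the two t-values should agree.

```python
import numpy as np
from scipy import stats

# Packing times in seconds, copied from the table above.
new_machine = np.array([42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old_machine = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1])

# Manual one-tailed test of H1: mean(new) < mean(old).
n = len(new_machine)
se = np.sqrt(new_machine.var(ddof=1) / n + old_machine.var(ddof=1) / n)
t_manual = (new_machine.mean() - old_machine.mean()) / se
p_manual = stats.t.cdf(t_manual, 2 * n - 2)  # lower tail, 2n - 2 degrees of freedom

# Same test via scipy; alternative='less' asks for the lower-tail p-value.
result = stats.ttest_ind(new_machine, old_machine, alternative='less')

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.5f}")
print(f"scipy:  t = {result.statistic:.4f}, p = {result.pvalue:.5f}")
```

A negative t-value here means the new machine's mean packing time is lower, which is the direction the alternative hypothesis predicts.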



#### Import dataset

In this challenge we will work on the Pokemon dataset you have already used. The goal is to test whether different groups of pokemon (e.g. Legendary vs Normal, Generation 1 vs 2, single-type vs dual-type) have different stats (e.g. HP, Attack, Defense, etc.). Use pokemon.csv

In [15]:
df = pd.read_csv('pokemon.txt', delimiter=',')
df

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


In [19]:
# Drop the identifier columns; only the type, stat, generation and legendary columns are needed
df = df.drop(columns=['#', 'Name'])
df

Unnamed: 0,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,Grass,Poison,318,45,49,49,65,65,45,1,False
1,Grass,Poison,405,60,62,63,80,80,60,1,False
2,Grass,Poison,525,80,82,83,100,100,80,1,False
3,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...
795,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,Psychic,Dark,680,80,160,60,170,130,80,6,True


In [22]:
# Group by Legendary vs Normal and calculate the mean stats for each group
# (numeric_only=True skips the text columns 'Type 1' and 'Type 2')
grouped_by_legendary = df.groupby('Legendary').mean(numeric_only=True)
print("Grouped by Legendary vs Normal:")
print(grouped_by_legendary)

# Group by Generation (all generations) and calculate the mean stats for each group
grouped_by_generation = df.groupby('Generation').mean(numeric_only=True)
print("Grouped by Generation 1 vs 2:")
print(grouped_by_generation)

# Create a new column that indicates whether the Pokemon is single-type or dual-type
df['Type'] = df['Type 2'].apply(lambda x: 'Single' if pd.isna(x) else 'Dual')

# Group by single-type vs dual-type and calculate the mean stats for each group
grouped_by_type = df.groupby('Type').mean(numeric_only=True)
print("Grouped by Single-type vs Dual-type:")
grouped_by_type



Grouped by Legendary vs Normal:
                Total         HP      Attack    Defense     Sp. Atk  \
Legendary                                                             
False      417.213605  67.182313   75.669388  71.559184   68.454422   
True       637.384615  92.738462  116.676923  99.661538  122.184615   

              Sp. Def       Speed  Generation  
Legendary                                      
False       68.892517   65.455782    3.284354  
True       105.938462  100.184615    3.769231  
Grouped by Generation 1 vs 2:
                 Total         HP     Attack    Defense    Sp. Atk    Sp. Def  \
Generation                                                                      
1           426.813253  65.819277  76.638554  70.861446  71.819277  69.090361   
2           418.283019  71.207547  72.028302  73.386792  65.943396  73.905660   
3           436.225000  66.543750  81.625000  74.100000  75.806250  71.225000   
4           459.016529  73.082645  82.867769  78.132231 

Type,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Dual,456.628019,70.649758,83.173913,79.676329,77.048309,75.565217,70.514493,3.410628,0.096618
Single,412.015544,67.766839,74.525907,67.585492,68.284974,67.974093,65.878238,3.23057,0.064767


#### First, define a function that tests the means of a set of features across two samples.

The next cell shows the function together with an annotation that explains what the function does, its arguments, and its return value. This kind of annotation is called a **docstring**, a convention used among Python developers. Docstrings let developers document their code consistently so that others can read it, and they allow tools and websites to parse the docstrings automatically and generate user-friendly documentation.

Follow the specifications of the docstring and complete the function.

In [23]:
def t_test_features(s1, s2, features=['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total']):
    """Test means of a feature set of two samples
    
    Args:
        s1 (dataframe): sample 1
        s2 (dataframe): sample 2
        features (list): an array of features to test
    
    Returns:
        dict: a dictionary of t-test scores for each feature where the feature name is the key and the p-value is the value
    """
    results = {}

    for feature in features:
        # Extract the feature column from each sample
        f1 = s1[feature]
        f2 = s2[feature]
        
        # Calculate the t-test p-value for the two samples
        # (equal_var=False runs Welch's t-test, which does not assume equal variances)
        t_statistic, p_value = stats.ttest_ind(f1, f2, equal_var=False)
        
        # Add the p-value to the results dictionary using the feature name as the key
        results[feature] = p_value
    
    return results
    


#### Using the `t_test_features` function, conduct a t-test for Legendary vs non-Legendary pokemons.

*Hint: your output should look like below:*

```
{'HP': 1.0026911708035284e-13,
 'Attack': 2.520372449236646e-16,
 'Defense': 4.8269984949193316e-11,
 'Sp. Atk': 1.5514614112239812e-21,
 'Sp. Def': 2.2949327864052826e-15,
 'Speed': 1.049016311882451e-18,
 'Total': 9.357954335957446e-47}
```

In [25]:
# Split the dataset into Legendary and non-Legendary pokemon
legendary = df[df['Legendary'] == True]
non_legendary = df[df['Legendary'] == False]

# Use t_test_features to conduct a t-test for each feature
results = t_test_features(legendary, non_legendary)

# Print the results
print(results)


{'HP': 1.0026911708035284e-13, 'Attack': 2.520372449236646e-16, 'Defense': 4.826998494919331e-11, 'Sp. Atk': 1.5514614112239816e-21, 'Sp. Def': 2.2949327864052826e-15, 'Speed': 1.0490163118824507e-18, 'Total': 9.357954335957444e-47}


#### From the test results above, what conclusion can you make? Do Legendary and non-Legendary pokemons have significantly different stats on each feature?

In [None]:
# Based on the t-test results, we can conclude that Legendary and non-Legendary pokemons have significantly different stats on each feature, as all of the p-values are much smaller than the conventional significance level of 0.05. Therefore, we can reject the null hypothesis that there is no difference in the mean stats between Legendary and non-Legendary pokemons for all features.

#### Next, conduct t-test for Generation 1 and Generation 2 pokemons.

In [26]:
# Split the dataset into Generation 1 and Generation 2 pokemon
gen1 = df[df['Generation'] == 1]
gen2 = df[df['Generation'] == 2]

# Use t_test_features to conduct a t-test for each feature
results = t_test_features(gen1, gen2)

# Print the results
print(results)


{'HP': 0.14551697834219626, 'Attack': 0.24721958967217725, 'Defense': 0.5677711011725426, 'Sp. Atk': 0.12332165977104394, 'Sp. Def': 0.18829872292645752, 'Speed': 0.00239265937312135, 'Total': 0.5631377907941676}
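When a results dictionary contains several p-values, it helps to filter them against the significance level rather than reading them by eye. The helper below is a small sketch; `significant_features` is a hypothetical name of our own, and the dictionary is copied from the Generation 1 vs Generation 2 output printed above.

```python
def significant_features(p_values, alpha=0.05):
    """Return the names of the features whose p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


# p-values copied from the Generation 1 vs Generation 2 output above
gen_results = {
    'HP': 0.14551697834219626,
    'Attack': 0.24721958967217725,
    'Defense': 0.5677711011725426,
    'Sp. Atk': 0.12332165977104394,
    'Sp. Def': 0.18829872292645752,
    'Speed': 0.00239265937312135,
    'Total': 0.5631377907941676,
}

print(significant_features(gen_results))  # prints ['Speed']
```

The same helper can be applied to any dictionary returned by `t_test_features`.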


#### What conclusions can you make?

In [None]:
# Based on the t-test results for Generation 1 and Generation 2 pokemons, the two generations differ significantly in mean stats for only one feature.

# Speed is the only feature whose p-value (about 0.0024) is below the conventional significance level of 0.05, so we reject the null hypothesis of no difference in mean Speed between the two generations.

# For the following features the p-values are above 0.05 (they range from about 0.12 to 0.57), so we fail to reject the null hypothesis of no difference in the means:

# HP
# Attack
# Defense
# Sp. Atk
# Sp. Def
# Total

#### Compare pokemons who have single type vs those having two types.

In [27]:
# Split the dataset into single-type and dual-type pokemon
single_type = df[df['Type 2'].isnull()]
dual_type = df[df['Type 2'].notnull()]

# Use t_test_features to conduct a t-test for each feature
results = t_test_features(single_type, dual_type)

# Print the results
print(results)


{'HP': 0.11314389855379421, 'Attack': 0.00014932578145948305, 'Defense': 2.7978540411514693e-08, 'Sp. Atk': 0.00013876216585667845, 'Sp. Def': 0.00010730610934512779, 'Speed': 0.02421703281819094, 'Total': 1.1157056505229964e-07}


#### What conclusions can you make?

In [None]:
# Based on the t-test results, single-type and dual-type pokemons differ significantly on most features.

# For the following features, there is a significant difference in the mean stats between single-type and dual-type pokemons, as the p-values are below the conventional significance level of 0.05:

# Attack
# Defense
# Sp. Atk
# Sp. Def
# Speed
# Total
# On the other hand, for the following feature, there is no significant difference in the mean stats between single-type and dual-type pokemons, as the p-value (about 0.113) is above the conventional significance level of 0.05:

# HP
# Therefore, we can reject the null hypothesis of no difference in the mean stats between single-type and dual-type pokemons for the former set of features, but fail to reject the null hypothesis for HP.

#### Now compare `Attack` vs `Defense` and `Sp. Atk` vs `Sp. Def` across all pokemons to test whether they differ significantly. Please write your code below.

*Hint: are you comparing different populations or the same population?*

In [None]:
# Your code here
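One possible approach, following the hint: both stats in each pair are measured on the same pokemon, so a paired t-test (`ttest_rel`) fits better than an independent one. The sketch below is self-contained and therefore uses only the nine rows visible in the table preview earlier (the name `sample` is ours). It illustrates the call pattern only; in the notebook you would pass the columns of the full `df` instead, and the p-values from this tiny sample should not be read as the answer.

```python
import pandas as pd
from scipy import stats

# Nine rows copied from the table preview above, for illustration only.
# In the notebook, use the full DataFrame `df` instead of `sample`.
sample = pd.DataFrame({
    'Attack':  [49, 62, 82, 100, 52, 100, 160, 110, 160],
    'Defense': [49, 63, 83, 123, 43, 150, 110, 60, 60],
    'Sp. Atk': [65, 80, 100, 122, 60, 100, 160, 150, 170],
    'Sp. Def': [65, 80, 100, 120, 50, 150, 110, 130, 130],
})

pairs = [('Attack', 'Defense'), ('Sp. Atk', 'Sp. Def')]
paired_results = {}

for first, second in pairs:
    # Paired t-test: each row contributes one (first, second) pair
    t_statistic, p_value = stats.ttest_rel(sample[first], sample[second])
    paired_results[f'{first} vs {second}'] = p_value

print(paired_results)
```

A p-value below 0.05 for a pair would indicate that the two stats differ significantly on average within the same pokemon.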


#### What conclusions can you make?

In [None]:
# Your comment here