# Challenge 1 - T-test

In statistics, a t-test is used to test whether the means of two data samples differ significantly. There are two types of t-test:

* **Student's t-test** (a.k.a. independent or uncorrelated t-test). This test compares samples from **two independent populations** (e.g. test scores of students in two different classes). `scipy` provides the [`ttest_ind`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_ind.html) function to conduct Student's t-test.

* **Paired t-test** (a.k.a. dependent or correlated t-test). This test compares two sets of measurements taken on **the same population** (e.g. scores on different tests for students in the same class). `scipy` provides the [`ttest_rel`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_rel.html) function to conduct a paired t-test.

Both functions return a t-statistic and a number called the **p-value**. The null hypothesis is that the two means are equal. If the p-value is below 0.05, we reject the null hypothesis and call the difference significant. If the p-value is between 0.05 and 0.1, we may still reject the null hypothesis, but with less confidence. If the p-value is above 0.1, we do not reject the null hypothesis.
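The decision rule above can be written down as a small helper. This is only a sketch: the function name `interpret_p_value` and the wording of the verdicts are made up for illustration, and the 0.05 and 0.1 cut-offs are simply the ones used in this notebook.

```python
def interpret_p_value(p_value):
    """Map a p-value to the three-way verdict described in the text."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.05:
        return "reject H0"
    if p_value <= 0.1:
        # Borderline zone: rejection is possible but with less confidence.
        return "reject H0 (weak evidence)"
    return "do not reject H0"


print(interpret_p_value(0.003))  # reject H0
print(interpret_p_value(0.07))   # reject H0 (weak evidence)
print(interpret_p_value(0.27))   # do not reject H0
```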

Read more about the t-test in [this article](https://researchbasics.education.uconn.edu/t-test/) and [this Quora thread](https://www.quora.com/What-is-the-difference-between-a-paired-and-unpaired-t-test). Make sure you understand when to use which type of t-test.
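To make the choice between the two tests more concrete, here is a minimal sketch that runs both on the same pair of arrays. The scores are invented for the example and have nothing to do with the Pokémon data.

```python
import numpy as np
from scipy import stats

# Made-up scores, for illustration only.
scores_a = np.array([72, 85, 90, 66, 78, 81, 94, 70])
scores_b = np.array([68, 75, 80, 64, 71, 79, 83, 69])

# Independent test: the arrays are treated as two unrelated groups
# (e.g. two different classes), so their lengths could differ.
t_ind, p_ind = stats.ttest_ind(scores_a, scores_b)

# Paired test: position i in both arrays is treated as the same student
# measured twice, so the arrays must have equal length.
t_rel, p_rel = stats.ttest_rel(scores_a, scores_b)

print(f"independent: t={t_ind:.3f}, p={p_ind:.4f}")
print(f"paired:      t={t_rel:.3f}, p={p_rel:.4f}")
```

The two calls answer different questions, so the same numbers can lead to different p-values. Which one is appropriate depends on how the data were collected, not on which result looks better.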

In [38]:
# Import libraries

import pandas as pd

#### Import dataset

In this challenge we will work on the Pokémon dataset you used last week. The goal is to test whether different groups of Pokémon (e.g. Legendary vs Normal, Generation 1 vs 2, single-type vs dual-type) have different stats (e.g. HP, Attack, Defense, etc.).

In [39]:
# Import dataset

pokemon = pd.read_csv(r"../data/Pokemon.csv")

pokemon.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


- **Name**: The name of the Pokémon (e.g., Bulbasaur, Ivysaur).
- **Type 1**: The primary elemental type of the Pokémon (e.g., Grass).
- **Type 2**: The secondary elemental type of the Pokémon, if it has one (e.g., Poison). If a Pokémon has only one type, this field may be empty.
- **Total**: The sum of all base stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed). This provides an overall measure of the Pokémon’s strength.
- **HP**: Hit Points – the amount of health the Pokémon has. If HP reaches 0, the Pokémon faints.
- **Attack**: The base physical attack stat, which determines the power of physical moves.
- **Defense**: The base physical defense stat, which determines how well the Pokémon can withstand physical moves.
- **Sp. Atk**: Special Attack – the base stat for determining the power of special moves.
- **Sp. Def**: Special Defense – the base stat for determining how well the Pokémon can withstand special moves.
- **Speed**: Determines the order in which Pokémon act during battle (the faster Pokémon usually moves first).
- **Generation**: Indicates the game generation in which the Pokémon was introduced (e.g., 1 for the first generation, which includes Red/Blue/Green/Yellow).
- **Legendary**: A Boolean value (`True`/`False`) indicating whether the Pokémon is classified as legendary.
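The description of **Total** can be checked directly. The sketch below rebuilds the five rows shown in the `pokemon.head()` output as a small DataFrame (typed in by hand here so that the block runs on its own) and confirms that the six base stats add up to `Total`.

```python
import pandas as pd

# The five rows shown in the pokemon.head() output above.
head = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Total": [318, 405, 525, 625, 309],
    "HP": [45, 60, 80, 80, 39],
    "Attack": [49, 62, 82, 100, 52],
    "Defense": [49, 63, 83, 123, 43],
    "Sp. Atk": [65, 80, 100, 122, 60],
    "Sp. Def": [65, 80, 100, 120, 50],
    "Speed": [45, 60, 80, 80, 65],
})

base_stats = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Sum the six base stats row by row and compare with the Total column.
recomputed_total = head[base_stats].sum(axis=1)
print((recomputed_total == head["Total"]).all())  # True
```

Because `Total` is derived from the other six columns, a test on `Total` is not independent evidence on top of the tests on the individual stats.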


#### Categorical variables (nominal)
categorical = ("Name", "Type 1", "Type 2", "Legendary")

#### Numerical variables (discrete and continuous)
numerical = ("Total", "HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed")

#### Ordinal variables
ordinal = ("Generation",)


#### First we want to define a function that compares the means of a set of features across two samples.

In the next cell you'll see an annotation inside the Python function that explains what the function does, its arguments, and its return value. This kind of annotation is called a **docstring**, a convention widely used among Python developers. Docstrings let developers document their code consistently so that others can read it. They also allow some websites to parse the docstrings automatically and display user-friendly documentation.

Follow the specifications of the docstring and complete the function.
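As a small illustration of the docstring convention, the sketch below defines a toy function (the name `mean_difference` is made up for this example) and shows that the docstring is stored on the function object, which is where tools such as `help()` read it from.

```python
def mean_difference(a, b):
    """
    Return the difference between the means of two sequences.

    Args:
        a (list): first sequence of numbers
        b (list): second sequence of numbers

    Returns:
        float: mean(a) - mean(b)
    """
    return sum(a) / len(a) - sum(b) / len(b)


# The docstring is available at runtime through the __doc__ attribute.
print(mean_difference.__doc__)
print(mean_difference([4, 6], [1, 3]))  # 3.0
```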

In [40]:
from tabulate import tabulate


# Categorical variables (nominal)
categorical_var = ("Type 1", "Type 2", "Legendary")

# Numerical variables (discrete and continuous)
numerical_var = ("Total", "HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed")

# Ordinal variables
ordinal_var = ("Generation",)


# Total number of Pokémon
num_pokemon = len(pokemon)
print(f"Total number of Pokémon: {num_pokemon}")

# Describe categorical variables
print("Categorical Variables:")
for col in categorical_var:
    value_counts = pokemon[col].value_counts()
    table = [[index, count] for index, count in value_counts.items()]
    print(f"\n{col}:")
    print(tabulate(table, headers=[col, "Count"], tablefmt="grid"))

# Describe numerical variables
print("\nNumerical Variables:")
numerical_summary = pokemon[list(numerical_var)].describe()
print(tabulate(numerical_summary, headers="keys", tablefmt="grid"))

# Describe ordinal variables
print("\nOrdinal Variables:")
for col in ordinal_var:
    value_counts = pokemon[col].value_counts().sort_index()
    table = [[index, count] for index, count in value_counts.items()]
    print(f"\n{col}:")
    print(tabulate(table, headers=[col, "Count"], tablefmt="grid"))


Total number of Pokémon: 800
Categorical Variables:

Type 1:
+----------+---------+
| Type 1   |   Count |
| Water    |     112 |
+----------+---------+
| Normal   |      98 |
+----------+---------+
| Grass    |      70 |
+----------+---------+
| Bug      |      69 |
+----------+---------+
| Psychic  |      57 |
+----------+---------+
| Fire     |      52 |
+----------+---------+
| Electric |      44 |
+----------+---------+
| Rock     |      44 |
+----------+---------+
| Dragon   |      32 |
+----------+---------+
| Ground   |      32 |
+----------+---------+
| Ghost    |      32 |
+----------+---------+
| Dark     |      31 |
+----------+---------+
| Poison   |      28 |
+----------+---------+
| Steel    |      27 |
+----------+---------+
| Fighting |      27 |
+----------+---------+
| Ice      |      24 |
+----------+---------+
| Fairy    |      17 |
+----------+---------+
| Flying   |       4 |
+----------+---------+

Type 2:
+----------+---------+
| Type 2   |   Count |
| Flying  

In [41]:
import math

def calculate_sample_size(N, p=0.5, E=0.05, confidence_level=0.95):
    """
    Calculate the required sample size for estimating a population proportion, 
    adjusted for finite population correction.

    Reference:  Kokoska, S., & Zwillinger, D. (2000). CRC Standard Probability and Statistics Tables and Formulae, 
                Student Edition (1st ed.). CRC Press. (Table 9.2, Section 9.7 'Common sample size calculations').
    
    Parameters:
    - N (int): Population size (finite population)
    - p (float): Estimated population proportion (default 0.5 for maximum variability)
    - E (float): margin of error
    - confidence_level (float): Confidence level (default 0.95 for 95%)
    
    Returns:
    - n_adj (int): Adjusted sample size considering finite population correction, rounded up
    
    """
    # Critical value (z-score) for confidence level
    z_alpha_div_2 = {
        0.90: 1.645,
        0.95: 1.96,
        0.99: 2.576
    }.get(confidence_level, 1.96)  # Any unlisted confidence level falls back to 1.96 (95%)
    
    # Calculate initial sample size (n) without finite population correction
    n = (z_alpha_div_2**2 * p * (1 - p)) / (E**2)
    
    # Apply finite population correction
    n_adj = n / (1 + ((n - 1) / N))
    
    # Round up to the nearest whole number
    return math.ceil(n_adj)
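The lookup table in `calculate_sample_size` covers only three confidence levels and quietly uses 1.96 for anything else. One possible alternative, sketched below, computes the critical value from the standard normal distribution with `scipy.stats.norm.ppf`. The helper names `z_critical` and `sample_size_any_level` are hypothetical and not part of the original notebook; the formula itself is unchanged.

```python
import math

from scipy.stats import norm


def z_critical(confidence_level):
    """Two-sided critical z-value for a confidence level strictly between 0 and 1."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    alpha = 1.0 - confidence_level
    return norm.ppf(1.0 - alpha / 2.0)


def sample_size_any_level(N, p=0.5, E=0.05, confidence_level=0.95):
    """Same formula as calculate_sample_size, but with a computed z-value."""
    z = z_critical(confidence_level)
    # Initial sample size without finite population correction.
    n = (z ** 2 * p * (1 - p)) / (E ** 2)
    # Finite population correction, then round up.
    n_adj = n / (1 + (n - 1) / N)
    return math.ceil(n_adj)


print(round(z_critical(0.95), 3))   # 1.96
print(sample_size_any_level(800))   # 260
```

For the 800 Pokémon in this dataset and the default settings, the uncorrected size is about 384 and the finite population correction brings it down to 260.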




In [45]:
import scipy.stats as stats

# Determine sample size
population_size = len(pokemon)  # N
margin_of_error = 0.05  # E
confidence = 0.95  # 95% confidence level
estimated_proportion = 0.5  # p


sample_size = calculate_sample_size(population_size, p=estimated_proportion, E=margin_of_error, confidence_level=confidence)
print(f"Sample Size: {sample_size}")

# random_state acts like a "seed" for the random number generator, ensuring that the same random numbers are produced every time the code runs
pokemon_sample_1 = pokemon.sample(n=sample_size, random_state=27)
pokemon_sample_2 = pokemon.sample(n=sample_size, random_state=52)


def t_test_features(s1, s2, features):
    """
    Test the means of a set of features across two samples.

    Uses Welch's t-test (`equal_var=False`), which does not assume that the
    two samples have equal variances.

    Args:
        s1 (dataframe): sample 1
        s2 (dataframe): sample 2
        features (iterable): names of the columns to test

    Returns:
        dict: maps each feature name to a dictionary with the keys
            "t_statistic" and "p_value"
    """
    results = {}

    for column in features:
        t_stat, p_value = stats.ttest_ind(
            s1[column], 
            s2[column], 
            equal_var=False
        )
        results[column] = {"t_statistic": t_stat, "p_value": p_value}
    
    return results


t_test = t_test_features(pokemon_sample_1, pokemon_sample_2, numerical_var)


Sample Size: 260
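`t_test_features` passes `equal_var=False` to `ttest_ind`, which selects Welch's variant of the independent t-test instead of the classic pooled-variance Student's version. The sketch below, on synthetic data generated only for this illustration, runs both variants and recomputes Welch's statistic by hand from the group means and variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic groups with different sizes and spreads (illustration only).
small_wide = rng.normal(loc=100, scale=30, size=20)
large_narrow = rng.normal(loc=90, scale=5, size=200)

# Student's t-test pools the two variances into one estimate.
t_student, p_student = stats.ttest_ind(small_wide, large_narrow, equal_var=True)

# Welch's t-test keeps a separate variance estimate for each group.
t_welch, p_welch = stats.ttest_ind(small_wide, large_narrow, equal_var=False)

# Welch's statistic by hand: difference in means over the combined standard error.
standard_error = np.sqrt(
    small_wide.var(ddof=1) / small_wide.size
    + large_narrow.var(ddof=1) / large_narrow.size
)
t_manual = (small_wide.mean() - large_narrow.mean()) / standard_error

print(f"Student: t={t_student:.3f}, p={p_student:.4g}")
print(f"Welch:   t={t_welch:.3f}, p={p_welch:.4g}")
print(np.isclose(t_welch, t_manual))  # True
```

When group sizes and variances are very different, as with 65 Legendary versus 735 non-Legendary Pokémon, the two variants can disagree noticeably, which is a common reason for preferring Welch's version.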


#### Using the `t_test_features` function, conduct a t-test for Legendary vs non-Legendary Pokémon.

*Hint: your output should look like below:*

```
{'HP': 1.0026911708035284e-13,
 'Attack': 2.520372449236646e-16,
 'Defense': 4.8269984949193316e-11,
 'Sp. Atk': 1.5514614112239812e-21,
 'Sp. Def': 2.2949327864052826e-15,
 'Speed': 1.049016311882451e-18,
 'Total': 9.357954335957446e-47}
 ```

In [46]:
# Null Hypothesis (H0): There is NO SIGNIFICANT DIFFERENCE in the mean value of the feature between Legendary and non-Legendary Pokémon.
# Alternate Hypothesis (Ha): There IS A SIGNIFICANT DIFFERENCE in the mean value of the feature between Legendary and non-Legendary Pokémon.

from scipy.stats import ttest_ind

# Select all Legendary Pokémon
legendary_pokemon = pokemon[pokemon["Legendary"] == True]

# Select all non-Legendary Pokémon
non_legendary_pokemon = pokemon[pokemon["Legendary"] == False]


t_test_legend_v_non_legend = t_test_features(legendary_pokemon, non_legendary_pokemon, numerical_var)

# Convert dictionary to a tabular format
table = [[feature, res['t_statistic'], res['p_value']] for feature, res in t_test_legend_v_non_legend.items()]

# Print the table using tabulate
print(tabulate(table, headers=["Feature", "t-statistic", "p-value"], tablefmt="grid"))

+-----------+---------------+-------------+
| Feature   |   t-statistic |     p-value |
| Total     |      25.8336  | 9.35795e-47 |
+-----------+---------------+-------------+
| HP        |       8.98137 | 1.00269e-13 |
+-----------+---------------+-------------+
| Attack    |      10.4381  | 2.52037e-16 |
+-----------+---------------+-------------+
| Defense   |       7.63708 | 4.827e-11   |
+-----------+---------------+-------------+
| Sp. Atk   |      13.4174  | 1.55146e-21 |
+-----------+---------------+-------------+
| Sp. Def   |      10.0157  | 2.29493e-15 |
+-----------+---------------+-------------+
| Speed     |      11.475   | 1.04902e-18 |
+-----------+---------------+-------------+


#### From the test results above, what conclusion can you make? Do Legendary and non-Legendary Pokémon have significantly different stats on each feature?

In [48]:
# | Feature   | t-statistic   | p-value          | Conclusion                                               |
# |-----------|---------------|------------------|---------------------------------------------------------|
# | Total     | 25.8336       | 9.36 × 10⁻⁴⁷     | Significant difference. Legendary Pokémon have higher overall stats. |
# | HP        | 8.98137       | 1.00 × 10⁻¹³     | Significant difference. Legendary Pokémon have higher HP. |
# | Attack    | 10.4381       | 2.52 × 10⁻¹⁶     | Significant difference. Legendary Pokémon have higher Attack. |
# | Defense   | 7.63708       | 4.83 × 10⁻¹¹     | Significant difference. Legendary Pokémon have higher Defense. |
# | Sp. Atk   | 13.4174       | 1.55 × 10⁻²¹     | Significant difference. Legendary Pokémon have higher Sp. Atk. |
# | Sp. Def   | 10.0157       | 2.29 × 10⁻¹⁵     | Significant difference. Legendary Pokémon have higher Sp. Def. |
# | Speed     | 11.475        | 1.05 × 10⁻¹⁸     | Significant difference. Legendary Pokémon have higher Speed. |

# p-values are well below the standard significance threshold (α=0.05), so we reject the null hypothesis (H0) for all features.
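
The conclusion column above follows a mechanical rule: compare the p-value with α, then use the sign of the t-statistic to see which group has the higher mean. The sketch below writes that rule out. The helper name `summarize` is hypothetical, and the numbers are copied by hand from the table printed earlier.

```python
# t-statistics and p-values copied from the Legendary vs non-Legendary table above.
legendary_results = {
    "Total":   {"t_statistic": 25.8336, "p_value": 9.35795e-47},
    "HP":      {"t_statistic": 8.98137, "p_value": 1.00269e-13},
    "Attack":  {"t_statistic": 10.4381, "p_value": 2.52037e-16},
    "Defense": {"t_statistic": 7.63708, "p_value": 4.827e-11},
    "Sp. Atk": {"t_statistic": 13.4174, "p_value": 1.55146e-21},
    "Sp. Def": {"t_statistic": 10.0157, "p_value": 2.29493e-15},
    "Speed":   {"t_statistic": 11.475,  "p_value": 1.04902e-18},
}


def summarize(results, first_label, second_label, alpha=0.05):
    """Turn t-test results into one short conclusion per feature."""
    conclusions = {}
    for feature, res in results.items():
        if res["p_value"] >= alpha:
            conclusions[feature] = "no significant difference"
        elif res["t_statistic"] > 0:
            # A positive t means the first sample has the larger mean.
            conclusions[feature] = f"{first_label} higher"
        else:
            conclusions[feature] = f"{second_label} higher"
    return conclusions


for feature, verdict in summarize(legendary_results, "Legendary", "non-Legendary").items():
    print(f"{feature}: {verdict}")
```

The labels must be given in the same order as the samples were passed to `t_test_features`, otherwise the direction is reported backwards.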



#### Next, conduct a t-test for Generation 1 and Generation 2 Pokémon.

In [49]:
# Null Hypothesis (H0): There is NO SIGNIFICANT DIFFERENCE in the mean value of the feature between Generation 1 and Generation 2 Pokémon.
# Alternate Hypothesis (Ha): There IS A SIGNIFICANT DIFFERENCE in the mean value of the feature between Generation 1 and Generation 2 Pokémon.


# Select all Gen 1  Pokémon
gen1_pokemon = pokemon[pokemon["Generation"] == 1]

# Select all Gen 2 Pokémon
gen2_pokemon = pokemon[pokemon["Generation"] == 2]

t_test_gen1_v_gen2 = t_test_features(gen1_pokemon, gen2_pokemon, numerical_var)

# Convert dictionary to a tabular format
table = [[feature, res['t_statistic'], res['p_value']] for feature, res in t_test_gen1_v_gen2.items()]

# Print the table using tabulate
print(tabulate(table, headers=["Feature", "t-statistic", "p-value"], tablefmt="grid"))


+-----------+---------------+------------+
| Feature   |   t-statistic |    p-value |
| Total     |      0.579073 | 0.563138   |
+-----------+---------------+------------+
| HP        |     -1.46097  | 0.145517   |
+-----------+---------------+------------+
| Attack    |      1.16031  | 0.24722    |
+-----------+---------------+------------+
| Defense   |     -0.572417 | 0.567771   |
+-----------+---------------+------------+
| Sp. Atk   |      1.54609  | 0.123322   |
+-----------+---------------+------------+
| Sp. Def   |     -1.32037  | 0.188299   |
+-----------+---------------+------------+
| Speed     |      3.06959  | 0.00239266 |
+-----------+---------------+------------+


#### What conclusions can you make?

In [None]:
# | Feature   | t-statistic   | p-value     | Conclusion                                                 |
# |-----------|---------------|-------------|-----------------------------------------------------------|
# | Total     | 0.579073      | 0.563138    | No significant difference in total stats between Gen 1 and Gen 2 Pokémon. |
# | HP        | -1.46097      | 0.145517    | No significant difference in HP between Gen 1 and Gen 2 Pokémon.           |
# | Attack    | 1.16031       | 0.24722     | No significant difference in Attack between Gen 1 and Gen 2 Pokémon.       |
# | Defense   | -0.572417     | 0.567771    | No significant difference in Defense between Gen 1 and Gen 2 Pokémon.      |
# | Sp. Atk   | 1.54609       | 0.123322    | No significant difference in Sp. Atk between Gen 1 and Gen 2 Pokémon.      |
# | Sp. Def   | -1.32037      | 0.188299    | No significant difference in Sp. Def between Gen 1 and Gen 2 Pokémon.      |
# | Speed     | 3.06959       | 0.002393    | Significant difference. Gen 1 Pokémon have higher Speed than Gen 2 Pokémon. |

# Significant Difference:
# Only Speed has a statistically significant difference between Gen 1 and Gen 2 Pokémon (p=0.00239, t=3.06959)
# Gen 1 Pokémon are faster on average than Gen 2 Pokémon.



#### Compare Pokémon that have a single type with those that have two types.

In [50]:
# Create a new column to count the number of types for each Pokémon
pokemon['Type Count'] = pokemon[['Type 1', 'Type 2']].notnull().sum(axis=1)

# Split the dataset into Pokémon with 1 type and Pokémon with 2 types
pokemon_1_type = pokemon[pokemon['Type Count'] == 1]
pokemon_2_types = pokemon[pokemon['Type Count'] == 2]

t_test_1type_v_2types = t_test_features(pokemon_1_type, pokemon_2_types, numerical_var)

# Convert dictionary to a tabular format
table = [[feature, res['t_statistic'], res['p_value']] for feature, res in t_test_1type_v_2types.items()]

# Print the table using tabulate
print(tabulate(table, headers=["Feature", "t-statistic", "p-value"], tablefmt="grid"))


+-----------+---------------+-------------+
| Feature   |   t-statistic |     p-value |
| Total     |      -5.35568 | 1.11571e-07 |
+-----------+---------------+-------------+
| HP        |      -1.58609 | 0.113144    |
+-----------+---------------+-------------+
| Attack    |      -3.81056 | 0.000149326 |
+-----------+---------------+-------------+
| Defense   |      -5.60979 | 2.79785e-08 |
+-----------+---------------+-------------+
| Sp. Atk   |      -3.82898 | 0.000138762 |
+-----------+---------------+-------------+
| Sp. Def   |      -3.89299 | 0.000107306 |
+-----------+---------------+-------------+
| Speed     |      -2.25801 | 0.024217    |
+-----------+---------------+-------------+
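
The `Type Count` column used for the split above relies on `notnull().sum(axis=1)`. The sketch below shows what that expression does on a two-row toy frame built from the Bulbasaur and Charmander rows, and a possible alternative that tests `Type 2` directly without a helper column.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type 1": ["Grass", "Fire"],
    "Type 2": ["Poison", np.nan],  # Charmander has no second type
})

# notnull() gives True/False per cell; summing across columns counts the True values.
toy["Type Count"] = toy[["Type 1", "Type 2"]].notnull().sum(axis=1)
print(toy["Type Count"].tolist())  # [2, 1]

# Alternative split: a missing Type 2 means the Pokémon has a single type.
single_type = toy[toy["Type 2"].isna()]
dual_type = toy[toy["Type 2"].notna()]
print(single_type["Name"].tolist(), dual_type["Name"].tolist())  # ['Charmander'] ['Bulbasaur']
```

Both approaches assume that `Type 1` is never missing, which holds for the rows shown in this notebook.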


#### What conclusions can you make?

In [52]:
# | Feature   | t-statistic   | p-value         | Conclusion                                                 |
# |-----------|---------------|-----------------|-----------------------------------------------------------|
# | Total     | -5.35568      | 1.12 × 10⁻⁷     | Significant difference. Pokémon with 2 types have higher overall stats. |
# | HP        | -1.58609      | 0.113144        | No significant difference in HP between Pokémon with 1 type and 2 types. |
# | Attack    | -3.81056      | 1.49 × 10⁻⁴     | Significant difference. Pokémon with 2 types have higher Attack.         |
# | Defense   | -5.60979      | 2.80 × 10⁻⁸     | Significant difference. Pokémon with 2 types have higher Defense.       |
# | Sp. Atk   | -3.82898      | 1.39 × 10⁻⁴     | Significant difference. Pokémon with 2 types have higher Sp. Atk.       |
# | Sp. Def   | -3.89299      | 1.07 × 10⁻⁴     | Significant difference. Pokémon with 2 types have higher Sp. Def.       |
# | Speed     | -2.25801      | 0.024217        | Significant difference. Pokémon with 2 types have higher Speed.         |


# Significant Differences:

# For Total, Attack, Defense, Special Attack (Sp. Atk), Special Defense (Sp. Def), and Speed, the p-values are below the standard significance threshold (α=0.05).
# This indicates that Pokémon with 2 types have significantly higher stats in these categories compared to Pokémon with 1 type.

#### Now, we want to test whether there are significant differences between `Attack` and `Defense`, and between `Sp. Atk` and `Sp. Def`, across all Pokémon. Please write your code below.

*Hint: are you comparing different populations or the same population?*
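
One way to think about the hint: when both stats belong to the same individuals, the question becomes whether the row-wise difference is zero on average. The sketch below uses synthetic numbers (not the Pokémon data) to show that `ttest_rel` on two columns gives the same result as a one-sample t-test on their differences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic paired stats: one row per (made-up) individual.
attack = rng.normal(loc=80, scale=30, size=100)
defense = attack - rng.normal(loc=5, scale=20, size=100)

# Paired test on the two columns ...
t_rel, p_rel = stats.ttest_rel(attack, defense)

# ... matches a one-sample test of the row-wise differences against zero.
t_one, p_one = stats.ttest_1samp(attack - defense, popmean=0.0)

print(np.isclose(t_rel, t_one), np.isclose(p_rel, p_one))  # True True
```

This is why the paired test needs both columns to have the same length and row order: each difference must come from one and the same individual.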

In [53]:
from scipy.stats import ttest_rel


# Paired t-test for Attack vs Defense
t_stat_attack_defense, p_value_attack_defense = ttest_rel(pokemon['Attack'], pokemon['Defense'])

# Paired t-test for Sp. Atk vs Sp. Def
t_stat_sp_atk_sp_def, p_value_sp_atk_sp_def = ttest_rel(pokemon['Sp. Atk'], pokemon['Sp. Def'])

# Print results
results = [
    ["Attack vs Defense", t_stat_attack_defense, p_value_attack_defense],
    ["Sp. Atk vs Sp. Def", t_stat_sp_atk_sp_def, p_value_sp_atk_sp_def]
]

# Print the paired-test results using tabulate
print(tabulate(results, headers=["Comparison", "t-statistic", "p-value"], tablefmt="grid"))


#### What conclusions can you make?

In [None]:
# Null Hypothesis (H0): The mean of the row-wise difference between the two paired stats is zero.
# Alternate Hypothesis (Ha): The mean of the row-wise difference between the two paired stats is not zero.

# Only two comparisons belong to this section, one per paired test:
# Attack vs. Defense, and Sp. Atk vs. Sp. Def.
# Total, HP and Speed are not part of either pair, so the paired tests say nothing about them.

# Read each conclusion from the `results` list built in the previous cell:
# - p-value below 0.05: reject H0. The two stats differ significantly, and the sign of the
#   t-statistic shows the direction (positive means the first stat of the pair is higher on average).
# - p-value of 0.05 or above: do not reject H0. The data show no significant difference between the two stats.

