#  ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

The ANOVA test returns two numbers. The first is the **F-value**, the ratio of the variance between groups to the variance within groups. You can reject the null hypothesis when the F-value exceeds a critical value, which depends on the total number of subjects and the number of subject groups in your experiment. [This table](http://b.link/eda14) lists the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
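For readers who do want to see where the F-value comes from, here is a minimal sketch that computes it by hand for three made-up groups (the numbers are illustrative and not taken from the pokemon dataset). It divides the between-group mean square by the within-group mean square.

```python
# Toy data: three made-up groups, not taken from the pokemon dataset.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: how far each value is from its own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = k - 1          # degrees of freedom between groups
df_within = n_total - k     # degrees of freedom within groups

f_value = (ss_between / df_between) / (ss_within / df_within)
print(round(f_value, 6))  # 13.0
```

With 2 and 6 degrees of freedom, the critical value at the 0.05 level in a standard F table is about 5.14, so an F-value of 13 would lead to rejecting the null hypothesis for this toy data.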

The second number is the **p-value**, which already takes the total number of subjects and the number of experiment groups into account. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
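As a rough illustration of how the two numbers relate, the sketch below uses `scipy.stats.f` to look up a critical F-value and to convert an F-value into a p-value. The experiment size (3 groups, 9 subjects) and the observed F-value of 13.0 are assumptions made up for this example.

```python
from scipy import stats

# Hypothetical experiment: 3 groups, 9 subjects in total, observed F = 13.0.
f_value = 13.0
k, n_total = 3, 9
dfn = k - 1          # numerator degrees of freedom (between groups)
dfd = n_total - k    # denominator degrees of freedom (within groups)

alpha = 0.05

# Critical value: the F above which only `alpha` of the distribution lies.
critical_f = stats.f.ppf(1 - alpha, dfn, dfd)

# p-value: probability of an F at least this large if the null hypothesis holds.
p_value = stats.f.sf(f_value, dfn, dfd)

print(f"critical F = {critical_f:.3f}")
print(f"p-value    = {p_value:.5f}")

# Both decision rules lead to the same conclusion.
print(f_value > critical_f, p_value < alpha)
```

Comparing the F-value with the critical value and comparing the p-value with 0.05 are two views of the same test, which is why looking at the p-value alone is enough in this challenge.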

In this challenge, we want to find out whether the `Total` value differs significantly among pokemon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many pokemon types, which makes this a perfect use case for ANOVA.

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as st

In [2]:
data = pd.read_csv('pokemon.txt')
data.head()

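| Unnamed: 0 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |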


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First, let's obtain the unique values of the pokemon types. These values should be extracted from `Type 1` and `Type 2` combined. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*

In [29]:
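# Combine both type columns so that values found only in `Type 2` (including NaN) are counted
unique_types = pd.concat([data['Type 1'], data['Type 2']]).unique()
print(f"There are '{len(unique_types)}' types.")

There are '19' types.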


#### Second, we will create a list named `pokemon_totals` that contains the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? ANOVA only tells us whether the group means differ significantly; it does not tell us which group(s) differ. Therefore, we don't need to know which `Total` values belong to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`. Be sure to check BOTH `Type 1` and `Type 2` to cover all occurrences of each unique type.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` value, which you can confirm by using `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_totals` should be 18.
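The hints above can be sketched on a small made-up frame. Only the column names mirror the dataset; the rows, the `toy` frame, and the variable names `types` and `groups` are assumptions for illustration.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    'Type 1': ['Grass', 'Grass', 'Fire', 'Fire', 'Water'],
    'Type 2': ['Poison', None, None, 'Flying', 'Poison'],
    'Total':  [318, 405, 309, 534, 335],
})

# Unique values over both type columns; the missing Type 2 entries
# show up as one extra non-string value.
types = pd.concat([toy['Type 1'], toy['Type 2']]).unique()

groups = []
for t in types:
    if not isinstance(t, str):   # skip the missing-value entry
        continue
    # A row belongs to the group if the type appears in either column.
    mask = (toy['Type 1'] == t) | (toy['Type 2'] == t)
    groups.append(toy.loc[mask, 'Total'])

print(len(groups))  # 5
```

Note that a pokemon with two types lands in two groups under this approach, so the groups overlap.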

In [31]:
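# Builds one group per (Type 1, Type 2) combination that occurs in the data.
# Rows with a missing `Type 2` never match, because NaN compares unequal to everything.
pokemon_totals = [
    data[(data['Type 1'] == type1) & (data['Type 2'] == type2)]['Total']
        for type1 in unique_types
        for type2 in unique_types
            if data[(data['Type 1'] == type1) & (data['Type 2'] == type2)].shape[0] > 0
]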

print(f"There are '{len(pokemon_totals)}' groups.")

There are '136' groups.


#### Now we run the ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `st.f_oneway(*pokemon_totals)`. This unpacks the list and passes each list item to `f_oneway` as a separate argument.
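The effect of the `*` can be seen with a small stand-in function. `count_args` below is a hypothetical helper written for this illustration, not part of SciPy.

```python
def count_args(*groups):
    """Return how many positional arguments were received."""
    return len(groups)


samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

print(count_args(samples))   # 1: the whole list arrives as a single argument
print(count_args(*samples))  # 3: each inner list arrives as its own argument
```

`f_oneway` expects each group as its own argument, which is why the second form is the one to use.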

In [40]:
st.f_oneway(*pokemon_totals)

F_onewayResult(statistic=1.860127135409291, pvalue=7.899373718417435e-06)

#### Interpret the ANOVA test result. Is the difference significant?

In [6]:
# H0: the means of all groups are equal

# The p-value (about 7.9e-06) is lower than 0.05, so H0 can be rejected.
# The mean `Total` is not the same across all groups.
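The same decision can be made in code by reading the fields of the result object. This is a minimal sketch on three made-up groups; the 0.05 threshold and the names `alpha` and `reject_h0` are assumptions for illustration.

```python
import scipy.stats as st

# Toy data: three made-up groups, not taken from the pokemon dataset.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

result = st.f_oneway(*groups)

alpha = 0.05                          # conventional significance level
reject_h0 = result.pvalue < alpha     # True means the group means differ significantly

print(result.statistic, result.pvalue, reject_h0)
```

A rejected null hypothesis only says that at least one group mean differs from the others; it does not identify which one.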

In [41]:
[totals.mean() for totals in pokemon_totals]

[414.06666666666666,
 525.0,
 380.0,
 523.3333333333334,
 422.5,
 474.0,
 630.0,
 431.6666666666667,
 397.0,
 422.0,
 600.0,
 438.0,
 441.6666666666667,
 492.85714285714283,
 537.0,
 410.0,
 634.0,
 600.0,
 551.5,
 346.6666666666667,
 426.6666666666667,
 395.0,
 433.9,
 335.0,
 556.6666666666666,
 481.0,
 428.75,
 407.5,
 511.6666666666667,
 610.0,
 493.8333333333333,
 530.0,
 404.0,
 384.0,
 455.0,
 269.0,
 347.9166666666667,
 395.5,
 345.0,
 550.0,
 435.0,
 236.0,
 509.7142857142857,
 419.5,
 405.0,
 410.0,
 423.0,
 330.0,
 590.0,
 527.5,
 371.9583333333333,
 320.0,
 330.0,
 505.0,
 395.0,
 494.0,
 436.0,
 411.6666666666667,
 520.0,
 520.0,
 520.0,
 385.0,
 431.0,
 440.0,
 520.0,
 610.0,
 441.6666666666667,
 537.6,
 770.0,
 471.0,
 400.0,
 455.0,
 393.0,
 430.0,
 387.3333333333333,
 508.0,
 535.0,
 475.0,
 400.0,
 495.0,
 575.0,
 500.0,
 600.0,
 600.0,
 397.0,
 638.6666666666666,
 600.0,
 680.0,
 449.6666666666667,
 425.0,
 417.6666666666667,
 425.0,
 380.0,
 600.0,
 580.0,
 440.0,
 