# Challenge 2 - ANOVA

In statistics, **Analysis of Variance (ANOVA)** is also used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](https://keydifferences.com/difference-between-t-test-and-anova.html).

From the ANOVA test, you receive two numbers. The first number is called the **F-value**. You compare it against a critical F-value: if the F-value from your test exceeds the critical value, the null hypothesis can be rejected. The critical F-value varies according to the total number of subjects and the number of subject groups in your experiment. In [this table](https://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
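For readers who do want to look at the F-value, the critical value can also be computed instead of read from the table. The following is a minimal sketch, assuming a one-way ANOVA where the degrees of freedom are (groups - 1) and (subjects - groups); the helper name `critical_f` and the example sizes are made up for illustration.

```python
from scipy import stats

def critical_f(n_total, n_groups, alpha=0.05):
    """Critical F-value for a one-way ANOVA at significance level alpha."""
    dfn = n_groups - 1         # between-group degrees of freedom
    dfd = n_total - n_groups   # within-group degrees of freedom
    # ppf is the inverse CDF: the value that (1 - alpha) of the F distribution lies below
    return stats.f.ppf(1 - alpha, dfn, dfd)

# Hypothetical experiment: 30 subjects split across 3 groups
f_crit = critical_f(n_total=30, n_groups=3)
print(round(f_crit, 2))
```

An F-value larger than `f_crit` would lead to rejecting the null hypothesis at the chosen `alpha`.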

The **p-value** is the second number yielded by ANOVA. It already takes the total number of subjects and the number of experiment groups into consideration. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
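The decision rule can be sketched on a tiny example. The three groups below are made-up numbers, not values from the Pokemon dataset, and the 0.05 threshold is the conventional choice mentioned above.

```python
from scipy import stats

# Made-up measurements for three groups (illustration only)
group_a = [20, 21, 19, 22, 20]
group_b = [30, 29, 31, 30, 32]
group_c = [25, 26, 24, 25, 27]

# f_oneway returns the F-value and the p-value
f_value, p_value = stats.f_oneway(group_a, group_b, group_c)

alpha = 0.05
reject_null = p_value < alpha  # True means the group means differ significantly
print(f"F = {f_value:.2f}, p = {p_value:.3g}, reject null: {reject_null}")
```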

In this challenge, we want to understand whether the `Total` value differs significantly among the various pokemon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many pokemon types, which makes this a perfect use case for ANOVA.

In [1]:
# Import libraries

import pandas as pd

In [2]:
# Import dataset

pokemon = pd.read_csv('../../lab-df-calculation-and-transformation/your-code/Pokemon.csv')

pokemon.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*
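One possible way to aggregate two columns is to stack them with `pd.concat` and then call `.unique()`. The sketch below uses a small hypothetical table (`toy`) rather than the real dataset, so the count it prints is not the 19 expected for the lab.

```python
import pandas as pd

# Toy stand-in for the Pokemon table (hypothetical rows)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Type 2": ["Poison", None, None, "Flying"],
})

# Stack both columns into one Series, then keep the distinct values.
# The missing value is kept as one of the distinct entries.
toy_types = pd.concat([toy["Type 1"], toy["Type 2"]]).unique()
print(len(toy_types))
```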

In [74]:
# Your code here
# Combine the distinct values of both type columns, then deduplicate them
x = list(pokemon["Type 1"].unique()) + list(pokemon["Type 2"].unique())
unique_types = set(x)
len(unique_types) # you should see 19

19

In [133]:
unique_types2 = list(pokemon["Type 1"].unique())
unique_types2

['Grass',
 'Fire',
 'Water',
 'Bug',
 'Normal',
 'Poison',
 'Electric',
 'Ground',
 'Fairy',
 'Fighting',
 'Psychic',
 'Rock',
 'Ghost',
 'Ice',
 'Dragon',
 'Dark',
 'Steel',
 'Flying']

#### Second we will create a list named `pokemon_totals` to contain the `Total` values of each unique type of pokemons.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` belongs to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` to `pokemon_totals`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` variable which you can find out by using `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_totals` should be 18.
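The hints above can be sketched on a small hypothetical table. The names `toy`, `candidate_types` and `toy_totals` are made up for illustration, and the sketch filters on `Type 1` only.

```python
import pandas as pd

# Toy stand-in for the Pokemon table (hypothetical rows)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Grass", "Water"],
    "Total": [318, 309, 405, 314],
})
candidate_types = ["Grass", "Fire", float("nan"), "Water"]

toy_totals = []
for t in candidate_types:
    if not isinstance(t, str):  # NaN is a float, so it is skipped here
        continue
    # Keep only the `Total` values of rows whose type matches
    toy_totals.append(toy.loc[toy["Type 1"] == t, "Total"])

print(len(toy_totals))
```

Each list item is a pandas Series holding the `Total` values of one type.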

In [184]:
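# Helper frame indexed by primary type (not used in the solution below)
poke = pd.DataFrame(pokemon[['Name', 'Type 1', 'Total']])
poke.set_index('Type 1', inplace=True)
# Index the main frame by name so each selected `Total` is labelled by pokemon
pokemon.set_index('Name', inplace=True)
pokemon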

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False
VenusaurMega Venusaur,3,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,4,Fire,,309,39,52,43,60,50,65,1,False
Charmeleon,5,Fire,,405,58,64,58,80,65,80,1,False
Charizard,6,Fire,Flying,534,78,84,78,109,85,100,1,False
CharizardMega Charizard X,6,Fire,Dragon,634,78,130,111,130,85,100,1,False
CharizardMega Charizard Y,6,Fire,Flying,634,78,104,78,159,115,100,1,False
Squirtle,7,Water,,314,44,48,65,50,64,43,1,False


In [186]:
# Your code here
pokemon_totals = []
for pokemon_type in unique_types2:
    # Select the `Total` values of every pokemon whose primary type matches
    totals = pokemon[pokemon['Type 1'] == pokemon_type]['Total']
    pokemon_totals.append(totals)
print(pokemon_totals)
len(pokemon_totals) # you should see 18

[Name
Bulbasaur                318
Ivysaur                  405
Venusaur                 525
VenusaurMega Venusaur    625
Oddish                   320
                        ... 
Chespin                  313
Quilladin                405
Chesnaught               530
Skiddo                   350
Gogoat                   531
Name: Total, Length: 70, dtype: int64, Name
Charmander                   309
Charmeleon                   405
Charizard                    534
CharizardMega Charizard X    634
CharizardMega Charizard Y    634
Vulpix                       299
Ninetales                    505
Growlithe                    350
Arcanine                     555
Ponyta                       410
Rapidash                     500
Magmar                       495
Flareon                      525
Moltres                      580
Cyndaquil                    309
Quilava                      405
Typhlosion                   534
Slugma                       250
Magcargo                     410
Magb

18

#### Now we run ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `stats.f_oneway(*pokemon_totals)`. This unpacks the list and supplies each list item as a separate argument to `f_oneway`.
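To see what the `*` does, the sketch below compares the unpacked call with a call that passes each group explicitly. The `groups` list is made-up data for illustration.

```python
from scipy import stats

# Made-up groups (illustration only)
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

# `*groups` expands the list so each inner list becomes its own argument
unpacked = stats.f_oneway(*groups)
explicit = stats.f_oneway(groups[0], groups[1], groups[2])

print(unpacked.statistic == explicit.statistic)
```

Both calls run the same test; unpacking simply avoids writing out every group by hand, which matters when there are 18 of them.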

In [188]:
# Your code here
import scipy.stats
scipy.stats.f_oneway(*pokemon_totals)

F_onewayResult(statistic=4.63876748166055, pvalue=2.077215448842098e-09)

#### Interpret the ANOVA test result. Is the difference significant?

In [None]:
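# Your comment here
# Yes. The p-value (about 2.08e-09) is far below 0.05, so we reject the null
# hypothesis that all pokemon types have the same mean `Total`.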