# ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former compares the means of two groups, whereas the latter compares the means of three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).
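With exactly two groups the two tests agree: the one-way ANOVA F statistic equals the square of the equal-variance (Student's) t statistic, and both give the same p-value. ANOVA generalizes the comparison to more groups. A minimal sketch with hypothetical numbers, using `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`:

```python
import scipy.stats as st

# Two hypothetical groups of Total values
a = [318, 405, 525, 625]
b = [309, 405, 534, 634, 309]

t = st.ttest_ind(a, b)   # Student's t-test (equal variances assumed by default)
f = st.f_oneway(a, b)    # one-way ANOVA on the same two groups

print(f"t^2 = {t.statistic ** 2:.6f}, F = {f.statistic:.6f}")
print(f"t-test p = {t.pvalue:.6f}, ANOVA p = {f.pvalue:.6f}")
```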

The ANOVA test returns two numbers. The first is the **F-value**, which indicates whether your null hypothesis can be rejected. The critical F-value needed to reject the null hypothesis depends on the total number of subjects and the number of groups in your experiment, through the degrees of freedom (k - 1 between groups and N - k within groups, for k groups and N subjects). In [this table](http://b.link/eda14) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it later. In this challenge you only need to look at the p-value.**
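The lookup in the F table can also be done in code. This is a minimal sketch, assuming a one-way design with `n_groups` groups and `n_total` subjects, so the degrees of freedom are `n_groups - 1` and `n_total - n_groups`. The helper `critical_f` and the group counts are hypothetical; the call itself is `scipy.stats.f.ppf`, the inverse CDF of the F distribution.

```python
import scipy.stats as st

def critical_f(n_groups, n_total, alpha=0.05):
    """Hypothetical helper: critical F-value for a one-way ANOVA."""
    dfn = n_groups - 1          # between-group degrees of freedom
    dfd = n_total - n_groups    # within-group degrees of freedom
    # ppf is the inverse CDF: the F-value with (1 - alpha) of the distribution below it
    return st.f.ppf(1 - alpha, dfn, dfd)

# Hypothetical experiment: 3 groups, 30 subjects in total (dfn = 2, dfd = 27)
crit = critical_f(n_groups=3, n_total=30)
print(f"Critical F at alpha = 0.05: {crit:.3f}")
```

An observed F-value larger than `crit` would lead you to reject the null hypothesis at the 5% level.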

The p-value is the other number returned by ANOVA; it already takes the total number of subjects and the number of groups into account. Here the null hypothesis is that all group means are equal. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
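To see how the p-value "already takes the group and subject counts into account", here is a minimal sketch that converts an F-value into a p-value with the F distribution's survival function, `scipy.stats.f.sf`. The helper `anova_p_value` and the numbers (F = 4.2, 3 groups, 30 subjects) are hypothetical.

```python
import scipy.stats as st

def anova_p_value(f_value, n_groups, n_total):
    """Hypothetical helper: probability of an F at least this large if H0 is true."""
    dfn = n_groups - 1          # between-group degrees of freedom
    dfd = n_total - n_groups    # within-group degrees of freedom
    # sf (survival function) = 1 - CDF, i.e. the right-tail area beyond f_value
    return st.f.sf(f_value, dfn, dfd)

# Hypothetical result: F = 4.2 from 3 groups and 30 subjects
p = anova_p_value(4.2, n_groups=3, n_total=30)
print(f"p = {p:.4f}")
print("Reject H0" if p < 0.05 else "Fail to reject H0")
```

Comparing p with 0.05 is equivalent to comparing F with the critical F-value for the same degrees of freedom.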

In this challenge, we want to determine whether there are significant differences in the `Total` value among the various pokemon types, e.g. Grass vs Poison vs Fire vs Dragon. Because there are many pokemon types, this is a perfect use case for ANOVA.

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import scipy.stats as st

In [2]:
# Load the data:
pokemon = pd.read_csv('pokemon.txt')
pokemon.head()

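| Unnamed: 0 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |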


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First, let's obtain the unique values of the pokemon types. These values should be extracted from `Type 1` and `Type 2` combined. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*

In [3]:
# Your code here
display(pokemon['Type 1'].unique())
display(pokemon['Type 2'].unique())

# Union of the unique values in both type columns (Type 2 contributes NaN)
unique_values = list(set(pokemon['Type 1'].unique()) | set(pokemon['Type 2'].unique()))
unique_values

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

array(['Poison', nan, 'Flying', 'Dragon', 'Ground', 'Fairy', 'Grass',
       'Fighting', 'Psychic', 'Steel', 'Ice', 'Rock', 'Dark', 'Water',
       'Electric', 'Fire', 'Ghost', 'Bug', 'Normal'], dtype=object)

['Electric',
 'Ground',
 'Rock',
 'Ghost',
 'Normal',
 'Ice',
 'Psychic',
 'Water',
 'Grass',
 'Fire',
 nan,
 'Steel',
 'Poison',
 'Bug',
 'Fairy',
 'Fighting',
 'Flying',
 'Dragon',
 'Dark']

#### Second, we will create a list named `pokemon_totals` containing the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) differ. Therefore, we don't need to know which `Total` values belong to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`. Be sure to check BOTH `Type 1` and `Type 2` to cover all occurrences of each unique type.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` value, which you can confirm with `type()`. The valid pokemon type values are all of type `str`.
* At the end, the length of your `pokemon_totals` should be 18.
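The hints above can be tried on a tiny example before running them on the full dataset. This is a minimal sketch on a hypothetical four-row DataFrame named `toy` (not the real pokemon data). It filters out `NaN` with `isinstance(t, str)`, following the `float` vs `str` hint, and shows how a pokemon with two types lands in two groups.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as the pokemon dataset
toy = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Water"],
    "Type 2": ["Poison", np.nan, np.nan, "Grass"],
    "Total":  [318, 405, 309, 500],
})

# Union of both type columns; NaN appears because of empty Type 2 cells
all_types = set(toy["Type 1"].unique()) | set(toy["Type 2"].unique())
print([type(t).__name__ for t in all_types])  # NaN shows up as 'float'

groups = []
for ptype in sorted(t for t in all_types if isinstance(t, str)):  # skip NaN
    in_type1 = toy.loc[toy["Type 1"] == ptype, "Total"].tolist()
    in_type2 = toy.loc[toy["Type 2"] == ptype, "Total"].tolist()
    groups.append(in_type1 + in_type2)

print(groups)
```

Sorting the types is only there to make the group order reproducible; as noted above, ANOVA itself does not need to know which list belongs to which type.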

In [4]:
# Your code here
# Drop NaN from the list of types
# (approach adapted from https://www.pythonpool.com/python-remove-nan-from-list/)
unique_values = [item for item in unique_values if not pd.isna(item)]
# unique_values

pokemon_totals = []

for ptype in unique_values:  # "ptype" avoids shadowing the built-in type()
    # Totals of pokemon whose primary or secondary type is ptype
    type1_totals = pokemon.loc[pokemon['Type 1'] == ptype, 'Total'].tolist()
    type2_totals = pokemon.loc[pokemon['Type 2'] == ptype, 'Total'].tolist()
    pokemon_totals.append(type1_totals + type2_totals)
display(len(pokemon_totals))
pokemon_totals

18

[[320,
  485,
  325,
  465,
  330,
  480,
  490,
  525,
  580,
  205,
  280,
  365,
  510,
  610,
  360,
  580,
  295,
  475,
  575,
  405,
  405,
  263,
  363,
  523,
  405,
  535,
  540,
  440,
  520,
  520,
  520,
  520,
  520,
  295,
  497,
  428,
  275,
  405,
  515,
  580,
  580,
  289,
  481,
  431,
  330,
  460,
  319,
  472,
  471,
  680],
 [300,
  450,
  265,
  405,
  320,
  425,
  345,
  485,
  430,
  330,
  500,
  290,
  340,
  520,
  300,
  500,
  670,
  770,
  330,
  525,
  535,
  510,
  328,
  508,
  292,
  351,
  519,
  471,
  303,
  483,
  600,
  600,
  505,
  505,
  300,
  390,
  495,
  385,
  210,
  430,
  510,
  610,
  250,
  450,
  300,
  410,
  405,
  535,
  635,
  266,
  305,
  460,
  560,
  288,
  468,
  525,
  424,
  475,
  300,
  410,
  600,
  700,
  530,
  384,
  509,
  423,
  600],
 [300,
  390,
  495,
  385,
  355,
  495,
  355,
  495,
  515,
  615,
  410,
  300,
  410,
  600,
  700,
  375,
  440,
  440,
  355,
  495,
  355,
  495,
  580,
  350,
  495,
  35

#### Now we run the ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* `f_oneway` expects each group as a separate positional argument, so passing the list `pokemon_totals` as a single argument does not work. The trick is to add a `*` in front of it, e.g. `st.f_oneway(*pokemon_totals)`. This unpacks the list and supplies each inner list as its own argument to `f_oneway`.
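To see what `f_oneway` computes, and how `*` hands each group over as a separate argument, here is a minimal sketch that rebuilds the one-way ANOVA F statistic from the between-group and within-group sums of squares and compares it with SciPy. The helper `one_way_anova` and the sample numbers are hypothetical.

```python
import numpy as np
import scipy.stats as st

def one_way_anova(*groups):
    """Hypothetical helper: one-way ANOVA F statistic and p-value by hand."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group sum of squares: group size * (group mean - grand mean)^2
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    dfn, dfd = k - 1, n_total - k
    f_value = (ss_between / dfn) / (ss_within / dfd)
    p_value = st.f.sf(f_value, dfn, dfd)
    return f_value, p_value

# Hypothetical groups stored in one list, like pokemon_totals
groups = [[318, 405, 525], [309, 405, 534, 634], [314, 405, 530]]

manual = one_way_anova(*groups)     # * passes each inner list as its own argument
reference = st.f_oneway(*groups)    # same unpacking for SciPy
print(manual)
print(reference)
```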

In [5]:
# Your code here
st.f_oneway(*pokemon_totals)

F_onewayResult(statistic=6.617538296005535, pvalue=2.6457458815984803e-15)

#### Interpret the ANOVA test result. Is the difference significant?

In [6]:
# Your comment here
# The p-value is about 2.6e-15, far below 0.05, so we reject the null hypothesis:
# the mean Total differs significantly across Pokemon types. ANOVA alone does not
# tell us which types differ.