#  ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

From the ANOVA test, you receive two numbers. The first is the **F-value**, which you compare against a critical value to decide whether the null hypothesis can be rejected. The critical F-value varies with the total number of subjects and the number of subject groups in your experiment. You can find the critical values of the F distribution in [this table](http://b.link/eda14). **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
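
If the table is unwieldy, the critical value can also be computed. The sketch below assumes a hypothetical experiment with 3 groups and 30 subjects in total (these numbers are illustrative, not taken from the challenge) and uses the quantile function of `scipy.stats.f`.

```python
from scipy.stats import f

# Hypothetical experiment (illustrative numbers only)
n_groups = 3
n_subjects = 30

dfn = n_groups - 1            # between-group degrees of freedom
dfd = n_subjects - n_groups   # within-group degrees of freedom

alpha = 0.05
# Critical value: the F-value that leaves a probability of alpha in the upper tail
f_critical = f.ppf(1 - alpha, dfn, dfd)
print(f"Critical F at alpha={alpha} with ({dfn}, {dfd}) df: {f_critical:.2f}")
```

An observed F-value above `f_critical` would lead to rejecting the null hypothesis at the chosen `alpha`.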

The p-value is the other number yielded by ANOVA. It already takes the number of total subjects and the number of experiment groups into consideration. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
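
As a minimal sketch of how the two numbers relate, the p-value is the upper-tail probability of the F distribution at the observed F-value. The three groups below are made-up toy data, used only to show that `f_oneway` and `scipy.stats.f.sf` agree.

```python
from scipy.stats import f, f_oneway

# Toy data (made up for illustration)
group_a = [4.1, 5.0, 5.5, 4.8]
group_b = [6.2, 6.8, 7.1, 6.5]
group_c = [5.1, 5.9, 6.0, 5.4]

f_value, p_value = f_oneway(group_a, group_b, group_c)

# Degrees of freedom: (groups - 1) and (total subjects - groups)
dfn = 3 - 1
dfd = 12 - 3

# Upper-tail probability of the F distribution at the observed F-value
p_from_f = f.sf(f_value, dfn, dfd)

print("p-value from f_oneway:", p_value)
print("p-value from F distribution:", p_from_f)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```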

In this challenge, we want to understand whether there are significant differences in the `Total` value among the various types of pokemon, i.e. Grass vs Poison vs Fire vs Dragon... There are many pokemon types, which makes this a perfect use case for ANOVA.

In [57]:
import pandas as pd
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway

In [58]:
file = pd.read_csv('pokemon.txt')

file

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*

In [59]:
uType1 = file['Type 1'].unique()   # unique primary types
uType2 = file['Type 2'].unique()   # unique secondary types (includes NaN)

# merge both arrays and de-duplicate
unique_types = pd.unique(np.concatenate([uType1, uType2]))

unique_types

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying', nan], dtype=object)

#### Second we will create a list named `pokemon_totals` to contain the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` belongs to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`. Be sure to check BOTH `Type 1` and `Type 2` to cover all occurrences of each unique type.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float`, which you can confirm with `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_totals` should be 18.
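
The hints above can be sketched on a tiny stand-in table. The DataFrame `toy` and its values are assumptions made up for illustration, and the boolean mask with `|` is one possible way to match a type in either column, not necessarily the intended solution.

```python
import pandas as pd

# Toy stand-in for the pokemon table (illustrative values only)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Grass"],
    "Type 2": ["Poison", None, None, None],
    "Total": [318, 309, 314, 320],
})

# Unique types across both columns, with missing values dropped
types = pd.concat([toy["Type 1"], toy["Type 2"]]).dropna().unique()

# One array of Total values per type, matching the type in either column
totals_by_type = [
    toy.loc[(toy["Type 1"] == t) | (toy["Type 2"] == t), "Total"].to_numpy()
    for t in types
]

print(len(totals_by_type))  # one entry per unique type
```

A pokemon with two types contributes its `Total` to both groups, which mirrors the instruction to cover both columns.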

In [60]:
unique_types = unique_types[~pd.isnull(unique_types)]  # drop the NaN entry

pokemon_groups = []

for pokemon_type in unique_types:  # iterate over the valid pokemon types
    type1_rows = file.loc[file['Type 1'] == pokemon_type]
    type2_rows = file.loc[file['Type 2'] == pokemon_type]

    # combine the rows matched on either column and keep their Total values
    rows = pd.concat([type1_rows, type2_rows])
    total_values = rows['Total'].values
    pokemon_groups.append((pokemon_type, total_values))

for pokemon_type, total_values in pokemon_groups:
    print(f'Type:{pokemon_type}\tTotal: {total_values}')

Type:Grass	Total: [318 405 525 625 320 395 490 300 390 490 325 520 435 318 405 525 490 250
 340 460 180 425 310 405 530 630 220 340 480 295 460 400 335 475 460 318
 405 525 280 515 275 450 454 334 494 594 535 525 600 600 308 413 528 316
 498 280 480 280 480 461 294 464 305 489 580 313 405 530 350 531 285 405
 600 220 340 480 355 495 424 520 310 380 500 335 475 309 474 335 335 335
 335 494 494 494 494]
Type:Fire	Total: [309 405 534 634 634 299 505 350 555 410 500 495 525 580 309 405 534 250
 410 365 580 680 310 405 530 630 305 460 560 470 309 405 534 540 600 308
 418 528 316 498 315 480 540 484 307 409 534 382 499 369 507 600 330 500
 600 770 520 600 275 370 520 360 550 680]
Type:Water	Total: [314 405 530 630 320 500 300 385 510 335 515 315 490 590 325 475 305 525
 325 475 295 440 320 450 340 520 200 540 640 535 525 314 405 530 330 460
 250 420 500 210 430 490 430 380 300 480 465 540 580 310 405 535 635 220
 340 480 270 430 305 460 560 400 500 288 468 308 468 200 540 345 485 485
 485 33

#### Now we run the ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `f_oneway(*pokemon_totals)`. This unpacks the list and passes each list item as a separate argument to `f_oneway`.
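
A minimal sketch of the unpacking trick, using three short made-up groups: calling `f_oneway(*groups)` is equivalent to passing each group as its own positional argument.

```python
from scipy.stats import f_oneway

# Toy groups (illustrative values only)
groups = [
    [318, 405, 525],
    [309, 405, 534],
    [314, 405, 530],
]

# The * operator unpacks the list into separate positional arguments
result_unpacked = f_oneway(*groups)

# Same call written out by hand
result_explicit = f_oneway(groups[0], groups[1], groups[2])

print(result_unpacked.statistic == result_explicit.statistic)
print(result_unpacked.pvalue == result_explicit.pvalue)
```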

In [None]:
# Null hypothesis (H0): the mean Total is the same for every pokemon type
# Alternative hypothesis (H1): at least one pokemon type has a different mean Total

In [74]:
pokemon_totals = [totals for _, totals in pokemon_groups]    # keep only the Total values of each group

f_value, p_value = f_oneway(*pokemon_totals)

print('Test Results:')
print('F-value:', f_value)
print('p-value:', p_value)

Test Results:
F-value: 6.617538296005535
p-value: 2.6457458815984803e-15


#### Interpret the ANOVA test result. Is the difference significant?

In [6]:
The F-value of about 6.62 means that the variation between the group means is several times larger than the variation
within the groups. The p-value of about 2.6e-15 is far below the 0.05 threshold, so the null hypothesis is rejected:
at least one pokemon type has a mean `Total` that differs significantly from the others. ANOVA alone does not tell us
which types differ.
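
To make the ratio behind the F-value concrete, the sketch below computes it by hand as the between-group mean square divided by the within-group mean square, then compares it with `f_oneway`. The three groups are made-up toy data, not the pokemon totals.

```python
import numpy as np
from scipy.stats import f_oneway

# Toy groups (made up for illustration)
groups = [
    np.array([4.1, 5.0, 5.5, 4.8]),
    np.array([6.2, 6.8, 7.1, 6.5]),
    np.array([5.1, 5.9, 6.0, 5.4]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)       # number of groups
n = all_values.size   # total number of observations

# Variation of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_manual = ms_between / ms_within

f_scipy, _ = f_oneway(*groups)
print("manual F:", f_manual)
print("scipy F: ", f_scipy)
```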
