In this challenge, you will answer a series of questions to practice dataframe calculation, aggregation, and transformation.

### Import all required libraries

In [1]:
# import libraries

import numpy as np
import pandas as pd

### Import data set

Import the data set `Pokemon.csv` from the `your-code` directory of this lab. Read the data into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [2]:
# import data set

pokemon = pd.read_csv('Pokemon.csv')

### Print first 10 rows of `pokemon`

In [3]:
# enter your code here

pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


When exploring a data set, you often need to know what each column means. Many open data sets document their columns. The owner of the `Pokemon.csv` data set provides such descriptions [here](https://www.kaggle.com/abcsds/pokemon/home). For convenience, they are reproduced below:

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type, which determines its weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have a second type |
| Total | Sum of all stats that come after this; a general guide to how strong a pokemon is |
| HP | Hit points, or health; defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Number of generation |
| Legendary | True if the pokemon is Legendary, False if not |

### Print the distinct values in `Type 1` and `Type 2` combined

In [4]:
# enter your code here

pd.unique(pokemon[['Type 1', 'Type 2']].values.ravel())

array(['Grass', 'Poison', 'Fire', nan, 'Flying', 'Dragon', 'Water', 'Bug',
       'Normal', 'Electric', 'Ground', 'Fairy', 'Fighting', 'Psychic',
       'Rock', 'Steel', 'Ice', 'Ghost', 'Dark'], dtype=object)
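
The result above includes `nan` because some pokemon have no `Type 2`. If only real type names are wanted, one possible variant (a sketch on a small hand-made frame named `types`, which is not part of the lab) is to concatenate both columns and drop missing values before deduplicating:

```python
import pandas as pd

# Toy frame with type values copied from the first rows shown earlier
types = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", None, "Flying", None],
})

# Stack both columns into one Series, drop missing values, keep distinct entries
combined = pd.concat([types["Type 1"], types["Type 2"]])
distinct_types = combined.dropna().drop_duplicates().tolist()

print(distinct_types)  # ['Grass', 'Fire', 'Water', 'Poison', 'Flying']
```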

### Clean up `Name` values that contain "Mega"

Look at the pokemon names in the first 10 rows. Names of the form "Mega XXX" are preceded by junk text. For instance, "VenusaurMega Venusaur" (#3) should be "Mega Venusaur", and "CharizardMega Charizard X" (#6) should be "Mega Charizard X".

In [5]:
# enter your code here

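# Remove everything before "Mega"; regex=True is required because
# str.replace treats the pattern as a literal string by default in pandas 2.0+
pokemon.Name = pokemon.Name.str.replace(r".*(?=Mega)", "", regex=True)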

# test your fixed dataframe

pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
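
To see how the lookahead pattern behaves in isolation, here is a minimal sketch on a hand-made Series (the variable names `names` and `cleaned` are illustrative only). `.*` greedily consumes text, and `(?=Mega)` requires that "Mega" follows without consuming it, so only the prefix is removed and names without "Mega" are left untouched:

```python
import pandas as pd

# Names copied from the first 10 rows of the data set
names = pd.Series([
    "Venusaur",
    "VenusaurMega Venusaur",
    "CharizardMega Charizard X",
    "Squirtle",
])

# Strip whatever precedes "Mega"; rows without "Mega" do not match
cleaned = names.str.replace(r".*(?=Mega)", "", regex=True)

print(cleaned.tolist())
# ['Venusaur', 'Mega Venusaur', 'Mega Charizard X', 'Squirtle']
```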


### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`

For instance, pokemon #1 has the Attack score 49 and Defense score 49. The corresponding `A/D Ratio` should be 49/49=1.

In [6]:
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']

pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462


### Print the pokemon with the highest A/D Ratio

In [7]:
pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].max()]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0


### Print the pokemon with the lowest A/D Ratio

In [8]:
pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].min()]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
230,213,Shuckle,Bug,Rock,505,20,10,230,10,230,5,2,False,0.043478
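
An alternative to comparing against `max()` and `min()` is `idxmax()` and `idxmin()`, which return the index label of the extreme value. The sketch below uses a small hand-made frame named `sample` (not part of the lab). Note one difference: these methods return only the first occurrence, whereas the boolean mask used above returns every tied row.

```python
import pandas as pd

# Three rows copied from the head() output earlier in this notebook
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Attack": [49, 52, 48],
    "Defense": [49, 43, 65],
})
sample["A/D Ratio"] = sample["Attack"] / sample["Defense"]

# Look up the rows holding the largest and smallest ratio
highest = sample.loc[sample["A/D Ratio"].idxmax()]
lowest = sample.loc[sample["A/D Ratio"].idxmin()]

print(highest["Name"])  # Charmander
print(lowest["Name"])   # Squirtle
```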


### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

If both `Type 1` and `Type 2` values are present, the `Combo Type` value should join `Type 1` and `Type 2` with a dash (e.g. `Grass-Poison`).

If `Type 2` is `nan`, the `Combo Type` value should be the same as `Type 1`.

In [9]:
def add_combo_type(row):
    # Single-type pokemon: Combo Type is just Type 1
    if pd.isnull(row['Type 2']):
        return row['Type 1']
    # Dual-type pokemon: join both types with a dash
    return f"{row['Type 1']}-{row['Type 2']}"

pokemon['Combo Type'] = pokemon.apply(add_combo_type, axis=1)

pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0,Grass-Poison
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127,Grass-Poison
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952,Grass-Poison
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008,Grass-Poison
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302,Fire
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448,Fire
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923,Fire-Flying
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171,Fire-Dragon
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333,Fire-Flying
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462,Water
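
The row-wise `apply` above works, but the same rule can also be expressed with vectorized string methods. The following is a sketch on a hand-made frame named `df` (an assumption for illustration): `str.cat` leaves the result missing wherever `Type 2` is missing, and `fillna` then falls back to `Type 1`.

```python
import pandas as pd

# Type values copied from the first rows of the data set
df = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", None, "Flying", None],
})

# Join with a dash; rows with a missing Type 2 come out missing
joined = df["Type 1"].str.cat(df["Type 2"], sep="-")

# Fall back to Type 1 for single-type pokemon
df["Combo Type"] = joined.fillna(df["Type 1"])

print(df["Combo Type"].tolist())
# ['Grass-Poison', 'Fire', 'Fire-Flying', 'Water']
```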


### Print the `Combo Type` of the 5 pokemon with the highest `A/D Ratio`

In [10]:
top_5_adratio = pokemon.nlargest(5, 'A/D Ratio')

top_5_adratio

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0,Psychic
347,318,Carvanha,Water,Dark,305,45,90,20,65,20,65,3,False,4.5,Water-Dark
19,15,Mega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,3.75,Bug-Poison
453,408,Cranidos,Rock,,350,67,125,40,30,30,58,4,False,3.125,Rock
348,319,Sharpedo,Water,Dark,460,70,120,40,95,40,95,3,False,3.0,Water-Dark


### For the 5 pokemon above, calculate the aggregated `Attack` and `Defense` for each `Combo Type`

In [11]:
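# Sums over every pokemon whose Combo Type appears in the top 5;
# numeric_only=True keeps text columns such as Name out of the sum
pokemon.loc[pokemon['Combo Type'].isin(top_5_adratio['Combo Type'])].groupby('Combo Type').sum(numeric_only=True)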

Combo Type,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
Bug-Poison,2390,4175,645,820,697,510,712,791,28,0.0,15.791863
Psychic,14515,17653,2757,2468,2555,3745,3131,2997,127,9.0,44.239448
Rock,3691,3685,604,930,965,365,525,296,35,1.0,11.340818
Water-Dark,2086,2963,415,720,391,533,381,523,19,0.0,13.751694
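
The cell above aggregates every pokemon that shares a `Combo Type` with the top 5. If the question is read more strictly, as covering only the five rows themselves and only the `Attack` and `Defense` columns, a sketch could look like this (the frame `top5` is rebuilt by hand from the top-5 output shown earlier):

```python
import pandas as pd

# The five rows with the highest A/D Ratio, copied from the output above
top5 = pd.DataFrame({
    "Name": ["DeoxysAttack Forme", "Carvanha", "Mega Beedrill", "Cranidos", "Sharpedo"],
    "Combo Type": ["Psychic", "Water-Dark", "Bug-Poison", "Rock", "Water-Dark"],
    "Attack": [180, 90, 150, 125, 120],
    "Defense": [20, 20, 40, 40, 40],
})

# Select the two columns of interest before summing per Combo Type
agg = top5.groupby("Combo Type")[["Attack", "Defense"]].sum()

print(agg)
```

Here `Water-Dark` is the only group with two members, so its totals are 210 for `Attack` and 60 for `Defense`; the other groups keep their single row's values.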


In [12]:
pokemon.describe()

Unnamed: 0,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,A/D Ratio
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375,1.164547
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129,0.552604
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0,0.043478
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0,0.828771
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0,1.074176
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0,1.416667
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0,9.0


### `Total` formula hypothesis testing

From the data descriptions you may have noticed a column called `Total`, which indicates how strong the pokemon is. Make a hypothesis about how `Total` is calculated and test it.

As a general guideline, first examine the data carefully and guess how `Total` might have been calculated. Write the guess as a math formula and convert it to a function. Then calculate the results with your formula and store them in a new column called `Guessed Total`. Next, check whether `Guessed Total` and `Total` contain the same values. If they match, congratulations, you have verified your hypothesis! Otherwise, revise your formula, update the values in `Guessed Total`, and compare again.

In [22]:
pokemon['Guessed Total'] = pokemon['HP'] + pokemon['Attack'] + pokemon['Defense'] + pokemon['Sp. Atk'] + pokemon['Sp. Def'] + pokemon['Speed']

pokemon['Total'].equals(pokemon['Guessed Total'])

True
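
The same check can be written more compactly by summing across a list of columns, and a failed hypothesis is easier to revise if the mismatching rows are inspected. Below is a minimal sketch on three rows copied from the data set; the names `stats`, `stat_cols`, and `mismatches` are illustrative assumptions.

```python
import pandas as pd

# Bulbasaur, Charmander and Squirtle, copied from the head() output
stats = pd.DataFrame({
    "Total":   [318, 309, 314],
    "HP":      [45, 39, 44],
    "Attack":  [49, 52, 48],
    "Defense": [49, 43, 65],
    "Sp. Atk": [65, 60, 50],
    "Sp. Def": [65, 50, 64],
    "Speed":   [45, 65, 43],
})

# Hypothesis: Total is the sum of the six base stats
stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]
stats["Guessed Total"] = stats[stat_cols].sum(axis=1)

# Rows where the guess is wrong; an empty frame supports the hypothesis
mismatches = stats[stats["Guessed Total"] != stats["Total"]]

print(len(mismatches))  # 0
```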