# Challenge 1

In this challenge you will work with a pokemon data set. You will answer a series of questions to practice dataframe calculation, aggregation, and transformation.

![Pokemon](pokemon.jpg)

Follow the instructions below and enter your code.

### Import all required libraries

In [13]:
import numpy as np
import pandas as pd

### Import data set

Import data set `Pokemon.csv` from the `your-code` directory of this lab. Read the data into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [14]:
pokemon = pd.read_csv('Pokemon.csv')
pokemon = pokemon.set_index('#')
pokemon.head()

#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


### Print first 10 rows of `pokemon`

When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.

For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions as follows:

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have 2 |
| Total | Sum of all stats that come after this, a general guide to how strong a pokemon is |
| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Number of generation |
| Legendary | True if Legendary Pokemon, False if not |

### Print the distinct values in `Type 1` and `Type 2` combined

In [15]:
pokemon['Type 1'].unique()
# Type 1 and Type 2 share the same values

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)
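
The cell above lists only the `Type 1` values. A minimal sketch of one way to get the distinct values of both columns combined is shown here, assuming a small hypothetical `sample` frame that stands in for `pokemon`; missing `Type 2` entries are dropped before deduplicating.

```python
import pandas as pd

# Hypothetical five-row sample standing in for the `pokemon` dataframe.
sample = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", "Poison", None, "Flying", None],
})

# Stack both columns into one Series, drop missing values, then deduplicate.
combined_types = pd.concat([sample["Type 1"], sample["Type 2"]]).dropna().unique()
print(sorted(combined_types))
```

On the real data the same expression would be applied to `pokemon['Type 1']` and `pokemon['Type 2']`.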

Check out the pokemon names in the first 10 rows. Some names that contain "Mega" have junk text in front of it. For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

### Clean up names that contain "Mega"

In [34]:
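# Capture "Mega" plus everything after it, so "CharizardMega Charizard X" becomes "Mega Charizard X"
pokemon['Name_bis'] = pokemon['Name'].str.extract(r'(Mega\s.*)', expand=False)
# Keep the original name wherever no "Mega ..." form was found
pokemon['Name'] = np.where(pokemon['Name_bis'].notnull(), pokemon['Name_bis'], pokemon['Name'])
# Note: this positional slice starts at column 1, so `Name` (column 0) is not shown in the output
pokemon_clean = pokemon.iloc[:, 1:13]
display(pokemon_clean.head())
display(pokemon_clean.tail())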

#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
1,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0
2,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127
3,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952
3,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008
4,Fire,,309,39,52,43,60,50,65,1,False,1.209302


#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
719,Rock,Fairy,600,50,100,150,100,150,50,6,True,0.666667
719,Rock,Fairy,700,50,160,110,160,110,110,6,True,1.454545
720,Psychic,Ghost,600,80,110,60,150,130,70,6,True,1.833333
720,Psychic,Dark,680,80,160,60,170,130,80,6,True,2.666667
721,Fire,Water,600,80,110,120,130,90,70,6,True,0.916667
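
Because the output above does not include the `Name` column, the effect of the cleanup is hard to verify there. The sketch below checks the idea on a few hypothetical sample names with `str.replace`, which is an alternative to the extract-and-merge approach and not necessarily the intended solution. It strips everything in front of "Mega" only when "Mega" is followed by whitespace, so a name such as "Meganium" is left alone.

```python
import pandas as pd

# Hypothetical sample of names, including one that merely starts with "Mega".
names = pd.Series([
    "Venusaur",
    "VenusaurMega Venusaur",
    "CharizardMega Charizard X",
    "Meganium",
])

# Lazily match from the start of the string up to the first "Mega<space>"
# and delete that prefix; names without such a match stay unchanged.
cleaned = names.str.replace(r"^.*?(?=Mega\s)", "", regex=True)
print(cleaned.tolist())
```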


### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`

For instance, pokemon #1 has an Attack score of 49 and a Defense score of 49. The corresponding `A/D Ratio` is 49/49=1.

In [27]:
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']
pokemon['A/D Ratio'].head()

#
1    1.000000
2    0.984127
3    0.987952
3    0.813008
4    1.209302
Name: A/D Ratio, dtype: float64

### Print the pokemon with the highest `A/D Ratio`

In [18]:
pokemon['A/D Ratio'].max()

9.0

### Print the pokemon with the lowest A/D Ratio

In [19]:
pokemon['A/D Ratio'].min()

0.043478260869565216
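
The two cells above return only the extreme ratio values, not the pokemon they belong to. One possible way to print the matching rows is a boolean mask, sketched here on a hypothetical five-row sample built from the first rows of the data set. A mask is used instead of `idxmax`/`idxmin` with `.loc` because the `#` index contains duplicates (for example two rows with `#` 3), so a label lookup could return more than one row.

```python
import pandas as pd

# Hypothetical sample using the first five rows of the data set.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
        "Attack": [49, 62, 82, 100, 52],
        "Defense": [49, 63, 83, 123, 43],
    },
    index=pd.Index([1, 2, 3, 3, 4], name="#"),
)
sample["A/D Ratio"] = sample["Attack"] / sample["Defense"]

ratio = sample["A/D Ratio"]
# Select every row whose ratio equals the maximum (or minimum) value.
highest = sample[ratio == ratio.max()]
lowest = sample[ratio == ratio.min()]
print(highest[["Name", "A/D Ratio"]])
print(lowest[["Name", "A/D Ratio"]])
```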

### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Conditions:

* If `Type 2` value is a valid string, the `Combo Type` value should be `<Type 1>-<Type 2>` (e.g. `Grass-Poison`).

* If `Type 2` value is `NaN`, the `Combo Type` value should be the same as `Type 1`, which always exists.

*Hint: Consider using function and `apply`.*

In [20]:
pokemon['Combo Type'] = np.where(pokemon['Type 2'].notnull(), pokemon["Type 1"] + '-' + pokemon["Type 2"], pokemon['Type 1'])
pokemon.head(10)

#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0,Grass-Poison
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127,Grass-Poison
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952,Grass-Poison
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008,Grass-Poison
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302,Fire
5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448,Fire
6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923,Fire-Flying
6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171,Fire-Dragon
6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333,Fire-Flying
7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462,Water
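
The hint suggests a function with `apply`, while the cell above uses `np.where`. For comparison, a minimal sketch of the `apply` variant follows; `combo_type` and the three-row `sample` frame are hypothetical names introduced only for illustration.

```python
import pandas as pd

def combo_type(row):
    """Return 'Type 1-Type 2' when Type 2 is present, otherwise just Type 1."""
    if pd.isna(row["Type 2"]):
        return row["Type 1"]
    return f"{row['Type 1']}-{row['Type 2']}"

# Hypothetical sample standing in for the `pokemon` dataframe.
sample = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Fire"],
    "Type 2": ["Poison", None, "Flying"],
})

# axis=1 passes each row to the function.
sample["Combo Type"] = sample.apply(combo_type, axis=1)
print(sample)
```

The row-wise `apply` is easier to read but generally slower than the vectorized `np.where` version on large frames.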


### Print `Combo Type` for the pokemon whose `A/D Ratio` is among the top 5

In [21]:
top5 = pokemon.sort_values('A/D Ratio', ascending=False).head(5)
top5

#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0,Psychic
318,Carvanha,Water,Dark,305,45,90,20,65,20,65,3,False,4.5,Water-Dark
15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,3.75,Bug-Poison
408,Cranidos,Rock,,350,67,125,40,30,30,58,4,False,3.125,Rock
319,Sharpedo,Water,Dark,460,70,120,40,95,40,95,3,False,3.0,Water-Dark
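
The cell above prints whole rows. To print only the `Combo Type` column for the top 5, `nlargest` could be used, as in this sketch on a hypothetical sample assembled from rows shown in this notebook.

```python
import pandas as pd

# Hypothetical sample: the five top rows above plus two ordinary rows.
sample = pd.DataFrame({
    "Name": ["DeoxysAttack Forme", "Carvanha", "BeedrillMega Beedrill",
             "Cranidos", "Sharpedo", "Bulbasaur", "Squirtle"],
    "Attack": [180, 90, 150, 125, 120, 49, 48],
    "Defense": [20, 20, 40, 40, 40, 49, 65],
    "Combo Type": ["Psychic", "Water-Dark", "Bug-Poison", "Rock",
                   "Water-Dark", "Grass-Poison", "Water"],
})
sample["A/D Ratio"] = sample["Attack"] / sample["Defense"]

# Take the 5 rows with the largest ratio and keep only the Combo Type column.
top5_combo = sample.nlargest(5, "A/D Ratio")["Combo Type"]
print(top5_combo.tolist())
```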


### For the 5 `Combo Type` values printed from the previous question, calculate the aggregated `Attack` scores for each `Combo Type`.

In [45]:
pokemon_group = pokemon.groupby('Combo Type').agg({'Attack': 'sum'})
display(pokemon_group.head())
print(type(pokemon_group))

Combo Type,Attack
Bug,856
Bug-Electric,124
Bug-Fighting,310
Bug-Fire,145
Bug-Flying,982


<class 'pandas.core.frame.DataFrame'>


In [43]:
#df.loc[df['column_name'] == some_value]
pokemon[pokemon['Combo Type'] == 'Water-Grass']

#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type,Name_bis
270,Lotad,Water,Grass,220,40,30,30,40,50,30,3,False,1.0,Water-Grass,
271,Lombre,Water,Grass,340,60,50,50,60,70,50,3,False,1.0,Water-Grass,
272,Ludicolo,Water,Grass,480,80,70,70,90,100,70,3,False,1.0,Water-Grass,
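
The grouping above aggregates `Attack` for every `Combo Type`, whereas the question asks only for the types that appear in the top 5 by `A/D Ratio`. A possible way to restrict the aggregation is to filter with `isin` first. The sketch below assumes a hypothetical sample built from rows shown in this notebook, so its sums are not the totals for the full data set.

```python
import pandas as pd

# Hypothetical sample: the top-5 rows plus two rows with other combo types.
sample = pd.DataFrame({
    "Attack": [180, 90, 150, 125, 120, 30, 49],
    "Defense": [20, 20, 40, 40, 40, 30, 49],
    "Combo Type": ["Psychic", "Water-Dark", "Bug-Poison", "Rock",
                   "Water-Dark", "Water-Grass", "Grass-Poison"],
})
sample["A/D Ratio"] = sample["Attack"] / sample["Defense"]

# Distinct combo types among the 5 rows with the highest ratio.
top_types = sample.nlargest(5, "A/D Ratio")["Combo Type"].unique()

# Keep only rows of those types, then sum Attack per type.
attack_by_type = (
    sample[sample["Combo Type"].isin(top_types)]
    .groupby("Combo Type")["Attack"]
    .sum()
)
print(attack_by_type)
```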


### `Total` formula hypothesis testing

From the data descriptions you may have noticed a column called `Total`, which indicates how strong the pokemon is. Form a hypothesis about how `Total` is calculated and test it.

The general guideline is to first examine the data carefully and guess how `Total` might have been calculated. You can write a math formula and convert it to a function. Then calculate the results based on your formula and store them in a new column called `Guessed Total`. Next, compare whether `Guessed Total` and `Total` contain the same values. If the values match, congratulations, you have verified your hypothesis! Otherwise, revise your formula, update the values in `Guessed Total`, and compare again.

In [25]:
# enter your code here
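
One hypothesis, suggested by the column description ("Sum of all stats that come after this"), is that `Total` is the sum of the six stat columns. The sketch below tests that guess on a hypothetical sample made of the first five rows of the data set; `stat_columns` and `hypothesis_holds` are names introduced here for illustration, and the check would still need to be run on the full `pokemon` dataframe.

```python
import pandas as pd

# Assumed formula: Total = HP + Attack + Defense + Sp. Atk + Sp. Def + Speed
stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Hypothetical sample: the first five rows shown by pokemon.head().
sample = pd.DataFrame({
    "Total": [318, 405, 525, 625, 309],
    "HP": [45, 60, 80, 80, 39],
    "Attack": [49, 62, 82, 100, 52],
    "Defense": [49, 63, 83, 123, 43],
    "Sp. Atk": [65, 80, 100, 122, 60],
    "Sp. Def": [65, 80, 100, 120, 50],
    "Speed": [45, 60, 80, 80, 65],
})

# Row-wise sum of the six stats.
sample["Guessed Total"] = sample[stat_columns].sum(axis=1)

# The hypothesis holds only if every row matches.
hypothesis_holds = (sample["Guessed Total"] == sample["Total"]).all()
print(hypothesis_holds)
```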
