# Challenge 1

In this challenge you will work with a data set of pokemon. You will answer a series of questions to practice dataframe calculation, aggregation, and transformation.

![Pokemon](pokemon.jpg)

Follow the instructions below and enter your code.

### Import all required libraries

In [1]:
import numpy as np
import pandas as pd

### Import data set

Import the data set `Pokemon.csv` from the `your-code` directory of this lab and read the data into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [2]:
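# Load the CSV and use the "#" column (pokemon ID) as the index;
# IDs repeat for alternate forms such as Mega evolutions
pokemon = pd.read_csv('Pokemon.csv')
pokemon = pokemon.set_index('#')
pokemon.head()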

| # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |


### Print first 10 rows of `pokemon`

When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of their columns, and these descriptions help analysts work efficiently and avoid misreading the data.

Fortunately, the owner of the `Pokemon.csv` data set provides descriptions, which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, they are reproduced below:

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type; this determines weakness/resistance to attacks |
| Type 2 | Some pokemon are dual-type and have a second type |
| Total | Sum of all the stats that come after this; a general guide to how strong a pokemon is |
| HP | Hit points, or health; defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack; the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Generation number |
| Legendary | True if the pokemon is Legendary, False if not |
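No cell in this section prints the first 10 rows yet. `DataFrame.head(n)` returns the first `n` rows (5 by default), so in the notebook the call is `pokemon.head(10)`. The sketch below runs it on a small stand-in frame built from the `#` and `Name` values of the first 12 rows shown in this notebook's outputs, so it works without `Pokemon.csv`; the stand-in is only an assumption for illustration.

```python
import io

import pandas as pd

# Stand-in for Pokemon.csv: only the "#" and Name columns of the first 12 rows
# shown in this notebook's outputs. In the lab, read the real file instead.
csv_text = """#,Name
1,Bulbasaur
2,Ivysaur
3,Venusaur
3,VenusaurMega Venusaur
4,Charmander
5,Charmeleon
6,Charizard
6,CharizardMega Charizard X
6,CharizardMega Charizard Y
7,Squirtle
8,Wartortle
9,Blastoise
"""
pokemon = pd.read_csv(io.StringIO(csv_text)).set_index('#')

# head(n) returns the first n rows; without an argument it returns 5
first_10 = pokemon.head(10)
print(first_10)
```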

### Print the distinct values in `Type 1` and `Type 2` combined

In [3]:
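# Stack Type 1 and Type 2 into one Series, drop the NaNs of single-type pokemon, keep distinct values
pd.concat([pokemon['Type 1'], pokemon['Type 2']]).dropna().unique()
# Type 1 and Type 2 contain the same set of values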

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

Check out the pokemon names in the first 10 rows. Some names that contain "Mega" have junk text in front of them. For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

### Clean up names that contain "Mega"

In [4]:
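# Note: this extract() call only inspects the names; its result is discarded and Name is not modified
pokemon['Name'].str.extract(r'(Mega)')
pokemon['Name']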

#
1                      Bulbasaur
2                        Ivysaur
3                       Venusaur
3          VenusaurMega Venusaur
4                     Charmander
5                     Charmeleon
6                      Charizard
6      CharizardMega Charizard X
6      CharizardMega Charizard Y
7                       Squirtle
8                      Wartortle
9                      Blastoise
9        BlastoiseMega Blastoise
10                      Caterpie
11                       Metapod
12                    Butterfree
13                        Weedle
14                        Kakuna
15                      Beedrill
15         BeedrillMega Beedrill
16                        Pidgey
17                     Pidgeotto
18                       Pidgeot
18           PidgeotMega Pidgeot
19                       Rattata
20                      Raticate
21                       Spearow
22                        Fearow
23                         Ekans
24                         Arbok
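The cell above only inspects the names, so `Name` still carries the junk prefixes. Below is a minimal sketch of one way to remove them, assuming the junk is everything in front of the substring `"Mega "` (the trailing space keeps names such as `Meganium` untouched). `clean_mega_name` is a hypothetical helper name, and the sample names are taken from the output above.

```python
import pandas as pd


def clean_mega_name(names: pd.Series) -> pd.Series:
    """Drop the base-name prefix glued in front of "Mega <name>" forms.

    Assumption: the junk text is everything before the substring "Mega "
    (with a space), so names that merely start with "Mega" are left alone.
    """
    # ^.*(?=Mega ) matches the prefix up to, but not including, "Mega "
    return names.str.replace(r'^.*(?=Mega )', '', regex=True)


sample = pd.Series(['Bulbasaur', 'VenusaurMega Venusaur',
                    'CharizardMega Charizard X', 'Meganium'])
cleaned = clean_mega_name(sample)
print(cleaned.tolist())
# ['Bulbasaur', 'Mega Venusaur', 'Mega Charizard X', 'Meganium']
```

In the notebook this would be applied as `pokemon['Name'] = clean_mega_name(pokemon['Name'])`.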
        

### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`

For instance, pokemon #1 has an Attack score of 49 and a Defense score of 49, so its `A/D Ratio` is 49/49 = 1.

In [5]:
# Element-wise division: one ratio per pokemon
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']
pokemon['A/D Ratio']

#
1      1.000000
2      0.984127
3      0.987952
3      0.813008
4      1.209302
5      1.103448
6      1.076923
6      1.171171
6      1.333333
7      0.738462
8      0.787500
9      0.830000
9      0.858333
10     0.857143
11     0.363636
12     0.900000
13     1.166667
14     0.500000
15     2.250000
15     3.750000
16     1.125000
17     1.090909
18     1.066667
18     1.000000
19     1.600000
20     1.350000
21     2.000000
22     1.384615
23     1.363636
24     1.231884
         ...   
700    1.000000
701    1.226667
702    1.017544
703    0.333333
704    1.428571
705    1.415094
706    1.428571
707    0.879121
708    1.458333
709    1.447368
710    0.942857
710    0.942857
710    0.942857
710    0.942857
711    0.737705
711    0.696721
711    0.778689
711    0.819672
712    0.811765
713    0.635870
714    0.857143
715    0.875000
716    1.378947
717    1.378947
718    0.826446
719    0.666667
719    1.454545
720    1.833333
720    2.666667
721    0.916667
Name: A/D Ratio, Lengt

### Print the pokemon with the highest `A/D Ratio`

In [6]:
# Returns only the highest ratio value, not the pokemon it belongs to
pokemon['A/D Ratio'].max()

9.0

### Print the pokemon with the lowest `A/D Ratio`

In [7]:
# Returns only the lowest ratio value, not the pokemon it belongs to
pokemon['A/D Ratio'].min()

0.043478260869565216
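The two cells above return only the extreme ratio values. To print the pokemon themselves, one option is a boolean mask that keeps every row equal to the maximum (or minimum). The sketch uses a stand-in frame made of the ten `Name`, `Attack` and `Defense` rows shown in this notebook's `head(10)` output; in the notebook the same two mask lines would run on `pokemon` directly.

```python
import pandas as pd

# Stand-in: ten rows (Name, Attack, Defense) copied from this notebook's head(10) output
pokemon = pd.DataFrame(
    {'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur',
              'Charmander', 'Charmeleon', 'Charizard',
              'CharizardMega Charizard X', 'CharizardMega Charizard Y', 'Squirtle'],
     'Attack': [49, 62, 82, 100, 52, 64, 84, 130, 104, 48],
     'Defense': [49, 63, 83, 123, 43, 58, 78, 111, 78, 65]},
    index=pd.Index([1, 2, 3, 3, 4, 5, 6, 6, 6, 7], name='#'))
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']

# Boolean masks keep every row tied for the extreme value and do not depend on
# the "#" labels, which repeat for Mega forms
highest = pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].max()]
lowest = pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].min()]
print(highest['Name'].tolist())  # ['CharizardMega Charizard Y']
print(lowest['Name'].tolist())   # ['Squirtle']
```

A label-based lookup such as `pokemon.loc[pokemon['A/D Ratio'].idxmax()]` would instead return every row sharing that `#` ID (for example all three #6 rows), which is why the mask is used here.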

### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Conditions:

* If `Type 2` value is a valid string, the `Combo Type` value should be `<Type 1>-<Type 2>` (e.g. `Grass-Poison`).

* If `Type 2` value is `NaN`, the `Combo Type` value should be the same as `Type 1` which always exists.

*Hint: Consider using function and `apply`.*

In [8]:
# Join the two types with "-" where Type 2 exists; otherwise keep Type 1 on its own
pokemon['Combo Type'] = np.where(pokemon['Type 2'].notnull(), pokemon['Type 1'] + '-' + pokemon['Type 2'], pokemon['Type 1'])
pokemon.head(10)

| # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 1.0 | Grass-Poison |
| 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 0.984127 | Grass-Poison |
| 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 0.987952 | Grass-Poison |
| 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 0.813008 | Grass-Poison |
| 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 1.209302 | Fire |
| 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False | 1.103448 | Fire |
| 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False | 1.076923 | Fire-Flying |
| 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False | 1.171171 | Fire-Dragon |
| 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False | 1.333333 | Fire-Flying |
| 7 | Squirtle | Water | NaN | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False | 0.738462 | Water |
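The hint suggests a function plus `apply`; the cell above uses `np.where` instead, which is vectorized and typically faster on large frames than a row-wise `apply`. A minimal sketch of the `apply` route on a three-row stand-in frame (the rows mirror Bulbasaur, Charmander and Charizard above). `combo_type` is a hypothetical helper name, and "a valid string" is interpreted here as `isinstance(value, str)`, which treats `NaN` as missing.

```python
import numpy as np
import pandas as pd


def combo_type(row: pd.Series) -> str:
    """Return "<Type 1>-<Type 2>" for dual-type pokemon, otherwise just Type 1."""
    if isinstance(row['Type 2'], str):
        return f"{row['Type 1']}-{row['Type 2']}"
    return row['Type 1']  # Type 2 is NaN (a float), so fall back to Type 1


sample = pd.DataFrame({'Type 1': ['Grass', 'Fire', 'Fire'],
                       'Type 2': ['Poison', np.nan, 'Flying']})
# axis=1 passes each row, as a Series, to combo_type
sample['Combo Type'] = sample.apply(combo_type, axis=1)
print(sample['Combo Type'].tolist())  # ['Grass-Poison', 'Fire', 'Fire-Flying']
```

In the notebook this would read `pokemon['Combo Type'] = pokemon.apply(combo_type, axis=1)`.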


### Print `Combo Type` for the pokemon whose `A/D Ratio` values are among the top 5

In [9]:
# DataFrame.sort() was removed from pandas; sort_values() is its replacement
pokemon.sort_values('A/D Ratio', ascending=False).head(5)['Combo Type']

### For the 5 `Combo Type` values printed from the previous question, calculate the aggregated `Attack` scores for each `Combo Type`.

In [10]:
# enter your code here
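A minimal sketch of one reading of this task, assuming "aggregated" means the mean `Attack` over every pokemon whose `Combo Type` is one of the top-5 values (swap `.mean()` for `.sum()` or `.agg(['mean', 'sum'])` if a different aggregate is intended). The stand-in frame holds the `Combo Type`, `Attack` and `Defense` values of the ten rows in this notebook's `head(10)` output; on that subset the top 5 rows cover only three distinct combo types.

```python
import pandas as pd

# Stand-in: Combo Type, Attack and Defense for the ten rows in this notebook's head(10) output
pokemon = pd.DataFrame({
    'Combo Type': ['Grass-Poison'] * 4 + ['Fire', 'Fire', 'Fire-Flying',
                                          'Fire-Dragon', 'Fire-Flying', 'Water'],
    'Attack': [49, 62, 82, 100, 52, 64, 84, 130, 104, 48],
    'Defense': [49, 63, 83, 123, 43, 58, 78, 111, 78, 65],
})
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']

# Combo Type values of the 5 rows with the highest A/D Ratio, duplicates removed
top_types = pokemon.sort_values('A/D Ratio', ascending=False).head(5)['Combo Type'].unique()

# Assumption: aggregate over every pokemon sharing one of those combo types, using the mean
attack_by_type = (pokemon[pokemon['Combo Type'].isin(top_types)]
                  .groupby('Combo Type')['Attack']
                  .mean())
print(attack_by_type)
```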


### `Total` formula hypothesis testing

From the data descriptions you may have noticed a column called `Total`, which indicates how strong a pokemon is. Form a hypothesis about how `Total` is calculated and test it.

The general approach is to first examine the data carefully and guess how `Total` might have been calculated. You can write the guess as a math formula and convert it to a function. Then calculate the results based on your formula and store them in a new column called `Guessed Total`. Next, check whether `Guessed Total` and `Total` contain the same values. If they match, congratulations, you have verified your hypothesis! Otherwise, revise your formula, update the values in `Guessed Total`, and compare again.

In [3]:
# enter your code here
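Following the steps above, here is a minimal sketch that tests one hypothesis, suggested by the column description: `Total` is the sum of the six stats listed after it. The stand-in frame holds the stat columns of the five rows in this notebook's `pokemon.head()` output; `guessed_total` is a hypothetical helper name. On the full data set, any rows where the guess fails would show up in the second printout.

```python
import pandas as pd

# Stand-in: stat columns of the five rows in this notebook's pokemon.head() output
pokemon = pd.DataFrame({
    'Total':   [318, 405, 525, 625, 309],
    'HP':      [45, 60, 80, 80, 39],
    'Attack':  [49, 62, 82, 100, 52],
    'Defense': [49, 63, 83, 123, 43],
    'Sp. Atk': [65, 80, 100, 122, 60],
    'Sp. Def': [65, 80, 100, 120, 50],
    'Speed':   [45, 60, 80, 80, 65],
})


def guessed_total(df: pd.DataFrame) -> pd.Series:
    """Hypothesis: Total = HP + Attack + Defense + Sp. Atk + Sp. Def + Speed."""
    stats = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
    return df[stats].sum(axis=1)


pokemon['Guessed Total'] = guessed_total(pokemon)
# True only if the guess reproduces Total on every row
matches = (pokemon['Guessed Total'] == pokemon['Total']).all()
print(matches)
# Rows where the hypothesis fails (empty if it holds everywhere)
print(pokemon[pokemon['Guessed Total'] != pokemon['Total']])
```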
