# Challenge 1

In this challenge you will be working on pokemons... You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.

![Pokemon](pokemon.jpg)

Follow the instructions below and enter your code.

### Import all required libraries

In [1]:
import numpy as np
import pandas as pd

### Import data set

Import data set `Pokemon.csv` from the `your-code` directory of this lab. Read the data into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [2]:
pokemon = pd.read_csv('Pokemon.csv')

### Print first 10 rows of `pokemon`

In [3]:
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.

For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions as follows:

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have 2 |
| Total | Sum of all stats that come after this, a general guide to how strong a pokemon is |
| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (eg. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| SP Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |
| SP Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Number of generation |
| Legendary | True if Legendary Pokemon False if not |

In [4]:
pokemon.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
#             800 non-null int64
Name          800 non-null object
Type 1        800 non-null object
Type 2        414 non-null object
Total         800 non-null int64
HP            800 non-null int64
Attack        800 non-null int64
Defense       800 non-null int64
Sp. Atk       800 non-null int64
Sp. Def       800 non-null int64
Speed         800 non-null int64
Generation    800 non-null int64
Legendary     800 non-null bool
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


### Print the distinct values in `Type 1` and `Type 2` combined

In [5]:
pd.concat([pokemon['Type 1'], pokemon['Type 2']]).unique()

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying', nan], dtype=object)

Check out the pokemon names in the first 10 rows. You find there are junk texts in the pokemon names which contain "Mega". For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

### Cleanup `Name` that contain "Mega"

In [6]:
pokemon['Name'].str.split['Mega']
pokemon[-pokemon.Name.str.contains("Mega")]
pokemon['Name'].str.contains("Mega")
pokemon['Name'].str.contains("Mega").value_counts()

= pokemon['Name'].astype('int')

pokemon['Name'].str
pokemon['Name'] = pokemon['Name'].str.split['Mega']

pokemon['Name'] = pokemon.loc[pokemon['Name'].str.startswith('Mega'), "Name"] 

SyntaxError: invalid syntax (<ipython-input-6-78f4f6851a8f>, line 6)

In [7]:
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


### Create a new column called `A/D Ratio` whose value equals to `Attack` devided by `Defense`

For instance, pokemon #1 has the Attack score 49 and Defense score 49. The corresponding `A/D Ratio` is 49/49=1.

In [6]:
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']

pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462


### Print the pokemon with the highest `A/D Ratio`

In [7]:
pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].max()]['Name']

429    DeoxysAttack Forme
Name: Name, dtype: object

### Print the pokemon with the lowest A/D Ratio

In [8]:
pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].min()]['Name']

230    Shuckle
Name: Name, dtype: object

### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Conditions:

* If `Type 2` value is a valid string, the `Combo Type` value should be `<Type 1>-<Type 2>` (e.g. `Grass-Poison`).

* If `Type 2` value is `NaN`, the `Combo Type` value should be the same as `Type 1` which always exists.

*Hint: Consider using function and `apply`.*

In [12]:
def types(type1,type2):
    if type2.isnull():
        type3 = type1
    else:
        type3 = type1 + '-' + type2 
    return type3

pokemon['Combo Type'] = pokemon.apply(lambda x: types(x['Type 1'], x['Type 2']), axis=1)

AttributeError: ("'str' object has no attribute 'isnull'", 'occurred at index 0')

pokemon['Type 2'] = pokemon['Type 2'].fillna(' ')
pokemon['Combo Type'] = pokemon['Type 1'] + '-' + pokemon['Type 2']
pokemon.head(10)

def types(type1,type2):
    if type2.isnull:
        type3 = type1
    else:
        type3 = type1 + '-' + type2 
    return type3

if pokemon["Type 2"].isnull():
    pokemon['Combo Type'] = pokemon['Type 1']
else: 
    pokemon['Combo Type'] = pokemon['Type 1'] + '-' + pokemon['Type 2']
pokemon.head(10)

In [11]:
# enter your code here
pokemon['Combo Type'] = pokemon['Type 1'] + '-' + pokemon['Type 2']

# test transformed data
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0,Grass-Poison
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127,Grass-Poison
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952,Grass-Poison
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008,Grass-Poison
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302,
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448,
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923,Fire-Flying
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171,Fire-Dragon
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333,Fire-Flying
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462,


### Print `Combo Type` for pokemons whose `A/D Ratio` are among the top 5 

In [11]:
# enter your code here


### For the 5 `Combo Type` values printed from the previous question, calculate the aggregated `Attack` scores for each `Combo Type`.

In [12]:
# enter your code here


### `Total` formula hypothesis testing

From the data descriptions you may have noticed there is a column called `Total` which indicates how strong the pokemon is. Make a hypothesis how `Total` is calculated and test your hypothesis. 

The general guideline is first examine the data carefully and make a guess how `Total` might have been calculated. You can write a math formula and convert it to a function. Then calculate the results based on your formula and store the results in a new column called `Guessed Total`. Next compare whether `Guessed Total` and `Total` contain the same values. If values match, congratuations you have verified your hypothesis! Otherwise, revise your formula, update the values in `Guessed Total`, and compare again.

In [13]:
# enter your code here
