# Challenge 1

In this challenge you will be working on pokemons... You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.

![Pokemon](pokemon.jpg)

Follow the instructions below and enter your code.

### Import all required libraries

In [2]:
import matplotlib, re
import pandas as pd

### Import data set

Import data set `Pokemon.csv` from the `your-code` directory of this lab. Read the data into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [3]:
pokemon = pd.read_csv('Pokemon.csv')

### Print first 10 rows of `pokemon`

In [4]:
pokemon[:10]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.

For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions as follows:

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have 2 |
| Total | Sum of all stats that come after this, a general guide to how strong a pokemon is |
| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (eg. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| SP Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |
| SP Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Number of generation |
| Legendary | True if Legendary Pokemon False if not |

### Print the distinct values in `Type 1` and `Type 2` combined

In [13]:
#  I understand that a one value would be "Grass, Poison"
pokemon.groupby(['Type 1', 'Type 2']).size().reset_index().rename(columns={0:'count'}).head()


Unnamed: 0,Type 1,Type 2,count
0,Bug,Electric,2
1,Bug,Fighting,2
2,Bug,Fire,2
3,Bug,Flying,14
4,Bug,Ghost,1


Check out the pokemon names in the first 10 rows. You find there are junk texts in the pokemon names which contain "Mega". For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

### Cleanup `Name` that contain "Mega"

In [6]:
# pokemon.Name.replace({'\*******Mega': 'Mega'}) 
# pokemon['Name'].replace({'Mega':'XXXX'}) 
pokemon['Name']= pokemon['Name'].replace('.*Mega','Mega', regex=True)

# type(pokemon['Name'][0])
# re.search()
pokemon.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


### Create a new column called `A/D Ratio` whose value equals to `Attack` devided by `Defense`

For instance, pokemon #1 has the Attack score 49 and Defense score 49. The corresponding `A/D Ratio` is 49/49=1.

In [7]:
# enter your code here
pokemon['A/D Ratio'] = pokemon.Attack / pokemon.Defense

# test transformed data
pokemon.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302


### Print the pokemon with the highest `A/D Ratio`

In [8]:
pokemon['A/D Ratio'].max()

9.0

### Print the pokemon with the lowest A/D Ratio

In [9]:
pokemon['A/D Ratio'].min()

0.043478260869565216

### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Conditions:

* If `Type 2` value is a valid string, the `Combo Type` value should be `<Type 1>-<Type 2>` (e.g. `Grass-Poison`).

* If `Type 2` value is `NaN`, the `Combo Type` value should be the same as `Type 1` which always exists.

*Hint: Consider using function and `apply`.*

In [16]:
pokemon['Combo Type'] = pokemon['Type 1'] + pokemon['Type 2']
pokemon.head()
# numpywhere    type2 is Nan type2 = Nan

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0,GrassPoison
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127,GrassPoison
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952,GrassPoison
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008,GrassPoison
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302,


### Print `Combo Type` for pokemons whose `A/D Ratio` are among the top 5 

In [11]:
top5ADRatio = pokemon.nlargest(5, 'A/D Ratio')['Combo Type']
top5ADRatio
# pokemon.where(pokemon['A/D Ratio'] )

# animals['wtratiozerocheck'] = np.where(animals['brainwt'] != 0, animals['bodywt'] / animals['brainwt'], 0)

429    NaN
347    NaN
19     NaN
453    NaN
348    NaN
Name: Combo Type, dtype: object

### For the 5 `Combo Type` values printed from the previous question, calculate the aggregated `Attack` scores for each `Combo Type`.

In [12]:
top5 = pokemon.nlargest(5, 'A/D Ratio')

# pokemon.where(nlargest(5, 'A/D Ratio')['Combo Type'])
pokemon.where(pokemon['Combo Types' in top5ADRatio])
#               groupby['Combo Type'].

KeyError: False

### `Total` formula hypothesis testing

From the data descriptions you may have noticed there is a column called `Total` which indicates how strong the pokemon is. Make a hypothesis how `Total` is calculated and test your hypothesis. 

The general guideline is first examine the data carefully and make a guess how `Total` might have been calculated. You can write a math formula and convert it to a function. Then calculate the results based on your formula and store the results in a new column called `Guessed Total`. Next compare whether `Guessed Total` and `Total` contain the same values. If values match, congratuations you have verified your hypothesis! Otherwise, revise your formula, update the values in `Guessed Total`, and compare again.

In [None]:
# enter your code here


Tests related to the lab, but not part of it *Note for reviewer: no need to check this.

In [24]:
# Tests 
pokemon[pokemon['Attack']>80].head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952,GrassPoison
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008,GrassPoison
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923,FireFlying
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171,FireDragon
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333,FireFlying


In [56]:
Nopokemon = [pokemon['Attack']>80]
# Nopokemon
# type(Nopokemon)

TypeError: list indices must be integers or slices, not str

In [39]:
lista = [1, 3,'heos','paisa dapsodifasd', 6]
lista
# type(lista)

[1, 3, 'heos', 'paisa dapsodifasd', 6]

Porqué Nopokemon es una lista y sale con índices a la derecha, mientras que 'lista' (abajo), no tiene índices?

In [51]:
lista[0:2]

[1, 3]

Y  porqué en esta lista sí puedo hacer slice pero en Nopokemon no? *sale la lista entera igual

In [57]:
Nopokemon[0:2]

[0      False
 1      False
 2       True
 3       True
 4      False
 5      False
 6       True
 7       True
 8       True
 9      False
 10     False
 11      True
 12      True
 13     False
 14     False
 15     False
 16     False
 17     False
 18      True
 19      True
 20     False
 21     False
 22     False
 23     False
 24     False
 25      True
 26     False
 27      True
 28     False
 29      True
        ...  
 770    False
 771     True
 772    False
 773    False
 774    False
 775    False
 776     True
 777    False
 778    False
 779     True
 780    False
 781    False
 782    False
 783    False
 784     True
 785     True
 786     True
 787     True
 788    False
 789     True
 790    False
 791    False
 792     True
 793     True
 794     True
 795     True
 796     True
 797     True
 798     True
 799     True
 Name: Attack, Length: 800, dtype: bool]