# Data Collection

1. Go to https://pokeapi.co/ to find the pokemon API (PokeAPI).
2. Open https://www.postman.com/ and create an account.
3. Copy the link shown on the PokeAPI homepage. You don't have to change it for now.
    * The homepage should show a clickable link that matches exactly what you copied. Click it and see what happens.
    * Open a new tab in Chrome and paste the copied link into the address bar. Press Enter and see what happens.
    * On the postman.com page, click New -> HTTP. Check that the method is "GET"; if not, choose "GET". Then paste the copied API link into the text box next to "GET".

## Task 1: Write Python code to find all pokemon, iterate over the list, and build a data frame with each pokemon's
* Name,
* Type 1,
* Type 2,
* Generation (generation id where the pokemon was first introduced),
* Legendary

Hint 1: First find a directory of all pokemon, and then iterate over the directory to find the requested properties of each pokemon.

Hint 2: Use an API call such as http://pokeapi.co/api/v2/pokemon/?limit=100 to explore the pokemon. Your collected data should contain at least 800 pokemon. Below is an example of making API calls in Python.

In [None]:
#pip install requests

In [None]:
import json
import requests


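def get_data(api):
    """Fetch `api` and return the decoded JSON body, or None on an HTTP error."""
    response = requests.get(api)
    if response.status_code == 200:
        print("successfully fetched the data")
        return response.json()
    # Return None rather than an error string, so callers cannot mistake the
    # message for data (indexing a string with ["count"] raises a TypeError).
    print(f"Request failed with status code {response.status_code}")
    return None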



In [None]:
api_call_result = get_data("http://pokeapi.co/api/v2/pokemon/?limit=100")

successfully fetched the data


In [None]:
#api_call_result

In [None]:
type(api_call_result["count"])

int

In [None]:
type(api_call_result["results"])

list

In [None]:
pokemon_api_list = api_call_result["results"]
#pokemon_api_list
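A single call with `limit=100` returns only one page of the directory. The sketch below shows one possible way to walk every page. It assumes the listing response carries a `next` field holding the URL of the following page (worth confirming with `api_call_result.keys()`); `collect_directory` and `fake_pages` are hypothetical names, and the fake pages stand in for the network so the example runs offline.

```python
def collect_directory(first_page_url, fetch):
    """Follow the 'next' links of a paginated listing and gather every entry.

    `fetch` is any callable that maps a URL to a decoded JSON dict,
    for example the get_data function defined earlier.
    """
    entries = []
    url = first_page_url
    while url:  # assumed: 'next' is None (or missing) on the last page
        page = fetch(url)
        entries.extend(page["results"])
        url = page.get("next")
    return entries


# Hand-written stand-in for the API so the sketch needs no network access.
fake_pages = {
    "page-1": {"count": 3, "next": "page-2",
               "results": [{"name": "bulbasaur", "url": "u1"},
                           {"name": "ivysaur", "url": "u2"}]},
    "page-2": {"count": 3, "next": None,
               "results": [{"name": "venusaur", "url": "u3"}]},
}

directory = collect_directory("page-1", fake_pages.get)
print(len(directory))
```

Against the real API, the call would look like `collect_directory("http://pokeapi.co/api/v2/pokemon/?limit=100", get_data)`.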

In [None]:
name = pokemon_api_list[0]['name']
url = pokemon_api_list[0]['url']

In [None]:
pokemon_properties = get_data(url)
types = pokemon_properties['types']
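type1 = types[0]['type']['name']
# Single-type pokemon have only one entry in `types`, so guard the second lookup.
type2 = types[1]['type']['name'] if len(types) > 1 else None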


successfully fetched the data


In [None]:
#pokemon_properties
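The remaining Task 1 columns (Generation and Legendary) are not among the fields used above. A rough sketch of turning two JSON dictionaries into one table row follows. It assumes a separate species resource whose JSON has a `generation` entry (with a URL ending in the generation id) and an `is_legendary` flag; these field names are assumptions to verify against the live responses, and `pokemon_row` plus both sample dictionaries are hypothetical, hand-written stand-ins.

```python
def pokemon_row(pokemon_json, species_json):
    """Build one table row from a pokemon dict and its (assumed) species dict."""
    types = [slot["type"]["name"] for slot in pokemon_json["types"]]
    # Assumed URL shape: ".../generation/<id>/", so the id is the last path segment.
    generation_url = species_json["generation"]["url"]
    return {
        "Name": pokemon_json["name"],
        "Type 1": types[0],
        "Type 2": types[1] if len(types) > 1 else None,
        "Generation": int(generation_url.rstrip("/").split("/")[-1]),
        "Legendary": species_json["is_legendary"],
    }


# Hand-written samples in the assumed shape; not real API output.
sample_pokemon = {
    "name": "charmander",
    "types": [{"slot": 1, "type": {"name": "fire", "url": "placeholder"}}],
}
sample_species = {
    "generation": {"name": "generation-i",
                   "url": "https://pokeapi.co/api/v2/generation/1/"},
    "is_legendary": False,
}

row = pokemon_row(sample_pokemon, sample_species)
print(row)
```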

Hint 3: Use pandas to store and manage your pokemon as a data frame.

* pandas documentation: https://pandas.pydata.org/
* pandas tutorial for beginners: https://www.w3schools.com/python/pandas/default.asp
* More examples and instructions on pandas (book): https://jakevdp.github.io/PythonDataScienceHandbook/


In [None]:
import pandas as pd

In [None]:
#Example: create a data frame
pokemon_dict = {"Name": [name], "Type 1": [type1], "Type 2": [type2]}
pokemon_df = pd.DataFrame(pokemon_dict)

In [None]:
pokemon_df.head()

Unnamed: 0,Name,Type 1,Type 2
0,bulbasaur,grass,poison


In [None]:
#pokemon_df.describe()
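The one-row example generalizes: if each pokemon is collected as a dictionary, a list of such dictionaries can be passed to `pd.DataFrame` directly. A minimal sketch, where `rows` and `demo_df` are illustrative names and the two rows are typed in by hand:

```python
import pandas as pd

# One dictionary per pokemon; a missing second type is stored as None.
rows = [
    {"Name": "bulbasaur", "Type 1": "grass", "Type 2": "poison"},
    {"Name": "charmander", "Type 1": "fire", "Type 2": None},
]

# Passing `columns` fixes the column order regardless of dictionary key order.
demo_df = pd.DataFrame(rows, columns=["Name", "Type 1", "Type 2"])
print(demo_df.shape)
```

Appending to a plain list inside the loop and building the data frame once at the end is usually simpler than growing a data frame row by row.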

In [None]:
#Write your code here to accomplish task 1
#insert more cells if needed.

## Task 2: Data Cleaning

In the input folder, there are two other pokemon data files, pokemon_1 and pokemon_status, both in CSV format. Load each of those files into a data frame and then compare the three pokemon data frames.
1. Do they contain the same rows?
2. Do they contain the same columns?
3. Can we merge them? If yes, in what way?
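One way to approach the three questions above is with plain set operations on column labels and key values. The sketch below uses two tiny hand-made frames (`left` and `right` are illustrative, not the real files) and assumes `Name` is the column the tables have in common.

```python
import pandas as pd

# Tiny illustrative frames; the real data comes from the input folder.
left = pd.DataFrame({"Name": ["bulbasaur", "ivysaur"], "Type1": ["Grass", "Grass"]})
right = pd.DataFrame({"Name": ["bulbasaur", "mega venusaur"], "HP": [45, 80]})

# Columns present in both frames are candidate merge keys.
shared_columns = set(left.columns) & set(right.columns)

# Key values that appear in only one of the two frames.
only_left_names = set(left["Name"]) - set(right["Name"])
only_right_names = set(right["Name"]) - set(left["Name"])

print(shared_columns, only_left_names, only_right_names)
```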


Write Python code to combine all three tables and save the result as a .csv file.

Hint: Check the page https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
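For the saving step, `DataFrame.to_csv` with `index=False` leaves the row labels out of the file, so reading the file back does not produce an extra unnamed index column. The sketch below writes to an in-memory buffer rather than a real path; `combined_demo` is a made-up one-row frame.

```python
from io import StringIO

import pandas as pd

combined_demo = pd.DataFrame({"Name": ["bulbasaur"], "HP": [45]})

# Write to an in-memory buffer; a file path such as "pokemon.csv" works the same way.
buffer = StringIO()
combined_demo.to_csv(buffer, index=False)

# Read the CSV text back to check that only the real columns survive.
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))
```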


In [None]:
pokemon_1 = pd.read_csv("input/pokemon_1.csv")

In [None]:
pokemon_1.describe()

Unnamed: 0,Name,Type1,Type2
count,809,809,405
unique,809,18,18
top,bulbasaur,Water,Flying
freq,1,114,95


One useful operation on data frames is merging.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html


In [None]:
from io import StringIO
A_csv = """country,year,cases
Afghanistan,1999,745
Brazil,1999,37737
China,1999,212258
Afghanistan,2000,2666
Brazil,2000,80488
China,2000,213766"""

with StringIO(A_csv) as fp:
    A = pd.read_csv(fp)
print("=== A ===")
display(A)

=== A ===


Unnamed: 0,country,year,cases
0,Afghanistan,1999,745
1,Brazil,1999,37737
2,China,1999,212258
3,Afghanistan,2000,2666
4,Brazil,2000,80488
5,China,2000,213766


In [None]:
B_csv = """country,year,population
Afghanistan,1999,19987071
Brazil,1999,172006362
China,1999,1272915272
Afghanistan,2000,20595360
Brazil,2000,174504898
China,2000,1280428583"""

with StringIO(B_csv) as fp:
    B = pd.read_csv(fp)
print("\n=== B ===")
display(B)


=== B ===


Unnamed: 0,country,year,population
0,Afghanistan,1999,19987071
1,Brazil,1999,172006362
2,China,1999,1272915272
3,Afghanistan,2000,20595360
4,Brazil,2000,174504898
5,China,2000,1280428583


In [None]:
C = A.merge(B, on=['country', 'year'])
print("\n=== C = merge(A, B) ===")
display(C)


=== C = merge(A, B) ===


Unnamed: 0,country,year,cases,population
0,Afghanistan,1999,745,19987071
1,Brazil,1999,37737,172006362
2,China,1999,212258,1272915272
3,Afghanistan,2000,2666,20595360
4,Brazil,2000,80488,174504898
5,China,2000,213766,1280428583


Another operation that is useful is joining.

The default join, which keeps only the rows whose keys match in both input frames, is called an inner-join.

* Inner-join (A, B) (default): Keep only rows of A and B where the on-keys match in both.

* Outer-join (A, B): Keep all rows of both frames, but merge rows when the on-keys match. For non-matches, fill in missing values with not-a-number (NaN) values.

* Left-join (A, B): Keep all rows of A. Only merge rows of B whose on-keys match A.

* Right-join (A, B): Keep all rows of B. Only merge rows of A whose on-keys match B.

* Select the join type with merge's how=... parameter, which takes the string values 'inner' (the default), 'outer', 'left', and 'right'.

In [None]:
with StringIO("""x,y,z
bug,1,d
rug,2,d
lug,3,d
mug,4,d""") as fp:
    D = pd.read_csv(fp)
print("=== D ===")
display(D)

with StringIO("""x,y,w
hug,-1,e
smug,-2,e
rug,-3,e
tug,-4,e
bug,1,e""") as fp:
    E = pd.read_csv(fp)
print("\n=== E ===")
display(E)

print("\n=== Outer-join (D, E) ===")
display(D.merge(E, on=['x', 'y'], how='outer'))

print("\n=== Left-join (D, E) ===")
display(D.merge(E, on=['x', 'y'], how='left'))

print("\n=== Right-join (D, E) ===")
display(D.merge(E, on=['x', 'y'], how='right'))


print("\n=== Inner-join (D, E) ===")
display(D.merge(E, on=['x', 'y']))


=== D ===


Unnamed: 0,x,y,z
0,bug,1,d
1,rug,2,d
2,lug,3,d
3,mug,4,d



=== E ===


Unnamed: 0,x,y,w
0,hug,-1,e
1,smug,-2,e
2,rug,-3,e
3,tug,-4,e
4,bug,1,e



=== Outer-join (D, E) ===


Unnamed: 0,x,y,z,w
0,bug,1,d,e
1,rug,2,d,
2,lug,3,d,
3,mug,4,d,
4,hug,-1,,e
5,smug,-2,,e
6,rug,-3,,e
7,tug,-4,,e



=== Left-join (D, E) ===


Unnamed: 0,x,y,z,w
0,bug,1,d,e
1,rug,2,d,
2,lug,3,d,
3,mug,4,d,



=== Right-join (D, E) ===


Unnamed: 0,x,y,z,w
0,hug,-1,,e
1,smug,-2,,e
2,rug,-3,,e
3,tug,-4,,e
4,bug,1,d,e



=== Inner-join (D, E) ===


Unnamed: 0,x,y,z,w
0,bug,1,d,e
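When checking a join, it can help to see where each output row came from. `merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. A small sketch on cut-down versions of D and E (`D_demo`, `E_demo`, and `audit` are illustrative names):

```python
import pandas as pd

D_demo = pd.DataFrame({"x": ["bug", "rug"], "y": [1, 2], "z": ["d", "d"]})
E_demo = pd.DataFrame({"x": ["bug", "hug"], "y": [1, -1], "w": ["e", "e"]})

# indicator=True records, per row, which input frame(s) supplied the key.
audit = D_demo.merge(E_demo, on=["x", "y"], how="outer", indicator=True)

# Count the rows per origin; astype(str) turns the categorical labels into plain strings.
origin_counts = audit["_merge"].astype(str).value_counts().to_dict()
print(origin_counts)
```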


In [None]:
#Combine the pokemon.csv that you collected with the pokemon_stats.csv
#and produce a complete pokemon.csv data file.

Another operation we will use is .apply().

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html?highlight=apply#pandas.DataFrame.apply

In [None]:
# Example of data_frame.apply()
G = C.copy()  # Copy first: plain assignment (G = C) would alias C, so changes to G would also modify C
G['year'] = G['year'].apply(lambda x: "'{:02d}".format(x % 100))
display(G)

Unnamed: 0,country,year,cases,population
0,Afghanistan,'99,745,19987071
1,Brazil,'99,37737,172006362
2,China,'99,212258,1272915272
3,Afghanistan,'00,2666,20595360
4,Brazil,'00,80488,174504898
5,China,'00,213766,1280428583


In [None]:
pokemon_status = pd.read_csv("input/pokemon_status.csv")
pokemon_status.head()

Unnamed: 0,#,Name,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,1,Bulbasaur,45,49,49,65,65,45
1,2,Ivysaur,60,62,63,80,80,60
2,3,Venusaur,80,82,83,100,100,80
3,4,Mega Venusaur,80,100,123,122,120,80
4,5,Charmander,39,52,43,60,50,65


In [None]:
pokemon_1 = pd.read_csv("input/pokemon_1.csv")
pokemon_1.head()

Unnamed: 0,Name,Type1,Type2
0,bulbasaur,Grass,Poison
1,ivysaur,Grass,Poison
2,venusaur,Grass,Poison
3,charmander,Fire,
4,charmeleon,Fire,


In [None]:
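# Lower-case the names so they match the all-lowercase names in pokemon_1.
pokemon_status['Name'] = pokemon_status['Name'].str.lower()
# Left join: keep every row of pokemon_status and add the types where the name matches.
pokemon_status.merge(pokemon_1, on=['Name'], how='left')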

Unnamed: 0,#,Name,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Type1,Type2
0,1,bulbasaur,45,49,49,65,65,45,Grass,Poison
1,2,ivysaur,60,62,63,80,80,60,Grass,Poison
2,3,venusaur,80,82,83,100,100,80,Grass,Poison
3,4,mega venusaur,80,100,123,122,120,80,,
4,5,charmander,39,52,43,60,50,65,Fire,
...,...,...,...,...,...,...,...,...,...,...
795,796,diancie,50,100,150,100,150,50,Rock,Fairy
796,797,mega diancie,50,160,110,160,110,110,,
797,798,hoopa confined,80,110,60,150,130,70,,
798,799,hoopa unbound,80,160,60,170,130,80,,
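Rows such as "mega venusaur" end up with empty type columns because their names have no counterpart in the other table. A quick, hedged way to list such names before merging is `isin`. The frames below are small hand-made stand-ins (`status_demo`, `types_demo`) rather than the real files.

```python
import pandas as pd

status_demo = pd.DataFrame({"Name": ["Bulbasaur", "Mega Venusaur", "Charmander"],
                            "HP": [45, 80, 39]})
types_demo = pd.DataFrame({"Name": ["bulbasaur", "charmander"],
                           "Type1": ["Grass", "Fire"]})

# Normalize the case first, as in the cell above.
status_demo["Name"] = status_demo["Name"].str.lower()

# Names in the stats table that have no match in the types table.
unmatched = status_demo.loc[~status_demo["Name"].isin(types_demo["Name"]), "Name"].tolist()
print(unmatched)
```

The resulting list shows which names need further cleaning, or a different join key, before the merge can fill in their types.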


In [None]:
#pokemon_status['Name'].unique()
#The name 'Unown' looked suspicious, but it turns out to be a valid pokemon name.

Write your program below to combine your collected pokemon data frame with pokemon_stats.

The resulting data frame should have at least all 809 pokemon in pokemon_stats.

The resulting data frame should not have repeated columns.

The resulting data frame should have the following columns: id, Name, Type 1, Type 2, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation, Legendary
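The column requirements can be met with `rename` followed by an explicit column selection. The sketch below is only an illustration on a one-row frame with a shortened column list; `merged_demo`, `target_order`, and the particular rename mapping are assumptions based on the column names shown in the outputs above.

```python
import pandas as pd

merged_demo = pd.DataFrame({
    "#": [1], "Name": ["bulbasaur"], "HP": [45],
    "Type1": ["Grass"], "Type2": ["Poison"],
})

# Shortened target list for the demo; the full task lists twelve columns.
target_order = ["id", "Name", "Type 1", "Type 2", "HP"]

# Rename to the requested labels, then select the columns in the requested order.
tidy = merged_demo.rename(columns={"#": "id", "Type1": "Type 1", "Type2": "Type 2"})[target_order]

# Index.duplicated flags any repeated column labels.
has_duplicates = tidy.columns.duplicated().any()
print(list(tidy.columns), has_duplicates)
```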

One measure of a pokemon's overall "strength" is the sum of all its base stats.

Write a function that sums up all the base stat values. Apply this function to each pokemon in your data, then create a new data frame called "pokemon_overall_stats_sum" with the following two columns: id, sum
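As a starting point, a row-wise `apply` could look like the sketch below. It assumes the six stat columns are named as in the pokemon_status output above and uses the first two rows of that output as demo data; `STAT_COLUMNS`, `base_stat_total`, `stats_demo`, and `sums_demo` are illustrative names.

```python
import pandas as pd

# Assumed stat column names, taken from the pokemon_status header shown earlier.
STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]


def base_stat_total(row):
    """Sum the six base stats of one pokemon (one data frame row)."""
    return int(row[STAT_COLUMNS].sum())


stats_demo = pd.DataFrame({
    "id": [1, 2], "Name": ["bulbasaur", "ivysaur"],
    "HP": [45, 60], "Attack": [49, 62], "Defense": [49, 63],
    "Sp. Atk": [65, 80], "Sp. Def": [65, 80], "Speed": [45, 60],
})

# axis=1 passes each row to the function; the result is one total per pokemon.
sums_demo = pd.DataFrame({
    "id": stats_demo["id"],
    "sum": stats_demo.apply(base_stat_total, axis=1),
})
print(sums_demo)
```

For large tables, `stats_demo[STAT_COLUMNS].sum(axis=1)` gives the same totals without calling a Python function per row.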