# Web Scraping, Classification, and the Pokedex

In this project, we will use `requests` and BeautifulSoup to pull information from the Pokemon Database.
Once we have the information, the next step is to build a classification model and see how accurately
we can classify Pokemon by their types.

## Web Scraping

The site that will be used to pull information is [pokemondb.net](https://pokemondb.net/).

<img src='https://img.pokemondb.net/news/2018/design-v4.jpg'>


We will pull information in two parts, covering Pokemon from Generations 1-8.
1. _General_
    * The most basic information about each Pokemon:
        - Name
        - Types
        - Stats
2. _Specific_
    * This section extends the general information with the following:
        * Base stats
        * Min stats
        * Max stats
        * Type Defense
        * Species
        * Height
        * Weight
        * Abilities
        * Training data
        * Breeding Data

**Import libraries**

In [246]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup

In [247]:
page = requests.get("https://pokemondb.net/pokedex/all")

In [248]:
page.status_code

200
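
A status code of 200 means the request succeeded. As a minimal sketch (not part of the original notebook), the fetch could also fail loudly on any other status and avoid hanging on a slow connection. The helper name `fetch_html`, the `timeout` default, and the optional `session` argument are assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Fetch a page and return its HTML text, raising on HTTP errors.

    `fetch_html`, `timeout`, and `session` are illustrative names,
    not something defined elsewhere in this notebook.
    """
    # Use the caller's session if one is given, otherwise the top-level requests API
    getter = session if session is not None else requests
    response = getter.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of continuing silently
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("https://pokemondb.net/pokedex/all")` would then return page text that can be handed to BeautifulSoup.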

Now that the page has been fetched and its raw HTML stored, we will use BeautifulSoup to extract the table data.

In [249]:
soup = BeautifulSoup(page.content, 'html.parser')
# print(soup.prettify())

In [250]:
children = list(soup.children)

In [251]:
print([type(item) for item in children])

[<class 'bs4.element.Doctype'>, <class 'bs4.element.NavigableString'>, <class 'bs4.element.Tag'>, <class 'bs4.element.NavigableString'>]


In this block, we pull all of the information from the webpage and format it so that it can be saved in a dictionary or a dataframe.

Now that we are able to pull specific information out of the webpage, we can store it in a dataframe or another data structure. In this example, we will use a dataframe.

In [252]:
def pull_info(soup=None, tag='', class_str='', **kwargs):
    """
    Return the text of every element in `soup` that matches the given tag.

    Parameters:
        soup:       BeautifulSoup object to search
        tag:        HTML tag that we want to pull information from
        class_str:  optional name of the class tied to the HTML tag being observed
        **kwargs:   any additional arguments passed through to BeautifulSoup.find_all()
    """
    if soup is None:
        # Fail loudly instead of returning an error string that looks like data
        raise ValueError("You did not enter a BeautifulSoup object")
    if class_str:
        # Only filter on class when one was given
        kwargs['class_'] = class_str
    return [item.text for item in soup.find_all(name=tag, **kwargs)]
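
To make the role of `tag`, `class_str`, and `**kwargs` concrete, here is a small sketch of the underlying `find_all` calls on a hand-written HTML snippet. The snippet and its class names (`cell-num`, `cell-name`) are made up for illustration and are not the markup used by pokemondb.net.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only loosely shaped like a Pokedex table
demo_html = """
<table>
  <tr><th>#</th><th>Name</th></tr>
  <tr><td class="cell-num">001</td><td class="cell-name">Bulbasaur</td></tr>
  <tr><td class="cell-num">002</td><td class="cell-name">Ivysaur</td></tr>
</table>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

# Tag only: every <td> in document order
all_cells = [cell.text for cell in demo_soup.find_all(name="td")]

# Tag plus class: only the <td> elements carrying the given class
name_cells = [cell.text for cell in demo_soup.find_all(name="td", class_="cell-name")]

# Extra keyword arguments such as `limit` are passed straight to find_all
first_cell = [cell.text for cell in demo_soup.find_all(name="td", limit=1)]

print(all_cells)
print(name_cells)
print(first_cell)
```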

In [253]:
headers = pull_info(soup=soup, tag='th')  # pull the header information from the webpage

table = pull_info(soup=soup, tag='td')  # pull every cell entry from the table
n_cols = len(headers)  # each table row has one cell per header
rows = [table[i:i + n_cols] for i in range(0, len(table), n_cols)]

print(headers, len(headers))
for row in rows:
    print(row, len(row))

['#', 'Name', 'Type', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed'] 10
['001', 'Bulbasaur', 'Grass Poison', '318', '45', '49', '49', '65', '65', '45'] 10
['002', 'Ivysaur', 'Grass Poison', '405', '60', '62', '63', '80', '80', '60'] 10
['003', 'Venusaur', 'Grass Poison', '525', '80', '82', '83', '100', '100', '80'] 10
['003', 'Venusaur Mega Venusaur', 'Grass Poison', '625', '80', '100', '123', '122', '120', '80'] 10
['004', 'Charmander', 'Fire ', '309', '39', '52', '43', '60', '50', '65'] 10
['005', 'Charmeleon', 'Fire ', '405', '58', '64', '58', '80', '65', '80'] 10
['006', 'Charizard', 'Fire Flying', '534', '78', '84', '78', '109', '85', '100'] 10
['006', 'Charizard Mega Charizard X', 'Fire Dragon', '634', '78', '130', '111', '130', '85', '100'] 10
['006', 'Charizard Mega Charizard Y', 'Fire Flying', '634', '78', '104', '78', '159', '115', '100'] 10
['007', 'Squirtle', 'Water ', '314', '44', '48', '65', '50', '64', '43'] 10
['008', 'Wartortle', 'Water ', '405', '5
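
The slicing step above relies on every row having exactly as many cells as there are headers. A minimal sketch of the same idea, with a guard against a cell count that does not divide evenly, might look like this. The helper name `chunk_rows` and the `demo_*` variables are hypothetical, and the sample values are taken from the output above.

```python
def chunk_rows(cells, width):
    """Split a flat list of cell values into rows of `width` cells each."""
    if width <= 0:
        raise ValueError("width must be a positive integer")
    if len(cells) % width != 0:
        # A leftover partial row usually means the page layout changed
        raise ValueError(
            f"{len(cells)} cells cannot be split evenly into rows of {width}"
        )
    return [cells[i:i + width] for i in range(0, len(cells), width)]


# Shortened sample: three columns instead of the ten on the real page
demo_headers = ['#', 'Name', 'Type']
demo_cells = ['001', 'Bulbasaur', 'Grass Poison', '004', 'Charmander', 'Fire ']

demo_rows = chunk_rows(demo_cells, len(demo_headers))
# Pair each row with the headers to get one dictionary per Pokemon
demo_records = [dict(zip(demo_headers, row)) for row in demo_rows]
```

Either `demo_rows` (a list of lists) or `demo_records` (a list of dictionaries) can be passed to `pd.DataFrame`.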

### Data Cleaning & Feature Engineering

In [273]:
# Normalize header names: lower-case, with underscores instead of spaces
columns = [header.replace(' ', '_').lower() for header in headers]
pokedex = pd.DataFrame(data=rows, columns=columns)

In [274]:
pokedex.rename(columns={'#': 'nat_idx'}, inplace=True)

In [275]:
pokedex.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1029 entries, 0 to 1028
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   nat_idx  1029 non-null   object
 1   name     1029 non-null   object
 2   type     1029 non-null   object
 3   total    1029 non-null   object
 4   hp       1029 non-null   object
 5   attack   1029 non-null   object
 6   defense  1029 non-null   object
 7   sp._atk  1029 non-null   object
 8   sp._def  1029 non-null   object
 9   speed    1029 non-null   object
dtypes: object(10)
memory usage: 80.5+ KB
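
The summary above shows that every column, including the stats, is stored as `object` (strings), because the values were scraped as text. One possible cleaning step, shown here only as a sketch on a two-row sample, is to convert the stat columns with `pd.to_numeric`. The `demo` frame and the choice to leave `nat_idx` as a string (to keep its leading zeros) are assumptions, not steps taken in the notebook.

```python
import pandas as pd

# Two sample rows with values copied from the scraped output above
demo = pd.DataFrame({
    "nat_idx": ["001", "004"],
    "name": ["Bulbasaur", "Charmander"],
    "total": ["318", "309"],
    "hp": ["45", "39"],
})

# Columns that hold numbers stored as text
stat_cols = ["total", "hp"]

# Convert each stat column from string to a numeric dtype
demo[stat_cols] = demo[stat_cols].apply(pd.to_numeric)

print(demo.dtypes)
```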


In [276]:
pokedex.head()

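|   | nat_idx | name | type | total | hp | attack | defense | sp._atk | sp._def | speed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 |
| 1 | 2 | Ivysaur | Grass Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 |
| 2 | 3 | Venusaur | Grass Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 |
| 3 | 3 | Venusaur Mega Venusaur | Grass Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 |
| 4 | 4 | Charmander | Fire | 309 | 39 | 52 | 43 | 60 | 50 | 65 |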


Here, we split the type column into two columns: primary and secondary. This will make it easier to identify Pokemon by partial or full typing later on.


In [277]:
# "Grass Poison" -> primary "Grass", secondary "Poison"; single-typed Pokemon get an empty secondary
pokedex['primary'] = pokedex['type'].apply(lambda x: x.split()[0])
pokedex['secondary'] = pokedex['type'].apply(lambda x: x.split()[-1] if len(x.split()) == 2 else '')
pokedex.drop(columns='type', inplace=True)
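
As an alternative sketch, the same split can be done without `apply` by using the pandas string accessor. The sample `types` Series below is made up from values seen in the scraped output, and the variable names are illustrative.

```python
import pandas as pd

# Sample type strings, including a single-typed entry with trailing whitespace
types = pd.Series(["Grass Poison", "Fire ", "Fire Dragon"], name="type")

# Split on whitespace into separate columns labelled 0 and 1.
# reindex guarantees column 1 exists even if every entry is single-typed.
parts = types.str.split(expand=True).reindex(columns=[0, 1])

primary = parts[0]
# Single-typed Pokemon have no second part, so fill the gap with an empty string
secondary = parts[1].fillna("")

print(primary.tolist())
print(secondary.tolist())
```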

In [296]:
pokedex[(pokedex['primary'] == 'Dragon') | (pokedex['secondary'] == 'Dragon')]

|   | nat_idx | name | total | hp | attack | defense | sp._atk | sp._def | speed | primary | secondary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 006 | Charizard Mega Charizard X | 634 | 78 | 130 | 111 | 130 | 85 | 100 | Fire | Dragon |
| 134 | 103 | Exeggutor Alolan Exeggutor | 530 | 95 | 105 | 85 | 125 | 75 | 45 | Grass | Dragon |
| 186 | 147 | Dratini | 300 | 41 | 64 | 45 | 50 | 50 | 50 | Dragon | |
| 187 | 148 | Dragonair | 420 | 61 | 84 | 65 | 70 | 70 | 70 | Dragon | |
| 188 | 149 | Dragonite | 600 | 91 | 134 | 95 | 100 | 100 | 80 | Dragon | Flying |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1020 | 885 | Dreepy | 270 | 28 | 60 | 30 | 40 | 30 | 82 | Dragon | Ghost |
| 1021 | 886 | Drakloak | 410 | 68 | 80 | 50 | 60 | 50 | 102 | Dragon | Ghost |
| 1022 | 887 | Dragapult | 600 | 88 | 120 | 75 | 100 | 75 | 142 | Dragon | Ghost |
| 1027 | 890 | Eternatus | 690 | 140 | 85 | 95 | 145 | 95 | 130 | Poison | Dragon |
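
The filter above matches a partial typing: any Pokemon with Dragon in either slot. A short sketch of how partial and full typing lookups could be wrapped in helpers follows. The function names `has_type` and `has_exact_typing` are hypothetical, and the sample frame only reuses a few rows shown earlier.

```python
import pandas as pd

# Small sample using names and typings that appear in the outputs above
demo = pd.DataFrame({
    "name": ["Charizard Mega Charizard X", "Dratini", "Dragonite", "Bulbasaur"],
    "primary": ["Fire", "Dragon", "Dragon", "Grass"],
    "secondary": ["Dragon", "", "Flying", "Poison"],
})


def has_type(df, type_name):
    """Partial typing: rows where either slot equals `type_name`."""
    return df[(df["primary"] == type_name) | (df["secondary"] == type_name)]


def has_exact_typing(df, primary, secondary=""):
    """Full typing: rows where both slots match (empty secondary means single-typed)."""
    return df[(df["primary"] == primary) & (df["secondary"] == secondary)]


dragons = has_type(demo, "Dragon")
dragon_flying = has_exact_typing(demo, "Dragon", "Flying")
pure_dragons = has_exact_typing(demo, "Dragon")
```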
