## Web Scraping

### From pokemondb.net, I want the following columns:
- Pokedex # (for splitting generations)
- Pokemon Name
- Type
- Stat Total (sum of HP, Attack, Defense, Sp. Atk, Sp. Def, and Speed)
- HP
- Attack
- Defense
- Sp. Atk
- Sp. Def
- Speed

In [1]:
# Import relevant libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup as BS

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Read in webpage
URL = 'https://pokemondb.net/pokedex/all'
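response = requests.get(URL)
response.raise_for_status()  # Stop early if the request failed
soup = BS(response.text, 'html.parser')  # Name the parser explicitly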

In [3]:
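# Bring pokedex table into notebook as a dataframe
# (newer pandas versions expect a file-like object rather than a raw HTML string)
from io import StringIO
pokedex = pd.read_html(StringIO(str(soup.find("table"))))[0]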

In [4]:
# Clean column names
pokedex.columns = [x.lower().replace(". ","_") for x in pokedex.columns]

# Change '#' column to 'pokedex_number' 
pokedex = pokedex.rename(columns={'#': 'pokedex_number'})
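
As a quick check of the renaming rule, the sketch below applies the same lowercase-and-replace logic to a plain list of headers. The header list is an assumption based on the columns listed at the top of this notebook, and `clean_column_name` is a hypothetical helper, not part of the notebook.

```python
def clean_column_name(raw_name):
    """Lowercase a header, turn '. ' into '_', and rename '#' (same rules as the cell above)."""
    cleaned = raw_name.lower().replace(". ", "_")
    return "pokedex_number" if cleaned == "#" else cleaned


# Assumed headers of the scraped table
raw_headers = ["#", "Name", "Type", "Total", "HP", "Attack",
               "Defense", "Sp. Atk", "Sp. Def", "Speed"]

cleaned_headers = [clean_column_name(header) for header in raw_headers]
print(cleaned_headers)
```

Headers such as `Sp. Atk` become `sp_atk`, which keeps every column name usable as a plain identifier.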

## Data Cleaning

### Certain Pokemon need to be removed from the dataset to prevent inaccurate training or overfitting of the machine learning model.
### Remove the following Pokemon:
- Mega Pokemon
- Partner Pokemon
- Primal Pokemon
- Castform Alternate Forms
- Deoxys Alternate Forms
- Burmy and Wormadam Sandy and Trash Cloak Forms
- Rotom Forms
- Dialga, Palkia, Giratina Origin Formes
- Darmanitan Zen Modes
- Basculin White-Striped and Red-Striped Forms
- Therian Forms
- Black and White Kyurems
- Keldeo Resolute Form
- Ash-Greninja
- Meowstic Female
- Pumpkaboo and Gourgeist Small, Large, and Super Sizes
- Zygarde 10% and Complete Formes
- Own Tempo Rockruff
- Wishiwashi School Form
- Toxtricity Amped Form
- Eiscue Noice Face
- Morpeko Hangry Mode
- Eternatus Eternamax
- Urshifu Rapid Strike Style

In [5]:
# List of strings for Pokemon to be removed
removable_string = ['Mega ', 
                    'Partner ', 
                    'Primal ', 
                    'Castform ',
                    'Deoxys A', 
                    'Deoxys D', 
                    'Deoxys S', 
                    'Burmy S', 
                    'Burmy T', 
                    'Wormadam S', 
                    'Wormadam T',
                    'Rotom ', 
                    'Origin', 
                    'Zen', 
                    'Basculin R', 
                    'Basculin W', 
                    'Therian', 
                    ' Kyurem', 
                    'Resolute', 
                    'Ash-Greninja', 
                    'Meowstic Female', 
                    'Pumpkaboo L', 
                    'Pumpkaboo S', 
                    'Gourgeist L', 
                    'Gourgeist S', 
                    'Zygarde 1', 
                    'Zygarde C', 
                    'Rockruff ', 
                    'Wishiwashi Sc', 
                    'Toxtricity A',
                    'Noice', 
                    'Hangry', 
                    'Eternamax', 
                    'Urshifu Rapid']

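# Loop to remove every Pokemon whose name contains one of the strings above
# (regex=False so each string is matched literally, not as a regular expression)
for x in removable_string:
    pokedex = pokedex[~pokedex['name'].str.contains(x, regex=False)]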
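
The same filter can also be expressed as one pass over the `name` column. The sketch below is one possible alternative, not the notebook's method: it escapes each substring, joins them into a single pattern, and applies one boolean mask. The sample names and the shortened `sample_removable` list are made up for illustration.

```python
import re

import pandas as pd

# Illustrative rows only; the real frame comes from the scraped table
sample = pd.DataFrame({"name": ["Venusaur", "Venusaur Mega Venusaur", "Meganium",
                                "Rotom", "Rotom Heat Rotom",
                                "Kyurem", "Kyurem White Kyurem"]})

# Shortened stand-in for removable_string
sample_removable = ["Mega ", "Rotom ", " Kyurem"]

# Escape each substring so it is matched literally, then OR them together
pattern = "|".join(re.escape(text) for text in sample_removable)

kept = sample[~sample["name"].str.contains(pattern)]
kept_names = kept["name"].tolist()
print(kept_names)
```

The trailing or leading space in each substring is what keeps base species such as Meganium, Rotom, and Kyurem in the data while their alternate forms are dropped.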

### Features that will need to be calculated/added in:
- Pokemon Generation
- Separate columns for primary and secondary types
- Pokemon Legendary Status (Legendary or Normal)
- Catch Rate
- Pokemon Abilities (for predicting types)
- Pokemon Movesets (for predicting types)

In [6]:
# Function to determine Pokemon's generation
def pokemon_gen(pokedex_num, pokemon_name):
    if 'Alolan' in pokemon_name:
        return 7
    elif 'Galarian' in pokemon_name:
        return 8
    elif 'Hisuian' in pokemon_name:
        return 8
    elif pokedex_num < 152:
        return 1
    elif pokedex_num < 252:
        return 2
    elif pokedex_num < 387:
        return 3
    elif pokedex_num < 494:
        return 4
    elif pokedex_num < 650:
        return 5
    elif pokedex_num < 722:
        return 6
    elif pokedex_num < 810:
        return 7
    else:
        return 8
    
# Assign a generation to each pokemon
# (built as a list to avoid chained assignment such as pokedex['generation'][ind] = ...)
pokedex['generation'] = [pokemon_gen(number, name)
                         for number, name in zip(pokedex['pokedex_number'], pokedex['name'])]
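
The chain of `elif` branches can also be written as a table lookup. The sketch below is an alternative under the same cutoffs as `pokemon_gen`; the names `GENERATION_STARTS`, `REGIONAL_GENERATION`, and `pokemon_gen_lookup` are hypothetical and only used here.

```python
from bisect import bisect_right

# First Pokedex number of generations 2 through 8, copied from the cutoffs above
GENERATION_STARTS = [152, 252, 387, 494, 650, 722, 810]

# Regional forms take the generation in which the form was introduced
REGIONAL_GENERATION = {"Alolan": 7, "Galarian": 8, "Hisuian": 8}


def pokemon_gen_lookup(pokedex_num, pokemon_name):
    """Return the generation for a Pokedex number, with regional forms checked first."""
    for form, generation in REGIONAL_GENERATION.items():
        if form in pokemon_name:
            return generation
    # Count how many generation boundaries the number has reached
    return bisect_right(GENERATION_STARTS, pokedex_num) + 1


print(pokemon_gen_lookup(151, "Mew"))
print(pokemon_gen_lookup(152, "Chikorita"))
print(pokemon_gen_lookup(19, "Rattata Alolan Rattata"))
```

Keeping the boundaries in one list means a later generation only needs one more number rather than another branch.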

In [7]:
# Split the 'type' column into 'primary_type' and 'secondary_type' columns
types = pokedex['type'].str.split(expand = True)
pokedex['type'] = types[0]
pokedex.insert(loc = 3, 
               column = 'secondary_type', 
               value = types[1])

# Rename 'type' column to 'primary_type'
pokedex = pokedex.rename(columns={'type': 'primary_type'})
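
To see what `str.split(expand = True)` does with single-type Pokemon, here is a small sketch on made-up values: the second column is simply left missing when there is no secondary type.

```python
import pandas as pd

# Illustrative values in the same "Primary Secondary" layout as the scraped column
sample_types = pd.Series(["Grass Poison", "Fire", "Water Ghost"])

# expand=True returns one column per whitespace-separated token
split_types = sample_types.str.split(expand=True)

primary = split_types[0].tolist()
secondary_missing = split_types[1].isna().tolist()

print(primary)
print(secondary_missing)
```

Any later step that uses `secondary_type` therefore has to handle missing values for single-type Pokemon.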

In [32]:
# Create list of pokedex numbers for legendary and mythical pokemon
legendary_pokedex_number = [144, 145, 146, 150, 151, 243, 244, 245, 249, 250, 
                            251, 377, 378, 379, 380, 381, 382, 383, 384, 385, 
                            386, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 
                            490, 491, 492, 493, 494, 638, 639, 640, 641, 642, 
                            643, 644, 645, 646, 647, 648, 649, 716, 717, 718, 
                            719, 720, 721, 772, 773, 785, 786, 787, 788, 789, 
                            790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 
                            800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 
                            888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 
                            898, 905]

# Create legendary column and assign bool value to every Pokemon
pokedex['legendary'] = pokedex['pokedex_number'].isin(legendary_pokedex_number)

In [35]:
pokedex

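| | pokedex_number | name | primary_type | secondary_type | total | hp | attack | defense | sp_atk | sp_def | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire | | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1069 | 902 | Basculegion Male | Water | Ghost | 530 | 120 | 112 | 65 | 80 | 75 | 78 | 8 | False |
| 1070 | 902 | Basculegion Female | Water | Ghost | 530 | 120 | 92 | 65 | 100 | 75 | 78 | 8 | False |
| 1071 | 903 | Sneasler | Poison | Fighting | 510 | 80 | 130 | 60 | 40 | 80 | 120 | 8 | False |
| 1072 | 904 | Overqwil | Dark | Poison | 510 | 85 | 115 | 95 | 65 | 65 | 85 | 8 | False |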


In [12]:
URL = 'https://pokemondb.net/pokedex/{}'

for name in pokedex['name']:
    print(URL.format(name.lower()))

https://pokemondb.net/pokedex/bulbasaur
https://pokemondb.net/pokedex/ivysaur
https://pokemondb.net/pokedex/venusaur
https://pokemondb.net/pokedex/charmander
https://pokemondb.net/pokedex/charmeleon
https://pokemondb.net/pokedex/charizard
https://pokemondb.net/pokedex/squirtle
https://pokemondb.net/pokedex/wartortle
https://pokemondb.net/pokedex/blastoise
https://pokemondb.net/pokedex/caterpie
https://pokemondb.net/pokedex/metapod
https://pokemondb.net/pokedex/butterfree
https://pokemondb.net/pokedex/weedle
https://pokemondb.net/pokedex/kakuna
https://pokemondb.net/pokedex/beedrill
https://pokemondb.net/pokedex/pidgey
https://pokemondb.net/pokedex/pidgeotto
https://pokemondb.net/pokedex/pidgeot
https://pokemondb.net/pokedex/rattata
https://pokemondb.net/pokedex/rattata alolan rattata
https://pokemondb.net/pokedex/raticate
https://pokemondb.net/pokedex/raticate alolan raticate
https://pokemondb.net/pokedex/spearow
https://pokemondb.net/pokedex/fearow
https://pokemondb.net/pokedex/ekans
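
Names of variant rows (for example `rattata alolan rattata`) contain spaces, so formatting the name into the URL does not give a usable page address. One possible workaround is to read the link target from the table itself instead of rebuilding it from the name. The sketch below uses only the standard library and runs on a made-up HTML fragment; the `ent-name` class and the `/pokedex/<slug>` link shape are assumptions about the site's markup, and `NameLinkParser` and `pokemon_page_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://pokemondb.net"


class NameLinkParser(HTMLParser):
    """Collect the href of every <a class="ent-name"> tag (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "ent-name" in classes and attributes.get("href"):
            self.links.append(urljoin(BASE_URL, attributes["href"]))


def pokemon_page_urls(html):
    """Return the unique per-Pokemon page URLs found in the HTML, in table order."""
    parser = NameLinkParser()
    parser.feed(html)
    # Variant rows are assumed to link to the same species page, so drop repeats
    return list(dict.fromkeys(parser.links))


# Made-up fragment standing in for the scraped table
sample_html = """
<table>
  <tr><td><a class="ent-name" href="/pokedex/rattata">Rattata</a></td></tr>
  <tr><td><a class="ent-name" href="/pokedex/rattata">Rattata</a>
          <small>Alolan Rattata</small></td></tr>
  <tr><td><a class="ent-name" href="/pokedex/raticate">Raticate</a></td></tr>
</table>
"""

urls = pokemon_page_urls(sample_html)
print(urls)
```

If the markup matches this assumption, the `soup` object created earlier could give the same links via `soup.select("a.ent-name")`, which avoids a second parser.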


In [34]:
pd.set_option('display.max_rows', 10)