## PokemonDB Dataset
All the data to be used in the project is taken from: https://pokemondb.net/pokedex/all

Below is a short tutorial on how to scrape that page, which lists every Pokemon along with its types and base stats.

### Import `requests` library
The `requests` library downloads the HTML of a web page so that data can be extracted from it. We store the page's address in the `URL` variable. The same cell imports `pandas` and `json`, which are used later to save and read the scraped data.

In [1]:
import requests
import numpy as np
import pandas as pd
import json

URL="https://pokemondb.net/pokedex/all"

### Load the page
`requests.get` downloads the complete webpage, so this step needs internet access.

In [2]:
page = requests.get(URL)
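
The request can fail or return an error page, so it may help to check the response before parsing it. Below is a minimal sketch, assuming a hypothetical helper named `fetch_html`. The 10-second timeout and the injectable `get` argument (there only so the helper can be tried without network access) are illustrative choices, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Download a page and return its raw bytes, raising on HTTP errors.

    `fetch_html` is a hypothetical helper; `get` defaults to `requests.get`
    and can be swapped for a stand-in when trying the code offline.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

Calling `fetch_html(URL)` would return the same kind of bytes as `page.content`, but it stops early with an exception if the server reports an error.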

### Parse HTML data

#### Beautiful Soup
**Beautiful Soup** is a Python library for parsing HTML and XML documents and extracting data from them. The module is imported below and used to parse the downloaded page.

In [3]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')

Display all raw table cells containing Pokemon info. The result is raw HTML, so the actual values are extracted in a later step.

In [4]:
# Find every <td> cell in the body of the table whose id is 'pokedex'
poke_basic_info = soup.find(id='pokedex').find('tbody').find_all('td')
poke_basic_info

[<td class="cell-num cell-fixed" data-sort-value="1"><span class="infocard-cell-img"><span class="img-fixed icon-pkmn" data-alt="Bulbasaur icon" data-src="https://img.pokemondb.net/sprites/sword-shield/icon/bulbasaur.png"></span></span><span class="infocard-cell-data">001</span></td>,
 <td class="cell-name"><a class="ent-name" href="/pokedex/bulbasaur" title="View Pokedex for #001 Bulbasaur">Bulbasaur</a></td>,
 <td class="cell-icon"><a class="type-icon type-grass" href="/type/grass">Grass</a><br/> <a class="type-icon type-poison" href="/type/poison">Poison</a></td>,
 <td class="cell-total">318</td>,
 <td class="cell-num">45</td>,
 <td class="cell-num">49</td>,
 <td class="cell-num">49</td>,
 <td class="cell-num">65</td>,
 <td class="cell-num">65</td>,
 <td class="cell-num">45</td>,
 <td class="cell-num cell-fixed" data-sort-value="2"><span class="infocard-cell-img"><span class="img-fixed icon-pkmn" data-alt="Ivysaur icon" data-src="https://img.pokemondb.net/sprites/sword-shield/icon/i
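
Because every row of an HTML table is a `<tr>` element, the cells can also be grouped per row instead of being collected into one flat list. The sketch below illustrates this on a small hand-written HTML fragment that imitates the markup shown above. The fragment and the variable names are assumptions for illustration, not the live page.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the real page, trimmed to two rows of three cells
sample_html = """
<table id="pokedex">
  <tbody>
    <tr>
      <td class="cell-num">001</td>
      <td class="cell-name"><a class="ent-name">Bulbasaur</a></td>
      <td class="cell-icon"><a>Grass</a><br/> <a>Poison</a></td>
    </tr>
    <tr>
      <td class="cell-num">004</td>
      <td class="cell-name"><a class="ent-name">Charmander</a></td>
      <td class="cell-icon"><a>Fire</a></td>
    </tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
table_body = sample_soup.find(id="pokedex").find("tbody")

sample_rows = []
for row in table_body.find_all("tr"):
    cells = row.find_all("td")
    sample_rows.append({
        "dexno": cells[0].get_text(strip=True),
        "pokemon": cells[1].get_text(strip=True),
        # Each type is its own <a> element, so no string splitting is needed
        "types": [a.get_text(strip=True) for a in cells[2].find_all("a")],
    })

print(sample_rows)
```

Grouping by row avoids having to know in advance how many cells each row contains.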

### Get list of all Pokemon

Each row of the table holds ten cells, and `poke_basic_info` is a flat list of all of them, so we step through the list ten cells at a time.

The cell below builds a list of Pokemon, each with its number, name, type(s), total, HP, attack, defense, sp. atk, sp. def, and speed, in that order.

Single-type Pokemon get an empty string where the second type would be, so every entry ends up with 11 values.

In [5]:
CELLS_PER_ROW = 10  # number, name, type(s), total, and the six base stats

poke_list = []
for i in range(0, len(poke_basic_info), CELLS_PER_ROW):
    row = poke_basic_info[i:i + CELLS_PER_ROW]
    # If the first word of the name cell reappears later in the text, drop it
    # so that only the form name (e.g. 'Mega Venusaur') is kept
    name = row[1].text.strip()
    words = name.split(" ")
    if len(words) > 1 and words[0] in words[1:]:
        name = ' '.join(words[1:])
    temp = [row[0].text.strip(), name]
    # The type cell holds one or two types separated by a space
    temp.extend(row[2].text.strip().split(' '))
    # Remaining cells: total, HP, attack, defense, sp. atk, sp. def, speed
    temp.extend(cell.text.strip() for cell in row[3:])
    # If Pokemon isn't dual type, insert a blank value. This will help in the dataframe creation process later
    if len(temp) != 11:
        temp.insert(3, '')
    poke_list.append(temp)
poke_list

[['001',
  'Bulbasaur',
  'Grass',
  'Poison',
  '318',
  '45',
  '49',
  '49',
  '65',
  '65',
  '45'],
 ['002',
  'Ivysaur',
  'Grass',
  'Poison',
  '405',
  '60',
  '62',
  '63',
  '80',
  '80',
  '60'],
 ['003',
  'Venusaur',
  'Grass',
  'Poison',
  '525',
  '80',
  '82',
  '83',
  '100',
  '100',
  '80'],
 ['003',
  'Mega Venusaur',
  'Grass',
  'Poison',
  '625',
  '80',
  '100',
  '123',
  '122',
  '120',
  '80'],
 ['004', 'Charmander', 'Fire', '', '309', '39', '52', '43', '60', '50', '65'],
 ['005', 'Charmeleon', 'Fire', '', '405', '58', '64', '58', '80', '65', '80'],
 ['006',
  'Charizard',
  'Fire',
  'Flying',
  '534',
  '78',
  '84',
  '78',
  '109',
  '85',
  '100'],
 ['006',
  'Mega Charizard X',
  'Fire',
  'Dragon',
  '634',
  '78',
  '130',
  '111',
  '130',
  '85',
  '100'],
 ['006',
  'Mega Charizard Y',
  'Fire',
  'Flying',
  '634',
  '78',
  '104',
  '78',
  '159',
  '115',
  '100'],
 ['007', 'Squirtle', 'Water', '', '314', '44', '48', '65', '50', '64', '43'],
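
The two special cases handled in the loop, repeated names for alternate forms and missing second types, can be isolated into small functions to make them easier to check. This is a sketch: `clean_name` and `split_types` are hypothetical helper names, and the input `'Venusaur Mega Venusaur'` is an assumed example of what the name cell's text looks like for an alternate form.

```python
def clean_name(raw_name):
    """Drop the leading species name when it is repeated later in the text."""
    words = raw_name.strip().split(" ")
    if len(words) > 1 and words[0] in words[1:]:
        return " ".join(words[1:])
    return raw_name.strip()


def split_types(raw_types):
    """Return [type1, type2], using '' for type2 when there is only one type."""
    types = raw_types.strip().split(" ")
    if len(types) == 1:
        types.append("")
    return types


print(clean_name("Venusaur Mega Venusaur"))  # Mega Venusaur
print(clean_name("Bulbasaur"))               # Bulbasaur
print(split_types("Grass Poison"))           # ['Grass', 'Poison']
print(split_types("Fire"))                   # ['Fire', '']
```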
 

### Save to a JSON file

The cell below converts each scraped entry into a dictionary keyed by column name. The resulting list is then written to a JSON file, where it can later be read using Pandas.

In [6]:
# Column names, in the same order as the values in each poke_list entry
columns = ["dexno", "pokemon", "type1", "type2", "base_total",
           "hp", "atk", "def_", "sp_atk", "sp_def", "speed"]
# Pair each value with its column name to get one dictionary per Pokemon
poke_json = [dict(zip(columns, entry)) for entry in poke_list]

In [7]:
with open('pokedex.json', 'w') as fp:
    json.dump(poke_json, fp, indent = 4)
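
A quick way to confirm that a file was written correctly is to load it back and compare it with the original list. The sketch below does this with two sample entries and a temporary directory so that it does not touch `pokedex.json`. The names `sample_entries` and `sample_path` are illustrative.

```python
import json
import os
import tempfile

sample_entries = [
    {"dexno": "001", "pokemon": "Bulbasaur", "type1": "Grass", "type2": "Poison"},
    {"dexno": "004", "pokemon": "Charmander", "type1": "Fire", "type2": ""},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "pokedex_sample.json")
    with open(sample_path, "w") as fp:
        json.dump(sample_entries, fp, indent=4)
    # Read the file back to check that nothing was lost in the round trip
    with open(sample_path) as fp:
        loaded_entries = json.load(fp)

print(loaded_entries == sample_entries)  # True
```

With `json.load`, the leading zeros in `dexno` survive because the values are stored as JSON strings.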

### Read the Pokedex info
Now let us check the information saved in the JSON file by loading it into a Pandas dataframe with `pd.read_json`. The dataframe below contains every scraped Pokemon along with its types and base stats.

In [8]:
jsonFile = pd.read_json("pokedex.json", orient = "records")
jsonFile

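|  | dexno | pokemon | type1 | type2 | base_total | hp | atk | def_ | sp_atk | sp_def | speed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1070 | 902 | Basculegion Female | Water | Ghost | 530 | 120 | 92 | 65 | 100 | 75 | 78 |
| 1071 | 903 | Sneasler | Poison | Fighting | 510 | 80 | 130 | 60 | 40 | 80 | 120 |
| 1072 | 904 | Overqwil | Dark | Poison | 510 | 85 | 115 | 95 | 65 | 65 | 85 |
| 1073 | 905 | Enamorus Incarnate Forme | Fairy | Flying | 580 | 74 | 115 | 70 | 135 | 80 | 106 |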
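
Every value was scraped as text, so the stats are stored as strings in the JSON file, and the integer columns above come from type inference in `pd.read_json`. If explicit control over the types is preferred, one option is to convert the columns by hand. This is a sketch on two sample records, not the notebook's method. `sample_records` and `stat_columns` are illustrative names.

```python
import pandas as pd

sample_records = [
    {"dexno": "001", "pokemon": "Bulbasaur", "base_total": "318", "hp": "45"},
    {"dexno": "004", "pokemon": "Charmander", "base_total": "309", "hp": "39"},
]
sample_df = pd.DataFrame(sample_records)

# Convert only the numeric columns; dexno stays as text and keeps its leading zeros
stat_columns = ["base_total", "hp"]
sample_df[stat_columns] = sample_df[stat_columns].apply(pd.to_numeric)

print(sample_df["hp"].sum())        # 84
print(sample_df["dexno"].tolist())  # ['001', '004']
```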


## Data Description
Using the dataframe that Pandas read from the JSON file, let us look at what each variable is and what it may be used for.

### Data Dictionary
Below is the data dictionary of the dataset, listing each column's name, its datatype, and a short description of its content.

In [9]:
poke_data_dict = pd.read_json("poke_data_dict.json", orient = "records")
poke_data_dict

|  | Variable | Code | Datatype | Description |
| --- | --- | --- | --- | --- |
| 0 | Pokedex Number | dexno | int | Pokemon's Pokedex number |
| 1 | Pokemon | pokemon | String | Pokemon name |
| 2 | Primary Type | type1 | String | Pokemon's primary type |
| 3 | Secondary Type | type2 | String | Pokemon's secondary type (if applicable) |
| 4 | Base Stat Total | base_total | int | Pokemon's base stat total |
| 5 | Base Hit Points | hp | int | Pokemon's base HP stat |
| 6 | Base Attack | atk | int | Pokemon's base attack stat |
| 7 | Base Defense | def_ | int | Pokemon's base defense stat |
| 8 | Base Special Attack | sp_atk | int | Pokemon's base special attack stat |
| 9 | Base Special Defense | sp_def | int | Pokemon's base special defense stat |


From this point on, each variable is shown in slightly more detail, along with its values across the entries.

### 1.) Pokedex Number (dexno) - int
This variable represents the order in which the Pokemon appears in the National Pokedex. There are some duplicates because some Pokemon have alternate forms (Mega, Alolan, Galarian, etc.).

In [10]:
jsonFile['dexno']

0         1
1         2
2         3
3         3
4         4
       ... 
1070    902
1071    903
1072    904
1073    905
1074    905
Name: dexno, Length: 1075, dtype: int64
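
Entries that share a Pokedex number can be listed with `Series.duplicated`. Below is a minimal sketch on a few rows taken from the output above (`sample_df` is an illustrative name).

```python
import pandas as pd

sample_df = pd.DataFrame({
    "dexno": [1, 2, 3, 3, 4],
    "pokemon": ["Bulbasaur", "Ivysaur", "Venusaur", "Mega Venusaur", "Charmander"],
})

# keep=False marks every row of a duplicated number, not just the repeats
shared_number = sample_df[sample_df["dexno"].duplicated(keep=False)]
print(shared_number["pokemon"].tolist())  # ['Venusaur', 'Mega Venusaur']
```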

### 2.) Pokemon (pokemon) - String
This variable represents the name of the Pokemon. Alternate forms and regional forms are included.

In [11]:
jsonFile['pokemon']

0                      Bulbasaur
1                        Ivysaur
2                       Venusaur
3                  Mega Venusaur
4                     Charmander
                  ...           
1070          Basculegion Female
1071                    Sneasler
1072                    Overqwil
1073    Enamorus Incarnate Forme
1074      Enamorus Therian Forme
Name: pokemon, Length: 1075, dtype: object

### 3.) Primary Type (type1) - String
This variable represents the primary type of the Pokemon entry.

In [12]:
jsonFile['type1']

0        Grass
1        Grass
2        Grass
3        Grass
4         Fire
         ...  
1070     Water
1071    Poison
1072      Dark
1073     Fairy
1074     Fairy
Name: type1, Length: 1075, dtype: object

### 4.) Secondary Type (type2) - String
This variable represents the secondary type of the Pokemon entry, if it has one. Otherwise, an empty string fills its place.

In [13]:
jsonFile['type2']

0         Poison
1         Poison
2         Poison
3         Poison
4               
          ...   
1070       Ghost
1071    Fighting
1072      Poison
1073      Flying
1074      Flying
Name: type2, Length: 1075, dtype: object
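
Because single-type Pokemon are marked with an empty string rather than a missing value, methods such as `isna` will not flag them. The sketch below shows two ways to handle this on sample rows from the output above: comparing against `''`, or masking the empty strings so that they become missing values.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "pokemon": ["Bulbasaur", "Charmander", "Sneasler"],
    "type2": ["Poison", "", "Fighting"],
})

# Option 1: compare against the empty string directly
single_type = sample_df[sample_df["type2"] == ""]
print(single_type["pokemon"].tolist())  # ['Charmander']

# Option 2: mask empty strings so they become missing values and isna() works
type2_with_na = sample_df["type2"].mask(sample_df["type2"] == "")
print(int(type2_with_na.isna().sum()))  # 1
```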

### 5.) Base Stat Total (base_total) - int
This variable represents the base stat total of the Pokemon entry, that is, the sum of its six base stats (HP, attack, defense, special attack, special defense, and speed).

In [14]:
jsonFile['base_total']

0       318
1       405
2       525
3       625
4       309
       ... 
1070    530
1071    510
1072    510
1073    580
1074    580
Name: base_total, Length: 1075, dtype: int64
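
The relationship between `base_total` and the six individual stats can be checked directly. Below is a minimal sketch using three rows copied from the dataframe above (`stat_columns` is an illustrative name).

```python
import pandas as pd

sample_df = pd.DataFrame({
    "pokemon": ["Bulbasaur", "Mega Venusaur", "Charmander"],
    "base_total": [318, 625, 309],
    "hp": [45, 80, 39],
    "atk": [49, 100, 52],
    "def_": [49, 123, 43],
    "sp_atk": [65, 122, 60],
    "sp_def": [65, 120, 50],
    "speed": [45, 80, 65],
})

stat_columns = ["hp", "atk", "def_", "sp_atk", "sp_def", "speed"]
# Sum the six stats row by row and compare with the scraped total
totals_match = sample_df[stat_columns].sum(axis=1) == sample_df["base_total"]
print(totals_match.all())  # True
```

Running the same comparison on the full dataframe would flag any row where a stat was scraped into the wrong column.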

### 6.) Hit Points (hp) - int
This variable represents the base hit point (HP) stat of the Pokemon.

In [15]:
jsonFile['hp']

0        45
1        60
2        80
3        80
4        39
       ... 
1070    120
1071     80
1072     85
1073     74
1074     74
Name: hp, Length: 1075, dtype: int64

### 7.) Attack (atk) - int
This variable represents the base attack (Atk) stat of the Pokemon.

In [16]:
jsonFile['atk']

0        49
1        62
2        82
3       100
4        52
       ... 
1070     92
1071    130
1072    115
1073    115
1074    115
Name: atk, Length: 1075, dtype: int64

### 8.) Defense (def_) - int
This variable represents the base defense (Def) stat of the Pokemon.

In [17]:
jsonFile['def_']

0        49
1        63
2        83
3       123
4        43
       ... 
1070     65
1071     60
1072     95
1073     70
1074    110
Name: def_, Length: 1075, dtype: int64

### 9.) Special Attack (sp_atk) - int
This variable represents the base special attack (Sp. Atk) stat of the Pokemon.

In [18]:
jsonFile['sp_atk']

0        65
1        80
2       100
3       122
4        60
       ... 
1070    100
1071     40
1072     65
1073    135
1074    135
Name: sp_atk, Length: 1075, dtype: int64

### 10.) Special Defense (sp_def) - int
This variable represents the base special defense (Sp. Def) stat of the Pokemon.

In [19]:
jsonFile['sp_def']

0        65
1        80
2       100
3       120
4        50
       ... 
1070     75
1071     80
1072     65
1073     80
1074    100
Name: sp_def, Length: 1075, dtype: int64

### 11.) Speed (speed) - int
This variable represents the base speed stat of the Pokemon.

In [20]:
jsonFile['speed']

0        45
1        60
2        80
3        80
4        65
       ... 
1070     78
1071    120
1072     85
1073    106
1074     46
Name: speed, Length: 1075, dtype: int64