# Part A: Data Extraction

1. df: from "poke_name.txt", which includes the global index, English name, Simplified Chinese name, and Japanese name of each Pokemon.
2. temp_df: from "data\\trans_name.csv". This file is scraped from the Pokemon wiki; re-run the crawling program periodically to keep the data up to date.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re

In [2]:
# the with-block closes the file automatically, so no explicit close() is needed
with open("data\\poke_name.txt", "r", encoding='utf-8') as file:
    content = file.read()

In [3]:
lines = content.splitlines()
columns = lines[0].split('\t')                              # header row
name_list = [line for line in lines[1:] if line.strip()]    # data rows, skipping blank lines

In [4]:
columns

['DEX', 'EN_NAME', 'CN_NAME', 'JP_NAME']

In [5]:
# DataFrame.append was removed in pandas 2.0; collect rows first, then build the frame once
rows = []
for name in name_list:
    li = name.split('\t')
    # the first field has a one-character prefix before the number: strip it and cast to int
    rows.append([int(li[0][1:]), li[1], li[2], li[3]])
df = pd.DataFrame(rows, columns=columns)
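As an alternative to building the frame row by row, pandas can parse the tab-separated file directly. A sketch, assuming the same layout as above (a header row, then one tab-separated row per Pokemon whose DEX field carries a one-character prefix); the two-row sample string and the `#` prefix are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical sample in the assumed layout of poke_name.txt ('#' prefix is an assumption)
sample = (
    "DEX\tEN_NAME\tCN_NAME\tJP_NAME\n"
    "#001\tBulbasaur\t妙蛙种子\tフシギダネ\n"
    "#004\tCharmander\t小火龙\tヒトカゲ\n"
)

# dtype=str keeps the prefixed DEX field as text so the prefix can be stripped afterwards
names_df = pd.read_csv(io.StringIO(sample), sep="\t", dtype=str)
names_df["DEX"] = names_df["DEX"].str[1:].astype(int)  # drop the prefix, keep the number
print(names_df)
```

On the real file this would be `pd.read_csv("data\\poke_name.txt", sep="\t", dtype=str, encoding="utf-8")`; `read_csv` skips blank lines by default, so a trailing newline needs no special handling.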

In [6]:
df.head()

,DEX,EN_NAME,CN_NAME,JP_NAME
0,0,Egg,蛋,タマゴ
1,1,Bulbasaur,妙蛙种子,フシギダネ
2,2,Ivysaur,妙蛙草,フシギソウ
3,3,Venusaur,妙蛙花,フシギバナ
4,4,Charmander,小火龙,ヒトカゲ


In [7]:
df.tail()

,DEX,EN_NAME,CN_NAME,JP_NAME
1006,1006,Toedscool,原野水母,ノノクラゲ
1007,1007,Toedscruel,陆地水母,リククラゲ
1008,1008,Kingambit,仆刀将军,ドドゲザン
1009,1009,Clodsire,土王,ドオー
1010,1010,Annihilape,弃世猴,コノヨザル


The Pokemon indices in this file are not correct, so we use the web crawler to get the global index and name of each Pokemon from the wiki.

In [8]:
temp_df = pd.read_csv("data\\glo_name.csv")
temp_df.rename(columns = {'CNT_NAME':'CN_NAME'}, inplace = True)

In [9]:
# temp_df = pd.read_csv("data\\trans_name.csv")
# temp_df.columns

In [10]:
temp_df = temp_df.drop('Unnamed: 0', axis=1)

In [11]:
temp_df.head()

,ORI_DEX,DEX,URL,CN_NAME
0,1,906,/wiki/%E6%96%B0%E5%8F%B6%E5%96%B5,新叶喵
1,2,907,/wiki/%E8%92%82%E8%95%BE%E5%96%B5,蒂蕾喵
2,3,908,/wiki/%E9%AD%94%E5%B9%BB%E5%81%87%E9%9D%A2%E5%...,魔幻假面喵
3,4,909,/wiki/%E5%91%86%E7%81%AB%E9%B3%84,呆火鳄
4,5,910,/wiki/%E7%82%99%E7%83%AB%E9%B3%84,炙烫鳄
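The `URL` column holds percent-encoded wiki paths. Decoding one with the standard library shows that the path ends with the Chinese name, which could serve as a cross-check on `CN_NAME` (a sketch; the column-wide one-liner in the comment is untested against the full scrape):

```python
from urllib.parse import unquote

# Sample value copied from the URL column of temp_df
url = "/wiki/%E6%96%B0%E5%8F%B6%E5%96%B5"
decoded = unquote(url)               # percent-decoding, UTF-8 by default
title = decoded.rsplit("/", 1)[-1]   # last path segment = wiki page title
print(decoded)                       # /wiki/新叶喵
print(title)                         # 新叶喵

# For the whole column (sketch): temp_df["URL"].map(unquote).str.split("/").str[-1]
```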


## A (II). Get Pokemon Information

All battle information for each Pokemon is stored in "poke_info.txt". Unfortunately, the file is not well structured, so the section below converts it into a format that pandas can read.

1. Read data from "data\\poke_info.txt" with UTF-8 encoding.

In [12]:
# the with-block closes the file automatically, so no explicit close() is needed
with open("data\\poke_info.txt", "r", encoding='utf-8') as file:
    content = file.read()

In [1]:
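# Blocks in poke_info.txt are delimited by "======" lines: the line after one separator
# holds a Pokemon's name, and the lines after the next separator hold its stats.
# (str.split is used here; the original cell failed with NameError because `re` was not imported.)
is_name = False
names = []    # one header line per Pokemon, e.g. '004 - Charmander (Stage: 1)'
stats = []    # list of detail lines for each Pokemon, aligned with names
for line in content.split('\n'):
    if line == "======":
        is_name = not is_name      # each separator toggles between name and stat sections
    elif is_name:
        names.append(line)
        stats.append([])           # start a new stat list for this Pokemon
    else:
        stats[-1].append(line)     # attach the line to the most recent Pokemon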

In [14]:
len(names), len(stats)

(667, 667)

In [15]:
strength_name = [
    "hp",
    "attack",
    "defense",
    "sp_attack",
    "sp_defense",
    "speed"
]

2. Show one Pokemon's data
--> The name line contains the index number, the name, and the evolution stage.
--> Each STAT entry contains many details; EV and Type are the main fields used in the later analysis.

In [45]:
names[20]

'051 - Dugtrio #149 (Stage: 2)'

In [2]:
stats[0]


In [101]:
len(stats)

667

3. Selected information: the name
Convert each name string into a list of [index, name, stage].

    e.g.

    004 - Charmander (Stage: 1) --> [4, Charmander, 1]
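Splitting on spaces depends on the token count, which varies (names can contain spaces, and some lines carry an extra `#number`). A regex sketch under the layout inferred from the samples in this notebook, `<index> - <name>[ #<number>] (Stage: <stage>)`; the pattern and the helper name `parse_name` are assumptions, and the meaning of the `#number` field is not documented in the source:

```python
import re

# Assumed layout: "<index> - <name>[ #<number>] (Stage: <stage>)"
NAME_PATTERN = re.compile(r'^(\d+) - (.+?)(?: #(\d+))? \(Stage: (\d+)\)$')

def parse_name(line):
    """Split a header line into [index, name, stage]; return None if it doesn't match."""
    m = NAME_PATTERN.match(line.strip())
    if m is None:
        return None
    index, name, _extra, stage = m.groups()   # _extra is the optional '#number' field
    return [int(index), name, int(stage)]

print(parse_name('004 - Charmander (Stage: 1)'))          # [4, 'Charmander', 1]
print(parse_name('981 - Sandy Shocks #381 (Stage: 3)'))   # [981, 'Sandy Shocks', 3]
```

The lazy `(.+?)` stops before the optional `#number`, so multi-word names such as "Sandy Shocks" survive intact.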

In [18]:
names[0]

'004 - Charmander (Stage: 1)'

In [46]:
temp = names[20].split(" ")
temp

['051', '-', 'Dugtrio', '#149', '(Stage:', '2)']

In [20]:
df.columns

Index(['DEX', 'EN_NAME', 'CN_NAME', 'JP_NAME'], dtype='object')

In [21]:
temp_df["CN_NAME"]

0        新叶喵
1        蒂蕾喵
2      魔幻假面喵
3        呆火鳄
4        炙烫鳄
       ...  
395      古玉鱼
396      轰鸣月
397      铁武者
398      故勒顿
399      密勒顿
Name: CN_NAME, Length: 400, dtype: object

In [47]:
print("Global Index is: %s" % temp[0])
print("Pokemon Name is: %s" % temp[2])
# the stage number is the last token, e.g. '2)'; drop the closing parenthesis
print("Pokemon Stage is: %s" % temp[-1][:-1])

Global Index is: 051
Pokemon Name is: Dugtrio
Pokemon Stage is: 2


In [59]:
mem = {}
for name in names:
    length = len(name.split(" "))
    # setdefault creates the list on first sight, so the first name of each length is kept too
    mem.setdefault(length, []).append(name)

print(len(mem[5]))
print(len(mem[6]))
print(len(mem[7]))
print(mem[5][:5])
print(mem[6][1])
print(mem[7][2])

145
508
14
['004 - Charmander (Stage: 1)', '005 - Charmeleon (Stage: 2)', '006 - Charizard (Stage: 3)', '1028 - Raichu-1 (Stage: 3)', '1033 - Diglett-1 (Stage: 1)']
1019 - Pikachu-1 #074 (Stage: 2)
981 - Sandy Shocks #381 (Stage: 3)


Fields to extract for each Pokemon:

1. DEX: global index of the Pokemon
2. NAME: English name
3. CN_NAME: Chinese name
4. STAT: base stats of the Pokemon
    a. hp
    b. attack (physical)
    c. defense (physical)
    d. sp_attack (special)
    e. sp_defense (special)
    f. speed
5. attr: attributes (types) of the Pokemon. If a Pokemon has only one attribute, then attr1 == attr2
    a. attr1
    b. attr2
6. skills: possibly stored in a separate DataFrame (save with pickle?)
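The planned fields can be pinned down as a small record type. This is a hypothetical sketch (the class name `PokemonRecord` and its field names are illustrative, and skills are left out) that enforces the two rules stated in the list: all six stats must be present, and a single-type Pokemon gets attr2 == attr1:

```python
from dataclasses import dataclass
from typing import Dict, Optional

STAT_NAMES = ["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]

@dataclass
class PokemonRecord:
    """Hypothetical container for one row of the planned table (skills omitted)."""
    dex: int
    name: str
    cn_name: str
    stat: Dict[str, int]
    attr1: str
    attr2: Optional[str] = None   # leave empty for a single-type Pokemon

    def __post_init__(self):
        if self.attr2 is None:
            self.attr2 = self.attr1   # rule from the list: attr1 == attr2
        missing = [s for s in STAT_NAMES if s not in self.stat]
        if missing:
            raise ValueError(f"missing stats: {missing}")

charmander = PokemonRecord(4, "Charmander", "小火龙",
                           dict(zip(STAT_NAMES, [39, 52, 43, 60, 50, 65])), "fire")
print(charmander.attr1, charmander.attr2)   # fire fire
```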

# Part B: Match English and Chinese Names of Pokemon

In [33]:
print(temp_df.iloc[0])
print("################")
print(df.iloc[0])

ORI_DEX                                    1
DEX                                      906
URL        /wiki/%E6%96%B0%E5%8F%B6%E5%96%B5
CN_NAME                                  新叶喵
Name: 0, dtype: object
################
DEX          0
EN_NAME    Egg
CN_NAME      蛋
JP_NAME    タマゴ
Name: 0, dtype: object
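Both frames carry a `DEX` column, so another way to line them up (a sketch, not the name-based approach this notebook uses) is a pandas outer merge on `DEX` with `indicator=True`, flagging rows that exist on only one side or whose Chinese names differ. The toy frames reuse values printed in this notebook: Floragato (907) appears as 蒂莆喵 in the local list and as 蒂蕾喵 in the wiki scrape.

```python
import pandas as pd

# Toy stand-ins for df (local list) and temp_df (wiki scrape)
local = pd.DataFrame({"DEX": [906, 907],
                      "EN_NAME": ["Sprigatito", "Floragato"],
                      "CN_NAME": ["新叶喵", "蒂莆喵"]})
wiki = pd.DataFrame({"DEX": [906, 907], "CN_NAME": ["新叶喵", "蒂蕾喵"]})

merged = local.merge(wiki, on="DEX", how="outer",
                     suffixes=("_local", "_wiki"), indicator=True)
# keep rows found on one side only, or found on both sides with different Chinese names
mismatch = merged[(merged["_merge"] != "both")
                  | (merged["CN_NAME_local"] != merged["CN_NAME_wiki"])]
print(mismatch[["DEX", "EN_NAME", "CN_NAME_local", "CN_NAME_wiki"]])
```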


In [43]:
# Map each Chinese name (key) to its English name (value); start from a fresh dict
print(df.head(5))
mem = dict(zip(df["CN_NAME"], df["EN_NAME"]))

  DEX     EN_NAME CN_NAME JP_NAME
0   0         Egg       蛋     タマゴ
1   1   Bulbasaur    妙蛙种子   フシギダネ
2   2     Ivysaur     妙蛙草   フシギソウ
3   3    Venusaur     妙蛙花   フシギバナ
4   4  Charmander     小火龙    ヒトカゲ


## B1. Eliminate Duplication

In [44]:
for i in range(len(temp_df)):
    if temp_df.iloc[i]["CN_NAME"] not in mem:
        print(temp_df.iloc[i]["CN_NAME"])

蒂蕾喵
毽子绵
乌波帕底亚的样子
火爆猴
肯泰罗帕底亚的样子
谜拟Ｑ
毛崖蟹
沙丘娃‎
噬沙堡爷‎
米立龙
戟脊龙
赛富豪
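Some of these misses look like string-level differences rather than missing Pokemon: 沙丘娃 and 噬沙堡爷 appear to carry an invisible trailing character, and 谜拟Ｑ uses a full-width Ｑ (which would miss if the local table uses a half-width Q). Assuming that is the cause, a minimal helper (hypothetical name `normalize_name`) could be applied to both sides before comparing. It will not resolve regional-form entries such as 乌波帕底亚的样子, which need a separate mapping.

```python
import unicodedata

def normalize_name(name):
    """NFKC-normalize a name and drop invisible format characters (Unicode category 'Cf')."""
    name = unicodedata.normalize("NFKC", name)   # e.g. full-width 'Ｑ' -> 'Q'
    return "".join(ch for ch in name if unicodedata.category(ch) != "Cf").strip()

print(normalize_name("沙丘娃\u200e"))   # 沙丘娃  (U+200E LEFT-TO-RIGHT MARK removed)
print(normalize_name("谜拟Ｑ"))         # 谜拟Q
```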


In [60]:
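# English name = third space-separated token of each name line.
# Note: this keeps only the first word of multi-word names (e.g. "Sandy Shocks" -> "Sandy").
temp = [name.split(" ")[2] for name in names]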

In [61]:
temp[:3]

['Charmander', 'Charmeleon', 'Charizard']

In [87]:
df[df["EN_NAME"]=='----']

,DEX,EN_NAME,CN_NAME,JP_NAME
980,980,----,----,----
987,987,----,----,----


In [80]:
mem = set(temp)
count = 0
unknowns = []
for name in df.iloc[temp_df["DEX"]]["EN_NAME"]:
    if name in mem:
        count += 1
    else:
        unknowns.append(name)
print(count)

384


In [70]:
unknowns

['----',
 'Brute Bonnet',
 'Scream Tail',
 'Sandy Shocks',
 'Flutter Mane',
 'Great Tusk',
 'Slither Wing',
 'Roaring Moon',
 'Iron Treads',
 '----',
 'Iron Moth',
 'Iron Hands',
 'Iron Jugulis',
 'Iron Thorns',
 'Iron Bundle',
 'Iron Valiant']

In [None]:
cols = []
strength_name

In [91]:
stats[0][0].split(" ")[2].split(".")

['39', '52', '43', '60', '50', '65']
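The six values above line up with the first `strength_name` order (hp, attack, defense, sp_attack, sp_defense, speed): they are Charmander's base stats 39/52/43/60/50/65. A sketch that turns such a line into a labelled dict, assuming (as in the cell above) that the third space-separated token holds the dot-joined stats; the helper name `parse_stat_line` and the sample's first two tokens are hypothetical:

```python
STAT_NAMES = ["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]

def parse_stat_line(line):
    """Turn a STAT line into {stat_name: int}; assumes token 3 is 'hp.atk.def.spa.spd.spe'."""
    values = line.split(" ")[2].split(".")
    if len(values) != len(STAT_NAMES):
        raise ValueError(f"expected {len(STAT_NAMES)} values, got {len(values)}: {line!r}")
    return dict(zip(STAT_NAMES, map(int, values)))

sample = "label1 label2 39.52.43.60.50.65"   # made-up prefix tokens
print(parse_stat_line(sample))
# {'hp': 39, 'attack': 52, 'defense': 43, 'sp_attack': 60, 'sp_defense': 50, 'speed': 65}
```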

In [100]:
df.iloc[temp_df["DEX"]]

,DEX,EN_NAME,CN_NAME,JP_NAME
906,906,Sprigatito,新叶喵,ニャオハ
907,907,Floragato,蒂莆喵,ニャローテ
908,908,Meowscarada,魔幻假面喵,マスカーニャ
909,909,Fuecoco,呆火鳄,ホゲータ
910,910,Crocalor,炙烫鳄,アチゲータ
...,...,...,...,...
1004,1004,Armarouge,红莲铠骑,グレンアルマ
1005,1005,Ceruledge,苍炎刃鬼,ソウブレイズ
1006,1006,Toedscool,原野水母,ノノクラゲ
1007,1007,Toedscruel,陆地水母,リククラゲ


In [84]:
names[0]

'004 - Charmander (Stage: 1)'

In [3]:
df = pd.read_csv("data\\pokemon.csv")

In [4]:
df.head(10)

,abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,...,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
0,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,...,88.1,1,65,65,45,grass,poison,6.9,1,0
1,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,...,88.1,2,80,80,60,grass,poison,13.0,1,0
2,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,...,88.1,3,122,120,80,grass,poison,100.0,1,0
3,"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,...,88.1,4,60,50,65,fire,,8.5,1,0
4,"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,...,88.1,5,80,65,80,fire,,19.0,1,0
5,"['Blaze', 'Solar Power']",0.25,1.0,1.0,2.0,0.5,0.5,0.5,1.0,1.0,...,88.1,6,159,115,100,fire,flying,90.5,1,0
6,"['Torrent', 'Rain Dish']",1.0,1.0,1.0,2.0,1.0,1.0,0.5,1.0,1.0,...,88.1,7,50,64,43,water,,9.0,1,0
7,"['Torrent', 'Rain Dish']",1.0,1.0,1.0,2.0,1.0,1.0,0.5,1.0,1.0,...,88.1,8,65,80,58,water,,22.5,1,0
8,"['Torrent', 'Rain Dish']",1.0,1.0,1.0,2.0,1.0,1.0,0.5,1.0,1.0,...,88.1,9,135,115,78,water,,85.5,1,0
9,"['Shield Dust', 'Run Away']",1.0,1.0,1.0,1.0,1.0,0.5,2.0,2.0,1.0,...,50.0,10,20,20,45,bug,,2.9,1,0


In [7]:
df.tail(10)

,abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,...,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
791,['Shadow Shield'],1.0,4.0,1.0,1.0,1.0,0.0,1.0,1.0,4.0,...,,792,137,107,97,psychic,ghost,120.0,7,1
792,['Beast Boost'],0.5,1.0,1.0,1.0,0.5,1.0,0.5,0.5,1.0,...,,793,127,131,103,rock,poison,55.5,7,1
793,['Beast Boost'],0.5,0.5,1.0,1.0,2.0,0.5,2.0,4.0,1.0,...,,794,53,53,79,bug,fighting,333.6,7,1
794,['Beast Boost'],0.5,0.5,1.0,1.0,2.0,0.5,2.0,4.0,1.0,...,,795,137,37,151,bug,fighting,25.0,7,1
795,['Beast Boost'],1.0,1.0,1.0,0.5,1.0,1.0,1.0,0.5,1.0,...,,796,173,71,83,electric,,100.0,7,1
796,['Beast Boost'],0.25,1.0,0.5,2.0,0.5,1.0,2.0,0.5,1.0,...,,797,107,101,61,steel,flying,999.9,7,1
797,['Beast Boost'],1.0,1.0,0.5,0.5,0.5,2.0,4.0,1.0,1.0,...,,798,59,31,109,grass,steel,0.1,7,1
798,['Beast Boost'],2.0,0.5,2.0,0.5,4.0,2.0,0.5,1.0,0.5,...,,799,97,53,43,dark,dragon,888.0,7,1
799,['Prism Armor'],2.0,2.0,1.0,1.0,1.0,0.5,1.0,1.0,2.0,...,,800,127,89,79,psychic,,230.0,7,1
800,['Soul-Heart'],0.25,0.5,0.0,1.0,0.5,1.0,2.0,0.5,1.0,...,,801,130,115,65,steel,fairy,80.5,7,1


In [6]:
df.columns

Index(['abilities', 'against_bug', 'against_dark', 'against_dragon',
       'against_electric', 'against_fairy', 'against_fight', 'against_fire',
       'against_flying', 'against_ghost', 'against_grass', 'against_ground',
       'against_ice', 'against_normal', 'against_poison', 'against_psychic',
       'against_rock', 'against_steel', 'against_water', 'attack',
       'base_egg_steps', 'base_happiness', 'base_total', 'capture_rate',
       'classfication', 'defense', 'experience_growth', 'height_m', 'hp',
       'japanese_name', 'name', 'percentage_male', 'pokedex_number',
       'sp_attack', 'sp_defense', 'speed', 'type1', 'type2', 'weight_kg',
       'generation', 'is_legendary'],
      dtype='object')

In [8]:
df.iloc[0]

abilities            ['Overgrow', 'Chlorophyll']
against_bug                                  1.0
against_dark                                 1.0
against_dragon                               1.0
against_electric                             0.5
against_fairy                                0.5
against_fight                                0.5
against_fire                                 2.0
against_flying                               2.0
against_ghost                                1.0
against_grass                               0.25
against_ground                               1.0
against_ice                                  2.0
against_normal                               1.0
against_poison                               1.0
against_psychic                              2.0
against_rock                                 1.0
against_steel                                1.0
against_water                                0.5
attack                                        49
base_egg_steps      

In [28]:
import ast

ability_list = set()
for i in df["abilities"]:
    i = ast.literal_eval(i)

    for j in i:
        ability_list.add(j)
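The same set can be built in pandas with `Series.explode`, which also makes it easy to count how many Pokemon list each ability. A sketch on a toy frame shaped like the `abilities` column of pokemon.csv (list-like strings):

```python
import ast

import pandas as pd

# Toy stand-in for df["abilities"]
toy = pd.DataFrame({"abilities": ["['Overgrow', 'Chlorophyll']",
                                  "['Blaze', 'Solar Power']",
                                  "['Overgrow', 'Chlorophyll']"]})

abilities = toy["abilities"].apply(ast.literal_eval).explode()  # one row per (Pokemon, ability)
ability_set = set(abilities)               # same result as the loop above
ability_counts = abilities.value_counts()  # number of Pokemon listing each ability
print(sorted(ability_set))
print(ability_counts)
```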

In [31]:
ability_list

{'Adaptability',
 'Aftermath',
 'Air Lock',
 'Analytic',
 'Anger Point',
 'Anticipation',
 'Arena Trap',
 'Aroma Veil',
 'Aura Break',
 'Bad Dreams',
 'Battery',
 'Battle Armor',
 'Battle Bond',
 'Beast Boost',
 'Berserk',
 'Big Pecks',
 'Blaze',
 'Bulletproof',
 'Cheek Pouch',
 'Chlorophyll',
 'Clear Body',
 'Cloud Nine',
 'Color Change',
 'Comatose',
 'Competitive',
 'Compoundeyes',
 'Contrary',
 'Corrosion',
 'Cursed Body',
 'Cute Charm',
 'Damp',
 'Dancer',
 'Dark Aura',
 'Dazzling',
 'Defeatist',
 'Defiant',
 'Disguise',
 'Download',
 'Drizzle',
 'Drought',
 'Dry Skin',
 'Early Bird',
 'Effect Spore',
 'Electric Surge',
 'Emergency Exit',
 'Fairy Aura',
 'Filter',
 'Flame Body',
 'Flare Boost',
 'Flash Fire',
 'Flower Gift',
 'Flower Veil',
 'Fluffy',
 'Forecast',
 'Forewarn',
 'Friend Guard',
 'Frisk',
 'Full Metal Body',
 'Fur Coat',
 'Gale Wings',
 'Galvanize',
 'Gluttony',
 'Gooey',
 'Grass Pelt',
 'Grassy Surge',
 'Guts',
 'Harvest',
 'Healer',
 'Heatproof',
 'Heavy Metal',
 

In [9]:
# Reordered: attack/sp_attack/defense/sp_defense now line up with the pa/ma/pd/md codes used below
strength_name = [
    "hp",
    "attack",
    "sp_attack",
    "defense",
    "sp_defense",
    "speed"
]

In [10]:
df[strength_name]

,hp,attack,sp_attack,defense,sp_defense,speed
0,45,49,65,49,65,45
1,60,62,80,63,80,60
2,80,100,122,123,120,80
3,39,52,60,43,50,65
4,58,64,80,58,65,80
...,...,...,...,...,...,...
796,97,101,107,103,101,61
797,59,181,59,131,31,109
798,223,101,97,53,53,43
799,97,107,127,101,89,79


In [39]:
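# Level-100 stat check. race = base stat, ind = IV (individual value), effort = EV (effort value)
race = 154
ind = 31
effort = 255
hp = int(race*2 + ind + effort/4 + 100 + 10)   # HP: 2*base + IV + EV/4 + level + 10

race = 50
effort = 0
ab = int(race*2 + ind + effort/4 + 5)          # other stats, neutral nature: 2*base + IV + EV/4 + 5
print(hp, ab)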

512 136
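For reference, the cell above is the level-100 case of the standard stat formula (Generation III onward), with $B$ the base stat (`race`), $I$ the IV (`ind`), $E$ the EVs (`effort`), $L$ the level and $N$ the nature multiplier (1.1, 1.0 or 0.9):

$$
\mathrm{HP} = \left\lfloor \frac{(2B + I + \lfloor E/4 \rfloor)\, L}{100} \right\rfloor + L + 10
$$

$$
\mathrm{Stat} = \left\lfloor \left( \left\lfloor \frac{(2B + I + \lfloor E/4 \rfloor)\, L}{100} \right\rfloor + 5 \right) N \right\rfloor
$$

At $L = 100$ these reduce to $2B + I + \lfloor E/4 \rfloor + 110$ for HP and $2B + I + \lfloor E/4 \rfloor + 5$ for the other stats with a neutral nature, which gives $308 + 31 + 63 + 110 = 512$ and $100 + 31 + 0 + 5 = 136$, matching the printed output.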


In [40]:
personality = [[0]*6 for _ in range(16)]  # independent rows; [[0]*6]*16 would repeat one list 16 times

In [47]:
# Nature table: a nature raises one stat by 10% and lowers another by 10%.
# Positions follow strength_name: hp, pa (attack), ma (sp_attack), pd (defense), md (sp_defense), sp (speed).
# Natures never affect HP in the games, so index 0 always stays 1.0.
stat_codes = ["pa", "ma", "pd", "md", "sp"]
mem = {}
for i, up in enumerate(stat_codes):
    for j, down in enumerate(stat_codes):
        if i != j:
            val = [1.0] * 6
            val[i + 1] = 1.1    # +1 skips the hp slot
            val[j + 1] = 0.9
            mem[f"+{up}-{down}"] = val

In [None]:
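def ab_cal(races, efforts, codes, iv=31):
    """Level-100 stats from base stats (races), EVs (efforts) and nature multipliers (codes).

    All three sequences follow the strength_name order; every IV is assumed to be `iv`.
    Returns a new list instead of overwriting `races`.
    """
    # HP: 2*base + IV + EV/4 + 100 + 10 (natures never change HP)
    result = [int(races[0]*2 + iv + efforts[0]//4 + 100 + 10)]
    # other stats: (2*base + IV + EV/4 + 5) * nature, floored after applying the nature
    for i in range(1, len(races)):
        result.append(int((races[i]*2 + iv + efforts[i]//4 + 5) * codes[i]))
    return result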
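`ab_cal` fixes the level at 100 and every IV at 31. A sketch of the same formula for an arbitrary level follows; the function name `stat_at_level` and the integer `nature_pct` parameter (110, 100 or 90) are illustrative choices, used so the nature multiplier is applied with exact integer arithmetic instead of floats:

```python
def stat_at_level(base, iv, ev, level, nature_pct=100, is_hp=False):
    """Actual stat from base stat, IV, EV and level (Generation III+ formula).

    nature_pct is 110 (boosting nature), 100 (neutral) or 90 (hindering);
    natures never affect HP, so nature_pct is ignored when is_hp is True.
    """
    core = (2 * base + iv + ev // 4) * level // 100
    if is_hp:
        return core + level + 10
    return (core + 5) * nature_pct // 100

print(stat_at_level(154, 31, 255, 100, is_hp=True))  # 512, same as the check above
print(stat_at_level(50, 31, 0, 100))                 # 136
```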


In [53]:
races = df.iloc[0][strength_name].values

In [None]:
efforts = [4, 0, 252, 0, 0, 252]  # hypothetical EV spread, in strength_name order
ab_cal(races, efforts, mem["+sp-pa"])