# Archetype Classifier for Pokemon TCG Decks

The primary goal of this notebook is to write and maintain a function that takes a decklist from RK9 as input and returns its archetype as output. This will be useful for displaying metashares and winrates.

Some other useful things to have would be:
- An equivalence relation between different versions of cards. This would make it possible to count interesting things like "What was the most-played card at this event?", and could make the classifier more robust.

- Some interesting meta-dependent analysis like "What is the most common ace spec in Gardevoir?" or "What is the winningest ace spec in Gardevoir?"

- Player-dependent questions like "What is the most popular deck among players with at least 1200 Elo?"
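
The first idea can be sketched with a normalization key. The snippet below assumes, purely for illustration, that two printings are "the same card" when their names match after trimming whitespace and case-folding. That is too coarse for Pokémon that share a name but have different attacks, so a real version would need a better key. `card_key` and `most_played` are hypothetical helper names, and the toy decks use the one-dictionary-per-copy shape that `get_list` produces later in this notebook.

```python
from collections import Counter

def card_key(card):
    """Hypothetical equivalence key: printings with the same name count as one card."""
    return card['Name'].strip().casefold()

def most_played(decks, top=3):
    """Count total copies of each card (by key) across a collection of decks."""
    counts = Counter()
    for deck in decks:
        for card in deck:  # one entry per physical copy
            counts[card_key(card)] += 1
    return counts.most_common(top)

# Two toy decks; the IDs are made up for illustration.
decks = [
    [{'Name': 'Iono', 'ID': 'A-1'}, {'Name': 'Iono', 'ID': 'B-2'}, {'Name': 'Arven', 'ID': 'C-3'}],
    [{'Name': 'Iono', 'ID': 'B-2'}, {'Name': 'Arven', 'ID': 'C-3'}],
]
print(most_played(decks))
```

Keeping the equivalence in a single function means the key can later be swapped for something stricter without touching the counting code.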

In [99]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
import copy
import numpy as np

From the webscraping project, I already have a function to grab the URLs from the tournament's homepage, and a function to get the roster for the tournament.

In [100]:
# Set the tournament name with the year and city of the regional

tournament = 'pokemon-bologna-2025'

homepage = 'https://rk9.gg/event/' + tournament

In [101]:
def get_rk9_urls(homepage):
    soup = BeautifulSoup(requests.get(homepage).text, 'html.parser')  # name the parser explicitly to avoid bs4's warning

    tcg = soup.find("div", class_ = 'card h-100 mt-3 p-2 shadow bg-blue-050') # Locate tcg box (sometimes indigo)

    # Find url extensions for roster and pairings
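    # re.compile('/roster*') would make the final 'r' optional; a plain substring pattern is what is meant
    roster_code = tcg.find('a', {'href': re.compile('/roster')})['href']
    pairings_code = tcg.find('a', {'href': re.compile('/pairings')})['href']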

    roster_url = 'https://rk9.gg' + roster_code
    pairings_url = 'https://rk9.gg' + pairings_code
    
    return [roster_url,pairings_url]

In [102]:
def get_roster(homepage, csv = False, filename = tournament + 'roster.csv'):
    
    url = get_rk9_urls(homepage)[0] #set player roster url
    
    soup = BeautifulSoup(requests.get(url).text, 'html.parser') #load in the soup

    table = soup.find('table') #roster page only has one table
    
    headers = table.find_all('th') #Find the column headers
    headers = [heading.string for heading in headers]
    
    body = table.find('tbody') # isolate the body of the table
    
    rows = body.find_all('tr') # get rows
    
    all_roster_data = []
    for row in rows:  # This loop isolates the text in each cell

        row_data = row.find_all('td')
        individual_row_data = [data.text.strip() for data in row_data]

        dlist = row_data[-2]
        dlist_url = dlist.find('a')['href']  # This is grabbing the decklist url, otherwise you just get "view"

        individual_row_data[-2] = dlist_url

        all_roster_data.append(individual_row_data)
        
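    # Build the DataFrame once, after every row has been collected
    df = pd.DataFrame(all_roster_data, columns = headers)

    if csv:
        return df.to_csv(filename) # writes the file; to_csv returns None when given a path
    return df #returns the desired table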

## Scraping decklists

Before getting into classifying the decks, I'll write a function to scrape the decklists from their page on RK9. A decklist will be stored as a list of dictionaries, where each dictionary represents one copy of a card in the deck. This way it is easy to access a property of a card via the key for that property.

In [103]:
def get_list(dlist_url):
    
    # dlist_url already starts with '/', so the base URL must not end with one
    dlist_soup = BeautifulSoup(requests.get("https://rk9.gg" + dlist_url).text, 'html.parser')
    
    dlist_table = dlist_soup.find('table', class_ = 'decklist')
    
    card_dict = dict.fromkeys(('Name', 'Type', 'Language', 'ID', 'Count'))
    deck = []

    for card_soup in dlist_table.find_all('li'):
        card_dict['Name'] = card_soup['data-cardname']
        card_dict['Type'] = card_soup['data-cardtype']
        card_dict['Language'] = card_soup['data-language']
        card_dict['ID'] = card_soup['data-setnum']
        card_dict['Count'] = int(card_soup['data-quantity'])

        final_dict = copy.copy(card_dict)

        for i in range(int(card_soup['data-quantity'])):
            deck.append(final_dict)
            
    return deck
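
Because `get_list` appends one dictionary per copy, a full deck comes back as a flat list with repeated entries. A minimal sketch of collapsing that list back into per-card counts is shown below. `summarize` is a hypothetical helper, and the `Type`, `Language`, and `ID` values are placeholders rather than real RK9 data.

```python
from collections import Counter

def summarize(deck):
    """Collapse a one-entry-per-copy deck list into {card name: copies}."""
    return Counter(card['Name'] for card in deck)

# Hand-written stand-in for the output of get_list (field values are placeholders).
gardevoir = {'Name': 'Gardevoir ex', 'Type': 'pokemon', 'Language': 'EN', 'ID': 'XXX-000', 'Count': 2}
iono = {'Name': 'Iono', 'Type': 'trainer', 'Language': 'EN', 'ID': 'YYY-000', 'Count': 1}
deck = [gardevoir, gardevoir, iono]

summary = summarize(deck)
print(summary['Gardevoir ex'])  # copies of Gardevoir ex in the toy deck
print(len(deck))                # total cards in the toy deck
```

A `Counter` returns 0 for names that are absent, which is convenient when checking for cards a deck may not play.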

## The Classifier

This function will need lots of maintenance. Every time the meta changes, it needs to be updated. This will take a lot of time and energy, so it might be worth reaching out to Robin to see if I can use the Limitless API. They must have a function like this that they maintain.

It could also be interesting to train a neural network to perform this classification, but I think we would have the same issue of needing to retrain the model every time the meta changes. Not to mention the issue of acquiring enough **uniformly** labelled decklists to make the model reliable...

In [104]:
def archetype(deck_list):  #Classify decks
    
    oger = 0
    pult = 0
    zard = 0
    pidg = 0
    gardy = 0
    pagos = 0
    noir = 0
    bolt = 0
    thorns = 0
    hammer = 0
    miraidon = 0
    arch = 0
    ceruledge = 0
    klawf = 0
    dengo = 0
    gouging = 0
    moon = 0
    bouf = 0
    noctowl = 0
    zoro = 0
    reshiram = 0
    eevee = 0
    eevee_ex = 0
    joltik = 0
    
    for card in deck_list:
        if card['Name'] == 'Teal Mask Ogerpon ex':
            oger += 1
            
        if card['Name'] == "Dragapult ex":
            pult += 1
        
        if card['Name'] == "Charizard ex":
            zard += 1
            
        if card['Name'] == "Pidgeot ex":
            pidg += 1
        
        if card['Name'] == "Gardevoir ex":
            gardy += 1
        
        if card['Name'] == "Dusknoir":
            noir += 1
            
        if card['Name'] == "Raging Bolt ex":
            bolt += 1
            
        if card['Name'] == 'Miraidon ex':
            miraidon += 1
            
        if card['Name'] == 'Archaludon ex':
            arch += 1
        
        if card['Name'] == 'Ceruledge ex':
            ceruledge += 1
        
        if card['Name'] == 'Gholdengo ex':
            dengo += 1
        
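        if card['Name'] == 'Bouffalant':  # card name is spelled with a double 'f'
            bouf += 1

        if card['Name'] == 'Terapagos ex':  # needed by the Tanky Terapagos rule below
            pagos += 1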
            
        if card['Name'] == 'Noctowl':
            noctowl += 1
            
        if card['Name'] == "N's Zoroark ex":
            zoro += 1
        
        if card['Name'] == "N's Reshiram":
            reshiram += 1
            
        if card['Name'] == 'Eevee':
            eevee += 1
            
        if card['Name'] == 'Eevee ex':
            eevee_ex += 1
        
        if card['Name'] == 'Joltik':
            joltik += 1
            
    # Eevees
    
    if eevee + eevee_ex >= 3:
        return 'Eevees'

    #Raging Bolt
    elif oger >=2 and bolt >= 2:
        return 'Raging Bolt ex'

    #Dragapult & pult zard
    elif pult >=2:
        if noir > 0:
            return 'Dragapult Dusknoir'
        elif zard == 0:
            return 'Straight Dragapult'
        else:
            return 'Pult Zard'
    
    #Tera box
    elif oger >= 2 and bolt == 0:
        return "Tera box"
    
    #Gardevoir
    elif gardy >= 2:
        return 'Gardevoir ex'

    #Terapagos
    elif pagos >=2 and bouf >= 2:
        return 'Tanky Terapagos ex'
    
    #Archaludon
    elif arch >=3:
        return 'Archaludon ex'

    #Gholdengo ex
    elif dengo >= 3:
        return 'Gholdengo ex'
    
    #zoroark
    elif zoro >=3 and reshiram >=1:
        return "N's Zoroark ex"
    
    #joltik box
    elif joltik > 0:
        return 'Joltik Box'
    
    else:
        return 'Other'
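
One way to reduce the maintenance burden is to keep the rules as data instead of a long if/elif chain. The sketch below is an assumption about how that could look, not a replacement for the function above: `RULES` and `classify` are hypothetical names, and only a few branches are carried over, with their thresholds and relative order unchanged.

```python
from collections import Counter

# Each rule is (label, predicate on the name counts); the first match wins.
# Thresholds are copied from a few branches of the if/elif chain above.
RULES = [
    ('Eevees', lambda c: c['Eevee'] + c['Eevee ex'] >= 3),
    ('Raging Bolt ex', lambda c: c['Teal Mask Ogerpon ex'] >= 2 and c['Raging Bolt ex'] >= 2),
    ('Gardevoir ex', lambda c: c['Gardevoir ex'] >= 2),
    ('Gholdengo ex', lambda c: c['Gholdengo ex'] >= 3),
]

def classify(deck_list, rules=RULES):
    """Return the label of the first rule that matches, or 'Other'."""
    counts = Counter(card['Name'] for card in deck_list)  # missing names count as 0
    for label, matches in rules:
        if matches(counts):
            return label
    return 'Other'

toy_deck = [{'Name': 'Gholdengo ex'}] * 3 + [{'Name': 'Iono'}] * 2
print(classify(toy_deck))
print(classify([{'Name': 'Iono'}]))
```

With this layout, a meta change means editing the `RULES` list rather than adding a counter, an `if` in the loop, and an `elif` in the chain.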
    

## One Eternity Later...

This function iterates through the roster and classifies each deck. It is extremely slow because it makes one HTTP request per decklist, one after another (surely there's a smarter way...). The output is a list of deck archetypes that can be added to the roster dataframe.

In [105]:
def get_archetypes(roster):
    archetypes = []
    total_players = len(roster)
    
    for player_index in range(total_players):

        if roster.at[player_index, 'Division'] == 'Masters':
            dlist_url = roster.at[player_index,"Deck List"]
            dlist = get_list(dlist_url)

            deck = archetype(dlist)
            
            print(str(player_index) + ' of ' + str(total_players))
            archetypes.append(deck)
    return archetypes
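
Since most of the time is spent waiting on the network, one possible speed-up is to fetch decklists concurrently with a thread pool. The sketch below is only an illustration: `classify_all` is a hypothetical helper, and the fake fetch and classify functions stand in for `get_list` and `archetype` so that it runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_all(urls, fetch, classify, max_workers=8):
    """Fetch and classify decklists concurrently; results keep the order of `urls`."""
    def work(url):
        return classify(fetch(url))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, urls))

# Offline stand-ins so the sketch runs without touching RK9.
fake_pages = {
    '/decklist/a': [{'Name': 'Gardevoir ex'}] * 2,
    '/decklist/b': [{'Name': 'Iono'}],
}

def fake_fetch(url):
    return fake_pages[url]

def fake_classify(deck):
    names = [card['Name'] for card in deck]
    return 'Gardevoir ex' if names.count('Gardevoir ex') >= 2 else 'Other'

labels = classify_all(['/decklist/a', '/decklist/b'], fake_fetch, fake_classify)
print(labels)
```

In the notebook this would be called as `classify_all(roster['Deck List'], get_list, archetype)`. Keeping `max_workers` modest seems wise so the site is not flooded with requests.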

Finally, generating the roster, calculating the archetypes, and adding the archetypes in their own column leaves me with the table I wanted.

In [106]:
roster = get_roster(homepage)
mask = (roster['Division'] == "Masters")
roster = roster[mask]
roster = roster.reset_index()
roster

Unnamed: 0,index,Player ID,First name,Last name,Country,Division,Deck List,Standing
0,0,3....4,Elias,Stratmann,DE,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/02LGNCJr...,45
1,2,4....2,niccolò,genna,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0Kueckon...,147
2,3,4....2,Daphne,Tonge,UK,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0LsTAh0j...,612
3,4,4....4,leonardo,tabani,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0QNPiCFS...,1144
4,5,5....1,Manuel Giuseppe,La Iacona,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0RSdXSY1...,1196
...,...,...,...,...,...,...,...,...
1232,1482,4....3,Lorenzo,Murzilli,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zojNRTEv...,1218
1233,1483,4....7,Cedric,Bürkel,DE,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zrOQZckq...,843
1234,1484,4....6,Daniele,Antonazzo,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zralIvJ3...,585
1235,1485,4....4,Morgan,Pattini,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zuB3Vxs7...,1062


In [107]:
archetypes = get_archetypes(roster)

roster['Archetype'] = archetypes

roster

0 of 1237
1 of 1237
2 of 1237
...

Unnamed: 0,index,Player ID,First name,Last name,Country,Division,Deck List,Standing,Archetype
0,0,3....4,Elias,Stratmann,DE,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/02LGNCJr...,45,Gholdengo ex
1,2,4....2,niccolò,genna,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0Kueckon...,147,Raging Bolt ex
2,3,4....2,Daphne,Tonge,UK,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0LsTAh0j...,612,Joltik Box
3,4,4....4,leonardo,tabani,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0QNPiCFS...,1144,Gholdengo ex
4,5,5....1,Manuel Giuseppe,La Iacona,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0RSdXSY1...,1196,Other
...,...,...,...,...,...,...,...,...,...
1232,1482,4....3,Lorenzo,Murzilli,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zojNRTEv...,1218,Straight Dragapult
1233,1483,4....7,Cedric,Bürkel,DE,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zrOQZckq...,843,Other
1234,1484,4....6,Daniele,Antonazzo,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zralIvJ3...,585,Raging Bolt ex
1235,1485,4....4,Morgan,Pattini,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zuB3Vxs7...,1062,Gardevoir ex


In [108]:
roster["Name plus country"] = roster['First name'] + ' ' + roster['Last name'] + ' [' + roster['Country'] + ']'

In [109]:
roster

Unnamed: 0,index,Player ID,First name,Last name,Country,Division,Deck List,Standing,Archetype,Name plus country
0,0,3....4,Elias,Stratmann,DE,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/02LGNCJr...,45,Gholdengo ex,Elias Stratmann [DE]
1,2,4....2,niccolò,genna,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0Kueckon...,147,Raging Bolt ex,niccolò genna [IT]
2,3,4....2,Daphne,Tonge,UK,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0LsTAh0j...,612,Joltik Box,Daphne Tonge [UK]
3,4,4....4,leonardo,tabani,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0QNPiCFS...,1144,Gholdengo ex,leonardo tabani [IT]
4,5,5....1,Manuel Giuseppe,La Iacona,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/0RSdXSY1...,1196,Other,Manuel Giuseppe La Iacona [IT]
...,...,...,...,...,...,...,...,...,...,...
1232,1482,4....3,Lorenzo,Murzilli,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zojNRTEv...,1218,Straight Dragapult,Lorenzo Murzilli [IT]
1233,1483,4....7,Cedric,Bürkel,DE,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zrOQZckq...,843,Other,Cedric Bürkel [DE]
1234,1484,4....6,Daniele,Antonazzo,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zralIvJ3...,585,Raging Bolt ex,Daniele Antonazzo [IT]
1235,1485,4....4,Morgan,Pattini,IT,Masters,/decklist/public/BO01w6JENCbLrU9Em1ia/zuB3Vxs7...,1062,Gardevoir ex,Morgan Pattini [IT]


In [110]:
roster.to_csv('Roster_with_Archetypes_from_'+tournament+'.csv')

In [111]:
tournament

'pokemon-bologna-2025'
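
With an archetype label for every player, the metashare mentioned at the top of the notebook is a matter of counting labels. A minimal sketch, using a hypothetical `metashare` helper and toy labels rather than the real tournament data:

```python
from collections import Counter

def metashare(archetypes):
    """Return {archetype: share of the field in percent}, largest share first."""
    counts = Counter(archetypes)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.most_common()}

# Toy labels, not real tournament data.
labels = ['Gholdengo ex', 'Raging Bolt ex', 'Gholdengo ex', 'Other']
shares = metashare(labels)
print(shares)
```

On the real dataframe, `roster['Archetype'].value_counts(normalize=True)` gives the same proportions directly as fractions.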