# Archetype Classifier for Pokemon TCG Decks

The primary goal of this notebook is to write and maintain a function that takes a decklist from RK9 as input and returns its archetype as output. This will be useful for displaying meta shares and win rates.

Some other useful things to have would be:
- An equivalence relation between different versions of cards. This would make it possible to count interesting things like "What was the most-played card at this event?"

- Some interesting meta-dependent analysis, like "What is the most common ACE SPEC in Gardevoir?" or "What is the winningest ACE SPEC in Gardevoir?"
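
One way the card-equivalence idea could look is sketched below. It assumes each card is a dictionary with `Name`, `Type`, and `ID` keys, and that two printings count as the same card when their name and type match. The helper names `card_key`, `equivalence_classes`, and `most_played` are hypothetical, and the set IDs in the example are made up. The name-based key is a simplification: Pokemon that share a name can have different attacks, so a stricter key would be needed to tell those apart.

```python
from collections import Counter, defaultdict


def card_key(card):
    """Hypothetical equivalence key: same name and type means 'same card'."""
    return (card["Name"].strip().lower(), card["Type"].strip().lower())


def equivalence_classes(cards):
    """Group the set IDs of all printings that share an equivalence key."""
    classes = defaultdict(set)
    for card in cards:
        classes[card_key(card)].add(card["ID"])
    return dict(classes)


def most_played(decks, top=3):
    """Count copies across all decks, merging equivalent printings."""
    counts = Counter(card_key(card) for deck in decks for card in deck)
    return counts.most_common(top)


# Made-up example data: one dictionary per physical copy of a card.
example_decks = [
    [
        {"Name": "Comfey", "Type": "pokemon", "ID": "AAA-001"},
        {"Name": "Comfey", "Type": "pokemon", "ID": "BBB-002"},
    ],
    [
        {"Name": "Comfey", "Type": "pokemon", "ID": "AAA-001"},
        {"Name": "Switch", "Type": "trainer", "ID": "CCC-003"},
    ],
]

print(most_played(example_decks, top=1))
```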

In [None]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
import copy
import numpy as np

From the webscraping project, I already have a function to grab the URLs from the tournament's homepage, and a function to get the roster for the tournament.

In [None]:
# Set the tournament name with the year and city of the regional

tournament = 'pokemon-merida-2025'

homepage = 'https://rk9.gg/event/' + tournament

In [None]:
def get_rk9_urls(homepage):
    # Name the parser explicitly so results don't depend on which parsers are installed
    soup = BeautifulSoup(requests.get(homepage).text, 'html.parser')

    tcg = soup.find("div", class_ = 'card h-100 mt-3 p-2 shadow bg-blue-050') # Locate tcg box (sometimes indigo)

    # Find url extensions for roster and pairings
    roster_code = tcg.find('a', {'href': re.compile(r'/roster')})['href']
    pairings_code = tcg.find('a', {'href': re.compile(r'/pairings')})['href']

    roster_url = 'https://rk9.gg' + roster_code
    pairings_url = 'https://rk9.gg' + pairings_code
    
    return [roster_url,pairings_url]

In [None]:
def get_roster(homepage, csv = False, filename = tournament + 'roster.csv'):
    
    url = get_rk9_urls(homepage)[0] #set player roster url
    
    soup = BeautifulSoup(requests.get(url).text, 'html.parser') #load in the soup

    table = soup.find('table') #roster page only has one table
    
    headers = table.find_all('th') #Find the column headers
    headers = [heading.string for heading in headers]
    
    body = table.find('tbody') # isolate the body of the table
    
    rows = body.find_all('tr') # get rows
    
    all_roster_data = []
    for row in rows:  # This loop isolates the text in each cell

        row_data = row.find_all('td')
        individual_row_data = [data.text.strip() for data in row_data]

        dlist = row_data[-2]
        dlist_url = dlist.find('a')['href']  # This is grabbing the decklist url, otherwise you just get "view"

        individual_row_data[-2] = dlist_url

        all_roster_data.append(individual_row_data)
        
    # Build the DataFrame once, after every row has been collected
    df = pd.DataFrame(all_roster_data, columns = headers)

    if csv:
        return df.to_csv(filename)
    return df #returns the desired table

## Scraping decklists

Before classifying decks, I'll write a function to scrape a decklist from its page on RK9. A decklist is stored as a list of dictionaries, one per physical copy of a card, so a card played three times appears three times. This makes it easy to access any property of a card through the key for that property.

In [None]:
def get_list(dlist_url):
    
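    # Strip any leading slash from the href so the joined URL doesn't contain '//'
    dlist_soup = BeautifulSoup(requests.get("https://rk9.gg/" + dlist_url.lstrip('/')).text, 'html.parser')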
    
    dlist_table = dlist_soup.find('table', class_ = 'decklist')
    
    card_dict = dict.fromkeys(('Name', 'Type', 'Language', 'ID', 'Count'))
    deck = []

    for card_soup in dlist_table.find_all('li'):
        card_dict['Name'] = card_soup['data-cardname']
        card_dict['Type'] = card_soup['data-cardtype']
        card_dict['Language'] = card_soup['data-language']
        card_dict['ID'] = card_soup['data-setnum']
        card_dict['Count'] = int(card_soup['data-quantity'])

        final_dict = copy.copy(card_dict)

        for i in range(int(card_soup['data-quantity'])):
            deck.append(final_dict)
            
    return deck
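
As a small illustration of why this structure is convenient, the sketch below counts copies per card name. It assumes a deck shaped like the output of `get_list` (one dictionary per physical copy). The helper `count_by_name` is hypothetical, and the sample card's `Type`, `Language`, and `ID` values are placeholders rather than real RK9 data.

```python
from collections import Counter


def count_by_name(deck):
    """Return a Counter mapping each card name to its number of copies."""
    # The deck holds one entry per physical copy, so counting entries counts copies.
    return Counter(card["Name"] for card in deck)


# Placeholder card: the field values other than the name are made up.
card = {"Name": "Dragapult ex", "Type": "pokemon", "Language": "EN",
        "ID": "XXX-000", "Count": 3}
sample_deck = [card] * card["Count"]

counts = count_by_name(sample_deck)
print(counts["Dragapult ex"])   # 3
print(sample_deck[0]["ID"])     # any property is one key lookup away
```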

## The Classifier

This function will need lots of maintenance: every time the meta changes, it has to be updated. That will take a lot of time and energy, so it might be worth reaching out to Robin to see if I can use the Limitless API. They must maintain a function like this.

It could also be interesting to train a neural network to perform this classification, but I think we would have the same issue of needing to retrain the model every time the meta changes. Not to mention the issue of acquiring enough **uniformly** labelled decklists to make the model reliable...

In [None]:
def archetype(deck_list):  #Classify decks
    
    oger = 0
    drago = 0
    pult = 0
    lugia = 0
    zard = 0
    pidg = 0
    gardy = 0
    pagos = 0
    noir = 0
    palk = 0
    bolt = 0
    comfey = 0
    tina = 0
    colress = 0
    thorns = 0
    hammer = 0
    lax = 0
    dte = 0
    miraidon = 0
    
    for card in deck_list:
        if card['Name'] == 'Teal Mask Ogerpon ex':
            oger += 1
            
        if card['Name'] == "Regidrago VSTAR":
            drago += 1
            
        if card['Name'] == "Dragapult ex":
            pult += 1
        
        if card['Name'] == "Lugia VSTAR":
            lugia += 1
        
        if card['Name'] == "Charizard ex":
            zard += 1
            
        if card['Name'] == "Pidgeot ex":
            pidg += 1
        
        if card['Name'] == "Gardevoir ex":
            gardy += 1
        
        if card['Name'] == "Dusknoir":
            noir += 1
            
        if card['Name'] == "Origin Forme Palkia VSTAR":
            palk += 1
            
        if card['Name'] == "Raging Bolt ex":
            bolt += 1
        
        if card['Name'] == "Comfey":
            comfey += 1
            
        if card['Name'] == "Giratina VSTAR":
            tina += 1
        
        if card['Name'] == "Colress's Experiment":
            colress += 1
            
        if card['Name'] == "Iron Thorns ex":
            thorns += 1
        
        if card['Name'] == "Crushing Hammer":
            hammer += 1
            
        if card['Name'] == "Snorlax":
            lax += 1
        
        if card['Name'] == 'Double Turbo Energy':
            dte += 1
            
        if card['Name'] == 'Miraidon ex':
            miraidon += 1

        # Needed by the Terapagos and Palkia rules below
        if card['Name'] == 'Terapagos ex':
            pagos += 1
            
    #Regidrago
    if oger >= 3 and drago >= 3:
        return 'Regidrago VSTAR'

    #Raging Bolt
    elif oger >=3 and bolt >= 3:
        return 'Raging Bolt ex'

    #Dragapult
    elif pult >=3:
        return 'Dragapult ex'

    #Charizard
    elif zard >=2 and pidg >=2:
        return 'Charizard ex'

    #Lugia
    elif lugia >= 2:
        return 'Lugia VSTAR'

    #Gardevoir
    elif gardy >= 2:
        return 'Gardevoir ex'

    #Terapagos
    elif pagos >=2 and dte >= 3:
        return 'Terapagos ex'

    #Palkia
    elif palk >=2 and pagos >= 2:
        return 'Palkia VSTAR'

    #Lost box
    elif comfey>=3 and colress==4 and tina == 0:
        return 'Lost Zone Toolbox'

    #Thorns
    elif thorns == 4 and hammer > 0:
        return 'Iron Thorns ex'

    #Snorlax
    elif lax == 4:
        return 'Snorlax Stall'
    
    #Miraidon
    elif miraidon >= 1:
        return 'Miraidon ex'

    else:
        return 'Other'
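
Since the rules have to change with every meta, one possible way to ease the maintenance is to keep them as data rather than as a chain of `if` statements. The sketch below is an alternative formulation, not a drop-in replacement: `RULES` and `classify` are hypothetical names, only a few of the rules are transcribed, and it handles only "at least N copies" conditions. Rules such as "exactly 4 Colress's Experiment" or "no Giratina VSTAR" would need a richer rule format.

```python
from collections import Counter

# Each rule is (archetype label, {card name: minimum copies}).
# Order matters: the first rule whose minimums are all met wins.
RULES = [
    ("Regidrago VSTAR", {"Teal Mask Ogerpon ex": 3, "Regidrago VSTAR": 3}),
    ("Raging Bolt ex", {"Teal Mask Ogerpon ex": 3, "Raging Bolt ex": 3}),
    ("Dragapult ex", {"Dragapult ex": 3}),
    ("Charizard ex", {"Charizard ex": 2, "Pidgeot ex": 2}),
    ("Lugia VSTAR", {"Lugia VSTAR": 2}),
]


def classify(deck_list, rules=RULES, default="Other"):
    """Return the label of the first rule satisfied by the deck."""
    # One entry per physical copy, so this counts copies per card name.
    counts = Counter(card["Name"] for card in deck_list)
    for label, minimums in rules:
        if all(counts[name] >= needed for name, needed in minimums.items()):
            return label
    return default
```

With this layout, updating for a new meta means editing the `RULES` list instead of touching the control flow.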
    

## One Eternity Later...

This function iterates through the roster (Extremely slow! Surely there's a smarter way...), classifies each Masters deck, and returns a list of archetype labels in roster order.

In [None]:
def get_archetypes(roster):
    archetypes = []
    total_players = len(roster)
    
    for player_index in range(total_players):

        if roster.at[player_index, 'Division'] == 'Masters':
            dlist_url = roster.at[player_index,"Deck List"]
            dlist = get_list(dlist_url)

            deck = archetype(dlist)
        else:
            # Keep one entry per roster row so the result can be assigned back as a column
            deck = None

        print(str(player_index + 1) + ' of ' + str(total_players))
        archetypes.append(deck)
    return archetypes
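
Most of the time here is probably spent waiting on one HTTP request per decklist, so one possible speed-up is to fetch several decklists at once with a thread pool. The sketch below is an assumption-laden outline, not a measured improvement: `classify_many` is a hypothetical helper that takes the fetching and classifying functions as parameters, so `get_list` and `archetype` could be passed in unchanged.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_many(dlist_urls, fetch, classify, max_workers=8):
    """Fetch and classify decklists concurrently, preserving input order.

    fetch:    function mapping a decklist URL to a deck list (e.g. get_list)
    classify: function mapping a deck list to a label (e.g. archetype)
    """
    def work(url):
        return classify(fetch(url))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs
        return list(pool.map(work, dlist_urls))
```

A call might look like `classify_many(roster.loc[roster['Division'] == 'Masters', 'Deck List'], get_list, archetype)`. Keeping `max_workers` small avoids sending too many simultaneous requests to the site.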

Finally, generating the roster, calculating the archetypes, and replacing the decklist urls with the archetypes leaves me with the table I wanted.

In [None]:
roster = get_roster(homepage)

roster['Deck List'] = get_archetypes(roster)

roster

In [None]:
roster.to_csv('Roster_with_Archetypes_from_'+tournament+'.csv')