# ALGS Data Scraping

This notebook scrapes match data for the ALGS Year 2 season from https://liquipedia.net/apexlegends/Apex_Legends_Global_Series/2021-22.

We want to explore this data for the IronViz Tableau Competition.

The data we want to collect is the following: 
- Region
- Round Number (set of games)
- Game Number (referred to as round in table)

For preseason:
- Qualifier Round (for preseason qualifiers)
- Lobby number 

For splits:
- Circuit Round (for challenger circuit)
- Rounds (set of games) for pro league
- Game number (referred to as round in table)
- Playoffs games

For championship:
- LCQs
- Winners Bracket, Losers Bracket, Finals
- Round number

For the games with group stages:
- Groups A, B, C, D

In [46]:
from collections import defaultdict
import re
import requests
from bs4 import BeautifulSoup

To make the process more straightforward, we will create lists to iterate over. We might not use all of these lists, but they will at least help keep things organized.

The lists will include for all data:
- Region
- Round Number (set of games)

For preseason:
- Qualifier Round (for preseason qualifiers)

For splits:
- Circuit Round (for challenger circuit)
- Rounds (set of games) for pro league
- Playoffs games

For championship:
- Winners Bracket, Losers Bracket, Finals
- Round number

For the games with group stages:
- Groups A, B, C, D
<hr>

We will go in the following order for our data scraping:

- Preseason
- Split 1
- Split 2
- Championships

In [3]:
# Let's create our lists, starting with the generic ones shared across most of the Liquipedia links

# These are the main regions for ALGS
regions = ["North_America", "South_America", "EMEA", "APAC_North", "APAC_South"]

# There are usually 6 rounds (set of games) starting with round 1
rounds = [1, 2, 3, 4, 5, 6]

In [4]:
# First attempt at scraping data before we create our loops

URL = "https://liquipedia.net/apexlegends/Apex_Legends_Global_Series/2021/Preseason_Qualifier_1/North_America/Round_1"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

In [5]:
# This is a whole lot of info, so we want to focus on two parts of the page: the lobby tabs and the results table
# I inspected the website using dev tools and found that there is a class for the different tabs.

results = soup.find_all(class_="tabs-content")
print(results)

[<div class="tabs-content wiki-bordercolor-light">
<div class="content1">
<div class="table-responsive" style="padding-right:1px;white-space:nowrap"><table class="wikitable wikitable-striped wikitable-bordered table-battleroyale-results" style="line-height:25px"><tbody><tr><th colspan="4"><b style="vertical-align:middle">Standings</b></th></tr><tr><th colspan="2" style="width:150px">Team</th><th style="border-right-width:1px !important">Total</th><th class="table-battleroyale-results-round">Round 1</th></tr><tr><td class="bg-up bg- pbg-" style="text-align:center"><b>1.</b></td><td class="bg-up"><span class="team-template-team-short" data-highlightingclass="CLX"><span class="team-template-image-icon team-template-lightmode"><a href="/apexlegends/CLX" title="CLX"><img alt="CLX" decoding="async" height="50" loading="lazy" src="/commons/images/thumb/a/ab/Apex_Legends_lightmode.png/50px-Apex_Legends_lightmode.png" srcset="/commons/images/thumb/a/ab/Apex_Legends_lightmode.png/74px-Apex_Legen

In [11]:
# Looking at the results above, we see that the lobby info is actually contained inside the class "table-battleroyale-results"
# This gives us all the lobbies

results = soup.find_all(class_="table-battleroyale-results")
print(results)

[<table class="wikitable wikitable-striped wikitable-bordered table-battleroyale-results" style="line-height:25px"><tbody><tr><th colspan="4"><b style="vertical-align:middle">Standings</b></th></tr><tr><th colspan="2" style="width:150px">Team</th><th style="border-right-width:1px !important">Total</th><th class="table-battleroyale-results-round">Round 1</th></tr><tr><td class="bg-up bg- pbg-" style="text-align:center"><b>1.</b></td><td class="bg-up"><span class="team-template-team-short" data-highlightingclass="CLX"><span class="team-template-image-icon team-template-lightmode"><a href="/apexlegends/CLX" title="CLX"><img alt="CLX" decoding="async" height="50" loading="lazy" src="/commons/images/thumb/a/ab/Apex_Legends_lightmode.png/50px-Apex_Legends_lightmode.png" srcset="/commons/images/thumb/a/ab/Apex_Legends_lightmode.png/74px-Apex_Legends_lightmode.png 1.5x, /commons/images/thumb/a/ab/Apex_Legends_lightmode.png/99px-Apex_Legends_lightmode.png 2x" style="vertical-align: middle" widt

In [40]:
data = []
#table = soup.find('table', attrs={'class':['table-battleroyale-results', ]})

for table in results:
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        # Header rows contain only <th> cells, so they produce empty lists here
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])

In [41]:
data

[[],
 [],
 ['1.', 'CLX', '76', '76'],
 ['2.', 'OA', '55', '55'],
 ['3.', 'RCO', '44', '44'],
 ['4.', 'PWM', '39', '39'],
 ['5.', 'TEAM Workaholics', '37', '37'],
 ['6.', 'Insane Xboxsers', '33', '33'],
 ['7.', 'Chud Bungus', '15', '15'],
 ['8.', 'Ampedant', '15', '15'],
 ['9.', 'Team Shadow Wolf', '14', '14'],
 ['10.', 'JustSomeNewGuys', '12', '12'],
 ['11.', '2B1R', '12', '12'],
 ['12.', 'ControllerLegend', '12', '12'],
 ['13.', 'Carbon Esports', '7', '7'],
 ['14.', 'Tooshbags', '6', '6'],
 ['15.', 'Aces Team', '3', '3'],
 [],
 [],
 ['1.', 'BW', '102', '102'],
 ['2.', 'Spooky Scary', '66', '66'],
 ['3.', 'HololiveDXD', '26', '26'],
 ['4.', '6ide', '23', '23'],
 ['5.', 'ParkingLotBirds', '21', '21'],
 ['6.', 'Azakana', '20', '20'],
 ['7.', 'Bot Squad', '17', '17'],
 ['8.', 'Team Animo', '17', '17'],
 ['9.', 'The Not Squad', '16', '16'],
 ['10.', 'KingsVictoryClub', '15', '15'],
 ['11.', 'AVS', '12', '12'],
 ['12.', 'Remember Reach', '11', '11'],
 ['13.', 'Themeathouse', '10', '10'],
 [

This lets us get data for the lobby, but it's kind of messy. Let's try to make it into a dict of lists.
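
One way to get from the flat `data` list to a dict of lists is to treat each run of empty rows as the start of a new lobby. The sketch below assumes that pattern holds for every page, which matches the output above but is not guaranteed. `split_into_lobbies` is a hypothetical helper name, and `sample` is a small excerpt of the rows printed above.

```python
from collections import defaultdict


def split_into_lobbies(rows):
    """Group a flat list of rows into lobbies, using empty rows as separators."""
    lobbies = defaultdict(list)
    lobby_index = -1
    previous_was_empty = True  # so the first data row opens lobby 0
    for row in rows:
        if not row:
            previous_was_empty = True
            continue
        if previous_was_empty:
            # First data row after a run of empty rows: start a new lobby
            lobby_index += 1
            previous_was_empty = False
        lobbies[lobby_index].append(row)
    return lobbies


# Excerpt of the scraped rows, shortened for illustration
sample = [
    [], [],
    ['1.', 'CLX', '76', '76'],
    ['2.', 'OA', '55', '55'],
    [], [],
    ['1.', 'BW', '102', '102'],
    ['2.', 'Spooky Scary', '66', '66'],
]
lobbies = split_into_lobbies(sample)
```

With this grouping, `lobbies[0]` holds the rows of the first lobby and `lobbies[1]` the rows of the second.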

In [39]:
results_body_all = soup.find_all('tbody')
results_body_all

[<tbody><tr><th class="navbox-title wiki-backgroundcolor-light" colspan="2" scope="col"><div class="noprint plainlinks navbox-navbar mini" style="padding:0;font-size:xx-small;float:left; text-align:left"><ul class="hlist"><li class="nv-view"><a href="/apexlegends/Template:ALGSY2_Navbox" title="Template:ALGSY2 Navbox"><abbr class="" style=";;border:none;-moz-box-shadow:none;-webkit-box-shadow:none;box-shadow:none;" title="View this template">v</abbr></a></li><li class="nv-talk"><a class="new" href="/apexlegends/index.php?title=Template_talk:ALGSY2_Navbox&amp;action=edit&amp;redlink=1" title="Template talk:ALGSY2 Navbox (page does not exist)"><abbr class="" style=";;border:none;-moz-box-shadow:none;-webkit-box-shadow:none;box-shadow:none;" title="Discuss this template">d</abbr></a></li><li class="nv-edit"><a class="internal text" href="https://liquipedia.net/apexlegends/index.php?title=Template:ALGSY2_Navbox&amp;action=edit"><abbr class="" style=";;border:none;-moz-box-shadow:none;-webki

In [34]:
data_all = defaultdict(list)

# One entry per <tbody> on the page, keyed by its position
for i, table_body in enumerate(results_body_all):
    rows = table_body.find_all('tr')
    row_texts = [row.text.strip() for row in rows]
    data_all[i] = [text for text in row_texts if text]

In [35]:
data_all

defaultdict(list,
            {0: ['vdeApex Legends Global Series 21-22',
              'Preseason\n20-21\n21-22\n22-23',
              'Preseason QualifiersNorth America\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4\nSouth America\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4\nEMEA\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4\nAPAC North\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4\nAPAC South\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4',
              'Preseason Qualifiers',
              'North America\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4',
              'South America\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4',
              'EMEA\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4',
              'APAC North\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4',
              'APAC South\nQualifier 1\nQualifier 2\nQualifier 3\nQualifier 4',
              'Split 1North America\nPlayoffs\nPro League (Matches)\nChallenger Circ

This is closer, but maybe we can avoid having to clean up the first few unnecessary tables.

In [44]:
def table_and_results(class_):
    return class_ and re.compile("table-battleroyale-results").search(class_)
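
A note on this filter: `re.search` matches substrings, so any class value that merely contains `table-battleroyale-results` passes too. The scraped HTML has a header cell with the class `table-battleroyale-results-round`, so that cell would likely be returned alongside the tables, which may be why an empty entry shows up among the parsed lobbies later in the notebook. The sketch below applies the filter to plain class strings, on the understanding that BeautifulSoup calls a `class_` function with individual class values. `exact_results_class` is a hypothetical stricter alternative, not part of the original notebook.

```python
import re


def table_and_results(class_):
    # Same filter as in the notebook: substring match on the class value
    return class_ and re.compile("table-battleroyale-results").search(class_)


def exact_results_class(class_):
    # Hypothetical stricter filter: only the exact class name passes
    return class_ == "table-battleroyale-results"


# Class values seen in the scraped HTML, plus None for tags without a class
classes = [
    "wikitable",
    "table-battleroyale-results",
    "table-battleroyale-results-round",
    None,
]

substring_hits = [c for c in classes if table_and_results(c)]
exact_hits = [c for c in classes if exact_results_class(c)]
```

Here `substring_hits` contains both the table class and the header-cell class, while `exact_hits` contains only the table class.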

In [65]:
test_results = soup.find_all(class_=table_and_results)
test_results[0]

<table class="wikitable wikitable-striped wikitable-bordered table-battleroyale-results" style="line-height:25px"><tbody><tr><th colspan="4"><b style="vertical-align:middle">Standings</b></th></tr><tr><th colspan="2" style="width:150px">Team</th><th style="border-right-width:1px !important">Total</th><th class="table-battleroyale-results-round">Round 1</th></tr><tr><td class="bg-up bg- pbg-" style="text-align:center"><b>1.</b></td><td class="bg-up"><span class="team-template-team-short" data-highlightingclass="CLX"><span class="team-template-image-icon team-template-lightmode"><a href="/apexlegends/CLX" title="CLX"><img alt="CLX" decoding="async" height="50" loading="lazy" src="/commons/images/thumb/a/ab/Apex_Legends_lightmode.png/50px-Apex_Legends_lightmode.png" srcset="/commons/images/thumb/a/ab/Apex_Legends_lightmode.png/74px-Apex_Legends_lightmode.png 1.5x, /commons/images/thumb/a/ab/Apex_Legends_lightmode.png/99px-Apex_Legends_lightmode.png 2x" style="vertical-align: middle" width

In [74]:
test_data_all = defaultdict(list)

for i, lobby in enumerate(test_results):
    temp_table = []
    rows = lobby.find_all('tr')
    for row in rows:
        # Iterating over a row Tag yields its direct children, here the th/td cells
        cols = [ele.text.strip() for ele in row]
        temp_table.append([ele for ele in cols if ele])
    test_data_all[i] = temp_table
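
Each lobby stored by this loop is a list of rows of strings, starting with a title row and a header row. For use in Tableau it may help to flatten these into typed records. The sketch below assumes a single-round page where every data row has exactly four cells (rank, team, total, round score). `rows_to_records` and the record keys are hypothetical names chosen for illustration.

```python
def rows_to_records(table_rows, lobby):
    """Turn the string rows of one lobby table into a list of dicts."""
    records = []
    for row in table_rows:
        # Skip title/header rows: data rows have 4 cells and a rank like '1.'
        if len(row) != 4 or not row[0].rstrip('.').isdigit():
            continue
        rank, team, total, round_score = row
        records.append({
            "lobby": lobby,
            "rank": int(rank.rstrip('.')),
            "team": team,
            "total": int(total),
            "round_1": int(round_score),
        })
    return records


# Rows in the shape produced by the loop above, shortened for illustration
sample_table = [
    ['Standings'],
    ['Team', 'Total', 'Round 1'],
    ['1.', 'CLX', '76', '76'],
    ['2.', 'OA', '55', '55'],
]
records = rows_to_records(sample_table, lobby=0)
```

Pages with more than one game per round would have extra score columns, so the four-cell check would need to be relaxed for those.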

In [76]:
test_data_all

defaultdict(list,
            {0: [['Standings'],
              ['Team', 'Total', 'Round 1'],
              ['1.', 'CLX', '76', '76'],
              ['2.', 'OA', '55', '55'],
              ['3.', 'RCO', '44', '44'],
              ['4.', 'PWM', '39', '39'],
              ['5.', 'TEAM Workaholics', '37', '37'],
              ['6.', 'Insane Xboxsers', '33', '33'],
              ['7.', 'Chud Bungus', '15', '15'],
              ['8.', 'Ampedant', '15', '15'],
              ['9.', 'Team Shadow Wolf', '14', '14'],
              ['10.', 'JustSomeNewGuys', '12', '12'],
              ['11.', '2B1R', '12', '12'],
              ['12.', 'ControllerLegend', '12', '12'],
              ['13.', 'Carbon Esports', '7', '7'],
              ['14.', 'Tooshbags', '6', '6'],
              ['15.', 'Aces Team', '3', '3']],
             1: [],
             2: [['Standings'],
              ['Team', 'Total', 'Round 1'],
              ['1.', 'BW', '102', '102'],
              ['2.', 'Spooky Scary', '66', '66'],
   

## Preseason Data

Let's start by trying to scrape our preseason data.
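
As a starting point, the candidate page URLs can be generated from the lists defined earlier. This is a minimal sketch that assumes every qualifier, region, and round follows the same URL pattern as the North America Round 1 page used above; some combinations may not exist, so each response's status code should be checked before parsing. The `qualifiers` list (1 to 4) is taken from the navbox text in the output above, and `preseason_url` is a hypothetical helper name.

```python
from itertools import product

# Base path taken from the North America Round 1 URL used earlier
BASE = "https://liquipedia.net/apexlegends/Apex_Legends_Global_Series/2021"

regions = ["North_America", "South_America", "EMEA", "APAC_North", "APAC_South"]
rounds = [1, 2, 3, 4, 5, 6]
qualifiers = [1, 2, 3, 4]  # assumption: four preseason qualifiers per region


def preseason_url(qualifier, region, round_number):
    """Build the URL for one preseason qualifier round (pattern is assumed)."""
    return f"{BASE}/Preseason_Qualifier_{qualifier}/{region}/Round_{round_number}"


# Every qualifier/region/round combination, in a stable order
preseason_urls = [
    preseason_url(q, region, n)
    for q, region, n in product(qualifiers, regions, rounds)
]
```

The first generated URL is the same page scraped earlier in the notebook, which gives a quick sanity check on the pattern.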