# ALGS Data Scraping

This notebook scrapes match data for the ALGS Year 2 season from https://liquipedia.net/apexlegends/Apex_Legends_Global_Series/2021-22.

We want to explore this data for the IronViz Tableau competition.

The data we want to collect is the following: 
- Region
- Round Number (set of games)
- Game Number (referred to as round in table)

For preseason:
- Qualifier Round (for preseason qualifiers)
- Lobby number 

For splits:
- Circuit Round (for challenger circuit)
- Rounds (set of games) for pro league
- Game number (referred to as round in table)
- Playoffs games

For championship:
- LCQs
- Winners Bracket, Losers Bracket, Finals
- Round number

For the games with group stages:
- Groups A, B, C, D

In [3]:
import requests
from bs4 import BeautifulSoup

To make the process more straightforward, we will create lists to iterate over. We might not use all of them, but they will help keep the scraping organized.

The lists will include for all data:
- Region
- Round Number (set of games)

For preseason:
- Qualifier Round (for preseason qualifiers)

For splits:
- Circuit Round (for challenger circuit)
- Rounds (set of games) for pro league
- Playoffs games

For championship:
- Winners Bracket, Losers Bracket, Finals
- Round number

For the games with group stages:
- Groups A, B, C, D
<hr>

We will go in the following order for our data scraping:

- Preseason
- Split 1
- Split 2
- Championships

In [6]:
# Let's create our lists, starting with the generic ones shared by most of the Liquipedia links

# These are the main regions for ALGS
regions = ["North_America", "South_America", "EMEA", "APAC_North", "APAC_South"]

# There are usually 6 rounds (set of games) starting with round 1
rounds = [1, 2, 3, 4, 5, 6]
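
As a rough illustration of how these lists could drive the scraping loops, the sketch below combines them into page URLs. It assumes every page follows the pattern of the single page fetched in this notebook (`.../2021/Preseason_Qualifier_1/North_America/Round_1`). The `qualifiers` list and the `build_preseason_url` helper are hypothetical names introduced here, and some generated URLs may not exist on the site.

```python
from itertools import product

# Base path taken from the North America Round 1 page; the rest of the pattern is assumed.
BASE_URL = "https://liquipedia.net/apexlegends/Apex_Legends_Global_Series/2021"

regions = ["North_America", "South_America", "EMEA", "APAC_North", "APAC_South"]
rounds = [1, 2, 3, 4, 5, 6]
qualifiers = [1, 2, 3, 4]  # assumption: four preseason qualifiers


def build_preseason_url(qualifier, region, round_number):
    """Return the assumed URL for one preseason qualifier round (hypothetical helper)."""
    return f"{BASE_URL}/Preseason_Qualifier_{qualifier}/{region}/Round_{round_number}"


# One URL per (qualifier, region, round) combination.
preseason_urls = [
    build_preseason_url(q, region, r)
    for q, region, r in product(qualifiers, regions, rounds)
]

print(len(preseason_urls))
print(preseason_urls[0])
```

Generating the URLs up front makes it easy to inspect the list before any request is sent.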

In [15]:
# First attempt at scraping data before we create our loops

URL = "https://liquipedia.net/apexlegends/Apex_Legends_Global_Series/2021/Preseason_Qualifier_1/North_America/Round_1"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# prettify is a method, so it has to be called to return the formatted HTML
print(soup.prettify())

<!DOCTYPE html>

<html class="client-nojs Send_pizza_to_FO-nTTaX All_glory_to_Liquipedia update-style" dir="ltr" lang="en" prefix="og: http://ogp.me/ns#">
<head>
<meta charset="utf-8"/>
<title>Apex Legends Global Series: Preseason Qualifier #1 - North America - Round 1 - Liquipedia Apex Legends Wiki</title>
<script>document.documentElement.className="client-js Send_pizza_to_FO-nTTaX All_glory_to_Liquipedia update-style";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"9b3576c7c4cf1f6064d32d68","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Apex_Legends_Global_Series/2021/Preseason_Qualifier_1/North_America/Round_1","wgTitle":"Apex Legends Global Series/2021/Preseason Qualifier 1/
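
The request above does not check whether the page loaded successfully, and once the scraping moves into loops it is worth pausing between requests. The helper below is a minimal sketch of that idea. `fetch_html` is a hypothetical name, and the two-second delay and the User-Agent string in the usage comment are placeholders, not values taken from Liquipedia's terms, so check the site's current rules for automated access before running a loop.

```python
import time


def fetch_html(session, url, delay_seconds=2.0, timeout=30):
    """Fetch one page, fail loudly on HTTP errors, then pause before returning.

    `session` is expected to behave like `requests.Session` (hypothetical helper).
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises an exception on 4xx/5xx responses
    time.sleep(delay_seconds)  # assumed delay between requests
    return response.content


# Usage in the notebook (not executed here):
# session = requests.Session()
# session.headers["User-Agent"] = "algs-ironviz-notebook (contact: you@example.com)"  # placeholder
# soup = BeautifulSoup(fetch_html(session, URL), "html.parser")
```

Passing the session in keeps the helper easy to exercise with a stub object before pointing it at the real site.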

In [12]:
# That is a lot of markup, so we focus on two parts of the page: the lobby tabs and the results table
# Inspecting the site with browser dev tools shows that the different tabs share a class.

results = soup.find_all(class_="tabs")
print(results)

[<ul class="nav nav-tabs navigation-not-searchable tabs tabs5"><li><a href="/apexlegends/Apex_Legends_Global_Series/2021/Preseason_Qualifiers/North_America" title="Apex Legends Global Series/2021/Preseason Qualifiers/North America">Preseason Qualifiers</a></li><li class="active"><a href="/apexlegends/Apex_Legends_Global_Series/2021/Preseason_Qualifier_1/North_America" title="Apex Legends Global Series/2021/Preseason Qualifier 1/North America">Qualifier 1</a></li><li><a href="/apexlegends/Apex_Legends_Global_Series/2021/Preseason_Qualifier_2/North_America" title="Apex Legends Global Series/2021/Preseason Qualifier 2/North America">Qualifier 2</a></li><li><a href="/apexlegends/Apex_Legends_Global_Series/2021/Preseason_Qualifier_3/North_America" title="Apex Legends Global Series/2021/Preseason Qualifier 3/North America">Qualifier 3</a></li><li><a href="/apexlegends/Apex_Legends_Global_Series/2021/Preseason_Qualifier_4/North_America" title="Apex Legends Global Series/2021/Preseason Qualifi

In [13]:
# Looking at the results above, the lobby info is given its own tab class, so we'll test with tab1
# In the future we will make a loop and go through all lobbies

results = soup.find_all(class_="tab1")
print(results)

[<li class="tab1 active">Lobby 1</li>]
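
One possible way to go through all lobbies without hard-coding `tab1`, `tab2`, and so on is to match the class name with a regular expression. The sketch below runs against a small hand-written HTML sample modelled on the `tab1` element printed above. The second lobby tab is hypothetical, and the real page may structure its tabs differently.

```python
import re

from bs4 import BeautifulSoup

# Trimmed sample modelled on the printed `tab1` element; the `tab2` entry is hypothetical.
lobby_tabs_html = """
<ul class="nav nav-tabs tabs tabs5"><li><a href="#">Qualifier 1</a></li></ul>
<ul>
  <li class="tab1 active">Lobby 1</li>
  <li class="tab2">Lobby 2</li>
</ul>
"""

sample_soup = BeautifulSoup(lobby_tabs_html, "html.parser")

# A compiled pattern passed to class_ is tested against each CSS class on the tag,
# so "tab1 active" matches through its "tab1" class while "tabs5" does not match.
lobby_tabs = sample_soup.find_all("li", class_=re.compile(r"^tab\d+$"))
lobby_names = [tab.get_text(strip=True) for tab in lobby_tabs]

print(lobby_names)
```

Restricting the search to `li` tags and anchoring the pattern keeps the navigation list (`tabs`, `tabs5`) out of the results.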


In [32]:
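# Collect the cell text of every row in the results table
data = []
# A list for 'class' matches a table that carries any of the listed classes
table = soup.find('table', attrs={'class': ['content2', 'table-battleroyale-results']})
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    # Drop empty cells; rows without any <td> text end up as empty lists
    data.append([ele for ele in cols if ele])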

In [33]:
data

[[],
 [],
 ['1.', 'CLX', '76', '76'],
 ['2.', 'OA', '55', '55'],
 ['3.', 'RCO', '44', '44'],
 ['4.', 'PWM', '39', '39'],
 ['5.', 'TEAM Workaholics', '37', '37'],
 ['6.', 'Insane Xboxsers', '33', '33'],
 ['7.', 'Chud Bungus', '15', '15'],
 ['8.', 'Ampedant', '15', '15'],
 ['9.', 'Team Shadow Wolf', '14', '14'],
 ['10.', 'JustSomeNewGuys', '12', '12'],
 ['11.', '2B1R', '12', '12'],
 ['12.', 'ControllerLegend', '12', '12'],
 ['13.', 'Carbon Esports', '7', '7'],
 ['14.', 'Tooshbags', '6', '6'],
 ['15.', 'Aces Team', '3', '3']]
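
The scraped rows are lists of strings, and the first two entries are empty, most likely because those rows have no non-empty `td` cells. A minimal sketch of turning such rows into typed records follows. `rows_to_records` is a hypothetical helper, and the meaning of the numeric columns is not confirmed here, so they are kept together as a generic `points` list. Because the scraping step drops empty cells, a row with a missing value would shift columns, which this sketch does not handle.

```python
def rows_to_records(rows):
    """Convert scraped rows into dictionaries, skipping rows without data cells."""
    records = []
    for row in rows:
        if len(row) < 3:
            continue  # skip empty or incomplete rows such as the leading []
        rank_text, team, *point_cells = row
        records.append(
            {
                "rank": int(rank_text.rstrip(".")),  # "1." -> 1
                "team": team,
                "points": [int(cell) for cell in point_cells],
            }
        )
    return records


# Two rows copied from the output above, plus the empty leading rows.
sample_rows = [[], [], ["1.", "CLX", "76", "76"], ["2.", "OA", "55", "55"]]
records = rows_to_records(sample_rows)
print(records[0])
```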

The table extraction above covers a single lobby, so the next step is to make sure the same approach works for multiple lobbies.
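
A sketch of one way to generalize the single-table code: collect every table with the results class instead of only the first one. This assumes each lobby's results sit in their own `table-battleroyale-results` table within the same page's HTML, which has not been verified against the site. `extract_result_tables` and the two-table sample are hypothetical.

```python
from bs4 import BeautifulSoup


def extract_result_tables(soup):
    """Return one list of rows per results table found in the parsed page."""
    tables = []
    for table in soup.find_all("table", class_="table-battleroyale-results"):
        rows = []
        for row in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
            cells = [cell for cell in cells if cell]
            if cells:  # skip rows that have no non-empty <td> cells
                rows.append(cells)
        tables.append(rows)
    return tables


# Hypothetical two-lobby page used only to exercise the function.
two_lobby_html = """
<table class="table-battleroyale-results"><tr><th>#</th><th>Team</th></tr>
<tr><td>1.</td><td>Team A</td><td>10</td></tr></table>
<table class="table-battleroyale-results"><tr><td>1.</td><td>Team B</td><td>8</td></tr></table>
"""
lobbies = extract_result_tables(BeautifulSoup(two_lobby_html, "html.parser"))
print(lobbies)
```

If the lobbies turn out to be loaded from separate pages instead, the same function could be applied to each page's soup in turn.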

## Preseason Data

Let's start by trying to scrape our preseason data.