<h1>Aoe2.net: Scrape Ongoing Matches Data</h1>

## Imports

In [54]:
from bs4 import BeautifulSoup as bs
# from bs4.BeautifulSoup import prettify # this didn't work
prettify = bs.prettify
import requests
import pandas

## HTTP Request -> Response

In [55]:
# dir(bs)
# help(requests.Request)
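# Note: the "#aoe2de-matches-ongoing" fragment is never sent to the server,
# so this fetches the site's main page.
r = requests.get("https://aoe2.net/#aoe2de-matches-ongoing")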

## Initialize Soup

In [56]:
soup = bs(r.content, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
print(prettify(soup))

<!DOCTYPE html>
<html lang="en">
 <head>
  <title>
   AoE2.net
  </title>
  <meta content="wss://aoe2.net/ws" name="websocketurl"/>
  <link href="/webjars/bootswatch/4.2.1/darkly/bootstrap.css" rel="stylesheet" type="text/css"/>
  <link href="/webjars/datatables/1.10.21/css/dataTables.bootstrap4.min.css" rel="stylesheet" type="text/css"/>
  <link href="/webjars/flag-icon-css/3.5.0/css/flag-icon.min.css" rel="stylesheet" type="text/css"/>
  <link href="/webjars/octicons/4.3.0/build/font/octicons.min.css" rel="stylesheet" type="text/css"/>
  <link href="/webjars/octicons/4.3.0/build/octicons.min.css" rel="stylesheet" type="text/css"/>
  <link href="/vassets/stylesheets/450d7ff8fb1a639b715ca946438f3a8b-main.css" rel="stylesheet"/>
  <link href="/assets/images/125504a37739b91c50d3c1673bfb00a3-favicon.png" rel="shortcut icon" type="image/png"/>
  <meta content="Play Age of Empires II: HD (AoE2:HD) and Age of Empires II: Definitive Edition (AoE2:DE) online! Lobby Browser and Leaderboards" na

### Problem

I used Firefox's Inspect tool, pointing at each 'card' that shows a given match, and determined that the element I'm looking for is:<br>
`<table id="aoe2de-matches-table">` <-- the table<br>
..`<tbody>` <-- table body<br>
....`<tr ... >` <-- table row<br>
<br>
But when I locate this table in our soup using `id="aoe2de-matches-table"`, the `<tbody>` tag isn't there. Huh. I do see the table head tag `<thead>`, but that's it.<br>
<br>
I went back to Firefox to investigate the page's behaviour on reload. The page first loads with only the table headers; then, a few seconds after a "loading..." message, it populates the table with content. This suggests that `requests.get()` only grabs the initial HTML, before the script that fills the table has run, hence the empty table body.
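
To make the diagnosis concrete, here is a minimal sketch that counts the rows under `<thead>` and `<tbody>` for a given table id. It uses the standard library's `HTMLParser` instead of BeautifulSoup so that it runs without extra installs, and the `STATIC_HTML` string is a hypothetical stand-in for what the server returns before any JavaScript runs, not the real page.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the static HTML returned by requests.get():
# the header row exists, but no <tbody> has been filled in yet.
STATIC_HTML = """
<table id="aoe2de-matches-table">
  <thead><tr><th>Name</th><th>Map</th></tr></thead>
</table>
"""

class TableProbe(HTMLParser):
    """Count <tr> tags inside the <thead> and <tbody> of one table id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.section = None
        self.rows = {"thead": 0, "tbody": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag in self.rows:
            self.section = tag
        elif self.in_table and tag == "tr" and self.section:
            self.rows[self.section] += 1

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in self.rows:
            self.section = None

def count_rows(html, table_id="aoe2de-matches-table"):
    probe = TableProbe(table_id)
    probe.feed(html)
    return probe.rows

rows = count_rows(STATIC_HTML)
print(rows)
```

A `tbody` count of zero alongside a non-zero `thead` count is the signature of a table that is filled in client-side.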

### Solution 1

It looks like this site provides an API for querying its data, including data for ongoing matches!<br><br>
https://aoe2.net/#api<br><br>Let's try using the API for Matches, as described at that link...

In [65]:
r2 = requests.get("https://aoe2.net/api/matches?game=aoe2de&count=1000")

In [66]:
r2.text



Great, we got all the details for these matches in JSON format. <br><br>Next, let's parse the response from a `str` into Python objects (a list of `dict`s, one per match) and look at its structure, so we can see how to extract details like Elo, player names, game map, and the civs being played.
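
As a small illustration of that conversion, the sketch below parses a hypothetical, heavily trimmed payload shaped like the API response. Only the field names come from the real data; the string itself is made up for the example.

```python
import json

# Hypothetical, trimmed payload shaped like the API response (not real output).
sample_text = '[{"match_id": "1953364", "num_players": 6, "average_rating": null, "ranked": true}]'

matches = json.loads(sample_text)  # str -> list of dicts
first = matches[0]

print(type(matches).__name__, type(first).__name__)
print(first["match_id"], first["average_rating"], first["ranked"])
```

JSON `null` and `true` become Python `None` and `True`. With a live response object, `r2.json()` performs the same parsing in one call.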

In [73]:
import json
dir(json)

['JSONDecodeError',
 'JSONDecoder',
 'JSONEncoder',
 '__all__',
 '__author__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_default_decoder',
 '_default_encoder',
 'codecs',
 'decoder',
 'detect_encoding',
 'dump',
 'dumps',
 'encoder',
 'load',
 'loads',
 'scanner']

In [75]:
help(json.load)
help(json.loads) #use this one for string input

Help on function load in module json:

load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    a JSON document) to a Python object.
    
    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).
    
    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs.  The
    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
    This feature can be used to implement custom decoders.  If ``object_hook``
    is also defined, the ``object_pairs_hook`` takes priority.
    
    To use a custom ``JSONDecod

In [80]:
matches_data = json.loads(r2.text)
matches_data

[{'match_id': '1953364',
  'lobby_id': None,
  'match_uuid': '5fc30da7-3ae2-0c45-9ba4-25b1d1b9f222',
  'version': None,
  'name': 'AUTOMATCH',
  'num_players': 6,
  'num_slots': 6,
  'average_rating': None,
  'cheats': False,
  'full_tech_tree': None,
  'ending_age': 5,
  'expansion': None,
  'game_type': 0,
  'has_custom_content': None,
  'has_password': None,
  'lock_speed': None,
  'lock_teams': None,
  'map_size': 3,
  'map_type': 76,
  'pop': None,
  'ranked': True,
  'leaderboard_id': 4,
  'rating_type': 4,
  'resources': None,
  'rms': 'valley.rms2',
  'scenario': 'valley.rms2',
  'server': None,
  'shared_exploration': None,
  'speed': None,
  'starting_age': 0,
  'team_together': None,
  'team_positions': None,
  'treaty_length': 2,
  'turbo': None,
  'victory': None,
  'victory_time': None,
  'visibility': None,
  'opened': 1575463333,
  'started': 1575463333,
  'finished': 1575490152,
  'players': [{'profile_id': 142635,
    'steam_id': None,
    'name': 'Lozthecelt',
    'c

Cool, the data is workable in Python now, and Jupyter even makes the display readable. Success.

## Do Something with the Data

### Goal 1. Present only elo and match id; Sort by descending elo.

Ratings are missing from early matches (`'rating':None`). Let's check the timestamps, which need to be converted from epoch time to `datetime`.

In [92]:
from datetime import datetime

In [88]:
t1 = matches_data[0]['started']
print(t1)

1575463333


This is so-called Unix epoch time, measured in seconds since 00:00:00 UTC on January 1, 1970.

In [93]:
t1_ = datetime.fromtimestamp(t1)
print(t1_)

2019-12-04 06:42:13
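
Note that `datetime.fromtimestamp()` without a `tz` argument converts to the local timezone of the machine running the notebook, so the time printed above depends on where it was run. A minimal sketch of a timezone-independent conversion, using the same timestamp:

```python
from datetime import datetime, timezone

t1 = 1575463333  # 'started' value of the first match

# Passing tz makes the result timezone-aware and independent of the machine.
utc_time = datetime.fromtimestamp(t1, tz=timezone.utc)
print(utc_time)  # 2019-12-04 12:42:13+00:00
```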


Ok, this is around when DE came out, so it's probably before ratings were established. In fact, I think this is THE first game that was stored. Cool!
<br><br>
So, let's refine our API query to games started within the last 20 minutes.
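
The "last 20 minutes" cutoff is just the current epoch time minus `20*60` seconds. Here is a minimal sketch of building such a query URL; `matches_url` is a hypothetical helper name, and the `game`, `count`, and `since` parameters are the ones used in this notebook's requests. A fixed timestamp stands in for "now" so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

API_URL = "https://aoe2.net/api/matches"

def matches_url(now, minutes=20, count=1000):
    """Build the query URL for matches started in the last `minutes` (hypothetical helper)."""
    since = int((now - timedelta(minutes=minutes)).timestamp())
    query = urlencode({"game": "aoe2de", "count": count, "since": since})
    return f"{API_URL}?{query}"

# Fixed "now" for illustration; in practice use datetime.now(timezone.utc).
fixed_now = datetime.fromtimestamp(1621056666, tz=timezone.utc)
url = matches_url(fixed_now)
print(url)
```

With `requests`, the same query can be sent as `requests.get(API_URL, params={...})`, which handles the URL encoding itself.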

In [97]:
now_ = datetime.now(); print(now_)

2021-05-15 00:18:34.934039


In [107]:
now = int(now_.timestamp()); print(now)

1621056007


In [119]:
now = int(datetime.now().timestamp())
print(now)
r3 = requests.get(f"https://aoe2.net/api/matches?game=aoe2de&count=1000&since={now-20*60}")
print(r3)

1621056666
<Response [200]>


In [120]:
md = json.loads(r3.text) #md is 'matches data'
print(len(md))
md

675


[{'match_id': '91524878',
  'lobby_id': '109775240943962406',
  'match_uuid': 'a2900851-55b6-7445-8cbc-127a029fa621',
  'version': None,
  'name': 'hp',
  'num_players': 2,
  'num_slots': 2,
  'average_rating': 1500,
  'cheats': False,
  'full_tech_tree': False,
  'ending_age': 5,
  'expansion': None,
  'game_type': 0,
  'has_custom_content': None,
  'has_password': False,
  'lock_speed': False,
  'lock_teams': True,
  'map_size': 5,
  'map_type': 72,
  'pop': 250,
  'ranked': False,
  'leaderboard_id': 0,
  'rating_type': 0,
  'resources': 0,
  'rms': None,
  'scenario': None,
  'server': None,
  'shared_exploration': True,
  'speed': 3,
  'starting_age': 0,
  'team_together': True,
  'team_positions': False,
  'treaty_length': 0,
  'turbo': False,
  'victory': 1,
  'victory_time': 1,
  'visibility': 2,
  'opened': 1621055477,
  'started': 1621055477,
  'finished': None,
  'players': [{'profile_id': 3426668,
    'steam_id': '76561198451025485',
    'name': 'Czy.ZED',
    'clan': None,

Sweet, we got it. And it looks like there are currently 675 active matches that started within the last 20 minutes.<br>
<br>
Next, let's extract `match_id`, `match_uuid`, `average_rating`, and `started`. We can use a pandas DataFrame for this.
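
Before reaching for pandas, the extraction and the descending sort from Goal 1 can be sketched in plain Python. This is one possible approach, not the only one: `summarize` is a hypothetical helper, and in the sample list only the first entry's values come from the real output (the other two are made up to show the sorting and the handling of missing ratings).

```python
FIELDS = ("match_id", "match_uuid", "average_rating", "started")

def summarize(matches):
    """Keep only FIELDS, drop unrated matches, sort by rating (highest first)."""
    rows = [{field: m.get(field) for field in FIELDS} for m in matches]
    rated = [row for row in rows if row["average_rating"] is not None]
    return sorted(rated, key=lambda row: row["average_rating"], reverse=True)

# Sample data: entries "A" and "B" are hypothetical.
sample = [
    {"match_id": "91524878", "match_uuid": "a2900851-55b6-7445-8cbc-127a029fa621",
     "average_rating": 1500, "started": 1621055477, "name": "hp"},
    {"match_id": "A", "match_uuid": "uuid-a", "average_rating": None, "started": 1621055500},
    {"match_id": "B", "match_uuid": "uuid-b", "average_rating": 1820, "started": 1621055600},
]

summary = summarize(sample)
for row in summary:
    print(row["average_rating"], row["match_id"])
```

The resulting list of dicts can be passed directly to the `pandas.DataFrame` constructor, giving one row per match.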

In [124]:
import pandas as pd

In [134]:
# create a data frame from the first match only, md[0]
# (the 'players' list makes pandas emit one row per player slot)
df = pd.DataFrame(md[0])

In [135]:
df

Unnamed: 0,match_id,lobby_id,match_uuid,version,name,num_players,num_slots,average_rating,cheats,full_tech_tree,...,team_positions,treaty_length,turbo,victory,victory_time,visibility,opened,started,finished,players
0,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': 3426668, 'steam_id': '765611984..."
1,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': 3426540, 'steam_id': '765611984..."
2,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': None, 'steam_id': None, 'name':..."
3,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': None, 'steam_id': None, 'name':..."
4,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': None, 'steam_id': None, 'name':..."
5,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': None, 'steam_id': None, 'name':..."
6,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': None, 'steam_id': None, 'name':..."
7,91524878,109775240943962406,a2900851-55b6-7445-8cbc-127a029fa621,,hp,2,2,1500,False,False,...,False,0,False,1,1,2,1621055477,1621055477,,"{'profile_id': None, 'steam_id': None, 'name':..."
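
The eight near-identical rows appear because `players` is a list of eight slots: when a dict mixes scalars with one list, the DataFrame constructor repeats the scalars once per list element. A minimal sketch of separating the two levels before building any frame is shown below. `match_row` and `player_rows` are hypothetical helpers, and the `match` dict is a trimmed stand-in for `md[0]` in which the second player's name and the empty slot's values are placeholders.

```python
# Trimmed, partly hypothetical stand-in for md[0].
match = {
    "match_id": "91524878",
    "average_rating": 1500,
    "players": [
        {"profile_id": 3426668, "name": "Czy.ZED"},
        {"profile_id": 3426540, "name": "placeholder-name"},  # name is hypothetical
        {"profile_id": None, "name": None},                    # empty slot (values assumed)
    ],
}

def match_row(match):
    """Match-level fields only: one dict per match."""
    return {key: value for key, value in match.items() if key != "players"}

def player_rows(match):
    """One dict per occupied slot, tagged with the parent match_id."""
    return [
        {"match_id": match["match_id"], **player}
        for player in match["players"]
        if player.get("profile_id") is not None
    ]

print(match_row(match))
print(len(player_rows(match)))
```

Building one frame from `[match_row(m) for m in md]` and another from the concatenated `player_rows(m)` lists would keep match-level and player-level data in separate tables linked by `match_id`.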
