# Extract Game Ids

This notebook extracts the game ids for a given round of a season by inspecting web pages of the form:

```
https://nbl.com.au/schedule?round=<ROUND-ID>&season=<SEASON-ID>
```

For example, this is the pre-season for 2022-2023:

https://nbl.com.au/schedule?round=PS&season=34173
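As a minimal sketch, the schedule URL can be assembled from the two identifiers with the standard library. The names `BASE_URL` and `build_schedule_url` are illustrative and are not used elsewhere in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://nbl.com.au/schedule"


def build_schedule_url(round_id, season_id):
    """Return the schedule URL for the given round and season identifiers."""
    query = urlencode({"round": round_id, "season": season_id})
    return f"{BASE_URL}?{query}"


# Reproduces the pre-season example shown above
print(build_schedule_url("PS", 34173))
```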

Those pages expose game links of the form `https://nbl.com.au/games/<GAME-ID>`, but only after JavaScript has run. So we need a webdriver that actually loads the page in a browser (optionally headless, with no visible window) and then reads the resulting page source. We do this with the `selenium` module, which provides drivers for browsers. Here is [an explanation](https://stackoverflow.com/questions/11047348/is-this-possible-to-load-the-page-after-the-javascript-execute-using-python) of how to load a page after JavaScript has executed.
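Once the rendered HTML is available (for instance from `browser.page_source`), the game ids can be pulled out of the links with a regular expression. The sketch below assumes that game ids are purely numeric, which the notebook does not state; the sample HTML and the name `extract_game_ids` are made up for illustration.

```python
import re

# Assumption: <GAME-ID> consists of digits only
GAME_LINK_RE = re.compile(r"https://nbl\.com\.au/games/(\d+)")


def extract_game_ids(html_text):
    """Return the unique game ids found in html_text, in order of first appearance."""
    return list(dict.fromkeys(GAME_LINK_RE.findall(html_text)))


# Hypothetical rendered HTML with one repeated link
sample_html = (
    '<a href="https://nbl.com.au/games/1001">Game A</a>'
    '<a href="https://nbl.com.au/games/1002">Game B</a>'
    '<a href="https://nbl.com.au/games/1001">Game A again</a>'
)
print(extract_game_ids(sample_html))
```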

**Note:** the original page, before JavaScript runs, also exposes the game ids in structures of the form `matchId:<GAME-ID>`, but it lists all the games of the season, without filtering by round.
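If the season-wide list is enough, the `matchId` entries could be collected from the raw page text without a browser. This is only a sketch: the exact quoting and spacing around `matchId` in the page is an assumption, so the pattern tolerates optional quotes and whitespace, and it again assumes numeric ids. The sample text is invented.

```python
import re

# Assumption: entries look like matchId:123 or "matchId": "123"
MATCH_ID_RE = re.compile(r'"?matchId"?\s*:\s*"?(\d+)')


def extract_match_ids(page_text):
    """Return the unique match ids found in page_text, in order of first appearance."""
    return list(dict.fromkeys(MATCH_ID_RE.findall(page_text)))


# Hypothetical fragment of the page source before JavaScript runs
sample_text = '{matchId:2001,round:"PS"},{"matchId": "2002"},{matchId:2001}'
print(extract_match_ids(sample_text))
```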


## Option 1: Via Selenium virtual browser

In [1]:
import re

# Download geckodriver (https://github.com/mozilla/geckodriver/releases) and put it on your PATH
# Selenium webdriver: https://www.selenium.dev/documentation/overview/
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By

In [2]:
def get_url(n):
    return f"https://classroom.github.com/classrooms/139094533-artificial-intelligence-2023/roster?roster_entries_page={n}"


USERNAME = '' #Github username - do not commit your own!
PASSWORD = '' #Github password - do not commit your own!
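One way to avoid committing credentials is to read them from environment variables instead of typing them into the notebook. The sketch below is only a suggestion: the variable names `GITHUB_USERNAME` and `GITHUB_PASSWORD` and the helper `load_credentials` are hypothetical and not part of the original notebook.

```python
import os


def load_credentials(env=os.environ):
    """Read the login credentials from a mapping (by default the process environment)."""
    username = env.get("GITHUB_USERNAME", "")
    password = env.get("GITHUB_PASSWORD", "")
    if not username or not password:
        raise RuntimeError("Set GITHUB_USERNAME and GITHUB_PASSWORD before running the login cell.")
    return username, password


# Demonstration with an explicit mapping rather than the real environment
print(load_credentials({"GITHUB_USERNAME": "alice", "GITHUB_PASSWORD": "secret"}))
```

In the notebook this would be used as `USERNAME, PASSWORD = load_credentials()`.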

In [3]:


# We need an actual browser so that the JavaScript is loaded and the links https://.../games/<game_id> are generated
options = Options()
#options.add_argument("-headless")  # uncomment to run Firefox without a visible window
browser = webdriver.Firefox(options=options)
# browser = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')

html_texts = []

#for rno in ROUNDS:
url = get_url(2)
print("Extracting web HTML at ", url)
browser.get(url)

#authenticate:
username = browser.find_element(By.NAME, 'login')
username.send_keys(USERNAME)

password = browser.find_element(By.NAME, 'password')
password.send_keys(PASSWORD)

form = browser.find_element(By.NAME,'commit')
form.submit()


html_text = browser.page_source


Extracting web HTML at  https://classroom.github.com/classrooms/139094533-artificial-intelligence-2023/roster?roster_entries_page=2


In [15]:
#print(html_text)

#Get the list of students
cards = browser.find_elements(By.CSS_SELECTOR,"div.assignment-list-item.d-flex.col-12")

len(cards)
for card in cards:

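    # Make sure it is a student with an id; the selector also picks up some other elements in the page.
    # find_elements returns an empty list when nothing matches, so no exception handling is needed.
    descriptions = card.find_elements(By.CSS_SELECTOR, "h3.assignment-name-link.h4")
    if not descriptions:
        print("No description.")
        continue  # skip this card so that the description of a previous card is not reused
    description = descriptions[0]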
    
    # Check whether it is a duplicate (9 characters long, ending in "-1"), and if so, delete then confirm.
    if len(description.text)==9 and description.text[-2:]=="-1":
        print(description.text)
        button = card.find_elements(By.CSS_SELECTOR, "div.Button-withTooltip")[1]
        button.click()

        popup= browser.find_element(By.CSS_SELECTOR,'input.btn.btn-danger.btn-block.js-submit')
        
        #DANGER! The following line of code seems to delete the wrong student, do not run until fixed!
        #popup.submit() 
    else:
        print("Not a duplicate")



Not a duplicate
Not a duplicate
3607359-1
No description.
Not a duplicate
Not a duplicate
3618845-1
No description.
Not a duplicate
Not a duplicate
Not a duplicate
3668498-1
No description.
Not a duplicate
3676400-1
No description.
Not a duplicate
Not a duplicate
Not a duplicate
3687137-1
No description.
Not a duplicate
3695517-1
No description.
Not a duplicate
3704571-1
No description.
No description.
No description.
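The duplicate check used in the loop above can be isolated as a small function and tried out without a browser. The sketch below is slightly stricter than the notebook's check: it assumes that an identifier is exactly seven digits followed by `-1`, as in the output above, whereas the original only checks the length and the suffix. The name `is_duplicate_id` is illustrative.

```python
import re

# Assumption: a duplicate entry is a 7-digit id followed by "-1", e.g. "3607359-1"
DUPLICATE_RE = re.compile(r"\d{7}-1")


def is_duplicate_id(text):
    """Return True if text looks like a duplicated roster identifier."""
    return DUPLICATE_RE.fullmatch(text) is not None


for candidate in ["3607359-1", "3607359", "abcdefg-1"]:
    print(candidate, is_duplicate_id(candidate))
```

Note that `abcdefg-1` would pass the original length-and-suffix check but is rejected here.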
