This notebook uses Beautiful Soup to scrape the changelog page of Deadlock, Valve's video game, and collect the links to all patch notes (updates). The scraper parses the forum's HTML and finds the anchor tags that link to individual patch-note threads. Following the URL structure, it then loops through those threads, extracts the text of each patch note, and stores the raw text in a .txt file. Data is first stored locally during initial development and then pushed to Google Cloud Storage in a batch.
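The extraction and storage steps can be sketched as follows. This is a minimal sketch, not the notebook's own code. It assumes the forum runs XenForo and that each post body sits inside a `<div class="bbWrapper">`, so check that selector against the live page in the browser's inspector. The helper names `extract_patch_text`, `filename_for`, `save_raw_text`, and `upload_dir_to_gcs`, the folder `raw_patch_notes`, and the bucket prefix `raw` are all hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit

from bs4 import BeautifulSoup


def extract_patch_text(html):
    """Return the plain text of the first post body on a thread page.

    ASSUMPTION: XenForo markup, where post bodies live in <div class="bbWrapper">.
    Returns an empty string if no such element is found.
    """
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="bbWrapper")
    if body is None:
        return ""
    # One line per text fragment, so bullet lists and <br> breaks stay readable
    return body.get_text(separator="\n", strip=True)


def filename_for(thread_url):
    """Build a file name like '11-21-2024-update.47476.txt' from a thread URL."""
    slug = urlsplit(thread_url).path.rstrip("/").split("/")[-1]
    return f"{slug}.txt"


def save_raw_text(text, thread_url, out_dir="raw_patch_notes"):
    """Write the raw patch-note text to a local .txt file and return its path."""
    out_path = Path(out_dir) / filename_for(thread_url)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    return out_path


def upload_dir_to_gcs(local_dir, bucket_name, prefix="raw"):
    """Batch-upload every .txt file in local_dir to a GCS bucket."""
    # Imported here so the local steps run without google-cloud-storage installed
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    for path in sorted(Path(local_dir).glob("*.txt")):
        bucket.blob(f"{prefix}/{path.name}").upload_from_filename(str(path))
```

Usage, assuming `url` is one of the scraped thread links: `save_raw_text(extract_patch_text(requests.get(url, timeout=30).text), url)`. Once every file is written locally, `upload_dir_to_gcs("raw_patch_notes", "your-bucket-name")` pushes them in one batch. Keeping the upload as a separate step matches the local-first workflow described above.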

As of 24 Nov 2024, all patch notes are located in [this forum](https://forums.playdeadlock.com/forums/changelog.10/).

![Deadlock changelog menu](images/phase1-changelog-homepage.png)
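Each listing page in this forum links to a thread more than once (the printed output for page 1 shows each thread URL twice plus a `/latest` variant), and it also includes the pinned `changelog-feedback-process` thread. A minimal sketch of normalizing those links and walking every listing page follows. The function names are hypothetical, and the filter assumes that patch-note slugs start with an `MM-DD-YYYY` date, as every update visible on page 1 does; differently named patch notes would need a looser rule.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# ASSUMPTION: patch-note slugs start with a date, e.g. "11-21-2024-update.47476",
# while pinned threads such as "changelog-feedback-process.6" do not.
PATCH_SLUG = re.compile(r"^\d{2}-\d{2}-\d{4}")


def normalize_thread_url(url):
    """Map any link into a thread (e.g. '.../latest') to its '/threads/<slug>/' URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "threads":
        return None
    return urlunsplit((parts.scheme, parts.netloc, f"/threads/{segments[1]}/", "", ""))


def unique_patch_urls(links):
    """Deduplicate links (first occurrence wins) and keep only date-named threads."""
    canonical = (normalize_thread_url(link) for link in links)
    kept = (u for u in canonical if u and PATCH_SLUG.match(u.rstrip("/").split("/")[-1]))
    return list(dict.fromkeys(kept))


def collect_all_patch_urls(get_links, max_pages=100):
    """Walk listing pages 1, 2, ... until a page contributes no new thread.

    get_links(page_num) must return a list of hrefs, e.g. get_patch_links.
    The 'no new threads' stop also covers a forum that re-serves its last page
    for out-of-range page numbers (assumed behavior, not verified here).
    """
    seen = {}
    for page in range(1, max_pages + 1):
        new = [u for u in unique_patch_urls(get_links(page)) if u not in seen]
        if not new:
            break
        seen.update(dict.fromkeys(new))
    return list(seen)
```

Stopping when a page adds nothing new, rather than hard-coding a page count, keeps the loop valid as new patches push older threads onto later pages; `max_pages` is only a safety cap.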

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://forums.playdeadlock.com"
FORUM_URL = f"{BASE_URL}/forums/changelog.10/"


def get_patch_links(page_num):
    """Return every thread link found on one page of the changelog forum."""
    # Page 1 is the forum root; later pages append 'page-N'
    url = f"{FORUM_URL}page-{page_num}" if page_num > 1 else FORUM_URL

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")

    # Collect every anchor that points into a thread. This keeps duplicates:
    # each thread is linked several times, including a '/latest' variant.
    links = []
    for a_tag in soup.find_all("a", href=True):
        href = a_tag["href"]
        if "/threads/" in href:
            links.append(urljoin(BASE_URL, href))  # works for relative and absolute hrefs

    return links


# Start with page 1
patch_links = get_patch_links(1)
print("Patch Notes Links from Page 1:")
for link in patch_links:
    print(link)
```


Patch Notes Links from Page 1:
https://forums.playdeadlock.com/threads/changelog-feedback-process.6/
https://forums.playdeadlock.com/threads/changelog-feedback-process.6/
https://forums.playdeadlock.com/threads/changelog-feedback-process.6/latest
https://forums.playdeadlock.com/threads/11-21-2024-update.47476/
https://forums.playdeadlock.com/threads/11-21-2024-update.47476/
https://forums.playdeadlock.com/threads/11-21-2024-update.47476/latest
https://forums.playdeadlock.com/threads/11-13-2024-update.46391/
https://forums.playdeadlock.com/threads/11-13-2024-update.46391/
https://forums.playdeadlock.com/threads/11-13-2024-update.46391/latest
https://forums.playdeadlock.com/threads/11-10-2024-update.45689/
https://forums.playdeadlock.com/threads/11-10-2024-update.45689/
https://forums.playdeadlock.com/threads/11-10-2024-update.45689/latest
https://forums.playdeadlock.com/threads/11-07-2024-update.44786/
https://forums.playdeadlock.com/threads/11-07-2024-update.44786/
https://forums.playd