Load the data from S3 to confirm that it is accessible, then write some code to preprocess it further and upload the result back to S3.

The next step will be to use AWS Lambda to trigger a preprocessing script whenever a new JSON file is uploaded.
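
As a rough sketch of that Lambda step, a handler for an S3 "object created" notification might look like the following. The names `extract_s3_objects` and `lambda_handler` and the `.json` filter are assumptions for illustration. The event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) follows the standard S3 event notification format.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every .json object in an S3 event."""
    pairs = []
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        # object keys arrive URL-encoded in S3 notifications
        key = unquote_plus(record['s3']['object']['key'])
        if key.endswith('.json'):
            pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    # hypothetical entry point: the real preprocessing would run once per object
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        print(f'would preprocess s3://{bucket}/{key}')
    return {'processed': len(objects)}
```

If the processed output is written back to the same bucket, the trigger would likely need a prefix or suffix filter so that the handler does not re-trigger itself.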

In [15]:
import boto3
import pandas as pd
import json
import re
from io import StringIO

# create AWS session
s3 = boto3.client('s3')

# define the bucket and the key (file path)
bucket_name = 'deadlock-patch-notes-data-bucket'

# get objects in the bucket
response = s3.list_objects_v2(Bucket=bucket_name)

# sort files by filename
files = sorted(response.get('Contents', []), key=lambda x: x['Key'], reverse=True)

# get latest file
if files:
    last_file = files[0]['Key']
    
    # skip if the file starts with '.ipynb'
    if last_file.startswith('.ipynb'):
        print(f'the latest file starts with .ipynb, skipping: {last_file}')
    else:
        print(f'opening file: {last_file}')
        
        # download and read file contents
        obj = s3.get_object(Bucket=bucket_name, Key=last_file)
        file_content = obj['Body'].read().decode('utf-8')

        # parse the content as json
        json_content = json.loads(file_content)
        
        # print the json content
        print(json.dumps(json_content, indent=4))  # formatted output
else:
    print(f'no files found in {bucket_name} bucket')

opening file: 2025-01-27_54590.json
{
    "poster": "Yoshi",
    "date": "2025-01-27",
    "time": "6:10 PM",
    "content": "- Abrams: Infernal Resilience increased from 11% to 12%\n- Bebop: Uppercut air control lockout period reduced from 0.5s to 0.3s\n- Dynamo: Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8\n- Dynamo: Kinetic Pulse T3 now also adds +1 Charge\n- Calico: Gloom Bombs now has updated impact SFX\n- Calico: Gloom Bombs now has an arming effect for when they are about to detonate\n- Calico: Leaping Slash only heals when hitting heroes\n- Calico: Leaping Slash fixed VFX to match the damage area more accurately\n- Calico: Leaping Slash updated to break breakables in the area\n- Calico: Leaping Slash fixed a bug where calico's slash would deal no damage near walls\n- Calico: Ava duration reduced from 20s to 15s\n- Calico: Ava cooldown reduced from 50s to 45s\n- Calico: Ava now gets slowed by 30% for 1s anytime she takes damage\n- Calico: Ava speed reduced from 

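`list_objects_v2` returns at most 1,000 keys per call, so once the bucket grows past that, sorting a single response may miss the newest file. Below is a minimal sketch of the same "latest file" lookup using boto3's paginator. The function name `latest_json_key` is hypothetical, and the sketch assumes the date-prefixed filenames keep sorting chronologically. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def latest_json_key(s3_client, bucket):
    """Return the lexicographically greatest .json key in the bucket, or None."""
    paginator = s3_client.get_paginator('list_objects_v2')
    latest = None
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            key = obj['Key']
            # skip notebook checkpoint entries and anything that is not json
            if key.startswith('.ipynb') or not key.endswith('.json'):
                continue
            if latest is None or key > latest:
                latest = key
    return latest
```

In the notebook this could be called as `latest_json_key(s3, bucket_name)`.
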
In [11]:
# format hero_patch_notes content
s3_json_hero_patch_notes = json_content['hero_patch_notes']
s3_json_hero_patch_notes

['- Abrams: Infernal Resilience increased from 11% to 12%',
 '- Bebop: Uppercut air control lockout period reduced from 0.5s to 0.3s',
 '- Dynamo: Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8',
 '- Dynamo: Kinetic Pulse T3 now also adds +1 Charge',
 '- Calico: Gloom Bombs now has updated impact SFX',
 '- Calico: Gloom Bombs now has an arming effect for when they are about to detonate',
 '- Calico: Leaping Slash only heals when hitting heroes',
 '- Calico: Leaping Slash fixed VFX to match the damage area more accurately',
 '- Calico: Leaping Slash updated to break breakables in the area',
 "- Calico: Leaping Slash fixed a bug where calico's slash would deal no damage near walls",
 '- Calico: Ava duration reduced from 20s to 15s',
 '- Calico: Ava cooldown reduced from 50s to 45s',
 '- Calico: Ava now gets slowed by 30% for 1s anytime she takes damage',
 '- Calico: Ava speed reduced from 75% to 65%',
 '- Calico: Ava T2 speed increased from +35% to +45%',
 '- Calico: Ava n

My approach to preprocessing starts by building a dictionary in which each key is a hero and each value is a list of changes to that hero. I will handle "from X to Y" changes first, since they follow a clear, recognizable pattern. I plan to come back and make incremental improvements that recognize additional patterns such as "by X%", and then handle miscellaneous cases that mention no value change but instead describe changes to an ability or to how it interacts with others and the game environment.
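
The "from X to Y" pattern can be captured with a regular expression. The following is a minimal sketch, not the final parser. The pattern, the function name `parse_from_to`, and the output field names are all assumptions. It only recognizes the verbs "increased" and "reduced" and numeric values with an optional sign and an optional `%` or `s` unit.

```python
import re

# a signed or unsigned integer or decimal, e.g. 11, 0.5, +40
NUM = r'[+-]?\d+(?:\.\d+)?'

FROM_TO = re.compile(
    rf'^(?P<stat>.*?)\s*\b(?P<direction>increased|reduced)\s+from\s+'
    rf'(?P<old>{NUM})(?P<unit>[%s]?).*?\bto\s+(?P<new>{NUM})',
    re.IGNORECASE,
)


def parse_from_to(change):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = FROM_TO.search(change)
    if match is None:
        return None
    return {
        'stat': match.group('stat').strip(),
        'direction': match.group('direction').lower(),
        'old': float(match.group('old')),
        'new': float(match.group('new')),
        'unit': match.group('unit'),
    }
```

Lines without a numeric "from ... to ..." pair, such as "Gloom Bombs now has updated impact SFX", return `None` and can be routed to the later "by X%" and miscellaneous passes.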

In [14]:
hero_patch_notes = {}

# iterate over each change in the list
for line in s3_json_hero_patch_notes:
    # remove the initial dash and space
    if line.startswith('- '):
        line = line[2:]
    # split the line into hero and change parts using ':' as a delimiter
    try:
        hero, change = line.split(':', 1)
    except ValueError:
        # if the line doesn't contain a colon, skip it
        continue

    # trim any extra whitespace from hero and change
    hero = hero.strip()
    change = change.strip()
    
    # add the change to the list for the hero
    if hero not in hero_patch_notes:
        hero_patch_notes[hero] = []
    hero_patch_notes[hero].append(change)

print(json.dumps(hero_patch_notes, indent=4))

{
    "Abrams": [
        "Infernal Resilience increased from 11% to 12%"
    ],
    "Bebop": [
        "Uppercut air control lockout period reduced from 0.5s to 0.3s"
    ],
    "Dynamo": [
        "Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8",
        "Kinetic Pulse T3 now also adds +1 Charge"
    ],
    "Calico": [
        "Gloom Bombs now has updated impact SFX",
        "Gloom Bombs now has an arming effect for when they are about to detonate",
        "Leaping Slash only heals when hitting heroes",
        "Leaping Slash fixed VFX to match the damage area more accurately",
        "Leaping Slash updated to break breakables in the area",
        "Leaping Slash fixed a bug where calico's slash would deal no damage near walls",
        "Ava duration reduced from 20s to 15s",
        "Ava cooldown reduced from 50s to 45s",
        "Ava now gets slowed by 30% for 1s anytime she takes damage",
        "Ava speed reduced from 75% to 65%",
        "Ava T2 speed increase

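A grouped dictionary like the one above could be written back to S3 as a checkpoint. This is a minimal sketch, assuming a hypothetical helper named `upload_json`. `put_object` is the standard boto3 client call, and the client is passed in so the function can be tested without AWS credentials.

```python
import json


def upload_json(s3_client, bucket, key, payload):
    """Serialize payload as JSON and write it to s3://bucket/key."""
    body = json.dumps(payload, indent=4).encode('utf-8')
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType='application/json',
    )
    # return the number of bytes written, handy for a quick sanity check
    return len(body)
```

For example, `upload_json(s3, bucket_name, f'processed/{last_file}', hero_patch_notes)` would store the result under a `processed/` prefix. That prefix is an assumption, not something the bucket is known to use.
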
In [17]:
# the above output seems like a reasonable checkpoint to have since it is sorted by hero and change
# now I will filter this data so that I am initially just working with "from-to" changes

def filter_from_to_changes(hero_patch_notes):
    filtered = {}
    for hero, changes in hero_patch_notes.items():
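        # keep only changes where the word "from" is later followed by the word "to"
        # (word boundaries avoid matching substrings such as the "to" in "detonate")
        from_to_changes = [change for change in changes if re.search(r'\bfrom\b.*\bto\b', change, re.IGNORECASE)]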
        if from_to_changes:
            filtered[hero] = from_to_changes
    return filtered

filtered_hero_patch_notes_from_to = filter_from_to_changes(hero_patch_notes)

print(json.dumps(filtered_hero_patch_notes_from_to, indent=4))

{
    "Abrams": [
        "Infernal Resilience increased from 11% to 12%"
    ],
    "Bebop": [
        "Uppercut air control lockout period reduced from 0.5s to 0.3s"
    ],
    "Dynamo": [
        "Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8"
    ],
    "Calico": [
        "Ava duration reduced from 20s to 15s",
        "Ava cooldown reduced from 50s to 45s",
        "Ava speed reduced from 75% to 65%",
        "Ava T2 speed increased from +35% to +45%",
        "Return to Shadows cooldown increased from 80s to 90s"
    ],
    "Grey Talon": [
        "Spirit Snare cooldown reduced from 37s to 34s",
        "Spirit Snare T2 increased from +0.5s to +0.75s"
    ],
    "Haze": [
        "Bullet Dance T3 increased from +40% Evasion to +60%",
        "Bullet Dance T3 increased from +2 Dance Move Speed to +3"
    ],
    "Holliday": [
        "Powder Keg Charge Time increased from 1s to 2s",
        "Bounce Pad spirit scaling reduced from 0.9 to 0.4",
        "Spirit Lasso 