Load the data from S3 to confirm that it is accessible, then write code to further preprocess it and re-upload the results to S3.

The next step will be to use AWS Lambda to trigger a preprocessing script whenever a new JSON file is uploaded.
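Here is a minimal sketch of that trigger. It assumes the bucket sends `s3:ObjectCreated:*` notifications to the function. The names `keys_from_s3_event` and `lambda_handler` are illustrative, since the handler name is whatever the Lambda configuration points to. The preprocessing itself is left as a placeholder.

```python
import json
from urllib.parse import unquote_plus


def keys_from_s3_event(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # object keys arrive URL-encoded (spaces become '+'), so decode them first
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    # boto3 is imported here so keys_from_s3_event can be tested without AWS credentials
    import boto3

    s3 = boto3.client("s3")
    for bucket, key in keys_from_s3_event(event):
        # ignore anything that is not a patch-notes JSON file
        if not key.endswith(".json") or key.startswith(".ipynb"):
            continue
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        patch = json.loads(body)
        # placeholder: run this notebook's preprocessing on patch["hero_patch_notes"] here
        print(f"received {key} with {len(patch.get('hero_patch_notes', []))} hero notes")
    return {"statusCode": 200}
```

Keeping the event parsing in a separate function makes it easy to check against a hand-written sample event before deploying.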

In [1]:
import boto3
import pandas as pd
import json
import re
import os
from io import StringIO

# create an S3 client
s3 = boto3.client('s3')

# define the bucket that holds the scraped patch notes
bucket_name = 'deadlock-patch-notes-data-bucket'

# list objects in the bucket (list_objects_v2 returns at most 1,000 keys per call)
response = s3.list_objects_v2(Bucket=bucket_name)

# ignore Jupyter checkpoint files, then sort by filename so the newest date comes first
files = sorted(
    (obj for obj in response.get('Contents', []) if not obj['Key'].startswith('.ipynb')),
    key=lambda x: x['Key'],
    reverse=True,
)

# get the latest file
if files:
    last_file = files[0]['Key']
    print(f'opening file: {last_file}')

    # download and read file contents
    obj = s3.get_object(Bucket=bucket_name, Key=last_file)
    file_content = obj['Body'].read().decode('utf-8')

    # parse the content as json
    json_content = json.loads(file_content)

    # print the json content
    print(json.dumps(json_content, indent=4))  # formatted output
else:
    print(f'no files found in {bucket_name} bucket')

opening file: 2025-01-27_54590.json
{
    "poster": "Yoshi",
    "date": "2025-01-27",
    "time": "6:10 PM",
    "content": "- Abrams: Infernal Resilience increased from 11% to 12%\n- Bebop: Uppercut air control lockout period reduced from 0.5s to 0.3s\n- Dynamo: Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8\n- Dynamo: Kinetic Pulse T3 now also adds +1 Charge\n- Calico: Gloom Bombs now has updated impact SFX\n- Calico: Gloom Bombs now has an arming effect for when they are about to detonate\n- Calico: Leaping Slash only heals when hitting heroes\n- Calico: Leaping Slash fixed VFX to match the damage area more accurately\n- Calico: Leaping Slash updated to break breakables in the area\n- Calico: Leaping Slash fixed a bug where calico's slash would deal no damage near walls\n- Calico: Ava duration reduced from 20s to 15s\n- Calico: Ava cooldown reduced from 50s to 45s\n- Calico: Ava now gets slowed by 30% for 1s anytime she takes damage\n- Calico: Ava speed reduced from 

In [2]:
# extract the list of hero patch notes from the JSON content
s3_json_hero_patch_notes = json_content['hero_patch_notes']
s3_json_hero_patch_notes

['- Abrams: Infernal Resilience increased from 11% to 12%',
 '- Bebop: Uppercut air control lockout period reduced from 0.5s to 0.3s',
 '- Dynamo: Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8',
 '- Dynamo: Kinetic Pulse T3 now also adds +1 Charge',
 '- Calico: Gloom Bombs now has updated impact SFX',
 '- Calico: Gloom Bombs now has an arming effect for when they are about to detonate',
 '- Calico: Leaping Slash only heals when hitting heroes',
 '- Calico: Leaping Slash fixed VFX to match the damage area more accurately',
 '- Calico: Leaping Slash updated to break breakables in the area',
 "- Calico: Leaping Slash fixed a bug where calico's slash would deal no damage near walls",
 '- Calico: Ava duration reduced from 20s to 15s',
 '- Calico: Ava cooldown reduced from 50s to 45s',
 '- Calico: Ava now gets slowed by 30% for 1s anytime she takes damage',
 '- Calico: Ava speed reduced from 75% to 65%',
 '- Calico: Ava T2 speed increased from +35% to +45%',
 '- Calico: Ava n

My approach to preprocessing starts with building a dictionary where each key is a hero and each value is the list of changes to that hero. I will first handle "from X to Y" changes, since they follow a clear, recognizable pattern. Later, I plan to make incremental improvements to recognize additional patterns such as "by X%". After that, I will handle miscellaneous changes that mention no values but instead change an ability or how it interacts with other abilities and the game environment.
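As a possible starting point for the planned "by X%" handling, here is a hypothetical regex helper. The pattern and the name `extract_by_changes` are assumptions. The helper only captures a number directly after "by", plus an optional `%`, `s`, or `m` suffix. A phrase like "for 1s" is therefore ignored.

```python
import re

# hypothetical pattern: "by" followed by a signed number and an optional %, s, or m suffix
BY_PATTERN = re.compile(r"\bby\s+(?P<amount>[+-]?\d+(?:\.\d+)?)(?P<unit>%|s\b|m\b)?")


def extract_by_changes(change):
    """Return (amount, unit) tuples for every 'by X' phrase in a change string."""
    return [(m.group("amount"), m.group("unit") or "") for m in BY_PATTERN.finditer(change)]


print(extract_by_changes("Ava now gets slowed by 30% for 1s anytime she takes damage"))
# [('30', '%')]
```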

In [3]:
hero_patch_notes = {}

# iterate over each change in the list
for line in s3_json_hero_patch_notes:
    # remove the initial dash and space
    if line.startswith('- '):
        line = line[2:]
    # split the line into hero and change parts using ':' as a delimiter
    try:
        hero, change = line.split(':', 1)
    except ValueError:
        # if the line doesn't contain a colon, skip it
        continue

    # trim any extra whitespace from hero and change
    hero = hero.strip()
    change = change.strip()
    
    # add the change to the list for the hero
    if hero not in hero_patch_notes:
        hero_patch_notes[hero] = []
    hero_patch_notes[hero].append(change)

print(json.dumps(hero_patch_notes, indent=4))

{
    "Abrams": [
        "Infernal Resilience increased from 11% to 12%"
    ],
    "Bebop": [
        "Uppercut air control lockout period reduced from 0.5s to 0.3s"
    ],
    "Dynamo": [
        "Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8",
        "Kinetic Pulse T3 now also adds +1 Charge"
    ],
    "Calico": [
        "Gloom Bombs now has updated impact SFX",
        "Gloom Bombs now has an arming effect for when they are about to detonate",
        "Leaping Slash only heals when hitting heroes",
        "Leaping Slash fixed VFX to match the damage area more accurately",
        "Leaping Slash updated to break breakables in the area",
        "Leaping Slash fixed a bug where calico's slash would deal no damage near walls",
        "Ava duration reduced from 20s to 15s",
        "Ava cooldown reduced from 50s to 45s",
        "Ava now gets slowed by 30% for 1s anytime she takes damage",
        "Ava speed reduced from 75% to 65%",
        "Ava T2 speed increase

In [4]:
# the above output is a reasonable checkpoint since the changes are now grouped by hero
# now I will filter this data so that I am initially just working with "from-to" changes

def filter_from_to_changes(hero_patch_notes):
    filtered = {}
    for hero, changes in hero_patch_notes.items():
        # keep only changes that contain both "from" and "to"
        from_to_changes = [change for change in changes if "from" in change.lower() and "to" in change.lower()]
        if from_to_changes:
            filtered[hero] = from_to_changes
    return filtered

filtered_hero_patch_notes_from_to = filter_from_to_changes(hero_patch_notes)

print(json.dumps(filtered_hero_patch_notes_from_to, indent=4))

{
    "Abrams": [
        "Infernal Resilience increased from 11% to 12%"
    ],
    "Bebop": [
        "Uppercut air control lockout period reduced from 0.5s to 0.3s"
    ],
    "Dynamo": [
        "Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8"
    ],
    "Calico": [
        "Ava duration reduced from 20s to 15s",
        "Ava cooldown reduced from 50s to 45s",
        "Ava speed reduced from 75% to 65%",
        "Ava T2 speed increased from +35% to +45%",
        "Return to Shadows cooldown increased from 80s to 90s"
    ],
    "Grey Talon": [
        "Spirit Snare cooldown reduced from 37s to 34s",
        "Spirit Snare T2 increased from +0.5s to +0.75s"
    ],
    "Haze": [
        "Bullet Dance T3 increased from +40% Evasion to +60%",
        "Bullet Dance T3 increased from +2 Dance Move Speed to +3"
    ],
    "Holliday": [
        "Powder Keg Charge Time increased from 1s to 2s",
        "Bounce Pad spirit scaling reduced from 0.9 to 0.4",
        "Spirit Lasso 
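The filter above checks `"to" in change.lower()`, which also matches "to" inside other words such as "storm" or "custom". A stricter variant could require whole words and require "from" to come before "to". The function name `filter_from_to_changes_strict` is hypothetical, and this is only a sketch.

```python
import re

# whole-word "from ... to ..." check, case-insensitive
FROM_TO = re.compile(r"\bfrom\b.+?\bto\b", re.IGNORECASE)


def filter_from_to_changes_strict(hero_patch_notes):
    """Keep only changes containing the words 'from' and later 'to'."""
    filtered = {}
    for hero, changes in hero_patch_notes.items():
        from_to_changes = [change for change in changes if FROM_TO.search(change)]
        if from_to_changes:
            filtered[hero] = from_to_changes
    return filtered
```

For example, "Return to Shadows cooldown increased from 80s to 90s" is still kept. A made-up note like "Damage from storm increased" would pass the substring check but is rejected here.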

I looked for an existing API that provides hero and ability data, but I could not find one online. I also searched the game files installed on my computer for anything usable, but none of them contained the information I needed. So I opened the game and began building a JSON file by hand. There are 26 playable heroes as of February 2, 2025, so I will start with my main hero, Dynamo, and add more heroes once I have a working prototype.

Previously, I created a heroes.txt file containing only the names of all playable heroes. After building the new JSON file with abilities, I realized that the text file is unnecessary, because the hero names are already included in the more comprehensive JSON file.
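Later cells read this file as `data['Heroes'][<hero>]['Abilities']`, a list of objects with a `name` field. Below is a minimal sketch of that assumed shape, using the ability names printed later in this notebook. Any other fields the real heroes.json contains are not shown. Writing to a temporary directory is only for illustration.

```python
import json
import os
import tempfile

# assumed minimal shape of heroes.json, inferred from how the later cells read it
DYNAMO_ABILITIES = ["Kinetic Pulse", "Quantum Entanglement", "Rejuvenating Aurora", "Singularity"]
HAZE_ABILITIES = ["Sleep Dagger", "Smoke Bomb", "Fixation", "Bullet Dance"]
heroes = {
    "Heroes": {
        "Dynamo": {"Abilities": [{"name": name} for name in DYNAMO_ABILITIES]},
        "Haze": {"Abilities": [{"name": name} for name in HAZE_ABILITIES]},
    }
}


def ability_names(data):
    """Flatten every hero's ability names into one list, in file order."""
    return [
        ability["name"]
        for hero_data in data["Heroes"].values()
        for ability in hero_data.get("Abilities", [])
        if ability.get("name")
    ]


# round-trip through a file to mirror how the notebook loads heroes.json
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "heroes.json")
    with open(path, "w") as file:
        json.dump(heroes, file, indent=4)
    with open(path) as file:
        loaded = json.load(file)

print(list(loaded["Heroes"].keys()))  # ['Dynamo', 'Haze']
```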

In [5]:
# read in the heroes from heroes.json
# only Dynamo and Haze are in there for initial development and prototyping

heroes_json_folder_path = ''
heroes_json_file = 'heroes.json'

with open(os.path.join(heroes_json_folder_path, heroes_json_file), 'r') as file:
    data = json.load(file)

heroes_subset = list(data['Heroes'].keys())
heroes_subset

['Dynamo', 'Haze']

In [6]:
# use dictionary comprehension to filter the data further for the subset of heroes
filtered_hero_subset_patch_notes_from_to = {hero: filtered_hero_patch_notes_from_to[hero] for hero in heroes_subset if hero in filtered_hero_patch_notes_from_to}

filtered_hero_subset_patch_notes_from_to

{'Dynamo': ['Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8'],
 'Haze': ['Bullet Dance T3 increased from +40% Evasion to +60%',
  'Bullet Dance T3 increased from +2 Dance Move Speed to +3']}

The hero changes follow a fairly consistent structure. <br>
The general template appears to be ***'\<hero\>: \<ability\> \<change verb\> from \<old value\>\<units\> to \<new value\>\<units\>'***

I do not know all of the "change verbs", so I will extract them from the patch notes. My strategy is to use regex to capture the word directly before "from". For example: <br>
- "Powder Keg Charge Time increased from 1s to 2s",<br>
- "Bounce Pad spirit scaling reduced from 0.9 to 0.4",<br>
- "Spirit Lasso duration reduced from 2.5s to 2.25s"<br>
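The "word directly before from" strategy can be sketched with one regex and a `Counter`, applied here to the three example notes above. The pattern is an assumption. It returns whatever word precedes "from", so a phrase like "moved away from" would yield "away".

```python
import re
from collections import Counter

# capture the word directly before "from"
VERB_BEFORE_FROM = re.compile(r"(\w+)\s+from\b")

notes = [
    "Powder Keg Charge Time increased from 1s to 2s",
    "Bounce Pad spirit scaling reduced from 0.9 to 0.4",
    "Spirit Lasso duration reduced from 2.5s to 2.25s",
]

verbs = Counter(
    match.group(1).lower() for note in notes for match in VERB_BEFORE_FROM.finditer(note)
)
print(verbs.most_common())  # [('reduced', 2), ('increased', 1)]
```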

For these, I would expect the change verbs to be "increased", "reduced", and "reduced". I could assume that the only change verbs are "increased" and "reduced", but it is better practice to know all of them. Alternatively, I may not need the change verb at all, because I can infer whether a value went up or down from the sign of the delta between X and Y (i.e., "from X to Y").
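The delta-based alternative could look like the sketch below. It takes the first signed number from each value string, such as "+40%" or "2.25s", and compares them. The helper name `change_direction` is hypothetical. The helper assumes each value contains a number, which holds for values captured by the from/to regex.

```python
import re

# first signed integer or decimal in a value string such as "+40%" or "2.25s"
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def change_direction(old_value, new_value):
    """Infer the direction of a 'from X to Y' change from the sign of Y - X."""
    old = float(NUMBER.search(old_value).group())
    new = float(NUMBER.search(new_value).group())
    delta = new - old
    if delta > 0:
        return "increased"
    if delta < 0:
        return "reduced"
    return "unchanged"


print(change_direction("+2", "+3"))    # increased
print(change_direction("0.9", "0.4"))  # reduced
```

The sign only gives the numeric direction. A longer cooldown is a nerf even though the value increased, so classifying buffs and nerfs would still depend on the stat being changed.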

In [7]:
# next: for each patch note, extract the "old value" and "new value" into a data frame (this cell only displays the input)
filtered_hero_subset_patch_notes_from_to

{'Dynamo': ['Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8'],
 'Haze': ['Bullet Dance T3 increased from +40% Evasion to +60%',
  'Bullet Dance T3 increased from +2 Dance Move Speed to +3']}

In [8]:
# collect the predefined ability names so the regex only matches known abilities

# create an empty list to store abilities
all_abilities = []

# loop over each hero in the json data
for hero, hero_data in data["Heroes"].items():
    # get the abilities list for the current hero
    hero_specific_abilities = hero_data.get("Abilities", [])
    # loop over each ability and add its name to the list
    for hero_specific_ability in hero_specific_abilities:
        ability_name = hero_specific_ability.get("name")
        if ability_name:
            all_abilities.append(ability_name)

all_abilities

['Kinetic Pulse',
 'Quantum Entanglement',
 'Rejuvenating Aurora',
 'Singularity',
 'Sleep Dagger',
 'Smoke Bomb',
 'Fixation',
 'Bullet Dance']

The next steps are to turn the code above into a function that builds a DataFrame row for each patch note, to extract the hero-subset data from every patch note, to schedule AWS Lambda to process new files, and to create some data visualizations.

Another step is to extract the specific fields we want from each note: ability, description, change verb, old and new values, and units.
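Processing every patch note, rather than only the latest file, first requires the full list of keys. The sketch below assumes the same bucket layout. `list_patch_note_keys` is a hypothetical helper. It uses boto3's `list_objects_v2` paginator because a single `list_objects_v2` call returns at most 1,000 keys.

```python
def list_patch_note_keys(s3_client, bucket):
    """Return all patch-note JSON keys in the bucket, newest filename first."""
    # page through the listing instead of relying on one list_objects_v2 call
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # skip Jupyter checkpoints and any non-JSON objects
            if key.endswith(".json") and not key.startswith(".ipynb"):
                keys.append(key)
    return sorted(keys, reverse=True)


# usage (not run here):
# import boto3
# s3 = boto3.client("s3")
# for key in list_patch_note_keys(s3, "deadlock-patch-notes-data-bucket"):
#     ...  # download and preprocess each file
```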

In [9]:
# join abilities into a regex pattern
ability_pattern = '|'.join(re.escape(ability) for ability in all_abilities)

# regex to extract the description, values, and units
pattern = re.compile(
    r"(?P<ability>(" + ability_pattern + r"))\s+" +    # match the ability name from the list
    r"(?P<description>.*?)\s+" +                       # capture description up to the change verb
    r"(?P<change_type>increased|decreased|reduced)\s+" +  # match the change verb (the notes shown use "reduced")
    r"from\s+" +
    r"(?P<old_value>[+-]?\d+(?:\.\d+)?)" +             # old value
    r"(?P<old_unit>(?:\s+(?!to\b)[A-Za-z%][\w\s%]*)?)" +  # old unit (words, %, spaces) that is not "to"
    r"\s+to\s+" +
    r"(?P<new_value>[+-]?\d+(?:\.\d+)?)" +             # new value
    r"(?P<new_unit>(?:\s+[A-Za-z%][\w\s%]*)?)"         # new unit
)

# example patch note (replace with your actual input)
example_patch_note_from_to = "Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8"
print(f"Original patch note: {example_patch_note_from_to}")

# search for the match in the patch note
match = pattern.search(example_patch_note_from_to)
if match:
    # extract ability, description, change type, old and new values, and units
    ability = match.group("ability")
    description = match.group("description").strip()
    change_type = match.group("change_type")
    old_value = match.group("old_value")
    old_unit = match.group("old_unit").strip() or "n/a"
    new_value = match.group("new_value")
    new_unit = match.group("new_unit").strip() or "n/a"

    # if the units differ, use the longer unit (lowercased) for both
    if old_unit != new_unit:
        if old_unit == "n/a":
            old_unit = new_unit = new_unit.lower()
        elif new_unit == "n/a":
            old_unit = new_unit = old_unit.lower()
        elif len(old_unit) > len(new_unit):
            old_unit = new_unit = old_unit.lower()
        elif len(new_unit) > len(old_unit):
            old_unit = new_unit = new_unit.lower()

    # output extracted information
    print(f"\nPatch note: {example_patch_note_from_to}")
    print(f"Ability: {ability}")
    print(f"Description: {description}")
    print(f"Change type: {change_type}")
    print(f"Old value: {old_value}, Old unit: {old_unit}")
    print(f"New value: {new_value}, New unit: {new_unit}")
else:
    print("No match found.")

Original patch note: Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8

Patch note: Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8
Ability: Kinetic Pulse
Description: damage spirit scaling
Change type: increased
Old value: 1.4, Old unit: n/a
New value: 1.8, New unit: n/a


In [10]:
patch_date = json_content['date']

In [11]:
hero = 'Dynamo'
# use a new name so the heroes.json contents stored in `data` are not overwritten
rows = []
rows.append([hero, ability, description, old_value, new_value, new_unit, patch_date])
rows

[['Dynamo',
  'Kinetic Pulse',
  'damage spirit scaling',
  '1.4',
  '1.8',
  'n/a',
  '2025-01-27']]

In [12]:
columns = ['hero', 'ability', 'description', 'old_value', 'new_value', 'new_unit', 'patch_date']

In [13]:
# create the example dataframe
df_dynamo = pd.DataFrame(rows, columns=columns)
df_dynamo

|   | hero | ability | description | old_value | new_value | new_unit | patch_date |
|---|---|---|---|---|---|---|---|
| 0 | Dynamo | Kinetic Pulse | damage spirit scaling | 1.4 | 1.8 | n/a | 2025-01-27 |


In [14]:
with open(os.path.join(heroes_json_folder_path, heroes_json_file), 'r') as file:
    data = json.load(file)

heroes_subset = list(data['Heroes'].keys())
heroes_subset

# use dictionary comprehension to filter the data further for the subset of heroes
filtered_hero_subset_patch_notes_from_to = {hero: filtered_hero_patch_notes_from_to[hero] for hero in heroes_subset if hero in filtered_hero_patch_notes_from_to}

filtered_hero_subset_patch_notes_from_to

{'Dynamo': ['Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8'],
 'Haze': ['Bullet Dance T3 increased from +40% Evasion to +60%',
  'Bullet Dance T3 increased from +2 Dance Move Speed to +3']}

In [16]:
heroes_json_folder_path = ''
heroes_json_filename = 'heroes.json'
json_patch_notes_folder_path = 'json-patch-notes'

In [17]:
def process_patch_notes(heroes, hero_patch_notes):
    # note: relies on the notebook globals `all_abilities` (In [8]) and `patch_date` (In [10])
    data = []

    # join abilities into a regex pattern, escaping any special characters
    ability_pattern = '|'.join(re.escape(ability) for ability in all_abilities)

    # regex to extract the description, values, and units
    pattern = re.compile(
        r"(?P<ability>(" + ability_pattern + r"))\s+" +    # match the ability name from the list
        r"(?P<description>.*?)\s+" +                       # capture description up to the change verb
        r"(?P<change_type>increased|decreased|reduced)\s+" +  # match the change verb
        r"from\s+" +
        r"(?P<old_value>[+-]?\d+(?:\.\d+)?%?)" +           # old value (allowing optional percentage sign)
        r"(?P<old_unit>(?:\s+(?!to\b)[A-Za-z%][\w\s%]*)?)" +  # old unit (words, %, spaces) that is not "to"
        r"\s+to\s+" +
        r"(?P<new_value>[+-]?\d+(?:\.\d+)?%?)" +           # new value (allowing optional percentage sign)
        r"(?P<new_unit>(?:\s+[A-Za-z%][\w\s%]*)?)"         # new unit
    )

    # loop through each hero and their list of changes, skipping heroes outside the subset
    for hero, patch_notes_list in hero_patch_notes.items():
        if hero not in heroes:
            continue

        for patch_note in patch_notes_list:
            # only "from X to Y" notes are handled for now
            if "from" not in patch_note.lower() or "to" not in patch_note.lower():
                continue
            print(hero, " ", patch_note)

            # search for the match in the patch note
            match = pattern.search(patch_note)
            if match:
                # extract relevant information
                ability = match.group("ability")
                description = match.group("description").strip()
                old_value = match.group("old_value")  # keep the percentage if present
                new_value = match.group("new_value")  # keep the percentage if present
                new_unit = match.group("new_unit").strip()

                # append the data row
                data.append([hero, ability, description, old_value, new_value, new_unit, patch_date])

    # convert to DataFrame
    columns = ['hero', 'ability', 'description', 'old_value', 'new_value', 'new_unit', 'patch_date']
    return pd.DataFrame(data, columns=columns)

In [18]:
process_patch_notes(heroes_subset, hero_patch_notes)

Dynamo   Kinetic Pulse damage spirit scaling increased from 1.4 to 1.8
Haze   Bullet Dance T3 increased from +40% Evasion to +60%
Haze   Bullet Dance T3 increased from +2 Dance Move Speed to +3


|   | hero | ability | description | old_value | new_value | new_unit | patch_date |
|---|---|---|---|---|---|---|---|
| 0 | Dynamo | Kinetic Pulse | damage spirit scaling | 1.4 | 1.8 |  | 2025-01-27 |
| 1 | Haze | Bullet Dance | T3 | +40% | +60% |  | 2025-01-27 |
| 2 | Haze | Bullet Dance | T3 | +2 | +3 |  | 2025-01-27 |


In [19]:
# create an S3 client
s3 = boto3.client('s3')

# list objects in the bucket
response = s3.list_objects_v2(Bucket=bucket_name)

# sort files by filename (newest date first)
files = sorted(response.get('Contents', []), key=lambda x: x['Key'], reverse=True)

def read_s3_json(file):
    """Download one patch-notes JSON file from S3 and return its hero_patch_notes list."""
    # skip Jupyter checkpoint files
    if file.startswith('.ipynb'):
        print(f'file starts with .ipynb, skipping: {file}')
        return None

    print(f'opening file: {file}')

    # download and read the requested file (not the global last_file)
    obj = s3.get_object(Bucket=bucket_name, Key=file)
    file_content = obj['Body'].read().decode('utf-8')

    # parse the content as json
    json_content = json.loads(file_content)

    return json_content['hero_patch_notes']
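The notebook's stated goal is to re-upload processed data to S3. The sketch below writes a DataFrame to an in-memory CSV and uploads it with `put_object`. The `processed/` prefix and both helper names are assumptions, not an existing convention in this bucket. If the Lambda trigger is attached to the same bucket, restrict it to the `.json` suffix or use a separate bucket, so the CSV upload does not retrigger it.

```python
import posixpath
from io import StringIO


def processed_key_for(source_key, prefix="processed/"):
    """Map e.g. '2025-01-27_54590.json' to 'processed/2025-01-27_54590.csv' (hypothetical layout)."""
    stem, _ = posixpath.splitext(posixpath.basename(source_key))
    return f"{prefix}{stem}.csv"


def upload_dataframe_as_csv(s3_client, bucket, key, df):
    """Serialize a DataFrame to CSV in memory and upload it with put_object."""
    buffer = StringIO()
    df.to_csv(buffer, index=False)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=buffer.getvalue().encode("utf-8"),
        ContentType="text/csv",
    )
    return key


# usage in this notebook (not run here):
# df = process_patch_notes(heroes_subset, hero_patch_notes)
# upload_dataframe_as_csv(s3, bucket_name, processed_key_for(last_file), df)
```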