# Converting Slippi to a DataFrame: The Metadata
## Table of Contents

1. [Problem Statement](#problem)
2. [Context](#context)<br>
    a. [Terminology](#terms)<br>
    b. [Why Super Smash Bros.?](#why)<br>
    c. [How does Super Smash Bros. Work?](#how)<br>
3. [Executive Summary](#executive)<br>
    a. [Data Gathering](#gather)<br>
    b. [Parsing Data](#parse)<br>
    c. [Modeling](#modeling)<br>
    d. [Limitations](#limitdoesnotexist)<br>
4. [Importing Libraries](#imports)
5. [Retrieving Filepaths of Each Slippi Game](#filepaths)
6. [Data Normalization: First Normal Form](#engineer)
7. [Reading in a Slippi File](#readslp)
8. [Extracting Metadata From Games](#extract)<br>
    a. [Date](#date)<br>
    b. [Duration](#duration)<br>
    c. [Platform](#platform)<br>

<a id='problem'></a>
## Problem Statement

Note: For those unfamiliar with Super Smash Bros., I encourage you to read the [context](#context) section.

Fox versus Falco is a frequent matchup in competitive Melee, and whether Fox or Falco has the upper hand is a popular debate among players. I believe that among today's top-ranked players in the world, Fox will most likely win because he is faster than Falco. However, Falco is more likely to win against newer players because he can use different tools to perform combos that lead to Fox losing a stock.

Since Falco is often one of the first major hurdles for new players to defeat, I hope to construct a coaching tool that will help them overcome this. The final product will use a collection of Slippi games as input and provide summary statistics on the games. As a coaching tool, it will let a player feed in a collection of Slippi games to learn how a specified player and character plays, and then create an AI that someone can battle. This is useful for those who wish to practice against a live player but do not have immediate access to others. The idea is similar to how Super Smash Bros. for Wii U utilized [Amiibos](https://www.youtube.com/watch?v=uOnLcVOvrEE). As a quick check to see how feasible this project is, I will train a recurrent neural network on a series of games to see how well the network can learn Fox's wake-up behavior.
<img src="../images/melee-wallpaper.jpg" alt="Drawing" style="width: 600px;"/>

<a id='context'></a>
## Context
<a id='why'></a>
### Why Super Smash Bros.?

Super Smash Bros. is a video game series published by Nintendo where video game characters from different franchises battle it out. The second game of the series, Super Smash Bros. Melee, was released in December 2001. This sparked fun parties, happy players, and heated rivalries. These rivalries grew from friend groups, to local neighborhood challenges, to large-scale tournaments by 2002. If you are interested in learning the history of Super Smash Bros. Melee's competitive scene, I encourage you to watch a docuseries called [The Smash Brothers](https://www.youtube.com/watch?v=NSf2mgkRm7Q&list=PLoUHkRwnRH-IXbZfwlgiEN8eXmoj6DtKM) produced by East Point Pictures.

Many players within this community are dedicated to becoming the best player that they can be. There are many fan-made tools and mods to the game for the purpose of either improving the player experience in training or improving the production of content. Some tools allow players to create save states so that they can easily practice a scenario quickly. Others allow content creators to create replays for the audience to enjoy. Today, I would like to highlight one of these tools, Project Slippi.
- [Website](https://slippi.gg/)
- [Github](https://github.com/project-slippi/project-slippi)
- [Medium](https://medium.com/project-slippi)

<a id='how'></a>
### How Does Super Smash Bros. Work?

In this project, I will only be looking at 1v1 tournament-legal matches. The criteria for that are:
- The match must be 1v1 with no other players or CPUs in the match.
- The match must be on a tournament legal stage.
- The match must be 8 minutes or less.

In a 1v1 match, players must defeat their opponent with nothing more than their own abilities as a player and the abilities of their selected character. Characters have various moves such as jabs, smash attacks, tilt attacks, special attacks, grabs, and aerial attacks. Most of these attacks have slight variations depending on the direction in which they are performed. When a player strikes their opponent with an attack, the opponent's damage percentage goes up.
<img src="../images/damage-example.gif"/> <center>Fox Damaging Falco</center>

As a character's damage increases, the distance they are launched after a hit also increases. This benefits the opponent: the farther the character travels with each hit, the easier it should be to push them through a blast zone and make them lose a stock. Once a character loses all their stocks, the other player is declared the winner. If a timeout occurs, the player with the most stocks and least damage wins. If those are tied as well, a rematch is played with each character getting one stock.

<img src="../images/GAME.gif"/> <center>Jigglypuff Defeating Fox's Last Stock; Winning the Game</center>

### A Primer on Fighting Games

Fighting games are one of the many genres of video games. Some fighting game titles that may sound familiar are Street Fighter and Tekken. These games differ in that Street Fighter is a 2-dimensional fighter where characters can only move left or right, while Tekken characters can move forward, backward, left, or right. While all of these games are mechanically different, they share similar strategies that a player can utilize. For example, players can play as a heavy brute that packs a punch but is slow, or as a glass cannon that attacks with lightning-fast speed but is easy to kill once its momentum is broken.

The idea of mixups is another common thread among fighting games. One of the most frequent mixup situations occurs when a character is lying on the ground. Usually, the character has one of four options, known as get-up options or wake-up options: stay on the ground, get up, roll forward, or roll back. The opponent must be able to either predict or quickly react to whichever option the player takes and act accordingly. If the opponent does not capitalize on the player's vulnerability, then the player has a chance to fight back and win.

In Melee, when a character is launched towards the floor, a wall, or the ceiling, they have the option to tech. A player who is hit must decide whether to tech in place, tech-roll left, or tech-roll right; if they miss the 20-frame window, they do not tech at all. This leaves them vulnerable on the ground with few options to retaliate. However, as stated before, this is a mixup. If a player frequently techs to the right, their opponent will become accustomed to that behavior. So whenever the opponent sees that the player has the option to tech, they will assume the player will tech right and adjust the combo. If the player suddenly techs in place and the opponent did not expect that, then the player has recovered safely and is free to move. This process is known as tech-chasing: the opponent is tech-chasing the player.
![tech-chasing](../images/tech-chase.gif) <center>Captain Falcon tech-chasing Fox McCloud</center>



<a id='executive'></a>
## Executive Summary
<a id = 'gather'></a>
### Data Gathering
To begin, I considered writing a script that would use the `selenium` library to scrape [Slippi's site](https://slippi.gg) for games that occurred during a tournament. I opted not to do this because the official Slippi Discord channel has the `!replaydumps` chat command, which provides a download link to Slippi files from [Fight Pitt 9](https://smash.gg/tournament/fight-pitt-9-1/details), [Full Bloom 5](https://smash.gg/tournament/full-bloom-5/details), [The Gang Steals the Script](https://smash.gg/tournament/the-gang-steals-the-script/details), and [Pound 2019](https://smash.gg/tournament/pound-2019/details). The data is hosted on a different platform, but both are controlled by the creators and major contributors to the project, such as [Fizzi](https://twitter.com/Fizzi36).

<a id='parse'></a>
### Parsing Data
With the data in hand, I used the `slippi` library to read in each file as Slippi's own Game object. Game objects have attributes whose values are sometimes other objects. Since each Slippi file has the same structure, I created a function `metadata_to_df` to parse the metadata of each game. The objective here is to filter the games as needed. For example, I want 1v1 games, so I can filter for games where the team battle option is set to off.
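
As a rough illustration of that filtering step, the sketch below applies the three tournament-legal criteria from the Context section to plain dictionaries shaped like rows of the metadata. The `LEGAL_STAGES` set, the `MAX_FRAMES` cutoff, the `n_players` field, and the helper `is_tournament_legal` are all assumptions made for this example; none of them come from the `slippi` library, and the real list of legal stages depends on the tournament ruleset.

```python
# Hypothetical sketch: keep only games that look like 1v1 tournament-legal matches.
FRAMES_PER_SECOND = 60
MAX_FRAMES = 8 * 60 * FRAMES_PER_SECOND  # 8 minutes expressed in frames (assumed cutoff)

# Assumed set of legal stage names; adjust to the ruleset in use.
LEGAL_STAGES = {
    'BATTLEFIELD', 'DREAM_LAND_N64', 'FINAL_DESTINATION',
    'FOUNTAIN_OF_DREAMS', 'POKEMON_STADIUM', 'YOSHIS_STORY',
}

def is_tournament_legal(row):
    """Return True if a metadata row looks like a 1v1 tournament-legal game."""
    return (
        not row['is_teams']                  # team battle is off
        and row['n_players'] == 2            # exactly two occupied ports
        and row['stage'] in LEGAL_STAGES     # played on a legal stage
        and row['duration'] <= MAX_FRAMES    # no longer than 8 minutes
    )

# Made-up rows for illustration only.
games = [
    {'game_id': 'a', 'is_teams': False, 'n_players': 2, 'stage': 'FINAL_DESTINATION', 'duration': 11653},
    {'game_id': 'b', 'is_teams': True, 'n_players': 4, 'stage': 'POKEMON_STADIUM', 'duration': 7577},
    {'game_id': 'c', 'is_teams': False, 'n_players': 2, 'stage': 'FINAL_DESTINATION', 'duration': 30000},
]
legal_ids = [g['game_id'] for g in games if is_tournament_legal(g)]
print(legal_ids)  # ['a']
```

The frame cutoff is only an approximation, since the recorded duration may include a few frames outside the in-game timer.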

<a id = 'modeling'></a>
### Modeling
Once I have filtered the games I will use as input, I will then create another function to parse the Frame objects of each game. Each Frame object contains information about a single frame of the game, such as character position and controller inputs. These values will be fed to a recurrent neural network (RNN). The RNN will have a simple topology of a single hidden layer. This is because I am interested in seeing how well the RNN can learn the player's behavior with the least amount of complexity possible. If the results are not much better than the baseline accuracy score, I would like to increase the complexity of the topology, but cannot due to the limitations below.
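
For reference, a baseline accuracy is commonly taken to be the accuracy of always predicting the most frequent class. Whether that is exactly the baseline used in this project is an assumption. The sketch below computes it for a made-up list of wake-up labels; the label names and counts are hypothetical and only illustrate the calculation.

```python
from collections import Counter

def baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical wake-up options observed for one player.
labels = ['tech_right'] * 5 + ['tech_in_place'] * 3 + ['no_tech'] * 2
print(baseline_accuracy(labels))  # 0.5
```

A model would need to beat this number to show it has learned more than the player's single favorite option.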

<a id = 'limitdoesnotexist'></a>
### Limitations
tl;dr Power and money.

When parsing the metadata and frame data from each game, it took a considerable amount of time to execute for all Fight Pitt 9 games. This was a concern because Fight Pitt 9 contained the fewest games of all the tournaments. This lack of computing power encouraged me to use only Fight Pitt 9 games.

Since every frame of a game contains multiple features that can be used as inputs to the RNN, there is a lot of data to deal with. If a single game lasts for a minute, then that is 3600 frames (60 seconds * 60 frames per second). Each frame contains 333 features after cleaning and dummying. In short, a lot of power is needed to be able to fit the neural network quickly.
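
To put rough numbers on this, the sketch below repeats the arithmetic for a one-minute game and for a full eight-minute game. The 60 frames per second and 333 features per frame come from the text above; the 4 bytes per value (32-bit floats) is an assumption made only to estimate memory.

```python
FRAMES_PER_SECOND = 60
FEATURES_PER_FRAME = 333  # after cleaning and dummying, per the text above

def values_per_game(seconds):
    """Number of scalar inputs produced by a game of the given length."""
    return seconds * FRAMES_PER_SECOND * FEATURES_PER_FRAME

one_minute = values_per_game(60)
eight_minutes = values_per_game(8 * 60)
print(one_minute)      # 1198800
print(eight_minutes)   # 9590400

# Assuming 4 bytes per value (e.g. 32-bit floats), size in megabytes:
megabytes = eight_minutes * 4 / 1e6
print(round(megabytes, 1))  # 38.4
```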

I could use AWS cloud computing to perform the task, but the machines that were noticeably stronger than my own cost too much for me at the moment.

<a id = 'imports'></a>
## Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import slippi as slp
import os

<a id = 'filepaths'></a>
## Retrieving Filepaths of Each Slippi Game
Create a list that contains each file path to all games within the provided directory<sub>[1](https://kite.com/python/examples/4286/os-get-the-path-of-all-files-in-a-directory)</sub>.

In [2]:
# Directory to take filepaths from
dir_fp9 = '../../data/Fight-Pitt-9/'

# lists of file paths to Slippi files
fight_pitt_9, full_bloom_5, gang = [], [], []

# adding each file to their respective list
# Fight Pitt 9
for path in os.listdir(dir_fp9):
    full_path = os.path.join(dir_fp9, path)
    if os.path.isfile(full_path):
        fight_pitt_9.append(full_path)

According to Finder, there are 1,151 items in the Fight Pitt 9 directory.

In [3]:
print(f'# of Fight Pitt 9 filepaths: {len(fight_pitt_9)}')
print('Expecting 1151')
fight_pitt_9[:5]

# of Fight Pitt 9 filepaths: 1151
Expecting 1151


['../../data/Fight-Pitt-9/Game_20190406T182021.slp',
 '../../data/Fight-Pitt-9/Game_20190406T054329.slp',
 '../../data/Fight-Pitt-9/Game_20190406T113710.slp',
 '../../data/Fight-Pitt-9/Game_20190406T060932.slp',
 '../../data/Fight-Pitt-9/Game_20190406T063208.slp']

In [4]:
for thing in fight_pitt_9:
    if 'copy' in thing:
        print(thing)

../../data/Fight-Pitt-9/Game_20190406T165651 copy.slp
../../data/Fight-Pitt-9/Game_20190406T173112 copy.slp
../../data/Fight-Pitt-9/Game_20190406T113216 copy.slp
../../data/Fight-Pitt-9/Game_20190406T194233 copy.slp
../../data/Fight-Pitt-9/Game_20190406T104203 copy.slp
../../data/Fight-Pitt-9/Game_20190406T192258 copy.slp
../../data/Fight-Pitt-9/Game_20190406T104708 copy.slp
../../data/Fight-Pitt-9/Game_20190406T122825 copy.slp


<a id = 'engineer'></a>
### Data Normalization: 1st Normal Form<sub>[2](https://www.guru99.com/database-normalization.html#2)</sub>
You may have noticed that some filenames contain ` copy` before the file extension if you looked at more game files within the list. This occurred because the main directory that contained all Slippi files for a particular tournament grouped the files according to the station (i.e. console) at which the game was played.

In other words, the main directory contained sub-directories that held the games. In order to get all games in one folder, regardless of station, I moved all Slippi files into the main directory and deleted the sub-directories once empty. Since two stations can start a game within the same second, a `game_id` can be shared between multiple games. Currently, the filename of each Slippi file is used as the index in the metadata dataframe. This creates an anomaly in which an index that should represent a single game may represent two games.

In future versions, I will adjust [cell 2](#cell2) to be able to retrieve the filepaths of Slippi files within all sub-directories and then make the index of the metadata dataframe the filename of the Slippi file as well as the station ID.
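
One possible way to do this, sketched under the assumption that each station has its own sub-directory under the tournament folder, is to walk the tree with `os.walk` and key each game by both the station folder and the filename. The helper `collect_slp_paths` and the folder layout are hypothetical.

```python
import os

def collect_slp_paths(root):
    """Map (station, filename) -> full path for every .slp file under root."""
    games = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        # The station is taken to be the sub-directory path relative to root.
        station = os.path.relpath(dirpath, root)
        for name in filenames:
            if name.endswith('.slp'):
                games[(station, name)] = os.path.join(dirpath, name)
    return games
```

With this layout, two games that started in the same second on different stations keep distinct keys, so no ` copy` suffix is needed.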

These anomalies are not addressed at the moment because I will only be using games that are Fox vs. Falco on Final Destination. Of all games that fit that criteria, none of them share a filename.
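
A quick way to check a claim like this is to count how often each ID appears once the ` copy` suffix is removed. The sketch below does so with `collections.Counter` on a hypothetical list of IDs; `find_shared_ids` is an illustrative helper, not part of this notebook.

```python
from collections import Counter

def find_shared_ids(game_ids):
    """Return the IDs that appear more than once after dropping ' copy'."""
    normalized = [game_id.replace(' copy', '') for game_id in game_ids]
    counts = Counter(normalized)
    return sorted(game_id for game_id, n in counts.items() if n > 1)

# Hypothetical IDs: the first two share a start time.
ids = ['20190406T165651', '20190406T165651 copy', '20190406T054329']
print(find_shared_ids(ids))  # ['20190406T165651']
```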

<a id='readslp'></a>
## Reading in a Slippi File

In [5]:
# A single game from Fight Pitt 9:
game = slp.Game(fight_pitt_9[0])

<a id = 'extract'></a>
## Extracting Metadata From Games
<a id = 'date'></a>
### Date
When reading in a list of Slippi files, a list comprehension will be used to iterate through each game.

In [6]:
game.metadata

Metadata(date=2019-04-06 18:20:21+00:00, duration=11653, platform=Platform.NINTENDONT, players=(None, None, None, None))

In [7]:
# A Game object has a metadata attribute whose value is a Metadata object.
# This Metadata object has attributes shown below.
date = game.metadata.date
print(date)

2019-04-06 18:20:21+00:00


<a id = 'duration'></a>
### Duration
This details the length of the match in _n_ frames where a single frame is 1/60 seconds.

In [8]:
duration = game.metadata.duration
duration

11653
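
To read a frame count like this as clock time, it can be divided by the 60 frames per second stated above. The helper `frames_to_clock` below is a hypothetical convenience function, not part of the `slippi` library.

```python
FRAMES_PER_SECOND = 60

def frames_to_clock(frames):
    """Convert a frame count to (minutes, seconds, leftover frames)."""
    total_seconds, leftover_frames = divmod(frames, FRAMES_PER_SECOND)
    minutes, seconds = divmod(total_seconds, 60)
    return minutes, seconds, leftover_frames

print(frames_to_clock(11653))  # (3, 14, 13)
```

So this game lasted a little over 3 minutes and 14 seconds.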

<a id = 'platform'></a>
### Platform
The platform on which the game was played: either the Dolphin emulator or a console.

In [9]:
platform = game.metadata.platform
platform

Platform.NINTENDONT

<a id = 'chars'></a>
### Characters
We will need to determine which controller ports are being used to determine where to read data from.

In [10]:
game.metadata.players

(None, None, None, None)

It appears this data is not stored in the metadata attribute. Since these files are able to reconstruct a replay of the game using the controller inputs, then this information must be somewhere in the file. That "somewhere" is the `start` attribute.

In [11]:
# Similar to the Metadata, each Game object has a
# Start object stored in each game's start attribute
game.start.players

(Player(character=CSSCharacter.ICE_CLIMBERS, costume=0, stocks=4, tag=, team=None, type=Type.HUMAN, ucf=UCF(dash_back=DashBack.UCF, shield_drop=ShieldDrop.UCF)),
 None,
 None,
 Player(character=CSSCharacter.MARTH, costume=1, stocks=4, tag=, team=None, type=Type.HUMAN, ucf=UCF(dash_back=DashBack.UCF, shield_drop=ShieldDrop.UCF)))

In [12]:
game.start

Start(is_frozen_ps=None, is_pal=False, is_teams=False, players=(Player(character=CSSCharacter.ICE_CLIMBERS, costume=0, stocks=4, tag=, team=None, type=Type.HUMAN, ucf=UCF(dash_back=DashBack.UCF, shield_drop=ShieldDrop.UCF)), None, None, Player(character=CSSCharacter.MARTH, costume=1, stocks=4, tag=, team=None, type=Type.HUMAN, ucf=UCF(dash_back=DashBack.UCF, shield_drop=ShieldDrop.UCF))), random_seed=3456179710, slippi=Slippi(version=1.7.1), stage=Stage.FINAL_DESTINATION)

In [13]:
# For each port in the tuple stored in game.start.players,
# if the port is not None,
# append its index to the list.
# enumerate yields each index directly instead of looking it up by value.
ports = [i for i, player in enumerate(game.start.players) if player is not None]
ports

[0, 3]
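
One caveat when collecting ports: `tuple.index` returns the position of the first element that compares equal, so looking ports up by value can report the same port twice if two entries happen to be equal. Whether `Player` objects ever compare equal is not confirmed here, so the sketch below uses plain strings as stand-ins; `enumerate` sidesteps the question by yielding each position directly.

```python
# Stand-in for game.start.players: two occupied ports holding equal values.
players = ('FOX', None, None, 'FOX')

# Looking positions up by value finds the first match both times.
by_index = [players.index(p) for p in players if p is not None]
print(by_index)  # [0, 0]

# Enumerating yields each position exactly once.
by_enumerate = [i for i, p in enumerate(players) if p is not None]
print(by_enumerate)  # [0, 3]
```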

In [15]:
# If a player is occupying the 0th port, then we can extract data about that player
# If no player exists at the requested port, an error is thrown.
game.start.players[0].character

CSSCharacter.ICE_CLIMBERS

In [16]:
characters = [game.start.players[port].character for port in ports]
characters

[CSSCharacter.ICE_CLIMBERS, CSSCharacter.MARTH]

### Stage

In [17]:
game.start.stage

Stage.FINAL_DESTINATION

### Metadata Series
When reading in multiple games, a DataFrame will be made in which each row represents a single game. To build it, each key of the dictionary maps to a list that holds that column's values, one per game.

#### For Older Versions of Slippi Such as Fight Pitt 9

In [18]:
pd.Series({'date': date,
           'duration': duration,
           'platform': platform,
           'p1_port': ports[0],
           'p1_character': characters[0],
           'p2_port': ports[1],
           'p2_character': characters[1],
           'stage': game.start.stage})

date            2019-04-06 18:20:21+00:00
duration                            11653
platform              Platform.NINTENDONT
p1_port                                 0
p1_character    CSSCharacter.ICE_CLIMBERS
p2_port                                 3
p2_character           CSSCharacter.MARTH
stage             Stage.FINAL_DESTINATION
dtype: object
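
To make the multi-game layout concrete, the sketch below collects one dictionary per game into a dictionary of equally long lists, one list per column. The records are made-up values shaped like the Series above, and `records_to_columns` is a hypothetical helper that assumes every record has the same keys. A dictionary in this shape can be passed to `pd.DataFrame` as its `data` argument.

```python
def records_to_columns(records):
    """Turn a list of per-game dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns

# Made-up records, one per game.
records = [
    {'duration': 11653, 'p1_port': 0, 'p2_port': 3},
    {'duration': 1435, 'p1_port': 1, 'p2_port': 2},
]
columns = records_to_columns(records)
print(columns)
# {'duration': [11653, 1435], 'p1_port': [0, 1], 'p2_port': [3, 2]}
```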

### Getting all Metadata
#### game_id

The `game_id` will be established using the time at which the game has started. As mentioned above in the Data Normalization section, this can cause anomalies in the data. However, of the games that we will use in our model, we do not encounter this problem. This will be addressed in future versions of this project.

In [19]:
fight_pitt_9[0]

'../../data/Fight-Pitt-9/Game_20190406T182021.slp'

In [20]:
# Take the first filepath in the list of filepaths.
# Keep only the filename and drop the ".slp" extension.
# Remove "Game_" from the beginning.
# str.strip is avoided because it removes a set of characters, not a prefix or suffix.
os.path.splitext(os.path.basename(fight_pitt_9[0]))[0].replace('Game_', '', 1)

'20190406T182021'
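
A note on extracting IDs from filenames: `str.strip` treats its argument as a set of characters to remove from both ends, not as a prefix or suffix. That happens to work for timestamp-style names, but it can eat into other names. The sketch below compares the two approaches; `game_id_from_path`, `naive_game_id`, and the second filename are hypothetical.

```python
import os

def game_id_from_path(path):
    """Filename without the '.slp' extension and without a leading 'Game_'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    prefix = 'Game_'
    return stem[len(prefix):] if stem.startswith(prefix) else stem

def naive_game_id(path):
    """Same idea using str.strip, which removes characters, not substrings."""
    return path.split('/')[-1].strip('Game_').strip('.slp')

real = '../../data/Fight-Pitt-9/Game_20190406T182021.slp'
made_up = 'some/folder/Game_emulated.slp'  # hypothetical filename

print(game_id_from_path(real), naive_game_id(real))        # 20190406T182021 20190406T182021
print(game_id_from_path(made_up), naive_game_id(made_up))  # emulated ulated
```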

In [21]:
def metadata_to_df(slp_paths):
    '''
    Of a collection of games, store the metadata as a dataframe.
    
    slp_paths (list): each value is the file path to games
    returns a dataframe
    '''
    # Used to determine how far along the function is while waiting
    length = len(slp_paths)
    count = 0
    
    # lists of values to populate dataframe.
    dates, game_id, durations, plats = [], [], [], []
    p1_ports, p1_chars, p2_ports, p2_chars = [], [], [], []
    stages, is_teams, is_pal = [], [], []
    
    # For each filepath in the provided list of filepaths
    for path in slp_paths:
        count += 1
        print(f'Parsing metadata from file {count} of {length}: {round(count / length * 100, 2)}%', end = '\r')
        
        # try to instantiate the Game object, else skip it and try the next one
        try:
            game = slp.Game(path)
        except Exception:
            print(f'Skip game {count} of {length}')
            continue
            
        # set game ID
        # to get file path using game_id:
        # ../folder_directory/Game_[game_id].slp
        game_id.append(os.path.splitext(os.path.basename(path))[0].replace('Game_', '', 1))
        
        # take the date, duration, and platform data
        dates.append(game.metadata.date)
        durations.append(game.metadata.duration)
        plats.append(game.metadata.platform)

        # get active ports
        ports = [i for i, player in enumerate(game.start.players) if player is not None]
        p1_ports.append(ports[0])
        p2_ports.append(ports[1])

        # get characters
        characters = [game.start.players[port].character for port in ports]
        p1_chars.append(characters[0])
        p2_chars.append(characters[1])
        
        # get stages played on
        stages.append(game.start.stage)
        
        # True if the game was a team battle (i.e. not a 1v1)
        is_teams.append(game.start.is_teams)
        
        # True if the game was played on the PAL version (i.e. not NTSC v1.02)
        is_pal.append(game.start.is_pal)
    
    # return metadata DataFrame
    return pd.DataFrame(data = {
            'game_id': game_id,
            'date': dates,
            'duration': durations,
            'platform': plats,
            'p1_port': p1_ports,
            'p1_char': p1_chars,
            'p2_port': p2_ports,
            'p2_char': p2_chars,
            'stage': stages,
            'is_teams': is_teams,
            'is_pal': is_pal
        })

Even though the character information is missing in the metadata, the character information is stored in other locations of each game file such as the start attribute of the Game object.

In [22]:
df_fp9 = metadata_to_df(fight_pitt_9)
df_fp9.head()

Skip game 173 of 1151
Skip game 197 of 1151
Skip game 381 of 1151
Skip game 868 of 1151
Skip game 1036 of 1151
Parsing metadata from file 1151 of 1151: 100.0%

| | game_id | date | duration | platform | p1_port | p1_char | p2_port | p2_char | stage | is_teams | is_pal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20190406T182021 | 2019-04-06 18:20:21+00:00 | 11653 | Platform.NINTENDONT | 0 | 14 | 3 | 9 | 32 | False | False |
| 1 | 20190406T054329 | 2019-04-06 05:43:29+00:00 | 1435 | Platform.NINTENDONT | 1 | 20 | 2 | 18 | 31 | False | False |
| 2 | 20190406T113710 | 2019-04-06 11:37:10+00:00 | 7577 | Platform.NINTENDONT | 0 | 22 | 1 | 2 | 3 | True | False |
| 3 | 20190406T060932 | 2019-04-06 06:09:32+00:00 | 9589 | Platform.NINTENDONT | 0 | 20 | 3 | 7 | 28 | False | False |
| 4 | 20190406T063208 | 2019-04-06 06:32:08+00:00 | 10043 | Platform.NINTENDONT | 0 | 14 | 3 | 9 | 8 | False | False |


In [23]:
game.start.players[0]

Player(character=CSSCharacter.ICE_CLIMBERS, costume=0, stocks=4, tag=, team=None, type=Type.HUMAN, ucf=UCF(dash_back=DashBack.UCF, shield_drop=ShieldDrop.UCF))

In [24]:
game.start.players[0].ucf

UCF(dash_back=DashBack.UCF, shield_drop=ShieldDrop.UCF)

In [25]:
df_fp9.to_csv('../data/fp9.csv')

<a id='terms'></a>
## Terminology
- Melee (noun): Shorthand for Super Smash Bros. Melee
- Stock (noun): A single unit of character lives. When a character runs out of stocks, then they lose the game.
- Frame (noun): A single unit of time for animation. In this case, there are 60 frames per second.
    - If a video is a collection of images being shown sequentially at a specified rate, then a single image is a frame.
- Hurtbox (noun): An invisible shape that contours the visible character model. Allows interaction between various elements of the game such as hitboxes.
- Hitbox (noun): An invisible shape that causes the game to register a hit when it collides with an opponent's hurtbox.
- Mixup (noun): Changing one's pattern of fighting to be unpredictable and gain advantage over the opponent.
- Fox (noun): A playable character in Super Smash Bros. Melee.
- Marth (noun): A playable character in Super Smash Bros. Melee.
- Stage (noun): The stage the characters are fighting on.
- Blast Zone (noun): Outer perimeter of the stage. When a character crosses over the perimeter, then that character loses a stock.
- Tech (verb): When a character is launched towards a surface, they can press either L or R triggers within 20 frames of colliding with the surface in order to be able to retaliate faster than if one were to not tech.
    - When teching on the ground, a character can tech in place, tech-roll left, tech-roll right, or not tech at all.
![tech-example](../images/tech.gif) <center>Example of a neutral tech or tech in-place</center>