# Classic Doom Speed Demo Data Analyzer

This Jupyter Notebook takes as input a CSV file made according to the specifications laid out in the README.md file of this repository and, hopefully, generates something interesting.

This was all done in pursuit of learning, so this notebook is structured as though the reader is also a student (with some prior Python and statistics experience). I don't expect many people will be doing serious data work with Doom speed demos, but if you would like to, adapting this notebook should only take a few small changes.

In [407]:
import numpy as np
import pandas as pd

pd.options.display.max_rows = 500

In [408]:
df = pd.read_csv('./examples/uv-speed-data.csv')
levels = [
    # Starport Levels
    'ENTRYWAY',
    'UNDERHALLS',
    'THE GANTLET',
    'THE FOCUS',
    'THE WASTE TUNNELS',
    'THE CRUSHER',
    # Hellish Outpost Levels
    'DEAD SIMPLE',
    'TRICKS AND TRAPS',
    'THE PIT',
    'REFUELING BASE',
    'CIRCLE OF DEATH',
    # City Levels
    'THE FACTORY',
    'DOWNTOWN',
    'THE INMOST DENS',
    'INDUSTRIAL ZONE',
    'SUBURBS',
    'TENEMENTS',
    'THE COURTYARD',
    'THE CITADEL',
    'GOTCHA',
    # Inside Hell Levels
    'NIRVANA',
    'THE CATACOMBS',
    'BARRELS O FUN',
    'BARRELS D FUN', # OCR mistake.
    'THE CHASM',
    'BLOODFALLS',
    'BLDODFALLS', # OCR mistake.
    'ABANDONED MINES',
    'MONSTER CONDO',
    'THE SPIRIT WORLD',
    'THE LIVING END',
    'ICON OF SIN',
    # Secret Levels
    'WOLFENSTEIN',
    'GROSSE'
]

In [409]:
df.shape

(2701, 3)

In [410]:
df

Unnamed: 0,armor,health,level
0,,,
1,,,
2,,,
3,,,
4,,,
5,,,
6,,,
7,,,
8,,,
9,4,2 5,JEZM V F


In [411]:
df.columns

Index(['armor', 'health', 'level'], dtype='object')

# Munging Steps

1. Remove all rows with any `NaN` values.
1. Replace with `NaN` any `level` values that are not in the `levels` list.
1. Back fill the `level` column.
1. Delete any rows with invalid values for `health` or `armor`.
1. Find a way to fix incongruent values near level transitions.
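
The steps above can be sketched end to end on a tiny, made-up frame. This is only an illustration under stated assumptions: the `toy` data, the shortened `valid_levels` list, and the `munge` helper are hypothetical and not part of the original notebook, and the transition cleanup (the last step) is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the OCR output (not from the real CSV).
toy = pd.DataFrame({
    'armor':  ['0', '0', 'x', '0', '50'],
    'health': ['100', '100', '9 9', '300', '95'],
    'level':  [np.nan, 'JEZM', np.nan, np.nan, 'ENTRYWAY'],
})
valid_levels = ['ENTRYWAY', 'UNDERHALLS']  # shortened list for illustration


def munge(frame, valid_levels):
    """Hypothetical helper: a simplified version of the munging steps."""
    out = frame.copy()
    # Blank out level strings that are not recognised level names.
    out['level'] = out['level'].where(out['level'].isin(valid_levels))
    # Propagate each recognised level name backwards over the blanks before it.
    out['level'] = out['level'].bfill()
    # Turn unreadable health/armor readings into NaN.
    for col in ('health', 'armor'):
        out[col] = pd.to_numeric(out[col], errors='coerce')
    # Drop rows that still have a gap, then rows with out-of-range values.
    out = out.dropna()
    health_ok = (out['health'] > 0) & (out['health'] <= 200)
    armor_ok = (out['armor'] >= 0) & (out['armor'] <= 200)
    return out[health_ok & armor_ok]


clean = munge(toy, valid_levels)
print(clean)
```

Working on a copy inside the helper keeps the raw frame untouched, which makes it easier to compare before and after while experimenting.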

In [412]:
# Clear level cells that do not match the list.
# Use .loc rather than chained indexing so the assignment reliably hits df itself.
df.loc[~df.level.isin(levels), 'level'] = np.nan
df

Unnamed: 0,armor,health,level
0,,,
1,,,
2,,,
3,,,
4,,,
5,,,
6,,,
7,,,
8,,,
9,4,2 5,


In [413]:
# Spread each level name to at most 2 rows before it and 4 rows after it.
# (fillna(method=...) is deprecated in recent pandas; bfill/ffill are equivalent.)
df.level = df.level.bfill(limit=2)
df.level = df.level.ffill(limit=4)
df

Unnamed: 0,armor,health,level
0,,,
1,,,
2,,,
3,,,
4,,,
5,,,
6,,,
7,,,
8,,,
9,4,2 5,
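
To see what the `limit` argument does, here is a small sketch on a made-up column. The Series `s` is hypothetical; the limits of 2 and 4 are the ones used in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: one recognised level name surrounded by blanks.
s = pd.Series([np.nan, np.nan, np.nan, 'UNDERHALLS',
               np.nan, np.nan, np.nan, np.nan, np.nan])

# Fill at most 2 rows before the name, then at most 4 rows after it.
near = s.bfill(limit=2).ffill(limit=4)
print(near)
```

Rows further than 2 before or 4 after the name stay `NaN`, so only the neighbourhood of each level transition gets marked.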


In [414]:
# Clear health and armor cells in the rows around each level name
# (the rows marked by the limited fill above).
near_transition = df.level.isin(levels)
df.loc[near_transition, ['health', 'armor']] = np.nan
df

Unnamed: 0,armor,health,level
0,,,
1,,,
2,,,
3,,,
4,,,
5,,,
6,,,
7,,,
8,,,
9,4,2 5,


In [415]:
# Backfill the level column so every row carries the next level name found below it.
df.level = df.level.bfill()
df

Unnamed: 0,armor,health,level
0,,,ENTRYWAY
1,,,ENTRYWAY
2,,,ENTRYWAY
3,,,ENTRYWAY
4,,,ENTRYWAY
5,,,ENTRYWAY
6,,,ENTRYWAY
7,,,ENTRYWAY
8,,,ENTRYWAY
9,4,2 5,ENTRYWAY


In [416]:
# Convert health and armor to float64 Series; unparseable values become NaN.
df.health = pd.to_numeric(df.health, errors='coerce')
df.armor = pd.to_numeric(df.armor, errors='coerce')
df

Unnamed: 0,armor,health,level
0,,,ENTRYWAY
1,,,ENTRYWAY
2,,,ENTRYWAY
3,,,ENTRYWAY
4,,,ENTRYWAY
5,,,ENTRYWAY
6,,,ENTRYWAY
7,,,ENTRYWAY
8,,,ENTRYWAY
9,4.0,,ENTRYWAY
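
Since `errors='coerce'` silently turns bad readings into `NaN`, it can be useful to count how many values were lost that way. A minimal sketch, assuming a made-up Series `raw` of OCR readings (hypothetical, not taken from the CSV):

```python
import pandas as pd

# Hypothetical OCR readings: two clean numbers, two garbled strings, one blank.
raw = pd.Series(['100', '2 5', '87', 'JEZM', None])

numeric = pd.to_numeric(raw, errors='coerce')

# Entries that were present in the raw data but could not be parsed.
lost = raw.notna() & numeric.isna()
print(numeric)
print('values lost to coercion:', int(lost.sum()))
```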


In [417]:
# Drop any rows with a NaN value.
df.dropna(inplace=True)
df

Unnamed: 0,armor,health,level
13,0.0,100.0,ENTRYWAY
14,0.0,100.0,ENTRYWAY
15,0.0,100.0,ENTRYWAY
16,0.0,100.0,ENTRYWAY
17,0.0,100.0,ENTRYWAY
18,0.0,100.0,ENTRYWAY
19,0.0,100.0,ENTRYWAY
20,0.0,100.0,ENTRYWAY
21,0.0,100.0,ENTRYWAY
22,0.0,100.0,ENTRYWAY


In [418]:
# Drop health and armor values that are out of range.
df = df[(df.health > 0) & (df.health <= 200)]
df = df[(df.armor >= 0) & (df.armor <= 200)]
df

Unnamed: 0,armor,health,level
13,0.0,100.0,ENTRYWAY
14,0.0,100.0,ENTRYWAY
15,0.0,100.0,ENTRYWAY
16,0.0,100.0,ENTRYWAY
17,0.0,100.0,ENTRYWAY
18,0.0,100.0,ENTRYWAY
19,0.0,100.0,ENTRYWAY
20,0.0,100.0,ENTRYWAY
21,0.0,100.0,ENTRYWAY
22,0.0,100.0,ENTRYWAY
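
The same range check can also be written as one boolean mask with `Series.between`. This is only an alternative sketch on a hypothetical `toy` frame; `inclusive='right'` (available in pandas 1.3 and later) excludes a health of 0 while keeping 200.

```python
import pandas as pd

# Hypothetical toy values covering the edge cases of both ranges.
toy = pd.DataFrame({
    'armor':  [0.0, 50.0, 250.0, 0.0],
    'health': [100.0, 0.0, 80.0, 200.0],
})

# health must satisfy 0 < health <= 200; armor must satisfy 0 <= armor <= 200.
keep = (toy['health'].between(0, 200, inclusive='right')
        & toy['armor'].between(0, 200))
filtered = toy[keep]
print(filtered)
```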
