---
title: "Data Cleaning"
format:
    html: 
        code-fold: false
---

<!-- After digesting the instructions, you can delete this cell, these are assignment instructions and do not need to be included in your final submission.  -->

{{< include instructions.qmd >}} 

# Code 

Provide the source code used for this section of the project here.

If you're using a package for code organization, you can import it at this point. However, make sure that the **actual workflow steps**—including data processing, analysis, and other key tasks—are conducted and clearly demonstrated on this page. The goal is to show the technical flow of your project, highlighting how the code is executed to achieve your results.

If relevant, link to additional documentation or external references that explain any complex components. This section should give readers a clear view of how the project is implemented from a technical perspective.

Remember, this page is a technical narrative, not just a notebook with a collection of code cells. Include in-line prose to describe what is going on.

In [1]:
#Import packages used in this notebook
import pandas as pd
import re
import numpy as np
import ast
from pathlib import Path
import os
import json

# Begin by extracting all useful data from the JSON files
# Then combine them into one pandas data frame.

In [None]:
json_dir = 'data/raw-data'
rows = []

#Helper functions that summarize each participant's list of units

#Count the units at a given star level (the API field "tier")
def count_stars(units, level):
    return sum(1 for u in units if u.get("tier") == level)

#Count the units at a given cost, assuming cost = rarity + 1
def count_cost(units, cost):
    return sum(1 for u in units if u.get("rarity") == cost - 1)

#Count all items held across the units
def count_items(units):
    return sum(len(u.get("itemNames", [])) for u in units)

#List each unit's cost, assuming cost = rarity + 1
def unit_cost(units):
    return [u.get("rarity", -1) + 1 for u in units]

for filename in os.listdir(json_dir):
    if filename.endswith(".json"):
        filepath = os.path.join(json_dir, filename)
        try:
            with open(filepath, "r") as f:
                data = json.load(f)
        except Exception as e:
            print(f"Error reading {filename}: {e}")
            continue

        match_id = data["metadata"]["match_id"]
        info = data["info"]
        game_length = info.get("game_length")
        game_version = info.get("game_version")

        for p in info["participants"]:
            #Start extracting the rows of csv file
            units = p.get("units", [])
            traits = p.get("traits", [])

            #Basic Gameplay Info
            rows.append({
                "match_id": match_id,
                "puuid": p["puuid"],
                "placement": p.get("placement"),
                "level": p.get("level"),
                "time_eliminated": p.get("time_eliminated"),
                "total_damage": p.get("total_damage_to_players"),
                "game_length": game_length,
                "game_version": game_version,
                "gold_left": p.get("gold_left"),

                #Unit, Trait
                "traits": traits,
                "units": units,
                "num_traits": len(traits),
                "num_units": len(units),

                #Count of the units by star level
                "num_1star": count_stars(units, 1),
                "num_2star": count_stars(units, 2),
                "num_3star": count_stars(units, 3),

                #Get count of units by cost
                "num_cost1": count_cost(units, 1),
                "num_cost2": count_cost(units, 2),
                "num_cost3": count_cost(units, 3),
                "num_cost4": count_cost(units, 4),
                "num_cost5": count_cost(units, 5),

                #How many Items
                "total_items": count_items(units),

                #Get the Augments
                "augments": p.get("augments"),})

#Create the final DataFrame from the flattened rows
df = pd.DataFrame(rows)
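
The flattening step above can be sketched on a single record. This is a minimal illustration, not the project's data: the participant below is hypothetical, with field names taken from the extraction code and made-up values (including the item names).

```python
# Hypothetical participant record: field names follow the extraction
# code above, but every value is made up for illustration.
participant = {
    "units": [
        {"character_id": "TFT15_Aatrox", "tier": 2, "rarity": 0,
         "itemNames": ["ItemA", "ItemB"]},
        {"character_id": "TFT15_Vi", "tier": 1, "rarity": 1, "itemNames": []},
        {"character_id": "TFT15_Sett", "tier": 2, "rarity": 0},  # no itemNames key
    ]
}

def count_stars(units, level):
    # Count units whose "tier" (star level) equals the requested level.
    return sum(1 for u in units if u.get("tier") == level)

def count_items(units):
    # A missing "itemNames" key is treated as an empty item list.
    return sum(len(u.get("itemNames", [])) for u in units)

units = participant.get("units", [])
summary = {
    "num_units": len(units),
    "num_1star": count_stars(units, 1),
    "num_2star": count_stars(units, 2),
    "total_items": count_items(units),
}
print(summary)
```

Using `.get` with a default keeps the helpers from raising `KeyError` when a unit record lacks a field.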

In [None]:
#how many rows does the final df have?
print("Merged rows:", len(df))

#display the df
print(df.head())

In [None]:
#Save df as csv 
df.to_csv("data/raw/RIOT_rawdata.csv", index=False)

In [2]:
#Read combined csv file from folder raw as a pandas df
dfRaw = pd.read_csv("../data-collection/data/raw/RIOT_rawdata.csv")

#Initial check of df
dfRaw.head(5)

Unnamed: 0,match_id,puuid,placement,level,time_eliminated,total_damage,game_length,game_version,gold_left,traits,...,num_1star,num_2star,num_3star,num_cost1,num_cost2,num_cost3,num_cost4,num_cost5,total_items,augments
0,NA1_5412752266,zkXtkj27xwOEvLL2bygUUGjlPlDFOrxW6vscN82z0s4m4w...,8,9,1591.827148,40,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,1,"[{'name': 'TFT15_Bastion', 'num_units': 1, 'st...",...,4,5,0,1,2,1,0,1,13,
1,NA1_5412752266,5bRR3JrvSRXpRqOi2u1VDib7uHYVQH9DBlDF7A1dv9_uvw...,1,9,2174.608154,173,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,61,"[{'name': 'TFT15_DragonFist', 'num_units': 1, ...",...,4,5,1,3,1,2,0,2,17,
2,NA1_5412752266,ILqfYW7Mnea2sHFJ5mvZ4yhH6wHmkRhOqA8m8oTZXvtN71...,6,9,1815.780273,80,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,0,"[{'name': 'TFT15_Bastion', 'num_units': 4, 'st...",...,2,7,1,3,1,3,0,2,11,
3,NA1_5412752266,yja8Q8Aza0XQ_s9heBROE4aDddezONw0abVOC5GpRZGMqw...,3,9,2032.849243,124,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,9,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,0,9,0,1,2,1,0,3,12,
4,NA1_5412752266,OUBCpR6kdwc2R4zz_MSa6FTluLG14FQnrjJ9Pglbs2Z4ON...,5,8,1818.687256,84,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,41,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,1,7,0,3,1,1,0,3,13,


In [3]:
#See initial shape
dfRaw.shape

(50825, 23)

In [4]:
#How many games did the API acquire?
dfRaw["match_id"].nunique()

6388

In [5]:
#How many players did the API acquire?
dfRaw["puuid"].nunique()

8662

In [6]:
#Begin by dropping duplicates
dfRaw = dfRaw.drop_duplicates()
dfRaw.shape

(50825, 23)

In [7]:
#Check datatypes
dfRaw.dtypes

match_id            object
puuid               object
placement            int64
level                int64
time_eliminated    float64
total_damage         int64
game_length        float64
game_version        object
gold_left            int64
traits              object
units               object
num_traits           int64
num_units            int64
num_1star            int64
num_2star            int64
num_3star            int64
num_cost1            int64
num_cost2            int64
num_cost3            int64
num_cost4            int64
num_cost5            int64
total_items          int64
augments           float64
dtype: object

In [8]:
#Get descriptive stats of int and float fields
dfRaw.describe(include=['int64','float64'])

Unnamed: 0,placement,level,time_eliminated,total_damage,game_length,gold_left,num_traits,num_units,num_1star,num_2star,num_3star,num_cost1,num_cost2,num_cost3,num_cost4,num_cost5,total_items,augments
count,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,50825.0,0.0
mean,4.497196,8.549788,1880.512909,95.141564,2186.548711,9.397226,10.366552,8.678013,2.132435,5.626857,0.912464,1.515258,1.654934,1.547231,0.00423,2.192878,12.576606,
std,2.292515,0.873291,280.256322,48.930045,146.905622,17.203921,2.297279,1.140197,1.545226,2.153924,1.256414,0.894001,0.743277,1.002328,0.067867,1.098274,3.810764,
min,1.0,1.0,7.777373,0.0,7.777373,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
25%,2.0,8.0,1676.638184,58.0,2091.750244,0.0,9.0,8.0,1.0,4.0,0.0,1.0,1.0,1.0,0.0,1.0,10.0,
50%,4.0,9.0,1898.408813,89.0,2184.151855,2.0,10.0,9.0,2.0,6.0,0.0,2.0,2.0,1.0,0.0,2.0,12.0,
75%,6.0,9.0,2086.252686,128.0,2283.740723,10.0,12.0,9.0,3.0,7.0,2.0,2.0,2.0,2.0,0.0,3.0,15.0,
max,8.0,10.0,3036.849609,283.0,3048.027588,438.0,20.0,15.0,11.0,14.0,7.0,7.0,6.0,8.0,2.0,9.0,36.0,
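
In the summary above, `num_cost4` has a mean of about 0.004 while `num_cost5` averages about 2.19. This may indicate that the `rarity` values in the raw data do not form a contiguous 0 to 4 range. One way to check is to tally the distinct rarity values. The sketch below uses made-up unit records, and its rarity values are placeholders rather than observations from this dataset.

```python
from collections import Counter

# Made-up unit lists standing in for the "units" column.
# The rarity values are placeholders chosen to show a gap.
sample_units = [
    [{"rarity": 0}, {"rarity": 1}, {"rarity": 4}],
    [{"rarity": 2}, {"rarity": 4}, {"rarity": 6}],
]

# Tally every rarity value seen across all participants.
rarity_counts = Counter(
    u.get("rarity") for units in sample_units for u in units
)
print(sorted(rarity_counts.items()))
```

If the tally on the real data shows gaps (for example, no units with rarity 3), the `cost = rarity + 1` assumption would need to be replaced with an explicit lookup table.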


In [9]:
#Is all data from same version of game?
dfRaw['game_version'].unique()

array(['Linux Version 15.22.724.5161 (Nov 05 2025/16:11:29) [PUBLIC] <Releases/15.22>',
       'Linux Version 15.21.721.8442 (Oct 24 2025/18:44:48) [PUBLIC] <Releases/15.21>',
       'Linux Version 15.23.726.9074 (Nov 17 2025/11:33:30) [PUBLIC] <Releases/15.23>',
       'Linux Version 15.22.723.8534 (Nov 03 2025/13:45:54) [PUBLIC] <Releases/15.22>',
       'Linux Version 15.23.728.3286 (Nov 21 2025/16:26:55) [PUBLIC] <Releases/15.23>',
       'Linux Version 15.22.723.1955 (Oct 30 2025/15:20:31) [PUBLIC] <Releases/15.22>',
       'Linux Version 15.21.721.4012 (Oct 23 2025/12:15:03) [PUBLIC] <Releases/15.21>',
       'Linux Version 15.20.719.0545 (Oct 14 2025/12:30:46) [PUBLIC] <Releases/15.20>',
       'Linux Version 15.21.720.0925 (Oct 17 2025/16:00:09) [PUBLIC] <Releases/15.21>',
       'Linux Version 15.21.721.0583 (Oct 22 2025/11:42:56) [PUBLIC] <Releases/15.21>'],
      dtype=object)

In [11]:
#As seen above, the data spans several releases. This matters because each release changes the game's balance.
#source: https://www.leagueoflegends.com/en-us/news/tags/teamfight-tactics-patch-notes/

#Let's pull the release out of game_version and count the number of rows by release
dfRaw["Release_Version"] = dfRaw["game_version"].str.extract(r"<Releases/([\d\.]+)>")
dfRaw["Release_Version"].value_counts()

Release_Version
15.22    33808
15.23     8986
15.21     7972
15.20       59
Name: count, dtype: int64
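
The extraction pattern can be checked in isolation. The sketch below applies the same regular expression with the standard-library `re` module to one of the `game_version` strings printed above. It is shown without pandas only so that it runs on its own.

```python
import re

# One of the game_version strings shown in the output above.
version = "Linux Version 15.22.724.5161 (Nov 05 2025/16:11:29) [PUBLIC] <Releases/15.22>"

# Same pattern as the str.extract call: capture digits and dots after "<Releases/".
pattern = re.compile(r"<Releases/([\d\.]+)>")

match = pattern.search(version)
release = match.group(1) if match else None
print(release)  # 15.22

# A string without the tag gives no match; str.extract returns NaN in that case.
no_match = pattern.search("Linux Version 15.22.724.5161")
print(no_match)  # None
```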

In [12]:
#Remove 15.20 because it only has 59 records
#15.22 is the most robust choice when evaluating a single release
#15.23 and 15.21 are kept in case I want to do a version comparison

dfRaw = dfRaw[dfRaw["Release_Version"] != "15.20"]
dfRaw["Release_Version"].value_counts()

Release_Version
15.22    33808
15.23     8986
15.21     7972
Name: count, dtype: int64

In [13]:
#Check the percentage of non-null values in each column
dfRaw.notnull().mean() * 100

match_id           100.0
puuid              100.0
placement          100.0
level              100.0
time_eliminated    100.0
total_damage       100.0
game_length        100.0
game_version       100.0
gold_left          100.0
traits             100.0
units              100.0
num_traits         100.0
num_units          100.0
num_1star          100.0
num_2star          100.0
num_3star          100.0
num_cost1          100.0
num_cost2          100.0
num_cost3          100.0
num_cost4          100.0
num_cost5          100.0
total_items        100.0
augments             0.0
Release_Version    100.0
dtype: float64

In [14]:
#Augments is the only column with null data, and none of its values are populated, so drop it.
dfRaw = dfRaw.drop(["augments"],axis = 1)

# This might be an error in my API script, but I don't have time to pull new data or fix it.

In [15]:
#How much of the dataset was created by bots?
#Bots are identified by a puuid equal to "bot" (compared after lowercasing)

dfRaw['puuid'] = dfRaw['puuid'].astype(str).str.lower()
bot_puuid = dfRaw['puuid'] == "bot"
num_bots = bot_puuid.sum()
print("Total bots:", num_bots)

Total bots: 11
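
The cell above uses an exact match on the lowercased `puuid`. The sketch below contrasts that with a substring match, which is an alternative the notebook does not use. The `puuid` values are hypothetical, and real ones are long opaque strings.

```python
# Hypothetical puuid values for illustration only.
puuids = ["BOT", "bot", "abcBotxyz123", "zkxtkj27"]

lowered = [p.lower() for p in puuids]

# Exact match, as in the cell above: only values equal to "bot" count.
exact = [p == "bot" for p in lowered]

# Substring match: would also flag a player whose puuid happens to contain "bot".
substring = ["bot" in p for p in lowered]

print(sum(exact), sum(substring))
```

The exact match is the more conservative choice, since a random player identifier could contain the letters "bot" by chance.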


In [16]:
#Remove bot data
dfRaw = dfRaw[dfRaw['puuid'] != "bot"]

In [None]:
#Begin creating useful fields for modeling

In [17]:
#Traits and Units need to be converted into Python Objects
dfRaw["traits"] = dfRaw["traits"].apply(ast.literal_eval)
dfRaw["units"] = dfRaw["units"].apply(ast.literal_eval)

dfRaw.head(5)

Unnamed: 0,match_id,puuid,placement,level,time_eliminated,total_damage,game_length,game_version,gold_left,traits,...,num_1star,num_2star,num_3star,num_cost1,num_cost2,num_cost3,num_cost4,num_cost5,total_items,Release_Version
0,NA1_5412752266,zkxtkj27xwoevll2byguugjlpldforxw6vscn82z0s4m4w...,8,9,1591.827148,40,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,1,"[{'name': 'TFT15_Bastion', 'num_units': 1, 'st...",...,4,5,0,1,2,1,0,1,13,15.22
1,NA1_5412752266,5brr3jrvsrxprqoi2u1vdib7uhyvqh9dbldf7a1dv9_uvw...,1,9,2174.608154,173,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,61,"[{'name': 'TFT15_DragonFist', 'num_units': 1, ...",...,4,5,1,3,1,2,0,2,17,15.22
2,NA1_5412752266,ilqfyw7mnea2shfj5mvz4yhh6whmkrhoqa8m8otzxvtn71...,6,9,1815.780273,80,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,0,"[{'name': 'TFT15_Bastion', 'num_units': 4, 'st...",...,2,7,1,3,1,3,0,2,11,15.22
3,NA1_5412752266,yja8q8aza0xq_s9hebroe4adddezonw0abvoc5gprzgmqw...,3,9,2032.849243,124,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,9,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,0,9,0,1,2,1,0,3,12,15.22
4,NA1_5412752266,oubcpr6kdwc2r4zz_msa6ftlulg14fqnrjj9pglbs2z4on...,5,8,1818.687256,84,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,41,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,1,7,0,3,1,1,0,3,13,15.22
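
The conversion is needed because a CSV file stores each list as its string representation. The sketch below reproduces that round trip with the standard library and a made-up `traits` value shaped like the ones in the dataset.

```python
import ast
import csv
import io

# A made-up traits value shaped like the ones in the dataset.
traits = [{"name": "TFT15_Bastion", "num_units": 1}]

# Writing to CSV stores the list as its string representation.
buffer = io.StringIO()
csv.writer(buffer).writerow([str(traits)])
buffer.seek(0)
cell = next(csv.reader(buffer))[0]
print(type(cell).__name__)  # str

# ast.literal_eval parses the string back into Python objects
# and only accepts literals, so it does not execute arbitrary code.
restored = ast.literal_eval(cell)
print(restored == traits)  # True
```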


In [18]:
#Find count of traits and units
dfRaw["total_traits"] = dfRaw["traits"].apply(lambda t: len(t))
dfRaw["total_units"] = dfRaw["units"].apply(lambda u: len(u))

#check
dfRaw.head(5)

Unnamed: 0,match_id,puuid,placement,level,time_eliminated,total_damage,game_length,game_version,gold_left,traits,...,num_3star,num_cost1,num_cost2,num_cost3,num_cost4,num_cost5,total_items,Release_Version,total_traits,total_units
0,NA1_5412752266,zkxtkj27xwoevll2byguugjlpldforxw6vscn82z0s4m4w...,8,9,1591.827148,40,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,1,"[{'name': 'TFT15_Bastion', 'num_units': 1, 'st...",...,0,1,2,1,0,1,13,15.22,14,9
1,NA1_5412752266,5brr3jrvsrxprqoi2u1vdib7uhyvqh9dbldf7a1dv9_uvw...,1,9,2174.608154,173,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,61,"[{'name': 'TFT15_DragonFist', 'num_units': 1, ...",...,1,3,1,2,0,2,17,15.22,10,10
2,NA1_5412752266,ilqfyw7mnea2shfj5mvz4yhh6whmkrhoqa8m8otzxvtn71...,6,9,1815.780273,80,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,0,"[{'name': 'TFT15_Bastion', 'num_units': 4, 'st...",...,1,3,1,3,0,2,11,15.22,9,10
3,NA1_5412752266,yja8q8aza0xq_s9hebroe4adddezonw0abvoc5gprzgmqw...,3,9,2032.849243,124,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,9,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,0,1,2,1,0,3,12,15.22,8,9
4,NA1_5412752266,oubcpr6kdwc2r4zz_msa6ftlulg14fqnrjj9pglbs2z4on...,5,8,1818.687256,84,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,41,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,0,3,1,1,0,3,13,15.22,7,8


In [19]:
#create binary that shows if placed 1st
dfRaw["top1"] = (dfRaw["placement"] == 1).astype(int)

#create binary that shows if placed top 4
dfRaw["top4"] = (dfRaw["placement"] <= 4).astype(int)

#create binary that shows if placed bottom 4 (5-8)
dfRaw["bottom4"] = (dfRaw["placement"] >= 5).astype(int)

#check results
dfRaw.head(5)

Unnamed: 0,match_id,puuid,placement,level,time_eliminated,total_damage,game_length,game_version,gold_left,traits,...,num_cost3,num_cost4,num_cost5,total_items,Release_Version,total_traits,total_units,top1,top4,bottom4
0,NA1_5412752266,zkxtkj27xwoevll2byguugjlpldforxw6vscn82z0s4m4w...,8,9,1591.827148,40,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,1,"[{'name': 'TFT15_Bastion', 'num_units': 1, 'st...",...,1,0,1,13,15.22,14,9,0,0,1
1,NA1_5412752266,5brr3jrvsrxprqoi2u1vdib7uhyvqh9dbldf7a1dv9_uvw...,1,9,2174.608154,173,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,61,"[{'name': 'TFT15_DragonFist', 'num_units': 1, ...",...,2,0,2,17,15.22,10,10,1,1,0
2,NA1_5412752266,ilqfyw7mnea2shfj5mvz4yhh6whmkrhoqa8m8otzxvtn71...,6,9,1815.780273,80,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,0,"[{'name': 'TFT15_Bastion', 'num_units': 4, 'st...",...,3,0,2,11,15.22,9,10,0,0,1
3,NA1_5412752266,yja8q8aza0xq_s9hebroe4adddezonw0abvoc5gprzgmqw...,3,9,2032.849243,124,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,9,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,1,0,3,12,15.22,8,9,0,1,0
4,NA1_5412752266,oubcpr6kdwc2r4zz_msa6ftlulg14fqnrjj9pglbs2z4on...,5,8,1818.687256,84,2182.824219,Linux Version 15.22.724.5161 (Nov 05 2025/16:1...,41,"[{'name': 'TFT15_Bastion', 'num_units': 2, 'st...",...,1,0,3,13,15.22,7,8,0,0,1
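
The three flags can be sanity-checked over every possible placement. The sketch below assumes placements run from 1 to 8, matching the minimum and maximum in the descriptive statistics.

```python
placements = list(range(1, 9))  # the eight possible placements in a lobby

flags = [
    {
        "placement": p,
        "top1": int(p == 1),
        "top4": int(p <= 4),
        "bottom4": int(p >= 5),
    }
    for p in placements
]

# top4 and bottom4 are complements, so the two flags always sum to 1.
assert all(f["top4"] + f["bottom4"] == 1 for f in flags)

top1_total = sum(f["top1"] for f in flags)
top4_total = sum(f["top4"] for f in flags)
print(top1_total, top4_total)
```

Because `bottom4` always equals `1 - top4`, only one of the two carries independent information if both are later used as model features or targets.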


# When looking at the data, some champion names are prefixed with TFT7_ or TFT7b_.
# This indicates that they are from Set 7.

Set 7 and Set 15 are completely different versions of the game. Our results will be invalid if both are used.
So, remove the Set 7 data.

In [20]:
#When looking at the data, some champion names are prefixed with TFT7_ or TFT7b_.
#This indicates that they are from Set 7.
#Set 7 and Set 15 are completely different versions, so we need to get rid of all Set 7 rows.

#Return the character_id of every unit in a participant's unit list
def get_champions(row):
    if isinstance(row, list):
        return [u["character_id"] for u in row]
    return []

#Using the units field and get_champions, make the field raw_champions
dfRaw["raw_champions"] = dfRaw["units"].apply(get_champions)
set7_champs = (
    dfRaw["raw_champions"].explode().dropna().unique())

#Identify Set 7
set7_champs = [c for c in set7_champs if c.startswith(("TFT7_", "TFT7b_"))]

#Remove Set 7.
dfRaw = dfRaw[
    ~dfRaw["raw_champions"].apply(lambda lst: any(c in set7_champs for c in lst))
].copy()
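
The row filter can be sketched on plain lists. The champion ids below are made up, and the `TFT7_Example` ids are placeholders rather than real Set 7 names. Unlike the cell above, this version tests the prefixes directly instead of first building a list of Set 7 champions.

```python
# Made-up rows: each inner list plays the role of one raw_champions value.
rows = [
    ["TFT15_Aatrox", "TFT15_Vi"],
    ["TFT7_Example", "TFT7b_Example"],  # hypothetical Set 7 ids
    ["TFT15_Sett", "TFT7_Example"],     # mixed row, also removed
]

SET7_PREFIXES = ("TFT7_", "TFT7b_")

def has_set7(champs):
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return any(c.startswith(SET7_PREFIXES) for c in champs)

# Keep only rows that contain no Set 7 champion at all.
kept = [r for r in rows if not has_set7(r)]
print(kept)
```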

In [21]:
#Now get rid of the TFT15_ in front of the champions names.
def clean_names(lst):
    return [c.replace("TFT15_", "").replace("tft15_", "") for c in lst]
dfRaw["champion_list"] = dfRaw["raw_champions"].apply(clean_names)

#check Champions
dfRaw["champion_list"].explode().dropna().unique()

array(['Aatrox', 'DrMundo', 'Vi', 'Udyr', 'Sett', 'Braum', 'leesin',
       'TwistedFate', 'Zyra', 'Kayle', 'Zac', 'Gangplank', 'Viego',
       'Ashe', 'JarvanIV', 'Galio', 'Ezreal', 'Garen', 'Rell', 'Rakan',
       'Caitlyn', 'Jayce', 'Neeko', 'Leona', 'Yuumi', 'Naafiri', 'Lux',
       'XinZhao', 'Samira', 'Volibear', 'Gwen', 'Syndra', 'Malzahar',
       'KSante', 'Lucian', 'Senna', 'Ryze', 'Karma', 'Yone', 'Kennen',
       'Ahri', 'Swain', 'smolder', 'Kalista', 'Kobuko', 'Shen', 'Jinx',
       'Poppy', 'Varus', 'Seraphine', 'Gnar', 'Xayah', 'Janna',
       'Katarina', 'Smolder', 'Sivir', 'Malphite', 'Ziggs', 'KogMaw',
       'Yasuo', 'rammus', 'Akali', 'Darius', 'Jhin', 'kogmaw', 'KaiSa',
       'Rammus', 'Ekko', 'lulu'], dtype=object)

# This is better, but notice that some names are lowercase, such as lulu, rammus, and kogmaw.
# This needs to be addressed because kogmaw and KogMaw might be treated as separate entities when I perform other operations.

In [22]:
#These are the set 15 champions
#source: https://tftactics.gg/champions/
set15 = ['Aatrox','Ahri','Akali','Ashe',
                'Braum','Caitlyn','Darius','DrMundo',
                'Ezreal','Gangplank','Garen','Gnar','Gwen',
                'Janna','JarvanIV','Jayce','Jhin','Jinx',
                'Kalista','Karma','Katarina','Kayle','Kennen',
                'Kobuko','KogMaw','KSante','LeeSin','Leona','Lucian',
                'Lulu','Lux','Malphite','Malzahar','Naafiri','Neeko',
                'Poppy','Rakan','Rammus','Rell','Ryze','Samira','Senna',
                'Seraphine','Sett','Shen','Sivir','Smolder','Swain',
                'Syndra','TwistedFate','Udyr','Varus','Viego','Vi',
                'Volibear','Xayah','XinZhao','Yasuo','Yone','Yuumi',
                'Zac','Ziggs','Zyra']


mapping = {c.lower(): c for c in set15}

#Normalize casing; names not found in the Set 15 mapping are dropped
def norm(lst):
    cleaned = []
    for c in lst:
        if isinstance(c, str):
            key = c.lower()
            if key in mapping:
                cleaned.append(mapping[key])
    return cleaned

dfRaw["champion_list"] = dfRaw["champion_list"].apply(norm)
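
The case normalization can be sketched on a small subset of the Set 15 names. The name `NotAChampion` is a made-up placeholder used to show what happens to unrecognized entries.

```python
# A small subset of the Set 15 names, enough to illustrate the idea.
canonical = ["KogMaw", "LeeSin", "Lulu", "Rammus"]
mapping = {c.lower(): c for c in canonical}

def norm(names):
    # Keep only strings whose lowercase form is a known champion,
    # replacing each with its canonical spelling.
    return [
        mapping[n.lower()]
        for n in names
        if isinstance(n, str) and n.lower() in mapping
    ]

raw = ["kogmaw", "KogMaw", "leesin", "NotAChampion"]
cleaned = norm(raw)
print(cleaned)
```

Unrecognized names are dropped silently, so comparing list lengths before and after normalization may be a useful check that no real champion was lost.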

In [23]:
#Check the champions.
sorted(dfRaw["champion_list"].explode().dropna().unique())

['Aatrox',
 'Ahri',
 'Akali',
 'Ashe',
 'Braum',
 'Caitlyn',
 'Darius',
 'DrMundo',
 'Ezreal',
 'Gangplank',
 'Garen',
 'Gnar',
 'Gwen',
 'Janna',
 'JarvanIV',
 'Jayce',
 'Jhin',
 'Jinx',
 'KSante',
 'Kalista',
 'Karma',
 'Katarina',
 'Kayle',
 'Kennen',
 'Kobuko',
 'KogMaw',
 'LeeSin',
 'Leona',
 'Lucian',
 'Lulu',
 'Lux',
 'Malphite',
 'Malzahar',
 'Naafiri',
 'Neeko',
 'Poppy',
 'Rakan',
 'Rammus',
 'Rell',
 'Ryze',
 'Samira',
 'Senna',
 'Seraphine',
 'Sett',
 'Shen',
 'Sivir',
 'Smolder',
 'Swain',
 'Syndra',
 'TwistedFate',
 'Udyr',
 'Varus',
 'Vi',
 'Viego',
 'Volibear',
 'Xayah',
 'XinZhao',
 'Yasuo',
 'Yone',
 'Yuumi',
 'Zac',
 'Ziggs',
 'Zyra']

In [24]:
#See final shape
dfRaw.shape

(48692, 30)

In [25]:
#How many players are left?
dfRaw["puuid"].nunique()

6994

In [None]:
#How many matches are left?
dfRaw["match_id"].nunique()

In [None]:
#Save final dataframe as a csv file to data/processed-data
#This csv file will be used for EDA

#Set where csv file is going
out_dir = Path("../data/processed-data")
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / "TFT_processed-data.csv"

#Export dataframe to CSV at location defined by out_dir
dfRaw.to_csv(out_path, index=False)