# <center>Analysis of World of Warcraft Population Data</center>
### <center>Data obtained from [realmpop.com](https://realmpop.com/)</center>

This blog post uses names and images from World of Warcraft and data proprietary to World of Warcraft. World of Warcraft, Warcraft and Blizzard Entertainment are trademarks or registered trademarks of Blizzard Entertainment, Inc. in the U.S. and/or other countries.

Welcome back to the second in my series of notebooks analyzing World of Warcraft population data. In our first [notebook](https://github.com/ereidelbach/wow/blob/master/warcraft1_data_collection.ipynb) we walked through the process of obtaining the data. In this notebook, we're going to dive into the data and reshape it into a format that will simplify any analysis we run moving forward.

Let the fun begin!

# 1. Data Format (Recap)

In our last notebook, we did a brief dive into the data and discovered that it is heavily nested. Let's refresh ourselves with the actual structure of the data before we go any further.

In [3]:
# Import the necessary packages
from bs4 import BeautifulSoup
import json
import os
import pandas as pd
import requests
import time

In [8]:
# create a list of all realm files in the US folder
DIR = 'data/realms-us'
list_realms_us = sorted([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])
list_realms_us[-5:]

['us-ysera_08-15-2018-15:46.json',
 'us-ysondre_08-15-2018-15:46.json',
 'us-zangarmarsh_08-15-2018-15:46.json',
 'us-zuljin_08-15-2018-15:46.json',
 'us-zuluhed_08-15-2018-15:46.json']

My Alliance characters happen to be on Ysera, so let's use that file for our initial analysis.

In [10]:
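# Character names include non-ASCII letters (e.g. 'Vlána'), so read the file explicitly as UTF-8
with open('data/realms-us/us-ysera_08-15-2018-15:46.json', 'r', encoding='utf-8') as f: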
    json_ysera = json.load(f)

As we said, the realm data is heavily nested so we'll save space (and memory) by avoiding outputting any of that data here.  However, the general structure of the realm data that we're interested in is as follows:
> Characters
> > Gender
> > > Class
> > > > Race
> > > > > Level
> > > > > > CharacterName
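To make that nesting concrete without printing the real file, here is a tiny, hand-built sample with the same shape. It is a sketch under an assumption: like the loops later in this notebook, it treats the innermost `Level` value as an iterable of character names (a list here). The sample itself and the `max_depth` helper are hypothetical; the character names are borrowed from output shown later in this notebook.

```python
# Hypothetical sample mimicking the nesting described above.
# Assumption: the innermost value is a list of character names.
sample_realm = {
    'characters': {
        'Male': {                           # Gender
            'Priest': {                     # Class
                'PandarenA': {              # Race
                    '110': ['Magoosh'],     # Level -> character names
                },
            },
        },
        'Female': {
            'Death Knight': {
                'Blood Elf': {
                    '100': ['Veronika', 'Veylia'],
                },
            },
        },
    },
}


def max_depth(obj):
    """Count how many non-empty dictionary levels sit above the innermost values."""
    if isinstance(obj, dict) and obj:
        return 1 + max(max_depth(value) for value in obj.values())
    return 0


# Four dictionary levels (gender, class, race, level) before the name lists
print(max_depth(sample_realm['characters']))
```

That depth of four is why the transformation below needs four nested loops over dictionaries, plus a fifth loop over the names themselves.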

# 2. Data Transformation

That heavily nested format is less than ideal, so let's get to work unraveling it into a more traditional, tabular format (i.e. something that fits neatly into a Pandas DataFrame). My initial plan of attack is to work through the data level by level via a series of nested `for` loops. Once we reach the final level of the dictionary, we'll store each character as a flat entry that can later be loaded into a Pandas DataFrame. In essence, we're going to treat each entry at the deepest level of the dictionary (`CharacterName`) as a single row. The higher-level information above that row (i.e. `Gender`, `Class`, `Race`, `Level`) will become columns and will be included in every row.

I don't expect this to be the most efficient method possible, but it will get us started with our analysis and we can always revise the efficiency aspect at a later point in time if it proves to be a significant issue.
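Because every loop in that plan has the same shape, the walk can also be written as a generator that yields one flat record per character. This is a minimal sketch, not the method used below: `iter_characters` is a hypothetical name, and it assumes (as the notebook's own loops do) that the innermost level is an iterable of character names. A list of such records is a natural input for `pd.DataFrame` if we later want a table.

```python
def iter_characters(characters):
    """Yield one flat record per character from the nested
    gender -> class -> race -> level -> names structure."""
    for gender, classes in characters.items():
        for cls, races in classes.items():
            for race, levels in races.items():
                for level, names in levels.items():
                    for name in names:
                        yield {'name': name, 'gender': gender, 'class': cls,
                               'race': race, 'level': level}


# Hypothetical toy input with the same nesting as the realm files
toy = {
    'Male': {'Priest': {'PandarenA': {'110': ['Magoosh']}}},
    'Female': {'Death Knight': {'Blood Elf': {'100': ['Veronika', 'Veylia']}}},
}

records = list(iter_characters(toy))
print(len(records))
print(records[0])
```

Unlike a dictionary keyed by character name, a list of records would keep duplicate names instead of silently overwriting them, which may matter if records from several realms are ever combined.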

In [14]:
# Extract the characters dictionary from the Ysera data as that's our focus (we only want player data)
json_realm = json_ysera['characters']

In [16]:
# create a dictionary for storing our transformed realm data
dict_realm = {}

# Loop through every sub-dictionary, within the dictionary until we get to the final level
# Once there, create an entry in dict_realm using the character name as the key
for gender_key, gender_value in json_realm.items():
    # Handle `Class`
    for class_key, class_value in gender_value.items():
        # Handle `Race`
        for race_key, race_value in class_value.items():
            # Handle `Level`
            for level_key, level_value in race_value.items():
                # Handle `CharacterName` 
                for char in level_value:
                    # Create dict entry for each character
                    dict_realm[char] = {
                            'gender':gender_key
                            ,'class':class_key
                            ,'race':race_key
                            ,'level':level_key
                            }

Let's double-check the formatting by using the `itertools.islice` function to output the first 5 characters in our newly transformed dictionary.

In [17]:
import itertools
dict(itertools.islice(dict_realm.items(), 5))

{'Veronika': {'class': 'Death Knight',
  'gender': 'Female',
  'level': '100',
  'race': 'Blood Elf'},
 'Veylia': {'class': 'Death Knight',
  'gender': 'Female',
  'level': '100',
  'race': 'Blood Elf'},
 'Vlána': {'class': 'Death Knight',
  'gender': 'Female',
  'level': '100',
  'race': 'Blood Elf'},
 'Vondutchy': {'class': 'Death Knight',
  'gender': 'Female',
  'level': '100',
  'race': 'Blood Elf'},
 'Vôrtex': {'class': 'Death Knight',
  'gender': 'Female',
  'level': '100',
  'race': 'Blood Elf'}}

Perfect! To be on the safe side, let's check one more time using one of my characters.  As I'm very familiar with them, I know exactly what values to expect which will help double check my work.

We'll look up data for my level 110 Pandaren Alliance Priest named Magoosh:

In [18]:
dict_realm['Magoosh']

{'class': 'Priest', 'gender': 'Male', 'level': '110', 'race': 'PandarenA'}

Everything looks great, so let's move on to the next step. (Note: Pandaren characters can belong to either the Alliance or the Horde, so the data appends an "A" to the race name of Alliance Pandaren and an "H" to that of Horde Pandaren.)
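If we later want to compare Pandaren as a single race across both factions, the suffix can be split off the race name. The helper below is a hypothetical sketch; it assumes `PandarenA` and `PandarenH` are the only faction-tagged race names, as in the data shown in this notebook.

```python
def split_pandaren(race):
    """Split a faction-tagged Pandaren race name into (race, faction).

    Assumes 'PandarenA' and 'PandarenH' are the only tagged names. Any other
    race (including an untagged 'Pandaren') is returned unchanged with a
    faction of None.
    """
    suffix_to_faction = {'A': 'Alliance', 'H': 'Horde'}
    if race.startswith('Pandaren') and race[-1] in suffix_to_faction:
        return 'Pandaren', suffix_to_faction[race[-1]]
    return race, None


print(split_pandaren('PandarenA'))
print(split_pandaren('Blood Elf'))
```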

Now that I think of it, this data is missing the character's faction (Alliance or Horde), so let's add that variable to our data.

# 3. Feature Engineering

In the newest expansion (as of 16 August, 2018), the following races belong to the Horde:
- Blood Elf
- Orc
- Tauren
- Undead
- Troll
- Goblin
- PandarenH
- Nightborne (Not included in RealmPop data)
- Highmountain Tauren (Not included in RealmPop data)

The following races belong to the Alliance:
- Human
- Night Elf
- Draenei
- Worgen
- Dwarf
- Gnome 
- PandarenA
- Lightforged Draenei (Not included in RealmPop data)
- Void Elf (Not included in RealmPop data)

Because Pandaren players don't officially choose a faction until they finish their first few hours of gameplay, some of them may not have a faction in the data. We'll label these characters, along with any other race not found in either list, as "Unknown."

In [19]:
## Insert a player's faction into the data based on race

# Create lists containing the races within each faction
list_faction_horde = ['Blood Elf', 'Orc', 'Tauren', 'Undead', 'Troll', 'Goblin',
                      'PandarenH', 'Nightborne', 'Highmountain Tauren']
list_faction_alliance = ['Human', 'Night Elf', 'Draenei', 'Worgen', 'Dwarf',
                         'Gnome', 'PandarenA', 'Lightforged Draenei', 'Void Elf']

# Insert the faction into each character's data
# (each inner dict is updated in place, so no reassignment is needed)
for data in dict_realm.values():
    if data['race'] in list_faction_horde:
        data['faction'] = 'Horde'
    elif data['race'] in list_faction_alliance:
        data['faction'] = 'Alliance'
    else:
        data['faction'] = 'Unknown'
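The two-list approach works fine. An alternative sketch folds both lists into a single race-to-faction lookup table, so that `dict.get` supplies the "Unknown" fallback in one step. `FACTION_BY_RACE`, `faction_for`, and the `Mystery` character are hypothetical names used only for illustration.

```python
# Hypothetical alternative: one race -> faction lookup table instead of two lists
FACTION_BY_RACE = {race: 'Horde' for race in [
    'Blood Elf', 'Orc', 'Tauren', 'Undead', 'Troll', 'Goblin',
    'PandarenH', 'Nightborne', 'Highmountain Tauren']}
FACTION_BY_RACE.update({race: 'Alliance' for race in [
    'Human', 'Night Elf', 'Draenei', 'Worgen', 'Dwarf',
    'Gnome', 'PandarenA', 'Lightforged Draenei', 'Void Elf']})


def faction_for(race):
    """Return 'Horde' or 'Alliance' for known races, otherwise 'Unknown'."""
    return FACTION_BY_RACE.get(race, 'Unknown')


# Toy characters; 'Mystery' stands in for an untagged Pandaren
chars = {'Magoosh': {'race': 'PandarenA'},
         'Veronika': {'race': 'Blood Elf'},
         'Mystery': {'race': 'Pandaren'}}
for data in chars.values():
    data['faction'] = faction_for(data['race'])
print(chars)
```

With only nine races per faction, the speed difference between list membership and a dictionary lookup is negligible; the main benefit is having a single table to update when new races arrive.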

In [22]:
# Double check my character `Magoosh` to make sure it properly lists his faction as Alliance
dict_realm['Magoosh']

{'class': 'Priest',
 'faction': 'Alliance',
 'gender': 'Male',
 'level': '110',
 'race': 'PandarenA'}

Another popular way to group characters in World of Warcraft is by the role they play in groups (DPS, Healer, or Tank) or by armor type (Cloth, Leather, Mail, or Plate). To flesh out this data set a little further, let's introduce role information to the data.

Unfortunately, role types are a little complicated. All classes can DPS, some classes can fulfill two roles (e.g. a Shaman can DPS and heal), and some can fill all three (e.g. a Druid can DPS, heal, and tank). To handle this, we'll insert two binary variables that indicate whether a class is capable of fulfilling the specialized roles of healing or tanking. Since every class can DPS, we'll assume that ability by default and don't need to track it.

In [23]:
# Create a list for the healing and tanking roles that contains all classes capable of fulfilling those roles in a group
list_role_heal = ['Druid', 'Monk', 'Paladin', 'Priest', 'Shaman']
list_role_tank = ['Death Knight', 'Demon Hunter', 'Druid', 'Monk', 'Paladin', 'Warrior']

for data in dict_realm.values():
    # 1 if the class can fill the role, 0 otherwise (each dict is updated in place)
    data['healer'] = int(data['class'] in list_role_heal)
    data['tank'] = int(data['class'] in list_role_tank)
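Once the data lives in a Pandas DataFrame, the same two flags can be computed for every row at once instead of in a Python loop. This is a sketch on a hypothetical toy frame. It assumes the class is stored in a `class` column and the character names are the index. `Series.isin` returns a boolean Series, and `astype(int)` turns `True`/`False` into `1`/`0`.

```python
import pandas as pd

# Class lists copied from the cell above
list_role_heal = ['Druid', 'Monk', 'Paladin', 'Priest', 'Shaman']
list_role_tank = ['Death Knight', 'Demon Hunter', 'Druid', 'Monk', 'Paladin', 'Warrior']

# Hypothetical toy frame standing in for the flattened realm data
df = pd.DataFrame({'class': ['Priest', 'Death Knight', 'Druid', 'Mage']},
                  index=['Magoosh', 'Veronika', 'ToyDruid', 'ToyMage'])

# Vectorized membership tests, converted from booleans to 1/0
df['healer'] = df['class'].isin(list_role_heal).astype(int)
df['tank'] = df['class'].isin(list_role_tank).astype(int)
print(df)
```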

In [24]:
# Double check my character `Magoosh` to make sure it properly lists his Tank role as a 0 and his Healer role as a 1
dict_realm['Magoosh']

{'class': 'Priest',
 'faction': 'Alliance',
 'gender': 'Male',
 'healer': 1,
 'level': '110',
 'race': 'PandarenA',
 'tank': 0}

# 4. Function Creation

The code we've written above can be combined into a single function so that we can replicate this process for every server, not just Ysera. Let's do that now so we can examine all servers moving forward.

In [29]:
# Create lists containing the races within each faction
list_horde = ['Blood Elf', 'Orc', 'Tauren', 'Undead', 'Troll', 'Goblin',
              'PandarenH', 'Nightborne', 'Highmountain Tauren']
list_alliance = ['Human', 'Night Elf', 'Draenei', 'Worgen', 'Dwarf',
                 'Gnome', 'PandarenA', 'Lightforged Draenei', 'Void Elf']

# Create a list for classes that can fulfill the healing and tanking roles in groups
list_role_heal = ['Druid', 'Monk', 'Paladin', 'Priest', 'Shaman']
list_role_tank = ['Death Knight', 'Demon Hunter', 'Druid', 'Monk', 'Paladin', 'Warrior']

def transform_realm(realm_data):
    '''
    This function will ingest realm data, flatten it, and add data for a 
    character's faction and role types (i.e. healer and/or tank).
    
    Input:  realm_data (dictionary) - Original, nested realm data
    Output: dict_realm (dictionary) - Flattened realm data with new variables
    '''
    # Create a new dictionary for storing flattened data
    dict_realm = {}
    # Handle `Gender`
    for gender_key, gender_value in realm_data['characters'].items():
        # Handle `Class`
        for class_key, class_value in gender_value.items():
            # Handle `Race`
            for race_key, race_value in class_value.items():
                # Handle `Level`
                for level_key, level_value in race_value.items():
                    # Handle `CharacterName` 
                    for char in level_value:
                        # Create dict entry for each character
                        dict_char = {
                                'gender':gender_key
                                ,'class':class_key
                                ,'race':race_key
                                ,'level':level_key
                                }
                        
                        # Insert the faction into each character's data
                        if dict_char['race'] in list_horde:
                            dict_char['faction'] = 'Horde'
                        elif dict_char['race'] in list_alliance:
                            dict_char['faction'] = 'Alliance'
                        else:
                            dict_char['faction'] = 'Unknown'
                        
                        # Insert the Healer role into each character's data
                        if dict_char['class'] in list_role_heal:
                            dict_char['healer'] = 1
                        else:
                            dict_char['healer'] = 0

                        # Insert the Tank role into each character's data
                        if dict_char['class'] in list_role_tank:
                            dict_char['tank'] = 1
                        else:
                            dict_char['tank'] = 0                        
                        
                        # Insert the character into the realm dictionary
                        dict_realm[char] = dict_char
    return dict_realm

In [30]:
dict_ysera = transform_realm(json_ysera)

In [31]:
# Double check my character `Magoosh` to make sure the function operates properly
dict_ysera['Magoosh']

{'class': 'Priest',
 'faction': 'Alliance',
 'gender': 'Male',
 'healer': 1,
 'level': '110',
 'race': 'PandarenA',
 'tank': 0}

**Success!**
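Since the end goal is tabular data, here is a sketch of turning a flattened realm dictionary into a DataFrame. It uses a hypothetical two-character stand-in for `dict_ysera`. `pd.DataFrame.from_dict(..., orient='index')` makes each outer key (the character name) a row label and each inner key a column. Levels are stored as strings in the source data, so the sketch converts them to integers for numeric work.

```python
import pandas as pd

# Toy stand-in for dict_ysera: character name -> attribute dict
dict_toy = {
    'Magoosh': {'class': 'Priest', 'faction': 'Alliance', 'gender': 'Male',
                'healer': 1, 'level': '110', 'race': 'PandarenA', 'tank': 0},
    'Veronika': {'class': 'Death Knight', 'faction': 'Horde', 'gender': 'Female',
                 'healer': 0, 'level': '100', 'race': 'Blood Elf', 'tank': 1},
}

# orient='index' turns each character name into a row label
df_realm = pd.DataFrame.from_dict(dict_toy, orient='index')
df_realm.index.name = 'name'

# Levels arrive as strings such as '110'; convert them to integers
df_realm['level'] = df_realm['level'].astype(int)
print(df_realm)
```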

With our function working properly, we can now flatten data for any (or all) realms we're interested in analyzing. In our next notebook, we'll begin the process of uncovering the secrets that lie inside the World of Warcraft player population.
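As a rough sketch of what "all realms" could look like, the helpers below loop over every JSON file in a directory and flatten each one. So that the block runs on its own, the hypothetical `flatten_characters` stands in for `transform_realm` and only keeps gender, class, race, and level; swapping in `transform_realm(json.load(f))` would add the faction and role columns. Results are keyed by file name because character names are only unique within a realm.

```python
import json
import os


def flatten_characters(characters):
    """Flatten gender -> class -> race -> level -> names into name -> attributes."""
    flat = {}
    for gender, classes in characters.items():
        for cls, races in classes.items():
            for race, levels in races.items():
                for level, names in levels.items():
                    for name in names:
                        flat[name] = {'gender': gender, 'class': cls,
                                      'race': race, 'level': level}
    return flat


def load_all_realms(directory):
    """Return {file name: flattened characters} for every .json file in directory."""
    realms = {}
    for file_name in sorted(os.listdir(directory)):
        path = os.path.join(directory, file_name)
        # Skip sub-directories and any non-JSON files
        if not (os.path.isfile(path) and file_name.endswith('.json')):
            continue
        with open(path, 'r', encoding='utf-8') as f:
            realms[file_name] = flatten_characters(json.load(f)['characters'])
    return realms


# Usage (assumes the directory layout from the first notebook):
# all_realms = load_all_realms('data/realms-us')
```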

Until next time...

<img src="https://image-cdn.neatoshop.com/styleimg/47191/none/black/default/310622-19;1466158386i.jpg" height='300' width='300'/>