# <center>Analysis of World of Warcraft Population Data</center>
### <center>Data obtained from [realmpop.com](https://realmpop.com/)</center>

This blog post uses names and images from World of Warcraft and data proprietary to World of Warcraft. World of Warcraft, Warcraft and Blizzard Entertainment are trademarks or registered trademarks of Blizzard Entertainment, Inc. in the U.S. and/or other countries.

With the recent launch of World of Warcraft's (WoW) 7th expansion on August 13th, 2018, I personally witnessed a large number of friends returning to Azeroth. While it's been wonderful to see my chat window once again filled with the comforting green text of guild chat, this influx of players caused my mind to drift toward the ramifications for WoW's player base as a whole. 

Several years ago, Blizzard decided to stop reporting overall subscription numbers in its quarterly earnings reports. Since then, investors and players alike have been in the dark about overall player trends. Fortunately, the intrepid folks at [Realm Pop](https://realmpop.com) have done their best to keep shining a light on the WoW player population.

Their methods are rather simple, as detailed on their website. In summary, the Realm Pop team records character information for every realm in the US and EU regions by identifying characters who post on the auction house, fetching each such character's guild roster, and then recording every character in that guild. Thus, the final tally includes all characters who belong to a guild in which at least one member has posted an auction. This should cover the vast majority of characters, but certainly not all, as some players may not use the auction house or belong to a guild. It's also important to note that these numbers reflect individual characters, not individual accounts (i.e., a single account may have several characters), nor do they reflect active accounts or characters (i.e., accounts with an active subscription or characters that are actively played).
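
To make that sampling method concrete, here is a toy sketch of the counting logic. The character names, guild names, and the `count_visible_characters` helper are all hypothetical and are not taken from Realm Pop's code. The sketch also assumes that an auction poster is counted even without a guild, which the site description does not spell out.

```python
def count_visible_characters(auction_posters, guild_of, rosters):
    """Return the set of characters reachable through auction posters and their guilds."""
    seen = set()
    for poster in auction_posters:
        # Assumption: the poster is recorded even when guildless.
        seen.add(poster)
        guild = guild_of.get(poster)
        if guild is not None:
            # Everyone on the poster's guild roster is recorded as well.
            seen.update(rosters[guild])
    return seen


# Hypothetical realm: two guilds plus two guildless characters.
rosters = {
    'Guild A': ['alpha', 'bravo', 'charlie'],
    'Guild B': ['delta', 'echo'],
}
guild_of = {name: guild for guild, members in rosters.items() for name in members}

# 'alpha' (Guild A) and guildless 'foxtrot' post auctions; guildless 'golf' never does.
visible = count_visible_characters(['alpha', 'foxtrot'], guild_of, rosters)
print(sorted(visible))
```

In this toy realm, Guild B and the guildless `golf` go uncounted because nobody connected to them posted an auction, which is exactly the blind spot described above.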

As this information is updated once a week, I will aspire to maintain that update frequency with this blog to identify trends and/or significant changes in the data.

## 1. Data Collection
While Realm Pop collects its data from the Battle.net API, we'll keep things simple and collect the data that Realm Pop has already stored for each realm. That sounds easy in principle, but we first need to understand how the data is stored on the site.

Making use of Chrome's "Inspect" devtool, we dig through both the US and the EU pages and discover that two JSON files exist for each region:  a traditional realm-by-realm JSON and a separate JSON for connected-realms. 
* Note: For those of you unfamiliar with the "Inspect" tool or curious about how to find the data powering a website, please refer to this handy tutorial from the [Online Journalism Blog](https://onlinejournalismblog.com/2017/05/10/how-to-find-data-behind-chart-map-using-inspector/).

To be on the safe side, we'll go ahead and download both JSONs for each region to ensure we have all the data possible. While I plan to start analyzing data on a realm-by-realm basis, the fact that a large number of realms are now connected to each other means that my analysis may inevitably shift in that direction.

Having identified the data we want, let's go and get it.

In [75]:
# Import the necessary packages
from bs4 import BeautifulSoup
import json
import os
import pandas as pd
import requests
import time

In [25]:
### Collect the realm data for each region (US and EU) and the information for all connected realms across both regions

# Request each JSON file from the site and parse it into a Python dict
realms_eu = json.loads(requests.get('https://realmpop.com/eu.json').text)
realms_us = json.loads(requests.get('https://realmpop.com/us.json').text)
realms_connected = json.loads(requests.get('https://realmpop.com/connected-realms.json').text)
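
The requests above assume every download succeeds. As a hedged sketch, a small wrapper can add a timeout and fail loudly on HTTP errors. The `fetch_json` name and its `session` parameter are my own additions, not part of the original notebook.

```python
def fetch_json(url, session=None, timeout=30):
    """Fetch a URL and return its decoded JSON body, raising on HTTP errors."""
    if session is None:
        # Imported lazily so the helper can be exercised with a stand-in session.
        import requests
        session = requests
    response = session.get(url, timeout=timeout)
    # Raise on 4xx/5xx responses instead of trying to parse an error page.
    response.raise_for_status()
    return response.json()
```

With this in place, a call such as `fetch_json('https://realmpop.com/eu.json')` would stand in for the `json.loads(requests.get(...).text)` pattern.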

In [28]:
### Save these files to our local project folder for safe keeping
# EU Realms
with open('data/realms_eu.json', 'wt') as out:
    json.dump(realms_eu, out, sort_keys = True)
    
# US Realms
with open('data/realms_us.json', 'wt') as out:
    json.dump(realms_us, out, sort_keys = True)

# Connected Realms
with open('data/realms_connected.json', 'wt') as out:
    json.dump(realms_connected, out, sort_keys = True)
    # Pretty-printed alternative:
    # json.dump(realms_connected, out, sort_keys=True, indent=4, separators=(',', ': '))
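
The cell above assumes the `data/` folder already exists. A minimal sketch of a save/load pair that creates the folder on demand is shown here. `save_json` and `load_json` are hypothetical helper names introduced for illustration.

```python
import json
import os


def save_json(obj, path):
    """Write obj to path as JSON, creating the parent folder if needed."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(path, 'wt') as out:
        json.dump(obj, out, sort_keys=True)


def load_json(path):
    """Read a JSON file back into Python objects."""
    with open(path, 'rt') as src:
        return json.load(src)
```

Reading a file back with `load_json` right after saving it is a cheap way to confirm the copy on disk is intact.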

Now that we have some data, let's see what's hidden inside each of these JSON files.

In [30]:
realms_eu.keys()

dict_keys(['realms', 'demographics'])

Let's dig into the realm data a little further.

In [37]:
list(realms_eu['realms'].items())[0:2]

[('aegwynn',
  {'counts': {'Alliance': 477283,
    'Horde': 24287,
    'Neutral': 6,
    'Unknown': 4732},
   'name': 'Aegwynn',
   'stats': {'pvp': 'PvP',
    'region': 'German',
    'rp': 'Normal',
    'timezone': 'Europe/Paris'}}),
 ('aerie-peak',
  {'counts': {'Alliance': 94887, 'Horde': 40402, 'Neutral': 3, 'Unknown': 955},
   'name': 'Aerie Peak',
   'stats': {'pvp': 'PvE',
    'region': 'English',
    'rp': 'Normal',
    'timezone': 'Europe/Paris'}})]
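
As a quick illustration of how this structure can be used, the sketch below totals the `counts` for each realm and derives a faction share. The `faction_summary` helper is hypothetical. The sample dictionary copies only the `counts` and `name` fields from the two realms printed above.

```python
def faction_summary(realm):
    """Return the total character count and faction shares for one realm entry."""
    counts = realm['counts']
    total = sum(counts.values())
    return {
        'name': realm['name'],
        'total': total,
        'alliance_share': counts['Alliance'] / total,
        'horde_share': counts['Horde'] / total,
    }


# The two entries shown in the output above, trimmed to the fields used here.
sample = {
    'aegwynn': {
        'counts': {'Alliance': 477283, 'Horde': 24287, 'Neutral': 6, 'Unknown': 4732},
        'name': 'Aegwynn',
    },
    'aerie-peak': {
        'counts': {'Alliance': 94887, 'Horde': 40402, 'Neutral': 3, 'Unknown': 955},
        'name': 'Aerie Peak',
    },
}

summaries = {slug: faction_summary(realm) for slug, realm in sample.items()}
for summary in summaries.values():
    print(f"{summary['name']}: {summary['total']} characters, "
          f"{summary['alliance_share']:.1%} Alliance")
```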

The demographics portion of the realm data is much more robust (and nested) so we'll save space (and memory) by not outputting any of that data here.  However, the general structure of the data is as follows:
> Region
> > PvE or PvP
> > > Normal or RP
> > > > Country 
> > > > > Time Zone 
> > > > > > Gender
> > > > > > > Class
> > > > > > > > Race
> > > > > > > > > Level
> > > > > > > > > > PlayerCount
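
A nested layout like this is easiest to handle recursively. The sketch below sums every leaf count in a nested dictionary. It assumes the leaves are plain integers, and the keys in `toy_demographics` are made up for illustration rather than copied from the real file.

```python
def total_players(node):
    """Recursively sum the integer leaves of a nested dictionary."""
    if isinstance(node, dict):
        return sum(total_players(child) for child in node.values())
    # Assumption: anything that is not a dict is a player count.
    return node


# Hypothetical, heavily trimmed stand-in for the demographics tree.
toy_demographics = {
    'PvE': {
        'Normal': {
            'Female': {'Mage': {'Human': {'110': 2, '120': 3}}},
            'Male': {'Warrior': {'Orc': {'120': 5}}},
        },
    },
    'PvP': {
        'RP': {
            'Male': {'Rogue': {'Troll': {'60': 1}}},
        },
    },
}

print(total_players(toy_demographics))
```

The same function works on any subtree, so `total_players(toy_demographics['PvE'])` gives the count for PvE alone.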

To enable a detailed collection of all realm data, we need to isolate the name of every realm and then query the realmpop site to obtain that realm's JSON file (format: `region-realm.json`, e.g. `us-aegwynn.json`).

In [46]:
## Extract a complete realm list for both regions
list_realms_eu = ['eu-' + realm for realm in list(realms_eu['realms'].keys())]
list_realms_us = ['us-' + realm for realm in list(realms_us['realms'].keys())]

## Extract a complete list of connected realms for both regions
# Simplify the connected realms such that there is only one listing per realm
list_connected_eu = []
list_connected_us = []

for key, value in realms_connected.items():
    # handle EU realms
    if key.startswith('eu-'):
        temp_list = [key] + value
        temp_list.sort()
        if temp_list not in list_connected_eu:
            list_connected_eu.append(temp_list)
    # handle US realms    
    else:
        temp_list = [key] + value
        temp_list.sort()
        if temp_list not in list_connected_us:
            list_connected_us.append(temp_list)
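
The de-duplication above can also be written more compactly with sets. This is an alternative sketch, not the code used in the rest of this notebook. `group_connected` is a hypothetical helper, and the toy input assumes each realm maps to a list of its connected partners, as the loop above implies.

```python
from collections import defaultdict


def group_connected(connected):
    """Return {region: [group, ...]} with one sorted tuple per connected group."""
    groups = defaultdict(set)
    for realm, partners in connected.items():
        # The region is the slug prefix, e.g. 'eu' in 'eu-aerie-peak'.
        region = realm.split('-', 1)[0]
        # Sorting makes every member of a group produce the same tuple,
        # so the set keeps just one copy per group.
        groups[region].add(tuple(sorted([realm] + list(partners))))
    return {region: sorted(members) for region, members in groups.items()}


toy_connected = {
    'eu-aerie-peak': ['eu-bronzebeard'],
    'eu-bronzebeard': ['eu-aerie-peak'],
    'us-aggramar': ['us-fizzcrank'],
    'us-fizzcrank': ['us-aggramar'],
}
grouped = group_connected(toy_connected)
print(grouped)
```

Tuples are used instead of lists because set members must be hashable. The set lookup also avoids rescanning the whole list for every realm.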

Let's double-check that our script worked.

In [50]:
list_realms_eu[0:5]

['eu-aegwynn',
 'eu-aerie-peak',
 'eu-agamaggan',
 'eu-aggra-portugues',
 'eu-aggramar']

In [51]:
list_realms_us[0:5]

['us-aegwynn', 'us-aerie-peak', 'us-agamaggan', 'us-aggramar', 'us-akama']

In [52]:
list_connected_eu[0:5]

[['eu-aerie-peak', 'eu-bronzebeard'],
 ['eu-agamaggan',
  'eu-bloodscalp',
  'eu-crushridge',
  'eu-emeriss',
  'eu-hakkar',
  'eu-twilights-hammer'],
 ['eu-aggra-portugues', 'eu-grim-batol'],
 ['eu-aggramar', 'eu-hellscream'],
 ['eu-ahnqiraj',
  'eu-balnazzar',
  'eu-boulderfist',
  'eu-chromaggus',
  'eu-daggerspine',
  'eu-laughing-skull',
  'eu-shattered-halls',
  'eu-sunstrider',
  'eu-talnivarr',
  'eu-trollbane']]

In [53]:
list_connected_us[0:5]

[['us-aegwynn',
  'us-bonechewer',
  'us-daggerspine',
  'us-gurubashi',
  'us-hakkar'],
 ['us-agamaggan',
  'us-archimonde',
  'us-burning-legion',
  'us-jaedenar',
  'us-the-underbog'],
 ['us-aggramar', 'us-fizzcrank'],
 ['us-akama', 'us-dragonmaw', 'us-mugthol'],
 ['us-alexstrasza', 'us-terokkar']]

Everything looks great!

Now that we have finalized realm lists for each region, let's go ahead and acquire the data for each realm.  

In [72]:
# Iterate through every US realm
for realm in list_realms_us:
    # Define the URL and request the JSON
    realm_url = 'https://realmpop.com/' + str(realm) + '.json'
    realm_json = json.loads(requests.get(realm_url).text)
    # Define the filename and write the JSON to a local file with a timestamp
    fname = 'data/realms-us/' + realm + '_' + str(time.strftime("%m-%d-%Y-%H:%M")) + '.json'
    with open(fname, 'wt') as out:
        json.dump(realm_json, out, sort_keys = True)

In [71]:
# Iterate through every EU realm
for realm in list_realms_eu:
    # Define the URL and request the JSON
    realm_url = 'https://realmpop.com/' + str(realm) + '.json'
    realm_json = json.loads(requests.get(realm_url).text)
    # Define the filename and write the JSON to a local file with a timestamp
    fname = 'data/realms-eu/' + realm + '_' + str(time.strftime("%m-%d-%Y-%H:%M")) + '.json'
    with open(fname, 'wt') as out:
        json.dump(realm_json, out, sort_keys = True)
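
One caveat with the filenames above: the `:` in the timestamp is not allowed in Windows filenames, and a month-first date does not sort chronologically. A possible alternative is sketched below. The `snapshot_path` helper and its year-first format are my own suggestion, not what the cells above use.

```python
import os
import time


def snapshot_path(realm, when=None, root='data'):
    """Build a path like data/realms-us/us-aegwynn_2018-08-20-0905.json."""
    region = realm.split('-', 1)[0]
    if when is None:
        when = time.localtime()
    # Year-first and colon-free: sorts chronologically and is valid on Windows.
    stamp = time.strftime('%Y-%m-%d-%H%M', when)
    return os.path.join(root, 'realms-' + region, realm + '_' + stamp + '.json')


# Passing one shared timestamp gives every file in a run the same suffix.
run_time = time.strptime('2018-08-20 09:05', '%Y-%m-%d %H:%M')
example_path = snapshot_path('us-aegwynn', run_time)
print(example_path)
```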

Let's finish things up by quickly counting the number of JSON files we've retrieved for both the US and EU regions. The counts should match the size of our region lists, so let's get to it!

In [95]:
# Check EU Realms
DIR = 'data/realms-eu'
len(list_realms_eu) == len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

True

In [96]:
# Check US Realms
DIR = 'data/realms-us'
len(list_realms_us) == len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

True
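
A plain count comparison only holds while each folder contains a single run. Because the filenames are timestamped, a second weekly download into the same folder would push the file count past the length of the realm list. As a sketch, a name-based check can report which realms have no file at all. `missing_realms` is a hypothetical helper that relies on the `realm_timestamp.json` naming used above.

```python
import os


def missing_realms(realms, folder):
    """Return the realms with no file in folder named '<realm>_...'."""
    files = [name for name in os.listdir(folder)
             if os.path.isfile(os.path.join(folder, name))]
    return [realm for realm in realms
            if not any(name.startswith(realm + '_') for name in files)]
```

An empty list from `missing_realms(list_realms_us, 'data/realms-us')` would mean every US realm has at least one snapshot on disk.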

Perfect, we now have data for all 247 US realms and 269 EU realms, along with a complete list of all connected realms in both regions. This will set us up perfectly for our next notebook, where we'll start exploring the data in more detail. Stay tuned!

<img src="https://image-cdn.neatoshop.com/styleimg/47191/none/black/default/310622-19;1466158386i.jpg" height='300' width='300'/>