Importing the libraries needed for the project and reading in the first dataset (boardgames_ranks.csv)

In [None]:
import pandas as pd
import requests
import matplotlib.pyplot as plt
import numpy as np
import xml.etree.ElementTree as ET
import time
from bs4 import BeautifulSoup
import re

df_board_game_rankings = pd.read_csv('.\\.venv\\data\\boardgames_ranks.csv')

Data Read-in & Cleaning
1) Reading in the same dataset under a different variable name to preserve the original.
2) Filtering out board games with an overall rank of 0, as we only want games with an actual ranking.
3) Creating the DataFrame df_bgr_top, which restricts board games to those with an overall rank of 1 to 250.
4) Replacing NaN values in df_bgr_top with 0.
5) Converting the category-rank columns to int using a dtype dictionary.
6) Rounding the average fields (average and bayesaverage) to 2 decimal places.


In [None]:
##1
df_bgr2 = pd.read_csv('.\\.venv\\data\\boardgames_ranks.csv')

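##2
# drop() returns the filtered DataFrame (with inplace=True it returns None instead)
zerorank = df_bgr2[df_bgr2['rank'] == 0].index
df_bgr_nozero = df_bgr2.drop(zerorank)
df_bgr2 = df_bgr_nozero

##3
# .copy() gives df_bgr_top its own data, so the in-place fillna below is safe
df_bgr_top = df_bgr2.loc[(df_bgr2['rank'] >= 1) & (df_bgr2['rank'] <= 250)].copy()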

##4
df_bgr_top.fillna(0, inplace=True)

##5
convert_dict = {'abstracts_rank': int,
                'cgs_rank': int,
                'childrensgames_rank': int,
                'familygames_rank': int,
                'partygames_rank': int,
                'strategygames_rank': int,
                'thematic_rank': int,
                'wargames_rank': int}
df_bgr_top = df_bgr_top.astype(convert_dict)

##6
# Round the columns of df_bgr_top itself rather than pulling them from df_bgr2
df_bgr_top['bayesaverage'] = df_bgr_top['bayesaverage'].round(2)
df_bgr_top['average'] = df_bgr_top['average'].round(2)
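
The same cleaning steps can also be written as a single method chain, which avoids the inplace=True pitfalls altogether. The sketch below runs on a small hypothetical DataFrame that only mimics a few column names from boardgames_ranks.csv (the values are made up). Unlike step 4 above, it fills NaN only in the category-rank columns, so text fields such as name are not turned into 0.

```python
import pandas as pd

# Hypothetical stand-in for boardgames_ranks.csv (made-up values, same column names)
raw = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "name": ["Game A", "Game B", "Game C", "Game D"],
    "rank": [1, 0, 250, 251],
    "bayesaverage": [8.41234, 5.5, 7.0049, 6.9],
    "average": [8.6789, 6.1, 7.3333, 7.0],
    "strategygames_rank": [1.0, None, None, 40.0],
    "familygames_rank": [None, 3.0, 12.0, None],
})

rank_cols = ["strategygames_rank", "familygames_rank"]

top = (
    raw[raw["rank"].between(1, 250)]           # drops rank 0 and anything past 250 (inclusive bounds)
    .fillna({col: 0 for col in rank_cols})     # fill NaN only in the category-rank columns
    .astype({col: int for col in rank_cols})   # float (because of NaN) -> int
    .round({"bayesaverage": 2, "average": 2})  # 2 decimal places
    .reset_index(drop=True)
)
print(top)
```

Each step returns a new DataFrame, so there is no hidden view of raw being modified and no SettingWithCopyWarning.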

Extracting Information from the BGG API
1) Extracting the list of 250 BGG_IDs to feed into the BGG XML API2.
2) Iterating through all 250 BGG_IDs:
    a. Pull the XML for each BGG_ID, sleeping for 5 seconds whenever the BGG_ID is divisible by 7, to adhere to the API usage terms.
    b. Parse the XML for each mechanic's name and ID.
    c. Extract all mechanics associated with a game and append them to lists.
    d. Create a pd.Series from each list.
    e. Create a pd.DataFrame from the Series.

***NOTE - This block will take some time to run: it sends one request per game (250 in total), plus the pauses.
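
Sleeping only when the BGG_ID happens to be divisible by 7 ties the pauses to which IDs are in the top 250 rather than to how many requests have been sent. A minimal sketch of an alternative is a helper that retries with exponential backoff. It assumes BGG answers HTTP 429 when it is throttling and HTTP 202 when a request has been queued (202 is documented for the collection endpoint); the name polite_get and its parameters are hypothetical, and get and sleep can be swapped for fakes when testing.

```python
import time


def polite_get(url, get=None, max_retries=4, base_delay=5.0, sleep=time.sleep):
    """GET url, backing off exponentially while the server answers 429 or 202."""
    if get is None:
        import requests  # imported lazily so the helper can be tested with a fake `get`
        get = requests.get
    for attempt in range(max_retries + 1):
        resp = get(url, timeout=30)
        if resp.status_code not in (202, 429):
            resp.raise_for_status()  # raise on any other 4xx/5xx status
            return resp
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s, 40 s with the defaults
    raise RuntimeError(f"Gave up on {url} after {max_retries + 1} attempts")
```

In the loop, api_rec = polite_get(API_base_string + str(bg_id)) would replace the requests.get call; a short fixed time.sleep between requests can still be kept on top of the backoff.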

In [None]:
##1
bg_ids = df_bgr_top['id'].tolist()

##2
API_base_string = 'https://boardgamegeek.com/xmlapi2/thing?id='
mech_id_ls = []
mech_name_ls = []
mech_bggid_ls = []

##2a.
# bg_id avoids shadowing Python's built-in id()
for bg_id in bg_ids:
    api_rec = requests.get(API_base_string + str(bg_id))
    api_rec.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    root = ET.fromstring(api_rec.content)
    # Pause on IDs divisible by 7 to adhere to the API usage terms
    if bg_id % 7 == 0:
        time.sleep(5)

##2b,2c.
    # Collect every mechanic <link> of each returned <item>, keyed by the game's BGG ID
    for item in root:
        for link in item.findall('link'):
            if link.get('type') == 'boardgamemechanic':
                mech_id_ls.append(link.get('id'))
                mech_name_ls.append(link.get('value'))
                mech_bggid_ls.append(item.get('id'))

##2d.
# XML attribute values are strings; convert the IDs to numbers so they match df_bgr_top['id']
mech_bggid_ser = pd.to_numeric(pd.Series(mech_bggid_ls))
mech_id_ser = pd.to_numeric(pd.Series(mech_id_ls))
mech_name_ser = pd.Series(mech_name_ls)

##2e.
mechv3_frame = {'BGG_ID': mech_bggid_ser, 'Mechanic_ID': mech_id_ser, 'Mechanic_Name': mech_name_ser}
df_mechv3 = pd.DataFrame(mechv3_frame)
df_mechv3
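
The thing endpoint also accepts several comma-separated IDs in one request, which cuts the number of round trips. The sketch below assumes a per-request limit of 20 IDs (check the current BGG API terms before relying on it) and selects mechanic links with an ElementTree attribute predicate instead of the nested if. chunk_ids, parse_mechanics and the sample XML are hypothetical; the sample only mimics the shape of a real response.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def chunk_ids(ids, size=20):
    """Split a list of BGG IDs into comma-joined strings of at most `size` IDs."""
    return [",".join(str(i) for i in ids[k:k + size]) for k in range(0, len(ids), size)]


def parse_mechanics(xml_text):
    """Return one (BGG_ID, Mechanic_ID, Mechanic_Name) tuple per mechanic <link>."""
    root = ET.fromstring(xml_text)  # accepts str or bytes (e.g. response.content)
    rows = []
    for item in root.findall("item"):
        for link in item.findall("link[@type='boardgamemechanic']"):
            rows.append((int(item.get("id")), int(link.get("id")), link.get("value")))
    return rows


# Hypothetical, trimmed-down response in the shape returned by /xmlapi2/thing
sample = """<items>
  <item type="boardgame" id="101">
    <link type="boardgamecategory" id="1" value="Economic"/>
    <link type="boardgamemechanic" id="2001" value="Worker Placement"/>
    <link type="boardgamemechanic" id="2002" value="Hand Management"/>
  </item>
  <item type="boardgame" id="103">
    <link type="boardgamemechanic" id="2001" value="Worker Placement"/>
  </item>
</items>"""

df_mech = pd.DataFrame(parse_mechanics(sample), columns=["BGG_ID", "Mechanic_ID", "Mechanic_Name"])
print(len(chunk_ids(list(range(1, 251)))))  # 250 IDs -> 13 requests
print(df_mech)
```

Each batch would be fetched with API_base_string + batch, and the tuples from parse_mechanics(api_rec.content) collected in a single list before building the DataFrame.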

Merging the Board Game Ranks DataFrame with the Mechanics DataFrame on BGG_ID

1) Reading in the designer-location CSV file (BGG_Designer_Location.csv).
2) Creating a new DataFrame restricted to the BGG_IDs in the top 250.

In [None]:
##1
df_bgdesigner_loc = pd.read_csv('.\\.venv\\data\\BGG_Designer_Location.csv')

##2
# Keep only designers of top-250 games and renumber the rows from 0
df_topbgdes_loc = df_bgdesigner_loc.loc[df_bgdesigner_loc['BGG_ID'].isin(bg_ids)].reset_index(drop=True)
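
The heading above refers to merging the ranks and mechanics DataFrames on the BGG ID. A sketch of that merge on hypothetical toy frames follows; it assumes the ranks key is the integer id column and the mechanics key is BGG_ID. Values read from XML attributes come back as strings, so BGG_ID has to be converted before pandas will merge it with an int64 column (mixing int64 and object keys raises a ValueError). validate="one_to_many" checks that each game appears only once on the ranks side.

```python
import pandas as pd

# Hypothetical toy frames shaped like df_bgr_top and df_mechv3
ranks = pd.DataFrame({"id": [101, 103], "name": ["Game A", "Game C"], "rank": [1, 250]})
mechs = pd.DataFrame({
    "BGG_ID": ["101", "101", "103", "999"],  # strings, as read from XML attributes
    "Mechanic_ID": ["2001", "2002", "2001", "2003"],
    "Mechanic_Name": ["Worker Placement", "Hand Management", "Worker Placement", "Dice Rolling"],
})

# Align the key dtypes first (no-op if BGG_ID is already numeric)
mechs["BGG_ID"] = pd.to_numeric(mechs["BGG_ID"])

# Inner join: one output row per (game, mechanic); game 999 is not ranked, so it drops out
merged = ranks.merge(mechs, left_on="id", right_on="BGG_ID", how="inner", validate="one_to_many")
print(merged[["id", "name", "Mechanic_Name"]])
```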