*Created by Laura Brin and Jaskaran Singh for CMPT3830: Machine Learning Work Integrated Project at NorQuest College*

# Board Game Ratings

### Board Game Insights 
The board game industry is growing. Valued at 13.75 billion USD in 2021, the sector is expected to reach 30.93 billion USD by 2028 (GlobeNewswire). Board game enthusiasts are increasingly turning towards independent developers, with word of mouth and Kickstarter being the main ways of learning about new games (PrintNinja). The ideal users of this application are indie developers looking to make better decisions when developing new games for the market. It could also be used by hobby stores to visualize trends and inform purchasing decisions. <br>

The product backlog for this project is ambitious. We would like to provide multiple ways for game developers or creators to interact with the models. The main output will be an estimated rating based on information the user provides about a game they are developing or looking to acquire. By experimenting with different game details, the user can evaluate whether changes to their game would add value. We would also like to develop an interactive way to view time-series trends using ipywidgets. <br>

Additional features that would be valuable but fall outside the available time include using keywords to return the likelihood of Kickstarter success and/or a potential game rating, and taking game information as input and returning a ranking of the best months or quarters to launch the game on Kickstarter. <br>

load libraries

In [None]:
import sqlite3 
import pandas as pd
from zipfile import ZipFile

importing datasets

In [None]:
#importing datasets for board games

#extracted data is saved in the same directory as the notebook
with ZipFile('database.zip') as zf:
    zf.extractall()

#read the full BoardGames table into a dataframe, then release the connection
conn = sqlite3.connect("database.sqlite")
df_games = pd.read_sql_query("SELECT * FROM BoardGames", conn)
conn.close()


#importing datasets for Kickstarter

#extracted data is saved in the same directory as the notebook
with ZipFile('ks-projects-201801.zip') as zf:
    zf.extractall()

df_ks = pd.read_csv("ks-projects-201801.csv")

### Games Dataset - Laura

In [None]:
#grouping df_games columns to useable sections using game.id as key
details=["game.id","game.type","details.description","details.image","details.maxplayers","details.maxplaytime","details.minage",
"details.minplayers","details.minplaytime","details.name","details.playingtime","details.thumbnail","details.yearpublished"]
attributes=["game.id","attributes.boardgameartist","attributes.boardgamecategory","attributes.boardgamecompilation","attributes.boardgamedesigner","attributes.boardgameexpansion",
"attributes.boardgamefamily","attributes.boardgameimplementation","attributes.boardgameintegration","attributes.boardgamemechanic","attributes.boardgamepublisher","attributes.total","attributes.t.links.concat.2...."]
stats_family=["game.id","stats.family.abstracts.bayesaverage","stats.family.cgs.bayesaverage","stats.family.cgs.pos",
"stats.family.childrensgames.bayesaverage","stats.family.childrensgames.pos","stats.family.familygames.bayesaverage","stats.family.familygames.pos",
"stats.family.partygames.bayesaverage","stats.family.partygames.pos","stats.family.strategygames.bayesaverage","stats.family.strategygames.pos","stats.family.thematic.bayesaverage","stats.family.thematic.pos",
"stats.family.wargames.bayesaverage","stats.family.wargames.pos","stats.family.amiga.bayesaverage","stats.family.amiga.pos","stats.family.arcade.bayesaverage","stats.family.arcade.pos","stats.family.atarist.bayesaverage","stats.family.atarist.pos",
"stats.family.commodore64.bayesaverage","stats.family.commodore64.pos"]
stats=["game.id","stats.average","stats.averageweight","stats.bayesaverage","stats.median","stats.numcomments","stats.numweights","stats.owned","stats.stddev","stats.subtype.boardgame.bayesaverage",
"stats.subtype.boardgame.pos","stats.trading","stats.usersrated","stats.wanting","stats.wishing","stats.subtype.rpgitem.bayesaverage","stats.subtype.rpgitem.pos","stats.subtype.videogame.bayesaverage","stats.subtype.videogame.pos"]
polls=["game.id","polls.language_dependence","polls.suggested_numplayers.1","polls.suggested_numplayers.10","polls.suggested_numplayers.2","polls.suggested_numplayers.3","polls.suggested_numplayers.4",
"polls.suggested_numplayers.5","polls.suggested_numplayers.6","polls.suggested_numplayers.7","polls.suggested_numplayers.8","polls.suggested_numplayers.9","polls.suggested_numplayers.Over","polls.suggested_playerage"]

In [None]:
#creating smaller dataframes to work with
df_games_details=df_games[details]
df_games_attributes=df_games[attributes]
df_games_stats_family=df_games[stats_family]
df_games_stats=df_games[stats]
df_games_polls=df_games[polls]
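
Because every subset keeps `game.id`, the pieces can be joined back together whenever a model needs features from more than one group. The following is a minimal sketch of that idea, not part of the original notebook: the `toy_*` dataframes and their values are invented stand-ins for `df_games`, and it assumes `game.id` is unique per row.

```python
import pandas as pd

# Toy stand-in for df_games; names and values are invented for illustration only.
toy_games = pd.DataFrame({
    "game.id": [1, 2, 3],
    "details.name": ["Game A", "Game B", "Game C"],
    "stats.average": [7.1, 6.4, 8.0],
})

# Split into subsets that each keep the key column, as in the cell above.
toy_details = toy_games[["game.id", "details.name"]]
toy_stats = toy_games[["game.id", "stats.average"]]

# Rejoin on the shared key. validate="one_to_one" raises an error if
# game.id turns out to be duplicated on either side.
rejoined = toy_details.merge(
    toy_stats, on="game.id", how="inner", validate="one_to_one"
)
print(rejoined)
```

The `validate` argument is optional, but it turns a silent row explosion into an explicit error if the key is not unique.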

#### Game Details features

**ID:** BGG item ID <br>
**Type:** Boardgame. BGG also hosts reviews for products other than board games; this dataset was scraped using boardgame as the key feature <br>
**Description:** description of the game on the site. The description is sometimes supplied by the publisher <br>
**Image:** XML code of the jpeg document number. Because this dataset is sourced from a SQLite database, unstructured data such as the jpeg did not migrate. <br>
**Max Players:** maximum number of players the game can support <br>
**Max Playtime:** suggested/approximate maximum time for a single playthrough. Generally this represents the average playtime and is not a hard cap on playtime <br>
**Min Age:** suggested minimum age for players <br>
**Min Players:** minimum required players <br>
**Min Playtime:** suggested minimum time for a single playthrough <br>
**Name:** name of the board game <br>
**Playingtime:** suggested playing time <br>
**Thumbnail:** XML code of the jpeg document number. Because this dataset is sourced from a SQLite database, unstructured data such as the jpeg did not migrate. <br>
**Year published:** year the game was published <br>

If a range of play times is not given, the minplaytime, maxplaytime and playingtime features all display the same value. In general, most games list only a minimum time, and the other two features mirror that value.
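
One way to check how often the three time features agree is to count the distinct values per row. The sketch below uses an invented `toy_details` dataframe in place of `df_games_details`, so the resulting share is illustrative only and says nothing about the real dataset.

```python
import pandas as pd

# Invented rows standing in for df_games_details; only the column names
# come from the dataset.
toy_details = pd.DataFrame({
    "details.minplaytime": [30, 45, 60, 20],
    "details.maxplaytime": [30, 90, 60, 20],
    "details.playingtime": [30, 90, 60, 20],
})

time_cols = ["details.minplaytime", "details.maxplaytime", "details.playingtime"]

# nunique(axis=1) counts distinct values in each row; 1 means all three agree.
all_equal = toy_details[time_cols].nunique(axis=1) == 1

# The mean of a boolean Series is the share of True values.
share_equal = all_equal.mean()
print(f"{share_equal:.0%} of toy rows have a single play time")
```

Note that `nunique` ignores missing values by default, so rows with nulls in these columns would need separate handling.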

In [None]:
df_games_details.head(10)

In [None]:
df_games_details.isnull().sum()

In [None]:
df_games_details.shape

In [None]:
df_games_details.dtypes

In [None]:
#drop returns a new dataframe, so the result is assigned back to keep the change
df_games_details=df_games_details.drop(["game.type","details.image","details.thumbnail"], axis=1)

#### Board Game Attributes 

**ID:** BGG item ID <br>
**Artist:** name of the board game box artist, if available <br>
**Category:** board game category based on a site-specific list<br>
**Compilation:** text field indicating whether the game is a compilation of other republished games or a new special edition of a game. It can also indicate an expansion <br>
**Designer:** name of the board game designer<br>
**Expansion:** indicates whether the item is an expansion pack for a different game ID<br>
**Family:** a list of rough categories the game may fit into. These are sub-families of games with a wide range of options such as country, cities, game, theme, creatures <br>
**Board Game Implementation:** implementation is a designation for the restoration and renaming of a game. Any games listed in this feature are either the original or new versions of this game ID (both implemented and reimplemented)<br>
**Integration:** non-null values indicate other games this item can be combined with for play<br>
**Game Mechanic:** list of game mechanics used, sourced from a site-specific list of options. These fields are non-unique and can contain multiple mechanics separated by commas<br>
**Publisher:** name of the game publisher<br>
**total:** number of fields filled in within the full credits section of the game. Range 1-13 (Primary Name, Alternate Names, Year Released, Designer, Solo Designer, Artist, Publishers, Developer, Graphic Designer, Sculptor, Editor, Writer, Insert Designer, Categories, Mechanisms, Family) <br>
**t.links.concat.2...:** unclear definition of feature. Non-null values appear to be a copy of the publisher<br>
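
Multi-valued fields such as Game Mechanic usually need to be split into one row per value before they can be counted or encoded. The sketch below shows one possible approach on an invented `toy_attributes` dataframe. It assumes a plain comma is the only separator and that no mechanic name contains a comma itself, which should be checked against the real data.

```python
import pandas as pd

# Invented rows; only the column names come from the dataset.
toy_attributes = pd.DataFrame({
    "game.id": [1, 2, 3],
    "attributes.boardgamemechanic": [
        "Dice Rolling,Hand Management",
        "Hand Management",
        None,
    ],
})

mechanics = (
    toy_attributes
    # Rows without any mechanic cannot be split, so drop them first.
    .dropna(subset=["attributes.boardgamemechanic"])
    # Turn "a,b" into ["a", "b"] ...
    .assign(mechanic=lambda d: d["attributes.boardgamemechanic"].str.split(","))
    # ... then give each list element its own row.
    .explode("mechanic")
)
# Remove stray whitespace around each mechanic name.
mechanics["mechanic"] = mechanics["mechanic"].str.strip()

# Number of games using each mechanic.
mechanic_counts = mechanics["mechanic"].value_counts()
print(mechanic_counts)
```

The same pattern would apply to the other list-like attribute columns if they share the comma-separated layout.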

In [None]:
df_games_attributes.head(-5)

In [None]:
df_games_attributes.shape

In [None]:
df_games_attributes["attributes.t.links.concat.2...."].isnull().sum()

In [None]:
df_games_attributes["attributes.total"].unique()

#### Game Stat Family Features

The average rating and the best characterization of game family are voted on by users of the BoardGameGeek website using a poll for each board game.

**ID:** BGG item ID <br>
**average:** rating on a scale of 1 to 10, with 10 being the highest. <br>
**average weight:** difficulty/complexity rating on a scale of 1 to 5. 1=Light, 2=Medium Light, 3=Medium, 4=Medium Heavy, 5=Heavy <br>
**bayesaverage:** "In order to prevent a new or rare game with only a few high ratings from taking the top spots in the ranking, 30 average ratings ... are added to every rating to form the Bayesian average. As more ratings are received, the effect of these "damper ratings" is reduced to nil." (Irving, 2005) <br>
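
The damping effect described in the quote can be illustrated with a small calculation. This is a sketch under stated assumptions rather than BGG's actual formula: the function `bayes_average` is hypothetical, and the `prior_mean` of 5.5 is a placeholder because the quoted source elides the value of the added ratings. Only the count of 30 comes from the quote.

```python
def bayes_average(ratings, prior_mean, prior_count=30):
    """Average the real ratings together with prior_count dummy ratings
    that all equal prior_mean (hypothetical helper, not BGG's code)."""
    total = sum(ratings) + prior_mean * prior_count
    return total / (len(ratings) + prior_count)

# prior_mean=5.5 is an assumed placeholder value, not taken from the source.
few_ratings = bayes_average([10, 10, 10], prior_mean=5.5)
many_ratings = bayes_average([10] * 3000, prior_mean=5.5)

# With only 3 perfect scores the result stays near the prior;
# with 3000 the dummy ratings barely matter.
print(round(few_ratings, 2), round(many_ratings, 2))
```

This is why a game with a handful of perfect scores cannot outrank a widely rated game on `bayesaverage`, even though its plain `average` would be higher.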

*Each of the game families below has two attributes: Bayes Average and pos (the ranking of that game in the family category. 1=highest rated for that game family) <br>*
**Abstracts:** Abstract Strategy Games like Chess or Go. Games that minimize luck and do not rely on a theme. Games with no hidden information or non-deterministic elements, usually with two players or teams taking a finite number of alternating turns <br>
**CGS:** Customizable games like collectible card games, collectible dice games, collectible miniature games, living card games and trading card games <br>
**Childrens Games:** best for younger kids, with little complexity and some elements of chance. Typically they have fewer pieces and are themed towards kids <br>
**Family Games:** fun for kids and adults. Generally something easy to learn, not too long, and fun for mixed ages and abilities <br>
**Party Games:** games that encourage social interaction. Generally easy to set up with few rules, can accommodate large groups of players and provide lots of laughter<br>
**Strategy Games:** more complex games where players' decision-making skills have a high significance in determining the outcome. These games often require decision tree analysis or probabilistic estimation <br>
**Thematic Games:** games with a strong theme and emphasis on narrative. This type of game often features direct player-to-player conflict and has rules and mechanics that aim to depict the theme. Science fiction and fantasy themes are common <br>
**Wargames:** strategy games that deal with military operations. They can simulate historical, fantasy, near future, or science fiction themes and cover political and strategic choices <br>

*Game family descriptions are taken from BoardGameGeek.com*

The Amiga, Arcade, Atari ST and Commodore 64 features will be omitted from this data, as we are looking at board games and these represent console categories.

In [None]:
df_games_stats_family=df_games_stats_family.drop(["stats.family.amiga.bayesaverage","stats.family.amiga.pos","stats.family.arcade.bayesaverage","stats.family.arcade.pos","stats.family.atarist.bayesaverage","stats.family.atarist.pos",
"stats.family.commodore64.bayesaverage","stats.family.commodore64.pos"], axis=1)

In [None]:
df_games_stats_family.describe()

### Kickstarter Dataset - Jaskaran

In [None]:
df_ks.head(10)

### References

Board Game Industry Statistics (n.d.) PrintNinja. (https://printninja.com/board-game-industry-statistics/)

SkyQuest Technology (2022, July 19) Board Games Market to Attain Value of $30.93 Billion By 2028 Thanks to Increased Popularity of Online Gaming and Entry of New OTT platforms In Board Gaming. GlobeNewswire. (https://www.globenewswire.com/news-release/2022/07/19/2482068/0/en/Board-Games-Market-to-Attain-Value-of-30-93-Billion-By-2028-Thanks-to-Increased-Popularity-of-Online-Gaming-and-Entry-of-New-OTT-platforms-In-Board-Gaming.html)

https://www.gamedesignworkshop.com/understanding-the-tabletop-game-industry

Irving, R. (2005, July 10) Re: What is Baysian Average? [Discussion post]. BoardGameGeek Community Forum.
