## Board game recommendation engine for new users
### Matrix factorization with ALS
### Method 1: search for similar games

#### John Burt


### Purpose of this notebook:

Implement a board game recommender using a game rating dataset from the boardgamegeek.com website. The recommender scenario is that a new user (not in the dataset) is seeking recommendations for board games they haven't tried before. They are asked to enter any number of games they like and games they don't like.

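Note: The underlying nearest-neighbor search works from a single liked game at a time. Multiple liked and disliked games are handled by running the search once per title and combining the results.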

#### The method:

- Load data into a pandas dataframe from provided csv files.

- Run the filled game-by-user rating matrix (missing ratings were pre-filled with ALS estimates) through Singular Value Decomposition (SVD) to generate N feature dimensions that describe each game. The result is a set of N-dimensional coordinates for each game.

- The recommender takes a user specified game title and uses the SVD game coordinates to select the games that are nearest neighbors as recommendations.

- This process is repeated with several games the user likes, then the most popular recommendations are chosen for the final recommendation.

- The model also searches for the nearest neighbors of each game in a list of disliked games. Any game recommended from a disliked game is excluded from the final recommendations.
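
The last two steps (voting across liked games, then excluding neighbors of disliked games) can be sketched independently of the SVD step. This is a minimal illustration under stated assumptions, not the notebook's implementation: `combine_recommendations` and the toy neighbor lists are hypothetical, and ties are broken by order of first appearance.

```python
from collections import Counter

def combine_recommendations(liked_neighbors, disliked_neighbors, n):
    """Rank candidates by how many liked games recommend them,
    then drop any candidate that a disliked game also recommends."""
    # one vote per appearance in a liked game's neighbor list
    votes = Counter(g for neighbors in liked_neighbors for g in neighbors)
    # everything recommended from a disliked game is excluded
    excluded = {g for neighbors in disliked_neighbors for g in neighbors}
    ranked = [g for g, _ in votes.most_common() if g not in excluded]
    return ranked[:n]

# toy neighbor lists (hypothetical game labels)
liked = [["B", "C", "D"], ["C", "D", "E"], ["C", "F", "G"]]
disliked = [["D", "H"]]
result = combine_recommendations(liked, disliked, 3)
print(result)  # ['C', 'B', 'E']
```

Here "C" wins with three votes, and "D" is dropped even though it has two votes because a disliked game also recommends it.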


## Load data from csv file

- Set up plot environment.
- Load boardgame rating data from csv into a pandas dataframe.


In [1]:
# remove warnings
import warnings
warnings.filterwarnings('ignore')
# ---

%matplotlib inline
import pandas as pd
pd.options.display.max_columns = 100
from matplotlib import pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
import numpy as np

from datetime import datetime

pd.options.display.max_rows = 100

srcdir = './data/'

# load the boardgame by user ratings matrix data
# note: the matrix was pre-filled with estimated ratings using 
#       Alternating Least Squares method in another notebook.
df_mxfilled = pd.read_csv(srcdir+'mx_items_filled_minr=10.csv')
df_mxfilled = df_mxfilled.reset_index()

# load the boardgame title data
titledata = pd.read_csv(srcdir+'boardgame-titles.csv')

# rename the gameID column
titledata=titledata.rename(columns = {"boardgamegeek.com game ID":'gameID'})
titledata.reset_index(inplace=True)

print('#games X #users:', df_mxfilled.shape)
print('number of titles =', titledata.shape[0])

#games X #users: (402, 108396)
number of titles = 402


## Compute the Truncated SVD. 

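This process produces a set of feature coordinates for each game, so that games with similar rating patterns lie closer together. The cell below uses PCA with whitening, which is computed via SVD of the mean-centered matrix and rescales each component to unit variance.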
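
To illustrate why such coordinates place similar games near each other, here is a small sketch on synthetic data. The toy matrix, the three group profiles, and all variable names are assumptions made for the example; none of it comes from the rating dataset.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# toy "games x users" matrix: 3 groups of 3 games, each group sharing
# one underlying rating profile plus a little per-game noise
profiles = rng.normal(size=(3, 40))
ratings = np.vstack([p + 0.1 * rng.normal(size=(3, 40)) for p in profiles])

toy_coords = PCA(n_components=2, whiten=True).fit_transform(ratings)

# games 0 and 1 share a profile; game 3 belongs to a different group
within = np.linalg.norm(toy_coords[0] - toy_coords[1])
across = np.linalg.norm(toy_coords[0] - toy_coords[3])
print(toy_coords.shape, within < across)

# whitening rescales every component to unit sample variance
print(toy_coords.std(axis=0, ddof=1))
```

Games generated from the same profile end up much closer together than games from different profiles, which is the property the nearest-neighbor search relies on.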

In [2]:
# only PCA is used below; TruncatedSVD is kept for the commented-out alternative
from sklearn.decomposition import PCA, TruncatedSVD

# number of dimensions for analysis
numdims = 5

# coords = TruncatedSVD(n_components=numdims).fit_transform(df_mxfilled)
# select all columns after 'gameID'
# coords = TruncatedSVD(n_components=numdims).fit_transform(
#     df_mxfilled.iloc[:,np.argwhere(df_mxfilled.columns=='gameID').flatten()[0]+1:])

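# new method: PCA is computed via SVD of the mean-centered data;
# whiten=True also rescales each component to unit variance
# select all columns after 'gameID' (the per-user rating columns)
rating_cols = df_mxfilled.iloc[:, df_mxfilled.columns.get_loc('gameID') + 1:]
coords = PCA(n_components=numdims, whiten=True).fit_transform(rating_cols)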

print('coords: #games X #features:',coords.shape)

coords: #games X #features: (402, 5)
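
The relationship between PCA and SVD mentioned in the code comment can be checked on a small random matrix. The sketch below assumes scikit-learn's exact `'full'` solver and uses a made-up 12 x 30 matrix as a stand-in for the rating data. Absolute values are compared because the sign of each component is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 30))   # toy stand-in for the games x users matrix
k = 3

# SVD of the column-centered matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

plain = PCA(n_components=k, svd_solver='full').fit_transform(X)
white = PCA(n_components=k, svd_solver='full', whiten=True).fit_transform(X)

# unwhitened PCA coordinates are U * S; whitened ones are U * sqrt(n_samples - 1)
same_plain = np.allclose(np.abs(plain), np.abs(U[:, :k] * S[:k]))
same_white = np.allclose(np.abs(white), np.abs(U[:, :k] * np.sqrt(len(X) - 1)))
print(same_plain, same_white)
```

With whitening, every retained dimension contributes equally to Euclidean distance, whereas without it the first components dominate.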


### Functions to search for nearest neighbor games in SVD feature space

In [3]:
from scipy.spatial.distance import cdist

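def find_nearest_neighbors(coords, x, numnearest):
    """Brute-force nearest-neighbor search.
       Returns the indices of the numnearest rows of coords closest to x."""
    
    # Euclidean distances from x to every point; result has shape (1, n_points)
    dists = cdist(np.reshape(x,(1,-1)),coords) 
    
    # sort the single row of distances, nearest first
    ind = np.argsort(dists[0])

    # return the numnearest nearest neighbors
    return ind[:numnearest]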
    
def recommend_games_one(targettitle, gametitles, coords, num2rec=1):
    """Recommend games based on NN to one game title"""
    
    # get coords of target title
    targetcoord = coords[gametitles==targettitle,:]
    
    # find nearest neighbors
    ind = find_nearest_neighbors(coords, targetcoord, num2rec+1)
    
    # Note: first entry will be the target title (distance 0)
    return ind[1:]

def recommend_games_prefs_sets(pref, gametitles, coords, num2rec=1):
    """Recommend games using multiple liked and disliked games.
       This method creates a set of recommended games for each title in prefs and 
         then uses set functions to select the most commonly preferred,
         and then exclude any recs based on disliked games"""
    
    recs = []
    for title in pref['like']:
        recs.extend(recommend_games_one(title, gametitles, coords, num2rec*2))
    # rank candidate games by how many liked games recommended them
    unique, counts = np.unique(recs, return_counts=True)
    recs = unique[np.argsort(-counts)]
    
    norecs = []
    for title in pref['dislike']:
        norecs.extend(recommend_games_one(title, gametitles, coords, num2rec*2))
    norecs = list(np.unique(norecs))
    
    # drop any candidate that was also recommended from a disliked game
    allrecs = []
    for r in recs:
        if r not in norecs:
            allrecs.append(r)
            
    return allrecs[:num2rec]
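
As a quick sanity check of the brute-force search idea, the sketch below runs the same `cdist` plus `argsort` pattern on five hand-picked 2-D points. The points are hypothetical and chosen so the distances are easy to verify by hand.

```python
import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[0.0, 0.0],   # index 0: the query point
                   [1.0, 0.0],   # index 1: distance 1.0
                   [0.0, 3.0],   # index 2: distance 3.0
                   [5.0, 5.0],   # index 3: distance ~7.07
                   [0.5, 0.5]])  # index 4: distance ~0.71

query = points[0].reshape(1, -1)
dists = cdist(query, points)     # shape (1, 5)
order = np.argsort(dists[0])     # nearest first; the query itself is at distance 0
neighbors = order[1:3]           # skip the query, keep the 2 nearest
print(neighbors)  # [4 1]
```

Skipping the first index assumes no other point coincides exactly with the query. If two games had identical coordinates, the target title would not be guaranteed to sort first.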


### Test out the recommender algorithm with a list of board games 

Note: this recommender uses two kinds of input: games liked, and games disliked.
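
Each title must match an entry in the title list exactly, since the lookup is a string comparison. The functions above do not handle a title that is absent, and such a title would be expected to raise an error in the distance computation. One possible pre-check is sketched below. `missing_titles` is a hypothetical helper, not part of the notebook, and the toy titles are made up for the example.

```python
import numpy as np

def missing_titles(pref, gametitles):
    """Hypothetical helper: list preference titles not present in gametitles."""
    known = set(gametitles)
    wanted = list(pref.get('like', [])) + list(pref.get('dislike', []))
    return [t for t in wanted if t not in known]

# toy data standing in for the real title array and a user preference entry
toy_titles = np.array(['Agricola', 'Catan', 'Love Letter'])
toy_pref = {'type': 'example',
            'like': ['Agricola', 'Some Unknown Game'],
            'dislike': ['Catan']}

missing = missing_titles(toy_pref, toy_titles)
print(missing)  # ['Some Unknown Game']
```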


In [4]:
gameprefs = [
    {'type':'heavy eurogames',
    'like':['Agricola','Caverna: The Cave Farmers','Terraforming Mars'],
    'dislike':['Catan','Apples to Apples','Love Letter','Mice and Mystics','Zombicide']},
    
    {'type':'light eurogames',
    'like':['Ticket to Ride','Carcassonne','Catan'],
    'dislike':['Monopoly','Apples to Apples','Love Letter','Zombicide']},
    
    {'type':'thematic',
    'like':['Mice and Mystics','Zombicide', 'Eldritch Horror', 'Star Wars: Rebellion', 'Dead of Winter: A Crossroads Game'],
    'dislike':['Terraforming Mars','Agricola','Catan','Apples to Apples','Love Letter']},

    {'type':'light/party',
    'like':['Apples to Apples','Love Letter','Codenames', 'Dixit', 'One Night Ultimate Werewolf'],
    'dislike':['Mice and Mystics','Terraforming Mars','Agricola','Zombicide']},
    ]

# number of recommended games to present
num2recommend = 5

# get array indices of user preference titles
# title_by_title = titledata.set_index('title')
# get game titles from titledata
# gametitles = titledata.title[df_mxfilled.index].values
titles_by_gameid = titledata.set_index('gameID')
gametitles = titles_by_gameid.title.loc[df_mxfilled['gameID']].values

# loop through test user game preferences
for pref in gameprefs:
    recs = recommend_games_prefs_sets(pref, gametitles, coords, num2rec=num2recommend)
    print('\npref: %s, If you like %s, you should try:\n  %s\n' % 
      (pref['type'],', '.join(pref['like']), '\n  '.join(gametitles[recs])))
    


pref: heavy eurogames, If you like Agricola, Caverna: The Cave Farmers, Terraforming Mars, you should try:
  Scythe
  Terra Mystica
  Tichu
  Mombasa
  The Voyages of Marco Polo


pref: light eurogames, If you like Ticket to Ride, Carcassonne, Catan, you should try:
  Splendor
  Sushi Go!
  Carcassonne: Hunters and Gatherers
  Telestrations
  Ticket to Ride: Nordic Countries


pref: thematic, If you like Mice and Mystics, Zombicide, Eldritch Horror, Star Wars: Rebellion, Dead of Winter: A Crossroads Game, you should try:
  Star Wars: X-Wing Miniatures Game
  Memoir '44
  Sentinels of the Multiverse
  Descent: Journeys in the Dark (Second Edition)
  Elder Sign


pref: light/party, If you like Apples to Apples, Love Letter, Codenames, Dixit, One Night Ultimate Werewolf, you should try:
  Biblios
  Port Royal
  The Resistance: Avalon
  Dixit Quest
  Mysterium

