## Board game recommendation engine using Singular Value Decomposition
### with unrated game cells filled using an imputer

#### John Burt

#### Note: code for this notebook was copied from
- JMB_recommendation_engine_SVD_imputer_vs_ALS_user_rows_v3
- JMB_recommendation_engine_method_3.1_v3_MB-ALS


### Purpose of this notebook:

Implement a board game recommender using a game rating dataset from the boardgamegeek.com website. 

#### The method:

- Load data into a pandas dataframe from provided csv files.

- Use pivot to convert the data into a games (rows) x users (columns) rating matrix, with NaNs where a user hasn't rated a game (the majority of cells).

- Drop users who rated too few games, or gave outlier ratings.

- Replace all unrated games (NaNs) with estimated ratings values using Alternating Least Squares (ALS). 

- Run the filled matrix through Singular Value Decomposition (SVD) to generate N feature dimensions that describe each game. The result is a set of N-dimensional coordinates for each game.

- The recommender takes a user-specified game title and uses the SVD coordinates to present that game's nearest neighbors as recommendations.


## Load data from CSV files

- Set up the plot environment.
- Load the boardgame rating data from CSV into a pandas dataframe.


In [1]:
# remove warnings
import warnings
warnings.filterwarnings('ignore')
# ---

%matplotlib inline
import pandas as pd
pd.options.display.max_columns = 100
from matplotlib import pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
import numpy as np

from datetime import datetime

pd.options.display.max_rows = 100

# load the boardgame user data
#userdata = pd.read_csv('boardgame-users.csv') # NOTE: ALS can take a LONG time to process this many users!
#userdata = pd.read_csv('boardgame-elite-users.csv')
userdata = pd.read_csv('boardgame-frequent-users.csv')

# rename the userID column
userdata=userdata.rename(columns = {"Compiled from boardgamegeek.com by Matt Borthwick":'userID'})

# load the boardgame title data
titledata = pd.read_csv('boardgame-titles.csv')

# rename the gameID column
titledata=titledata.rename(columns = {"boardgamegeek.com game ID":'gameID'})

# for titledata set game ID as the index
titledata = titledata.set_index("gameID")

#print(userdata.head())
#print("\n", titledata.head())


Pivot the user rating data into a gameID (rows) x userID (columns) matrix of ratings.

In [2]:
# pivot the user data to create rows of games, with columns of users. 
# If a user rated a game, it will be at game x user and if not, then the cell will be NAN
#rp = userdata.pivot(index="userID", columns="gameID", values="rating")
rp = userdata.pivot(index="gameID", columns="userID", values="rating")
#rp.head()
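
As a small illustration of what the pivot step produces, here is a sketch on a made-up long-format table. The column names mirror the ones used above, but the values and the `toy` variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy long-format ratings table; the column names mirror the notebook's,
# but the values are made up for illustration.
toy = pd.DataFrame({
    "userID": [1, 1, 2, 3, 3],
    "gameID": [10, 20, 10, 20, 30],
    "rating": [8.0, 6.0, 7.0, 9.0, 4.0],
})

# Rows are games, columns are users; unrated (game, user) pairs become NaN.
toy_rp = toy.pivot(index="gameID", columns="userID", values="rating")

# Fraction of cells that are unrated.
toy_sparsity = toy_rp.isna().to_numpy().mean()
print(toy_rp)
print("fraction unrated: %.2f" % toy_sparsity)
```

Note that `pivot` raises a `ValueError` if the same (gameID, userID) pair appears more than once; if that can happen in your data, `pivot_table` with an aggregation function is one possible alternative.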

## Do some filtering.

The purpose of this filtering is to 1) reduce the dataset size while keeping most of the ratings, and 2) remove potentially malicious and outlier users (people who rate everything within a narrow range of values, or rate everything low).

Drop users:
- Who have rated < threshold # games
- Whose scores have range < threshold
- Whose score max < threshold

Note: this filtering is mostly only useful for the all users data, which has a lot of users who rate few games or only rate high or low, with no range of preference. It has very little effect on the other data sets.

In [3]:
mincount = 10 # min num ratings threshold
minrange = 1 # min rating range threshold
minmax = 7 # min threshold for a user's highest rating

# number of ratings by each user
usercounts = np.count_nonzero(~np.isnan(rp.values),0)

# drop users with fewer than mincount ratings
rp_filt = rp.drop(rp.columns[usercounts<mincount], axis=1)

print("dropped %d < %d ratings"%(rp.shape[1]-rp_filt.shape[1], mincount))

# rating range for each user
userrange = rp_filt.max(axis=0) - rp_filt.min(axis=0)

oldnumusers = rp_filt.shape[1]

# drop users with rating range less than minrange
rp_filt = rp_filt.drop(rp_filt.columns[userrange<minrange], axis=1)

print("dropped %d < %d rating range"%(oldnumusers-rp_filt.shape[1], minrange))

# max rating for each user
usermax = rp_filt.max(axis=0)

oldnumusers = rp_filt.shape[1]

# drop users with rating max less than minmax
rp_filt = rp_filt.drop(rp_filt.columns[usermax<minmax], axis=1)

print("dropped %d max rating < %d"%(oldnumusers-rp_filt.shape[1],minmax))

print("\ntotal #users now = %d"%(rp_filt.shape[1]))

dropped 0 < 10 ratings
dropped 2 < 1 rating range
dropped 0 max rating < 7

total #users now = 2471
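
The three filters each depend only on a single user's own column, so they can also be expressed as one combined boolean mask. The following is a sketch on a made-up games x users matrix; the thresholds and all `toy_` names are illustrative assumptions, not the values used above.

```python
import numpy as np
import pandas as pd

# Toy games x users matrix (values are made up); NaN means "not rated".
toy_rp = pd.DataFrame(
    {
        "u1": [8.0, 6.0, 3.0, np.nan],        # kept: 3 ratings, range 5, max 8
        "u2": [7.0, 7.0, 7.0, 7.0],           # dropped: rating range is 0
        "u3": [np.nan, 9.0, np.nan, np.nan],  # dropped: only 1 rating
        "u4": [2.0, 4.0, 3.0, 1.0],           # dropped: highest rating is 4
    },
    index=[10, 20, 30, 40],
)

mincount, minrange, minmax = 3, 1, 7  # illustrative thresholds

counts = toy_rp.notna().sum(axis=0)               # ratings per user
ranges = toy_rp.max(axis=0) - toy_rp.min(axis=0)  # max - min per user
maxes = toy_rp.max(axis=0)                        # highest rating per user

# Keep a user only if all three conditions hold.
keep = (counts >= mincount) & (ranges >= minrange) & (maxes >= minmax)
toy_filt = toy_rp.loc[:, keep]
print(list(toy_filt.columns))
```

The sequential version above has the advantage of reporting how many users each filter removes; the combined mask only gives the final result.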


Rescale each user's scores to the range 1 to 10.

This ensures that all users have the same ratings range. This seems to help the SVD produce more meaningful feature dimensions.

In [4]:
userratingmax = rp_filt.max(axis=0)
userratingmin = rp_filt.min(axis=0)
rp_fixed = 9 * (rp_filt - userratingmin) / (userratingmax-userratingmin) + 1
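
To make the rescaling concrete, here is a sketch on made-up data showing that each user's lowest rating maps to 1 and highest to 10. The zero-range guard is an added assumption: in this notebook the `minrange` filter already removes such users, so it only matters if the filter is relaxed.

```python
import numpy as np
import pandas as pd

# Toy games x users matrix (values are made up); NaN means "not rated".
toy_filt = pd.DataFrame(
    {
        "u1": [8.0, 6.0, 4.0, np.nan],   # only uses the range 4..8
        "u2": [10.0, np.nan, 1.0, 5.5],  # already spans 1..10
    },
    index=[10, 20, 30, 40],
)

col_min = toy_filt.min(axis=0)
col_max = toy_filt.max(axis=0)

# Guard (an assumption, not in the original cell): a user whose max equals
# their min would cause a 0/0 division, so mark their span as NaN instead.
span = (col_max - col_min).replace(0, np.nan)

# Per-user min-max scaling onto [1, 10]; NaN cells stay NaN.
toy_fixed = 9 * (toy_filt - col_min) / span + 1
print(toy_fixed)
```

Because the scaled minimum is 1, a value of 0 never occurs among real ratings, which is what later allows 0 to stand for "unrated".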

## Alternating Least Squares method

Implement a weighted Alternating Least Squares algorithm. The method solves for a specified number of latent (hidden) factors that influence users to rate the games as they do, then uses those factors to estimate how users would rate all of the games they haven't rated. For more on this method, see the link below.

I got the idea to use this from:
https://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/
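
For reference, here is a sketch of the objective that a weighted ALS of this kind minimizes. The notation is an assumption of this note rather than something taken from the linked post: $R$ is the ratings matrix with zeros for unrated cells, $w_{ui}$ is 1 where a rating exists and 0 otherwise, $x_u$ and $y_i$ are the factor vectors for row $u$ and column $i$, and $\lambda$ is the regularization strength.

$$
\min_{X,\,Y} \; \sum_{u,i} w_{ui} \left( R_{ui} - x_u^{\top} y_i \right)^2
+ \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
$$

With $Y$ held fixed, setting the gradient with respect to $x_u$ to zero gives a closed-form update:

$$
x_u = \left( \sum_i w_{ui} \, y_i y_i^{\top} + \lambda I \right)^{-1} \sum_i w_{ui} \, R_{ui} \, y_i
$$

The update for $y_i$ with $X$ held fixed is symmetric, with the sums running over $u$. "Alternating" refers to switching between these two updates. Each one is an ordinary regularized least-squares solve, which is the role of the `solve(A + regularization, ...)` calls in the implementation that follows.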


In [6]:
# Matt Borthwick's implementation of Alternating Least Squares

from numpy import eye
from numpy.linalg import solve
from numpy.random import rand
 
# def do_ALS(Q, n_iterations=10, lambda_=0.1, n_factors=100, weighted=True, verbose=True ):
def do_ALS_MB(ratings, n_factors=4, n_iterations=10, regularization=0.01, 
              weighted=True, verbose=True ):
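    # Unrated cells must be recorded as zero.
    #
    # `ratings` needs an element-wise multiply method, an element-wise minimum method,
    # and a shape attribute, as scipy.sparse matrices have.
    # Note: the `weighted` argument is accepted but not used in the body below, and
    # "users"/"items" mean rows/columns (this notebook passes a games x users matrix).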
    if verbose: print("setting up matrices . . .")
    n_users, n_items = ratings.shape
    X = rand(n_users, n_factors)
    Y_T = rand(n_items, n_factors)
    r = ratings.minimum(1)
    if regularization:
        regularization *= eye(n_factors)
    for iteration in range(1, n_iterations+1):
        if verbose: print("\titeration %d of %d . . ."%(iteration,n_iterations))
        for u in range(n_users):
            A = r[u].multiply(Y_T.T) @ Y_T
            b = ratings[u] @ Y_T
            X[u] = solve(A + regularization, b[0])
        for i in range(n_items):
            A = (r[:, i].multiply(X)).T @ X
            b = ratings[:, i].T @ X
            Y_T[i] = solve(A + regularization, b[0])
    Q_hat = np.dot(X,Y_T.T)
    
    return Q_hat, X, Y_T.T

## Compute the missing ratings using the ALS algorithm.

The parameters given seem to work OK for this dataset.

In [9]:
from scipy import sparse

# run ALS on pivot data to fill in NaN cells with a useful ratings estimate
lambda_ = 0.1 # note: changing this doesn't seem to affect much
n_factors = 10 # smaller #factors seems to give better results
n_iterations = 8 # 8-10 iterations works best for all-user data, 15-20 for elite & frequent users data

# replace NaNs (unrated games) with zeros
# rp_fixed2 = rp_fixed_users.fillna(0) # user based
# rp_fixed2 = rp_fixed.fillna(0) # game based
rp_fixed2 = sparse.csr_matrix(rp_fixed.fillna(0))

Q_hat, X, Y = do_ALS_MB(rp_fixed2, n_iterations=n_iterations, 
                        regularization=lambda_, n_factors=n_factors, weighted=True )
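# Q_hat is our filled-in (fully reconstructed) matrix; X and Y are the factor matrices.
# The older do_ALS call kept below for reference also returned errors for plotting.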
# Q_hat, X, Y, errors = do_ALS(rp_fixed2.values, n_iterations=n_iterations, lambda_=lambda_, n_factors=n_factors, weighted=True )



setting up matrices . . .
	iteration 1 of 8 . . .
	iteration 2 of 8 . . .
	iteration 3 of 8 . . .
	iteration 4 of 8 . . .
	iteration 5 of 8 . . .
	iteration 6 of 8 . . .
	iteration 7 of 8 . . .
	iteration 8 of 8 . . .
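
One way to put a number on whether the parameters "work OK" is to compare the reconstruction against the ratings that actually exist. The helper below is a hypothetical sketch (the name `observed_rmse` and the toy arrays are not part of the original notebook); it treats zeros as unrated, matching the convention used above.

```python
import numpy as np

def observed_rmse(ratings, q_hat):
    """RMSE between ratings and q_hat over rated cells only (zeros = unrated)."""
    ratings = np.asarray(ratings, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    mask = ratings != 0                  # True where a rating exists
    diff = (ratings - q_hat)[mask]       # errors on rated cells only
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up example: three rated cells, three unrated (zero) cells.
toy_ratings = np.array([[10.0, 0.0, 1.0],
                        [0.0, 5.5, 0.0]])
toy_q_hat = np.array([[9.0, 4.0, 2.0],
                      [6.0, 5.5, 3.0]])
toy_rmse = observed_rmse(toy_ratings, toy_q_hat)
print(toy_rmse)
```

With the notebook's variables this could be called as `observed_rmse(rp_fixed2.toarray(), Q_hat)`. Keep in mind that this measures fit on the training ratings only; holding out a fraction of known ratings before running ALS would give a more honest estimate of how well the unrated cells are predicted.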


## Compute the Truncated SVD. 

This process will result in a set of feature values for each game, allowing the games to be plotted onto a map that theoretically indicates which games are related to each other because they receive similar ratings from similar users.

In [10]:
from sklearn.decomposition import PCA, SparsePCA, KernelPCA, TruncatedSVD, NMF

# number of dimensions for analysis
numdims = 4

coords = TruncatedSVD(n_components=numdims).fit_transform(Q_hat)
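
To show what the game coordinates represent, here is a sketch using plain `numpy.linalg.svd` on a small made-up matrix. This is a stand-in for illustration, not the call used above: the coordinates returned by `TruncatedSVD.fit_transform` should correspond to the left singular vectors scaled by the singular values, up to the sign of each component and the approximation error of its solver.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "filled" matrix: 6 games x 5 users built from 2 underlying factors.
toy_Q = rng.random((6, 2)) @ rng.random((2, 5))

k = 2  # number of feature dimensions to keep
U, s, Vt = np.linalg.svd(toy_Q, full_matrices=False)

# Game coordinates in the k-dimensional feature space: one row per game.
toy_coords = U[:, :k] * s[:k]

# Fraction of the total squared singular values kept by the first k components.
kept = float((s[:k] ** 2).sum() / (s ** 2).sum())
print(toy_coords.shape, round(kept, 6))
```

A quantity like `kept` can help when choosing `numdims`: if it is already close to 1 for a small number of components, adding more dimensions contributes little.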

### Functions to search for nearest neighbor games in SVD feature space

In [13]:
from scipy.spatial.distance import cdist

def find_nearest_neighbors(coords, x, numnearest):
    
    # get euclidean distances of all points to x    
    dists = cdist(np.reshape(x,(1,-1)),coords) 
    
    ind, = np.argsort(dists)

    # return the numnearest nearest neighbors
    return ind[:numnearest]
    
def recommend_games(targettitle, gametitles, coords, num2rec):
    
    # get coords of target title
    targetcoord = coords[gametitles==targettitle,:]
    
    # find nearest neighbors
    ind = find_nearest_neighbors(coords, targetcoord, num2rec+1)
    
    # Note: first entry will be the target title (distance 0)
    return ind[1:]
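
The functions above assume the target title matches exactly one game and that the target itself comes first in the sorted distances. The following numpy-only sketch shows the same nearest-neighbor idea on made-up coordinates, with those two assumptions made explicit. The function name `nearest_titles` and the toy data are hypothetical.

```python
import numpy as np

toy_titles = np.array(["A", "B", "C", "D"])
toy_coords = np.array([[0.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 3.0],
                       [5.0, 5.0]])

def nearest_titles(target, titles, coords, k):
    """Return the k titles closest to `target` in Euclidean distance."""
    matches = np.flatnonzero(titles == target)
    if matches.size != 1:
        raise ValueError("expected exactly one game titled %r, found %d"
                         % (target, matches.size))
    t = matches[0]
    # Euclidean distance from every game to the target game.
    dists = np.linalg.norm(coords - coords[t], axis=1)
    order = np.argsort(dists, kind="stable")
    order = order[order != t]   # drop the target itself by index, not by rank
    return titles[order[:k]].tolist()

toy_recs = nearest_titles("A", toy_titles, toy_coords, 2)
print(toy_recs)
```

Dropping the target by index rather than by position avoids removing the wrong game when another game happens to share the target's coordinates.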

### Test out the recommender algorithm with a list of board games 

Note: this recommender bases each set of recommendations on a single example game.

In [14]:
targettitles = [
    "Monopoly", # low rated mass market
    "Apples to Apples", # higher rated mass market and party
    "Zombicide", # Thematic co-op, adult theme, miniatures
    "Mice and Mystics", # Thematic co-op, family theme, miniatures
    "Love Letter", # social deduction, party, "filler"
    "Ticket to Ride", # very light-weight gateway eurogame 
    "Catan", # light-weight gateway eurogame
    "Carcassonne", # light-weight gateway eurogame
    "Agricola", # mid-weight eurogame 
    "Terraforming Mars", # mid-heavy-weight eurogame
    "Caverna: The Cave Farmers" # heavy eurogame
    ]

# number of recommended games to present
num2recommend = 5

# get game titles from titledata
gametitles = titledata.title[rp.index].values

# give recommendations for each target game 
for title in targettitles:
    recs = recommend_games(title, gametitles, coords, num2recommend)
    print('If you like %s, you should try: %s\n' % 
          (title, ', '.join(gametitles[recs])))

If you like Monopoly, you should try: Battleship, The Game of Life, UNO, Checkers, Exploding Kittens

If you like Apples to Apples, you should try: Stratego, Scattergories, Monopoly Deal Card Game, Rummikub, Once Upon a Time: The Storytelling Card Game

If you like Zombicide, you should try: Dungeons & Dragons: Castle Ravenloft Board Game, Sentinels of the Multiverse, XCOM: The Board Game, Firefly: The Game, Pathfinder Adventure Card Game: Rise of the Runelords – Base Set

If you like Mice and Mystics, you should try: Merchants & Marauders, Zombicide: Black Plague, Pathfinder Adventure Card Game: Rise of the Runelords – Base Set, Blood Bowl: Team Manager – The Card Game, XCOM: The Board Game

If you like Love Letter, you should try: Telestrations, Thebes, PitchCar, The Downfall of Pompeii, Kingdomino

If you like Ticket to Ride, you should try: Ticket to Ride: Europe, Jaipur, Carcassonne: Expansion 1 – Inns & Cathedrals, Carcassonne, Sushi Go Party!

If you like Catan, you should try: 