# Game Recommendation
<i>Authors: Markus Viljanen</i>

This notebook is a simple demonstration of the models in our paper. We focus on the following four new recommendation algorithms:
1. Multivariate Normal Distribution (MVN): recommender based on game likes (collaborative filtering).
2. Questions: recommender based on game likes and player features (content based).
3. Tags: recommender based on game likes and game features (content based).
4. QuestionsXTags: recommender based on game likes, player features and game features (content based).

In the first experiment, we benchmark the algorithms against several baselines (Random, Popularity, SVD) on their accuracy in the Top-N recommendation task. We use the normalized Discounted Cumulative Gain (nDCG) and precision@k metrics. The experiment demonstrates that the best model depends on the recommendation task: we differentiate four settings based on whether the recommendation involves new games, new players, both simultaneously, or neither.

In the second experiment, we contrast the recommendation accuracy and the subjective quality of the recommenders. We measure accuracy with a simple training / test set split, and subjective quality by inspecting the actual recommendations for 7 example players. It seems that accuracy does not necessarily correlate with subjective quality. The main reason for this is popularity bias: recommending more popular games that are not directly related to the liked games or question answers improves accuracy.

Finally, we interpret the parameters of the four models and demonstrate that they produce useful information about why certain games are liked by certain players. This can be used to explain the reasoning behind recommendations or for fine-tuning the models to improve the subjective quality.


## Imports

In [1]:
import numpy as np
import pandas as pd
import scipy.sparse as sp
from sklearn.model_selection import ShuffleSplit
from ipywidgets import interact

from data import Data
from validation import split_data_k, precision_score_player, precision_score_array, dcg_score_player, dcg_score_array
from methods import Popularity, Random, MVN, MVN_nobias, PureSVD, SVD, Questions, Tags, QuestionsXTags
from widgets import display_validation_questions, display_online_questions, display_online
from IPython.display import Image, HTML

def fmt_allcols(df):
    return (dict(zip(df.columns, [lambda s: '<p>%s</p>' % ("<br>".join(s.split(",")))] * len(df.columns))))


# Show full cell contents; None replaces the deprecated value -1 in recent pandas versions.
pd.set_option('display.max_colwidth', None)


## Data Set

> Unfortunately this data set is not ours and the contributors wish to keep it private.

<img src="data_matrices.png" alt="Illustration of data matrices" style="height: 200px;"/>

The recommendations are based on three data sets that we load as matrices:
1. likes: (players, games) matrix of game like status for every player and game (0/1) 
2. questions: (players, questions) matrix of question preferences for every player (-2,-1,0,1,2).
3. tags: (games, tags) matrix of game tag status for every game (0/1)

In addition we have the following mapping to human readable explanations for each id:
1. userids: mapping of userids to player names
2. gameids: mapping of gameids to game names
3. questionids: mapping of questionids to question names
4. tagids: mapping of tagids to tag names

We load a local copy of the data set:

In [2]:
data = Data()
likes, questions, tags, userids, gameids, questionids, tagids = data.get_game_matrix_with_features()

Players: 15894
Games: 6465
Questions: 61
Tags: 379


The matrix dimensions match these counts (15894 players, 6465 games, 61 questions, 379 tags):

In [3]:
print(likes.shape, questions.shape, tags.shape, len(userids), len(gameids), len(questionids), len(tagids))

(15894, 6465) (15894, 61) (6465, 379) 15894 6465 61 379


The last 7 rows are the players for whom we show example recommendations:

In [4]:
validation_rows = np.arange(likes.shape[0])[-data.n_validation:]
print(validation_rows)

[15887 15888 15889 15890 15891 15892 15893]


## Train and test set split

In the previous picture, we illustrated four different validation settings:
1. Setting 1: recommendations when both the player and the game belong to the training set.
2. Setting 2: recommendations when the game does not belong to the training set.
3. Setting 3: recommendations when the player does not belong to the training set.
4. Setting 4: recommendations when neither the player nor the game belongs to the training set.

This is achieved with a simple training and test set split. We split the columns (games) into training and test, and the rows (players) into training and test:

In [5]:
split = 0.25
row_split = ShuffleSplit(n_splits=1, test_size=split, random_state=0)
train_rows, test_rows = next(row_split.split(likes))
col_split = ShuffleSplit(n_splits=1, test_size=split, random_state=0)
train_cols, test_cols = next(col_split.split(likes.T))

This allocates the player features and the game features into training and test feature matrices:

In [6]:
X1_train = questions[train_rows,]
X1_test = questions[test_rows,]
X2_train = tags[train_cols,]
X2_test = tags[test_cols,]

X1_test = np.array(X1_test.toarray(), dtype=float)
X2_test = np.array(X2_test.toarray(), dtype=float)

print(X1_train.shape, X1_test.shape, X2_train.shape, X2_test.shape)

(11920, 61) (3974, 61) (4848, 379) (1617, 379)


This is the game like matrix split into the four settings:

In [7]:
Y = np.array(likes.toarray(), dtype=float)
Y1 = sp.csr_matrix(Y[np.ix_(train_rows, train_cols)])
Y2 = sp.csr_matrix(Y[np.ix_(train_rows, test_cols)])
Y3 = sp.csr_matrix(Y[np.ix_(test_rows, train_cols)])
Y4 = sp.csr_matrix(Y[np.ix_(test_rows, test_cols)])
print(Y1.shape, Y2.shape, Y3.shape, Y4.shape)

(11920, 4848) (11920, 1617) (3974, 4848) (3974, 1617)


In Setting 1 the recommendations are based on 3 game likes and the goal is to guess the remaining game likes, so we split it further:

In [8]:
Y1_train, Y1_test = split_data_k(Y1, 3)
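
The implementation of `split_data_k` is not shown in this notebook. As a rough sketch of what such a split might look like, assuming it keeps `k` randomly chosen likes per player for training and moves the remaining likes to the test matrix, consider the hypothetical helper `split_likes_k` below (the name, the `seed` parameter, and the toy matrix are illustrative assumptions, not the notebook's code):

```python
import numpy as np
import scipy.sparse as sp

def split_likes_k(Y, k, seed=0):
    """Hypothetical stand-in for split_data_k: keep k random likes per row for training."""
    rng = np.random.default_rng(seed)
    Y = sp.csr_matrix(Y)
    train_r, train_c, test_r, test_c = [], [], [], []
    for i in range(Y.shape[0]):
        # Column indices of the games liked by player i, in random order.
        liked = rng.permutation(Y.indices[Y.indptr[i]:Y.indptr[i + 1]])
        train_c.extend(liked[:k])
        train_r.extend([i] * len(liked[:k]))
        test_c.extend(liked[k:])
        test_r.extend([i] * len(liked[k:]))

    def build(rows, cols):
        rows = np.asarray(rows, dtype=int)
        cols = np.asarray(cols, dtype=int)
        return sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=Y.shape)

    return build(train_r, train_c), build(test_r, test_c)

# Toy like matrix: 3 players, 6 games.
Y_toy = np.array([[1, 1, 1, 1, 1, 0],
                  [0, 1, 1, 1, 1, 0],
                  [1, 0, 0, 0, 0, 1]])
train_toy, test_toy = split_likes_k(Y_toy, k=3)
print(train_toy.toarray())
print(test_toy.toarray())
```

Players with fewer than `k` likes keep all of their likes in the training matrix and contribute nothing to the test matrix under this assumption.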

## Fit models

We now fit the models and evaluate their nDCG and Precision@k in each of the 4 different settings.

It is actually possible to think of four of the models (SVD, Questions, Tags, Questions x Tags) as instances of a similar model. In the SVD, we learn latent player features and latent game features based on the likes. In the Questions model, we assume that the player features (questions) are given and learn how each game responds to each feature based on the likes. In the Tags model, we assume that the game features (tags) are given and learn how each player responds to each feature based on the likes. In the Questions x Tags model, we assume both player features (questions) and game features (tags) are given and learn how each player feature interacts with each game feature based on the likes.

This means that the models have varying degrees of flexibility, with the SVD having the most and the Questions x Tags having the least. However, as illustrated in the picture, it also means that the models generalize to varying degrees: the SVD generalizes to the fewest settings and the Questions x Tags to the most.

<img src="data_models.png" alt="Illustration of models" style="width: 600px;"/>
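
To make the shared structure concrete, here is a minimal sketch of the four score computations as matrix products. It assumes purely linear models without bias terms or regularization, and it uses random placeholder weights where the real models would use fitted parameters; all variable names are illustrative and do not come from the `methods` module.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_games, n_questions, n_tags, k = 5, 4, 3, 2, 2

X1 = rng.normal(size=(n_players, n_questions))  # given player features (questions)
X2 = rng.normal(size=(n_games, n_tags))         # given game features (tags)

# SVD: latent player factors U and latent game factors V are both learned.
U = rng.normal(size=(n_players, k))
V = rng.normal(size=(n_games, k))
scores_svd = U @ V.T

# Questions: player features are given, one weight vector per game is learned.
W_games = rng.normal(size=(n_questions, n_games))
scores_questions = X1 @ W_games

# Tags: game features are given, one weight vector per player is learned.
W_players = rng.normal(size=(n_players, n_tags))
scores_tags = W_players @ X2.T

# Questions x Tags: both are given, only the interaction matrix is learned.
W_inter = rng.normal(size=(n_questions, n_tags))
scores_qxt = X1 @ W_inter @ X2.T

# Generalization: a new player only needs question answers, a new game only needs tags.
x_new_player = rng.normal(size=(1, n_questions))
x_new_game = rng.normal(size=(1, n_tags))
new_player_scores = x_new_player @ W_games                 # Setting 3 with Questions
new_game_scores = W_players @ x_new_game.T                 # Setting 2 with Tags
new_pair_score = x_new_player @ W_inter @ x_new_game.T     # Setting 4 with Questions x Tags
print(new_player_scores.shape, new_game_scores.shape, new_pair_score.shape)
```

In this sketch the SVD has no way to score a new player or a new game, because `U` and `V` only have rows for the players and games seen during training.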

In [9]:
def evaluate_precision(model):
    p = model.predict1()
    prc1 = precision_score_array(p, Y1_test, Y1_train)
    p = model.predict2(X2_test)
    prc2 = precision_score_array(p, Y2)
    p = model.predict3(X1_test)
    prc3 = precision_score_array(p, Y3)
    p = model.predict4(X1_test, X2_test)
    prc4 = precision_score_array(p, Y4)
    return(prc1, prc2, prc3, prc4)


In [10]:
def evaluate_dcg(model):
    p = model.predict1()
    acc1 = dcg_score_array(p, Y1_test, Y1_train)
    p = model.predict2(X2_test)
    acc2 = dcg_score_array(p, Y2)
    p = model.predict3(X1_test)
    acc3 = dcg_score_array(p, Y3)
    p = model.predict4(X1_test, X2_test)
    acc4 = dcg_score_array(p, Y4)
    return(acc1, acc2, acc3, acc4)


In [11]:
dcg = {}
prc = {}

model = Random()
model.fit(Y1_train)
prc['Random'] = evaluate_precision(model)
dcg['Random'] = evaluate_dcg(model)

model = Popularity()
model.fit(Y1_train)
prc['Popularity'] = evaluate_precision(model)
dcg['Popularity'] = evaluate_dcg(model)

model = MVN(Y1_train)
prc['MVN'] = evaluate_precision(model)
dcg['MVN'] = evaluate_dcg(model)

model = PureSVD(k=20)
model.fit(Y1_train)
prc['PureSVD'] = evaluate_precision(model)
dcg['PureSVD'] = evaluate_dcg(model)

model = SVD(k=20, epochs=256, reg=16)
model.fit(Y1_train)
prc['SVD'] = evaluate_precision(model)
dcg['SVD'] = evaluate_dcg(model)

model = QuestionsXTags()
model.fit(Y1_train, X1_train, X2_train, reg=1)
prc['QuestionsXTags'] = evaluate_precision(model)
dcg['QuestionsXTags'] = evaluate_dcg(model)

model = Tags()
model.fit(Y1_train, X2_train, reg=1000)
prc['Tags'] = evaluate_precision(model)
dcg['Tags'] = evaluate_dcg(model)

model = Questions()
model.fit(Y1_train, X1_train, reg=1)
prc['Questions'] = evaluate_precision(model)
dcg['Questions'] = evaluate_dcg(model)



## Plot

In terms of precision the most accurate model is the MVN in Setting 1, the Tags in Setting 2, the Questions in Setting 3, and the Questions x Tags in Setting 4:

In [14]:
df = pd.DataFrame(prc, index=['Setting 1', 'Setting 2', 'Setting 3', 'Setting 4'])
(df.T*100).round(1)

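| Model | Setting 1 | Setting 2 | Setting 3 | Setting 4 |
|---|---|---|---|---|
| Random | 0.3 | 0.1 | 0.1 | 0.1 |
| Popularity | 3.3 | 0.1 | 4.0 | 0.1 |
| MVN | 5.1 | 0.1 | 4.0 | 0.1 |
| PureSVD | 3.8 | 0.1 | 0.1 | 0.1 |
| SVD | 4.3 | 0.1 | 0.1 | 0.1 |
| QuestionsXTags | 2.6 | 1.1 | 2.7 | 1.2 |
| Tags | 2.5 | 1.7 | 0.1 | 0.1 |
| Questions | 3.4 | 0.1 | 4.3 | 0.1 |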


The ranking of the models is the same for nDCG, which considers the quality of the entire recommendation list:

In [16]:
df = pd.DataFrame(dcg, index=['Setting 1', 'Setting 2', 'Setting 3', 'Setting 4'])
(df.T*100).round(1)

| Model | Setting 1 | Setting 2 | Setting 3 | Setting 4 |
|---|---|---|---|---|
| Random | 15.0 | 13.2 | 14.6 | 13.4 |
| Popularity | 26.4 | 13.2 | 29.8 | 13.4 |
| MVN | 32.6 | 13.2 | 29.8 | 13.4 |
| PureSVD | 28.2 | 13.2 | 14.6 | 13.4 |
| SVD | 30.7 | 13.2 | 14.6 | 13.4 |
| QuestionsXTags | 23.8 | 19.9 | 24.4 | 20.0 |
| Tags | 23.6 | 22.3 | 14.6 | 13.4 |
| Questions | 26.7 | 13.2 | 32.2 | 13.4 |
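
The metric functions `precision_score_array` and `dcg_score_array` come from the `validation` module, which is not shown here. For readers unfamiliar with the metrics, the following is a minimal sketch of the common textbook definitions for a single player, assuming binary relevance and a log2 rank discount. The function names are hypothetical, and the sketch omits the exclusion of training likes that the notebook's functions presumably perform when given `Y1_train`.

```python
import numpy as np

def precision_at_k(scores, relevant, k=20):
    """Fraction of the k highest-scored items that are relevant (0/1 vector)."""
    top_k = np.argsort(-scores, kind="stable")[:k]
    return relevant[top_k].sum() / k

def ndcg(scores, relevant):
    """DCG of the ranking induced by `scores`, divided by the DCG of an ideal ranking."""
    order = np.argsort(-scores, kind="stable")
    discounts = 1.0 / np.log2(np.arange(2, len(scores) + 2))
    dcg = (relevant[order] * discounts).sum()
    ideal = (np.sort(relevant)[::-1] * discounts).sum()
    return dcg / ideal if ideal > 0 else 0.0

# Toy example: 5 games, the player likes games 0 and 3.
scores_toy = np.array([0.9, 0.1, 0.8, 0.3, 0.2])
relevant_toy = np.array([1, 0, 0, 1, 0])
print(precision_at_k(scores_toy, relevant_toy, k=2))  # 0.5
print(ndcg(scores_toy, relevant_toy))
```

Precision@k only looks at the top of the list, while nDCG rewards every relevant item with a weight that decays with its rank, which is why it reflects the quality of the entire list.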


## Qualitative

It is often not enough to consider only the quantitative accuracy of the methods. The subjective quality of the recommendations is also important and we should at least verify that more accurate models mean better recommendations. For this purpose, we split the data into Setting 1, where the goal is to recommend games based on 3 game likes:

In [17]:
games_train, games_test = split_data_k(likes, 3)

### MVN

In [18]:
model = MVN(games_train)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_mvn = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.059


### Questions

In [20]:
model = Questions()
model.fit(games_train, questions, reg=1)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_questions = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.036


### Tags

In [22]:
model = Tags()
model.fit(games_train, tags, reg=1000)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_tags = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.024


### Questions X Tags

In [24]:
model = QuestionsXTags()
model.fit(games_train, questions, tags, reg=1)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_questionsxtags = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.025


### Results:

These are the Top 10 recommendations for each of the 7 example players:

In [26]:
recs = pd.concat([recs_mvn, recs_questions['recommendations'], recs_tags['recommendations'], 
                       recs_questionsxtags['recommendations']], axis=1)
recs.columns = ['items', 'games', 'MVN', 'Questions', 'Tags', 'QuestionsXTags']
display(HTML(recs.to_html(escape=False, formatters=fmt_allcols(recs), na_rep='')))

userid,items,games,MVN,Questions,Tags,QuestionsXTags
93519,Engaging in battle Weapons and skills selection for characters Searching and collecting rare treasures,Child of Light Dungeon Master Shin Megami Tensei: Persona 3,Persona 5 Shin Megami Tensei: Persona 4 Chrono Cross FINAL FANTASY IX Bravely Default Xenogears Pokémon GO Final Fantasy VII NetHack Final Fantasy X,World of Warcraft Overwatch The Witcher 3: Wild Hunt Clash Royale Pokémon GO Diablo Mass Effect Fallout 4 Fortnite FINAL FANTASY,Costume Quest Ori and the Blind Forest Abyss Odyssey Fortune Summoners Bahamut Lagoon Dust: An Elysian Tail Hollow Knight LiEat Momodora: Reverie Under the Moonlight FINAL FANTASY IX,Fallout 4 Fallout: New Vegas Fallout 3 Mass Effect 2 Deus Ex Warframe Grand Theft Auto V Red Dead Redemption Dragon's Dogma: Dark Arisen Mass Effect
93520,Piloting and steering vehicles Racing in a high speed Challenges of tactics,Project CARS Gran Turismo 5 Forza Horizon 2,BioShock Infinite NHL 15 World of Warships Portal 2 NBA Live 16 Forza Motorsport 6 Star Wars: Battlefront II Mini Metro Call of Duty: World at War Halo: Reach,Call of Duty Grand Theft Auto Clash of Clans Angry Birds Battle-field StarCraft Dota 2 World of Tanks Age of Empires Red Dead Redemption 2,DiRT 4 Gran Turismo 2 Gran Turismo (PSP) Forza Motorsport 4 Forza Motorsport 2 Need for Speed: Hot Pursuit 2 Test Drive Unlimited 2 Forza Motorsport 6 Forza Motorsport 5 Forza Motorsport,StarCraft: Brood War StarCraft Grand Theft Auto IV StarCraft II: Legacy of the Void Kerbal Space Program Doom II RPG Call of Duty Dota 2 Grand Theft Auto V The Banner Saga 2
93521,Running in a fast speed while avoiding obstacles Developing skills and abilities Challenges of fast reaction,Shovel Knight Super Mario 3D World The Legend of Zelda: Ocarina of Time,The Legend of Zelda: Breath of the Wild Super Mario 64 The Legend of Zelda: Majora's Mask Pokémon GO The Legend of Zelda: The Wind Waker The Legend of Zelda: A Link to the Past Super Mario Galaxy Super Mario Sunshine Super Mario World The Witcher 3: Wild Hunt,TETRIS League of Legends Crash Bandicoot Minecraft Call of Duty Overwatch Fortnite Candy Crush Saga RuneScape Tomb Raider,The Legend of Zelda: Twilight Princess Rogue Legacy Assassin's Creed IV: Black Flag The Legend of Zelda: A Link to the Past Power Stone 2 The Legend of Zelda: Phantom Hourglass Super Mario Bros. 2 The Elder Scrolls IV: Oblivion Final Fantasy XII Castle Crashers,StarCraft StarCraft: Brood War Counter-Strike: Global Offensive Dota 2 Tomb Raider Fortnite League of Legends Counter-Strike Paladins Super Smash Bros. for Wii U
93522,Hugging  kissing and making out Investigating the story and its mysteries Challenges of logical problem-solving,Heavy Rain Steins;Gate Life Is Strange,Call of Duty TETRIS Angry Birds League of Legends World of Warcraft StarCraft Candy Crush Saga Fortnite Red Dead Redemption Clash Royale,Sudoku Pokémon GO World of Warcraft The Sims Fallout 3 The Witcher 3: Wild Hunt Farm Heroes Saga Angry Birds Counter-Strike: Global Offensive NetHack,Beyond: Two Souls The Wolf Among Us Zero Escape: Zero Time Dilemma Persona 4 Golden Alan Wake The Walking Dead: Season Two Shadow of Memories Hate Plus To the Moon Analogue: A Hate Story,Fallout 3 Mass Effect 2 The Elder Scrolls V: Skyrim Fallout 2 Fallout: New Vegas The Elder Scrolls IV: Oblivion Europa Universalis IV Fallout 4 Assassin's Creed IV: Black Flag Dragon Age: Origins
93523,Decorating rooms and houses Hugging  kissing and making out Challenges of logical problem-solving,Cities: Skylines Overcooked The Sims 2,The Sims 3 The Sims 4 The Sims Civilization V Stardew Valley Overwatch Life Is Strange Pokémon GO Dragon Age: Inquisition Minecraft,TETRIS Sudoku The Sims Gardenscapes World of Warcraft Farm Heroes Saga The Elder Scrolls IV: Oblivion Fallout 2 Homescapes NetHack,Train Valley Prison Architect The Sims RollerCoaster Tycoon 3: Platinum Tropico 4 Game Dev Tycoon Goat Simulator Tropico 3 Turbo Dismount AdVenture Capitalist,Fallout 3 Fallout 2 Fallout: New Vegas The Elder Scrolls IV: Oblivion Mass Effect 2 Warframe The Sims Fallout 4 Dragon's Dogma: Dark Arisen The Elder Scrolls III: Morrowind
93524,Waging war and conquering Managing and directing cities and their inhabitants Cratfing items and valuables by combining raw materials,Europa Universalis IV Rome: Total War Sid Meier's Civilization IV,Medieval II: Total War Empire: Total War Total War: Rome II The Elder Scrolls V: Skyrim Shogun: Total War Total War: SHOGUN 2 Mass Effect Medieval: Total War Napoleon: Total War Total War: WARHAMMER,Civilization V League of Legends Call of Duty The Witcher 3: Wild Hunt The Elder Scrolls V: Skyrim StarCraft Age of Empires World of Warcraft Diablo III The Elder Scrolls IV: Oblivion,Victoria II Crusader Kings II Civilization V Medieval II: Total War Empire: Total War Total War: Shogun 2 - Fall of the Samurai Making History: The Great War Total War: SHOGUN 2 Sid Meier's Civilization IV Civilization VI,Civilization V The Elder Scrolls IV: Oblivion The Elder Scrolls V: Skyrim StarCraft Galactic Civilizations III StarCraft II: Legacy of the Void Sins of a Solar Empire: Rebellion Transport Tycoon Deluxe Transport Tycoon Hearts of Iron IV
93525,Doing tricks in extreme sports Team sports Performing in sports,NHL 17 FIFA 18 WWE 2K19,FIFA 17 Fortnite WWE 2K18 Ashes Cricket Guitar Hero Candy Crush Saga FIFA 19 WWE SuperCard HITMAN 2 Sea of Thieves,Call of Duty Fortnite The Elder Scrolls V: Skyrim Grand Theft Auto Football Manager TETRIS FIFA 19 Fallout Clash of Clans Clash Royale,Madden NFL 17 Madden NFL 18 NHL 16 FIFA 16 FIFA 06 Pro Evolution Soccer 6 Pro Evolution Soccer 2018 FIFA Football 2002 NHL 18 FIFA 06: Road to FIFA World Cup,The Elder Scrolls V: Skyrim StarCraft StarCraft: Brood War Fortnite Call of Duty Fallout Civilization V The Elder Scrolls III: Morrowind StarCraft II: Legacy of the Void XCOM 2


Personally, I make the following judgments from these recommendations:
1. The MVN model works well. The recommendations sometimes include popular but unrelated games.
2. The Questions model recommends popular but unrelated games. The recommendations sometimes include related games.
3. The Tags model works well. The recommendations only include very similar games.
4. The Questions x Tags model recommends popular but unrelated games. 

## Qualitative (fixed)

There are some ways to fix the 'popularity bias' in the recommendations. Here we present a very simple one.

More popular games have more tags, so we create more balanced game features by filling in possibly missing tags and normalizing each game feature vector:

In [27]:
k = 32
u, s, vt = sp.linalg.svds(sp.csr_matrix(tags, dtype=float), k=k)
tagsx = tags.dot(vt.T)
n = np.linalg.norm(tagsx, axis=1)
n = np.where(n > 0.0, n, 1.0).reshape(-1,1)
tagsx = sp.csr_matrix(tagsx/n)

Instead of directly predicting game likes, we predict the standardized deviations from baseline popularity. This is analogous to z-score normalization:

In [28]:
mu = games_train.sum(axis=0).A.flatten() / games_train.shape[0]
std = np.where(mu > 0.0, np.sqrt(mu*(1.0-mu)), 1.0)
games_train2 = sp.csr_matrix((games_train - mu) / std)
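
As a small illustration of what this standardization does, the sketch below applies the same formulas to a dense toy matrix (the matrix and variable names are made up for the example). A like of a niche game ends up with a larger standardized value than a like of a popular game, which is how the popularity baseline is removed from the targets.

```python
import numpy as np

# Toy like matrix: 4 players x 4 games.
# Game 0 is popular (3 of 4 players), games 1 and 2 are niche, game 3 has no likes.
likes_toy = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
], dtype=float)

mu = likes_toy.mean(axis=0)                               # like probability per game
std = np.where(mu > 0.0, np.sqrt(mu * (1.0 - mu)), 1.0)   # Bernoulli standard deviation
z = (likes_toy - mu) / std                                # standardized deviations
print(np.round(z, 3))
```

For player 0, the like of the popular game 0 becomes about 0.577, while the like of the niche game 1 becomes about 1.732. The game without any likes keeps a column of zeros because of the guard on `mu > 0.0`.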

In [29]:
model = MVN_nobias(games_train)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_mvn = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.037


In [30]:
model = Questions()
model.fit(games_train2, questions, reg=1)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_questions = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.010


In [31]:
model = Tags()
model.fit(games_train2, tagsx, reg=1000)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_tags = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.007


In [32]:
model = QuestionsXTags()
model.fit(games_train2, questions, tagsx, reg=1)
prc = precision_score_player(model.predict, games_test, games_train, k=20)
print("Precision@k %.3f" % prc)

recs_questionsxtags = display_validation_questions(validation_rows, model.predict, userids, games_train, gameids, 
                             data.games_covers, data.games_description,
                            questions, questionids, data.items_description, n_recommendations=10)

Precision@k 0.004


These are the results:

In [33]:
recs = pd.concat([recs_mvn, recs_questions['recommendations'], recs_tags['recommendations'], 
                       recs_questionsxtags['recommendations']], axis=1)
recs.columns = ['items', 'games', 'MVN', 'Questions', 'Tags', 'QuestionsXTags']
display(HTML(recs.to_html(escape=False, formatters=fmt_allcols(recs), na_rep='')))

userid,items,games,MVN,Questions,Tags,QuestionsXTags
93519,Engaging in battle Weapons and skills selection for characters Searching and collecting rare treasures,Child of Light Dungeon Master Shin Megami Tensei: Persona 3,Xenogears Chrono Cross Bravely Default Shin Megami Tensei: Persona 4 Persona 5 Parallax Skool Daze Castle Master Planetfall Operation Stealth,Dead or Alive 5 Last Round 100 PICS Quiz Stranglehold Aldo's Adventure Call of Juarez: Gunslinger Turrican 80 Days Diablo Need for Speed: Porsche Unleashed Mercs,Persona Q: Shadow of the Labyrinth FINAL FANTASY XI: Ultimate Collection Seekers Edition The Lords of Midnight King's Field IV Mordor: The Depths of Dejenol Eye of the Beholder II: The Legend of Darkmoon Might and Magic Book One: The Secret of the Inner Sanctum The Bard's Tale II: The Destiny Knight The Bard's Tale III: Thief of Fate Sword Art Online: Memory Defrag,The Elder Scrolls V: Skyrim The Elder Scrolls IV: Oblivion Dragon's Dogma Gothic II Castlevania: Lords of Shadow 2 Dragon's Dogma: Dark Arisen Wolfenstein: The Old Blood BioShock Infinite Fallout: New Vegas METAL GEAR SOLID V: THE PHANTOM PAIN
93520,Piloting and steering vehicles Racing in a high speed Challenges of tactics,Project CARS Gran Turismo 5 Forza Horizon 2,NBA Live 16 Gran Turismo 6 10000000 Die Hard Trilogy Bullets And More VR - BAM VR Time Clickers Gran Turismo 4 Need for Speed: Underground Forza Motorsport 6 DiRT Rally,Turtles Call of Duty Total War Battles: Shogun Star Defender 4 Forza Horizon 4 Battle-field Legions: Overdrive Bomber Grand Theft Auto Same,Corvette Evolution GT Gran Turismo 3: A-Spec Need for Speed: Shift Gran Turismo (PSP) Gran Turismo 2 Forza Motorsport 4 Forza Motorsport 2 Midnight Club: Los Angeles WRC 2: FIA World Rally Championship Need for Speed: Porsche Unleashed,Forza Motorsport 4 Forza Motorsport 2 Real Racing Blur DiRT 3 Complete Edition Crazy Taxi Classic Rally Speedway Hugo Troll Race Car Racing Jet Car Stunts 2
93521,Running in a fast speed while avoiding obstacles Developing skills and abilities Challenges of fast reaction,Shovel Knight Super Mario 3D World The Legend of Zelda: Ocarina of Time,The Legend of Zelda: Majora's Mask Titan Souls Super Mario 64 The Legend of Zelda: The Wind Waker The Legend of Zelda: Breath of the Wild ReCore Yoshi's Woolly World Teenage Zombies: Invasion of the Alien Brain Thingys! NBA Playgrounds Boost,Crossy Road Dance Dance Revolution SuperNova FIFA Football 2002 Vigilante 8 Mean City Luminosity Coin Master Micro Machines Ragnarok Type:Rider,Terrian Saga: KR-17 Flicky SuperTux Papi Trampoline Monkey Ball Knytt Stories Cookie Run Tombi 2 Random Heroes Donkey King,LittleBigPlanet Karting Mario Bros. Sonic Dash 2: Sonic Boom Zombie Tsunami Wizball NBA Playgrounds Ski Safari Street Fighter X Tekken BasketBall APB: All Points Bulletin
93522,Hugging  kissing and making out Investigating the story and its mysteries Challenges of logical problem-solving,Heavy Rain Steins;Gate Life Is Strange,Beyond: Two Souls Until Dawn Life Is Strange: Before the Storm Top Drives Primal The Walking Dead L.A. Noire Life is Strange 2 Brothers: A Tale of Two Sons Vampyr,Sudoku Farm Heroes Saga Star Wars: X-Wing Alliance Hyperdimension Neptunia Victory Mystery Chronicle: One Way Heroics Prince of Persia Classic Prince of Persia 2: The Shadow and the Flame Prince of Persia 3D The Lost Valley Galactic Junk League,Higurashi When They Cry Hou - Ch.1 Onikakushi Higurashi When They Cry Hou - Ch.2 Watanagashi The House in Fata Morgana Higurashi When They Cry Hou - Ch.3 Tatarigoroshi Analogue: A Hate Story The Fruit of Grisaia Lucid9: Inciting Incident NEKOPARA Vol. 0 CLANNAD G-senjou no Maou - The Devil on G-String,Pixel People Trauma Center: Under the Knife 2 Riven: The Sequel to MYST VVVVVV Teslagrad Underground Antichamber The Talos Principle A Story About My Uncle OPUS: The Day We Found Earth
93523,Decorating rooms and houses Hugging  kissing and making out Challenges of logical problem-solving,Cities: Skylines Overcooked The Sims 2,The Sims 3 The Sims 4 Shop Heroes Medieval Engineers IRONSIGHT Tiny Brains Solitairica The Sims FreePlay Planet Coaster SimCity 4,Sudoku Gardenscapes Farm Heroes Saga Dangerous Dungeons Bubble Trouble 1010 Flow Lume Galactic Junk League Pro Evolution Soccer 2018,Solitaire Deluxe 2 Rummikub ACT IT OUT! A Game of Charades Minions Paradise Knowledge is Power Agricultural Simulator 2013 Amateur Surgeon 4 Drawful 2 KleptoCats Stars!,Infinifactory Pixel People Riven: The Sequel to MYST Toon Blast Buzz! Junior: Robo Jam VVVVVV SpaceChem Trauma Center: Under the Knife 2 Underground Myst V
93524,Waging war and conquering Managing and directing cities and their inhabitants Cratfing items and valuables by combining raw materials,Europa Universalis IV Rome: Total War Sid Meier's Civilization IV,Medieval II: Total War Hearts of Iron IV Crusader Kings II Total War: Rome II Empire: Total War Medieval: Total War Victoria II Total War: SHOGUN 2 Shogun: Total War Age of Conquest IV,T-Kara Puzzles Captain Tsubasa J: The Way to World Youth Enemy Territory: Quake Wars Captain Tsubasa J: Get In The Tomorrow Saber Marionette J: Battle Sabers Terrian Saga: KR-17 X-Men Legends II: Rise of Apocalypse Poker Night at the Inventory Epic Battle Fantasy 3 The Flame in the Flood,Order of Battle: World War II Civilization V Panzer Corps Close Combat - Gateway to Caen Combat Mission: Beyond Overlord Battle of the Bulge Sengoku Jidai: Shadow of the Shogun The Operational Art of War I: 1939-1955 Gary Grigsby's War in the East Civilization VI,Total Annihilation Command & Conquer: Generals Great Little War Game StarCraft: Brood War Commandos 2: Men of Courage Supreme Commander: Forged Alliance Command & Conquer: Tiberian Sun StarCraft Company of Heroes 2 Company of Heroes Online
93525,Doing tricks in extreme sports Team sports Performing in sports,NHL 17 FIFA 18 WWE 2K19,WWE 2K18 Ashes Cricket WWE SuperCard FIFA 17 HITMAN 2 Guitar Hero Live Guitar Hero Smash Hits Guitar Hero: Van Halen Automobilista Guitar Hero World Tour,FIFA 19 Football Manager Fortnite Call of Duty X-Plane 11 NBA 2K Command & Conquer (2013) Her Story Mario Golf Pro Evolution Soccer,Volleyball MLB The Show 17 FIFA Football 2002 FIFA 06 Pro Evolution Soccer 6 FIFA 16 Pro Evolution Soccer 2018 Athens 2004 Madden NFL 13 NBA 2K17,MLB The Show 18 NBA Live 19 Virtual Pool Strike TAP SPORTS BASEBALL 2016 Pocket Soccer FIFA 16 Soccer NHL Powerplay '96 Cricket Captain River City Super Sports Challenge


Personally, I make the following judgments from these recommendations:
1. The MVN model works pretty well. There are sometimes unrelated and unpopular recommendations.
2. The Questions model works pretty well, though the questions are not very specific. There are sometimes unrelated and unpopular recommendations.
3. The Tags model works pretty well. Now there are sometimes unpopular recommendations.
4. The Questions x Tags model works pretty well, considering how unspecific the questions are.



## Interpretation of results

The four new models are appealing because their parameters can be used to explain the game recommendations. In the following figure, we display the parameter matrices that the models have learned:
1. MVN: learns the correlation of game likes from the observed player and game likes. This provides a 'game similarity matrix' of how similar each game is.
2. Questions: learns how each game responds to each question. This provides a 'game profile' of what gaming preferences the game responds to.
3. Tags: learns how each player responds to each tag. This provides a 'player profile' of what game content the player responds to.
4. Questions x Tags: learns how each question interacts with each game tag. This provides an 'interaction matrix' between gaming preferences and game content.

<img src="data_parameters.png" alt="Illustration of parameter matrices for different models" style="height: 200px;"/>
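
To illustrate how a covariance matrix of game likes can drive recommendations, here is a minimal sketch of the standard conditional mean of a multivariate normal distribution: given the games a player is known to like, it scores the remaining games. This is an assumption about how an MVN-style recommender can be used, not a copy of the `MVN` class; the function name, the `ridge` parameter, and the toy data are hypothetical.

```python
import numpy as np

def mvn_conditional_scores(mu, cov, observed_idx, observed_values, ridge=1e-6):
    """E[y_hidden | y_observed] under y ~ N(mu, cov); ridge keeps the solve stable."""
    hidden_idx = np.setdiff1d(np.arange(len(mu)), observed_idx)
    cov_oo = cov[np.ix_(observed_idx, observed_idx)] + ridge * np.eye(len(observed_idx))
    cov_ho = cov[np.ix_(hidden_idx, observed_idx)]
    residual = observed_values - mu[observed_idx]
    scores = mu[hidden_idx] + cov_ho @ np.linalg.solve(cov_oo, residual)
    return hidden_idx, scores

# Toy like matrix: 5 players x 4 games. Games 0 and 1 tend to be liked together.
likes_toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
], dtype=float)
mu_toy = likes_toy.mean(axis=0)
cov_toy = np.cov(likes_toy, rowvar=False)

# Score the unseen games for a player who is only known to like game 0.
hidden, scores = mvn_conditional_scores(mu_toy, cov_toy, np.array([0]), np.array([1.0]))
print(hidden, np.round(scores, 3))
```

In this toy example game 1 receives the highest score because it has the only positive covariance with game 0, while games 2 and 3 are pushed below their baseline popularity.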

We use the following examples to visualize certain rows or columns of these matrices:

In [41]:
example_games = ['World of Warcraft', 'The Elder Scrolls V: Skyrim', 'Overwatch', 'TETRIS',  'The Witcher 3: Wild Hunt'] 
example_tags = ['Puzzle', 'Real-time Strategy', 'Dating Sim', 'Racing', 'Lara Croft']
example_ids = userids[-data.n_validation:]
example_questions = ['Exploring the gameworld and its secrets','Commanding units or troops',
                 'Breeding, training and taking care of pets','Sniping to eliminate', 'Challenges of fast reaction',
                 'Developing skills and abilities','Jumping on platforms and bouncing on walls']

### SVD

The SVD, or any other matrix factorization method, is not very interpretable. For example, let's learn 20 latent factors for each game and each player:

In [54]:
model = SVD(k=20, epochs=256, reg=16)
model.fit(games_train)

These are the first 10 latent factors for our players:

In [59]:
U = model.model.user_factors
U = pd.DataFrame(U, index=np.arange(U.shape[0])+1, columns=["Factor #%d" % (i+1) for i in range(U.shape[1])])
U.iloc[validation_rows,:10]

Unnamed: 0,Factor #1,Factor #2,Factor #3,Factor #4,Factor #5,Factor #6,Factor #7,Factor #8,Factor #9,Factor #10
15888,-0.000203,0.000389,0.00052,1.5e-05,0.0021,-0.001579,0.001242,0.000367,0.001913,-0.000577
15889,0.000366,1.8e-05,0.000795,0.000275,0.00067,-0.000181,0.000907,0.000368,-0.001548,0.000257
15890,-5.6e-05,0.000142,0.000761,1.7e-05,0.002771,-0.001779,0.001884,0.000199,0.001411,6.2e-05
15891,0.002287,0.001272,0.001033,-0.003574,0.005826,-0.002732,0.008972,0.001522,0.002634,-0.002412
15892,0.008355,-0.018209,0.005429,0.000252,-0.00298,0.000391,0.009923,-0.003525,0.002598,0.003376
15893,0.005661,-0.006626,0.005488,0.00507,-0.000143,0.000857,0.000273,-0.00041,0.002138,0.010162
15894,0.000818,0.000618,0.003166,0.000203,0.000893,-0.000356,0.001973,-0.000464,-0.005015,-0.00027


These are the first 10 latent factors for our games:

In [60]:
V = model.model.item_factors
V = pd.DataFrame(V, index=gameids, columns=["Factor #%d" % (i+1) for i in range(V.shape[1])])
V.index = V.index.map(data.games_description)
V.loc[example_games,].iloc[:,:10]

game,Factor #1,Factor #2,Factor #3,Factor #4,Factor #5,Factor #6,Factor #7,Factor #8,Factor #9,Factor #10
World of Warcraft,-0.959256,0.551591,0.383498,-0.145334,-0.594459,1.941971,0.743882,0.962612,0.663666,-0.294085
The Elder Scrolls V: Skyrim,1.928381,0.084203,-0.278434,0.742805,0.024795,0.429161,0.722782,0.950737,0.713456,0.658172
Overwatch,0.281211,1.085757,0.283146,0.170403,0.191141,0.948119,-0.200128,-0.506644,0.617173,0.56372
TETRIS,-1.139836,0.002695,1.289299,0.825857,-0.233939,-0.327397,0.770481,-0.003112,0.041457,-0.256872
The Witcher 3: Wild Hunt,-0.212724,-0.063128,0.086844,0.271733,1.925287,-0.107191,0.360373,0.151076,0.107342,1.581252


These factors have no obvious meaning on their own: nothing tells us what a high or low value of 'Factor #1' says about a game or a player.

(There is a trick to extract a similarity matrix from these latent factors, but the MVN model makes it more formal and explicit.)
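
One common way to get a game-to-game similarity matrix out of latent factors is the cosine similarity between factor rows. The sketch below shows that idea on toy numbers; it is not necessarily the trick referred to above, and `cosine_similarity_matrix` and `V_toy` are hypothetical names introduced only for illustration.

```python
import numpy as np

def cosine_similarity_matrix(item_factors):
    """Return the games x games cosine similarity of latent factor rows."""
    norms = np.linalg.norm(item_factors, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid division by zero.
    unit = item_factors / np.where(norms > 0.0, norms, 1.0)
    return unit @ unit.T

# Toy factors (hypothetical values): games 0 and 1 point in similar directions.
V_toy = np.array([[1.0, 0.0],
                  [2.0, 0.1],
                  [0.0, 3.0]])
S = cosine_similarity_matrix(V_toy)
```

In the notebook, the same function could in principle be applied to the item factor matrix before it is wrapped in a DataFrame.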

### MVN

We use the MVN variant without bias (`MVN_nobias`) because the correlation matrix is simpler to interpret than the covariance matrix:

In [35]:
model = MVN_nobias(games_train)

The correlation matrix provides a similarity score between every pair of games. For example, we infer from the data that 'The Elder Scrolls V: Skyrim' is relatively close to 'The Witcher 3: Wild Hunt' (correlation 0.16) but dissimilar to 'TETRIS' (correlation -0.05):

In [36]:
coefs = pd.DataFrame(model.cov, index=data.games_description[gameids], columns=data.games_description[gameids])
coefs.loc[example_games, example_games]

game,World of Warcraft,The Elder Scrolls V: Skyrim,Overwatch,TETRIS,The Witcher 3: Wild Hunt
World of Warcraft,1.0,0.017016,0.065028,-0.040957,0.024343
The Elder Scrolls V: Skyrim,0.017016,1.0,0.028462,-0.046123,0.159341
Overwatch,0.065028,0.028462,1.0,-0.036149,0.03419
TETRIS,-0.040957,-0.046123,-0.036149,1.0,-0.042
The Witcher 3: Wild Hunt,0.024343,0.159341,0.03419,-0.042,1.0
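
The unit diagonal in the table is what distinguishes a correlation matrix from a covariance matrix. As a minimal sketch of that relationship, assuming a plain sample covariance of like-columns (the notebook's `MVN_nobias` may estimate its matrix differently, for example with regularization), a covariance matrix can be rescaled into a correlation matrix as follows. `cov_to_corr` and `likes_toy` are hypothetical names.

```python
import numpy as np

def cov_to_corr(cov):
    """Rescale a covariance matrix so that its diagonal becomes 1."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

# Toy binary like matrix (hypothetical data): rows are players, columns are games.
likes_toy = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [0, 0, 1],
                      [1, 0, 1]], dtype=float)
cov = np.cov(likes_toy, rowvar=False)
corr = cov_to_corr(cov)
```

Games that tend to be liked by the same players get a positive entry, and games liked by different players get a negative one, regardless of how popular each game is.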


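Using this matrix, we can for example fetch the games most related to every example game. The code takes the 10 highest-scoring entries and skips the first, which is the game itself, so 9 related games are printed. The results are plausible: most related games share a franchise or genre with the example game: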

In [37]:
for game in example_games:
    print(game)
    for game_related in coefs[game].sort_values(ascending=False).head(10).index.values[1:]:
        print("\t", game_related)

World of Warcraft
	 Diablo II
	 Warcraft III: Reign of Chaos
	 Heroes of the Storm
	 Diablo III
	 Hearthstone
	 Star Wars: The Old Republic
	 Overwatch
	 Dark Age of Camelot
	 Warcraft: Orcs & Humans
The Elder Scrolls V: Skyrim
	 The Elder Scrolls IV: Oblivion
	 The Witcher 3: Wild Hunt
	 The Elder Scrolls III: Morrowind
	 Fallout 4
	 Fallout: New Vegas
	 Dragon Age: Origins
	 Fallout 3
	 The Elder Scrolls Online
	 Dragon Age: Inquisition
Overwatch
	 League of Legends
	 World of Warcraft
	 Team Fortress 2
	 Destiny 2
	 Heroes of the Storm
	 Sudden Attack
	 The Impossible Game
	 Seasons after Fall
	 Cuphead
TETRIS
	 Space Invaders
	 Super Mario Bros.
	 Snake
	 Candy Crush Saga
	 1942
	 The New Tetris
	 Bubble Bobble
	 Raid over Moscow
	 Paper Toss
The Witcher 3: Wild Hunt
	 The Witcher 2: Assassins Of Kings
	 The Elder Scrolls V: Skyrim
	 The Witcher: Enhanced Edition Director's Cut
	 The Witcher
	 Mass Effect 2
	 Mass Effect 3
	 Dragon Age: Origins
	 Mass Effect
	 Gwent: The Witcher Ca
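
The lookup above relies on the game itself always ranking first in its own column. A slightly more defensive variant, sketched here on a toy matrix with hypothetical names (`most_related`, `sim`), drops the game explicitly before taking the top entries:

```python
import pandas as pd

def most_related(similarity, name, n=9):
    """Return the n labels most similar to `name`, excluding `name` itself."""
    return similarity[name].drop(name).nlargest(n).index.tolist()

# Toy similarity matrix (hypothetical values).
games = ["A", "B", "C"]
sim = pd.DataFrame([[1.0, 0.2, 0.5],
                    [0.2, 1.0, -0.1],
                    [0.5, -0.1, 1.0]], index=games, columns=games)
related = most_related(sim, "A", n=2)
```

This assumes that game names are unique in the index; with duplicated names, `drop` would remove every row carrying that name.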

### Questions X Tags

In [38]:
model = QuestionsXTags()
model.fit(games_train, questions, tags, reg=1)

It is a bit of work to extract the interaction matrix from a vector of learned parameters:

In [39]:
W = model.learner.predictor.W
d1 = questions.shape[1]
d2 = tags.shape[1]
coefs = pd.DataFrame([(questionids[i], tagids[j], W[j*d1 + i]) for i in range(d1) for j in range(d2)], 
                     columns=['questionid', 'tagid', 'coef'])
coefs['question'] = coefs['questionid'].map(data.items_description)
coefs['tag'] = coefs['tagid'].map(data.tags_description)
coefs = coefs.pivot(columns='tag', index='question', values='coef')
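
The indexing `W[j*d1 + i]` means that the parameter vector stores the interaction matrix column by column (one block of `d1` question weights per tag). A minimal sketch on toy sizes shows that this is the same as a column-major reshape. The last lines additionally assume that the model's interaction features follow a Kronecker layout, which is consistent with the indexing but is not confirmed by the notebook; all values here are made up.

```python
import numpy as np

d1, d2 = 3, 2                        # toy sizes: 3 questions, 2 tags
W = np.arange(d1 * d2, dtype=float)  # stand-in for the learned parameter vector

# Explicit loop, mirroring the indexing W[j*d1 + i] used in the cell above.
M_loop = np.array([[W[j * d1 + i] for j in range(d2)] for i in range(d1)])

# Same matrix via a column-major (Fortran-order) reshape.
M = W.reshape((d1, d2), order="F")

# Under the assumed Kronecker feature layout, the bilinear form q^T M t
# equals the dot product of W with kron(t, q).
q = np.array([1.0, 0.0, 2.0])        # hypothetical question answers
t = np.array([0.5, 1.0])             # hypothetical tag indicators
score_bilinear = q @ M @ t
score_flat = np.kron(t, q) @ W
```

The reshape avoids building the long list of tuples and the pivot, although the pivot has the advantage of attaching question and tag names directly.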

Each game preference is associated with each game tag. For example, 'Dating Sim' interacts strongly with liking 'Breeding, training and taking care of pets', and 'Lara Croft' interacts strongly with liking 'Jumping on platforms and bouncing on walls' in the questionnaire:

In [42]:
df = coefs.loc[example_questions,example_tags]
(df*10000).round(2)

question,Puzzle,Real-time Strategy,Dating Sim,Racing,Lara Croft
Exploring the gameworld and its secrets,-0.04,-0.09,0.68,-0.12,0.34
Commanding units or troops,0.16,-0.07,0.26,0.02,-1.02
"Breeding, training and taking care of pets",-0.12,-0.12,0.88,-0.12,0.44
Sniping to eliminate,-0.39,0.08,-1.25,0.11,0.53
Challenges of fast reaction,-0.2,-0.22,-0.98,0.07,1.22
Developing skills and abilities,0.04,-0.01,0.24,-0.28,1.99
Jumping on platforms and bouncing on walls,-0.13,-0.22,0.59,0.12,3.6


For example, we can take the example tags and see which gaming preferences interact most strongly with them. It is remarkable how well this can be learned from the game likes:

In [43]:
for tag in example_tags:
    print(tag)
    for item_related in coefs[tag].sort_values(ascending=False).head(5).index.values:
        print("\t", item_related)
    

Puzzle
	 Challenges of crosswords and other word puzzles
	 Challenges of logical problem-solving
	 Matching tiles or shapes together
	 Challenges of creative problem-solving
	 Empathizing and taking on roles
Real-time Strategy
	 Challenges of strategy and strategic thinking
	 Challenges of tactics
	 Waging war and conquering
	 Doing tricks in extreme sports
	 Challenges of acting in a constant hurry
Dating Sim
	 Gardening and taking care of farms
	 Dancing to the music
	 Decorating rooms and houses
	 Breeding, training and taking care of pets
	 Dressing up, applying make up and choosing looks
Racing
	 Fighting with martial arts
	 Challenges of logical problem-solving
	 Piloting and steering vehicles
	 Producing vehicles, units or weaponry
	 Defending your territory, city or base (Tower defense)
Lara Croft
	 Searching and collecting rare treasures
	 Jumping on platforms and bouncing on walls
	 Challenges of mazes and labyrinths
	 Investigating the story and its mysteries
	 Sneaking and 

### Questions

In [47]:
model = Questions()
model.fit(games_train, items, reg=1)

Each game has its own vector of what game preferences it responds to. For example, 'World of Warcraft' responds to liking 'Exploring the gameworld and its secrets', whereas 'TETRIS' responds to 'Challenges of fast reaction':

In [48]:
coefs = pd.DataFrame(model.item_features.T, index=data.items_description[itemids], columns=data.games_description[gameids])
df = coefs.loc[example_questions,example_games]  # example_questions is the list defined in the examples cell above
(df*100).round(2)

question,World of Warcraft,The Elder Scrolls V: Skyrim,Overwatch,TETRIS,The Witcher 3: Wild Hunt
Exploring the gameworld and its secrets,0.7,0.48,0.39,-0.28,0.35
Commanding units or troops,-0.04,0.03,0.08,0.39,0.11
"Breeding, training and taking care of pets",0.23,-0.06,0.2,-0.3,0.02
Sniping to eliminate,-0.85,-1.11,0.68,-0.78,-0.87
Challenges of fast reaction,-0.54,-0.6,0.36,1.16,-0.62
Developing skills and abilities,0.62,0.65,0.31,-0.11,0.4
Jumping on platforms and bouncing on walls,-0.18,0.31,0.11,0.15,0.18
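
To make the reading of such a coefficient table concrete, here is a minimal sketch that assumes a purely linear scoring rule: a player's answers are multiplied by each game's coefficient column to give one score per game. The rule, the numbers and the names `coefs_toy` and `answers` are assumptions for illustration; the actual `Questions` model may add further terms.

```python
import numpy as np

# Hypothetical coefficients: rows are questions, columns are games.
coefs_toy = np.array([[ 0.7, -0.3],    # "exploring"
                      [-0.5,  1.2]])   # "fast reaction"
answers = np.array([2.0, 0.0])         # a player who only likes exploring

# Assumed linear scoring rule: one score per game.
scores = answers @ coefs_toy
best_game = int(np.argmax(scores))
```

Under this reading, a positive coefficient raises the score of a game for players who like the corresponding activity, and a negative coefficient lowers it.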


We can take the example games and visualize which game preferences they respond to most strongly:

In [49]:
for game in example_games:
    print(coefs[game].name)
    for item_related in coefs[game].sort_values(ascending=False).head(5).index.values:
        print("\t", item_related)


World of Warcraft
	 Exploring the gameworld and its secrets
	 Developing skills and abilities
	 Investigating the story and its mysteries
	 Searching and collecting rare treasures
	 Killing and Murdering
The Elder Scrolls V: Skyrim
	 Developing skills and abilities
	 Befriending with in-game characters
	 Killing and Murdering
	 Investigating the story and its mysteries
	 Searching and collecting rare treasures
Overwatch
	 Challenges of acting in a constant hurry
	 Sniping to eliminate
	 Exploring the gameworld and its secrets
	 Searching and collecting rare treasures
	 Defending your territory, city or base (Tower defense)
TETRIS
	 Challenges of memorizing
	 Challenges of crosswords and other word puzzles
	 Challenges of fast reaction
	 Matching tiles or shapes together
	 Challenges of logical problem-solving
The Witcher 3: Wild Hunt
	 Making meaningful choices and having decisive dialogues
	 Searching and collecting rare treasures
	 Moving to the beat and staying in the rhythm
	 Helpi

### Tags

In [50]:
model = Tags()
model.fit(games_train, tags, reg=1000)

Each player has a vector of how they respond to game tags. For example, the first player seems to like games with the tag 'Dating Sim', and the second player seems to like games with the tag 'Racing':

In [52]:
coefs = pd.DataFrame(model.user_features, index=userids, columns=data.tags_description[tagids])
df = coefs.loc[example_ids,example_tags]
(df*10000).round(2)

userid,Puzzle,Real-time Strategy,Dating Sim,Racing,Lara Croft
93519,-2.37,-0.04,8.37,-0.79,0.07
93520,-1.22,0.06,-0.31,11.74,0.01
93521,0.88,-0.11,-0.28,5.66,-0.18
93522,0.58,0.35,7.33,-1.75,-0.16
93523,-2.52,-3.71,7.96,-2.63,0.09
93524,-0.07,7.59,-0.26,-1.1,0.01
93525,-0.74,-0.82,0.09,-3.34,-0.06


Instead of userids, it is perhaps easiest to describe players by the games they liked. For example, we print the games liked by each example player, followed by the five tags they respond to most:

In [53]:
for row in validation_rows:
    print(row, data.games_description[gameids[games_train.getrow(row).A.flatten().astype(bool)]].values)
    for tag_related in coefs.iloc[row,:].sort_values(ascending=False).head(5).index.values:
        print("\t", tag_related)


15887 ['Child of Light' 'Dungeon Master' 'Shin Megami Tensei: Persona 3']
	 Turn-based Strategy
	 First-person shooter
	 Friendship
	 Hand-drawn
	 Asian
15888 ['Project CARS' 'Gran Turismo 5' 'Forza Horizon 2']
	 Motorsports
	 Driving
	 Driving Simulator
	 Customization
	 Linear
15889 ['Shovel Knight' 'Super Mario 3D World'
 'The Legend of Zelda: Ocarina of Time']
	 Character development
	 4 Player Local
	 Fishing
	 Real-Time
	 Medieval
15890 ['Heavy Rain' 'Steins;Gate' 'Life Is Strange']
	 Cinematic
	 Quick-Time Events
	 Story Rich
	 Drama
	 Time Attack
15891 ['Cities: Skylines' 'Overcooked' 'The Sims 2']
	 Family Friendly
	 Funny
	 Vehicles
	 Great Soundtrack
	 Sandbox
15892 ['Europa Universalis IV' 'Rome: Total War' "Sid Meier's Civilization IV"]
	 Trading
	 Grand Strategy
	 Replay Value
	 Naval
	 4X
15893 ['NHL 17' 'FIFA 18' 'WWE 2K19']
	 Non-fiction
	 Sports
	 Ice Hockey
	 Co-op
	 Multiplayer


## Widgets

The following widgets can be used to obtain predictions in real time from the four different models. The inputs to each of the models are:
1. MVN: recommend by game likes (based on likes)
2. Questions: recommend by question answers (based on likes)
3. Tags: recommend by game likes (based on likes & game content)
4. Questions X Tags: recommend by question answers (based on likes & game content)

In [44]:
mu = likes.sum(axis=0).A.flatten() / likes.shape[0]
std = np.where(mu > 0.0, np.sqrt(mu*(1.0-mu)), 1.0)
games2 = sp.csr_matrix((likes - mu) / std)
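
The cell above standardizes each game column of the binary like matrix, using the fact that a 0/1 column with mean `mu` has standard deviation `sqrt(mu*(1-mu))`. A minimal sketch on toy data (`likes_toy` and `Z` are hypothetical names) checks that the result has zero mean and unit variance per game:

```python
import numpy as np
import scipy.sparse as sp

# Toy like matrix: rows are players, columns are games; the last game has no likes.
likes_toy = sp.csr_matrix(np.array([[1, 0, 0],
                                    [1, 1, 0],
                                    [0, 1, 0],
                                    [1, 0, 0]], dtype=float))

mu = np.asarray(likes_toy.mean(axis=0)).ravel()
# Games without any likes keep std = 1 so that the division is well defined.
std = np.where(mu > 0.0, np.sqrt(mu * (1.0 - mu)), 1.0)
Z = (likes_toy.toarray() - mu) / std
```

Two caveats follow from the arithmetic: a game liked by every player has `mu = 1` and therefore a zero standard deviation that the `mu > 0.0` guard does not catch, and subtracting the mean makes the matrix dense, so wrapping the result in a sparse matrix no longer saves memory.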

### MVN

In [45]:
in_data = data.games_description.index.isin(gameids)
@interact
def search(Search='mass effect 2'):
    return(data.games_description[data.games_description.str.contains(Search, case=False) & in_data])


[Interactive widget: text box 'Search' (default 'mass effect 2'); output truncated]

In [46]:
model = MVN_nobias(sp.csr_matrix(likes, dtype=float))

In [47]:
display_online(model.predict_online, gameids, data.games_covers, data.games_description)

[Interactive widget: text boxes 'fav_game1' (default '442'), 'fav_game2' and further inputs; output truncated]

### Questions

In [48]:
model = Questions()
model.fit(games2, questions, reg=1)

In [49]:
display_online_questions(model.predict_online, gameids, data.games_covers, data.games_description, questionids, data.items_description)

[Interactive widget: slider 'Engaging in battle' (default 0) and further inputs; output truncated]

### Tags

In [50]:
model = Tags()
model.fit(games2, tagsx, reg=1000)

In [51]:
display_online(model.predict_online, gameids, data.games_covers, data.games_description)

[Interactive widget: text boxes 'fav_game1' (default '442'), 'fav_game2' and further inputs; output truncated]

### Questions x Tags

In [52]:
model = QuestionsXTags()
model.fit(games2, questions, tagsx, reg=1)

In [53]:
display_online_questions(model.predict_online, gameids, data.games_covers, data.games_description, questionids, data.items_description)

[Interactive widget: slider 'Engaging in battle' (default 0) and further inputs; output truncated]