# Video Game Recommender - Magic Mirror

The goal is to build a video game recommender that takes a user id and recommends a few games.  
This recommender is based on users with similar profiles and the games they prefer.  
In human terms, think of it as groups of gamers who enjoy the same genres and who advise each other on which new games to try.  

The dataset contains video game/user data from Steam. Each row pairs a user id with a game name and records either that the game was purchased or the number of hours it was played.

## I. Imports and Data

In [203]:
import pandas as pd
import numpy as np

from scipy import sparse
from lightfm import LightFM

#### Note: the "hours_played" column always takes the value 1 when "action" is "purchase", and holds the number of hours played when "action" is "play".

In [2]:
df = pd.read_csv(
    "data/steam-200k.csv",
    header=None,
    names=["userid", "game", "action", "hours_played", "useless"]
)[["userid", "game", "action", "hours_played"]]

df.head()

,userid,game,action,hours_played
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0
1,151603712,The Elder Scrolls V Skyrim,play,273.0
2,151603712,Fallout 4,purchase,1.0
3,151603712,Fallout 4,play,87.0
4,151603712,Spore,purchase,1.0


## II. Preprocessing

### a) Missing Values

In [3]:
df.isnull().sum()

userid          0
game            0
action          0
hours_played    0
dtype: int64

### b) Disregarding the "purchase" action

We only keep the rows for which "action" == "play", because we want to score games according to the number of hours played.

In [4]:
# .copy() makes df_play an independent DataFrame, so adding a column
# below does not raise a SettingWithCopyWarning
df_play = df[df["action"] == "play"].copy()

# Fixing the index after the slicing
df_play["new_index"] = range(len(df_play))
df_play = df_play.set_index("new_index").drop(columns="action")

df_play.head(5)


new_index,userid,game,hours_played
0,151603712,The Elder Scrolls V Skyrim,273.0
1,151603712,Fallout 4,87.0
2,151603712,Spore,14.9
3,151603712,Fallout New Vegas,12.1
4,151603712,Left 4 Dead 2,8.9


### c) Appreciation

This dataset has no explicit feature describing how much a user liked a game.  
Therefore, we consider that a user LOVED a game if they played it for at least 50 hours.  
God knows there are video games we hated but still played for more than 50 hours, which would justify a higher threshold. However, a higher threshold would unintentionally discard small video games (which take only a few hours to complete).  


We add other score classes to make it more realistic. The logic is as follows:
   
- 3 if played for 50 hours or more (even if it's a small game, if it's really good then it should be played multiple times)
- 2 if played for at least 20 but less than 50 hours
- 1 if played for more than 2 but less than 20 hours
- 0 if played for 2 hours or less (it's shit)

In [5]:
hours = df_play["hours_played"]

# np.select keeps the first matching condition for each row, so the
# thresholds are listed from highest to lowest; anything else scores 0
df_play["score"] = np.select(
    [hours >= 50, hours >= 20, hours > 2],
    [3, 2, 1],
    default=0,
)

df_play.head(5)

new_index,userid,game,hours_played,score
0,151603712,The Elder Scrolls V Skyrim,273.0,3
1,151603712,Fallout 4,87.0,3
2,151603712,Spore,14.9,1
3,151603712,Fallout New Vegas,12.1,1
4,151603712,Left 4 Dead 2,8.9,1
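
As a quick sanity check of the thresholds, the same rule can be written as a plain Python function and evaluated on both sides of each boundary. This is only an illustrative sketch: `score_hours` is a hypothetical helper that is not part of the notebook, and it mirrors the comparisons used in the cell above (so exactly 2 hours still scores 0).

```python
def score_hours(hours):
    """Map hours played to a 0-3 appreciation score (hypothetical helper)."""
    if hours >= 50:
        return 3
    if hours >= 20:
        return 2
    if hours > 2:
        return 1
    return 0


# Values on each side of the three thresholds
for h in [0.5, 2.0, 2.1, 19.9, 20.0, 49.9, 50.0, 273.0]:
    print(h, "->", score_hours(h))
```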


## III. The Architecture

### a) Interaction Matrix and User Dictionary

#### Interaction Matrix

This matrix will have users as rows and games as columns. It is required for our Matrix Factorization (MF) algorithm.

In [6]:
def create_interaction_matrix(df, user_col, game_col, rating_col):
    '''
    Function to create an interaction matrix dataframe from transactional type interactions
    
    Input:
        - df = Pandas DataFrame containing user-game interactions
        - user_col = name of the user id column
        - game_col = name of the game column
        - rating_col = name of the score column

    Output:
        - Pandas dataframe with user-game interactions ready to be fed in a recommendation algorithm
    '''
    
    # One row per user, one column per game; pairs with no play record become 0
    interactions = df.groupby([user_col, game_col])[rating_col].sum().unstack().fillna(0)
    
    return interactions

In [7]:
interaction_matrix = create_interaction_matrix(df_play, "userid", "game", "score")
interaction_matrix.head(5)

userid,007 Legends,0RBITALIS,1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby),10 Second Ninja,"10,000,000",100% Orange Juice,1000 Amps,12 Labours of Hercules,12 Labours of Hercules II The Cretan Bull,12 Labours of Hercules III Girl Power,...,rFactor,rFactor 2,realMyst,realMyst Masterpiece Edition,resident evil 4 / biohazard 4,rymdkapsel,sZone-Online,the static speaks my name,theHunter,theHunter Primal
5250,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76767,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
86540,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
144736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
181212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
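
The real matrix is too wide to read, so here is what the pivot does on a tiny example. The data below is made up for illustration (`toy` and `toy_matrix` are hypothetical names); it applies the same groupby/unstack pivot used in `create_interaction_matrix`.

```python
import pandas as pd

# Toy data (made up for illustration): two users, three games
toy = pd.DataFrame({
    "userid": [1, 1, 2, 2],
    "game": ["Spore", "Fallout 4", "Spore", "Portal 2"],
    "score": [1, 3, 2, 3],
})

# Users become rows, games become columns, and a user/game pair
# with no recorded play time is filled with 0
toy_matrix = toy.groupby(["userid", "game"])["score"].sum().unstack().fillna(0)
print(toy_matrix)
```

User 1 never played Portal 2 and user 2 never played Fallout 4, so those two cells are 0 while every other cell holds the score.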


#### User Dictionary

A dictionary that maps each user_id to its row position in the interaction matrix: the first user_id gets index 0, the second gets index 1, and so on.

In [8]:
def create_user_dict(interactions):
    '''
    Function to create a user dictionary based on their index in the interaction dataset
    
    Input:
        interactions - interaction matrix
    
    Output:
        user_dict - Dictionary type output containing user_id as key and interaction_index as value
    '''
    
    # enumerate pairs each user_id with its row position in the interaction matrix
    user_dict = {user_id: index for index, user_id in enumerate(interactions.index)}
        
    return user_dict

In [9]:
user_dict = create_user_dict(interaction_matrix)
list(user_dict.items())[:5]

[(5250, 0), (76767, 1), (86540, 2), (144736, 3), (181212, 4)]

### b) Matrix Factorization Algorithm

In [12]:
def runMF(interactions, n_components=30, loss="warp", k=15, epoch=30):
    '''
    Function to run the matrix-factorization algorithm
    
    Input:
        - interactions = interaction matrix
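        - n_components = dimensionality of the game and user embeddings
        - loss = "logistic", "bpr", "warp" or "warp-kos"
        - k = for k-OS training ("warp-kos" loss), the k-th positive example is selected from the sampled positives
        - epoch = number of epochs 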
    
    Output:
        Model - Trained model
    '''
    
    x = sparse.csr_matrix(interactions.values)
    model = LightFM(
        no_components=n_components, loss=loss, k=k
    )
    model.fit(x, epochs=epoch)
    
    return model

In [176]:
MFmodel = runMF(interaction_matrix)
MFmodel

<lightfm.lightfm.LightFM at 0x6702934d00>
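
For intuition about what the trained model holds: a matrix factorization model represents every user and every game as a vector of n_components numbers, and scores a (user, game) pair mainly through the dot product of the two vectors (LightFM also adds a per-user and a per-game bias term). The sketch below uses random NumPy arrays as stand-ins for learned parameters. The array names and sizes are assumptions for illustration, not values taken from MFmodel.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_games, n_components = 4, 6, 3

# Stand-ins for learned parameters (random here; learned during fit in practice)
user_emb = rng.normal(size=(n_users, n_components))
game_emb = rng.normal(size=(n_games, n_components))
user_bias = rng.normal(size=n_users)
game_bias = rng.normal(size=n_games)

# Score of every game for one user: dot product plus the two bias terms
user_x = 2
scores = game_emb @ user_emb[user_x] + user_bias[user_x] + game_bias

# Game indices ranked from highest to lowest score
ranking = np.argsort(-scores)
print(ranking)
```

Ranking games for a user then comes down to sorting this score vector, which is what the recommender in the next section does with the output of model.predict.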

## IV. Recommender

In [226]:
def sample_recommendation_user(
    model, interactions, user_id, user_dict, nrec_games=5
):
    '''
    Function to print game recommendations when given a user id
    
    Input:
        - model = Trained matrix factorization model
        - interactions = interaction matrix
        - user_id = user ID for which we need to generate the recommendations
        - user_dict = predefined user dictionary
        - nrec_games = Number of games to recommend
        
    Output: 
        - Prints the user's favorite games (already played, ranked by predicted score)
        - Prints the recommended games (not played yet, ranked by predicted score)
    '''
    
    n_games = interactions.shape[1]
    user_x = user_dict[user_id]
    
    # Predicted score of every game for this user, indexed by game name
    scores = pd.Series(
        model.predict(user_x, np.arange(n_games)),
        index=interactions.columns,
    ).sort_values(ascending=False)
    
    # Games the user has already played with a score above 0
    user_row = interactions.loc[user_id]
    known_games = user_row[user_row > 0].index
    is_known = scores.index.isin(known_games)
    
    # head() avoids an IndexError when fewer than nrec_games games are available
    print("Favorite Games:")
    for game, score in scores[is_known].head(nrec_games).items():
        print(f"{game} - {score:.3f}")
    
    print("\nRecommended Games:")
    for game, score in scores[~is_known].head(nrec_games).items():
        print(f"{game} - {score:.3f}")
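
The core of the function (rank every game by predicted score, then separate the games already played from the rest) can be seen in isolation on a small, made-up score table. The names `toy_scores` and `toy_known` and all the numbers are hypothetical; this is a sketch of the filtering logic, not output from the trained model.

```python
import pandas as pd

# Made-up predicted scores for five games (illustration only)
toy_scores = pd.Series(
    {"Terraria": 3.0, "Spore": 1.8, "Portal 2": 2.1, "Fallout 4": 2.6, "Dota 2": 0.4}
)
# Games the hypothetical user has already played
toy_known = ["Spore", "Fallout 4"]

ranked = toy_scores.sort_values(ascending=False)
is_known = ranked.index.isin(toy_known)

favorites = ranked[is_known].head(2)      # already played, best first
recommended = ranked[~is_known].head(2)   # not played yet, best first

print(list(favorites.index))
print(list(recommended.index))
```

Returning these two Series instead of printing them would make the results easier to reuse elsewhere, for example to compare recommendations across users.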

#### Testing on a Sample User

In [228]:
# The sample user is simply the first user that appears in df_play
test_id = df_play["userid"].iloc[0]

print("- Magic mirror on the wall, who's the fairest one of all?")
print("\n- Leave me alone and go play these games, nerd:")

print("")
sample_recommendation_user(
    MFmodel,
    interaction_matrix,
    test_id,
    user_dict,
    nrec_games=5,
)

- Magic mirror on the wall, who's the fairest one of all?

- Leave me alone and go play these games, nerd:

Favorite Games:
The Elder Scrolls V Skyrim - 2.847
Team Fortress 2 - 2.794
Left 4 Dead 2 - 2.210
Fallout New Vegas - 2.012
Spore - 1.781

Recommended Games:
Terraria - 3.035
Borderlands 2 - 2.403
Garry's Mod - 2.133
Portal 2 - 2.127
Sid Meier's Civilization V - 2.051
