#### User-Based Recommender
Now that we have the normalized user-by-game matrix prepared, we can work on the recommender.

In [1]:
# Import the libraries used throughout this notebook.
import pandas as pd
import numpy as np

In [2]:
# Reading in the user by game matrix.
ubg_logged = pd.read_csv('ubg_logged.csv', index_col=0)
ubg_logged.head(10)

Unnamed: 0,Portal,Tom Clancy's Ghost Recon: Advanced Warfighter,Tom Clancy's Ghost Recon,Crysis,Crysis Warhead,Left 4 Dead 2,Torchlight,Devil May Cry 4,Batman: Arkham Asylum GOTY Edition,Battlefield: Bad Company™ 2,...,HomeWork Is Crazy / 作业疯了,Chinatris,Trine 4: The Nightmare Prince,A Way Out,Battlefield 1 ™,Kingdom Rush Vengeance,雀魂麻将(MahjongSoul),Draw & Guess,Rubber Bandits: Summer Prologue,Stacklands
76561198010430483,5.905362,3.091042,3.583519,6.719013,0.0,8.852379,6.71174,6.50279,6.659294,8.329417,...,,,,,,,,,,
76561198039495811,,,,,,,,,,,...,,,,,,,,,,
76561198040564894,,,,,,7.663408,,,,,...,,,,,,,,,,
76561197994644797,6.613384,,,,,10.249486,,,,,...,,,,,,,,,,
76561198064970505,,,,,,9.295508,,,,,...,,,,,,,,,,
76561198004670799,,,,,,7.781556,,,,,...,,,,,,,,,,
76561197971034129,,,,,,9.446834,,,,,...,,,,,,,,,,
76561197962050254,,,,,,7.070724,,,,,...,,,,,,,,,,
76561198001262177,6.327937,,,,,9.731453,6.602588,,,,...,,,,,,,,,,
76561198084453258,,,,,,,,,,,...,,,,,,,,,,


### Collaborative Filtering
### 1. User-Based Collaborative Filtering

Here we create the user-to-user similarity matrix, which holds a similarity score for every pair of users. <br>

The technique we use is cosine similarity. It requires that the user-by-game matrix has no missing values, so we impute (replace) the NaN values with 0. This is one possible option for now. <br>

Note that there are implications to this: <br>

The 0 values imputed this way should be interpreted with caution. <br>
In the original normalized user-by-game matrix, each value measures the user's preference for a game, using playtime as a proxy. A value of 0 there marks the lowest preference on the scale. <br>
The imputed 0 values, however, indicate that the user does not own or has not played the game at all. They should not be interpreted as a measure of user preference. <br>
Ultimately, we use the predicted preference scores only to rank the games to recommend, so the relative order of the predictions matters more than their magnitude. <br>

In [3]:
# Zero imputation
ubg_logged0 = ubg_logged.fillna(0)
ubg_logged0

Unnamed: 0,Portal,Tom Clancy's Ghost Recon: Advanced Warfighter,Tom Clancy's Ghost Recon,Crysis,Crysis Warhead,Left 4 Dead 2,Torchlight,Devil May Cry 4,Batman: Arkham Asylum GOTY Edition,Battlefield: Bad Company™ 2,...,HomeWork Is Crazy / 作业疯了,Chinatris,Trine 4: The Nightmare Prince,A Way Out,Battlefield 1 ™,Kingdom Rush Vengeance,雀魂麻将(MahjongSoul),Draw & Guess,Rubber Bandits: Summer Prologue,Stacklands
76561198010430483,5.905362,3.091042,3.583519,6.719013,0.0,8.852379,6.71174,6.50279,6.659294,8.329417,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561198039495811,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561198040564894,0.0,0.0,0.0,0.0,0.0,7.663408,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561197994644797,6.613384,0.0,0.0,0.0,0.0,10.249486,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561198064970505,0.0,0.0,0.0,0.0,0.0,9.295508,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561198004670799,0.0,0.0,0.0,0.0,0.0,7.781556,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561197971034129,0.0,0.0,0.0,0.0,0.0,9.446834,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561197962050254,0.0,0.0,0.0,0.0,0.0,7.070724,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561198001262177,6.327937,0.0,0.0,0.0,0.0,9.731453,6.602588,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76561198084453258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Because most of its entries are now 0, this user-by-game matrix is what is known as a sparse matrix. <br>

Next, we use cosine similarity to build the user-to-user similarity matrix. <br>
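
To make the similarity measure concrete, here is a small sketch that computes cosine similarity by hand with NumPy on a made-up three-user matrix. The array `toy` and the helper `cosine_sim_matrix` are illustrative and not part of this notebook; for rows with a nonzero norm the result should agree with `sklearn.metrics.pairwise.cosine_similarity`.

```python
import numpy as np

def cosine_sim_matrix(X):
    """Pairwise cosine similarity between the rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Avoid dividing by zero for users with no playtime at all.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = X / safe_norms
    return unit @ unit.T

# Made-up logged playtimes: rows are users, columns are games.
toy = np.array([
    [6.0, 0.0, 8.0],   # user A
    [3.0, 0.0, 4.0],   # user B: same taste as A, lower intensity
    [0.0, 5.0, 0.0],   # user C: no games in common with A or B
])

sim = cosine_sim_matrix(toy)
print(np.round(sim, 3))
```

Users A and B score 1 because cosine similarity compares the direction of the two playtime vectors and ignores their magnitude, while user C scores 0 against both. Since all playtimes are non-negative, every score falls between 0 and 1.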

In [4]:
# Importing the Cosine Similarity function.
from sklearn.metrics.pairwise import cosine_similarity

In [5]:
# Applying Cosine Similarity
# Compute the similarity score of each user to every other user.
sim_scores = cosine_similarity(ubg_logged0)
# Organize the similarity matrix into a dataframe labelled by Steam ID on both axes.
utu_simscores = pd.DataFrame(sim_scores, columns=ubg_logged0.index, index=ubg_logged0.index)

In [12]:
# Taking a look at the user-to-user similarity matrix.
utu_simscores

Unnamed: 0,76561198010430483,76561198039495811,76561198040564894,76561197994644797,76561198064970505,76561198004670799,76561197971034129,76561197962050254,76561198001262177,76561198084453258,...,76561197985630263,76561197965062542,76561197966467800,76561198024567076,76561198004803656,76561197965527053,76561197970683033,76561197991050584,76561198056613809,76561198092753159
76561198010430483,1.0,0.129335,0.39483,0.419691,0.314102,0.175471,0.298043,0.270702,0.207043,0.226741,...,0.255618,0.21536,0.226861,0.151862,0.291572,0.151318,0.14892,0.181928,0.1161,0.176264
76561198039495811,0.129335,1.0,0.104384,0.091992,0.085091,0.103077,0.164383,0.113894,0.171003,0.071887,...,0.093141,0.167892,0.094737,0.061213,0.115465,0.045938,0.249611,0.166794,0.104369,0.038414
76561198040564894,0.39483,0.104384,1.0,0.522843,0.374284,0.199818,0.247985,0.261437,0.196809,0.338028,...,0.276655,0.200921,0.23669,0.194221,0.293327,0.134457,0.204872,0.234984,0.018762,0.219477
76561197994644797,0.419691,0.091992,0.522843,1.0,0.388453,0.293204,0.28807,0.264699,0.209789,0.315848,...,0.33427,0.205372,0.311061,0.226306,0.23908,0.149216,0.127252,0.206675,0.032463,0.191225
76561198064970505,0.314102,0.085091,0.374284,0.388453,1.0,0.20287,0.226563,0.217272,0.216549,0.297112,...,0.245583,0.172909,0.231083,0.111413,0.269422,0.083683,0.164806,0.239558,0.195271,0.149556
76561198004670799,0.175471,0.103077,0.199818,0.293204,0.20287,1.0,0.327892,0.19271,0.167592,0.156765,...,0.281502,0.185336,0.119796,0.055076,0.14067,0.09458,0.145802,0.254709,0.181678,0.146084
76561197971034129,0.298043,0.164383,0.247985,0.28807,0.226563,0.327892,1.0,0.254005,0.316655,0.176879,...,0.202812,0.316244,0.309654,0.127295,0.23724,0.079125,0.188219,0.19093,0.090448,0.15519
76561197962050254,0.270702,0.113894,0.261437,0.264699,0.217272,0.19271,0.254005,1.0,0.185302,0.268615,...,0.198265,0.206947,0.232378,0.186756,0.232976,0.127808,0.196197,0.227,0.076784,0.174136
76561198001262177,0.207043,0.171003,0.196809,0.209789,0.216549,0.167592,0.316655,0.185302,1.0,0.200943,...,0.240033,0.347551,0.288745,0.19395,0.305065,0.11968,0.160662,0.25013,0.068816,0.119904
76561198084453258,0.226741,0.071887,0.338028,0.315848,0.297112,0.156765,0.176879,0.268615,0.200943,1.0,...,0.171273,0.129362,0.216895,0.1433,0.213128,0.039777,0.062643,0.238172,0.043471,0.199549


Checkpoint 1 <br>
Save the user-to-user similarity score matrix. <br>

In [14]:
utu_simscores.to_csv("utu_simscores.csv")

In [15]:
utu_simscores = pd.read_csv('utu_simscores.csv', index_col=0)
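
Reloading from CSV changes the label types: with `index_col=0`, pandas parses the numeric Steam IDs in the index as integers, while the column headers remain strings. The later cells rely on this, selecting a column with a string and dropping a row with an integer. A small sketch with made-up IDs, using an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# Made-up similarity matrix with integer IDs on both axes.
ids = [111, 222]
sim = pd.DataFrame([[1.0, 0.5], [0.5, 1.0]], index=ids, columns=ids)

# Round-trip through CSV text, mirroring to_csv / read_csv(index_col=0).
buffer = io.StringIO()
sim.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=0)

index_kind = reloaded.index.dtype.kind      # "i" means integer labels
column_type = type(reloaded.columns[0])     # column labels come back as str
print(index_kind, column_type)

# Optional: cast the column labels back so both axes use the same type.
reloaded.columns = reloaded.columns.map(int)
```

Casting the columns is optional. The notebook instead keeps the mixed types and converts with `str(steamid)` where needed.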

#### User-Based Collaborative Filtering
With both the user-by-game matrix, and the user-to-user similarity score matrix, we can proceed to build the recommender function, and make game recommendations. <br>

As mentioned, we predict the selected user's preferences (playtimes) for games they have not played, based on other users' preferences (playtimes) and their similarity to the selected user. <br>

An additional condition is that the recommender must not recommend games that the selected user has played before. <br>


Several things to note: <br>
- The predicted figure is the normalized hours played, not the actual hours played. <br>
- The actual hours played are not important; hours played serve only as a proxy for preference. <br>

Recommendation Steps:

1. Get the selected user's similarity scores with all other users. 
2. Convert the selected user's similarity scores with all other users to weights by dividing each score by the *total* of all the scores.
3. Get the selected user's unplayed games, and all the other users' playtimes for each of those unplayed games.
4. Get the predicted playtimes (predicted preferences) of the selected user for all the unplayed games. Do so by multiplying each other user's similarity weight by their playtime for each unplayed game, then summing the weighted playtimes per game.
5. Sort the predicted playtimes and select the 5 games with the highest values. These are the top 5 games we will recommend to the selected user.
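
Written compactly, steps 2 and 4 amount to the formulas below. The symbols are introduced here for illustration only: $s_{uv}$ is the cosine similarity between the selected user $u$ and another user $v$, $w_{uv}$ is the resulting weight, and $r_{vg}$ is the zero-imputed normalized playtime of user $v$ for game $g$.

```latex
w_{uv} = \frac{s_{uv}}{\sum_{v' \neq u} s_{uv'}},
\qquad
\hat{r}_{ug} = \sum_{v \neq u} w_{uv}\, r_{vg}
```

Step 5 then ranks the unplayed games $g$ by $\hat{r}_{ug}$ and keeps the top 5.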

##### Step 1: Get the selected user's similarity scores with all other users.

In these steps, we will use Steam ID "76561198010430483" as the selected user.

In [60]:
# After the CSV reload, column labels are strings and index labels are integers.
# Select the user's column, then drop the user's own row: the self-similarity score is not needed.
seluser_sim = utu_simscores[["76561198010430483"]].drop(76561198010430483)
seluser_sim

Unnamed: 0,76561198010430483
76561198039495811,0.129335
76561198040564894,0.39483
76561197994644797,0.419691
76561198064970505,0.314102
76561198004670799,0.175471
76561197971034129,0.298043
76561197962050254,0.270702
76561198001262177,0.207043
76561198084453258,0.226741
76561198098188285,0.302885


##### Step 2: Calculate the weights of similarity scores
##### Why do we need to calculate and assign weightages to the similarity scores?

We want the recommendation to reflect not just the single most similar user but multiple similar users in the similarity matrix. This gives a more balanced recommendation. <br>
When we consider multiple users, we need a way to express how similar each one is to the selected user relative to all the others. We do this by assigning each user a weight proportional to their similarity score. <br>
More similar users get higher weights, and less similar users get lower weights. <br>
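
As a quick illustration of the normalization, the sketch below turns made-up similarity scores into weights that sum to 1. The helper name `similarity_weights` and its zero-total guard are additions for illustration; the notebook divides directly by the sum.

```python
import numpy as np

def similarity_weights(scores):
    """Scale similarity scores so they sum to 1 (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum()
    if total == 0:
        # Assumption: with no similar users at all, fall back to zero weights
        # rather than dividing by zero.
        return np.zeros_like(scores)
    return scores / total

# Made-up similarity scores of three other users to the selected user.
weights = similarity_weights([0.4, 0.3, 0.1])
print(weights)  # the most similar user gets the largest weight
```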

In [29]:
seluser_weights = seluser_sim.values/np.sum(seluser_sim.values)
seluser_weights

array([[0.0164352 ],
       [0.0501729 ],
       [0.05333212],
       [0.03991441],
       [0.02229797],
       [0.03787373],
       [0.03439939],
       [0.02630997],
       [0.02881305],
       [0.03848901],
       [0.01889802],
       [0.0473715 ],
       [0.03268899],
       [0.02906295],
       [0.02576507],
       [0.0332581 ],
       [0.02170539],
       [0.02841073],
       [0.0255064 ],
       [0.01810833],
       [0.02936714],
       [0.01799988],
       [0.02889876],
       [0.02410112],
       [0.0273696 ],
       [0.03248259],
       [0.02736681],
       [0.02882833],
       [0.01929785],
       [0.03705145],
       [0.01922874],
       [0.01892398],
       [0.0231184 ],
       [0.01475341],
       [0.0223987 ]])

In [32]:
# Step 3: Get all unplayed games for the selected user (76561198010430483), and all the other users' playtimes (preferences) for those unplayed games.

# Transpose the dataframe so that the games are the rows and the users are the columns.
gbu_logged0 = ubg_logged0.T
# Keep only the games where the selected user has a value of 0.
# The remaining columns give the other users' playtimes for those games.
unplayedgames = gbu_logged0[gbu_logged0[76561198010430483]==0]
# Drop the selected user's own column (all 0 for these games anyway).
unplayedgames = unplayedgames.drop(columns=[76561198010430483])

In [33]:
unplayedgames

Unnamed: 0,76561198039495811,76561198040564894,76561197994644797,76561198064970505,76561198004670799,76561197971034129,76561197962050254,76561198001262177,76561198084453258,76561198098188285,...,76561197985630263,76561197965062542,76561197966467800,76561198024567076,76561198004803656,76561197965527053,76561197970683033,76561197991050584,76561198056613809,76561198092753159
Crysis Warhead,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000
Railroad Tycoon 3,3.178054,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000
Titan Quest,4.442651,0.0,0.0,0.0,0.0,0.693147,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000
Titan Quest Anniversary Edition,7.647309,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000
Total War: MEDIEVAL II - Definitive Edition,4.110874,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,4.564348,0.0,0.0,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Kingdom Rush Vengeance,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,5.973810
雀魂麻将(MahjongSoul),0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,8.984192
Draw & Guess,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,5.846439
Rubber Bandits: Summer Prologue,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,3.931826
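
Note that filtering on `== 0` cannot tell an imputed 0 from an observed 0. In the first table, the selected user has a recorded value of 0.0 for Crysis Warhead, yet that game appears in the unplayed list above. If such games should be excluded, one option is to build the filter from the original matrix with its NaNs intact. A minimal sketch on made-up data (the frame `toy` and its IDs are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical matrix: NaN = never played, 0.0 = played but lowest preference.
toy = pd.DataFrame(
    {"Game A": [0.0, 4.2, np.nan],
     "Game B": [np.nan, 7.6, 3.1],
     "Game C": [5.5, np.nan, 2.0]},
    index=[101, 102, 103],
)
toy0 = toy.fillna(0)
selected = 101

# Filter on the zero-imputed matrix: Game A is treated as unplayed too.
zero_mask = (toy0.loc[selected] == 0).to_numpy()
unplayed_by_zero = toy0.columns[zero_mask].tolist()

# Filter on the original NaNs: only games with no record at all remain.
nan_mask = toy.loc[selected].isna().to_numpy()
unplayed_by_nan = toy.columns[nan_mask].tolist()

print(unplayed_by_zero)
print(unplayed_by_nan)
```

Whether an observed 0 should count as "played" depends on how the normalized matrix was built, so treat this as an option to consider rather than a required fix.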


In [34]:
# Step 4: Predict playtimes based on the aggregated (weighted) similarity across all users.
# Here, we are getting the predicted playtimes (predicted preference) of the selected user on all the unplayed games. We do so by multiplying the weighted similarity of each other user to the selected user, with the other users' playtime on those unplayed games. Then we sum up the weighted playtimes of all other users for each unplayed game, to get the predicted playtime of the selected user on each unplayed game.

unplayedgames_weighted = np.dot(unplayedgames.values, seluser_weights)
unplayedgames_predictedplaytimes = pd.DataFrame(unplayedgames_weighted, index=unplayedgames.index)
unplayedgames_predictedplaytimes

Unnamed: 0,0
Crysis Warhead,0.000000
Railroad Tycoon 3,0.052232
Titan Quest,0.099268
Titan Quest Anniversary Edition,0.331107
Total War: MEDIEVAL II - Definitive Edition,0.155330
...,...
Kingdom Rush Vengeance,0.133806
雀魂麻将(MahjongSoul),0.201234
Draw & Guess,0.130953
Rubber Bandits: Summer Prologue,0.088068


In [44]:
# Step 5: Sort by predicted playtime, and recommend the top 5 unplayed games that the selected user has a high predicted playtime for.
# Recall that playtime is a proxy for preference. So here we have the predicted preferences of the selected user on unplayed games, and we are simply recommending the top 5 games with the highest predicted preferences.

recommended_top5 = unplayedgames_predictedplaytimes.sort_values(by=0, ascending=False).head(5)
recommended_top5

Unnamed: 0,0
Borderlands 2,4.73941
Monster Hunter: World,3.764319
PUBG: BATTLEGROUNDS,3.70243
Counter-Strike: Source,3.014105
Killing Floor,2.984534


#### Putting the above steps into a function. This is the User-Based Recommender

In [49]:
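def ub_recommend(steamid, n=5):
    """Return the top-n unplayed games for `steamid`, ranked by predicted preference.

    Relies on the global dataframes `utu_simscores` and `ubg_logged0`.
    """
    # Similarity of the selected user to every other user (own row dropped).
    user_sim = utu_simscores[str(steamid)].drop(steamid)
    # Convert the similarity scores to weights that sum to 1.
    user_weights = user_sim.values/np.sum(user_sim.values)
    # Games as rows, users as columns.
    gbu_logged0 = ubg_logged0.T
    # Keep games where the selected user has a 0, then drop the user's own column.
    unplayedgames = gbu_logged0[gbu_logged0[steamid]==0]
    unplayedgames = unplayedgames.drop(columns=[steamid])
    # Weighted sum of the other users' playtimes for each unplayed game.
    unplayedgames_weighted = np.dot(unplayedgames.values, user_weights)
    unplayedgames_predictedplaytimes = pd.DataFrame(unplayedgames_weighted, index=unplayedgames.index, columns=['predicted preference'])
    # Rank by predicted preference and keep the top n.
    recommended_topn = unplayedgames_predictedplaytimes.sort_values(by="predicted preference", ascending=False).head(n)
    return recommended_topn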

In [52]:
ub_recommend(76561198010430483)

Unnamed: 0,predicted preference
Borderlands 2,4.73941
Monster Hunter: World,3.764319
PUBG: BATTLEGROUNDS,3.70243
Counter-Strike: Source,3.014105
Killing Floor,2.984534


#### Applying the recommender to all users within the dataset (user-by-game matrix).

In [53]:
# Get the Steam IDs of all users.
users = list(ubg_logged0.index)

In [57]:
# Run each Steam ID through the recommender and store the top 5 game titles in that user's column.
ub_top5recgames = pd.DataFrame(columns=users)
for user in users:
    recommendations = ub_recommend(user)
    ub_top5recgames[user] = recommendations.index

In [58]:
ub_top5recgames

Unnamed: 0,76561198010430483,76561198039495811,76561198040564894,76561197994644797,76561198064970505,76561198004670799,76561197971034129,76561197962050254,76561198001262177,76561198084453258,...,76561197985630263,76561197965062542,76561197966467800,76561198024567076,76561198004803656,76561197965527053,76561197970683033,76561197991050584,76561198056613809,76561198092753159
0,Borderlands 2,Counter-Strike: Global Offensive,Borderlands 2,PUBG: BATTLEGROUNDS,Monster Hunter: World,Counter-Strike: Global Offensive,PUBG: BATTLEGROUNDS,Borderlands 2,Tom Clancy's Rainbow Six Siege,Left 4 Dead 2,...,PUBG: BATTLEGROUNDS,Counter-Strike: Global Offensive,Borderlands 2,Dota 2,PAYDAY 2,Dota 2,Counter-Strike: Global Offensive,Tom Clancy's Rainbow Six Siege,Dota 2,Team Fortress 2
1,Monster Hunter: World,Left 4 Dead 2,PUBG: BATTLEGROUNDS,Monster Hunter: World,PUBG: BATTLEGROUNDS,PAYDAY 2,Torchlight II,PAYDAY 2,Grand Theft Auto V,PAYDAY 2,...,Torchlight II,Tom Clancy's Rainbow Six Siege,PAYDAY 2,Borderlands 2,Monster Hunter: World,Borderlands 2,Borderlands 2,Torchlight II,Team Fortress 2,Left 4 Dead 2
2,PUBG: BATTLEGROUNDS,PUBG: BATTLEGROUNDS,Monster Hunter: World,Alien Swarm,Tom Clancy's Rainbow Six Siege,PUBG: BATTLEGROUNDS,Tom Clancy's Rainbow Six Siege,Tom Clancy's Rainbow Six Siege,The Witcher 3: Wild Hunt,Borderlands 2,...,Counter-Strike: Source,Apex Legends,Monster Hunter: World,PAYDAY 2,PUBG: BATTLEGROUNDS,PAYDAY 2,PAYDAY 2,Killing Floor,Left 4 Dead 2,PAYDAY 2
3,Counter-Strike: Source,Counter-Strike: Source,Alien Swarm,Torchlight II,Alien Swarm,Alien Swarm,Path of Exile,Killing Floor 2,Stardew Valley,Tom Clancy's Rainbow Six Siege,...,Killing Floor 2,Counter-Strike: Source,Torchlight II,PUBG: BATTLEGROUNDS,Killing Floor 2,Tom Clancy's Rainbow Six Siege,PUBG: BATTLEGROUNDS,Hades,Counter-Strike: Global Offensive,Borderlands 2
4,Killing Floor,Tom Clancy's Rainbow Six Siege,Killing Floor 2,Killing Floor,Path of Exile,Torchlight II,Apex Legends,Killing Floor,The Elder Scrolls V: Skyrim,Path of Exile,...,Grand Theft Auto V,Magicka,Tom Clancy's Rainbow Six Siege,Tom Clancy's Rainbow Six Siege,Path of Exile,Torchlight II,Warframe,Killing Floor 2,PAYDAY 2,Monster Hunter: World
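
The loop above recomputes the transpose and the weights for every user. For larger datasets, the same predictions could be computed for all users at once with matrix operations. The sketch below is one possible vectorized variant on made-up data, not the notebook's implementation; the function name `recommend_all`, the toy frame and the toy similarity values are all hypothetical.

```python
import numpy as np
import pandas as pd

def recommend_all(ratings0, sim, n=5):
    """Top-n unplayed games per user via one matrix product (hypothetical helper).

    ratings0: zero-imputed user-by-game DataFrame.
    sim: user-to-user similarity array, ordered like ratings0.index.
    """
    R = ratings0.to_numpy(dtype=float)
    S = np.array(sim, dtype=float)                # copy, so the input is not modified
    np.fill_diagonal(S, 0.0)                      # ignore self-similarity
    totals = S.sum(axis=1, keepdims=True)
    W = S / np.where(totals == 0, 1.0, totals)    # each row of weights sums to 1
    pred = W @ R                                  # weighted sum of others' playtimes
    pred[R != 0] = -np.inf                        # push played games to the bottom
    top = np.argsort(-pred, axis=1)[:, :n]        # column positions of the best n
    games = ratings0.columns.to_numpy()
    # Rows are ranks, columns are users, like ub_top5recgames.
    return pd.DataFrame(games[top].T, columns=ratings0.index)

# Made-up zero-imputed playtimes and similarity scores for three users.
toy0 = pd.DataFrame(
    [[5.0, 0.0, 0.0, 2.0],
     [4.0, 6.0, 0.0, 0.0],
     [0.0, 0.0, 7.0, 1.0]],
    index=[101, 102, 103],
    columns=["Game A", "Game B", "Game C", "Game D"],
)
toy_sim = np.array([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])

toy_recs = recommend_all(toy0, toy_sim, n=2)
print(toy_recs)
```

One difference from `ub_recommend`: if a user has fewer than `n` unplayed games, this sketch pads the list with played games instead of returning fewer rows, so `n` should not exceed the number of unplayed games.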


In [59]:
ub_top5recgames.to_csv("ub_top5recgames.csv",index=False)