# CONTENT BASED RECOMMENDATION

In the previous notebook, "explanatory-5-content-based-data-development", we created the **df_content_dict** dictionary, which contains 4 dataframes:
<br>
* **Board Game Rank**
* **one-hot boardgamecategory**
* **one-hot boardgamemechanic**
* **one-hot boardgamefamily**

Now we can read this dictionary and develop a content-based recommendation system.
We also have the lookup tables **df_id2game** and **df_game2id** from item-based collaborative filtering.

In this notebook we will use these 3 objects.

In [3]:
# File link for df_content_dict:  https://drive.google.com/file/d/1KLH9vP9sC1LBXpi9bRPWpK1FVbhQJ4zm/view?usp=drive_link
# File link df_id2game: https://drive.google.com/file/d/1H15QwTWm3eysF4vFW-L-1jtv0L6IEF1s/view?usp=sharing
# File link df_game2id: https://drive.google.com/file/d/1IDMw7Vwr_hBklq1o_42aXZxsHcJDWEK6/view?usp=sharing
!gdown 1KLH9vP9sC1LBXpi9bRPWpK1FVbhQJ4zm
!gdown 1H15QwTWm3eysF4vFW-L-1jtv0L6IEF1s
!gdown 1IDMw7Vwr_hBklq1o_42aXZxsHcJDWEK6

In [5]:
import numpy as np
import pandas as pd
from functools import partial
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
from zipfile import ZipFile
import pickle
from sklearn.metrics.pairwise import pairwise_distances

In [11]:
with open('df_content_dict.pkl', 'rb') as file:
    df_content_dict = pickle.load(file)
df_id2game = pd.read_csv("df_id2game.csv",index_col=0)
df_game2id = pd.read_csv("df_game2id.csv",index_col=0)

Examine data

In [12]:
df_id2game.head(2)

game_id,game_name
1,Die Macher
2,Dragonmaster


In [13]:
df_game2id.head(2)

game_name,game_id
Die Macher,1
Dragonmaster,2


In [14]:
df_content_dict.keys()

dict_keys(['rank', 'category', 'mechanic', 'family'])

## 1-Select a game

In [15]:
target_game = input("Enter a game id: (0 for random game) or game name:").strip()
target_game_id = None  # stays None when the input matches neither an id nor a name
if target_game.isdigit():  # the user entered a game id
    target_game_id = int(target_game)
    if target_game_id == 0:  # 0 means "pick a random game"
        target_game_id = np.random.choice(df_content_dict["rank"].index)
elif target_game in df_game2id.index:  # the user entered a game name
    target_game_id = df_game2id.loc[target_game, "game_id"]
    if isinstance(target_game_id, pd.Series):  # several games share this name
        print("There is more than one game with the same name. Ids are:", target_game_id.values)
        print("Accepting the first one")
        target_game_id = target_game_id.values[0]

if target_game_id is None or target_game_id not in df_content_dict["rank"].index:
    print("Game not found")
else:
    print("Target game id = ",target_game_id," Target game name:",df_id2game.loc[target_game_id,"game_name"])

Enter a game id: (0 for random game) or game name:0
Target game id =  192934  Target game name: Colony
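
The duplicate-name branch above depends on what `.loc` returns when an index label is not unique. The sketch below illustrates this with a small made-up lookup table (`toy_game2id` and its names and ids are hypothetical, not taken from the real data).

```python
import pandas as pd

# Hypothetical name-to-id table in which one name appears twice.
toy_game2id = pd.DataFrame(
    {"game_id": [11, 22, 33]},
    index=pd.Index(["Alpha", "Beta", "Beta"], name="game_name"),
)

unique_hit = toy_game2id.loc["Alpha", "game_id"]    # single match: a scalar
duplicate_hit = toy_game2id.loc["Beta", "game_id"]  # two matches: a Series of ids

print(type(unique_hit).__name__, isinstance(unique_hit, int))
print(type(duplicate_hit).__name__, list(duplicate_hit))
```

Since the single-match case yields a NumPy integer rather than a built-in `int`, checking for `pd.Series` is a more dependable way to detect duplicate names.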


## 2-Calculate distances with other games

The first column is the Board Game Rank column. <br>
Columns 1-84 (84 inclusive) are category columns and the rest are mechanic columns. <br>
We can calculate the distance for categories and mechanics separately. <br>
Then we can combine them into a final weighted distance.
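
As a rough illustration of the weighted distance described above, the sketch below uses a tiny made-up feature table (three hypothetical games, two category columns, three mechanic columns) instead of the real data. The column names, values and the 0.7/0.3 weights are assumptions chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical one-hot features: 2 category columns, 3 mechanic columns.
toy = pd.DataFrame(
    {
        "cat_economic": [1, 1, 0],
        "cat_fantasy":  [0, 0, 1],
        "mech_dice":    [1, 0, 0],
        "mech_cards":   [0, 0, 1],
        "mech_tiles":   [1, 1, 0],
    },
    index=["game_a", "game_b", "game_c"],
)

target = toy.loc["game_a"]

# Mean absolute difference per block = share of columns that disagree with the target.
category_distance = (toy.iloc[:, :2] - target.iloc[:2]).abs().mean(axis=1)
mechanic_distance = (toy.iloc[:, 2:] - target.iloc[2:]).abs().mean(axis=1)

# Weighted combination of the two block distances.
toy_distance = 0.7 * category_distance + 0.3 * mechanic_distance
print(toy_distance)
```

Here `game_b` shares both categories with the target and differs in one of three mechanics, so its distance is 0.3 * (1/3) = 0.1, while `game_c` differs in every column and gets a distance of about 1.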

In [16]:
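def get_similarities(df, target_id, similarity_metric="jaccard"):
    # Similarity (1 - distance) between every row of df and the target row, on boolean features
    target = np.asarray(df.loc[target_id], dtype=bool).reshape(1, -1)
    return 1 - pairwise_distances(np.asarray(df, dtype=bool), target, metric=similarity_metric)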
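
To make the Jaccard metric used above more concrete, here is a minimal hand-rolled version in plain NumPy. It is only a sketch for two one-hot vectors: the helper name `jaccard_similarity` and the sample vectors are hypothetical, and the handling of two all-zero vectors is a convention chosen for this example.

```python
import numpy as np

def jaccard_similarity(a, b):
    """Share of features present in either vector that are present in both."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two all-zero vectors: treated as identical in this sketch.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

game_x = [1, 1, 0, 0, 1]
game_y = [1, 0, 0, 1, 1]
similarity = jaccard_similarity(game_x, game_y)
print(similarity)
```

The two vectors share 2 features out of the 4 that appear in at least one of them, so the similarity is 0.5.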

In [None]:
def calculate_distance(df_content, target_game_id, category_weight=0.7, mechanics_weight=0.3):
    # Column 0 is Board_Game_Rank, columns 1-84 are categories, columns 85 onward are mechanics
    df_target = df_content.loc[target_game_id]
    # Scaled category distance: mean absolute difference over the category columns
    df_distance_category = (df_content.iloc[:, 1:85] - df_target.iloc[1:85]).abs().mean(axis=1).to_frame(name="Distance")
    # Scaled mechanics distance: mean absolute difference over the mechanic columns
    df_distance_mechanics = (df_content.iloc[:, 85:] - df_target.iloc[85:]).abs().mean(axis=1).to_frame(name="Distance")
    # Weighted combination of the two distances
    df_distance = category_weight * df_distance_category + mechanics_weight * df_distance_mechanics
    df_distance["Board_Game_Rank"] = df_content["Board_Game_Rank"]
    return df_distance

## 3-Give recommendations

Calculate the distances, then sort the dataframe by **Distance** first and by **Board Game Rank** second, both ascending.

In [None]:
number_of_recommendations = int(input("Enter the number of recommendations: "))
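# Assumption: the rank, category and mechanic frames are concatenated in the column order
# that calculate_distance expects (rank first, then categories, then mechanics).
df_content = pd.concat([df_content_dict["rank"], df_content_dict["category"], df_content_dict["mechanic"]], axis=1)
df_distance = calculate_distance(df_content, target_game_id, category_weight=0.7, mechanics_weight=0.3)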
# Closest games first; ties are broken by the better (lower) Board Game Rank
df_distance = df_distance.sort_values(by=["Distance", "Board_Game_Rank"], ascending=[True, True])
print(f"Recommended games: ")
df_distance.iloc[1:number_of_recommendations+1] # First is the target game itself

Enter the number of recommendations: 10
Recommended games: 


game_id,Distance,Board_Game_Rank
267271,0.0,1626
993,0.008242,7617
73650,0.01163,6988
26961,0.01163,8024
141087,0.01163,16415
245934,0.013278,346
227515,0.014927,902
241831,0.014927,1586
272427,0.014927,1620
35761,0.014927,1870
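
The two-key ordering visible in the output above (distance first, rank as the tie-breaker) can be reproduced on a small made-up table. The ids, distances and ranks below are hypothetical and only meant to show how `sort_values` handles ties.

```python
import pandas as pd

# Hypothetical distances and ranks; ids 2 and 3 tie on distance.
toy = pd.DataFrame(
    {"Distance": [0.02, 0.01, 0.01, 0.00], "Board_Game_Rank": [5, 900, 40, 300]},
    index=[1, 2, 3, 4],
)

# Smallest distance first; among equal distances, the better (lower) rank first.
ranked = toy.sort_values(by=["Distance", "Board_Game_Rank"], ascending=[True, True])
print(ranked)
```

Id 4 comes first with distance 0, then ids 3 and 2 (same distance, rank 40 before rank 900), and id 1 last even though it has the best rank, because distance takes priority.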


Finally, we can add the game names to the distances and ranks when displaying the results to the user.

In [None]:
df_recommendations = df_id2game.loc[ df_distance.iloc[1:number_of_recommendations+1].index ]
df_recommendations= pd.concat( (df_recommendations,df_distance.iloc[1:number_of_recommendations+1] ),axis=1 )
df_recommendations

game_id,gameName,Distance,Board_Game_Rank
267271,Egizia: Shifting Sands,0.0,1626
993,An den Ufern des Nils,0.008242,7617
73650,Porto Carthago,0.01163,6988
26961,Moai,0.01163,8024
141087,Agora,0.01163,16415
245934,Carpe Diem,0.013278,346
227515,Riverboat,0.014927,902
241831,Reykholt,0.014927,1586
272427,Terramara,0.014927,1620
35761,Sylla,0.014927,1870
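
As an alternative to looking up the names and concatenating, the same table could be built with a single index join. This is only a sketch on made-up data: `toy_id2game`, `toy_distance` and their values are hypothetical stand-ins for the notebook's dataframes.

```python
import pandas as pd

# Hypothetical id-to-name table and distance table sharing the same game ids.
toy_id2game = pd.DataFrame({"game_name": ["Alpha", "Beta", "Gamma"]}, index=[10, 20, 30])
toy_distance = pd.DataFrame(
    {"Distance": [0.0, 0.1], "Board_Game_Rank": [7, 3]}, index=[30, 10]
)

# A left join keeps the row order of the distance table and looks up each name by id.
toy_recommendations = toy_distance.join(toy_id2game, how="left")[
    ["game_name", "Distance", "Board_Game_Rank"]
]
print(toy_recommendations)
```

A left join preserves the sorted order of the distance table, so the recommendations stay ranked after the names are attached.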
