Game Dataset: Build a recommender system with the given data using UBCF (user-based collaborative filtering).

This dataset comes from the video gaming industry. A survey was conducted to build a recommendation engine so that the store can improve the sales of its gaming DVDs. A snapshot of the dataset is given below. Build a recommendation engine and suggest top-selling DVDs to the store's customers.

In [1]:
# Let us start by importing the libraries we need and loading the data we will work on
import pandas as pd

game_data = pd.read_csv("game.csv")

In [2]:
game_data.shape

(5000, 3)

In [3]:
game_data.columns

Index(['userId', 'game', 'rating'], dtype='object')

In [4]:
game_data.head()

   userId                                  game  rating
0       3  The Legend of Zelda: Ocarina of Time     4.0
1       6              Tony Hawk's Pro Skater 2     5.0
2       8                   Grand Theft Auto IV     4.0
3      10                           SoulCalibur     4.0
4      11                   Grand Theft Auto IV     4.5


**Data Description: Game Dataset**

userId -- User ID

game -- name of the game

rating -- Review rating of the games by the users

Notice that the "game" column holds text. We will vectorize it using **TF-IDF ("Term Frequency Inverse Document Frequency")**, which turns each title into a row of term weights and lets us compute a similarity matrix among the **game** titles.
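To make the idea concrete, here is a minimal sketch of TF-IDF on a few titles. The titles and the names `toy_titles`, `toy_vectorizer`, and `toy_matrix` are hypothetical and are not taken from game.csv.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy titles, not rows read from game.csv
toy_titles = ["Burnout Revenge", "Burnout Paradise", "SoulCalibur"]

toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_matrix = toy_vectorizer.fit_transform(toy_titles)

# One row per title, one column per distinct term found in the titles
print(toy_matrix.shape)
print(sorted(toy_vectorizer.vocabulary_))
```

Titles that share a term, such as the two Burnout entries, get a non-zero weight in the same column, which is what the similarity step later relies on.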

In [5]:
# Importing the TfidfVectorizer from sklearn
from sklearn.feature_extraction.text import TfidfVectorizer

# Creating a TfidfVectorizer that removes English stop words

Tfidf = TfidfVectorizer(stop_words="english")

In [6]:
# Checking for NaN values in the rating column
game_data["rating"].isnull().sum()


0

In [7]:
# Creating the TF-IDF matrix: one row per record, one column per term in the game titles
tfidf_matrix = Tfidf.fit_transform(game_data.game)
tfidf_matrix.shape

(5000, 3068)

**Cosine Similarity**: Measures the cosine of the angle between two vectors. It compares orientation rather than magnitude, with both vectors taken from the origin. The cosine of 0 degrees is 1, which means the vectors point in the same direction and the data points are similar. The cosine of 90 degrees is 0, which means the vectors are orthogonal and the data points are dissimilar.
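The two cases mentioned above can be checked by hand. This is a minimal sketch in plain Python; the helper name `cosine_similarity_sketch` and the sample vectors are hypothetical and only illustrate the formula.

```python
import math

def cosine_similarity_sketch(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: the angle is 0 degrees
same_direction = cosine_similarity_sketch([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors: the angle is 90 degrees
orthogonal = cosine_similarity_sketch([1.0, 0.0], [0.0, 3.0])

print(round(same_direction, 6), round(orthogonal, 6))
```

The first pair differs only in length, so the score stays at 1, which shows that magnitude does not affect the measure.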

In [8]:
# To find the similarity scores we import linear_kernel from sklearn
from sklearn.metrics.pairwise import linear_kernel

In [9]:
# Creating the cosine similarity matrix. linear_kernel computes dot products, which
# equal cosine similarities here because TfidfVectorizer L2-normalizes each row by default

cos_sim_matrix = linear_kernel(tfidf_matrix, tfidf_matrix)
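`linear_kernel` returns plain dot products. For TF-IDF rows these coincide with cosine similarities, because `TfidfVectorizer` scales each row to unit length by default (`norm="l2"`). A small check on hypothetical toy titles (not taken from game.csv):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy titles used only for this comparison
toy_titles = ["Burnout Revenge", "Burnout Paradise", "SoulCalibur"]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_titles)

dot_products = linear_kernel(toy_matrix, toy_matrix)
cosines = cosine_similarity(toy_matrix, toy_matrix)

# True when both matrices agree up to floating point tolerance
print(np.allclose(dot_products, cosines))
```

If the vectorizer were created with `norm=None`, the two results would generally differ, and `cosine_similarity` would be the safer call.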

In [10]:
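# We now create a Series that maps each userId to its row index in game_data,
# keeping only the first row when a userId appears more than once
game_data_index = pd.Series(game_data.index, index=game_data["userId"])
game_data_index = game_data_index[~game_data_index.index.duplicated(keep="first")]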

In [11]:
game_data_index.head(10)

userId
3     0
6     1
8     2
10    3
11    4
12    5
13    6
14    7
16    8
19    9
dtype: int64
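The Series above is used as a lookup from userId to row position, so each userId should map to exactly one row. A minimal sketch, on hypothetical IDs, of what happens when a userId is repeated and one possible way to keep a single row per user:

```python
import pandas as pd

# Hypothetical frame in which userId 3 rated two games
toy_data = pd.DataFrame({
    "userId": [3, 3, 6],
    "game": ["SoulCalibur", "Burnout", "Grand Theft Auto IV"],
})

lookup = pd.Series(toy_data.index, index=toy_data["userId"])

# With a repeated label, indexing returns every matching row position
repeated = lookup[3]

# Keeping the first row per userId makes each lookup a single integer
first_only = lookup[~lookup.index.duplicated(keep="first")]

print(list(repeated), int(first_only[3]), int(first_only[6]))
```

Keeping the first row is an arbitrary choice made for this sketch; keeping the user's highest-rated game would be another reasonable option.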

In [12]:
# Checking the lookup for a sample userId
game_data_id = game_data_index[269]

game_data_id

89

In [13]:
# We will have to create a user defined function for generating recommendations for the games as under
def get_recommendations(UserId, topN):
    # Getting the row index of the game rated by this userId
    game_data_id = game_data_index[UserId]

    # Getting the pairwise similarity scores of all games against that game
    cosine_scores = list(enumerate(cos_sim_matrix[game_data_id]))

    # Sorting by similarity score, highest first
    cosine_scores = sorted(cosine_scores, key=lambda x: x[1], reverse=True)

    # Keeping topN + 1 entries, since the query game itself scores 1.0 and is included
    cosine_scores_N = cosine_scores[0:topN+1]

    # Getting the row indices and similarity scores of the top matches
    game_data_idx = [i[0] for i in cosine_scores_N]
    game_data_scores = [i[1] for i in cosine_scores_N]

    # Building the result frame; the "rating" column holds the similarity score
    games_similar = pd.DataFrame(columns=["game", "rating"])
    games_similar["game"] = game_data.loc[game_data_idx, "game"]
    games_similar["rating"] = game_data_scores
    games_similar.reset_index(inplace=True)

    print(games_similar)


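The function defined above recommends games whose titles are most similar to the title of the game rated by the given user. It sorts the similarity scores, keeps the top N matches (plus the query game itself, which scores 1.0), and prints them. Note that the "rating" column of the printed frame holds the cosine similarity score, not the user's rating. To understand it better, we call the function as below.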

In [14]:
# Using the function defined above, we recommend the top 10 games whose titles
# are most similar to the game rated by userId 285

get_recommendations(285, topN=10)

game_data_index[285]


    index                         game    rating
0      95          Burnout 3: Takedown  1.000000
1     108          Burnout 3: Takedown  1.000000
2    4315                      Burnout  0.621807
3    4585                      Burnout  0.621807
4    1102              Burnout Legends  0.456389
5     405              Burnout Revenge  0.428381
6     496              Burnout Revenge  0.428381
7     577              Burnout Revenge  0.428381
8     654             Burnout Paradise  0.425606
9     855             Burnout Paradise  0.425606
10   2814  Burnout Paradise Remastered  0.357666


95

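Hence, the result shows the recommended games whose titles match the query game most closely. The query here is Burnout 3: Takedown (row 95), so the list is made up of other Burnout titles, ordered by similarity score.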
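The printed list repeats titles and includes the query game itself. A minimal sketch of a variant that returns one row per distinct title and leaves out the query title is given below. The toy data, the function name `recommend_distinct_titles`, and the "similarity" column name are hypothetical and are not part of the notebook above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical toy data in the same three-column layout as game.csv
toy_data = pd.DataFrame({
    "userId": [1, 2, 3, 4, 5],
    "game": ["Burnout Revenge", "Burnout Revenge", "Burnout Paradise",
             "Burnout Legends", "SoulCalibur"],
    "rating": [4.0, 3.5, 5.0, 4.5, 4.0],
})

toy_tfidf = TfidfVectorizer(stop_words="english").fit_transform(toy_data["game"])
toy_sim = linear_kernel(toy_tfidf, toy_tfidf)

def recommend_distinct_titles(user_id, top_n):
    """Return up to top_n distinct similar titles, excluding the query title."""
    # Row of the first game rated by this user
    row = toy_data.index[toy_data["userId"] == user_id][0]
    query_title = toy_data.loc[row, "game"]

    result = pd.DataFrame({"game": toy_data["game"], "similarity": toy_sim[row]})
    result = result[result["game"] != query_title]            # drop the query title
    result = result.sort_values("similarity", ascending=False)
    result = result.drop_duplicates(subset="game")             # one row per title
    return result.head(top_n).reset_index(drop=True)

print(recommend_distinct_titles(1, top_n=2))
```

Returning a DataFrame instead of printing it also makes the result easier to reuse, for example to join the user ratings back onto the recommended titles.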