# IGN Games - Recommendation System

## About Dataset

- Over the past 20 years, the gaming industry has grown more sophisticated. This dataset can be used to find industry trends, compare consoles against each other, identify the most popular genres, and more.
- It contains 18,625 data points with features such as release date, platform, genre, and IGN score.

In this project, a content-based recommendation system is built using the cosine similarity metric. The system suggests games based on the content and features of the games themselves: genre, score, score phrase, platform, and URL. The release date is kept in the dataset and shown in the results, but it is not part of the content string that gets vectorized. Each game's content is represented as a TF-IDF (Term Frequency-Inverse Document Frequency) vector, and the cosine similarity between these vectors is used to find the ten games most similar to a given title.

Source: https://www.kaggle.com/datasets/joebeachcapital/ign-games
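
As a rough illustration of the approach described above, the sketch below vectorizes three made-up content strings with TF-IDF and compares them with cosine similarity. The strings and the `toy_` names are hypothetical and are not rows from the IGN dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical content strings (assumptions for illustration only)
toy_content = [
    "Puzzle Mediocre Xbox",      # game 0
    "Puzzle Okay Xbox",          # game 1: shares two tokens with game 0
    "Sports Great PlayStation",  # game 2: shares no tokens with game 0
]

# Each string becomes one row of TF-IDF weights
toy_matrix = TfidfVectorizer().fit_transform(toy_content)

# Pairwise cosine similarity between all rows (3 x 3 matrix)
toy_sim = cosine_similarity(toy_matrix)
print(toy_sim.round(2))
```

Games that share more tokens get a higher similarity, and games with no tokens in common get a similarity of 0.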

# Importing Libraries and Loading Data

In [37]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [38]:
data = pd.read_csv("https://raw.githubusercontent.com/rachanabv07/Games-Recommendation-System/main/IGN%20Game%20Dataset.csv")
data.head()

Unnamed: 0.1,Unnamed: 0,score_phrase,title,url,platform,score,genre,editors_choice,release_year,release_month,release_day
0,0,Amazing,LittleBigPlanet PS Vita,/games/littlebigplanet-vita/vita-98907,PlayStation Vita,9.0,Platformer,Y,2012,9,12
1,1,Amazing,LittleBigPlanet PS Vita -- Marvel Super Hero E...,/games/littlebigplanet-ps-vita-marvel-super-he...,PlayStation Vita,9.0,Platformer,Y,2012,9,12
2,2,Great,Splice: Tree of Life,/games/splice/ipad-141070,iPad,8.5,Puzzle,N,2012,9,12
3,3,Great,NHL 13,/games/nhl-13/xbox-360-128182,Xbox 360,8.5,Sports,N,2012,9,11
4,4,Great,NHL 13,/games/nhl-13/ps3-128181,PlayStation 3,8.5,Sports,N,2012,9,11


In [39]:
# original shape of dataset
data.shape

(18625, 11)

# Data Preprocessing

In [40]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18625 entries, 0 to 18624
Data columns (total 11 columns):
Unnamed: 0        18625 non-null int64
score_phrase      18625 non-null object
title             18625 non-null object
url               18625 non-null object
platform          18625 non-null object
score             18625 non-null float64
genre             18589 non-null object
editors_choice    18625 non-null object
release_year      18625 non-null int64
release_month     18625 non-null int64
release_day       18625 non-null int64
dtypes: float64(1), int64(4), object(6)
memory usage: 1.6+ MB


In [41]:
# Checking for null values
data.isnull().sum()

Unnamed: 0         0
score_phrase       0
title              0
url                0
platform           0
score              0
genre             36
editors_choice     0
release_year       0
release_month      0
release_day        0
dtype: int64

In [42]:
# Viewing rows that contain at least one NaN value
rows_with_nan = data[data.isnull().any(axis = 1)]
rows_with_nan

Unnamed: 0.1,Unnamed: 0,score_phrase,title,url,platform,score,genre,editors_choice,release_year,release_month,release_day
12,12,Good,Wild Blood,/games/wild-blood/iphone-139363,iPhone,7.0,,N,2012,9,10
113,113,Good,Retro/Grade,/games/retrograde-138590/ps3-21766,PlayStation 3,7.0,,N,2012,8,15
160,160,Good,10000000,/games/10000000/iphone-139135,iPhone,7.5,,N,2012,8,9
176,176,Okay,Colour Bind,/games/colour-bind/pc-143757,PC,6.2,,N,2012,10,15
9375,9375,Great,Duke Nukem Arena,/games/duke-nukem-arena/cell-893821,Wireless,8.0,,Y,2007,6,15
9488,9488,Okay,Rengoku,/games/rengoku/cell-924924,Wireless,6.5,,N,2007,6,26
9767,9767,Good,Super Sketcher,/games/super-sketcher/cell-874054,Wireless,7.5,,N,2007,9,14
9774,9774,Amazing,Critter Crunch,/games/critter-crunch/cell-963486,Wireless,9.0,,Y,2007,9,13
10494,10494,Awful,Clue / Mouse Trap / Perfection / Aggravation,/games/clue-mouse-trap-perfection-aggravation/...,Nintendo DS,3.5,,N,2008,1,23
11367,11367,Painful,Jeep Thrills,/games/jeep-thrills/ps2-14246598,PlayStation 2,2.0,,N,2008,8,18


In [43]:
# Dropping NaN rows
rem_na_data = data.dropna(axis=0)

In [44]:
# Checking for NaN values
rem_na_data.isnull().sum()

Unnamed: 0        0
score_phrase      0
title             0
url               0
platform          0
score             0
genre             0
editors_choice    0
release_year      0
release_month     0
release_day       0
dtype: int64

In [45]:
rem_na_data.columns

Index(['Unnamed: 0', 'score_phrase', 'title', 'url', 'platform', 'score',
       'genre', 'editors_choice', 'release_year', 'release_month',
       'release_day'],
      dtype='object')

In [46]:
# Merging year, month, and day columns into one release_date column
# (assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning)
rem_na_data = rem_na_data.assign(release_date=pd.to_datetime(rem_na_data[['release_year', 'release_month', 'release_day']].astype(str).agg('-'.join, axis=1)))


In [47]:
rem_na_data.head()

Unnamed: 0.1,Unnamed: 0,score_phrase,title,url,platform,score,genre,editors_choice,release_year,release_month,release_day,release_date
0,0,Amazing,LittleBigPlanet PS Vita,/games/littlebigplanet-vita/vita-98907,PlayStation Vita,9.0,Platformer,Y,2012,9,12,2012-09-12
1,1,Amazing,LittleBigPlanet PS Vita -- Marvel Super Hero E...,/games/littlebigplanet-ps-vita-marvel-super-he...,PlayStation Vita,9.0,Platformer,Y,2012,9,12,2012-09-12
2,2,Great,Splice: Tree of Life,/games/splice/ipad-141070,iPad,8.5,Puzzle,N,2012,9,12,2012-09-12
3,3,Great,NHL 13,/games/nhl-13/xbox-360-128182,Xbox 360,8.5,Sports,N,2012,9,11,2012-09-11
4,4,Great,NHL 13,/games/nhl-13/ps3-128181,PlayStation 3,8.5,Sports,N,2012,9,11,2012-09-11
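
The `release_date` column can also be assembled without going through strings. The sketch below is an alternative, not the notebook's method: `pd.to_datetime` accepts a DataFrame whose columns are named `year`, `month`, and `day`. The `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical rows using the same column names as the dataset
toy = pd.DataFrame({
    "release_year": [2012, 2007],
    "release_month": [9, 6],
    "release_day": [12, 15],
})

# pd.to_datetime expects columns literally named year / month / day
parts = toy.rename(columns={
    "release_year": "year",
    "release_month": "month",
    "release_day": "day",
})
toy["release_date"] = pd.to_datetime(parts)
print(toy)
```

This skips the row-wise string join, which may matter on larger frames.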


In [48]:
# Dropping the redundant index column and the separate date-part columns
rem_na_data = rem_na_data.drop(['Unnamed: 0','release_year', 'release_month', 'release_day'], axis=1)


In [49]:
rem_na_data.head()

Unnamed: 0,score_phrase,title,url,platform,score,genre,editors_choice,release_date
0,Amazing,LittleBigPlanet PS Vita,/games/littlebigplanet-vita/vita-98907,PlayStation Vita,9.0,Platformer,Y,2012-09-12
1,Amazing,LittleBigPlanet PS Vita -- Marvel Super Hero E...,/games/littlebigplanet-ps-vita-marvel-super-he...,PlayStation Vita,9.0,Platformer,Y,2012-09-12
2,Great,Splice: Tree of Life,/games/splice/ipad-141070,iPad,8.5,Puzzle,N,2012-09-12
3,Great,NHL 13,/games/nhl-13/xbox-360-128182,Xbox 360,8.5,Sports,N,2012-09-11
4,Great,NHL 13,/games/nhl-13/ps3-128181,PlayStation 3,8.5,Sports,N,2012-09-11


In [50]:
rem_na_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18589 entries, 0 to 18624
Data columns (total 8 columns):
score_phrase      18589 non-null object
title             18589 non-null object
url               18589 non-null object
platform          18589 non-null object
score             18589 non-null float64
genre             18589 non-null object
editors_choice    18589 non-null object
release_date      18589 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), object(6)
memory usage: 1.3+ MB


In [51]:
dataset = rem_na_data

In [52]:
# Dropping duplicate rows that share the same url
dataset = dataset.drop_duplicates(subset='url', keep='first')


In [53]:
#Shape of dataset after cleaning
dataset.shape

(18541, 8)

In [54]:
dataset.head()

Unnamed: 0,score_phrase,title,url,platform,score,genre,editors_choice,release_date
0,Amazing,LittleBigPlanet PS Vita,/games/littlebigplanet-vita/vita-98907,PlayStation Vita,9.0,Platformer,Y,2012-09-12
1,Amazing,LittleBigPlanet PS Vita -- Marvel Super Hero E...,/games/littlebigplanet-ps-vita-marvel-super-he...,PlayStation Vita,9.0,Platformer,Y,2012-09-12
2,Great,Splice: Tree of Life,/games/splice/ipad-141070,iPad,8.5,Puzzle,N,2012-09-12
3,Great,NHL 13,/games/nhl-13/xbox-360-128182,Xbox 360,8.5,Sports,N,2012-09-11
4,Great,NHL 13,/games/nhl-13/ps3-128181,PlayStation 3,8.5,Sports,N,2012-09-11


In [55]:
# Exploring unique platform names
print("Platform Names : ",dataset['platform'].unique())
print("Number of different Platform : ", len(dataset['platform'].unique()))

Platform Names :  ['PlayStation Vita' 'iPad' 'Xbox 360' 'PlayStation 3' 'Macintosh' 'PC'
 'iPhone' 'Nintendo DS' 'Nintendo 3DS' 'Android' 'Wii' 'PlayStation 4'
 'Wii U' 'Linux' 'PlayStation Portable' 'PlayStation' 'Nintendo 64'
 'Saturn' 'Lynx' 'Game Boy' 'Game Boy Color' 'NeoGeo Pocket Color'
 'Game.Com' 'Dreamcast' 'Dreamcast VMU' 'WonderSwan' 'Arcade'
 'Nintendo 64DD' 'PlayStation 2' 'WonderSwan Color' 'Game Boy Advance'
 'Xbox' 'GameCube' 'DVD / HD Video Game' 'Wireless' 'Pocket PC' 'N-Gage'
 'NES' 'iPod' 'Genesis' 'TurboGrafx-16' 'Super NES' 'NeoGeo'
 'Master System' 'Atari 5200' 'TurboGrafx-CD' 'Atari 2600' 'Sega 32X'
 'Vectrex' 'Commodore 64/128' 'Sega CD' 'Nintendo DSi' 'Windows Phone'
 'Web Games' 'Xbox One' 'Windows Surface' 'Ouya' 'New Nintendo 3DS'
 'SteamOS']
Number of different Platform :  59


In [56]:
# Exploring unique genre names
print("Genre Names : ",dataset['genre'].unique())
print("Number of different Genre : ", len(dataset['genre'].unique()))

Genre Names :  ['Platformer' 'Puzzle' 'Sports' 'Strategy' 'Fighting' 'RPG'
 'Action, Adventure' 'Adventure' 'Action' 'Action, RPG' 'Shooter' 'Music'
 'Board' 'Racing' 'Strategy, RPG' 'Racing, Action' 'Shooter, RPG'
 'Simulation' 'Action, Simulation' 'Flight, Action' 'Puzzle, Action'
 'Action, Compilation' 'Educational, Puzzle' 'Wrestling'
 'Fighting, Action' 'Productivity' 'Sports, Simulation' 'Music, Action'
 'Sports, Action' 'Party' 'Battle' 'Puzzle, Adventure' 'Puzzle, Word Game'
 'Card, Battle' 'Simulation, Adventure' 'Compilation' 'Flight' 'Pinball'
 'Hunting' 'Casino' 'Sports, Racing' 'Fighting, Compilation'
 'Flight, Simulation' 'Trivia' 'Action, Platformer' 'Other' 'Virtual Pet'
 'Music, Editor' 'Sports, Editor' 'Racing, Simulation' 'RPG, Editor'
 'Educational, Action' 'Card' 'Card, RPG' 'Wrestling, Simulation'
 'Fighting, Adventure' 'Sports, Compilation' 'RPG, Compilation'
 'Flight, Racing' 'RPG, Simulation' 'Shooter, Platformer' 'Fighting, RPG'
 'Card, Compilation' 'Hunting, 

In [57]:
# Merging rows for the same game released on different platforms
merged_df = dataset.groupby('title').agg({
    'score_phrase': 'first',          # Take the first value of 'score_phrase'
    'url': ' & '.join,                # Join all 'url' values with ' & '
    'platform': ', '.join,            # Join all 'platform' values with ', '
    'score': 'mean',                  # Calculate the mean of 'score'
    'genre': 'first',                 # Take the first value of 'genre'
    'editors_choice': 'first',        # Take the first value of 'editors_choice'
    'release_date': 'first'           # Take the first value of 'release_date'
}).reset_index()
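
To see what this per-title merge does, here is a small sketch on a made-up frame. The titles, scores, and the `toy` names are hypothetical, and only three of the columns are aggregated.

```python
import pandas as pd

# Hypothetical data: "Game A" appears on two platforms
toy = pd.DataFrame({
    "title": ["Game A", "Game A", "Game B"],
    "platform": ["Xbox 360", "PlayStation 3", "iPad"],
    "score": [8.0, 9.0, 7.0],
    "genre": ["Sports", "Sports", "Puzzle"],
})

toy_merged = toy.groupby("title").agg({
    "platform": ", ".join,  # concatenate platform names
    "score": "mean",        # average the per-platform scores
    "genre": "first",       # keep the first genre seen
}).reset_index()
print(toy_merged)
```

The two "Game A" rows collapse into one row with a combined platform string and an averaged score, which is why the real dataset shrinks after this step.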

In [58]:
merged_df.tail()

Unnamed: 0,title,score_phrase,url,platform,score,genre,editors_choice,release_date
12551,kill.switch,Great,/games/killswitch/xbox-566615 & /games/killswi...,"Xbox, PlayStation 2, PC, Game Boy Advance",7.825,Shooter,N,2003-10-27
12552,realMyst,Okay,/games/realmyst/pc-15612,PC,6.5,Adventure,N,2000-11-13
12553,ruthless.com,Mediocre,/games/ruthlesscom/pc-10831,PC,5.2,Strategy,N,1999-01-19
12554,xXx,Okay,/games/xxx/gba-481995,Game Boy Advance,6.0,Action,N,2002-08-09
12555,xXx: State of the Union,Great,/games/xxx-state-of-the-union/cell-745524,Wireless,8.0,Action,Y,2005-05-03


In [59]:
# Final dataset shape
merged_df.shape

(12556, 8)

# Content Vectorization

In [60]:
# Merging genre, score_phrase, score, platform, and url into one 'content' column (release_date is not included)
merged_df['content'] = merged_df["genre"]+", "+ merged_df["score_phrase"]+", "+merged_df["score"].astype(str)+", "+merged_df["platform"]+", "+merged_df["url"]

In [61]:
merged_df.tail()

Unnamed: 0,title,score_phrase,url,platform,score,genre,editors_choice,release_date,content
12551,kill.switch,Great,/games/killswitch/xbox-566615 & /games/killswi...,"Xbox, PlayStation 2, PC, Game Boy Advance",7.825,Shooter,N,2003-10-27,"Shooter, Great, 7.825, Xbox, PlayStation 2, PC..."
12552,realMyst,Okay,/games/realmyst/pc-15612,PC,6.5,Adventure,N,2000-11-13,"Adventure, Okay, 6.5, PC, /games/realmyst/pc-1..."
12553,ruthless.com,Mediocre,/games/ruthlesscom/pc-10831,PC,5.2,Strategy,N,1999-01-19,"Strategy, Mediocre, 5.2, PC, /games/ruthlessco..."
12554,xXx,Okay,/games/xxx/gba-481995,Game Boy Advance,6.0,Action,N,2002-08-09,"Action, Okay, 6.0, Game Boy Advance, /games/xx..."
12555,xXx: State of the Union,Great,/games/xxx-state-of-the-union/cell-745524,Wireless,8.0,Action,Y,2005-05-03,"Action, Great, 8.0, Wireless, /games/xxx-state..."


In [62]:
# Convert the textual 'content' column into numerical vectors with TF-IDF (Term Frequency-Inverse Document Frequency)
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(merged_df['content'])
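
It can help to check which tokens the vectorizer actually extracts from a content string. The sketch below assumes the default `TfidfVectorizer` tokenizer, which lowercases text and keeps only tokens of two or more alphanumeric characters. The `sample` string is an illustration in the same format as the `content` column, assembled from a single-platform row in the first preview.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# build_analyzer() returns the preprocessing + tokenizing function
# and does not require the vectorizer to be fitted first
analyzer = TfidfVectorizer(stop_words="english").build_analyzer()

# Illustrative content string: genre, score_phrase, score, platform, url
sample = "Sports, Great, 8.5, Xbox 360, /games/nhl-13/xbox-360-128182"

tokens = analyzer(sample)
print(tokens)
```

Under these defaults a score such as `8.5` contributes no token, because `8` and `5` are single characters, while the pieces of the URL (including words from the game title) do become tokens. That is one plausible way title words end up influencing the similarity.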

# Similarity Calculation

In [63]:
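# Calculating the cosine similarity between game content vectors.
# TfidfVectorizer L2-normalizes each row by default, so the dot product
# computed by linear_kernel equals the cosine similarity.
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)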
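
`linear_kernel` computes plain dot products, so it only matches cosine similarity when the rows have unit length. The sketch below checks this on a made-up corpus, assuming the default `norm='l2'` setting of `TfidfVectorizer`. The `toy_` names and strings are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical content strings (assumptions for illustration only)
toy_content = ["Puzzle Action Xbox", "Puzzle Xbox", "Sports PlayStation"]
toy_matrix = TfidfVectorizer().fit_transform(toy_content)

# With the default norm='l2', every row has Euclidean length 1
row_norms = np.linalg.norm(toy_matrix.toarray(), axis=1)

# Dot products of unit vectors are cosine similarities
same = np.allclose(
    linear_kernel(toy_matrix, toy_matrix),
    cosine_similarity(toy_matrix, toy_matrix),
)
print(row_norms, same)
```

If the vectorizer were created with `norm=None`, the two results would generally differ and `cosine_similarity` would be the safer call.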

In [64]:
# Create a reverse mapping of game titles to DataFrame indices
indices = pd.Series(merged_df.index, index=merged_df['title'])

In [65]:
indices

title
#IDARB                               0
'Splosion Man                        1
.deTuned                             2
.hack//G.U. Vol. 1: Rebirth          3
.hack//G.U. Vol. 2: Reminisce        4
                                 ...  
kill.switch                      12551
realMyst                         12552
ruthless.com                     12553
xXx                              12554
xXx: State of the Union          12555
Length: 12556, dtype: int64

# Function to get Game Recommendations

In [66]:
# Function to get game recommendations
def get_recommendations(title, cosine_sim=cosine_sim):
    idx = indices[title]  # row index of the input game
    sim_scores = list(enumerate(cosine_sim[idx]))  # similarity of every game to the input game
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)  # most similar first
    sim_scores = sim_scores[1:11]  # top 10 similar games; position 0 is assumed to be the input game itself
    game_indices = [i[0] for i in sim_scores]
    # .copy() so that adding a column below works on an independent DataFrame
    recommendations = merged_df.iloc[game_indices][['title','genre','score', 'score_phrase','release_date','platform', 'url']].copy()
    recommendations['Number of Recommendations'] = range(1,len(recommendations)+1)  # rank, 1 = most similar

    recommendations = recommendations.rename(columns={
        'title': 'Game Names',
        'score_phrase': 'Review',
        'url': 'URL',
        'platform': 'Platform',
        'score': 'Score',
        'genre': 'Genre',
        'release_date': 'Release Date'
    })
    return recommendations
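
The same lookup, sort, and slice logic can be tried end to end on a tiny made-up catalogue. This sketch is a variant rather than the function above: it ranks with `np.argsort` and removes the query game by index instead of by position. All `toy_` names, titles, and content strings are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical catalogue (assumption for illustration only)
toy_df = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C", "Game D"],
    "content": [
        "Puzzle Mediocre Xbox",
        "Puzzle Okay Xbox",
        "Puzzle Great iPad",
        "Sports Great PlayStation",
    ],
})

# TF-IDF rows are L2-normalized, so linear_kernel gives cosine similarity
toy_sim = linear_kernel(TfidfVectorizer().fit_transform(toy_df["content"]))

# Map each title to its row position
toy_indices = pd.Series(toy_df.index, index=toy_df["title"])

def top_n_similar(title, n=2):
    idx = toy_indices[title]
    order = np.argsort(-toy_sim[idx])   # positions sorted by descending similarity
    order = order[order != idx][:n]     # drop the query game explicitly, keep n
    return toy_df["title"].iloc[order].tolist()

print(top_n_similar("Game A"))
```

Dropping the query by index avoids relying on it being sorted into first place, which could matter if another row had an identical content string.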


# Top 10 Game Recommendations 

In [79]:
# Randomly picking a game name from the dataset
import random
Game_name = random.choice(merged_df['title'].tolist())
merged_df.loc[merged_df['title'] == Game_name]

Unnamed: 0,title,score_phrase,url,platform,score,genre,editors_choice,release_date,content
4771,Hole In The Wall,Mediocre,/games/hole-in-the-wall-108893/xbox-360-108892,Xbox 360,5.0,"Puzzle, Action",N,2011-08-25,"Puzzle, Action, Mediocre, 5.0, Xbox 360, /game..."


In [80]:
# Top 10 recommendations based on one input game
recommendations = get_recommendations(Game_name)
print("Recommendations for", Game_name, ":")
result = pd.DataFrame(recommendations)
result

Recommendations for Hole In The Wall :


Unnamed: 0,Game Names,Genre,Score,Review,Release Date,Platform,URL,Number of Recommendations
11843,WALL-E,"Action, Adventure",6.1,Good,2008-07-03,"PlayStation Portable, Nintendo DS, PlayStation...",/games/wall-e/psp-14226081 & /games/wall-e/nds...,1
8156,Puzzle Arcade,Puzzle,5.0,Mediocre,2009-01-05,Xbox 360,/games/puzzle-arcade/xbox-360-14290789,2
12186,Word Puzzle,Puzzle,5.5,Mediocre,2007-11-07,Xbox 360,/games/word-puzzle/xbox-360-949562,3
10156,Super Contra,Action,5.7,Mediocre,2007-07-26,Xbox 360,/games/super-c/xbox-360-952133,4
3419,Exit 2,"Puzzle, Action",6.5,Okay,2009-02-25,Xbox 360,/games/exit-2/xbox-360-14318701,5
9240,Shrek-N-Roll,Puzzle,5.5,Mediocre,2007-11-15,Xbox 360,/games/shrek-n-roll/xbox-360-14214944,6
10500,Tenchu Z,"Action, Adventure",5.2,Mediocre,2007-06-19,Xbox 360,/games/tenchu-z/xbox-360-772077,7
4167,Geon: Emotions,"Puzzle, Action",6.8,Okay,2007-09-27,Xbox 360,/games/geon/xbox-360-949558,8
1055,Bejeweled 2: Deluxe,Puzzle,7.8,Good,2006-06-13,Xbox 360,/games/bejeweled-2/xbox-360-777187,9
3242,Eets: Chowdown,"Puzzle, Action",7.4,Good,2007-04-26,Xbox 360,/games/eets/xbox-360-881247,10
