# **Content-based video game recommendation system**

We are going to work with a dataset of video games from the [Steam](https://store.steampowered.com) store.

This dataset is available on Kaggle. Click [here](https://www.kaggle.com/trolukovich/steam-games-complete-dataset) for more details about the data.

## **Data pre-processing**

In [1]:
import pandas as pd
from pandas_profiling import ProfileReport

In [2]:
df = pd.read_csv("data/steam_games.csv")
df.head()

Unnamed: 0,url,types,name,desc_snippet,recent_reviews,all_reviews,release_date,developer,publisher,popular_tags,game_details,languages,achievements,genre,game_description,mature_content,minimum_requirements,recommended_requirements,original_price,discount_price
0,https://store.steampowered.com/app/379720/DOOM/,app,DOOM,Now includes all three premium DLC packs (Unto...,"Very Positive,(554),- 89% of the 554 user revi...","Very Positive,(42,550),- 92% of the 42,550 use...","May 12, 2016",id Software,"Bethesda Softworks,Bethesda Softworks","FPS,Gore,Action,Demons,Shooter,First-Person,Gr...","Single-player,Multi-player,Co-op,Steam Achieve...","English,French,Italian,German,Spanish - Spain,...",54.0,Action,"About This Game Developed by id software, the...",,"Minimum:,OS:,Windows 7/8.1/10 (64-bit versions...","Recommended:,OS:,Windows 7/8.1/10 (64-bit vers...",$19.99,$14.99
1,https://store.steampowered.com/app/578080/PLAY...,app,PLAYERUNKNOWN'S BATTLEGROUNDS,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Mixed,(6,214),- 49% of the 6,214 user reviews ...","Mixed,(836,608),- 49% of the 836,608 user revi...","Dec 21, 2017",PUBG Corporation,"PUBG Corporation,PUBG Corporation","Survival,Shooter,Multiplayer,Battle Royale,PvP...","Multi-player,Online Multi-Player,Stats","English,Korean,Simplified Chinese,French,Germa...",37.0,"Action,Adventure,Massively Multiplayer",About This Game PLAYERUNKNOWN'S BATTLEGROUND...,Mature Content Description The developers de...,"Minimum:,Requires a 64-bit processor and opera...","Recommended:,Requires a 64-bit processor and o...",$29.99,
2,https://store.steampowered.com/app/637090/BATT...,app,BATTLETECH,Take command of your own mercenary outfit of '...,"Mixed,(166),- 54% of the 166 user reviews in t...","Mostly Positive,(7,030),- 71% of the 7,030 use...","Apr 24, 2018",Harebrained Schemes,"Paradox Interactive,Paradox Interactive","Mechs,Strategy,Turn-Based,Turn-Based Tactics,S...","Single-player,Multi-player,Online Multi-Player...","English,French,German,Russian",128.0,"Action,Adventure,Strategy",About This Game From original BATTLETECH/Mec...,,"Minimum:,Requires a 64-bit processor and opera...","Recommended:,Requires a 64-bit processor and o...",$39.99,
3,https://store.steampowered.com/app/221100/DayZ/,app,DayZ,The post-soviet country of Chernarus is struck...,"Mixed,(932),- 57% of the 932 user reviews in t...","Mixed,(167,115),- 61% of the 167,115 user revi...","Dec 13, 2018",Bohemia Interactive,"Bohemia Interactive,Bohemia Interactive","Survival,Zombies,Open World,Multiplayer,PvP,Ma...","Multi-player,Online Multi-Player,Steam Worksho...","English,French,Italian,German,Spanish - Spain,...",,"Action,Adventure,Massively Multiplayer",About This Game The post-soviet country of Ch...,,"Minimum:,OS:,Windows 7/8.1 64-bit,Processor:,I...","Recommended:,OS:,Windows 10 64-bit,Processor:,...",$44.99,
4,https://store.steampowered.com/app/8500/EVE_On...,app,EVE Online,EVE Online is a community-driven spaceship MMO...,"Mixed,(287),- 54% of the 287 user reviews in t...","Mostly Positive,(11,481),- 74% of the 11,481 u...","May 6, 2003",CCP,"CCP,CCP","Space,Massively Multiplayer,Sci-fi,Sandbox,MMO...","Multi-player,Online Multi-Player,MMO,Co-op,Onl...","English,German,Russian,French",,"Action,Free to Play,Massively Multiplayer,RPG,...",About This Game,,"Minimum:,OS:,Windows 7,Processor:,Intel Dual C...","Recommended:,OS:,Windows 10,Processor:,Intel i...",Free,


We are going to drop some columns to simplify the development of the recommendation system.

In [3]:
cols_to_remove = ['url', 'recent_reviews', 'all_reviews', 'achievements', 'languages', 'mature_content', 'minimum_requirements', 'recommended_requirements']

df = df \
        .drop(columns=cols_to_remove) \
        .copy()
df.shape

(40833, 12)

We are going to export a profiling report to an HTML file.

The dataset contains three types of packages, 'app', 'bundle' and 'sub', stored in the 'types' column. We keep only the rows whose 'types' value is 'app'.

In [4]:
print(*df['types'].unique())
df = df[df['types'] == 'app'].copy()

app bundle sub nan


In [5]:
print(df.shape)

(38021, 12)


The profiling report shows that many columns have missing values, so we fill the missing values in the text columns with an empty string.

In [6]:
cols_to_fill = ['desc_snippet', 'developer', 'game_description', 'game_details', 'genre', 'popular_tags', 'publisher']
for col in cols_to_fill:
    df[col] = df[col].fillna('')

We are going to review the columns with null values.

In [7]:
df.isnull().sum()

types                   0
name                   14
desc_snippet            0
release_date          367
developer               0
publisher               0
popular_tags            0
game_details            0
genre                   0
game_description        0
original_price       3022
discount_price      26275
dtype: int64

In [8]:
# Removing the rows with null values in the 'name' column
df.dropna(subset=['name'], inplace=True)

# Verifying the null values in this column
df['name'].isnull().sum()

0

We are going to work with the columns that describe the game and are useful for generating recommendations: 'popular_tags', 'game_details', 'genre', 'developer' and 'publisher'. The first three hold either a single item or several items separated by commas, so we can transform them into lists.

In [9]:
col_features = ['popular_tags', 'genre', 'game_details']

In [10]:
## Viewing an example of these features
for c in col_features:
    print('Column:', c)
    print('Value: ', df.iloc[1][c])

Column: popular_tags
Value:  Survival,Shooter,Multiplayer,Battle Royale,PvP,FPS,Third-Person Shooter,Action,Online Co-Op,Tactical,Co-op,First-Person,Early Access,Strategy,Competitive,Third Person,Team-Based,Difficult,Simulation,Stealth
Column: genre
Value:  Action,Adventure,Massively Multiplayer
Column: game_details
Value:  Multi-player,Online Multi-Player,Stats


In [11]:
for c in col_features:
    df[c] = df[c].apply(lambda x: x.split(','))
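One side effect is worth checking here. Because missing values were filled with an empty string, splitting them produces a list holding one empty string rather than an empty list. This is harmless when the items are later joined back into a string, but it matters if you count items per game. A minimal sketch in plain Python, where `split_items` is a hypothetical helper that is not part of the notebook:

```python
def split_items(value):
    """Split a comma-separated string into a list, dropping empty items."""
    return [item.strip() for item in value.split(',') if item.strip()]

print(''.split(','))    # [''] : one empty item, not an empty list
print(split_items(''))  # []
print(split_items('Action,Adventure,Massively Multiplayer'))
# ['Action', 'Adventure', 'Massively Multiplayer']
```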

In [12]:
df.head(3)

Unnamed: 0,types,name,desc_snippet,release_date,developer,publisher,popular_tags,game_details,genre,game_description,original_price,discount_price
0,app,DOOM,Now includes all three premium DLC packs (Unto...,"May 12, 2016",id Software,"Bethesda Softworks,Bethesda Softworks","[FPS, Gore, Action, Demons, Shooter, First-Per...","[Single-player, Multi-player, Co-op, Steam Ach...",[Action],"About This Game Developed by id software, the...",$19.99,$14.99
1,app,PLAYERUNKNOWN'S BATTLEGROUNDS,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Dec 21, 2017",PUBG Corporation,"PUBG Corporation,PUBG Corporation","[Survival, Shooter, Multiplayer, Battle Royale...","[Multi-player, Online Multi-Player, Stats]","[Action, Adventure, Massively Multiplayer]",About This Game PLAYERUNKNOWN'S BATTLEGROUND...,$29.99,
2,app,BATTLETECH,Take command of your own mercenary outfit of '...,"Apr 24, 2018",Harebrained Schemes,"Paradox Interactive,Paradox Interactive","[Mechs, Strategy, Turn-Based, Turn-Based Tacti...","[Single-player, Multi-player, Online Multi-Pla...","[Action, Adventure, Strategy]",About This Game From original BATTLETECH/Mec...,$39.99,


For each game, we are going to join all these features into a single string and store it in a new column.

In [13]:
def group_features(x):
    return f"{' '.join(x['popular_tags'])} {' '.join(x['genre'])} {' '.join(x['game_details'])}"

In [14]:
df['game_metadata'] = df.apply(group_features, axis=1)

In [15]:
df.iloc[0]['game_metadata']

'FPS Gore Action Demons Shooter First-Person Great Soundtrack Multiplayer Singleplayer Fast-Paced Sci-fi Horror Classic Atmospheric Difficult Blood Remake Zombies Co-op Memes Action Single-player Multi-player Co-op Steam Achievements Steam Trading Cards Partial Controller Support Steam Cloud'
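Note that multi-word tags lose their boundaries once everything is joined with spaces. By default, scikit-learn's `CountVectorizer` lowercases the text and keeps runs of two or more word characters as tokens, so 'Great Soundtrack' becomes two unrelated tokens and 'Co-op' is split at the hyphen. The sketch below reproduces that default token pattern with `re` (before any stop-word filtering) and shows one possible alternative. `normalize_tag` is a hypothetical helper, not something the notebook uses, and it assumes ASCII tags.

```python
import re

# Default token pattern of scikit-learn's CountVectorizer:
# runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def default_tokens(text):
    """Tokenize the way CountVectorizer does by default (lowercase + pattern)."""
    return TOKEN_PATTERN.findall(text.lower())

def normalize_tag(tag):
    """Hypothetical helper: collapse a tag into a single lowercase token."""
    return re.sub(r"[^0-9a-z]+", "", tag.lower())

tags = ["Great Soundtrack", "Co-op", "Sci-fi", "FPS"]

print(default_tokens(" ".join(tags)))
# ['great', 'soundtrack', 'co', 'op', 'sci', 'fi', 'fps']

print(default_tokens(" ".join(normalize_tag(t) for t in tags)))
# ['greatsoundtrack', 'coop', 'scifi', 'fps']
```

Whether single-token tags give better recommendations depends on the data. Splitting lets 'Shooter' match 'Third-Person Shooter', while collapsing keeps each tag distinct.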

## **Recommendations**

### **Recommendations based on game features**
The features are the ones combined in 'game_metadata': popular tags, genre and game details. Developer and publisher are not included in that string.

In [16]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [17]:
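# Count token occurrences in the metadata string, ignoring English stop words.
# Only the first 10,000 games are used, which keeps the similarity matrix at 10,000 x 10,000.
count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(df['game_metadata'].iloc[:10000])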

In [18]:
cosine_sim = cosine_similarity(count_matrix, count_matrix)
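To make the similarity measure concrete, here is a minimal sketch of cosine similarity between token-count vectors in plain Python. It uses whitespace tokenization and made-up metadata strings, so it illustrates the formula rather than reproducing `CountVectorizer` exactly.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token-count dictionaries."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty metadata string is similar to nothing
    return dot / (norm_a * norm_b)

# Made-up metadata strings for three imaginary games.
game_a = Counter("fps gore action shooter action".split())
game_b = Counter("fps action shooter retro".split())
game_c = Counter("strategy board".split())

print(round(cosine(game_a, game_b), 3))  # 0.756
print(cosine(game_a, game_c))            # 0.0, no shared tokens
```

The score depends only on the direction of the count vectors, not their length, so a game with many tags is not favored over one with few.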

In [19]:
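def features_recommender(top, name, cosine_sim=cosine_sim, df=df):
    # Map each name to its row position (not its index label), because
    # cosine_sim is positional; keep the first occurrence of duplicated names.
    indices = pd.Series(range(len(df)), index=df['name'])
    indices = indices[~indices.index.duplicated()]
    idx = indices[name]  # must fall inside the rows used to build cosine_sim
    
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:top+1]  # skip the first entry, normally the game itself
    game_indices = [i[0] for i in sim_scores]
    
    return df.iloc[game_indices]['name']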

In [20]:
features_recommender(10, 'DOOM')

1294                Painkiller Hell & Damnation
2348        Serious Sam HD: The First Encounter
1458           The Typing of The Dead: Overkill
183                 Call of Duty®: Black Ops II
2283                                 F.E.A.R. 3
139                                        DUSK
9225        Red Faction Guerrilla Steam Edition
2432       Serious Sam HD: The Second Encounter
2721    Dead Island: Riptide Definitive Edition
839                         Doom 3: BFG Edition
Name: name, dtype: object
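The slice `sim_scores[1:top+1]` assumes the queried game sorts first. If an earlier row has identical metadata (a score of 1.0), that row takes the first slot and the queried game itself can appear among the results. A minimal sketch of an alternative that excludes the query by position and avoids sorting the whole row. `top_similar` is a hypothetical helper, not part of the notebook.

```python
import heapq

def top_similar(scores, query_pos, top):
    """Return positions of the `top` highest scores, skipping the query itself."""
    candidates = ((score, pos) for pos, score in enumerate(scores) if pos != query_pos)
    return [pos for score, pos in heapq.nlargest(top, candidates)]

# One row of a similarity matrix; position 2 ties with the query at position 0.
row = [1.0, 0.2, 1.0, 0.7, 0.0]
print(top_similar(row, query_pos=0, top=2))  # [2, 3]
```

Ties are broken in favor of the higher position here, because the `(score, pos)` tuples are compared element by element.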

### **Recommendations based on game descriptions**

In this case, we analyze the text of each game's description with TF-IDF and generate recommendations from the similarity between these descriptions.

In [21]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(min_df=1, stop_words='english')
bag_of_words = vectorizer.fit_transform(df.iloc[:10000]['game_description'])

In [22]:
print('Distinct words: ', bag_of_words.shape[1])

Distinct words:  60072


In [23]:
from sklearn.metrics.pairwise import linear_kernel
cosine_sim2 = linear_kernel(bag_of_words, bag_of_words)
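`linear_kernel` computes plain dot products, yet the result can still be read as cosine similarity: `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), and the dot product of two unit vectors equals their cosine. A minimal sketch of that identity in plain Python, using made-up weights:

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)

def cosine(u, v):
    """Cosine similarity computed from the raw, unnormalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Made-up TF-IDF weights for two documents over a three-word vocabulary.
a = [0.5, 1.2, 0.0]
b = [0.4, 0.9, 2.0]

unit_a, unit_b = l2_normalize(a), l2_normalize(b)
print(math.isclose(dot(unit_a, unit_b), cosine(a, b)))  # True
```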

In [24]:
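def description_recommender(top, game, cosine_sim=cosine_sim2, df=df):
    # Map each name to its row position (not its index label), because
    # cosine_sim is positional; keep the first occurrence of duplicated names.
    indices = pd.Series(range(len(df)), index=df['name'])
    indices = indices[~indices.index.duplicated()]
    idx = indices[game]  # must fall inside the rows used to build cosine_sim
    
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:top+1]  # skip the first entry, normally the game itself
    game_indices = [i[0] for i in sim_scores]
    
    return df['name'].iloc[game_indices]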

In [25]:
description_recommender(10, 'DOOM')

839             Doom 3: BFG Edition
788                        DOOM VFR
366                    DOOM Eternal
2105    DOOM 3 Resurrection of Evil
1652           Hell is Other Demons
96                    Ultimate Doom
7687       The Haunted: Hells Reach
7780                         HordeZ
8548       Hellbound: Survival Mode
8648                   Reflex Arena
Name: name, dtype: object