# 🎮 Game Recommender System
### A Roy Liu Data Science Capstone Project 
- roy.liu@sage.com
- roycliu@gmail.com

### © Dataset Description
Dataset Source: Kaggle
- [Steam Video Games](https://www.kaggle.com/trolukovich/steam-games-complete-dataset)
- [Steam User Behavior](https://www.kaggle.com/tamber/steam-video-games)


#### Context
Steam is the world's most popular PC Gaming hub, with over 6,000 games and a community of millions of gamers. With a massive collection that includes everything from AAA blockbusters to small indie titles, great discovery tools are a highly valuable asset for Steam. How can we make them better?

### 🧰 Import Required Libs

In [1]:
import json
import numpy as np
import pandas as pd
import re
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances, cosine_distances, cosine_similarity

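### 💾 Load & Sanity Check Dataset - Steam User Dataset

- 199,999 data rows
- 5,155 game titles and 12,393 users overall (3,600 titles and 11,350 users once only `play` rows are kept)
- 5 columns
    - __user_id__: User ID
    - __game_title__: Name of the Steam game
    - __behavior__: Behavior name (purchase/play)
    - __hours__: Hours played if behavior is play, 1.0 if behavior is purchase
    - a trailing column (named `zero` below) that is not needed and is dropped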

In [2]:
df_user_behavior = pd.read_csv("../data/steam-200k.csv")

In [3]:
df_user_behavior.shape

(199999, 5)

In [4]:
df_user_behavior.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 199999 entries, 0 to 199998
Data columns (total 5 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   151603712                   199999 non-null  int64  
 1   The Elder Scrolls V Skyrim  199999 non-null  object 
 2   purchase                    199999 non-null  object 
 3   1.0                         199999 non-null  float64
 4   0                           199999 non-null  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 7.6+ MB


In [5]:
df_user_behavior.columns = ['user_id', 'game_title', 'behavior', 'hours', 'zero']
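The `info()` output above lists `151603712`, `The Elder Scrolls V Skyrim`, `purchase`, `1.0` and `0` as column names, which suggests that the CSV has no header row and that its first data row was consumed as the header. A minimal sketch of an alternative, assuming that is the case: pass `header=None` and `names=` to `pd.read_csv` so that no row is lost. The three sample rows mimic the file layout and are used only for illustration.

```python
import io
import pandas as pd

# Three rows in the same layout as steam-200k.csv (assumed to have no header line).
raw = io.StringIO(
    "151603712,The Elder Scrolls V Skyrim,purchase,1.0,0\n"
    "151603712,The Elder Scrolls V Skyrim,play,273.0,0\n"
    "151603712,Fallout 4,purchase,1.0,0\n"
)

cols = ['user_id', 'game_title', 'behavior', 'hours', 'zero']

# header=None tells pandas that the first line is data, not column names.
df_sample = pd.read_csv(raw, header=None, names=cols)

print(df_sample.shape)          # all three rows are kept
print(list(df_sample.columns))
```

Loaded this way, the real file would contain one more row than the 199,999 shown above. The outputs in the rest of this notebook come from the original load.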

In [6]:
# We don't need the last column
df_user_behavior.drop(['zero'], axis='columns', inplace=True)

In [7]:
df_user_behavior.dtypes

user_id         int64
game_title     object
behavior       object
hours         float64
dtype: object

In [8]:
df_user_behavior.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| user_id | 151603712 | 151603712 | 151603712 | 151603712 | 151603712 |
| game_title | The Elder Scrolls V Skyrim | Fallout 4 | Fallout 4 | Spore | Spore |
| behavior | play | purchase | play | purchase | play |
| hours | 273 | 1 | 87 | 1 | 14.9 |


In [9]:
print(df_user_behavior['game_title'].nunique())
print(df_user_behavior['user_id'].nunique())

5155
12393


### 💾 Load & Sanity Check Dataset - Steam Game Dataset

- __url__: URL of a game
- __types__: type of package (app, sub, or bundle)
- __name__: Name of a game
- __desc_snippet__: short description of a game
- __recent_reviews__: recent reviews
- __all_reviews__: all reviews
- __release_date__: release date
- __developer__: developer of a game
- __publisher__: publisher or publishers of a game
- __popular_tags__: tags
- __game_details__: details of a game
- __languages__: supported languages
- __achievements__: number of achievements
- __genre__: genre(s) of a game
- __game_description__: game description
- __mature_content__: description of mature content in a game
- __minimum_requirements__: minimum specs for a game
- __recommended_requirements__: recommended specs for a game
- __original_price__: price without discount
- __discount_price__: price with discount

In [10]:
df_game = pd.read_csv("../data/steam_games.csv")

In [11]:
df_game.shape

(40833, 20)

In [12]:
df_game.dtypes

url                          object
types                        object
name                         object
desc_snippet                 object
recent_reviews               object
all_reviews                  object
release_date                 object
developer                    object
publisher                    object
popular_tags                 object
game_details                 object
languages                    object
achievements                float64
genre                        object
game_description             object
mature_content               object
minimum_requirements         object
recommended_requirements     object
original_price               object
discount_price               object
dtype: object

In [13]:
# Cast titles to plain Python strings; str is the portable spelling of the 'unicode' alias
df_game['title_str'] = df_game['name'].astype(str)

### ✂ Trim Down Game Title DataFrame

For this project, we only need a small portion of steam_games.csv.

Let's copy the columns we need into a smaller DataFrame to speed up processing and to stay within deployment size limits.

In [14]:
df_game_short = df_game[['title_str','url','types','genre','developer']]

In [15]:
df_game_short.set_index('title_str',inplace=True)

In [16]:
df_game_short.loc['EVE Online',:]

url          https://store.steampowered.com/app/8500/EVE_On...
types                                                      app
genre        Action,Free to Play,Massively Multiplayer,RPG,...
developer                                                  CCP
Name: EVE Online, dtype: object

In [17]:
df_game_short.columns

Index(['url', 'types', 'genre', 'developer'], dtype='object')

In [18]:
df_game_short.dtypes

url          object
types        object
genre        object
developer    object
dtype: object

In [19]:
df_game_short.sample(3).T

title_str,DEFCON VR,Space God,Spec Ops: The Line
url,https://store.steampowered.com/app/579040/DEFC...,https://store.steampowered.com/app/637120/Spac...,https://store.steampowered.com/app/50300/Spec_...
types,app,app,app
genre,"Indie,Strategy","Action,Casual,Indie","Action,Adventure"
developer,Introversion Software,Jellypig Games,YAGER


# 👨‍👨‍👧‍👧👩‍👩‍👧‍👧 Popularity-Based Recommender

### 🥇 Most Played 🕙

In [20]:
series_game_title_pop = df_user_behavior[df_user_behavior['behavior']=='play'].groupby(['game_title'])['hours'].sum()

In [21]:
print('Type:' + str(type(series_game_title_pop)))
print('==================================')
df_top10_played = pd.DataFrame(series_game_title_pop.sort_values(ascending=False)[:10])
df_top10_played

Type:<class 'pandas.core.series.Series'>


| game_title | hours |
|---|---|
| Dota 2 | 981684.6 |
| Counter-Strike Global Offensive | 322771.6 |
| Team Fortress 2 | 173673.3 |
| Counter-Strike | 134261.1 |
| Sid Meier's Civilization V | 99821.3 |
| Counter-Strike Source | 96075.5 |
| The Elder Scrolls V Skyrim | 70889.3 |
| Garry's Mod | 49725.3 |
| Call of Duty Modern Warfare 2 - Multiplayer | 42009.9 |
| Left 4 Dead 2 | 33596.7 |


### 🏆 Top Sellers 🎁

In [22]:
series_game_title_sell = df_user_behavior[df_user_behavior['behavior']=='purchase'].groupby(['game_title'])['hours'].sum()

In [23]:
print('Type:' + str(type(series_game_title_sell)))
print('==================================')
df_top10_sell = pd.DataFrame(series_game_title_sell.sort_values(ascending=False)[:10])
df_top10_sell

Type:<class 'pandas.core.series.Series'>


| game_title | hours |
|---|---|
| Dota 2 | 4841.0 |
| Team Fortress 2 | 2323.0 |
| Unturned | 1563.0 |
| Counter-Strike Global Offensive | 1412.0 |
| Half-Life 2 Lost Coast | 981.0 |
| Counter-Strike Source | 978.0 |
| Left 4 Dead 2 | 951.0 |
| Counter-Strike | 856.0 |
| Warframe | 847.0 |
| Half-Life 2 Deathmatch | 823.0 |
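Because `hours` is always 1.0 on purchase rows (see the column description at the top), summing it is equivalent to counting purchase rows, so the `hours` column in this table is really a purchase count. A small sketch of a more explicit count, using invented rows:

```python
import pandas as pd

# Invented rows in the same layout as df_user_behavior.
toy = pd.DataFrame({
    'user_id': [1, 2, 3, 1],
    'game_title': ['Dota 2', 'Dota 2', 'Unturned', 'Unturned'],
    'behavior': ['purchase', 'purchase', 'purchase', 'play'],
    'hours': [1.0, 1.0, 1.0, 4.2],
})

purchases = toy[toy['behavior'] == 'purchase']

# Count purchase rows per title instead of summing the constant 1.0.
purchase_counts = purchases['game_title'].value_counts()

# The sum-based approach used in the cell above gives the same numbers.
purchase_sums = purchases.groupby('game_title')['hours'].sum()

print(purchase_counts)
```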


# 🛒 Item-Based Collaborative Recommender

#### 💡 Tech Tip:
Filtering columns can be achieved using either 
- __drop__( ) function, *or*
- __double-bracket__ selection
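The two options in the tip can be sketched on a toy DataFrame (the values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2],
    'game_title': ['Dota 2', 'Spore'],
    'hours': [10.0, 2.5],
    'zero': [0, 0],
})

# Option 1: drop the columns we do not want.
by_drop = df.drop(columns=['zero'])

# Option 2: select the columns we do want with double brackets.
by_select = df[['user_id', 'game_title', 'hours']]

# Both approaches leave the same three columns.
print(list(by_drop.columns))
print(list(by_select.columns))
```

`drop` is convenient when only a few columns go away, while double-bracket selection makes the kept columns explicit.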

### Create Pivot Table
Since this is an item-based collaborative recommender system, the game title is the pivot index, the user IDs are the columns, and hours played serves as the rating value, on the assumption that the longer a user plays a game, the higher the rating that user would give it.

In [24]:
df_user_behavior = df_user_behavior[df_user_behavior['behavior'] == 'play']

In [25]:
df_user_behavior.sample(5)

Unnamed: 0,user_id,game_title,behavior,hours
69887,99355046,Dota 2,play,2.2
72720,156941467,Left 4 Dead 2,play,13.1
89536,156675178,Far Cry 2,play,2.1
29439,11403772,Deponia,play,1.4
142637,8776918,Counter-Strike Global Offensive,play,11.5


### Remove duplicate rows
We get the following error message when building the pivot table, so let's try to identify and remove duplicates.

🛑 ```Index contains duplicate entries, cannot reshape```

```pivot = pd.pivot(df_user_behavior, index='game_title', columns='user_id', values='hours')```


In [26]:
duplicateRows = df_user_behavior[df_user_behavior.duplicated()]

In [27]:
duplicateRows.shape

(0, 4)

### ❓ No duplicates found
Somehow no duplicate rows are identified, so we will try another approach.

### 🔧 Convert __game_title__ object column to string type

In [28]:
# df_user_behavior['title_str'] = df_user_behavior['game_title'].astype("|S")
df_user_behavior['title_str'] = df_user_behavior['game_title'].astype(str)

In [29]:
print(df_user_behavior['title_str'].dtype)
df_user_behavior['title_str']

object


0         The Elder Scrolls V Skyrim
2                          Fallout 4
4                              Spore
6                  Fallout New Vegas
8                      Left 4 Dead 2
                     ...            
199990                  Fallen Earth
199992                   Magic Duels
199994                   Titan Souls
199996    Grand Theft Auto Vice City
199998                          RUSH
Name: title_str, Length: 70489, dtype: object

In [30]:
df_user_behavior.sample(5).T

Unnamed: 0,7988,37618,149616,92711,74110
user_id,226923581,182961604,16645459,71527252,975449
game_title,Counter-Strike Global Offensive,Realm of the Mad God,The Binding of Isaac,Action! - Gameplay Recording and Streaming,Steel Storm Burning Retribution
behavior,play,play,play,play,play
hours,557,3.7,0.1,83,0.2
title_str,Counter-Strike Global Offensive,Realm of the Mad God,The Binding of Isaac,Action! - Gameplay Recording and Streaming,Steel Storm Burning Retribution


In [31]:
df_user_behavior.dtypes

user_id         int64
game_title     object
behavior       object
hours         float64
title_str      object
dtype: object

In [32]:
df_ub_short = df_user_behavior[['user_id','title_str','hours']]

In [33]:
df_ub_short

Unnamed: 0,user_id,title_str,hours
0,151603712,The Elder Scrolls V Skyrim,273.0
2,151603712,Fallout 4,87.0
4,151603712,Spore,14.9
6,151603712,Fallout New Vegas,12.1
8,151603712,Left 4 Dead 2,8.9
...,...,...,...
199990,128470551,Fallen Earth,2.4
199992,128470551,Magic Duels,2.2
199994,128470551,Titan Souls,1.5
199996,128470551,Grand Theft Auto Vice City,1.5


In [34]:
pivot_game = pd.pivot_table(df_ub_short, index='title_str', columns='user_id', values='hours')

In [35]:
pivot_game.shape

(3600, 11350)

#### 🤨 The pivot table is created
Somehow the same error does not show up if we
- convert the __game_title__ column separately
- drop all unnecessary columns

😒 P.S. Although the problem is resolved, I am not yet sure what exactly fixed it or what the root cause was.

I will drill down to the root cause before the capstone project due date if time permits, or after it otherwise.
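One possible explanation, offered as a hypothesis rather than a confirmed root cause: `DataFrame.duplicated()` with no arguments only flags rows that match in every column, while `pd.pivot` raises as soon as one (index, columns) pair repeats, even if the `hours` values differ. `pd.pivot_table`, used in cell 34, aggregates repeated pairs (with the mean by default) instead of raising. The toy data below is invented to show the difference.

```python
import pandas as pd

# Same user and game twice, but with different hours (invented values).
toy = pd.DataFrame({
    'user_id': [1, 1, 2],
    'title_str': ['Dota 2', 'Dota 2', 'Dota 2'],
    'hours': [1.0, 3.0, 5.0],
})

# No row is identical in every column, so this finds nothing.
full_dupes = toy[toy.duplicated()]

# Restricting the check to the pivot keys reveals the repeated pair.
key_dupes = toy[toy.duplicated(subset=['user_id', 'title_str'], keep=False)]

# pd.pivot refuses to reshape repeated (index, columns) pairs.
try:
    pd.pivot(toy, index='title_str', columns='user_id', values='hours')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pd.pivot_table aggregates them instead (default aggfunc is the mean).
table = pd.pivot_table(toy, index='title_str', columns='user_id', values='hours')

print(len(full_dupes), len(key_dupes), pivot_failed)
print(table)
```

Running `df_user_behavior.duplicated(subset=['user_id', 'game_title']).sum()` on the real data would confirm or refute this hypothesis.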

In [36]:
pivot_game.sample(5)

user_id,5250,76767,86540,144736,181212,229911,298950,381543,547685,554278,...,309228590,309255941,309262440,309265377,309404240,309434439,309554670,309626088,309824202,309903146
title_str,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Eldevin,,,,,,,,,,,...,,,,,,,,,,
Battle Academy,,,,,,,,,,,...,,,,,,,,,,
Super Crate Box,,,,,,,,,,,...,,,,,,,,,,
Hamilton's Great Adventure,,,,,,,,,,,...,,,,,,,,,,
Thief 2,,,,,,,,,,,...,,,,,,,,,,


### Create Sparse Matrix

Before we can calculate the cosine similarity, let's create a sparse matrix.
### 💡 [What is a Sparse Matrix?](http://www.btechsmartclass.com/data_structures/sparse-matrix.html)
<img src="../assets/Sparse_Matrix.png">

In [37]:
sparse_pivot = sparse.csr_matrix(pivot_game.fillna(0))

In [38]:
print(type(sparse_pivot))
print(sparse_pivot[:5])

<class 'scipy.sparse.csr.csr_matrix'>
  (0, 949)	0.7
  (1, 183)	0.6
  (1, 2286)	0.3
  (1, 2571)	0.3
  (2, 1070)	2.4
  (2, 1096)	5.0
  (2, 1320)	11.2
  (2, 1545)	1.2
  (2, 2007)	0.2
  (3, 183)	0.5
  (3, 3888)	5.4
  (4, 634)	3.6


### Calculate Cosine Similarity

__pairwise_distances__ will return a square matrix comparing every game with every other game in the dataset 

In [39]:
dists = pairwise_distances(sparse_pivot, metric='cosine')

In [40]:
dists

array([[0.        , 1.        , 1.        , ..., 1.        , 1.        ,
        1.        ],
       [1.        , 0.        , 1.        , ..., 1.        , 0.9960849 ,
        1.        ],
       [1.        , 1.        , 0.        , ..., 1.        , 1.        ,
        1.        ],
       ...,
       [1.        , 1.        , 1.        , ..., 0.        , 0.99744235,
        1.        ],
       [1.        , 0.9960849 , 1.        , ..., 0.99744235, 0.        ,
        0.97026777],
       [1.        , 1.        , 1.        , ..., 1.        , 0.97026777,
        0.        ]])

#### 💡 Cosine Distance vs Similarity
__Distance__ is not the same as __Similarity__. For example, a similarity of 1 is a distance of 0!

Cosine distance is defined as 1 - cosine similarity, so the similarity is 1 - dist. To compute it directly, we can use cosine_similarity instead.

In [41]:
similarities = cosine_similarity(sparse_pivot)

In [42]:
similarities

array([[1.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 0.        , ..., 0.        , 0.0039151 ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.00255765,
        0.        ],
       [0.        , 0.0039151 , 0.        , ..., 0.00255765, 1.        ,
        0.02973223],
       [0.        , 0.        , 0.        , ..., 0.        , 0.02973223,
        1.        ]])

### ✔ Verify that `similarity = 1 - dist` holds

In [43]:
np.all(np.isclose((1.0 - dists), similarities))

True
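The same relationship can be checked by hand on toy vectors. This is a minimal sketch with invented hours; the helper `cosine_sim` is a hypothetical name, not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, cosine_distances

# Three toy "games", each described by hours from three users (invented values).
a = np.array([1.0, 0.0, 2.0])
b = np.array([2.0, 0.0, 4.0])   # same direction as a, twice the hours
c = np.array([0.0, 3.0, 0.0])   # no users in common with a

def cosine_sim(u, v):
    # Dot product divided by the product of the vector lengths.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

X = np.vstack([a, b, c])
toy_sims = cosine_similarity(X)
toy_dists = cosine_distances(X)

print(round(cosine_sim(a, b), 6), round(cosine_sim(a, c), 6))
print(np.allclose(1.0 - toy_dists, toy_sims))
```

Cosine similarity ignores magnitude: `a` and `b` are treated as identical even though `b` has twice the hours, while `a` and `c` share no players and get a similarity of 0.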

## 📊 Create Distances DataFrame

In [44]:
recommender_df = pd.DataFrame(dists, columns=pivot_game.index, index=pivot_game.index)

In [45]:
recommender_df.head().T

title_str,007 Legends,0RBITALIS,1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby),10 Second Ninja,"10,000,000"
title_str,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
007 Legends,0.0,1.000000,1.000000,1.00000,1.0
0RBITALIS,1.0,0.000000,1.000000,0.92472,1.0
1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby),1.0,1.000000,0.000000,1.00000,1.0
10 Second Ninja,1.0,0.924720,1.000000,0.00000,1.0
"10,000,000",1.0,1.000000,1.000000,1.00000,0.0
...,...,...,...,...,...
rymdkapsel,1.0,1.000000,1.000000,1.00000,1.0
sZone-Online,1.0,1.000000,0.998421,1.00000,1.0
the static speaks my name,1.0,1.000000,1.000000,1.00000,1.0
theHunter,1.0,0.996085,1.000000,1.00000,1.0


### 📈 Evaluate Recommender Performance

In [46]:
pivot_game.index

Index(['007 Legends', '0RBITALIS',
       '1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby)',
       '10 Second Ninja', '10,000,000', '100% Orange Juice', '1000 Amps',
       '12 Labours of Hercules', '12 Labours of Hercules II The Cretan Bull',
       '12 Labours of Hercules III Girl Power',
       ...
       'rFactor', 'rFactor 2', 'realMyst', 'realMyst Masterpiece Edition',
       'resident evil 4 / biohazard 4', 'rymdkapsel', 'sZone-Online',
       'the static speaks my name', 'theHunter', 'theHunter Primal'],
      dtype='object', name='title_str', length=3600)

In [47]:
print(type(pivot_game.index))

<class 'pandas.core.indexes.base.Index'>


In [48]:
print(type(pivot_game))
pivot_game

<class 'pandas.core.frame.DataFrame'>


user_id,5250,76767,86540,144736,181212,229911,298950,381543,547685,554278,...,309228590,309255941,309262440,309265377,309404240,309434439,309554670,309626088,309824202,309903146
title_str,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
007 Legends,,,,,,,,,,,...,,,,,,,,,,
0RBITALIS,,,,,,,,,,,...,,,,,,,,,,
1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby),,,,,,,,,,,...,,,,,,,,,,
10 Second Ninja,,,,,,,,,,,...,,,,,,,,,,
"10,000,000",,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
rymdkapsel,,,,,,,,,,,...,,,,,,,,,,
sZone-Online,,,,,,,,,,,...,,,,,,,,,,
the static speaks my name,,,,,,,,,,,...,,,,,,,,,,
theHunter,,,,,,,,,0.2,,...,,,,,,,,,,


In [49]:
q = 'city'
titles = pivot_game[pivot_game.index.str.contains(q, flags=re.IGNORECASE)].index
genre = 'Adventure'
print('Keyword search result:')
for title in titles:
    try:
        item = df_game_short.loc[title,['url','types','genre']]
#           if not item.empty:
        if not item.empty and genre.lower() in item[-1].lower():
#             print(item[-1])
            print(item)
            print('-----------------------------')
    except KeyError:
        pass
print('-----------------------------')

Keyword search result:
url      https://store.steampowered.com/app/271820/Card...
types                                                  app
genre                               Adventure,Casual,Indie
Name: Card City Nights, dtype: object
-----------------------------
url      https://store.steampowered.com/app/205020/Lumi...
types                                                  app
genre                               Adventure,Casual,Indie
Name: Lumino City, dtype: object
-----------------------------
url      https://store.steampowered.com/app/326180/Sini...
types                                                  app
genre                               Adventure,Casual,Indie
Name: Sinister City, dtype: object
-----------------------------
-----------------------------


In [50]:
def similarGame(q):
    titles = pivot_game[pivot_game.index.str.contains(q, flags=re.IGNORECASE)].index
    for title in titles:
        print(title)
        print('Average hours played per player', round(pivot_game.loc[title, :].mean(), 2))
        print('Number of players', pivot_game.T[title].count())
        print('')
        print('10 closest games - ranked by similarity distance')
        print(recommender_df[title].sort_values()[1:11])
        print('')
        print('***************************************************************')
        print('')
    print('done....')


In [51]:
q = 'maze' #'Ball', 'Battle', 'Night', 'City', 'NBA', 'Hockey', 'treasure'
similarGame(q)

Famaze
Average hours played per player 0.35
Number of players 2

10 closest games - ranked by similarity distance
title_str
Heavy Bullets                            0.200367
Kung Fury                                0.319688
Escape Machines                          0.400000
Infect and Destroy                       0.400000
Sunrider Academy                         0.400000
Anomaly Warzone Earth Mobile Campaign    0.400000
Spooky Cats                              0.400000
Taxi                                     0.400000
GamersGoMakers                           0.400000
Buzz Aldrin's Space Program Manager      0.400000
Name: Famaze, dtype: float64

***************************************************************

Fatty Maze's Adventures
Average hours played per player 6.2
Number of players 1

10 closest games - ranked by similarity distance
title_str
Shiny The Firefly              0.0
Fatty Maze's Adventures        0.0
Stonerid                       0.0
Amazing Princess Sarah         0.0
S
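In the truncated output above, `Fatty Maze's Adventures` appears in its own list: several games tie at distance 0.0, so the query title is not guaranteed to sort first and the `[1:11]` slice does not reliably skip it. A hedged sketch of one alternative is to drop the query title by label before sorting. The helper name `closest_games` and the toy distances are hypothetical.

```python
import pandas as pd

titles = ['A', 'B', 'C', 'D']
# Invented symmetric distance matrix; 'A', 'B' and 'C' all tie at distance 0.
toy_dists = pd.DataFrame(
    [[0.0, 0.0, 0.0, 0.9],
     [0.0, 0.0, 0.0, 0.8],
     [0.0, 0.0, 0.0, 0.4],
     [0.9, 0.8, 0.4, 0.0]],
    index=titles, columns=titles,
)

def closest_games(dist_df, title, n=10):
    """Return the n nearest titles to `title`, never including `title` itself."""
    return dist_df[title].drop(labels=title).sort_values().head(n)

result = closest_games(toy_dists, 'C', n=2)
print(result)
```

The many exact 0.0 ties in the real output come from games with a single player, so a minimum-player threshold might also be worth trying.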

## 💽 Save to CSV for Remote Deployment

🗜 After the performance is verified, we save the DataFrame to CSV so that it is portable for deployment.

In [52]:
# compression_opts = dict(method='zip', archive_name='recommender_df.csv')  
# recommender_df.to_csv('../data/recommender_df.zip', compression=compression_opts) 

In [53]:
# print(type(recommender_df))

In [54]:
# with open('../data/recommender_df.json', 'w') as fp:
#     json.dump(recommender_df, fp)
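The cells above are commented out. As a minimal sketch of the zipped-CSV route, the round trip below writes a small stand-in for `recommender_df` (invented values) to a temporary directory and reads it back. The temporary path is an assumption for illustration, not the project's `../data` folder.

```python
import os
import tempfile
import pandas as pd

# Small stand-in for recommender_df (invented values).
toy = pd.DataFrame(
    [[0.0, 0.4], [0.4, 0.0]],
    index=pd.Index(['Famaze', 'Taxi'], name='title_str'),
    columns=['Famaze', 'Taxi'],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'recommender_df.zip')

    # Write a single CSV inside a zip archive.
    toy.to_csv(path, compression=dict(method='zip', archive_name='recommender_df.csv'))

    # Read it back; index_col=0 restores the title index.
    restored = pd.read_csv(path, index_col=0)

print(restored)
```

Note that `json.dump` cannot serialize a DataFrame directly; `DataFrame.to_json` is the pandas method to use if JSON output is preferred.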

### 🛠 Merge User Behavior & Game Title DataFrames

🚧 This section is only an experiment to see whether I can minimize the data uploaded to the deployment target (e.g. Heroku) by merging multiple DataFrames together.
Somehow slice() does not work for a function that needs only part of the DataFrame, given the column names in the pivot table.

🚑 Further investigation is needed to figure out how.

In [55]:
pivot_game.sample()

user_id,5250,76767,86540,144736,181212,229911,298950,381543,547685,554278,...,309228590,309255941,309262440,309265377,309404240,309434439,309554670,309626088,309824202,309903146
title_str,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Reign Of Kings,,,,,,,,,,,...,,,,,,,,,,


In [56]:
df_game_short.sample()

Unnamed: 0_level_0,url,types,genre,developer
title_str,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Cash Grab - Hat Pack,https://store.steampowered.com/app/817910/Cash...,app,"Action,Casual,Indie,Massively Multiplayer",Greedy Developer


In [57]:
df_merge = df_game_short.merge(pivot_game, left_index=True, right_index=True)
# df_merge = pivot_game.merge(df_game_short, left_index=True, right_index=True)
# df_merge.set_index('title_str',inplace=True)
# pivot_game.reset_index().merge(df_game_short, how="left").set_index('index')
# df_game_short['copy_index'] = df_game_short['title_str'] 
# pivot_game['copy_index'] = pivot_game.index
# df_merged = pivot_game.merge(df_game_short, how='left', on='copy_index')

In [58]:
df_merge.sample()

Unnamed: 0_level_0,url,types,genre,developer,5250,76767,86540,144736,181212,229911,...,309228590,309255941,309262440,309265377,309404240,309434439,309554670,309626088,309824202,309903146
title_str,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Contrast,https://store.steampowered.com/app/224460/Cont...,app,"Adventure,Indie",Compulsion Games,,,,,,,...,,,,,,,,,,
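Note that `merge` defaults to an inner join, so `df_merge` keeps only titles that appear in both `df_game_short` and `pivot_game`; titles spelled differently in the two datasets drop out silently. The sketch below uses invented frames to show how `how='outer'` with `indicator=True` can reveal which titles matched.

```python
import pandas as pd

# Invented stand-ins: game metadata and a tiny pivot table, both indexed by title.
meta = pd.DataFrame(
    {'genre': ['Indie', 'Action']},
    index=pd.Index(['Famaze', 'Taxi'], name='title_str'),
)
pivot = pd.DataFrame(
    {101: [0.5, None], 102: [None, 2.0]},
    index=pd.Index(['Famaze', 'Contrast'], name='title_str'),
)

# Default how='inner' keeps only titles present in both frames.
inner = meta.merge(pivot, left_index=True, right_index=True)

# how='outer' with indicator=True shows where each title came from.
outer = meta.merge(pivot, left_index=True, right_index=True,
                   how='outer', indicator=True)

print(list(inner.index))
print(outer['_merge'].value_counts())
```

Comparing `len(df_merge)` with `len(pivot_game)` on the real data would show how many of the 3,600 played titles found a match in the game dataset.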


In [59]:
def newSimilarGame(q):
    titles = df_merge[df_merge.index.str.contains(q, flags=re.IGNORECASE)].index
    for title in titles:
        print(title)
        print(df_merge.loc[title,['genre','url']])
        print('Average hours played per player', round(pivot_game.loc[title,:].mean(), 2))
        print('Number of players', pivot_game.T[title].count())
        print('')
        print('10 closest games - ranked by similarity Distance')
        print(recommender_df[title].sort_values()[1:11])
        print('')
        print('***************************************************************')
#         print('')
    print('done....')

In [60]:
# def recommendTitles(title):
#     titles = recommender_df[title].sort_values()[1:20]
#     counter = 0
#     for title in titles:
        

In [61]:
q = 'maze' #'Ball', 'Battle', 'Night', 'City', 'NBA', 'Hockey', 'treasure'
newSimilarGame(q)

Famaze
genre               Casual,Free to Play,Indie,RPG,Strategy
url      https://store.steampowered.com/app/297210/Famaze/
Name: Famaze, dtype: object
Average hours played per player 0.35
Number of players 2

10 closest games - ranked by similarity Distance
title_str
Heavy Bullets                            0.200367
Kung Fury                                0.319688
Escape Machines                          0.400000
Infect and Destroy                       0.400000
Sunrider Academy                         0.400000
Anomaly Warzone Earth Mobile Campaign    0.400000
Spooky Cats                              0.400000
Taxi                                     0.400000
GamersGoMakers                           0.400000
Buzz Aldrin's Space Program Manager      0.400000
Name: Famaze, dtype: float64

***************************************************************
Fatty Maze's Adventures
genre                    Adventure,Casual,Indie,Simulation
url      https://store.steampowered.com/app/349780/Fa