# Social Computing/Social Gaming - Summer 2020

# Exercise Sheet 3: Collaborative Filtering with Steam Games

In this exercise, we will build a collaborative filtering recommender system using data gathered from Steam. Starting from your friends list (and the friends lists of your friends), we collect for each Steam ID the games owned and the time spent playing each of them.

Usually, collaborative filtering is based on some sort of rating to determine the similarity between users. For games, however, how much players enjoy a game and how they rate it do not always match. Additionally, only about 10% of players actually rate the games they play, which would make for a very incomplete dataset. Therefore, playtime is used instead of ratings. This has the added benefit that playtime is usually the most authentic metric of enjoyment, as players are very unlikely to spend much time on a game they don't enjoy.

## Task 3.1: Obtaining the data


**1.** Your first task is to gather the data needed to create the recommender system. Create a data structure that holds the needed information for each player and game.  
**Note:** The Steam API only returns your friends list and your owned games if your profile is set to public. 

If you do not have a Steam profile, you can use the default values. 
However, we encourage you to use your own profile. 

**Hint**: To obtain the games a user owns, use this: `games = data['response']['games']`. This returns a list of games, including the playtime (in minutes) which can be retrieved like this: `playtime = game['playtime_forever']` , where game refers to an item from the list of games. 
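To illustrate the hint, here is a minimal sketch that turns such a response into the per-player dictionary used below (game name mapped to playtime in minutes). The payload is invented and only assumes the fields named in the hint (`response`, `games`, `name`, `playtime_forever`); the helper `played_games` and the Steam ID are hypothetical. Games with zero playtime are skipped, as in the solution below.

```python
# Hypothetical payload with the structure described in the hint;
# the game names and playtimes are invented for illustration.
sample = {
    "response": {
        "game_count": 3,
        "games": [
            {"name": "Game A", "playtime_forever": 120},
            {"name": "Game B", "playtime_forever": 0},
            {"name": "Game C", "playtime_forever": 45},
        ],
    }
}


def played_games(data):
    """Map game name -> playtime in minutes, ignoring games that were never played."""
    response = data.get("response", {})
    # Private profiles return an empty "response" object without a "games" list
    return {
        game["name"]: game["playtime_forever"]
        for game in response.get("games", [])
        if game["playtime_forever"] != 0
    }


# One entry per (hypothetical) Steam ID
users_gamedicts_example = {"76561198000000000": played_games(sample)}
print(users_gamedicts_example)
```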

```python
# Use this cell if you want to work with the default IDs
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen

import requests

# You can replace these values with your own ID and API key
key = "CB35B8F8DCE9135DDAA3B0328FCE0103"
id = "76561198329838242"

# Profile summary of the start user (not needed for the steps below)
url = "http://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key=" + key + "&steamids=" + id
player_summary = requests.get(url).json()

# Get the friends list of the start user
friendlist_url = ("http://api.steampowered.com/ISteamUser/GetFriendList/v0001/"
                  "?key={}&steamid={}&relationship=friend")
data = json.loads(urlopen(Request(friendlist_url.format(key, id))).read())
friendids = [friend['steamid'] for friend in data['friendslist']['friends']]
print(len(friendids))

# Get the friends of friends. Private friend lists answer with HTTP 401; such
# friends are skipped instead of re-processing the previous response.
tempIDs = []
for friendID in friendids:
    try:
        data = json.loads(urlopen(Request(friendlist_url.format(key, friendID))).read())
    except HTTPError as e:
        print(e.code)
        continue
    except json.JSONDecodeError:
        print('could not decode')
        continue
    tempIDs += [friend['steamid'] for friend in data['friendslist']['friends']]

# Merge both lists and drop duplicates; dict.fromkeys keeps the order, so the
# direct friends come first when the list is trimmed in the next cell
friendids = list(dict.fromkeys(friendids + tempIDs))
print(len(friendids))
```


```
64
401
couldnt decode
401
couldnt decode
401
couldnt decode
401
couldnt decode
401
couldnt decode
401
couldnt decode
401
couldnt decode
401
couldnt decode
401
couldnt decode
401
couldnt decode
6655
```


```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Trim the list of IDs to a reasonable size
if len(friendids) > 250:
    friendids = friendids[:250]
print(len(friendids))

users_gamedicts = {}  # Steam ID -> {game name: playtime in minutes} for every user


def get_played_games(steamid):
    """Return {game name: playtime} of all played games, or None if no games are available."""
    request = Request("http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key=" + key
                      + "&steamid=" + steamid + "&include_appinfo=1&format=json")
    try:
        data = json.loads(urlopen(request).read())
    except HTTPError as e:
        print(e.code)
        return None
    except json.JSONDecodeError:
        print('could not decode')
        return None
    response = data['response']
    # Private profiles return an empty response object
    if not response or response.get('game_count', 0) == 0:
        return None
    # Hint 1: data['response']['games'] lists the owned games
    # Hint 2: game['playtime_forever'] is the playtime in minutes; unplayed games are skipped
    return {game['name']: game['playtime_forever']
            for game in response['games'] if game['playtime_forever'] != 0}


# Your own games; your ID is stored as an int because later cells look it up that way
own_games = get_played_games(id)
if own_games is not None:
    users_gamedicts[int(id)] = own_games
assert len(users_gamedicts) == 1
print(len(users_gamedicts))
print(users_gamedicts)

# Do the same for your friends and the friends of your friends (their IDs stay strings)
for friendID in friendids:
    if friendID == id:
        continue  # you can appear in your friends' friends lists
    games = get_played_games(friendID)
    if games is not None:
        users_gamedicts[friendID] = games

assert len(users_gamedicts) <= len(friendids) + 1
```


```
250
1
{76561198329838242: {'Path of Exile': 16353, 'Europa Universalis IV': 113452, 'Titan Quest Anniversary Edition': 10354, 'Black Desert Online': 3697, 'Crusader Kings II': 5896}}
```


## Task 3.2: Association rule mining

Before we start with the "real" recommender system, let us take a look at a more general form of recommending items using association rules.

The concept of association rule mining is rather simple: Looking at an itemset, one tries to find dependencies between items that could "belong together". A common example would be buying food at the store: If, for example, meat and salt are bought together often, but meat without salt not that often, it is assumed that there is a connection between those two. For games, if it was found that most of the users who own the demo version of a game also own the full version of that game, it would be a reasonable assumption that these users liked the demo and therefore bought the full version.


Let us first cover the mathematical basis for association rules. The most important metrics are **support**, **confidence** and **lift**. The support of an itemset $X$ is the fraction of transactions (here: the users' game lists) that contain all items of $X$; the confidence of a rule $X \Rightarrow Y$ is the support of $X \cup Y$ divided by the support of $X$. Lift measures how much more often $X$ and $Y$ occur together than would be expected if they were independent. Written down mathematically, for a set of transactions $T$:

$$\mathrm{supp}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|}$$

$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$

$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}$$



It is important to note that support refers to an item or a set of items, while confidence and lift refer to a rule. Also note that a lift of 1 means that $X$ and $Y$ occur independently of each other, a lift greater than 1 indicates a positive correlation, and a lift below 1 a negative one.
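As a small worked example of the three formulas, the sketch below computes them by hand on four made-up game lists that follow the demo/full-version example above. The helper names `support`, `confidence` and `lift` are hypothetical and not part of mlxtend; the library reports the same quantities as columns for every rule it finds.

```python
# Toy transactions: each inner list holds the games one (hypothetical) user owns
transactions = [
    ["Demo", "Full"],
    ["Demo", "Full", "Other"],
    ["Full"],
    ["Other"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    return support(set(x) | set(y), transactions) / (
        support(x, transactions) * support(y, transactions)
    )


print(support(["Demo"], transactions))               # 0.5
print(confidence(["Demo"], ["Full"], transactions))  # 1.0
print(lift(["Demo"], ["Full"], transactions))        # about 1.33
```

Every owner of the demo also owns the full version (confidence 1.0), but since 75% of all users own the full version anyway, the lift is only 0.5 / (0.5 * 0.75), roughly 1.33.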


**1.** Your task here is to first convert the dictionary you created into a list of lists, as this is the input format required by the algorithm. Then, print the frequent itemsets, using the `min_support` parameter as the threshold. Finally, print the association rules and play around with the threshold value to get a reasonable number of rules. 


**2.** Discuss your results and try to answer the following questions: What kind of recommendations can be made? What does a confidence of 1.0 mean, and is it meaningful for recommending games? Can you spot a correlation between the games with the highest support and the rules with the highest confidence? How does this affect the lift?  
**Hint:** Play around with the threshold values until you get a reasonable number of rows (4-30) as output.

```python
# Convert the dictionaries into a list of lists: one list of played game names per user
gamesofallusers = [list(gamedict.keys()) for gamedict in users_gamedicts.values()]
assert len(gamesofallusers) == len(users_gamedicts)

# Remove common Steam entries that are not games
non_games = {
    'Dota 2 Test',
    'True Sight',
    'True Sight: Episode 1',
    'True Sight: Episode 2',
    'True Sight: Episode 3',
    'True Sight: The Kiev Major Grand Finals',
    'True Sight: The International 2017',
    'True Sight: The International 2018 Finals',
}
gamesofallusers = [[game for game in games if game not in non_games] for games in gamesofallusers]
```

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder

# One-hot encode the game lists: one row per user, one boolean column per game
te = TransactionEncoder()
te_ary = te.fit(gamesofallusers).transform(gamesofallusers)
df = pd.DataFrame(te_ary, columns=te.columns_)

# TODO: Tinker around with the values
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
frequent_itemsets
```

| | support | itemsets |
|---|---|---|
| 0 | 0.746479 | (Counter-Strike: Global Offensive) |
| 1 | 0.408451 | (Left 4 Dead 2) |
| 2 | 0.464789 | (PAYDAY 2) |
| 3 | 0.450704 | (PLAYERUNKNOWN'S BATTLEGROUNDS) |
| 4 | 0.338028 | (Paladins) |
| 5 | 0.394366 | (Left 4 Dead 2, Counter-Strike: Global Offensive) |
| 6 | 0.43662 | (PAYDAY 2, Counter-Strike: Global Offensive) |
| 7 | 0.422535 | (Counter-Strike: Global Offensive, PLAYERUNKNO... |
| 8 | 0.323944 | (PAYDAY 2, Left 4 Dead 2) |
| 9 | 0.323944 | (PAYDAY 2, PLAYERUNKNOWN'S BATTLEGROUNDS) |


```python
from mlxtend.frequent_patterns import association_rules

# TODO: Play around with the threshold value
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.75)
```

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (Left 4 Dead 2) | (Counter-Strike: Global Offensive) | 0.408451 | 0.746479 | 0.394366 | 0.965517 | 1.293429 | 0.089466 | 7.352113 |
| 1 | (PAYDAY 2) | (Counter-Strike: Global Offensive) | 0.464789 | 0.746479 | 0.43662 | 0.939394 | 1.258433 | 0.089665 | 4.183099 |
| 2 | (PLAYERUNKNOWN'S BATTLEGROUNDS) | (Counter-Strike: Global Offensive) | 0.450704 | 0.746479 | 0.422535 | 0.9375 | 1.255896 | 0.086094 | 4.056338 |
| 3 | (Left 4 Dead 2) | (PAYDAY 2) | 0.408451 | 0.464789 | 0.323944 | 0.793103 | 1.706374 | 0.1341 | 2.586854 |
| 4 | (PAYDAY 2, Left 4 Dead 2) | (Counter-Strike: Global Offensive) | 0.323944 | 0.746479 | 0.309859 | 0.956522 | 1.281378 | 0.068042 | 5.830986 |
| 5 | (Left 4 Dead 2, Counter-Strike: Global Offensive) | (PAYDAY 2) | 0.394366 | 0.464789 | 0.309859 | 0.785714 | 1.690476 | 0.126562 | 2.497653 |
| 6 | (Left 4 Dead 2) | (PAYDAY 2, Counter-Strike: Global Offensive) | 0.408451 | 0.43662 | 0.309859 | 0.758621 | 1.737486 | 0.131522 | 2.334004 |


**TODO: Write your observations here:**
1. From the frequent itemsets we can conclude that, out of all friends and friends of friends of the default user that I considered, approximately 75% have played 'Counter-Strike: Global Offensive' and about 46% have played 'PAYDAY 2'.
2. From the association rules table we can see that 96.6% of the users who have played 'Left 4 Dead 2' have also played 'Counter-Strike: Global Offensive' (confidence 0.966), so recommending 'Counter-Strike: Global Offensive' to a 'Left 4 Dead 2' player is well supported by the data. Confidence only describes co-ownership, though, not the probability that the user will accept the recommendation. Since about 75% of all users own 'Counter-Strike: Global Offensive' anyway, the lift of this rule is only 1.29.
3. A lift of 1 indicates that antecedent and consequent are independent, and a lift below 1 indicates a negative association. In our table, all rules have a lift greater than 1, i.e. positive associations. Rule 6, with 'Left 4 Dead 2' as the antecedent and 'PAYDAY 2, Counter-Strike: Global Offensive' as the consequent, has the highest lift (1.737): users who have played 'Left 4 Dead 2' are about 1.74 times as likely to have played both 'PAYDAY 2' and 'Counter-Strike: Global Offensive' as the average user.
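The lift of rule 6 can be checked by hand from the rules table, since lift equals confidence divided by the support of the consequent. This sketch only re-uses numbers printed above; deviations in the last digits come from the table's rounding.

```python
# Values copied from row 6 of the association-rules table
antecedent_support = 0.408451  # supp(Left 4 Dead 2)
consequent_support = 0.43662   # supp(PAYDAY 2, Counter-Strike: Global Offensive)
rule_support = 0.309859        # supp of all three games together

confidence = rule_support / antecedent_support
lift = confidence / consequent_support  # same as supp(X u Y) / (supp(X) * supp(Y))
leverage = rule_support - antecedent_support * consequent_support

print(round(confidence, 4), round(lift, 4), round(leverage, 4))
```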


## Task 3.3: The Recommender System: Similarity Score


Finally, it is time to build the recommender system. 

**1.** The first thing to do is to implement a similarity score that will be used to predict a user's playtime for an unowned game. We base the similarity of two users on the relative difference between their playtimes on the games they have both played, using the following distance:

$$d(u, v) = \sum_{i \in \mathrm{common\_games}} \frac{|r_{u,i} - r_{v,i}|}{r_{v,i}}$$

where $u$ and $v$ are users and $r_{u,i}$ is the playtime of user $u$ for game $i$. 

You can then return the similarity with  
$$ w_{u,v} = \frac{1}{1 + d(u, v)} $$

**Note:** If no common games exist return 0.
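A minimal sketch of the two formulas on hypothetical playtimes (the users `alice` and `bob` and the game names are made up). It works on plain dictionaries instead of the global `users_gamedicts`, and the helper name `similarity` is illustrative only.

```python
def similarity(games_u, games_v):
    """w_{u,v} = 1 / (1 + d(u, v)); 0 if the two users share no played game."""
    # Common games with a non-zero playtime on both sides (avoids division by zero)
    common_games = [g for g in games_u
                    if g in games_v and games_u[g] != 0 and games_v[g] != 0]
    if not common_games:
        return 0
    # d(u, v): relative playtime difference, normalised by v's playtime
    d = sum(abs(games_u[g] - games_v[g]) / games_v[g] for g in common_games)
    return 1 / (1 + d)


# Hypothetical playtimes in minutes
alice = {"Game A": 100, "Game B": 50}
bob = {"Game A": 200, "Game B": 50, "Game C": 10}

print(similarity(alice, bob))            # d = 100/200 + 0/50 = 0.5, so w = 1/1.5
print(similarity(bob, alice))            # d = 100/100 + 0/50 = 1.0, so w = 0.5
print(similarity(alice, {"Game Z": 5}))  # no common games, so 0
```

Because $d(u, v)$ divides by $v$'s playtime, $w_{u,v}$ and $w_{v,u}$ generally differ, so the order of the arguments matters when the score is used for predictions.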

```python
# Here we calculate the similarity score between two users based on their common games
def calculate_similarity(user1ID, user2ID):
    user1games = users_gamedicts[user1ID]
    user2games = users_gamedicts[user2ID]
    # Games both users have played (non-zero playtime avoids division by zero)
    common_games = [game for game in user1games
                    if game in user2games and user1games[game] != 0 and user2games[game] != 0]
    if not common_games:
        return 0  # no common games: no similarity

    # d(u, v): sum of relative playtime differences, normalised by user 2's playtime
    duv = sum(abs(user1games[game] - user2games[game]) / user2games[game]
              for game in common_games)
    return 1 / (1 + duv)
```

## Task 3.4: Recommender System: Predict ratings

With the similarity score calculated, we can now predict a user's playtime for games they don't own.   
**1.** First, we create a set of all games, but remove all games that have been played by fewer than 3 players. The reason is simple: if only 1 or 2 players own a game, there is not enough data to derive a meaningful prediction. 

The predicted playtime for a game is computed analogously to the predicted rating of a movie/item in a conventional collaborative filtering recommender system:

$$r_{u,i} = \frac{\sum_{v \in N_i(u)} w_{u,v}r_{v,i}}{\sum_{v \in N_i(u)} w_{u,v}}$$

where 
- $r_{u,i}$ is the estimated rating (here: playtime) of item $i$ for target user $u$. 
- $N_i(u)$ is the set of similar users to target user $u$ for the designated item $i$. 
- $w_{u,v}$ is the similarity score between users $u$ and $v$ (used as a weighting factor).  

**Note:** In our case, we use playtime as the rating, and the candidate neighbours of user $u$ are their friends and friends of friends. This pool is the same for every game, but only neighbours who have played game $i$ contribute to its prediction, so $N_i(u)$ is the subset of that pool who have played $i$.
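A minimal sketch of the prediction formula on hypothetical neighbours (`v1`, `v2`, `v3` and all numbers are made up; `predict_playtime` is an illustrative name). Only neighbours who have played the game enter the sums, and if nobody did, the sketch returns 0, mirroring the convention used in the solution below.

```python
def predict_playtime(similarities, playtimes):
    """Similarity-weighted average playtime of the neighbours who played the game.

    similarities: {user: w_uv}; playtimes: {user: r_vi} for users who played game i.
    """
    num = sum(similarities[v] * r for v, r in playtimes.items())
    denom = sum(similarities[v] for v in playtimes)
    return num / denom if denom > 0 else 0


# Hypothetical neighbours of user u and their playtime (minutes) for one unowned game
w = {"v1": 0.5, "v2": 0.25, "v3": 0.1}
r = {"v1": 600, "v2": 1200}  # v3 never played the game, so it is not in N_i(u)

print(predict_playtime(w, r))  # (0.5*600 + 0.25*1200) / (0.5 + 0.25) = 800.0
```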

```python
from collections import Counter

# Count for every game how many users have played it
# (each user's list contains a game at most once)
game_counts = Counter(game for user in gamesofallusers for game in user)

# List of games owned by at least 3 people
allGamesUnique = [game for game, count in game_counts.items() if count >= 3]


# Find the games you do not own, because we only want recommendations for those
def difference(allGames, usersgames):
    owned = set(usersgames)
    return [game for game in allGames if game not in owned]


# Predict ratings based on the formula above for each unowned game
def predict_ratings():
    '''Iterate over all unowned games and, for each game, calculate a rating based
    on your friends' playtime and similarity score.'''
    own_id = int(id)  # your own games are stored under the integer ID
    user_owned_games = list(users_gamedicts[own_id].keys())
    unowned_games_rec = {}
    for unowned_game in difference(allGamesUnique, user_owned_games):
        num = 0
        denom = 0
        for friend, friends_games_dict in users_gamedicts.items():
            if unowned_game in friends_games_dict:
                similarity = calculate_similarity(own_id, friend)
                num += similarity * friends_games_dict[unowned_game]  # w_uv * r_vi
                denom += similarity                                   # w_uv
        # Without any similar user who played the game, predict 0 minutes
        unowned_games_rec[unowned_game] = num / denom if denom > 0 else 0
    return unowned_games_rec


recomm = predict_ratings()
```
    


```python
# Inspect which similar friends drive the prediction for 'Elite Dangerous'
own_id = int(id)
friends_dicts = {}
for friend, friends_games_dict in users_gamedicts.items():
    if 'Elite Dangerous' in friends_games_dict:
        similarity = calculate_similarity(own_id, friend)
        if similarity > 0:
            friends_dicts[friend] = similarity

user_owned_games_dict = users_gamedicts[own_id]
for game in user_owned_games_dict:
    for k, similarity in friends_dicts.items():
        games = users_gamedicts[k]
        if game in games:
            print("Friend id:", k, " Similarity:", similarity)
            print("Common game:", game, " Playtime:", games[game])
            print("Elite Dangerous", games['Elite Dangerous'])
```

```
Friend id: 76561197970901320  Similarity: 0.608196265763322
Common game: Path of Exile  Playtime: 45962
Elite Dangerous 151744
Friend id: 76561198006164915  Similarity: 0.0018348922712213306
Common game: Path of Exile  Playtime: 1017
Elite Dangerous 44
Friend id: 76561198006164915  Similarity: 0.0018348922712213306
Common game: Europa Universalis IV  Playtime: 231
Elite Dangerous 44
Friend id: 76561198006164915  Similarity: 0.0018348922712213306
Common game: Titan Quest Anniversary Edition  Playtime: 399
Elite Dangerous 44
Friend id: 76561198006164915  Similarity: 0.0018348922712213306
Common game: Black Desert Online  Playtime: 808
Elite Dangerous 44
Friend id: 76561198108575144  Similarity: 0.00016960651289009497
Common game: Crusader Kings II  Playtime: 1
Elite Dangerous 842
Friend id: 76561198006164915  Similarity: 0.0018348922712213306
Common game: Crusader Kings II  Playtime: 524
Elite Dangerous 44
```
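The printed values are enough to check the top prediction by hand. Assuming these three friends are the only ones who have played *Elite Dangerous* with a non-zero similarity (which is what the cell above lists), the weighted average below reproduces the predicted value of about 151,246 minutes reported in Task 3.5 and shows that almost all of the weight sits on the single most similar friend.

```python
# Similarity and Elite Dangerous playtime (minutes) of each friend, copied from the output above
neighbours = {
    "76561197970901320": (0.608196265763322, 151744),
    "76561198006164915": (0.0018348922712213306, 44),
    "76561198108575144": (0.00016960651289009497, 842),
}

num = sum(w * r for w, r in neighbours.values())
denom = sum(w for w, _ in neighbours.values())
prediction = num / denom

# Share of the total weight held by the most similar friend
top_share = max(w for w, _ in neighbours.values()) / denom
print(round(prediction, 2), round(top_share, 4))
```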


## Task 3.5: Recommender System: Discussion

**1.** Sort the predicted ratings by estimated playtime (highest first) and print out the top 5 predictions for you (or the default user if you are using the default ID). 

**2.** Discuss the difference in recommendations between the collaborative filtering approach and the association rule approach. Would you consider one more accurate than the other? Why/why not?

```python
# TODO 1: Sort the predictions by estimated playtime (highest first) and show the top 5
print("****************Top 5 recommendations for the Default user***************************")
sorted(recomm.items(), key=lambda x: x[1], reverse=True)[:5]
```



```
****************Top 5 recommendations for the Default user***************************


[('Elite Dangerous', 151245.88997259314),
 ('Warframe', 103841.32432403602),
 ('EVE Online', 84603.90560913057),
 ('Total War: WARHAMMER II', 41312.017736558424),
 ('ARK: Survival Evolved', 23298.007448506683)]
```

**TODO 2: Write your observations here:**
1. Recommendations based on association rules do not take the similarity between users into account: the rules are mined from items that frequently occur together across all users, and every user contributes equally. Association rules therefore ignore the social context, while collaborative filtering weights each friend's playtime by their similarity to the target user before predicting the playtime.
2. I consider collaborative filtering more accurate in this exercise because it gives user-specific recommendations, whereas the association rule approach is more general: it only uses the games the user owns to trigger rules, while collaborative filtering uses the similarity between the user and their friends. In addition, association rules cannot predict the user's playtime for a particular game, while collaborative filtering can, as seen above.
