In [3]:
import pandas as pd
import random

In [5]:
data = pd.read_csv('/Users/jasonmatiatos/Desktop/final_data.csv')

## Aim

The aim of this file is to create a function user_to_data() that takes the user's preferences and a dataframe of boardgames, and returns the subsection of that dataframe containing only the entries that are in line with the user's preferences.

#### Step 1: Storing user preferences into a list of strings. 

At this point in our project, we have extracted from the user's answers to the questionnaire the important words that capture the essence of each answer. Each of these words has then been converted into a vector using word embeddings, and its closest semantic equivalent within our own list of categories and mechanisms has been found.

The important inputs from the user are the following: 

1. category/ theme
2. type of game (category2)
3. game mechanisms
4. number of players
5. playtime (game duration) in minutes

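The following code takes a list with one entry for each of the above pieces of information. The first three entries are strings; the number of players and the playtime are each given as a minimum and a maximum.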

In [6]:
# preferences_list = ['category', 'category2', 'mechanics', 'min_players', 'max_players', 'min_playtime', 'max_playtime']

# Arbitrary example for test purposes
preferences_list = ['fighting', 'strategy games', 'dice rolling', 2.0, 3.0, 0.0, 60.0]
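Since the first three entries are later used as column names, a quick sanity check on the list can catch mistakes early. The following is only a sketch: `check_preferences` and the toy dataframe are hypothetical names introduced for illustration, not part of the project code.

```python
import pandas as pd

def check_preferences(prefs, df):
    """Return a list of problems found in a 7-item preferences list (empty if none)."""
    if len(prefs) != 7:
        return [f"expected 7 entries, got {len(prefs)}"]
    problems = []
    # The first three entries must name columns of the dataframe.
    for label in prefs[:3]:
        if label not in df.columns:
            problems.append(f"unknown column: {label!r}")
    # The remaining four entries are (min, max) bounds for players and playtime.
    bounds = [(prefs[3], prefs[4], "players"), (prefs[5], prefs[6], "playtime")]
    for low, high, what in bounds:
        if low > high:
            problems.append(f"{what}: minimum {low} exceeds maximum {high}")
    return problems

# Toy dataframe standing in for the real data (assumption: indicator columns hold 0.0 / 1.0).
toy = pd.DataFrame({"fighting": [1.0], "strategy games": [1.0], "dice rolling": [0.0]})
good = ['fighting', 'strategy games', 'dice rolling', 2.0, 3.0, 0.0, 60.0]
print(check_preferences(good, toy))  # prints []
```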

#### Step 2: Individual functions

Next, I write one function per preference. Each one scans through our dataframe of boardgames and keeps only the rows that align with the user's preference.

In [7]:
# 1: Category

def category(user, df):
    user_cat = user[0] # extract relevant string
    sub_df = df[df[user_cat]==True] # take subsection of df
    return sub_df


# 1.2: Category2

def category2(user, df):
    user_cat = user[1]
    sub_df = df[df[user_cat]==True]
    return sub_df


# 2: Mechanisms

def mechanisms(user, df):
    user_mec = user[2]
    sub_df = df[df[user_mec]==True]
    return sub_df
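The three functions above differ only in which entry of the list they read, so the same idea can be written once. This is a sketch on a toy dataframe, assuming the indicator columns hold 1.0 for "yes" and 0.0 for "no" as in the output shown later; `filter_by_flag` is a hypothetical helper name.

```python
import pandas as pd

def filter_by_flag(df, column):
    """Keep only the rows whose indicator column is set to 1 (or True)."""
    return df[df[column] == 1]

toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "fighting": [1.0, 0.0, 1.0],
    "dice rolling": [1.0, 1.0, 0.0],
})

# Apply the filters one after another, as the individual functions would.
result = toy
for column in ["fighting", "dice rolling"]:
    result = filter_by_flag(result, column)

print(result["name"].tolist())  # prints ['A']
```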

##### note:

For the number of players, the range supported by the boardgame should lie at least partly within the range specified by the user.

To compare ranges in my data frame, I used the .query method as shown here: https://stackoverflow.com/questions/16341367/grabbing-selection-between-specific-dates-in-a-dataframe. 

That answer appears to predate the current pandas documentation, so I read the documentation here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html and saw that Python variable names must be prefixed with the '@' symbol inside the query string.
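As a small illustration of the '@' syntax, here is a sketch on a toy dataframe (the game names and the function name `players_in_range` are made up). It also shows why two conditions have to live inside one query string: Python's `or` between two non-empty strings simply returns the first string.

```python
import pandas as pd

def players_in_range(df, min_num, max_num):
    """Rows whose minplayers or maxplayers falls inside [min_num, max_num]."""
    # `@name` refers to a local Python variable; both conditions are in ONE string.
    return df.query(
        "(@min_num <= minplayers <= @max_num) or "
        "(@min_num <= maxplayers <= @max_num)"
    )

toy = pd.DataFrame({
    "name": ["Duel", "Couple", "Solo"],
    "minplayers": [2, 1, 1],
    "maxplayers": [2, 2, 1],
})

print(players_in_range(toy, 2, 3)["name"].tolist())  # prints ['Duel', 'Couple']

# `or` applied to two strings is evaluated by Python before query() ever sees it:
print("a <= b" or "c <= d")  # prints a <= b
```

"Couple" is kept only because of the second condition, which is why dropping that condition changes the result.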

In [8]:
# 3: Number of players

def numplayers(user, df):
    
    # extract user's interval
    min_num = user[3]
    max_num = user[4]
    
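    # take subsection of df; both conditions must be in ONE query string,
    # because Python's `or` between two strings just returns the first string
    sub_df = df.query("(@min_num <= minplayers <= @max_num) or "
                      "(@min_num <= maxplayers <= @max_num)")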
    return sub_df


# 4: Playtime

def playtime(user, df):
    
    min_time = user[5]
    max_time = user[6] 
    
    sub_df = df.query("@min_time <= playingtime <= @max_time")
    return sub_df
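Checking whether an endpoint of the game's range lies inside the user's range misses one case: a game whose range completely contains the user's (for example a 1 to 10 player game when the user asks for 2 to 3). If that case should count as a match, a general overlap test for two closed intervals is that each one starts no later than the other ends. The sketch below assumes that broader reading; `overlaps` is a hypothetical helper and the data is made up.

```python
import pandas as pd

def overlaps(df, low_col, high_col, low, high):
    """Rows whose closed interval [low_col, high_col] intersects [low, high]."""
    mask = (df[low_col] <= high) & (df[high_col] >= low)
    return df[mask]

toy = pd.DataFrame({
    "name": ["Duel", "Crowd", "Wide"],
    "minplayers": [2, 6, 1],
    "maxplayers": [2, 12, 10],
})

# The user wants 2 to 3 players: "Wide" (1 to 10) overlaps, "Crowd" (6 to 12) does not.
print(overlaps(toy, "minplayers", "maxplayers", 2, 3)["name"].tolist())  # prints ['Duel', 'Wide']
```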

##### note 2:

Although I omitted it from this file, I have tested that each of these functions works well individually. So now we can combine them all into one master function.

# Master function

In [9]:
def user_to_data(user, df):
    '''
    Takes in a list of user preferences over boardgame specifications and a dataframe of 
    boardgames. Returns a subsection of the initial dataframe, containing only the boardgames
    in line with the user's preferences.
    '''
    
    subdf_1 = category(user, df)
    subdf_2 = category2(user, subdf_1)
    subdf_3 = mechanisms(user, subdf_2)
    subdf_4 = numplayers(user, subdf_3)
    final_subdf = playtime(user, subdf_4)
    
    return final_subdf

# TEST
_hint: it works_

In [10]:
p = preferences_list

suitable_games = user_to_data(p, data)
suitable_games.reset_index(drop=True, inplace=True)

suitable_games.head()

Unnamed: 0.1,Unnamed: 0,name,description,image,rating,usersrated,minplayers,maxplayers,playingtime,alliances,...,korean war,fan expansion,strategy games,abstract games,family games,thematic games,customizable games,wargames,party games,children's games
0,25,Armies of Oblivion: ASL Module 12,Armies of Oblivion is the long-awaited module ...,https://cf.geekdo-images.com/original/img/-aP6...,8.43063,271,2.0,2.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,3010,WWE Raw Deal,Comic Images' WWE Raw Deal is a Collectible C...,https://cf.geekdo-images.com/original/img/4_l4...,6.7771,169,2.0,8.0,15.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3722,Piece o' Cake,Everyone knows the cake rules: one person cuts...,https://cf.geekdo-images.com/original/img/Jats...,6.61664,2155,2.0,5.0,20.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,6007,Get Bit!,&quot;You don't have to be faster than the sh...,https://cf.geekdo-images.com/original/img/J2Lx...,6.16129,5834,3.0,6.0,20.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,6656,Mehr oder Weniger,Counting quickly and snatching the correct num...,https://cf.geekdo-images.com/original/img/SQzm...,6.025,20,2.0,10.0,10.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


# User output

Out of all the boardgames that are in line with the user's desired characteristics, we want to suggest (output):

1) the highest rated 
2) the most played (the one with the most user reviews)
3) a random one, if the user wants to experiment

The random game will be selected and presented at a later stage of our code. For now, we only need to extract the names of the top rated and the most played game. For this purpose, I'll make one function for each of these, and then combine them into a master function. 
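The random pick itself is not part of this file, but for reference one possible way to do it is pandas' `DataFrame.sample`. This is only a sketch: `random_game` is a hypothetical name, and the real selection code may work differently.

```python
import pandas as pd

def random_game(df, seed=None):
    """Return the name of one randomly chosen row, or None if df is empty."""
    if df.empty:
        return None
    # sample(n=1) draws one row; random_state makes the draw reproducible.
    return df.sample(n=1, random_state=seed)["name"].iloc[0]

toy = pd.DataFrame({"name": ["A", "B", "C"], "rating": [8.4, 7.0, 6.1]})
pick = random_game(toy, seed=0)
print(pick in {"A", "B", "C"})  # prints True
```

Returning None for an empty dataframe avoids an error when no game matches the user's preferences.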

##### notes for small functions:

Function (1) is easy to write, as the data is already sorted by rating in decreasing order. So all this function needs to do is return the name of the first entry of the input df. Note that it looks up index label 0, so it relies on the index having been reset, as was done in the test above.

Function (2) will sort the dataframe with respect to number of user ratings in decreasing order, and then it will return the name of the 1st entry of the resulting (sorted) dataframe.

In [11]:
# 1: Highest rated

def rated(df):
    # assumes df is sorted by rating (descending) and its index starts at 0
    return df.loc[0, 'name']


# 2: Most played

def played(df):
    sorted_df = df.sort_values('usersrated', ascending=False)
    sorted_df.reset_index(drop=True, inplace=True)
    return sorted_df.loc[0, 'name']
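Both functions depend on row order or on the index starting at 0. An alternative that depends on neither is to look up the row with the largest value directly using `Series.idxmax`. This is a sketch on made-up data, assuming the index labels are unique; `top_by` is a hypothetical helper.

```python
import pandas as pd

def top_by(df, column):
    """Name of the row with the largest value in `column`, or None if df is empty."""
    if df.empty:
        return None
    # idxmax returns the index label of the maximum, which .loc then looks up.
    return df.loc[df[column].idxmax(), "name"]

toy = pd.DataFrame(
    {"name": ["A", "B", "C"], "rating": [6.1, 8.4, 7.0], "usersrated": [5834, 271, 169]},
    index=[10, 25, 31],  # deliberately not 0..n-1
)

print(top_by(toy, "rating"), top_by(toy, "usersrated"))  # prints B A
```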

# Final Master function

In [12]:
def names(df):
    
    r = rated(df)
    p = played(df)
    
    return r, p

# TEST
_hint: it works_

In [13]:
names(suitable_games)

('Armies of Oblivion: ASL Module 12', 'Get Bit!')