In [6]:
# Python User-User Collaborative Filtering Recommender System
# consider not using pivot table?

# KNeighbors-based solution
# - For each gameID that a userID has not rated,
# - Take the K nearest neighbors (experiment with different values of K)
# - Sum each neighbor's rating times its similarity weight, then divide by K
# - Predict should return the unrated gameIDs with the highest scores (top N, or above some threshold)

'''
1.) We have an n x m matrix holding the ratings of n users for m items. Element (i, j) of the matrix
is user i's rating of item j. Since we are working with movie ratings, each rating can be expected to be an
integer from 1-5 (one star to five stars) if user i has rated movie j, and 0 if the user has not rated
that movie. The same layout applies to the board game ratings in this notebook, with games in place of movies.

2.) For each user, we want to recommend a set of movies that they have not seen yet (the movie rating is 0).
To do this, we use an approach similar to weighted K-Nearest Neighbors.

3.) For each movie j that user i has not seen yet, we find the set of users U who are similar to user i and
have seen movie j. For each similar user u, we take u's rating of movie j and multiply it by the cosine
similarity of user i and user u. We sum these weighted ratings and divide by the number of users in U to
get a similarity-weighted average rating for movie j.

4.) Finally, we sort the movies by their weighted average ratings. These averages serve as an estimate of
how the user would rate each movie. Movies with higher weighted average ratings are more likely to be
favored by the user, so we recommend the movies with the highest weighted average ratings.
'''
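
Steps 3 and 4 of the description above can be sketched in plain NumPy on a toy matrix. This is a minimal sketch, not the notebook's final implementation: the helper names `predict_rating` and `recommend`, the parameters `k` and `top_n`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def predict_rating(ratings, similarity, user, item, k=3):
    """Hypothetical helper: similarity-weighted average of neighbor ratings."""
    # Candidate neighbors: every other user who has rated `item` (rating > 0).
    rated = np.flatnonzero(ratings[:, item] > 0)
    rated = rated[rated != user]
    if rated.size == 0:
        return 0.0  # nobody has rated this item, so there is nothing to average
    # Keep the (at most) k candidates most similar to `user`.
    order = np.argsort(-similarity[user, rated])[:k]
    neighbors = rated[order]
    # Sum of similarity * rating, divided by the number of neighbors used.
    weighted = similarity[user, neighbors] * ratings[neighbors, item]
    return float(weighted.sum() / neighbors.size)

def recommend(ratings, similarity, user, k=3, top_n=2):
    """Hypothetical helper: unrated items sorted by predicted rating, best first."""
    unrated = np.flatnonzero(ratings[user] == 0)
    scores = {int(j): predict_rating(ratings, similarity, user, int(j), k)
              for j in unrated}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy data (assumed): 3 users x 3 items, 0 means "not rated".
toy_ratings = np.array([[5.0, 0.0, 0.0],
                        [4.0, 2.0, 5.0],
                        [5.0, 1.0, 4.0]])
# Cosine similarity between user rows, computed directly with NumPy.
norms = np.linalg.norm(toy_ratings, axis=1)
toy_similarity = (toy_ratings @ toy_ratings.T) / np.outer(norms, norms)

top_pick = recommend(toy_ratings, toy_similarity, user=0, k=2, top_n=1)
print(top_pick)
```

Dividing by the neighbor count follows the description above. Dividing by the sum of the similarities instead would give a true weighted mean that stays on the original rating scale, which may be worth comparing.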

In [56]:
import numpy as np
import pandas as pd
import scipy
from sklearn.preprocessing import normalize

# Read the CSV and pivot it into a userID x gameID rating matrix
# (pivot_table averages duplicate userID/gameID pairs by default)
df = pd.read_csv('inputs/boardgame-elite-users.csv')
df = df.pivot_table(index='userID', columns='gameID', values='rating')

# Fill NaN (unrated) with zero. Zeros add nothing to the L2 norm, so normalize() scales by rated games only
df = df.fillna(0)

In [19]:
# Test / Train Split by user row (train_test_split defaults to 75% train, 25% test)
from sklearn.model_selection import train_test_split

train, test = train_test_split(df)

In [20]:
# L2-normalize each user's rating vector (one row per userID) to unit length
normalized = normalize(train)
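
To make the normalization step concrete, here is a small NumPy-only sketch of per-row L2 scaling, which is what `normalize` does with its default settings (`norm='l2'`, `axis=1`). The helper name `l2_normalize_rows` and the toy matrix are assumptions for illustration. The sketch also shows that the dot product of two unit-length rows is their cosine similarity.

```python
import numpy as np

def l2_normalize_rows(matrix):
    """Hypothetical helper: scale each row to unit L2 length."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by zero
    return matrix / norms

# Toy data (assumed): 0 means "not rated".
toy_ratings = np.array([[3.0, 0.0, 4.0],
                        [0.0, 0.0, 0.0],
                        [6.0, 8.0, 0.0]])

toy_normalized = l2_normalize_rows(toy_ratings)
# Dot products of unit-length rows are cosine similarities between users.
toy_cosine = toy_normalized @ toy_normalized.T
print(toy_normalized)
print(toy_cosine)
```

Zero entries stay zero after scaling, so unrated games remain distinguishable from rated ones.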

In [69]:
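# Cosine similarity between every pair of training users (also try the Pearson coefficient).
# cosine_similarity L2-normalizes each row internally, so it can take the raw train matrix.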
from sklearn.metrics.pairwise import cosine_similarity

user_similarity_matrix = cosine_similarity(train)

print(user_similarity_matrix)

[[1.         0.74753812 0.73590471 ... 0.71508632 0.6813765  0.59982303]
 [0.74753812 1.         0.79633193 ... 0.72736324 0.76122329 0.71666738]
 [0.73590471 0.79633193 1.         ... 0.74098077 0.75948999 0.74296642]
 ...
 [0.71508632 0.72736324 0.74098077 ... 1.         0.7250797  0.67686769]
 [0.6813765  0.76122329 0.75948999 ... 0.7250797  1.         0.71507092]
 [0.59982303 0.71666738 0.74296642 ... 0.67686769 0.71507092 1.        ]]
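
The comment in this cell suggests trying the Pearson coefficient. One possible way to do that is `np.corrcoef`, which treats each row as a variable by default and so returns a user-by-user correlation matrix. This is a sketch under stated assumptions: the function name `pearson_user_similarity` and the toy data are hypothetical, the zero-filled entries are treated as real ratings (which biases the correlation), and a user whose ratings are all identical would produce NaN.

```python
import numpy as np

def pearson_user_similarity(ratings):
    """Hypothetical helper: Pearson correlation between every pair of user rows."""
    ratings = np.asarray(ratings, dtype=float)
    return np.corrcoef(ratings)  # rows are variables (users), columns are observations (games)

# Toy data (assumed): 3 users x 4 games, 0 means "not rated".
toy_ratings = np.array([[5.0, 3.0, 0.0, 1.0],
                        [4.0, 0.0, 0.0, 1.0],
                        [1.0, 1.0, 0.0, 5.0]])

toy_pearson = pearson_user_similarity(toy_ratings)

# Pearson correlation equals cosine similarity of the mean-centered rows.
centered = toy_ratings - toy_ratings.mean(axis=1, keepdims=True)
centered_norms = np.linalg.norm(centered, axis=1)
centered_cosine = (centered @ centered.T) / np.outer(centered_norms, centered_norms)
print(np.allclose(toy_pearson, centered_cosine))
```

Unlike cosine similarity on non-negative ratings, Pearson values can be negative, so any neighbor selection that relies on them needs to decide how to handle negatively correlated users.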


In [68]:
from sklearn.neighbors import KNeighborsClassifier
