# Assignment 3 - DATA.ML.360

In [184]:
%pip install pandas
%pip install surprise

import pandas as pd



In [185]:
# Read the data file and see what it looks like
df = pd.read_csv('u.data', sep='\t', header=None)

# Add column names, drop the unused timestamp, and check the first few rows of the dataset
df.columns = ["user_id", "movie_id", "rating", "timestamp"]
df = df.drop("timestamp", axis=1)

df.head()

| | user_id | movie_id | rating |
|---|---|---|---|
| 0 | 196 | 242 | 3 |
| 1 | 186 | 302 | 3 |
| 2 | 22 | 377 | 1 |
| 3 | 244 | 51 | 2 |
| 4 | 166 | 346 | 1 |


### Weighted average ratings

**Modified method for sequential recommendations**

I chose to calculate weighted average ratings to improve the basic group recommendations. The method adjusts the predicted rating of each movie with the individual user's average predicted rating. This way each user's preferences for the suggested movies are taken into account better, which leads to better personalized recommendations for the group. The weighted approach also improves collaboration, since all users' preferences are taken into account.
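
Written as a formula, the method can be summarized as follows. The symbols are introduced here only for illustration: $\hat{r}_{u,i}$ is the predicted rating of user $u$ for movie $i$, $I_u$ is the set of movies with a prediction for user $u$, and $G_i$ is the set of group members with a prediction for movie $i$.

$$
w_u = \frac{1}{|I_u|} \sum_{i \in I_u} \hat{r}_{u,i},
\qquad
\hat{r}_{G,i} = \frac{\sum_{u \in G_i} w_u \, \hat{r}_{u,i}}{\sum_{u \in G_i} w_u}
$$

A user whose predicted ratings are high on average therefore gets a larger weight $w_u$ and has more influence on the group score $\hat{r}_{G,i}$.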

The differences in recommendations are not very significant with small group sizes. However, when the group size grows above 10, the recommendations start to differ more.

**These are the key steps in the method:**
- Calculate a weight for each user as the average of that user's predicted ratings.
- Multiply each original prediction by the user's weight.
- Normalize each movie prediction by dividing the sum of the weighted ratings by the sum of the weights.
- Get the top 10 best-scoring movies.
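
The steps above can be traced by hand on a tiny example. The sketch below uses plain Python and two hypothetical users and two hypothetical movies; the IDs and ratings are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict

# Hypothetical predicted ratings as (user_id, movie_id, predicted_rating).
predictions = [
    (1, 10, 5.0), (1, 20, 3.0),  # user 1: mean prediction 4.0
    (2, 10, 2.0), (2, 20, 4.0),  # user 2: mean prediction 3.0
]

# Step 1: a user's weight is the mean of that user's predicted ratings.
ratings_by_user = defaultdict(list)
for user_id, _, rating in predictions:
    ratings_by_user[user_id].append(rating)
weights = {u: sum(r) / len(r) for u, r in ratings_by_user.items()}

# Steps 2 and 3: sum weight * rating per movie, then divide by the summed weights.
weighted_sum = defaultdict(float)
weight_total = defaultdict(float)
for user_id, movie_id, rating in predictions:
    weighted_sum[movie_id] += weights[user_id] * rating
    weight_total[movie_id] += weights[user_id]
group_scores = {m: weighted_sum[m] / weight_total[m] for m in weighted_sum}

# Step 4: rank the movies by group score, best first, and keep at most 10.
top_movies = sorted(group_scores, key=group_scores.get, reverse=True)[:10]
```

With a plain average both movies would score 3.5. With the weights, movie 10 scores 26/7 (about 3.71) and movie 20 scores 24/7 (about 3.43), because user 1 has the higher average prediction and therefore the larger weight.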

The practical implementation below has more specific details on what is done in each step of the process.

In [186]:
# This helper function gets the sequence recommendations for the group as a parameter.
# Then those ratings are adjusted with the weighted averages, 
# and the top recommendations are returned.
def adjust_ratings(result):
    # Create a dataframe from the sequence results
    group_df = pd.DataFrame(result, columns=["user_id", "movie_id", "predicted_rating"])

    # Create a dataframe for user weights &
    # calculate the weight for each user from their average predicted rating
    user_weights = group_df.groupby('user_id')['predicted_rating'].mean().reset_index()
    user_weights.columns = ['user_id', 'weight']

    # Merge weights with the original group recommendations
    group_df = pd.merge(group_df, user_weights, on='user_id', how='left')

    # Multiply the original prediction with the user weight -> weighted rating for each item
    group_df['weighted_rating'] = group_df['predicted_rating'] * group_df['weight']

    # Divide total weighted ratings by total weights -> weighted average rating for each item
    weighted_ratings = group_df.groupby('movie_id')['weighted_rating'].sum() / group_df.groupby('movie_id')['weight'].sum()

    # Get top 10 recommendations based on the weighted average ratings
    top_recommendations = weighted_ratings.sort_values(ascending=False)[:10]
    return top_recommendations

### Creating sequential group recommendations

I chose to change the basic recommendation method from Assignment 2 to Singular Value Decomposition (SVD), because I wanted something more accurate and faster than my previous implementation. SVD is used to create the predictions. Otherwise the group recommendations are created the same way, except that they are now created in sequences.

**Key steps:**
- Train the model sequentially
- Create group recommendations for that sequence
- Combine the recommendations from the sequences

#### Singular Value Decomposition (SVD)

The initial group recommendations are created with the SVD algorithm. The algorithm splits the user-item rating matrix into three matrices: a user matrix, a diagonal matrix of singular values, and an item matrix. The predictions for unrated movies are then obtained by multiplying the matrices back together.
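
The decomposition described above can be tried out on a small matrix. The following is a minimal sketch with NumPy, assuming a hypothetical and fully observed 4 x 3 rating matrix; the values are made up for illustration.

```python
import numpy as np

# Hypothetical, fully observed rating matrix: 4 users (rows) x 3 movies (columns).
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Factor R into user factors U, singular values s, and item factors Vt.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Multiplying the three factors back together recovers R.
R_full = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives a low-rank approximation.
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Note that a real rating matrix is mostly empty, so this classical decomposition cannot be applied to it directly. As far as I understand, the `SVD` class in Surprise is a matrix factorization in the style popularized by Simon Funk: it learns user and item factors with stochastic gradient descent from the observed ratings only, so the sketch illustrates the idea rather than what the library does internally.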

This article goes more in depth and helped me understand the general principles of how SVD works:
https://gregorygundersen.com/blog/2018/12/10/svd/

Other sources used:

https://towardsdatascience.com/how-to-build-a-movie-recommendation-system-67e321339109

https://www.math3ma.com/blog/understanding-entanglement-with-svd


In [187]:
from surprise import Dataset
from surprise import Reader
from surprise import SVD

def create_group_recommendations(users, sequences):
    # Create a reader that defines the rating scale of the dataset
    reader = Reader(rating_scale=(1, 5))

    # Save recommendations from each sequence here
    group_recommendations = []

    # Loop through sequences
    for sequence in range(sequences):
        # Build the training set for each round
        data = Dataset.load_from_df(df[['user_id', 'movie_id', 'rating']], reader)
        trainset = data.build_full_trainset()

        # Initialize the SVD algorithm and train it on the training set
        svd = SVD()
        svd.fit(trainset)

        # Go through each user and create predictions.
        for user_id in users:
            # Get movies that the user hasn't rated yet
            user_ratings = df[df['user_id'] == user_id]
            user_unrated_movies = df[~df['movie_id'].isin(user_ratings['movie_id'])]['movie_id'].unique()

            user_recommendations = []

            # Loop through the unrated movies
            for movie_id in user_unrated_movies:
                # Create a predicted score for the movie
                prediction = svd.predict(user_id, movie_id)
                # Add it to the recommendations
                user_recommendations.append((user_id, movie_id, prediction.est))

            # Sort the predictions from best to worst and add them to the group recommendations
            user_recommendations = sorted(user_recommendations, key=lambda x: x[2], reverse=True)
            group_recommendations.extend(user_recommendations)

        # Adjust the sequence recommendations with the weighted averages
        sequence_recommendations = adjust_ratings(group_recommendations)

        # Show the current sequence recommendations to the user
        print("Recomenndations for sequence: " + str(sequence + 1))
        display(sequence_recommendations)

    return print("Recommendations generated succesfully")


### Testing the implementation

Next, we check how the modified recommendation method performs by creating top 10 recommendations for 3 users in 3 sequences.


In [188]:
user_list = [1, 5, 20]
sequences = 3
create_group_recommendations(user_list, sequences)

Recomenndations for sequence: 1


movie_id
357    4.560872
474    4.273590
408    4.267714
318    4.214869
483    4.205073
484    4.190147
498    4.103931
603    4.100568
647    4.097499
430    4.091515
dtype: float64

Recomenndations for sequence: 2


movie_id
357    4.366959
408    4.338959
483    4.285636
169    4.160245
318    4.142754
285    4.134694
498    4.131158
474    4.129076
484    4.114929
513    4.082609
dtype: float64

Recomenndations for sequence: 3


movie_id
357    4.352143
408    4.302191
483    4.264115
474    4.154972
169    4.132826
484    4.106291
513    4.095060
511    4.094546
302    4.091496
285    4.084510
dtype: float64

Recommendations generated succesfully
