# Recommendation system
The idea of the project is to predict ratings of unwatched movies  for userId 610 in order to be able to recommend movies to this user


### Data reading

In [1]:
import pandas as pd
import numpy as np
np.random.seed(156068)
from scipy.stats import pearsonr
from IPython.display import display, HTML

df = pd.read_csv('./ratings.csv')
df

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


### Transforming the dataframe into a convinient form

In [2]:
userId = 610
transformed_df = df.pivot(index='movieId', columns='userId', values='rating')
transformed_df

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,,,,4.0,,4.5,,,,...,4.0,,4.0,3.0,4.0,2.5,4.0,2.5,3.0,5.0
2,,,,,,4.0,,4.0,,,...,,4.0,,5.0,3.5,,,2.0,,
3,4.0,,,,,5.0,,,,,...,,,,,,,,2.0,,
4,,,,,,3.0,,,,,...,,,,,,,,,,
5,,,,,,5.0,,,,,...,,,,3.0,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,,,,,,,,,,,...,,,,,,,,,,
193583,,,,,,,,,,,...,,,,,,,,,,
193585,,,,,,,,,,,...,,,,,,,,,,
193587,,,,,,,,,,,...,,,,,,,,,,


### Correlation finding methods
Below are the functions used to find correlations between users based on the mutual movies they have watched.

In [3]:
'''the coorelation finding method which just finds the pearson coorelation between the  given user and all other users ,
if they have at least 2 movies in common
and if neither given user nor other user ratings for common films are constant array '''
def find_correlation(transformed_df:pd.DataFrame,userId=610):
    user_column = transformed_df[userId].copy()
    user_column.dropna(inplace=True)
    correlations = dict()

    for other_user in transformed_df.drop(userId,axis=1).columns:
        # Cannot check correlations between arrays with 1 element
        common_ratings = user_column.index.intersection(transformed_df[other_user].dropna().index)
        if len(common_ratings)>1:
            common_ratings = list(common_ratings)

            # Cannot check correlation with array of all elements with the same value, since the concept of correlation does not apply there
            if user_column[common_ratings].nunique() > 1 and transformed_df[other_user][common_ratings].nunique() > 1:
                corr, _ = pearsonr(user_column[common_ratings], transformed_df[other_user][common_ratings])
                correlations[other_user] = corr
                continue

        correlations[other_user] = np.nan

    correlation_df = pd.Series(correlations)

    return correlation_df

'''the coorelation finding function as above,
 however  the coorelation is found only if the user and other user have at least threshold films in common'''
def find_correlation_with_common_films_threshold(transformed_df:pd.DataFrame,threshold=5,userId=610):
    user_column = transformed_df[userId].copy()
    user_column.dropna(inplace=True)
    correlations = dict()

    for other_user in transformed_df.drop(userId,axis=1).columns:
        # Cannot check correlations between arrays with 1 element
        common_ratings = user_column.index.intersection(transformed_df[other_user].dropna().index)
        if len(common_ratings)>threshold:
            common_ratings = list(common_ratings)
            # Cannot check correlation with array of all elements with the same value, since the concept of correlation does not apply there
            if user_column[common_ratings].nunique() > 1 and transformed_df[other_user][common_ratings].nunique() > 1:
                corr, _ = pearsonr(user_column[common_ratings], transformed_df[other_user][common_ratings])
                correlations[other_user] = corr
                continue

        correlations[other_user] = np.nan

    correlation_df = pd.Series(correlations)

    return correlation_df

### Movie recommender systems
Here, different approaches to movie recommender systems can be found.


In [4]:
def predict_scores(transformed_df: pd.DataFrame, userId=610, k=10,split = None):
    '''we find up to k most coorelated users with our userID  that have rated given movie not seen by our userID
        and calculate rating  as a weighted average
        of their ratings for this film. up to k because it may happen that there are less than k users
        with positive correlation with our userId who watched
        given movie'''
    transposed_df = transformed_df.transpose()
    user_correlations = find_correlation(transformed_df, userId)
    sorted_correlations = user_correlations.sort_values(ascending=False)
    sorted_correlations = sorted_correlations[sorted_correlations > 0]
    sorted_transposed_df = transposed_df.loc[sorted_correlations.index]
    if userId not in sorted_transposed_df.index:
        sorted_transposed_df.loc[userId] = transposed_df.loc[userId]
    predictions_df = sorted_transposed_df.copy()

    for movie in sorted_transposed_df.columns:
        ratings_for_movie = sorted_transposed_df[movie]
        if split is not None:
            if movie not in split:
                continue

        if pd.isna(ratings_for_movie[userId]):
            valid_ratings = ratings_for_movie.dropna()
            top_users_indices = sorted_correlations.index.intersection(valid_ratings.index)[:k]
            if top_users_indices.empty:
                continue
            weighted_avg = valid_ratings[top_users_indices].dot(user_correlations[top_users_indices]) / user_correlations[top_users_indices].sum()
            valid_score = round(weighted_avg * 2) / 2
            predictions_df.loc[userId, movie] = valid_score

    return predictions_df.transpose()

def predict_scores_no_baseline(
    transformed_df: pd.DataFrame,
    userId=610,
    k=10,
    split=None
):
    """
    Predict scores using top `k` most correlated users. Rating for tuple (userId, movie) predicted as mean of ratings for our userID for all movies +
    weighted deviation from the mean for given movie of the k most correlated users.
    """
    transposed_df = transformed_df.transpose()
    user_correlations = find_correlation(transformed_df, userId)
    sorted_correlations = user_correlations.sort_values(ascending=False)
    user_means = transposed_df.mean(axis=1, skipna=True)
    user_mean_target = user_means[userId]
    predictions_df = transposed_df.copy()

    for movie in transposed_df.columns:
        if split is not None and movie not in split:
            continue
        ratings_for_movie = transposed_df[movie]
        if pd.isna(ratings_for_movie[userId]):
            valid_ratings = ratings_for_movie.dropna()
            valid_users_sorted = sorted_correlations.index.intersection(valid_ratings.index)
            top_k_users = valid_users_sorted[:k]
            if len(top_k_users) == 0:
                continue
            weighted_deviation_sum = 0
            total_weight = 0

            for user in top_k_users:
                correlation = sorted_correlations[user]
                user_mean = user_means[user]
                user_rating = ratings_for_movie[user]
                deviation = user_rating - user_mean
                weighted_deviation_sum += correlation * deviation
                total_weight += abs(correlation)
            if total_weight > 0:
                weighted_deviation = weighted_deviation_sum / total_weight
                predicted_score = user_mean_target + weighted_deviation
                predictions_df.loc[userId, movie] = predicted_score
    return predictions_df.transpose()

def predict_scores_with_item_correction(
    transformed_df: pd.DataFrame,
    userId=610,
    k=10,
    border=0.8,
    split=None
):
    """
    works similar to the predictor above, only at the end it adjusts the  rating basing on the mean rating for given film in case
    the rating is close to the border of ratings
    """
    transposed_df = transformed_df.transpose()
    user_correlations = find_correlation(transformed_df, userId)
    sorted_correlations = user_correlations.sort_values(ascending=False)
    user_means = transposed_df.mean(axis=1, skipna=True)
    item_means = transformed_df.mean(axis=1, skipna=True)
    user_mean_target = user_means[userId]
    predictions_df = transposed_df.copy()

    for movie in transposed_df.columns:
        if split is not None and movie not in split:
            continue
        ratings_for_movie = transposed_df[movie]
        if pd.isna(ratings_for_movie[userId]):
            valid_ratings = ratings_for_movie.dropna()
            valid_users_sorted = sorted_correlations.index.intersection(valid_ratings.index)
            top_k_users = valid_users_sorted[:k]
            if len(top_k_users) == 0:
                continue
            weighted_deviation_sum = 0
            total_weight = 0

            for user in top_k_users:
                correlation = sorted_correlations[user]
                user_mean = user_means[user]
                user_rating = ratings_for_movie[user]
                deviation = user_rating - user_mean
                weighted_deviation_sum += correlation * deviation
                total_weight += abs(correlation)

            weighted_deviation = weighted_deviation_sum / total_weight if total_weight > 0 else 0
            item_mean = item_means[movie] if movie in item_means else 0
            if item_mean >= round(user_mean_target*2)/2 and 2*user_mean_target - int(2*user_mean_target) >= border:
                correction = 0.5
            elif item_mean <= round(user_mean_target*2)/2 and 2*user_mean_target - int(2*user_mean_target) <= (1-border):
                correction = - 0.5
            else:
                correction = 0

            predicted_score = user_mean_target + weighted_deviation + correction
            predicted_score = round(predicted_score * 2) / 2
            predicted_score = max(0.5,predicted_score)
            predictions_df.loc[userId, movie] = predicted_score

    return predictions_df.transpose()

def predict_baseline(transformed_df: pd.DataFrame, userId=610, k=1,split=None):
    """
    Predict baseline scores for movies the user has not rated.
    mean of user ratings of this user + movie ratings for this movie
    """
    predictions_df = transformed_df.copy()
    for movie in transformed_df.index:
        if split is not None and movie not in split:
            continue

        if pd.isna(transformed_df.loc[movie, userId]):
            user_ratings = transformed_df[userId].dropna()
            movie_ratings = transformed_df.loc[movie].dropna()
            total_ratings_sum = user_ratings.sum() + movie_ratings.sum()
            total_ratings_count = len(user_ratings) + len(movie_ratings)
            if total_ratings_count > 0:
                baseline_prediction = total_ratings_sum / total_ratings_count
                predictions_df.loc[movie, userId] = round(baseline_prediction * 2) / 2  # Round to nearest 0.5

    return predictions_df

def predict_scores_with_threshold(
    transformed_df: pd.DataFrame,
    userId=610,
    k=1,
    threshold=0.8,
    split=None
):
    """
    Predict scores using all correlated users above a specified threshold.
    weighted average of these users  rating for this film . Weights are the coorelation
    """
    transposed_df = transformed_df.transpose()
    user_correlations = find_correlation(transformed_df, userId)
    filtered_correlations = user_correlations[user_correlations > threshold]
    sorted_transposed_df = transposed_df.loc[filtered_correlations.index]
    if userId not in sorted_transposed_df.index:
        sorted_transposed_df.loc[userId] = transposed_df.loc[userId]

    predictions_df = sorted_transposed_df.copy()
    for movie in sorted_transposed_df.columns:
        ratings_for_movie = sorted_transposed_df[movie]
        if split is not None:
            if movie not in split:
                continue
        if pd.isna(ratings_for_movie[userId]):
            valid_ratings = ratings_for_movie.dropna()
            valid_users_indices =filtered_correlations.index.intersection(valid_ratings.index)
            if valid_users_indices.empty:
                continue
            weighted_avg = (
                valid_ratings[valid_users_indices].dot(filtered_correlations[valid_users_indices]) /
                filtered_correlations[valid_users_indices].sum()
            )

            valid_score = round(weighted_avg * 2) / 2
            predictions_df.loc[userId, movie] = valid_score
    return predictions_df.transpose()

def predict_scores_with_threshold_negative(
    transformed_df: pd.DataFrame,
    userId=610,
    k=1,
    positive_threshold=0,
    negative_threshold=0,
    split=None
):
    """
    Predict scores using all correlated users above a specified threshold for positive correlations,
    and include negatively correlated users below a specified threshold with adjusted contribution.
    """
    transposed_df = transformed_df.transpose()
    user_correlations = find_correlation(transformed_df, userId)
    positive_correlations = user_correlations[user_correlations > positive_threshold]
    negative_correlations = user_correlations[
        (user_correlations < 0) & (user_correlations.abs() > negative_threshold)
    ]
    sorted_transposed_df = transposed_df.loc[
        positive_correlations.index.union(negative_correlations.index)
    ]
    if userId not in sorted_transposed_df.index:
        sorted_transposed_df.loc[userId] = transposed_df.loc[userId]
    predictions_df = sorted_transposed_df.copy()

    for movie in sorted_transposed_df.columns:
        ratings_for_movie = sorted_transposed_df[movie]

        if split is not None:
            if movie not in split:
                continue

        if pd.isna(ratings_for_movie[userId]):
            valid_ratings = ratings_for_movie.dropna()
            positive_users = positive_correlations.index.intersection(valid_ratings.index)
            negative_users = negative_correlations.index.intersection(valid_ratings.index)
            weighted_sum = 0
            total_weight = 0
            if not positive_users.empty:
                weighted_sum += (
                    valid_ratings[positive_users].dot(positive_correlations[positive_users])
                )
                total_weight += positive_correlations[positive_users].sum()

            for neg_user in negative_users:
                neg_rating = valid_ratings[neg_user]
                if neg_rating < 2 or neg_rating > 4:
                    adjusted_rating = 6 - neg_rating
                    weight = abs(negative_correlations[neg_user])
                    weighted_sum += adjusted_rating * weight
                    total_weight += weight

            if total_weight == 0:
                continue

            weighted_avg = weighted_sum / total_weight
            valid_score = round(weighted_avg * 2) / 2
            predictions_df.loc[userId, movie] = valid_score
    return predictions_df.transpose()

def predict_scores_strict_correlation(transformed_df: pd.DataFrame, userId=610, k=10,correlation_neighbours=5,split = None):
    '''function working the same as predict_scores described above, with only difference , that coorelation are defined  only for the users
    having at least coorelation_neighbours mutual watched films with userId '''
    transposed_df = transformed_df.transpose()
    user_correlations = find_correlation_with_common_films_threshold(transformed_df,correlation_neighbours,userId)
    sorted_correlations = user_correlations.sort_values(ascending=False)
    sorted_correlations = sorted_correlations[sorted_correlations > 0]
    sorted_transposed_df = transposed_df.loc[sorted_correlations.index]
    if userId not in sorted_transposed_df.index:
        sorted_transposed_df.loc[userId] = transposed_df.loc[userId]
    predictions_df = sorted_transposed_df.copy()
    for movie in sorted_transposed_df.columns:
        if split is not None:
            if movie not in split:
                continue
        ratings_for_movie = sorted_transposed_df[movie]
        if pd.isna(ratings_for_movie[userId]):
            valid_ratings = ratings_for_movie.dropna()
            top_users_indices = sorted_correlations.index.intersection(valid_ratings.index)[:k]
            if top_users_indices.empty:
                continue
            weighted_avg = valid_ratings[top_users_indices].dot(user_correlations[top_users_indices]) / user_correlations[top_users_indices].sum()
            valid_score = round(weighted_avg * 2) / 2
            predictions_df.loc[userId, movie] = valid_score
    return predictions_df.transpose()

### Correlations between user 610 and other users

In [5]:
correlations = find_correlation(transformed_df, 610)
correlations = correlations.sort_values(ascending=False)
correlations = correlations.dropna()
correlations

576    1.000000
545    1.000000
442    1.000000
158    0.911322
92     0.903696
         ...   
54    -0.759555
388   -0.866025
383   -0.870388
194   -0.944911
250   -1.000000
Length: 598, dtype: float64

Seems that there are actually 3 users that have a perfect correlation. Let's investigate that.

In [6]:
user_ids_with_correlation_1 = correlations[correlations == 1.0].index

html = '<div style="display: flex; align-items: top; gap: 32px">'
for id in user_ids_with_correlation_1:
    user_ratings = transformed_df[id].dropna().to_frame()
    user_610_ratings = transformed_df[610].loc[user_ratings.index].to_frame()
    combined_df = pd.concat([user_ratings, user_610_ratings], axis=1)
    combined_df = combined_df.dropna()
    df_html = combined_df.to_html()
    html += df_html
html += '</div>'
display(HTML(html))

Unnamed: 0_level_0,576,610
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
849,1.5,3.0
1261,3.0,5.0

Unnamed: 0_level_0,545,610
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
3273,3.0,3.0
33794,3.5,4.0

Unnamed: 0_level_0,442,610
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
3752,2.0,3.5
3863,0.5,3.0


In [7]:
transformed_df[610][1261]

5.0

Looks like these rated very few movies that were also rated by user 610, but for those movies that were rated by both users, the correlation was perfect (this does not mean identical ratings, as we are using pearson correlation).

Let us find users that have at least 5 movies in common with user 610 and that have the highest correlation with that user.

In [8]:
correlations_with_treshold = find_correlation_with_common_films_threshold(transformed_df,5,610)
correlations_with_treshold = correlations_with_treshold.sort_values(ascending=False)
correlations_with_treshold.head()

120    0.879537
463    0.823180
138    0.822192
494    0.811761
13     0.796969
dtype: float64

These results look more reasonable.

### Evaluation of the performance of prediction methods
Frameworks for evaluating recommender systems are presented here. All evaluations are based on 10-fold cross-validation on user ID 610, ensuring consistent splits across every run of the evaluation methods. This guarantees that comparisons between different recommender systems are reliable.


In [9]:
def evaluate_recommendation_quality_general(
    transformed_df,
    predict_function,
    userId=610,
    k=10,
    num_splits=10,
    **predict_function_kwargs
):
    """
    General evaluation function for recommendation quality using MAE.
    """
    original_ratings = transformed_df.loc[:, userId].dropna()
    rated_movies = original_ratings.index
    splits = np.array_split(rated_movies, num_splits)
    mae_list = []
    for split in splits:
        df_with_nan = transformed_df.copy()
        df_with_nan.loc[split, userId] = np.nan
        predicted_df = predict_function(df_with_nan, userId=userId, k=k, split=split, **predict_function_kwargs)
        predicted_scores = predicted_df.loc[split, userId]
        actual_scores = original_ratings[split]
        valid_indices = ~np.isnan(predicted_scores) & ~np.isnan(actual_scores)
        mae = np.abs(predicted_scores[valid_indices] - actual_scores[valid_indices]).mean()

        mae_list.append(mae)

    return np.mean(mae_list)

def evaluate_recommendation_quality_at_p(
    transformed_df,
    predict_function,
    userId=610,
    k=10,
    p=10,
    num_splits=10,
    **predict_function_kwargs
):
    """
    Function for evaluating the Mae at p  highest rated items.
    """
    original_ratings = transformed_df.loc[:, userId].dropna()
    rated_movies = original_ratings.index
    splits = np.array_split(rated_movies, num_splits)
    mae_list = []

    for split in splits:
        df_with_nan = transformed_df.copy()
        df_with_nan.loc[split, userId] = np.nan
        predicted_df = predict_function(df_with_nan, userId=userId, k=k, split=split, **predict_function_kwargs)
        predicted_scores = predicted_df.loc[split, userId]
        actual_scores = original_ratings[split]
        predicted_scores.sort_values(ascending=False)
        predicted_scores = predicted_scores[:p]
        valid_indices = ~np.isnan(predicted_scores) & ~np.isnan(actual_scores)
        mae = np.abs(predicted_scores[valid_indices] - actual_scores[valid_indices]).mean()
        mae_list.append(mae)
    return np.mean(mae_list)

def evaluate_recommendation_quality_rmse_general(
    transformed_df,
    predict_function,
    userId=610,
    k=10,
    num_splits=10,
    **predict_function_kwargs
):
    """
    General evaluation function for recommendation quality using RMSE.
    """

    original_ratings = transformed_df.loc[:, userId].dropna()
    rated_movies = original_ratings.index
    splits = np.array_split(rated_movies, num_splits)
    rmse_list = []

    for split in splits:
        df_with_nan = transformed_df.copy()
        df_with_nan.loc[split, userId] = np.nan
        predicted_df = predict_function(df_with_nan, userId=userId, k=k, split=split, **predict_function_kwargs)
        predicted_scores = predicted_df.loc[split, userId]
        actual_scores = original_ratings[split]
        valid_indices = ~np.isnan(predicted_scores) & ~np.isnan(actual_scores)
        rmse = np.sqrt(((predicted_scores[valid_indices] - actual_scores[valid_indices]) ** 2).mean())
        rmse_list.append(rmse)
    return np.mean(rmse_list)

def evaluate_recommendation_quality_general_weighted(
    transformed_df,
    predict_function,
    weight_under=1,
    weight_over=1,
    userId=610,
    k=20,
    num_splits=10,
    **predict_function_kwargs
):
    '''function used to evalue the overestimation error and underestimation error'''
    original_ratings = transformed_df.loc[:, userId].dropna()
    rated_movies = original_ratings.index
    splits = np.array_split(rated_movies, num_splits)
    mae_list = []
    for split in splits:
        df_with_nan = transformed_df.copy()
        df_with_nan.loc[split, userId] = np.nan
        predicted_df = predict_function(df_with_nan,userId=userId, k=k, split=split,**predict_function_kwargs)
        predicted_scores = predicted_df.loc[split, userId]
        actual_scores = original_ratings[split]
        valid_indices = ~np.isnan(predicted_scores) & ~np.isnan(actual_scores)
        error_array = predicted_scores[valid_indices] - actual_scores[valid_indices]
        underpredicted_error = error_array[error_array<=0]
        overpredicted_error = error_array[error_array>0]
        underpredicted_error = np.abs(underpredicted_error)
        underpredicted_error  = underpredicted_error * weight_under
        overpredicted_error = overpredicted_error * weight_over
        weighted_mae = (sum(underpredicted_error) + sum(overpredicted_error)) / (len(underpredicted_error)*weight_under + len(overpredicted_error)*weight_over)
        mae_list.append(weighted_mae)
    return np.mean(mae_list)

### Description of testing

We tested several movie recommender systems, which are described step by step along with the conclusions drawn from the results. To evaluate the prediction accuracy of the systems, we devised several evaluation methods. All methods are based on 10-fold cross-validation on user ID 610. This involves running 10 iterations of the recommender systems, each using 9/10 of the records for user ID 610 (along with all records for other users) to predict the scores for deliberately erased data for user ID 610. We calculate the error for each iteration and take the mean error across all 10 runs.

While using data from other users could provide more insights, our task specifically focuses on predicting ratings for movies for user ID 610. Therefore, the evaluation is solely based on the algorithms' errors for this user ID. When creating the splits for 10-fold validation, we do not shuffle the data, as it is not sorted by rating for user ID 610. Consistent splits are used across evaluations to enable fair comparisons between different recommender systems.

We employ several error evaluation methods, including:

- **MAE (Mean Absolute Error):** A straightforward and easy-to-interpret measure of prediction error.
- **RMSE (Root Mean Squared Error):** A measure that penalizes larger errors more heavily, making it well-suited for this purpose.
- **Weighted MAE:** This approach accounts for the user's preferences. For instance:
  - If the user is selective and values precision in high ratings, overestimation errors may be more detrimental.
  - Conversely, if the user watches many movies and values variety, underestimating high ratings (false negatives) may be more problematic.
  To address this, we calculate the MAE with weights applied to overestimation and underestimation errors, allowing us to analyze which systems perform best for high precision or high recall in high ratings. To isolate each type of error, we use weights such as (1,0) for overestimation errors only and (0,1) for underestimation errors only.
- **MAE for Top-K Rated Movies:** This measure focuses on the K highest-rated movies. The rationale is that users may care less about ratings for movies they wouldn’t watch anyway. This is especially relevant for selective users with limited time, who prioritize avoiding low-quality movies while being less concerned about films they won’t watch.

These evaluation methods enable us to comprehensively assess the performance of the recommender systems and identify the most suitable approaches for specific user preferences and scenarios.

# Testing

##### Remark
- Although not explicitly mentioned earlier, each system rounds its final prediction to the nearest valid value. This ensures accurate predictions within the given scale, e.g., for the `ratings.csv`, the scale ranges from 0.5 to 5 in increments of 0.5.
- When referring to the correlation between users, we always mean Pearson correlation.

In [10]:
### we will store best errors, to be able later to seem which system performed best for which purpose
mae_list = []
rmse_list = []
perror_list = []
overestimate_error_list = []
underestimate_error_list = []

### Baseline predictor
First, we evaluate the performance of the baseline predictor to establish a benchmark for comparison. The baseline predictor is a simple approach that predicts the score for a given tuple (movie, user ID) by calculating the mean of all ratings for the given movie plus the mean of all ratings for the given user ID.

In [11]:
mae = evaluate_recommendation_quality_general(
    transformed_df=transformed_df,
    predict_function=predict_baseline,  # Function to predict scores
    userId=610,
    num_splits=10 # Pass the threshold as a keyword argument
)
mae_list.append(('predict_baseline',mae))
rmse = evaluate_recommendation_quality_rmse_general(
    transformed_df=transformed_df,
    predict_function=predict_baseline,  # Function to predict scores
    userId=610,
    num_splits=10 # Pass the threshold as a keyword argument
)
print(f"MAE using the baseline prediction {mae}")
print(f"RMSE using the baseline prediction {rmse}")
rmse_list.append(('predict_baseline', rmse))

p_error = evaluate_recommendation_quality_at_p(
    p=10,
    transformed_df=transformed_df,
    predict_function=predict_baseline,  # Function to predict scores
    userId=610,
    num_splits=10 # Pass the threshold as a keyword argument
)
print(f"Mae at p=10 using the baseline prediction {p_error}")
perror_list.append(('predict_baseline', p_error))

overestimate_error = evaluate_recommendation_quality_general_weighted(
    transformed_df=transformed_df,
    weight_under=0,
    weight_over=1,
    predict_function=predict_baseline,  # Function to predict scores
    userId=610,
    num_splits=10 # Pass the threshold as a keyword argument
)
print(f"Overestimate MAE is {overestimate_error} ")
overestimate_error_list.append(('predict_baseline',overestimate_error))

underestimate_error = evaluate_recommendation_quality_general_weighted(
    transformed_df=transformed_df,
    weight_under=1,
    weight_over=0,
    userId=610,
    predict_function=predict_baseline,  # Function to predict scores
    num_splits=10 # Pass the threshold as a keyword argument
)
print(f"Underestimate MAE is {underestimate_error} ")
underestimate_error_list.append(('predict_baseline',underestimate_error))

MAE using the baseline prediction 0.6701644157369347
RMSE using the baseline prediction 0.8670309338053681
Mae at p=10 using the baseline prediction 0.675
Overestimate MAE is 0.8334233572346182 
Underestimate MAE is 0.5890073252784332 


#### Conclusions
The baseline predictor does not perform terribly in terms of MAE and can serve as a computationally inexpensive starting point. However, one notable issue is that this system, at least with this dataset, is quite biased. The overestimation error is significantly higher than the underestimation error, meaning it tends to overestimate ratings.

### User correlation approaches

Now that we have a baseline to compare our more sophisticated recommender systems against, we can begin testing them. The first system sets the prediction for the tuple (user ID, movie) as a weighted average of ratings for the given movie from users who have watched the movie and had a positive correlation with the given user, with a correlation value above a certain threshold. We will test several correlation thresholds to determine which one performs the best.

In [12]:
for correlation_threshold in [0, 0.2, 0.4, 0.6]:
    mae = evaluate_recommendation_quality_general(
        transformed_df=transformed_df,
        num_splits=10,
        predict_function=predict_scores_with_threshold,
        userId=610,
        threshold=correlation_threshold
    )
    rmse = evaluate_recommendation_quality_rmse_general(
        transformed_df=transformed_df,
        predict_function=predict_scores_with_threshold,
        userId=610,
        num_splits=10,
        threshold=correlation_threshold
    )
    print(f"MAE using {correlation_threshold} positive correlation threshold {mae}")
    mae_list.append((f"MAE using {correlation_threshold}% positive correlation threshold",mae))

    print(f"RMSE using {correlation_threshold} positive correlation threshold {rmse}")
    rmse_list.append((f"RMSE using {correlation_threshold}% positive correlation threshold",rmse))

    p_error = evaluate_recommendation_quality_at_p(
        p=10,
        transformed_df=transformed_df,
        num_splits=10,
        predict_function=predict_scores_with_threshold,
        userId=610,
        threshold=correlation_threshold
    )
    print(f"Mae at p=10 using {correlation_threshold}% positive correlation threshold {p_error}")
    perror_list.append((f"MAE at p=10 using {correlation_threshold} positive correlation threshold", p_error))

    overestimate_error = evaluate_recommendation_quality_general_weighted(
        transformed_df=transformed_df,
        weight_under=1,
        weight_over=2,
        userId=610,
        num_splits=10,
        predict_function=predict_scores_with_threshold,
        threshold=correlation_threshold
    )
    print(f"Overestimate MAE is {overestimate_error} ")
    overestimate_error_list.append((f"Overestimate MAE using {correlation_threshold}% positive correlation threshold", overestimate_error))

    underestimate_error = evaluate_recommendation_quality_general_weighted(
        transformed_df=transformed_df,
        weight_under=1,
        weight_over=0,
        predict_function=predict_scores_with_threshold,
        userId=610,
        num_splits=10,
        threshold=correlation_threshold
    )
    print(f"Underestimate_error using {correlation_threshold}% positive correlation threshold {underestimate_error}")
    underestimate_error_list.append((f"p_error using {correlation_threshold}% positive correlation threshold", underestimate_error))

MAE using 0 positive correlation threshold 0.6021952781071522
RMSE using 0 positive correlation threshold 0.7939195306055609
Mae at p=10 using 0% positive correlation threshold 0.505952380952381
Overestimate MAE is 0.6211578619909098 
Underestimate_error using 0% positive correlation threshold 0.574598358638441
MAE using 0.2 positive correlation threshold 0.606241746389952
RMSE using 0.2 positive correlation threshold 0.802867162923574
Mae at p=10 using 0.2% positive correlation threshold 0.5147222222222223
Overestimate MAE is 0.619356961510738 
Underestimate_error using 0.2% positive correlation threshold 0.5873767604119139
MAE using 0.4 positive correlation threshold 0.6355510566787939
RMSE using 0.4 positive correlation threshold 0.8408924823724673
Mae at p=10 using 0.4% positive correlation threshold 0.5407936507936507
Overestimate MAE is 0.6449831749158503 
Underestimate_error using 0.4% positive correlation threshold 0.6224698281566787
MAE using 0.6 positive correlation threshold

#### Conclusions

This approach offers some advantages over the baseline method; however, the improvement in MAE is not dramatic, approximately ~10%. The RMSE decreases by an even smaller percentage. Nonetheless, the predictions are noticeably more balanced, with the difference between underestimation and overestimation errors being negligible compared to the baseline. Additionally, the error for the top 10 predicted ratings (p=10) decreases significantly, which is important in real recommender systems, as these are the movies the user is most likely to watch based on predicted ratings.

Interestingly, in nearly all metrics, setting the threshold to 0 (thereby including all positively correlated users) yielded the best results. Conversely, limiting the calculation to users with a positive correlation > 0.6 sometimes performed worse than the baseline, with a higher RMSE. This might be due to data sparsity: users with high correlation may have only a few movies (e.g., three) in common with user ID 610, making them less reliable as sources of information.

### Using negatively correlated users

Given the limited availability of data, it would be beneficial to leverage users who are negatively correlated with the current user. There is a significant number of such users, and we devised an algorithm to utilize them effectively. The approach involves using positive correlations as in the previous method but also incorporating users with strong negative correlations (i.e., users whose correlation magnitude exceeds a certain threshold).

To include negatively correlated users in the weighted average, we do not directly use their ratings. Instead, we transform their ratings using the formula `rating = 6 - rating`. This adjustment reflects the expectation that if a strongly negatively correlated user rates a film highly, the current user is likely to rate it low, and vice versa. In essence, we create a complement of the original rating.

This transformation is applied only to ratings from strongly negatively correlated users in the ranges (0,2] and [4,5). For example, if a strongly negatively correlated user rates a movie as 3, applying the formula would still result in a rating of 3 (`6 - 3 = 3`), which does not align with the expected behavior. To avoid such inconsistencies, these middle-range ratings are excluded from the adjustment process.

In [14]:
for negative_correlation in [0.2, 0.4, 0.6]:
    mae = evaluate_recommendation_quality_general(
        transformed_df=transformed_df,
        num_splits=10,
        predict_function=predict_scores_with_threshold_negative,  # Function to predict scores
        userId=610,
        positive_threshold = 0 ,
        negative_threshold = negative_correlation
          # Pass the threshold as a keyword argument
    )
    rmse = evaluate_recommendation_quality_rmse_general(
        transformed_df=transformed_df,
        predict_function=predict_scores_with_threshold_negative,  # Function to predict scores
        userId=610,
        num_splits=10,
        positive_threshold = 0 ,
        negative_threshold = negative_correlation
    )
    print(f"MAE using 0% positive and {negative_correlation}% negative correlation threshold {mae}")
    mae_list.append((f"MAE using 0% positive and {negative_correlation}% negative correlation threshold",mae))

    print(f"RMSE using 0% positive and {negative_correlation}%  negative  coorelation threshold {rmse}")
    rmse_list.append((f"Rmse using {negative_correlation} positive coorelation threshold",rmse))

    p_error = evaluate_recommendation_quality_at_p(
        p=10,
        transformed_df=transformed_df,
        num_splits=10,
        predict_function=predict_scores_with_threshold_negative,  # Function to predict scores
        userId=610,
        positive_threshold = 0 ,
        negative_threshold = negative_correlation
    )
    print(f"Ma at p=10 using 0% positive and {negative_correlation}% negative correlation threshold{p_error}")
    perror_list.append((f"p_error using 0% positive and  {negative_correlation}% negative correlation threshold",p_error))

    overestimate_error = evaluate_recommendation_quality_general_weighted(
        transformed_df=transformed_df,
        weight_under=0,
        weight_over=1,
        userId=610,
        num_splits=10,
        predict_function=predict_scores_with_threshold_negative,  # Function to predict scores
        positive_threshold = 0 ,
        negative_threshold = negative_correlation
    )
    print(f"overestimate_error using 0% positive and  {negative_correlation}%  negative  coorelation threshold {overestimate_error} ")
    overestimate_error_list.append((f"overestimate_erro using 0% positive and {negative_correlation}% negative correlation threshold", overestimate_error))
    underestimate_error = evaluate_recommendation_quality_general_weighted(
        transformed_df=transformed_df,
        weight_under=1,
        weight_over=0,
        predict_function=predict_scores_with_threshold_negative,  # Function to predict scores
        userId=610,
        num_splits=10,
        positive_threshold = 0,
        negative_threshold = negative_correlation
    )
    print(f"underestimate_error using 0% positive and {negative_correlation}% negative correlation threshold {underestimate_error}")
    underestimate_error_list.append((f"underestimate_error using 0% positive and  {negative_correlation}%  negative  coorelation threshold", underestimate_error))

MAE using 0% positive and 0.2% negative correlation threshold 0.6113749808614168
RMSE using 0% positive and 0.2%  negative  coorelation threshold 0.80136603031125
Ma at p=10 using 0% positive and 0.2% negative correlation threshold0.510952380952381
overestimate_error using 0% positive and  0.2%  negative  coorelation threshold 0.745418158624428 
underestimate_error using 0% positive and 0.2% negative correlation threshold 0.5859043019335467
MAE using 0% positive and 0.4% negative correlation threshold 0.6060534511840754
RMSE using 0% positive and 0.4%  negative  coorelation threshold 0.7972685194076556
Ma at p=10 using 0% positive and 0.4% negative correlation threshold0.510952380952381
overestimate_error using 0% positive and  0.4%  negative  coorelation threshold 0.7347326854718496 
underestimate_error using 0% positive and 0.4% negative correlation threshold 0.5793087606016811
MAE using 0% positive and 0.6% negative correlation threshold 0.604099744608393
RMSE using 0% positive and 

### Conclusions

In practice, this approach did not perform as expected, as it worsened performance compared to the previous algorithm across all metrics. Therefore, we will abandon this idea. Nevertheless, it was certainly worth exploring.

### Fixed neighbourhood size of k most similar neighbours

Since the approach with negative correlations did not yield the desired results, we now aim to further explore user-based rating prediction based on the positively correlated users. However, this time, we won’t include all users above a certain threshold. Instead, we will focus on the k most similar users to our User ID. The prediction will be made using a weighted average of these k users.

In [15]:
for neighbours_size in [5, 15, 25, 50]:
    mae = evaluate_recommendation_quality_general(
        transformed_df=transformed_df,
        num_splits=10,
        predict_function=predict_scores,  # Function to predict scores
        userId=610,
        k = neighbours_size
          # Pass the threshold as a keyword argument
    )
    rmse = evaluate_recommendation_quality_rmse_general(
        transformed_df=transformed_df,
        predict_function=predict_scores,  # Function to predict scores
        userId=610,
        num_splits=10,
        k = neighbours_size
    )
    print(f"MAE using {neighbours_size} strongest positively correlated neighbours {mae}")
    mae_list.append((f"MAE using {neighbours_size} strongest positively correlated neighbours", mae))

    print(f"RMSE using {neighbours_size} strongest positively correlated neighbours {rmse}")
    rmse_list.append((f"RMSE using {neighbours_size} strongest positively coRrelated neighbours", rmse))

    p_error = evaluate_recommendation_quality_at_p(
        p=10,
        transformed_df=transformed_df,
        num_splits=10,
        predict_function=predict_scores,  # Function to predict scores
        userId=610,
        k = neighbours_size
    )
    print(f"Ma at p=10 using {neighbours_size} strongest positively coorelated neighbours {p_error}")
    perror_list.append((f"p_error using {neighbours_size} strongest positively coorelated neighbours", p_error))

    overestimate_error = evaluate_recommendation_quality_general_weighted(
        transformed_df=transformed_df,
        weight_under=0,
        weight_over=1,
        userId=610,
        num_splits=10,
        predict_function=predict_scores,  # Function to predict scores
        k = neighbours_size
    )
    print(f"overestimate_error using {neighbours_size} strongest positively correlated neighbours {overestimate_error} ")
    overestimate_error_list.append((f"overestimate_error using {neighbours_size} strongest positively correlated neighbours", overestimate_error))
    underestimate_error = evaluate_recommendation_quality_general_weighted(
        transformed_df=transformed_df,
        weight_under=1,
        weight_over=0,
        predict_function=predict_scores,  # Function to predict scores
        userId=610,
        num_splits=10,
        k = neighbours_size
    )
    print(f"underestimate_error using {neighbours_size} strongest positively correlated neighbours {underestimate_error}")
    underestimate_error_list.append((f"underestimate_error using {neighbours_size} strongest positively correlated neighbours", underestimate_error))

MAE using 5 strongest positively correlated neighbours 0.6189454443760426
RMSE using 5 strongest positively correlated neighbours 0.8134962988471877
Ma at p=10 using 5 strongest positively coorelated neighbours 0.543015873015873
overestimate_error using 5 strongest positively correlated neighbours 0.7107882585333297 
underestimate_error using 5 strongest positively correlated neighbours 0.5964032868467976
MAE using 15 strongest positively correlated neighbours 0.6031413821118339
RMSE using 15 strongest positively correlated neighbours 0.7939928210640812
Ma at p=10 using 15 strongest positively coorelated neighbours 0.4970634920634921
overestimate_error using 15 strongest positively correlated neighbours 0.7272806141227194 
underestimate_error using 15 strongest positively correlated neighbours 0.5787871031982827
MAE using 25 strongest positively correlated neighbours 0.5981621909345929
RMSE using 25 strongest positively correlated neighbours 0.7906557039685967
Ma at p=10 using 25 stron

#### Conclusions

This approach has yielded more promising results compared to the previous inclusion of negative correlations. However, the improvements are not significant. The RMSE did not improve compared to taking all positively correlated users, but both MAE and MAE at p=10 have improved. The choice of neighbourhood size (\( k \)) plays a crucial role in performance; it must be selected wisely. Overall, the best performance was achieved with 25 of the most similar users.

Unfortunately, due to considering fewer data points when predicting ratings for each film, the overestimation and underestimation errors differ significantly. The algorithm tends to overestimate, which could be beneficial for a user with ample time for watching movies and who would not mind some false positive high ratings. In general, this is not the ideal outcome.

### Change of approach

Up until now, we have focused primarily on the correlation of our user with other users and their ratings for a particular film. While this approach seems reasonable, it does not account for one crucial aspect: the personal tendency of users to rate films either high or low. Some users may generally be grumpy or strict, while others might be more lenient overall in their grading. To address this, we need to consider how the rating for the film we are currently evaluating deviates from the given users’ rating means. This adjustment accounts for individual tendencies to rate films high or low, making the recommender system more versatile and accurate.

We will continue to use the previously discovered threshold of the 25 strongest positively correlated users in this approach as well.

In [16]:
mae = evaluate_recommendation_quality_general(
    transformed_df=transformed_df,
    num_splits=10,
    predict_function=predict_scores_no_baseline,  # Function to predict scores
    userId=610,
    k = 25
      # Pass the threshold as a keyword argument
)
rmse = evaluate_recommendation_quality_rmse_general(
    transformed_df=transformed_df,
    predict_function=predict_scores_no_baseline,  # Function to predict scores
    userId=610,
    num_splits=10,
    k = 25
)
print(f"MAE using 25 strongest positively correlated neighbours accounting for the personal  tendencies high/low {mae}")
mae_list.append((f"MAE using 25strongest positively correlated neighbours accounting for the personal tendencies high/low",mae))

print(f"RMSE using 25 strongest positively correlated neighbours  accounting for the personal  tendencies high/low{rmse}")
rmse_list.append((f"Rmse using 25strongest positively correlated neighbours accounting for the personal tendencies high/low",rmse))

p_error = evaluate_recommendation_quality_at_p(
    p=10,
    transformed_df=transformed_df,
    num_splits=10,
    predict_function=predict_scores_no_baseline,  # Function to predict scores
    userId=610,
    k = 25
)
print(f"Ma at p=10 using 25 strongest positively coorelated neighbours accounting for the personal  tendencies high/low {p_error}")
perror_list.append((f"p_error using 25 strongest positively coorelated neighbours accounting for the personal  tendencies high/low",p_error))

overestimate_error = evaluate_recommendation_quality_general_weighted(
    transformed_df=transformed_df,
    weight_under=0,
    weight_over=1,
    userId=610,
    num_splits=10,
    predict_function=predict_scores_no_baseline,  # Function to predict scores
    k = 25
)
print(f"overestimate_error using  25 strongest positively coorelated neighbours accounting for the personal  tendencies high/low  {overestimate_error} ")
overestimate_error_list.append((f"overestimate_error using 25 strongest positively coorelated neighbours accounting for the personal  tendencies high/low",overestimate_error))
underestimate_error = evaluate_recommendation_quality_general_weighted(
    transformed_df=transformed_df,
    weight_under=1,
    weight_over=0,
    predict_function=predict_scores_no_baseline,  # Function to predict scores
    userId=610,
    num_splits=10,
    k = 25
)
print(f"underestimate_error using  {neighbours_size} strongest positively coorelated neighbours accounting for the personal  tendencies high/low  {underestimate_error} ")
underestimate_error_list.append((f"underestimate_error using 25 strongest positively coorelated neighbours accounting for the personal  tendencies high/low",underestimate_error))

MAE using 25 strongest positively correlated neighbours accounting for the personal  tendencies high/low 0.5400134448817265
RMSE using 25 strongest positively correlated neighbours  accounting for the personal  tendencies high/low0.6820267320787797
Ma at p=10 using 25 strongest positively coorelated neighbours accounting for the personal  tendencies high/low 0.47314467607295896
overestimate_error using  25 strongest positively coorelated neighbours accounting for the personal  tendencies high/low  0.5110370673816587 
underestimate_error using  50 strongest positively coorelated neighbours accounting for the personal  tendencies high/low  0.5572389916468498 


#### Conclusions

This turned out to be a very effective approach. The MAE decreased visibly, and the RMSE improved even more significantly, dropping from the previous best of around 0.79 to 0.68 now. The overestimation and underestimation errors are very similar and significantly lower than before, which is also a very positive outcome. The error for the top 10 ratings decreased slightly, but only marginally compared to its previous best value. Overall, this approach seems to perform the best so far, as it minimizes error across all metrics, making it not only a good but also a versatile method.


### Item analysis

Finally, why not analyze the items as well to improve our algorithm? The idea here is similar to the previous algorithm but with an added twist. After calculating the weighted deviation for a given (user ID, movie) tuple, we check if the score is equivocal. To clarify, consider an example: if the average weighted score suggests 3.2, it would be rounded down to 3 because this is the closest viable rating. However, it may also not be far from 3.5. In such a case, we check the mean value for this item (movie) to determine if it is above 3.5. If so, we set our average to 3.5. This approach allows for rounding up in scenarios where user behavior analysis is torn between two ratings.

This addition of item analysis to the algorithm appears to be a great idea because it addresses ambiguity in user behavior where the scores are close to the rating boundary.

In [17]:
mae = evaluate_recommendation_quality_general(
    transformed_df=transformed_df,
    num_splits=10,
    predict_function=predict_scores_with_item_correction,  # Function to predict scores
    userId=610,
    k = 25
      # Pass the threshold as a keyword argument
)
rmse = evaluate_recommendation_quality_rmse_general(
    transformed_df=transformed_df,
    predict_function=predict_scores_with_item_correction,  # Function to predict scores
    userId=610,
    num_splits=10,
    k = 25
)
print(f"MAE using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity {mae}")
mae_list.append((f"MAE using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity",mae))

print(f"RMSE using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low{rmse}")
rmse_list.append((f"Rmse using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity",rmse))

p_error = evaluate_recommendation_quality_at_p(
    p=10,
    transformed_df=transformed_df,
    num_splits=10,
    predict_function=predict_scores_with_item_correction,  # Function to predict scores
    userId=610,
    k = 25
)
print(f"Ma at p=10 using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity {p_error}")
perror_list.append((f"p_error using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity",p_error))

overestimate_error = evaluate_recommendation_quality_general_weighted(
    transformed_df=transformed_df,
    weight_under=0,
    weight_over=1,
    userId=610,
    num_splits=10,
    predict_function=predict_scores_with_item_correction,  # Function to predict scores
    k = 25
)
print(f"overestimate_error using  25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity  {overestimate_error} ")
overestimate_error_list.append((f"overestimate_error using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity",overestimate_error))
underestimate_error = evaluate_recommendation_quality_general_weighted(
    transformed_df=transformed_df,
    weight_under=1,
    weight_over=0,
    predict_function=predict_scores_with_item_correction,  # Function to predict scores
    userId=610,
    num_splits=10,
    k = 25
)
print(f"underestimate_error using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity {underestimate_error} ")
underestimate_error_list.append((f"underestimate_error using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity",underestimate_error))

MAE using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity 0.5357688482689216
RMSE using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low0.7071960837101245
Ma at p=10 using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity 0.48492063492063486
overestimate_error using  25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity  0.7553065406303409 
underestimate_error using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity 0.4248038154063024 


#### Conclusions

The difference compared to the previous method is not very striking. The MAE is slightly improved, but on the other hand, the RMSE is slightly worse, and the MAE at the 10 best ratings is also slightly lower. The underestimation error is lower than ever before, indicating that this system tends to overestimate, which could be the best choice for a user who is not very picky but wants to have a broad set of well-rated films to watch. In general, this idea seems quite interesting but likely requires more testing to make a significant difference.

# Results comparison and conclusion

In [18]:
mae_list.sort(key = lambda x: x[-1])
rmse_list.sort(key = lambda x: x[-1])
underestimate_error_list.sort(key = lambda x: x[-1])
overestimate_error_list.sort(key = lambda x: x[-1])
perror_list.sort(key = lambda x: x[-1])

print ("MAE  scores:")
print(mae_list)
print('\n')
print ("rmse  scores:")
print(rmse_list)
print('\n')
print ("p_error at p =10   scores:")
print(perror_list)
print('\n')
print ("underestimate errror  scores:")
print(underestimate_error_list)
print('\n')
print ("overestimate error scores:")
print(overestimate_error_list)


MAE  scores:
[('MAE using 25 strongest positively correlated neighbours accounting for the personal tendencies high/low and items specificity', 0.5357688482689216), ('MAE using 25strongest positively correlated neighbours accounting for the personal tendencies high/low', 0.5400134448817265), ('MAE using 25 strongest positively correlated neighbours', 0.5981621909345929), ('MAE using 50 strongest positively correlated neighbours', 0.6003440458652951), ('MAE using 0% positive correlation threshold', 0.6021952781071522), ('MAE using 15 strongest positively correlated neighbours', 0.6031413821118339), ('MAE using 0% positive and 0.6% negative correlation threshold', 0.604099744608393), ('MAE using 0% positive and 0.4% negative correlation threshold', 0.6060534511840754), ('MAE using 0.2% positive correlation threshold', 0.606241746389952), ('MAE using 0% positive and 0.6% negative correlation threshold', 0.6113749808614168), ('MAE using 0% positive and 0.2% negative correlation threshold',

# Conclusions

- We believe that we thoroughly tested several recommender system approaches, often building one upon another. The main conclusion is that the system based on the weighted average of deviations from the means for the 25 most correlated users performs the best. It is definitely the most versatile, performs well across all metrics, and is the top performer in some specific metrics, such as RMSE, which is an important measure for this problem.

- Additionally, item analysis is a very promising idea. Even a simple conflict resolution approach using the mean for the given movie, applied when the weighted average of users yields an equivocal score, proved to be promising. With further work and testing, better results could likely be achieved using an item-based recommender system combined with a user-based one. We also tried combining a user-based recommender system based on correlation with an item-based recommender based on cosine similarity between movies. However, it turned out to be even more computationally expensive than the user-based recommender systems, leading us to abandon that approach.

- The baseline approach, a very simple and inexpensive system, does not perform badly in terms of MAE. However, for other, potentially more informative measures, it is clearly inferior. It remains a useful reference or comparison point, as it is not significantly worse than other, more sophisticated approaches by an ‘order of magnitude.’

- Our attempt to utilize negatively correlated users did not succeed. Although the idea appeared reasonable, introducing this concept only worsened performance across all metrics. It may be that the logic behind the algorithm was invalid, rather than the concept of using negative correlations per se.