## Experiments on Community Detection in Recommender Systems

This Jupyter notebook runs a series of experiments to evaluate how the performance of specific recommendation algorithms changes when community detectors are added. The experiments use the MovieLens 100k and Jester datasets and the scikit-surprise library.

The main goal of these experiments is to verify whether integrating community detection techniques into recommender systems improves recommendation accuracy. The algorithms are evaluated with the RMSE, MSE, and MAE metrics, and the results are saved in CSV format for further analysis. The run recorded in this notebook reports RMSE only.
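
All three metrics are functions of the per-rating prediction error. As a minimal sketch, assuming predictions are available as pairs of true and estimated ratings, they could be computed as follows. The `Prediction` tuple is a hypothetical stand-in that mirrors only the `r_ui` and `est` fields of Surprise's prediction objects, and `error_metrics` is an illustrative helper, not part of the library.

```python
import math
from typing import NamedTuple, Sequence


class Prediction(NamedTuple):
    """Hypothetical stand-in: true rating (r_ui) and estimated rating (est)."""
    r_ui: float
    est: float


def error_metrics(predictions: Sequence[Prediction]) -> dict:
    """Return MSE, RMSE and MAE over a non-empty sequence of predictions."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    errors = [p.r_ui - p.est for p in predictions]
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Toy example with made-up ratings, for illustration only.
toy = [Prediction(4.0, 3.0), Prediction(2.0, 4.0), Prediction(5.0, 5.0)]
metrics = error_metrics(toy)
print(metrics)
```

Surprise's `accuracy` module provides `mse` and `mae` alongside `rmse`, so in the notebook itself these could be called on the same `predictions` list rather than reimplemented.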

### Setting up the environment

In [51]:
# Import the required libraries.

In [59]:
import os
import numpy as np
import pandas as pd
from surprise import (
    accuracy,
    Dataset,
    CoClustering,
    KNNBasic,
    NMF,
    SVD
)
from surprise.model_selection import ShuffleSplit
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")

In [55]:
# Set up the algorithms used in the experiment.

In [57]:
algos_rec = {
    'SVD': SVD()
}

### Running experiment

In [58]:
results = []
for dataset in tqdm(['ml-100k', 'jester'], desc='General Progress', leave=True):
    print(f' -- Executing experiment for {dataset}')
    data = Dataset.load_builtin(dataset)
    # similarity_metric and com are only recorded in the results here;
    # neither is passed to the algorithm in this run.
    for similarity_metric in ['cosine']:
        for com in [None]:
            for algo_name, algo in algos_rec.items():
                for test_size in [0.25]:
                    # Three random train/test splits per configuration.
                    shuffle_split = ShuffleSplit(n_splits=3, test_size=test_size)
                    # Number the splits from 1 so each result row can be traced back.
                    for split_id, (trainset, testset) in enumerate(
                        shuffle_split.split(data), start=1
                    ):
                        algo.fit(trainset)
                        predictions = algo.test(testset)
                        rmse_value = accuracy.rmse(predictions, verbose=True)
                        results.append({
                            'dataset': dataset,
                            'similarity_metric': similarity_metric,
                            'community_detector': com,
                            'algorithm_rec': algo_name,
                            'test_size': test_size,
                            'split_id': split_id,
                            'rmse': rmse_value
                        })

# Collect one row per (dataset, algorithm, split) into a DataFrame.
df_results = pd.DataFrame(results)
df_results

General Progress:   0%|          | 0/2 [00:00<?, ?it/s]
 -- Executing experiment for ml-100k
RMSE: 0.9375
RMSE: 0.9463
RMSE: 0.9446
General Progress:  50%|█████     | 1/2 [00:10<00:10, 10.06s/it]
 -- Executing experiment for jester
RMSE: 4.4683
RMSE: 4.4729
RMSE: 4.4710
General Progress: 100%|██████████| 2/2 [01:34<00:00, 47.45s/it]

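|   | dataset | similarity_metric | community_detector | algorithm_rec | test_size | split_id | rmse |
|---|---------|-------------------|--------------------|---------------|-----------|----------|------|
| 0 | ml-100k | cosine | None | SVD | 0.25 | 1 | 0.937548 |
| 1 | ml-100k | cosine | None | SVD | 0.25 | 2 | 0.946312 |
| 2 | ml-100k | cosine | None | SVD | 0.25 | 3 | 0.944631 |
| 3 | jester | cosine | None | SVD | 0.25 | 1 | 4.46832 |
| 4 | jester | cosine | None | SVD | 0.25 | 2 | 4.472875 |
| 5 | jester | cosine | None | SVD | 0.25 | 3 | 4.470972 |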
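
Each configuration is evaluated on three random splits, so a per-dataset mean and spread give a steadier picture than any single row. The sketch below summarizes the RMSE values from the results table using only the standard library; `summarize_rmse` is an illustrative helper and not part of the notebook. RMSE values are not directly comparable between the two datasets, because their rating scales differ.

```python
from collections import defaultdict
from statistics import mean, stdev

# (dataset, rmse) pairs copied from the results table.
rows = [
    ("ml-100k", 0.937548),
    ("ml-100k", 0.946312),
    ("ml-100k", 0.944631),
    ("jester", 4.46832),
    ("jester", 4.472875),
    ("jester", 4.470972),
]


def summarize_rmse(rows):
    """Group RMSE values by dataset; return mean and sample std per dataset.

    Each dataset needs at least two values, since stdev() requires them.
    """
    grouped = defaultdict(list)
    for dataset, rmse in rows:
        grouped[dataset].append(rmse)
    return {
        dataset: {"mean": mean(values), "std": stdev(values)}
        for dataset, values in grouped.items()
    }


summary = summarize_rmse(rows)
for dataset, stats in summary.items():
    print(f"{dataset}: mean RMSE = {stats['mean']:.4f}, std = {stats['std']:.4f}")
```

With `df_results` in memory, an equivalent summary could be obtained with `df_results.groupby('dataset')['rmse'].agg(['mean', 'std'])`.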


In [64]:
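file_name = 'results.csv'
notebook_dir = os.getcwd()
# Assumes the notebook runs from a 'notebooks' directory with a sibling 'outputs' directory.
outputs_dir = notebook_dir.replace('notebooks', 'outputs')
# Create the output directory if it does not exist, so to_csv does not fail.
os.makedirs(outputs_dir, exist_ok=True)
file_path = os.path.join(outputs_dir, file_name)
df_results.to_csv(file_path, index=False)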
