Here is a basic Jupyter Notebook example that builds a collaborative-filtering recommendation system with the Surprise library. It uses the MovieLens 100k dataset, which Surprise can download and load as a built-in dataset.

In [2]:
# Import necessary libraries
from surprise import Dataset, KNNBasic
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed)
data = Dataset.load_builtin('ml-100k')

# Define the algorithm object; we'll use k-NN
sim_options = {
    'name': 'cosine',  # similarity measure (the default is MSD)
    'user_based': False  # compute similarities between items, not users
}
algo = KNNBasic(sim_options=sim_options)

# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Dataset ml-100k could not be found. Do you want to download it? [Y/n] Y
Trying to download dataset from https://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to C:\Users\kamalap1/.surprise_data/ml-100k
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.0261  1.0280  1.0304  1.0310  1.0212  1.0273  0.0035  
MAE (testset)     0.8088  0.8128  0.8149  0.8170  0.8078  0.8123  0.0035  
Fit time          1.45    1.21    1.19    1.27    1.95    1.41    0.28    
Test time         2.94    2.84    2.78    2.76    3.71    3.01    0.36    

{'test_rmse': array([1.02608364, 1.02800795, 1.03044266, 1.03095201, 1.02117239]),
 'test_mae': array([0.8087924 , 0.81277873, 0.81491981, 0.81699988, 0.80779383]),
 'fit_time': (1.448932409286499,
  1.2102315425872803,
  1.1882891654968262,
  1.2651288509368896,
  1.9457128047943115),
 'test_time': (2.93818736076355,
  2.8423314094543457,
  2.781707286834717,
  2.7626874446868896,
  3.7093875408172607)}

This script first imports the necessary modules from the Surprise library. Then it loads the MovieLens dataset. The algorithm used here is k-nearest neighbors (k-NN), a basic collaborative filtering algorithm. The similarity options are set to use cosine similarity and item-based collaborative filtering (as opposed to user-based).
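To make the similarity setting concrete, here is a minimal pure-Python sketch of item-item cosine similarity, computed over only the users who rated both items. The toy `ratings` dictionary and the `item_cosine` helper are hypothetical and are not part of Surprise; the library's own implementation may differ in details such as minimum-support handling.

```python
from math import sqrt

# Hypothetical toy ratings ({user: {item: rating}}); not taken from MovieLens.
ratings = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0},
    "u3": {"A": 1.0, "C": 5.0},
    "u4": {"B": 2.0, "C": 4.0},
}


def item_cosine(ratings, item_i, item_j):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u, r in ratings.items() if item_i in r and item_j in r]
    if not common:
        return 0.0  # assumption: no co-rating users means no similarity
    dot = sum(ratings[u][item_i] * ratings[u][item_j] for u in common)
    norm_i = sqrt(sum(ratings[u][item_i] ** 2 for u in common))
    norm_j = sqrt(sum(ratings[u][item_j] ** 2 for u in common))
    return dot / (norm_i * norm_j)


sim_ab = item_cosine(ratings, "A", "B")
sim_ac = item_cosine(ratings, "A", "C")
print(f"sim(A, B) = {sim_ab:.4f}")
print(f"sim(A, C) = {sim_ac:.4f}")
```

In this toy data, items A and B are rated alike by their shared users, so they score higher than A and C, whose shared users disagree.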

The script then performs 5-fold cross-validation on the dataset using the k-NN algorithm and prints the root mean square error (RMSE) and mean absolute error (MAE) for each fold, along with their mean and standard deviation across folds. These are common measures of accuracy for recommendation systems.

Note that this is a very basic recommendation system. It can be improved by using a more advanced algorithm (e.g., SVD or NMF), tuning the algorithm's parameters, or incorporating additional information (e.g., user or item features).
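As a rough illustration of the two metrics, the sketch below defines RMSE and MAE over (actual, predicted) pairs and then aggregates the per-fold RMSE values printed in the output above. The `rmse` and `mae` helpers and the `toy_pairs` data are hypothetical; using the population standard deviation is an assumption that happens to reproduce the Std column shown earlier.

```python
from math import sqrt
from statistics import fmean, pstdev


def rmse(pairs):
    """Root mean square error over (actual, predicted) pairs."""
    return sqrt(fmean([(actual - predicted) ** 2 for actual, predicted in pairs]))


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return fmean([abs(actual - predicted) for actual, predicted in pairs])


# Hypothetical toy pairs, only to exercise the two functions.
toy_pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5), (3.0, 3.0)]
print(f"toy RMSE = {rmse(toy_pairs):.4f}, toy MAE = {mae(toy_pairs):.4f}")

# Per-fold RMSE values copied from the cross_validate output above.
fold_rmse = [1.02608364, 1.02800795, 1.03044266, 1.03095201, 1.02117239]
mean_rmse = fmean(fold_rmse)
std_rmse = pstdev(fold_rmse)  # population standard deviation (assumption)
print(f"mean RMSE = {mean_rmse:.4f}, std = {std_rmse:.4f}")
```

Because RMSE squares each error before averaging, a few large misses raise it more than they raise MAE.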

After training your model, you can use it to predict the rating a user would give to an item they have not interacted with. Here's how to do it:

In [3]:
from surprise import Dataset, KNNBasic, accuracy
from surprise.model_selection import train_test_split

# Load the movielens-100k dataset
data = Dataset.load_builtin('ml-100k')

# Split the dataset into train and test
trainset, testset = train_test_split(data, test_size=0.25)

# Define the algorithm object; we'll use k-NN
sim_options = {
    'name': 'cosine',  # similarity measure (the default is MSD)
    'user_based': False  # compute similarities between items, not users
}
algo = KNNBasic(sim_options=sim_options)

# Train the algorithm on the trainset
algo.fit(trainset)

# Predict ratings for the testset
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 1.0280


1.027958083682903

In this script, the dataset is split into a training set and a testing set. The model is trained on the training set using the KNNBasic algorithm. Then, the trained model is used to predict the ratings for the interactions in the test set. The accuracy of the predictions is evaluated by computing the root mean square error (RMSE) between the predicted ratings and the actual ratings.
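For intuition about what an item-based k-NN prediction involves, here is a simplified sketch: the estimate for a target item is a similarity-weighted average of the user's ratings on the k most similar items they have already rated. The `similarity` table, the `predict_item_knn` helper, and the fallback `default` value are all hypothetical, and a real library applies further rules (for example, a minimum number of neighbors) that are omitted here.

```python
import heapq

# Hypothetical precomputed item-item similarities (symmetric).
similarity = {("A", "B"): 0.9, ("A", "C"): 0.4, ("A", "D"): 0.7}


def sim(i, j):
    """Look up a similarity regardless of the order of the pair."""
    return similarity.get((i, j), similarity.get((j, i), 0.0))


def predict_item_knn(user_ratings, target_item, k=2, default=3.0):
    """Weighted average of the user's ratings on the k nearest items."""
    neighbors = [
        (sim(target_item, item), rating)
        for item, rating in user_ratings.items()
        if item != target_item
    ]
    # Keep only positively similar items, then take the k most similar.
    positive = [n for n in neighbors if n[0] > 0]
    top_k = heapq.nlargest(k, positive, key=lambda n: n[0])
    total = sum(s for s, _ in top_k)
    if total == 0:
        return default  # assumption: fall back when no neighbor is usable
    return sum(s * r for s, r in top_k) / total


user_ratings = {"B": 4.0, "C": 2.0, "D": 5.0}
estimate = predict_item_knn(user_ratings, "A", k=2)
print(f"estimated rating for item A: {estimate:.4f}")
```

With k=2, only items B and D contribute, so the low rating on the weakly similar item C does not pull the estimate down.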

Note that RMSE measures only how close the predicted ratings are to the actual ones. Ranking-oriented metrics such as precision@k and recall@k instead evaluate the order of the recommendations. The choice of evaluation method depends on the specific requirements of your recommendation system.
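One possible way to compute precision@k and recall@k is sketched below in pure Python. It works on plain `(uid, iid, true_rating, estimated_rating)` tuples, which could be built from the `uid`, `iid`, `r_ui`, and `est` fields of Surprise predictions. The function name, the relevance `threshold` of 3.5, and the toy data are assumptions chosen for illustration.

```python
from collections import defaultdict


def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Return per-user precision@k and recall@k dictionaries."""
    by_user = defaultdict(list)
    for uid, _iid, true_r, est in predictions:
        by_user[uid].append((est, true_r))

    precisions, recalls = {}, {}
    for uid, rated in by_user.items():
        # Rank this user's items by estimated rating, best first.
        rated.sort(key=lambda pair: pair[0], reverse=True)
        top_k = rated[:k]
        n_relevant = sum(true_r >= threshold for _, true_r in rated)
        n_recommended = sum(est >= threshold for est, _ in top_k)
        n_hits = sum(
            est >= threshold and true_r >= threshold for est, true_r in top_k
        )
        # Assumption: define the metric as 0 when its denominator is 0.
        precisions[uid] = n_hits / n_recommended if n_recommended else 0.0
        recalls[uid] = n_hits / n_relevant if n_relevant else 0.0
    return precisions, recalls


# Hypothetical predictions for a single user.
toy = [
    ("u1", "i1", 5.0, 4.8),
    ("u1", "i2", 2.0, 4.1),
    ("u1", "i3", 4.0, 3.9),
    ("u1", "i4", 4.0, 2.5),
]
precisions, recalls = precision_recall_at_k(toy, k=3)
print(precisions, recalls)
```

Averaging the per-user values gives a single score for the whole test set.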

In [4]:
# Print the first 10 predictions
for pred in predictions[:10]:
    print('User:', pred.uid, 'Item:', pred.iid,
          'Actual Rating:', pred.r_ui, 'Predicted Rating:', pred.est)

User: 186 Item: 118 Actual Rating: 2.0 Predicted Rating: 3.7459888017285663
User: 44 Item: 21 Actual Rating: 2.0 Predicted Rating: 3.898897159556349
User: 655 Item: 4 Actual Rating: 2.0 Predicted Rating: 2.874927799678351
User: 758 Item: 387 Actual Rating: 2.0 Predicted Rating: 3.998939895051122
User: 751 Item: 945 Actual Rating: 3.0 Predicted Rating: 3.3969236020337092
User: 260 Item: 333 Actual Rating: 4.0 Predicted Rating: 4.148377795893102
User: 60 Item: 420 Actual Rating: 4.0 Predicted Rating: 4.176079115919704
User: 639 Item: 174 Actual Rating: 4.0 Predicted Rating: 3.175981338194002
User: 851 Item: 412 Actual Rating: 2.0 Predicted Rating: 3.020033777327246
User: 684 Item: 70 Actual Rating: 4.0 Predicted Rating: 3.822521572006314


In the last cell, a loop prints the user ID, item ID, actual rating, and predicted rating for the first 10 predictions on the test set. Comparing the two ratings row by row shows where the model misses: in this run, several items with an actual rating of 2.0 are predicted at roughly 3.7 to 4.0.
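The ten rows printed above can also be summarized numerically. The sketch below copies them by hand (predicted ratings rounded to four decimals), ranks them by absolute error, and counts how often the model over-predicts. The variable names are illustrative, and ten rows are far too few to draw conclusions about the model as a whole.

```python
# (user, item, actual, predicted) copied from the output above,
# with predicted ratings rounded to four decimals.
rows = [
    ("186", "118", 2.0, 3.7460),
    ("44", "21", 2.0, 3.8989),
    ("655", "4", 2.0, 2.8749),
    ("758", "387", 2.0, 3.9989),
    ("751", "945", 3.0, 3.3969),
    ("260", "333", 4.0, 4.1484),
    ("60", "420", 4.0, 4.1761),
    ("639", "174", 4.0, 3.1760),
    ("851", "412", 2.0, 3.0200),
    ("684", "70", 4.0, 3.8225),
]

# Sort by absolute error, largest miss first.
by_error = sorted(rows, key=lambda row: abs(row[2] - row[3]), reverse=True)
mean_abs_error = sum(abs(actual - pred) for _, _, actual, pred in rows) / len(rows)
n_over = sum(pred > actual for _, _, actual, pred in rows)

for user, item, actual, pred in by_error[:3]:
    print(f"user {user}, item {item}: actual {actual}, predicted {pred:.4f}")
print(f"mean absolute error over these rows: {mean_abs_error:.4f}")
print(f"over-predicted {n_over} of {len(rows)} rows")
```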