In [1]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading scikit_surprise-1.1.4.tar.gz (154 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (pyproject.toml) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.4-cp310-cp310-linux_x86_64.whl size=2357285 sha256=1517b811ae52e470875f97b15ebd5c3f8043ed4312d9a419da6160625b75930a
  Stored in directory: /root/.cache/pip/wheels/4b/3f/df/6acbf0a

In [2]:
import pandas as pd
from surprise import Dataset, Reader, KNNBasic
from surprise.model_selection import train_test_split
from surprise import accuracy

In [3]:
# Sample data: user_id, item_id, and rating (or quantity)
data = {
    'user_id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
    'item_id': [101, 102, 103, 101, 104, 102, 104, 105, 103, 104, 106],
    'rating': [5, 3, 4, 2, 5, 3, 4, 5, 2, 4, 5]  # Using ratings here, but it can be quantity as well
}

In [4]:
# Convert the data into a DataFrame
df = pd.DataFrame(data)

In [5]:
# Load data into Surprise's format
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

In [6]:
# Split the data into training and test sets (the split is random; pass random_state for a reproducible split)
trainset, testset = train_test_split(data, test_size=0.25)

In [7]:
# Use KNN Basic for collaborative filtering
algo = KNNBasic(sim_options={'name': 'cosine', 'user_based': True})

In [8]:
# Train the algorithm on the trainset
algo.fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x7e720303ce50>

In [9]:
# Make predictions on the testset
predictions = algo.test(testset)

In [10]:
# Compute and print the accuracy
accuracy.rmse(predictions)

RMSE: 0.8539


0.8539125638299665
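
For reference, RMSE is the square root of the mean squared difference between the true and the estimated ratings. The following is a minimal pure-Python sketch of that formula, not Surprise's own implementation; the function name rmse and the sample pairs are hypothetical and are not the notebook's actual predictions.

```python
from math import sqrt

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one prediction")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) pairs chosen only for illustration
example = [(5, 4.0), (3, 3.5), (4, 5.0)]
print(rmse(example))
```

A lower value means the estimates are closer to the true ratings; on a 1 to 5 scale, an RMSE near 0.85 means the predictions are typically off by a little under one rating point.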

In [11]:
# Function to recommend items for a given user
def recommend_items(user_id, algo, num_recommendations=5):
    # Get all unique item_ids
    item_ids = df['item_id'].unique()

    # Collect the items the user hasn't rated yet
    user_rated_items = df[df['user_id'] == user_id]['item_id'].values
    items_to_predict = [item for item in item_ids if item not in user_rated_items]

    # Predict ratings for the user
    predictions = [algo.predict(user_id, item_id) for item_id in items_to_predict]

    # Sort predictions by estimated rating in descending order
    recommendations = sorted(predictions, key=lambda x: x.est, reverse=True)

    # Get the top N recommendations
    top_recommendations = [rec.iid for rec in recommendations[:num_recommendations]]

    return top_recommendations

In [15]:
# Example: Recommend items for user 1
recommended_items = recommend_items(user_id=1, algo=algo)

print(f"Recommended items for user 1: {recommended_items}")

Recommended items for user 1: [104, 105, 106]


Explanation:
1. Surprise Library: We use the Surprise library (installed as scikit-surprise) to load the user-item ratings, build the collaborative filtering model, and generate predictions.

2. KNNBasic Algorithm: With user_based=True, this finds the users whose ratings are most similar to the target user's, measured by cosine similarity, and predicts a rating as the similarity-weighted average of those neighbors' ratings.

3. Recommendations: The recommend_items function predicts a rating for each item the user hasn't rated yet and returns the IDs of the items with the highest predicted ratings.
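
To make point 2 concrete, here is a minimal pure-Python sketch of user-based KNN prediction with cosine similarity. It is a simplified illustration, not Surprise's KNNBasic code: the names cosine_sim and predict_knn are hypothetical, similarity is computed over co-rated items only, and it uses all of the sample ratings rather than the random training split, so its numbers will not match the notebook's output exactly.

```python
from math import sqrt

# The sample ratings from the notebook, as {user_id: {item_id: rating}}
ratings = {
    1: {101: 5, 102: 3, 103: 4},
    2: {101: 2, 104: 5},
    3: {102: 3, 104: 4, 105: 5},
    4: {103: 2, 104: 4, 106: 5},
}

def cosine_sim(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_knn(user, item, ratings, k=40):
    """Similarity-weighted average of the k most similar users' ratings."""
    # Candidate neighbors: other users who have rated the item
    neighbors = [
        (cosine_sim(ratings[user], ratings[v]), ratings[v][item])
        for v in ratings
        if v != user and item in ratings[v]
    ]
    # Keep the k most similar neighbors, ignoring non-positive similarities
    neighbors = sorted(neighbors, reverse=True)[:k]
    neighbors = [(s, r) for s, r in neighbors if s > 0]
    total_sim = sum(s for s, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbors; a library would fall back to a default
    return sum(s * r for s, r in neighbors) / total_sim

print(predict_knn(1, 104, ratings))
```

Note that user 1 shares exactly one item with each other user, and cosine similarity over a single co-rated item is always 1, so every neighbor gets equal weight here. On data this small, the similarity scores carry very little information.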

Output:
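The code prints a list of recommended item IDs for the specified user (here, user_id=1). Only three items are returned, even though num_recommendations defaults to 5, because user 1 has only three unrated items. Because the train/test split is random, the RMSE and the order of the recommendations can vary between runs.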

Notes:
* Data Requirements: The input can be explicit ratings or purchase quantities. If you use quantities, set rating_scale in Reader to the range of those values.
* Improvement: Experiment with different similarity measures (cosine, pearson, msd) through sim_options, or try other algorithms in the Surprise library, such as matrix factorization.
* Scalability: KNN builds a full user-user similarity matrix, which grows quadratically with the number of users. For larger datasets, consider matrix factorization methods such as SVD (Singular Value Decomposition), or deep learning-based recommenders built with libraries like TensorFlow or PyTorch.
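
To illustrate how the choice of similarity measure matters, here is a minimal pure-Python sketch of Pearson correlation over co-rated items. It is an assumption-laden illustration rather than Surprise's implementation: the name pearson_sim and the three rating dictionaries are hypothetical, and returning 0.0 when the correlation is undefined is a design choice of this sketch.

```python
from math import sqrt

def pearson_sim(a, b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0  # correlation needs at least two shared items
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return cov / sqrt(var_a * var_b)

# Hypothetical users rating the same three items
generous = {101: 5, 102: 3, 103: 4}
strict = {101: 4, 102: 2, 103: 3}    # same preferences, one point lower
opposite = {101: 1, 102: 3, 103: 2}  # reversed preferences

print(pearson_sim(generous, strict))
print(pearson_sim(generous, opposite))
```

Because Pearson subtracts each user's mean rating, it treats a generous and a strict rater with the same preference order as perfectly similar, and it can report negative similarity for opposite tastes. Cosine similarity on raw positive ratings cannot go below zero.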