<a href="https://colab.research.google.com/github/lovrodukic/music-recommendation/blob/main/notebooks/recommender_als.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Preprocessing

Preprocess Last.fm dataset to prepare it for building a recommendation system.

In [7]:
!wget -P /content/datasets https://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-2k.zip
!unzip /content/datasets/hetrec2011-lastfm-2k.zip -d /content/datasets
!ls /content/datasets
# Install required libraries
!pip install pandas numpy scikit-learn matplotlib implicit

--2024-11-13 22:21:26--  https://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-2k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2589075 (2.5M) [application/zip]
Saving to: ‘/content/datasets/hetrec2011-lastfm-2k.zip.1’


2024-11-13 22:21:27 (12.1 MB/s) - ‘/content/datasets/hetrec2011-lastfm-2k.zip.1’ saved [2589075/2589075]

Archive:  /content/datasets/hetrec2011-lastfm-2k.zip
replace /content/datasets/user_friends.dat? [y]es, [n]o, [A]ll, [N]one, [r]ename: artists.dat		    readme.txt	      user_friends.dat
hetrec2011-lastfm-2k.zip    tags.dat	      user_taggedartists.dat
hetrec2011-lastfm-2k.zip.1  user_artists.dat  user_taggedartists-timestamps.dat
Collecting implicit
  Using cached implicit-0.7.2-cp310-cp310-manylinux2014_x86_64.whl.metadata (6.1 kB)
Downloading implicit-0.7.2-cp310-cp310-manylinux201

In [83]:
import matplotlib.pyplot as plt
import sklearn
import seaborn as sns
import numpy as np
import pandas as pd
import scipy

In [84]:
def load_user_artists(user_artists_file):
    """
    Return a CSR matrix of user_artistst.dat
    """
    user_artists = pd.read_csv(user_artists_file, sep='\t')
    user_artists.set_index(['userID', 'artistID'], inplace=True)
    coo = scipy.sparse.coo_matrix(
        (
            user_artists.weight.astype(float),
            (
                user_artists.index.get_level_values(0),
                user_artists.index.get_level_values(1),
            ),
        )
    )

    return coo.tocsr()

user_artists = load_user_artists('/content/datasets/user_artists.dat')
print(f"Sparse matrix shape: {user_artists.shape}")

Sparse matrix shape: (2101, 18746)


In [85]:
def load_artists(artists_file):
    """
    Load artists and return in a dataframe format
    """
    artists = pd.read_csv(artists_file, sep='\t')
    artists = artists.set_index('id')

    return artists

artists = load_artists('/content/datasets/artists.dat')
print(f"Dataframe shape: {artists.shape}")

Dataframe shape: (17632, 3)


# Model Training

Training a collaborative filtering model using Alternating Least Squares (ALS)

In [93]:
from implicit import als

model = als.AlternatingLeastSquares(
    factors=50,
    regularization=0.01,
    iterations=10
)

model.fit(user_artists)
print("Training complete.")

  0%|          | 0/10 [00:00<?, ?it/s]

Training complete.


# Model Evaluation

Evaluate the performance of the model using precision and recall

In [113]:
def get_als_recommendations(model, user_id, user_artists, n_recommendations=5):
    """
    Generate top n recommendations for a specific user using the ALS model.
    """
    recommended_items, scores = model.recommend(
        user_id, user_artists[n_recommendations], N=n_recommendations
    )

    recommendations = [
        artists.loc[artist_id, 'name'] for artist_id in recommended_items
    ]

    return recommendations, scores

user_id = 2
recommendations, scores = get_als_recommendations(
    model, user_id, user_artists, n_recommendations=5
)

for (artist, score) in zip(recommendations, scores):
    print(f"{artist}: {score}")

The Police: 1.586818814277649
Sting: 1.2840187549591064
The Prodigy: 1.2672972679138184
Zero 7: 1.2326427698135376
Roxette: 1.203816294670105
