# 🎵 Music Recommendation with NMF

In this project, we will apply **Non-Negative Matrix Factorization (NMF)** to build a simple music recommendation system.  

For music datasets to download : [Data Link](https://drive.google.com/file/d/1gXn_N6gvOxZPmsIn6b4aRFtkQMTciR02/view?usp=sharing).

The data to use is represented as a **user_artist.csv**, where:  
- **Rows** represent artists,  
- **Columns** represent users,  
- **Entries** indicate how many times a given user listened to a particular artist.  

Since users listen to different numbers of artists with varying frequencies, we first normalize the data to ensure all users contribute equally. This is done using **MaxAbsScaler**, which scales each user’s listening history without distorting the sparsity of the data.  

After normalization, we factorize the matrix into lower-dimensional **NMF features**. These features capture hidden patterns in listening behavior, which will later be used to recommend artists to users — similar to how platforms like **Spotify** or **Last.fm** suggest new music.  


In [24]:
import pandas as pd
import numpy as np

**Creating artists X users Data**

In [25]:
# Load artists.csv (single column, no header)
artists = pd.read_csv("artists.csv", header=None, names=["artist_name"])

# Add artist_offset as the index (0-based, to match scrobbler file)
artists = artists.reset_index().rename(columns={"index": "artist_offset"})

# Load scrobbler data
scrobbler = pd.read_csv("scrobbler-small-sample.csv")

# Merge using artist_offset
merged = scrobbler.merge(artists, on="artist_offset")

# Create user–artist matrix
data = merged.pivot_table(
    index="artist_name",      # rows = artists
    columns="user_offset",    # columns = users
    values="playcount",
    aggfunc="sum",
    fill_value=0
)

data.to_csv("user_artist.csv")


Loading artists X users Data

In [26]:
# Load the saved CSV
data = pd.read_csv("user_artist.csv", index_col=0)

# Convert to NumPy array (values only, no labels)
artists = data.values   # or data.to_numpy()

print(artists.shape)    # (num_artists, num_users)


(111, 500)


**Creating The Pipeline**

In [27]:
# Perform the necessary imports
from sklearn.decomposition import NMF
from sklearn.preprocessing import Normalizer, MaxAbsScaler
from sklearn.pipeline import make_pipeline

# Create a MaxAbsScaler: scaler
scaler = MaxAbsScaler()

# Create an NMF model: nmf
nmf = NMF(n_components = 20)

# Create a Normalizer: normalizer
normalizer = Normalizer()

# Create a pipeline: pipeline
pipeline = make_pipeline(scaler, nmf, normalizer)

# Apply fit_transform to artists: norm_features
norm_features = pipeline.fit_transform(artists)


**Loading artits Names**

In [28]:
# Load artists.csv (single column, no header)
artists = pd.read_csv("artists.csv", header=None, names=["artist_name"])

# Convert to NumPy array
artist_names = artists["artist_name"].to_numpy()

print(artist_names.shape)   # number of artists
print(artist_names[:10])    # preview first 10 names

(111,)
['Massive Attack' 'Sublime' 'Beastie Boys' 'Neil Young' 'Dead Kennedys'
 'Orbital' 'Miles Davis' 'Leonard Cohen' 'Van Morrison' 'NOFX']


Suppose you were a big fan of Bruce Springsteen - which other musical artists might you like? Use your NMF features and the cosine similarity to find similar musical artists.  `norm_features` is an array containing the normalized NMF features as rows. The names of the musical artists are available as the list `artist_names`.

In [29]:
# Create a DataFrame: df
df = pd.DataFrame(norm_features, index = artist_names)
# Select row of 'Bruce Springsteen': artist
artist = df.loc['Bruce Springsteen']

# Compute cosine similarities: similarities
similarities = df.dot(artist)

# Display those with highest cosine similarity
print(similarities.nlargest())


Bruce Springsteen    1.000000
Jet                  0.687018
The Flaming Lips     0.673298
Foo Fighters         0.667950
AC/DC                0.636998
dtype: float64
