## **Creating a Personalized Music Recommendation Engine**

**Project Goal:** Build a recommendation engine that learns user preferences and suggests songs based on past listening behavior using deep learning.

**Skills Developed:** Collaborative filtering, building embeddings for recommendation, handling large-scale user-item interactions.


## Problem Definition
The goal of a music recommendation engine is to predict the songs that a user will most likely enjoy based on their past listening history and preferences. A common approach is to use collaborative filtering, which leverages patterns in the interactions between users and songs to make predictions.

## Data Collection and Preprocessing
Data Requirements: You need a dataset that contains information about users, songs, and interactions (e.g., play history, ratings, or likes).
Popular sources include Last.fm and the Million Song Dataset; this walkthrough pulls one user's listening history from the Last.fm API.
The dataset should include:

> User IDs (unique identifier for each user)

> Song IDs (unique identifier for each song)

> Interaction data (e.g., ratings, number of plays, or time spent listening)
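
As a rough illustration of that shape, the sketch below builds a small user-item play-count matrix from (user, song, plays) records. The records and the names `interactions` and `build_interaction_matrix` are made up for this example and do not come from Last.fm.

```python
import numpy as np

# Hypothetical interaction records: (user_id, song_id, play_count)
interactions = [
    (0, 0, 3),
    (0, 2, 1),
    (1, 1, 5),
    (1, 2, 2),
]

def build_interaction_matrix(records, num_users, num_songs):
    """Return a dense (num_users x num_songs) matrix of play counts."""
    matrix = np.zeros((num_users, num_songs), dtype=np.int64)
    for user_id, song_id, plays in records:
        matrix[user_id, song_id] += plays
    return matrix

matrix = build_interaction_matrix(interactions, num_users=2, num_songs=3)
print(matrix)
```

Collaborative filtering looks for structure in a matrix like this one. Real datasets are far sparser, so they are usually kept as records rather than as a dense array.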


## Step 1: Import Required Libraries

Before starting the project, the necessary Python libraries must be imported. These include libraries for data manipulation, machine learning, and API handling.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
import requests
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

## Step 2: API Configuration

Set up the API configuration to interact with Last.fm. This requires an API key and username, along with the base URL for API requests:

**API_KEY:** The API key to authenticate requests.

**USER:** The Last.fm username whose listening history is fetched, i.e. the user who will receive the recommendations.

**BASE_URL:** The base URL for Last.fm API endpoints.

In [2]:
API_KEY = "dfcd16d8f5058144250d9e2a9279fccd"  # Replace with your Last.fm API key
USER = "the_atm"  # Replace with your Last.fm username
BASE_URL = "http://ws.audioscrobbler.com/2.0/"  # Base URL for Last.fm API
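
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative is to read them from environment variables; the variable names `LASTFM_API_KEY` and `LASTFM_USER` and the helper `load_lastfm_credentials` are assumptions for this example, not part of the Last.fm API.

```python
import os

def load_lastfm_credentials(env=None):
    """Read the API key and username from an environment-style mapping.

    LASTFM_API_KEY and LASTFM_USER are hypothetical variable names
    chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("LASTFM_API_KEY")
    user = env.get("LASTFM_USER")
    if not api_key or not user:
        raise RuntimeError("Set LASTFM_API_KEY and LASTFM_USER before running.")
    return api_key, user

# Example with an explicit mapping instead of the real environment
API_KEY, USER = load_lastfm_credentials(
    {"LASTFM_API_KEY": "your-api-key", "LASTFM_USER": "your-username"}
)
```

Calling `load_lastfm_credentials()` with no argument reads from the process environment instead.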

## Step 3: Fetch Data from Last.fm API

Define a function, fetch_lastfm_data, to fetch the user's recent tracks from Last.fm using the API. The function sends a GET request to the user.getrecenttracks endpoint and retrieves the data in JSON format.

In [4]:
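def fetch_lastfm_data(user, api_key):
    # Fetch the user's recent tracks and return the parsed JSON response
    response = requests.get(BASE_URL, params={
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": 1000  # Adjust limit as needed
    }, timeout=30)  # Avoid hanging indefinitely on a stalled connection
    response.raise_for_status()  # Raise on HTTP-level failures
    return response.json()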

# Call the API
print("Fetching data from Last.fm...")
data = fetch_lastfm_data(USER, API_KEY)

Fetching data from Last.fm...
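
Before parsing, it can help to check that the response actually contains tracks. The sketch below assumes two conventions of the Last.fm JSON format that are worth verifying against the API documentation: failures are reported as an object with `error` and `message` keys, and a currently playing track carries an `@attr` entry with `nowplaying` set to `"true"`. The helper name `extract_tracks` and the sample payload are hypothetical.

```python
def extract_tracks(payload):
    """Return finished plays from a user.getrecenttracks JSON payload."""
    if "error" in payload:
        # Assumed error shape: {"error": <code>, "message": <text>}
        raise RuntimeError(
            f"Last.fm error {payload['error']}: {payload.get('message')}"
        )
    tracks = payload.get("recenttracks", {}).get("track", [])
    # Skip the entry flagged as currently playing, if there is one
    return [
        track for track in tracks
        if track.get("@attr", {}).get("nowplaying") != "true"
    ]

# Hand-written payload for illustration (not real API output)
sample_payload = {
    "recenttracks": {
        "track": [
            {"name": "Song A", "artist": {"#text": "Artist X"},
             "@attr": {"nowplaying": "true"}},
            {"name": "Song B", "artist": {"#text": "Artist Y"}},
        ]
    }
}

finished_tracks = extract_tracks(sample_payload)
print([track["name"] for track in finished_tracks])
```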


## Step 4: Extract and Preprocess Data

The data retrieved from the API is processed as follows:

Extract the track information from the JSON response.

Convert the extracted data into a pandas DataFrame with columns:

**user_id:** Numeric ID assigned to the user.

**song_name:** Name of the song.

**artist_name:** Name of the artist.

Encode the song and artist names into numeric IDs using LabelEncoder.

Add an interaction column (interaction) to represent implicit user feedback. In this walkthrough every play is recorded as 1; a per-song play count could be used instead.

Split the DataFrame into training and testing datasets using an 80-20 split.

In [5]:
# Parse the JSON data into a DataFrame
tracks = data['recenttracks']['track']
df = pd.DataFrame([{
    "user_id": 0,  # Assign numeric ID for the single user
    "song_name": track['name'],
    "artist_name": track['artist']['#text']
} for track in tracks])

print("Sample Data:")
print(df.head())

# Encode song and artist names into numeric IDs
df['song_id'] = LabelEncoder().fit_transform(df['song_name'])
df['artist_id'] = LabelEncoder().fit_transform(df['artist_name'])

# Add interaction column (e.g., play count or implicit rating)
df['interaction'] = 1  # Treat all plays equally

# Split into train and test sets
train, test = train_test_split(df, test_size=0.2, random_state=42)

Sample Data:
   user_id                song_name     artist_name
0        0                       流杯     锦瑟​︱Brocade
1        0                      小洞天     锦瑟​︱Brocade
2        0  Atlantic (Instrumental)     Sleep Token
3        0               The Keeper     Paul Ruskay
4        0         Song of the Wood  Motoi Sakuraba
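
The code above encodes songs by title only, so two different songs that share a title receive the same song_id, and every play is weighted equally. A minimal sketch of one alternative, on a made-up listening log, counts plays per song and encodes the (song, artist) pair instead. The frame `plays`, the column `play_count`, and the `" - "` separator are assumptions for this example.

```python
import pandas as pd

# Hypothetical listening log in the same shape as df above
plays = pd.DataFrame({
    "user_id": [0, 0, 0, 0],
    "song_name": ["Intro", "Intro", "Intro", "Atlas"],
    "artist_name": ["Band A", "Band A", "Band B", "Band A"],
})

# One row per (user, song, artist) with the number of plays
play_counts = (
    plays.groupby(["user_id", "song_name", "artist_name"])
    .size()
    .reset_index(name="play_count")
)

# Encode the (song, artist) pair so same-titled songs stay distinct
pair_keys = play_counts["song_name"] + " - " + play_counts["artist_name"]
play_counts["song_id"] = pd.factorize(pair_keys)[0]
print(play_counts)
```

Here "Intro" by Band A and "Intro" by Band B end up with different IDs, and `play_count` could replace the constant interaction value if graded feedback is wanted. A model with a sigmoid output would then need the counts rescaled to the 0 to 1 range or a different loss.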


## Step 5: Build the Deep Learning Model

**Model Architecture:**

**Input Layers:** Separate inputs for user_id and song_id.


**Embedding Layers:**

**user_embedding:** Embedding for user IDs.

**song_embedding:** Embedding for song IDs.

**Concatenation:** Combine user and song embeddings.


**Dense Layers:**

A hidden layer with 128 neurons and ReLU activation.

An output layer with 1 neuron and sigmoid activation.

The model is compiled using the Adam optimizer and binary cross-entropy loss.


**Parameters:**

**num_users:** Total number of unique users (1 in this case).

**num_songs:** Total number of unique songs.

**embedding_dim:** Dimension of the embedding space (set to 50).

In [6]:
# Define parameters
num_users = 1  # Only one user (USER)
num_songs = df['song_id'].nunique()
embedding_dim = 50

# Input layers
user_input = tf.keras.layers.Input(shape=(1,))
song_input = tf.keras.layers.Input(shape=(1,))

# Embedding layers
user_embedding = tf.keras.layers.Embedding(num_users, embedding_dim)(user_input)
song_embedding = tf.keras.layers.Embedding(num_songs, embedding_dim)(song_input)

# Flatten embeddings
user_vec = tf.keras.layers.Flatten()(user_embedding)
song_vec = tf.keras.layers.Flatten()(song_embedding)

# Concatenate embeddings
concat = tf.keras.layers.Concatenate()([user_vec, song_vec])
dense = tf.keras.layers.Dense(128, activation='relu')(concat)
output = tf.keras.layers.Dense(1, activation='sigmoid')(dense)

# Create model
model = tf.keras.Model(inputs=[user_input, song_input], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
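
To make the data flow through the architecture concrete, the sketch below reproduces the forward pass in plain NumPy: embedding lookup, concatenation, a ReLU hidden layer, and a sigmoid output. The weights are random stand-ins rather than the trained Keras weights, and `num_songs = 4` is an arbitrary value for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_songs, embedding_dim, hidden_units = 1, 4, 50, 128

# Randomly initialised stand-ins for the trainable weights
user_table = rng.normal(scale=0.05, size=(num_users, embedding_dim))
song_table = rng.normal(scale=0.05, size=(num_songs, embedding_dim))
w_hidden = rng.normal(scale=0.05, size=(2 * embedding_dim, hidden_units))
b_hidden = np.zeros(hidden_units)
w_out = rng.normal(scale=0.05, size=(hidden_units, 1))
b_out = np.zeros(1)

def forward(user_ids, song_ids):
    """Score (user, song) pairs following the architecture described above."""
    user_vec = user_table[user_ids]  # embedding lookup, shape (batch, 50)
    song_vec = song_table[song_ids]  # embedding lookup, shape (batch, 50)
    concat = np.concatenate([user_vec, song_vec], axis=1)  # (batch, 100)
    hidden = np.maximum(0.0, concat @ w_hidden + b_hidden)  # ReLU layer
    logits = hidden @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid, shape (batch, 1)

scores = forward(np.array([0, 0]), np.array([1, 3]))
print(scores.shape)
```

An embedding layer is effectively a lookup table: each integer ID selects one row of a trainable matrix, and training adjusts those rows together with the dense-layer weights.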

## Step 6: Train the Model

**Prepare the input data for training and testing:**

**X_train:** Input data for training (user and song IDs).

**y_train:** Interaction data for training.

**X_test:** Input data for testing.

**y_test:** Interaction data for testing.

Train the model using the fit method with 10 epochs and a batch size of 32. Validate the model on the test dataset during training.

In [7]:
# Prepare input data
X_train = [train['user_id'], train['song_id']]
y_train = train['interaction']

X_test = [test['user_id'], test['song_id']]
y_test = test['interaction']

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/10
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 11ms/step - accuracy: 0.8220 - loss: 0.6000 - val_accuracy: 1.0000 - val_loss: 0.1895
Epoch 2/10
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step - accuracy: 1.0000 - loss: 0.1016 - val_accuracy: 1.0000 - val_loss: 0.0065
Epoch 3/10
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step - accuracy: 1.0000 - loss: 0.0042 - val_accuracy: 1.0000 - val_loss: 0.0017
Epoch 4/10
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 1.0000 - loss: 0.0014 - val_accuracy: 1.0000 - val_loss: 0.0011
Epoch 5/10
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 1.0000 - loss: 9.4648e-04 - val_accuracy: 1.0000 - val_loss: 8.0391e-04
Epoch 6/10
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 1.0000 - loss: 7.0223e-04 - val_accuracy: 1.0000 - val_loss: 6.1973e-04
Epoch 7/10
[1m25/25[0

## Step 7: Evaluate the Model

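Evaluate the model's performance using the evaluate method on the test dataset. Because every interaction label in this dataset is 1, a perfect accuracy mainly shows that the model predicts the positive class; it does not show that the model ranks songs well. The output includes: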

> Test Loss

> Test Accuracy

In [8]:
# Evaluate the model
results = model.evaluate(X_test, y_test)
print(f"Test Loss: {results[0]}, Test Accuracy: {results[1]}")

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 2.7444e-04
Test Loss: 0.00027484106249175966, Test Accuracy: 1.0
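
One way to gauge these numbers is to compare them with a predictor that ignores its inputs. The sketch below uses made-up arrays rather than the actual test split: it scores a constant output of 0.999 against all-ones labels. The helper names and the array length of 200 are assumptions for illustration.

```python
import numpy as np

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of thresholded predictions that match the labels."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy with clipping to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))

# Labels shaped like y_test: every interaction is 1 (length is illustrative)
y_true = np.ones(200, dtype=int)
# A "model" that ignores its inputs and always outputs 0.999
constant_prob = np.full(200, 0.999)

baseline_accuracy = accuracy(y_true, constant_prob)
baseline_loss = binary_cross_entropy(y_true, constant_prob)
print(baseline_accuracy, baseline_loss)
```

The constant predictor also reaches an accuracy of 1.0 with a loss close to zero, so these metrics cannot separate a useful model from a trivial one on this data. Adding examples of songs the user did not play, or using a ranking metric, would likely be needed for a meaningful evaluation.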


## Step 8: Test Recommendations for a User

Define a function, recommend_songs, to generate song recommendations for the user:


> Create an array of user IDs (user_array) and song IDs (song_array).

> Use the trained model to predict interaction scores for all songs.

> Identify the top 5 songs with the highest predicted scores.

> Retrieve and display the names and artists of the recommended songs.

In [9]:
# Generate song recommendations
def recommend_songs(num_recommendations=5):
    user_array = np.array([0] * num_songs)  # Single user
    song_array = np.array(range(num_songs))

    predictions = model.predict([user_array, song_array])
    recommended_songs = predictions.flatten().argsort()[-num_recommendations:][::-1]

    recommended_df = df[df['song_id'].isin(recommended_songs)]
    print("\nRecommended Songs:")
    print(recommended_df[['song_name', 'artist_name']].drop_duplicates())
    return recommended_df[['song_name', 'artist_name']].drop_duplicates()

# Get recommendations
recommendations = recommend_songs()

26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step

Recommended Songs:
                       song_name          artist_name
37                      Whiplash           Architects
105          The Light in Us Now           J.J. Ipsen
146  It’s All So Incredibly Loud        Glass Animals
321                   Soft Spine            Spiritbox
603           Edge of the Sundom  Niels van der Leest


## How It Works

The system suggests songs based on the patterns it finds in your listening habits. Here's how it works in simple terms:

The system looks at:

Songs You’ve Listened To: It records which songs you have played recently.

Artists You Like: It also records the artist of each song, which makes it possible to see whose songs you listen to the most.

The system then treats each song as a unique item and learns an embedding vector for it, alongside an embedding for the user. The model in Step 5 takes only the user and song IDs as input; the artist IDs prepared in Step 4 are not used yet and could be added as a third input so that songs by the same artist share information.

So, when you ask for song recommendations, the system scores every song it knows about using the patterns it learned from your past listening habits and returns the highest-scoring ones. Because the catalog here is built from your own recent tracks, the suggestions are songs you have already played; recommending genuinely new songs would require data from more users or a larger song catalog. It’s like a friend recommending songs they think you’ll love based on what they know you’ve enjoyed before.






