# Music Recommendation System

This notebook demonstrates a content-based music recommendation system using machine learning. The system analyzes song features (artist, genre, album, rating) to recommend similar songs.

## Features:
- Data preprocessing and cleaning
- Text vectorization using CountVectorizer
- Cosine similarity for finding similar songs
- Recommendation engine that suggests top 5 similar songs

## Libraries Used:
- **pandas**: Data manipulation and analysis
- **numpy**: Numerical computing
- **scikit-learn**: Machine learning algorithms
- **pickle**: Model serialization

---

## 1. Import Required Libraries

In [48]:
import numpy as np
import pandas as pd

## 2. Load and Explore Dataset

In [49]:
# Load the music dataset with proper structure
df = pd.read_csv(r"d:\Forked projects\music-recommendation-system-django\music-recommendation-system\music_data.csv")
df

Unnamed: 0,Song-Name,Singer/Artists,Genre,Album/Movie,User-Rating
0,Proper Patola,Diljit Dosanjh,Punjabi Pop,Namaste England,4.5/5
1,Shape of You,Ed Sheeran,Pop,Divide,4.8/5
2,Despacito,Luis Fonsi feat. Daddy Yankee,Latin Pop,Vida,4.7/5
3,Blinding Lights,The Weeknd,Synthpop,After Hours,4.6/5
4,Rockstar,Post Malone feat. 21 Savage,Hip Hop,Beerbongs & Bentleys,4.4/5
5,Dance Monkey,Tones and I,Pop,The Kids Are Coming,4.3/5
6,Someone You Loved,Lewis Capaldi,Pop,Divinely Uninspired to a Hellish Extent,4.5/5
7,Sunflower,Post Malone & Swae Lee,Hip Hop,Spider-Man: Into the Spider-Verse,4.6/5
8,Old Town Road,Lil Nas X feat. Billy Ray Cyrus,Country Rap,7 EP,4.8/5
9,Bad Guy,Billie Eilish,Electropop,When We All Fall Asleep Where Do We Go,4.7/5

In [50]:
df.isnull().sum()

Song-Name         0
Singer/Artists    0
Genre             0
Album/Movie       0
User-Rating       0
dtype: int64

## 3. Data Cleaning and Preprocessing

In [51]:
df.dropna(inplace=True)

In [52]:
df.isnull().sum()

Song-Name         0
Singer/Artists    0
Genre             0
Album/Movie       0
User-Rating       0
dtype: int64

In [53]:
df.duplicated().sum()

np.int64(0)

In [54]:
df=df.drop_duplicates()

In [55]:
df.duplicated().sum()

np.int64(0)

In [56]:
df.shape

(53, 5)

In [57]:
df.head()

Unnamed: 0,Song-Name,Singer/Artists,Genre,Album/Movie,User-Rating
0,Proper Patola,Diljit Dosanjh,Punjabi Pop,Namaste England,4.5/5
1,Shape of You,Ed Sheeran,Pop,Divide,4.8/5
2,Despacito,Luis Fonsi feat. Daddy Yankee,Latin Pop,Vida,4.7/5
3,Blinding Lights,The Weeknd,Synthpop,After Hours,4.6/5
4,Rockstar,Post Malone feat. 21 Savage,Hip Hop,Beerbongs & Bentleys,4.4/5


In [58]:
# Dataset Information
print("Dataset Shape:", df.shape)
print("\nColumn Names:", df.columns.tolist())
print("\nData Types:")
print(df.dtypes)
print("\nBasic Statistics:")
print(df.describe(include='all'))

Dataset Shape: (53, 5)

Column Names: ['Song-Name', 'Singer/Artists', 'Genre', 'Album/Movie', 'User-Rating']

Data Types:
Song-Name         object
Singer/Artists    object
Genre             object
Album/Movie       object
User-Rating       object
dtype: object

Basic Statistics:
            Song-Name Singer/Artists Genre Album/Movie User-Rating
count              53             53    53          53          53
unique             53             36    16          41          10
top     Proper Patola     Ed Sheeran   Pop      Divide       4.4/5
freq                1             15    31          12           9


In [59]:
# Analyze Genre Distribution
print("Genre Distribution:")
print(df['Genre'].value_counts())
print("\nUnique Genres:", df['Genre'].nunique())

print("\n" + "="*50)

# Analyze Artist Distribution  
print("Top 10 Artists:")
print(df['Singer/Artists'].value_counts().head(10))
print("\nUnique Artists:", df['Singer/Artists'].nunique())

Genre Distribution:
Genre
Pop            31
Hip Hop         5
R&B             2
Latin Pop       2
Folk            2
Synthpop        1
Country Rap     1
Punjabi Pop     1
Electropop      1
Indie Pop       1
Afrobeats       1
Art Rock        1
Country         1
Soul            1
Folk Pop        1
Ballad          1
Name: count, dtype: int64

Unique Genres: 16

Top 10 Artists:
Singer/Artists
Ed Sheeran                         15
Taylor Swift                        2
Olivia Rodrigo                      2
Harry Styles                        2
The Weeknd                          1
Luis Fonsi feat. Daddy Yankee       1
Diljit Dosanjh                      1
Post Malone feat. 21 Savage         1
Lil Nas X feat. Billy Ray Cyrus     1
Billie Eilish                       1
Name: count, dtype: int64

Unique Artists: 36


In [60]:
df['User-Rating']

0     4.5/5
1     4.8/5
2     4.7/5
3     4.6/5
4     4.4/5
5     4.3/5
6     4.5/5
7     4.6/5
8     4.8/5
9     4.7/5
10    4.5/5
11    4.4/5
12    4.3/5
13    4.2/5
14    4.4/5
15    4.1/5
16    4.5/5
17    4.3/5
18    4.6/5
19    4.7/5
20    4.4/5
21    4.8/5
22    4.5/5
23    4.6/5
24    4.4/5
25    4.7/5
26    4.3/5
27    4.9/5
28    4.8/5
29    4.2/5
30    4.5/5
31    4.6/5
32    4.4/5
33    4.3/5
34    4.7/5
35    4.5/5
36    4.8/5
37    4.2/5
38    4.4/5
39    4.6/5
40    4.9/5
41    4.7/5
42    4.5/5
43    4.3/5
44    4.4/5
45    4.2/5
46    4.1/5
47    4.0/5
48    4.3/5
49    4.1/5
50    4.2/5
51    4.4/5
52    4.8/5
Name: User-Rating, dtype: object

In [61]:
# Keep only the numeric part of each rating, e.g. '4.5/5' -> '4.5'
l = [rating.split('/')[0] for rating in df['User-Rating']]
l

['4.5',
 '4.8',
 '4.7',
 '4.6',
 '4.4',
 '4.3',
 '4.5',
 '4.6',
 '4.8',
 '4.7',
 '4.5',
 '4.4',
 '4.3',
 '4.2',
 '4.4',
 '4.1',
 '4.5',
 '4.3',
 '4.6',
 '4.7',
 '4.4',
 '4.8',
 '4.5',
 '4.6',
 '4.4',
 '4.7',
 '4.3',
 '4.9',
 '4.8',
 '4.2',
 '4.5',
 '4.6',
 '4.4',
 '4.3',
 '4.7',
 '4.5',
 '4.8',
 '4.2',
 '4.4',
 '4.6',
 '4.9',
 '4.7',
 '4.5',
 '4.3',
 '4.4',
 '4.2',
 '4.1',
 '4.0',
 '4.3',
 '4.1',
 '4.2',
 '4.4',
 '4.8']
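
Taking the first three characters works here because every rating has the form `d.d/5`. As a minimal sketch in plain Python, a more defensive variant splits on `/` and converts the score to a number. The helper name `parse_rating` is hypothetical and is not used elsewhere in this notebook.

```python
def parse_rating(raw: str) -> float:
    """Return the numeric score from a string such as '4.5/5'."""
    # partition() splits at the first '/', so the scale ('5') is ignored
    score, _, _scale = raw.partition('/')
    return float(score.strip())


ratings = ['4.5/5', '4.8/5', '10/10']
print([parse_rating(r) for r in ratings])
```

A two-digit score such as `10/10` is the case where a fixed three-character slice would return `'10/'` instead of the score.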

In [62]:
# Create an explicit copy to avoid SettingWithCopyWarning
df = df.copy()
df['User-Rating']=l
df

Unnamed: 0,Song-Name,Singer/Artists,Genre,Album/Movie,User-Rating
0,Proper Patola,Diljit Dosanjh,Punjabi Pop,Namaste England,4.5
1,Shape of You,Ed Sheeran,Pop,Divide,4.8
2,Despacito,Luis Fonsi feat. Daddy Yankee,Latin Pop,Vida,4.7
3,Blinding Lights,The Weeknd,Synthpop,After Hours,4.6
4,Rockstar,Post Malone feat. 21 Savage,Hip Hop,Beerbongs & Bentleys,4.4
5,Dance Monkey,Tones and I,Pop,The Kids Are Coming,4.3
6,Someone You Loved,Lewis Capaldi,Pop,Divinely Uninspired to a Hellish Extent,4.5
7,Sunflower,Post Malone & Swae Lee,Hip Hop,Spider-Man: Into the Spider-Verse,4.6
8,Old Town Road,Lil Nas X feat. Billy Ray Cyrus,Country Rap,7 EP,4.8
9,Bad Guy,Billie Eilish,Electropop,When We All Fall Asleep Where Do We Go,4.7


In [63]:
# Clean string data - remove spaces so each album and artist name becomes a single token
df['Album/Movie'] = df['Album/Movie'].str.replace(' ','')
df['Singer/Artists'] = df['Singer/Artists'].str.replace(' ','')
df

Unnamed: 0,Song-Name,Singer/Artists,Genre,Album/Movie,User-Rating
0,Proper Patola,DiljitDosanjh,Punjabi Pop,NamasteEngland,4.5
1,Shape of You,EdSheeran,Pop,Divide,4.8
2,Despacito,LuisFonsifeat.DaddyYankee,Latin Pop,Vida,4.7
3,Blinding Lights,TheWeeknd,Synthpop,AfterHours,4.6
4,Rockstar,PostMalonefeat.21Savage,Hip Hop,Beerbongs&Bentleys,4.4
5,Dance Monkey,TonesandI,Pop,TheKidsAreComing,4.3
6,Someone You Loved,LewisCapaldi,Pop,DivinelyUninspiredtoaHellishExtent,4.5
7,Sunflower,PostMalone&SwaeLee,Hip Hop,Spider-Man:IntotheSpider-Verse,4.6
8,Old Town Road,LilNasXfeat.BillyRayCyrus,Country Rap,7EP,4.8
9,Bad Guy,BillieEilish,Electropop,WhenWeAllFallAsleepWhereDoWeGo,4.7


In [64]:
# Replace commas with spaces in Singer/Artists
df['Singer/Artists'] = df['Singer/Artists'].str.replace(',',' ')
df

Unnamed: 0,Song-Name,Singer/Artists,Genre,Album/Movie,User-Rating
0,Proper Patola,DiljitDosanjh,Punjabi Pop,NamasteEngland,4.5
1,Shape of You,EdSheeran,Pop,Divide,4.8
2,Despacito,LuisFonsifeat.DaddyYankee,Latin Pop,Vida,4.7
3,Blinding Lights,TheWeeknd,Synthpop,AfterHours,4.6
4,Rockstar,PostMalonefeat.21Savage,Hip Hop,Beerbongs&Bentleys,4.4
5,Dance Monkey,TonesandI,Pop,TheKidsAreComing,4.3
6,Someone You Loved,LewisCapaldi,Pop,DivinelyUninspiredtoaHellishExtent,4.5
7,Sunflower,PostMalone&SwaeLee,Hip Hop,Spider-Man:IntotheSpider-Verse,4.6
8,Old Town Road,LilNasXfeat.BillyRayCyrus,Country Rap,7EP,4.8
9,Bad Guy,BillieEilish,Electropop,WhenWeAllFallAsleepWhereDoWeGo,4.7
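
Removing spaces turns each name into one token, but collaborations stay glued together (`PostMalonefeat.21Savage`, `PostMalone&SwaeLee`), which is why tokens such as `luisfonsifeat` show up in the vocabulary later on. The sketch below, in plain Python, splits a credit string into one token per artist before collapsing spaces. The function name `artist_tokens` and the separator list are assumptions for illustration, not part of this notebook.

```python
import re

# Separators between collaborating artists; this list is an assumption
_SEPARATORS = re.compile(r"\s+feat\.\s+|\s*&\s*|\s*,\s*", flags=re.IGNORECASE)


def artist_tokens(artists: str) -> list[str]:
    """Split a credit string into one space-free token per artist."""
    names = _SEPARATORS.split(artists)
    # Collapse internal whitespace so 'Post Malone' becomes 'PostMalone'
    return [re.sub(r"\s+", "", name) for name in names if name.strip()]


print(artist_tokens("Post Malone feat. 21 Savage"))
print(artist_tokens("Post Malone & Swae Lee"))
```

With this split, both Post Malone songs would share a `PostMalone` token instead of carrying two unrelated artist strings.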


In [65]:
# Combine the metadata columns into one space-separated 'tags' string per song
df['tags'] = df['Singer/Artists'] + ' ' + df['Genre'] + ' ' + df['Album/Movie'] + ' ' + df['User-Rating']
df.loc[0, 'tags']

'DiljitDosanjh Punjabi Pop NamasteEngland 4.5'

In [66]:
# Create new DataFrame with selected columns - use copy() to avoid warnings
new_df = df[['Song-Name','tags']].copy()
new_df

Unnamed: 0,Song-Name,tags
0,Proper Patola,DiljitDosanjh Punjabi Pop NamasteEngland 4.5
1,Shape of You,EdSheeran Pop Divide 4.8
2,Despacito,LuisFonsifeat.DaddyYankee Latin Pop Vida 4.7
3,Blinding Lights,TheWeeknd Synthpop AfterHours 4.6
4,Rockstar,PostMalonefeat.21Savage Hip Hop Beerbongs&Bent...
5,Dance Monkey,TonesandI Pop TheKidsAreComing 4.3
6,Someone You Loved,LewisCapaldi Pop DivinelyUninspiredtoaHellishE...
7,Sunflower,PostMalone&SwaeLee Hip Hop Spider-Man:IntotheS...
8,Old Town Road,LilNasXfeat.BillyRayCyrus Country Rap 7EP 4.8
9,Bad Guy,BillieEilish Electropop WhenWeAllFallAsleepWhe...


In [67]:
# Convert tags to lowercase
new_df['tags'] = new_df['tags'].str.lower()
new_df

Unnamed: 0,Song-Name,tags
0,Proper Patola,diljitdosanjh punjabi pop namasteengland 4.5
1,Shape of You,edsheeran pop divide 4.8
2,Despacito,luisfonsifeat.daddyyankee latin pop vida 4.7
3,Blinding Lights,theweeknd synthpop afterhours 4.6
4,Rockstar,postmalonefeat.21savage hip hop beerbongs&bent...
5,Dance Monkey,tonesandi pop thekidsarecoming 4.3
6,Someone You Loved,lewiscapaldi pop divinelyuninspiredtoahellishe...
7,Sunflower,postmalone&swaelee hip hop spider-man:intothes...
8,Old Town Road,lilnasxfeat.billyraycyrus country rap 7ep 4.8
9,Bad Guy,billieeilish electropop whenweallfallasleepwhe...


In [68]:
# Replace commas with spaces in tags for better text processing
new_df['tags'] = new_df['tags'].str.replace(","," ")
new_df

Unnamed: 0,Song-Name,tags
0,Proper Patola,diljitdosanjh punjabi pop namasteengland 4.5
1,Shape of You,edsheeran pop divide 4.8
2,Despacito,luisfonsifeat.daddyyankee latin pop vida 4.7
3,Blinding Lights,theweeknd synthpop afterhours 4.6
4,Rockstar,postmalonefeat.21savage hip hop beerbongs&bent...
5,Dance Monkey,tonesandi pop thekidsarecoming 4.3
6,Someone You Loved,lewiscapaldi pop divinelyuninspiredtoahellishe...
7,Sunflower,postmalone&swaelee hip hop spider-man:intothes...
8,Old Town Road,lilnasxfeat.billyraycyrus country rap 7ep 4.8
9,Bad Guy,billieeilish electropop whenweallfallasleepwhe...


## 4. Feature Engineering and Text Vectorization

In [69]:
from sklearn.feature_extraction.text import CountVectorizer
# Bag-of-words vectorizer; the vocabulary is capped at the 2000 most frequent tokens
cv = CountVectorizer(max_features=2000)

In [70]:
vectors=cv.fit_transform(new_df['tags']).toarray()

In [71]:
vectors.shape

(53, 108)

In [72]:
# Use get_feature_names_out() instead of deprecated get_feature_names()
cv.get_feature_names_out()

array(['21savage', '6collaborationsproject', '7ep', 'afrobeats',
       'afterhours', 'arianagrande', 'art', 'ballad', 'beerbongs',
       'bentleys', 'billieeilish', 'billyraycyrus', 'camilacabello',
       'chrisyoung', 'cklove3', 'country', 'daddyyankee', 'danielcaesar',
       'diljitdosanjh', 'divide', 'divinelyuninspiredtoahellishextent',
       'dojacat', 'dreamland', 'dualipa', 'edsheeran', 'electropop',
       'endlesssummervacation', 'equals', 'famousfriends', 'fineline',
       'folk', 'futurenostalgia', 'giveon', 'glassanimals', 'gloria',
       'guts', 'halsey', 'happinessbegins', 'harry', 'harrystyles',
       'heroes', 'hip', 'hollywood', 'hop', 'houndsoflove', 'indie',
       'inthelonelyhour', 'intothespider', 'jackharlow', 'jonasbrothers',
       'jordi', 'justice', 'justinbieber', 'justinbieberfeat', 'katebush',
       'kimpetras', 'latin', 'lewiscapaldi', 'lilnasxfeat', 'lizzo',
       'lover', 'luisfonsifeat', 'man', 'manic', 'maroon5', 'metroboomin',
       'midni

In [73]:
# Analyze the vectorized features
print("Feature Vector Shape:", vectors.shape)
print("Number of songs:", vectors.shape[0])
print("Number of features:", vectors.shape[1])
print("\nSample of first 20 feature names:")
print(cv.get_feature_names_out()[:20])

Feature Vector Shape: (53, 108)
Number of songs: 53
Number of features: 108

Sample of first 20 feature names:
['21savage' '6collaborationsproject' '7ep' 'afrobeats' 'afterhours'
 'arianagrande' 'art' 'ballad' 'beerbongs' 'bentleys' 'billieeilish'
 'billyraycyrus' 'camilacabello' 'chrisyoung' 'cklove3' 'country'
 'daddyyankee' 'danielcaesar' 'diljitdosanjh' 'divide']
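
The feature names above contain no rating tokens. CountVectorizer's default `token_pattern` is `(?u)\b\w\w+\b`, which keeps only runs of two or more word characters, so a rating like `4.5` splits into `4` and `5` and both are discarded. The sketch below imitates that tokenization with the standard `re` module rather than calling scikit-learn, then builds a small count matrix from the first two tag strings. The helper name `tokenize` is hypothetical.

```python
import re
from collections import Counter

# CountVectorizer's documented default token_pattern: runs of 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(tag: str) -> list[str]:
    """Lowercase a tag string and extract tokens with the default pattern."""
    return TOKEN_PATTERN.findall(tag.lower())


tags = [
    "diljitdosanjh punjabi pop namasteengland 4.5",
    "edsheeran pop divide 4.8",
]
token_lists = [tokenize(t) for t in tags]

# Sorted vocabulary and one row of term counts per song
vocabulary = sorted({tok for toks in token_lists for tok in toks})
counts = [[Counter(toks)[term] for term in vocabulary] for toks in token_lists]

print(token_lists[0])
print(vocabulary)
print(counts)
```

If the rating should influence similarity, one option is to encode it as a single multi-character token before vectorizing, or to pass a custom `token_pattern` to `CountVectorizer`.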


## 5. Calculate Similarity Matrix

In [74]:
from sklearn.metrics.pairwise import cosine_similarity
# Pairwise cosine similarity between all song vectors (one row and column per song)
similarity = cosine_similarity(vectors)

In [75]:
# Analyze the similarity matrix
print("Similarity Matrix Shape:", similarity.shape)
print("Similarity values range from", similarity.min(), "to", similarity.max())
print("\nDiagonal values (self-similarity):", similarity.diagonal()[:5])
# Compare with a tolerance: an exact == 1.0 test fails on floating-point rounding (e.g. 1.0000000000000002)
print("All diagonal values are close to 1.0:", np.allclose(similarity.diagonal(), 1.0))

Similarity Matrix Shape: (53, 53)
Similarity values range from 0.0 to 1.0000000000000002

Diagonal values (self-similarity): [1. 1. 1. 1. 1.]
All diagonal values are close to 1.0: True


In [76]:
sorted(list(enumerate(similarity[0])),reverse=True,key=lambda x:x[1])

[(0, np.float64(1.0)),
 (41, np.float64(0.35355339059327373)),
 (42, np.float64(0.35355339059327373)),
 (1, np.float64(0.2886751345948129)),
 (5, np.float64(0.2886751345948129)),
 (6, np.float64(0.2886751345948129)),
 (11, np.float64(0.2886751345948129)),
 (12, np.float64(0.2886751345948129)),
 (13, np.float64(0.2886751345948129)),
 (15, np.float64(0.2886751345948129)),
 (17, np.float64(0.2886751345948129)),
 (18, np.float64(0.2886751345948129)),
 (19, np.float64(0.2886751345948129)),
 (21, np.float64(0.2886751345948129)),
 (26, np.float64(0.2886751345948129)),
 (28, np.float64(0.2886751345948129)),
 (31, np.float64(0.2886751345948129)),
 (34, np.float64(0.2886751345948129)),
 (36, np.float64(0.2886751345948129)),
 (38, np.float64(0.2886751345948129)),
 (40, np.float64(0.2886751345948129)),
 (43, np.float64(0.2886751345948129)),
 (45, np.float64(0.2886751345948129)),
 (47, np.float64(0.2886751345948129)),
 (48, np.float64(0.2886751345948129)),
 (49, np.float64(0.2886751345948129)),
 (5
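
The score of about 0.2887 between song 0 ('Proper Patola') and song 1 ('Shape of You') can be reproduced by hand. The sketch below assumes the count vectors contain only tokens of two or more characters, which is what CountVectorizer's default tokenizer produces, so the rating contributes nothing. The two songs then have 4 and 3 tokens and share only `pop`, giving 1 / sqrt(4 * 3).

```python
import math


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Count vectors over the vocabulary
# [diljitdosanjh, divide, edsheeran, namasteengland, pop, punjabi]
proper_patola = [1, 0, 0, 1, 1, 1]  # 4 tokens
shape_of_you = [0, 1, 1, 0, 1, 0]   # 3 tokens, shares only 'pop'

print(round(cosine(proper_patola, shape_of_you), 4))  # 0.2887
```

Because every token counts equally, a shared genre word such as `pop` is enough to produce a non-zero score between otherwise unrelated songs.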

In [77]:
# Rename column - assign back to avoid warnings
new_df = new_df.rename(columns={'Song-Name':'title'})
new_df.head()

Unnamed: 0,title,tags
0,Proper Patola,diljitdosanjh punjabi pop namasteengland 4.5
1,Shape of You,edsheeran pop divide 4.8
2,Despacito,luisfonsifeat.daddyyankee latin pop vida 4.7
3,Blinding Lights,theweeknd synthpop afterhours 4.6
4,Rockstar,postmalonefeat.21savage hip hop beerbongs&bent...


## 6. Build Recommendation System

In [78]:
def recommend(music, num_recommendations=5):
    """
    Recommend similar songs based on the input song
    
    Parameters:
    music (str): The name of the song to get recommendations for
    num_recommendations (int): Number of recommendations to return (default: 5)
    
    Returns:
    list: List of recommended song titles (empty if the song is not found)
    """
    # Locate the song by row position, because the similarity matrix is
    # indexed by position rather than by DataFrame index label
    positions = np.flatnonzero((new_df['title'] == music).to_numpy())
    if len(positions) == 0:
        print(f"❌ Song '{music}' not found in the dataset.")
        print("Available songs:", new_df['title'].tolist()[:10], "...")
        return []
    music_index = positions[0]
    
    # Similarity scores between this song and every song in the dataset
    scores = similarity[music_index]
    
    # Rank by score, then drop the input song by index; slicing off the first
    # entry is unreliable when other songs tie with it at 1.0
    ranked = sorted(enumerate(scores), reverse=True, key=lambda x: x[1])
    music_list = [(idx, score) for idx, score in ranked if idx != music_index][:num_recommendations]
    
    print(f"🎵 Songs similar to '{music}':")
    print("-" * 50)
    
    recommended_songs = []
    for i, (idx, score) in enumerate(music_list, 1):
        song_title = new_df.iloc[idx]['title']
        recommended_songs.append(song_title)
        print(f"{i}. {song_title} (Similarity: {score:.3f})")
        
    return recommended_songs

In [79]:
recommend('Proper Patola')

🎵 Songs similar to 'Proper Patola':
--------------------------------------------------
1. Thinking Out Loud (Similarity: 0.354)
2. Photograph (Similarity: 0.354)
3. Shape of You (Similarity: 0.289)
4. Dance Monkey (Similarity: 0.289)
5. Someone You Loved (Similarity: 0.289)


['Thinking Out Loud',
 'Photograph',
 'Shape of You',
 'Dance Monkey',
 'Someone You Loved']

In [80]:
# Test with different songs and genres
test_songs = ['Shape of You', 'Blinding Lights', 'Heat Waves']

for song in test_songs:
    print(f"\n{'='*60}")
    recommend(song)
    print()


🎵 Songs similar to 'Shape of You':
--------------------------------------------------
1. Perfect (Similarity: 1.000)
2. Castle on the Hill (Similarity: 1.000)
3. Happier (Similarity: 1.000)
4. Eraser (Similarity: 1.000)
5. Dive (Similarity: 1.000)


🎵 Songs similar to 'Blinding Lights':
--------------------------------------------------
1. Creepin (Similarity: 0.218)
2. Proper Patola (Similarity: 0.000)
3. Shape of You (Similarity: 0.000)
4. Despacito (Similarity: 0.000)
5. Rockstar (Similarity: 0.000)


🎵 Songs similar to 'Heat Waves':
--------------------------------------------------
1. Thinking Out Loud (Similarity: 0.354)
2. Photograph (Similarity: 0.354)
3. Shape of You (Similarity: 0.289)
4. Dance Monkey (Similarity: 0.289)
5. Someone You Loved (Similarity: 0.289)
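
The 'Blinding Lights' results show a limitation of always returning five songs: four of them have a similarity of 0.000, so they share no token with the query. One possible remedy, sketched below in plain Python, is to drop candidates at or below a minimum score. The function name `top_matches` and the `min_score` parameter are hypothetical and not part of this notebook.

```python
def top_matches(scores, query_index, k=5, min_score=0.0):
    """Return up to k (index, score) pairs, best first, skipping the query
    itself and any candidate whose score is not above min_score."""
    candidates = [
        (idx, score)
        for idx, score in enumerate(scores)
        if idx != query_index and score > min_score
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Toy scores: index 0 is the query, only index 3 shares any token with it
scores = [1.0, 0.0, 0.0, 0.218, 0.0]
print(top_matches(scores, query_index=0))
```

Returning fewer than `k` results is a design choice: the caller can then decide whether to show a shorter list or fall back to another strategy such as popular songs.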



In [81]:
# Evaluation: Check genre consistency in recommendations
def evaluate_recommendations(song_name, original_df):
    """Evaluate if recommendations maintain genre consistency"""
    try:
        # Get original song's genre
        original_genre = original_df[original_df['Song-Name'] == song_name]['Genre'].iloc[0]
        
        # Get recommendations
        recommendations = recommend(song_name, num_recommendations=5)
        
        if not recommendations:
            return
            
        # Check genre consistency
        print(f"\n📊 Genre Analysis for '{song_name}' (Original: {original_genre}):")
        print("-" * 60)
        
        genre_matches = 0
        for rec_song in recommendations:
            rec_genre = original_df[original_df['Song-Name'] == rec_song]['Genre'].iloc[0]
            match = "✅" if original_genre.split()[0] in rec_genre or rec_genre.split()[0] in original_genre else "❌"
            print(f"{match} {rec_song} - {rec_genre}")
            if match == "✅":
                genre_matches += 1
                
        accuracy = (genre_matches / len(recommendations)) * 100
        print(f"\n🎯 Genre Consistency: {accuracy:.1f}% ({genre_matches}/{len(recommendations)})")
        
    except Exception as e:
        print(f"Error in evaluation: {e}")

# Test evaluation
evaluate_recommendations('Shape of You', df)

🎵 Songs similar to 'Shape of You':
--------------------------------------------------
1. Perfect (Similarity: 1.000)
2. Castle on the Hill (Similarity: 1.000)
3. Happier (Similarity: 1.000)
4. Eraser (Similarity: 1.000)
5. Dive (Similarity: 1.000)

📊 Genre Analysis for 'Shape of You' (Original: Pop):
------------------------------------------------------------
✅ Perfect - Pop
✅ Castle on the Hill - Pop
✅ Happier - Pop
✅ Eraser - Pop
✅ Dive - Pop

🎯 Genre Consistency: 100.0% (5/5)
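
A single query gives a narrow picture. The sketch below pools the same kind of check over several query songs, using plain dictionaries instead of the DataFrame. It assumes a stricter rule than the function above (genres must match exactly), and the name `genre_consistency` is hypothetical. The inputs are taken from outputs shown earlier; 'Creepin' is left out because its genre does not appear in the displayed rows.

```python
def genre_consistency(recommendations: dict[str, list[str]],
                      genre_of: dict[str, str]) -> float:
    """Fraction of recommended songs whose genre equals the query song's
    genre, pooled over all query songs."""
    hits = total = 0
    for query, recs in recommendations.items():
        for rec in recs:
            hits += genre_of[rec] == genre_of[query]
            total += 1
    return hits / total if total else 0.0


genre_of = {
    'Shape of You': 'Pop', 'Perfect': 'Pop', 'Castle on the Hill': 'Pop',
    'Happier': 'Pop', 'Eraser': 'Pop', 'Dive': 'Pop',
    'Blinding Lights': 'Synthpop', 'Proper Patola': 'Punjabi Pop',
    'Despacito': 'Latin Pop', 'Rockstar': 'Hip Hop',
}
recommendations = {
    'Shape of You': ['Perfect', 'Castle on the Hill', 'Happier', 'Eraser', 'Dive'],
    'Blinding Lights': ['Proper Patola', 'Shape of You', 'Despacito', 'Rockstar'],
}

print(f"{genre_consistency(recommendations, genre_of):.3f}")
```

Pooling over more queries makes weak cases visible: here the five exact matches for 'Shape of You' are offset by none for 'Blinding Lights'.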


## 7. Test Recommendations

In [82]:
df.head(50)

Unnamed: 0,Song-Name,Singer/Artists,Genre,Album/Movie,User-Rating,tags
0,Proper Patola,DiljitDosanjh,Punjabi Pop,NamasteEngland,4.5,DiljitDosanjh Punjabi Pop NamasteEngland 4.5
1,Shape of You,EdSheeran,Pop,Divide,4.8,EdSheeran Pop Divide 4.8
2,Despacito,LuisFonsifeat.DaddyYankee,Latin Pop,Vida,4.7,LuisFonsifeat.DaddyYankee Latin Pop Vida 4.7
3,Blinding Lights,TheWeeknd,Synthpop,AfterHours,4.6,TheWeeknd Synthpop AfterHours 4.6
4,Rockstar,PostMalonefeat.21Savage,Hip Hop,Beerbongs&Bentleys,4.4,PostMalonefeat.21Savage Hip Hop Beerbongs&Bent...
5,Dance Monkey,TonesandI,Pop,TheKidsAreComing,4.3,TonesandI Pop TheKidsAreComing 4.3
6,Someone You Loved,LewisCapaldi,Pop,DivinelyUninspiredtoaHellishExtent,4.5,LewisCapaldi Pop DivinelyUninspiredtoaHellishE...
7,Sunflower,PostMalone&SwaeLee,Hip Hop,Spider-Man:IntotheSpider-Verse,4.6,PostMalone&SwaeLee Hip Hop Spider-Man:IntotheS...
8,Old Town Road,LilNasXfeat.BillyRayCyrus,Country Rap,7EP,4.8,LilNasXfeat.BillyRayCyrus Country Rap 7EP 4.8
9,Bad Guy,BillieEilish,Electropop,WhenWeAllFallAsleepWhereDoWeGo,4.7,BillieEilish Electropop WhenWeAllFallAsleepWhe...


## 8. Save Model for Production Use

In [83]:
# Save the processed dataset and similarity matrix for production use
import pickle
import os

try:
    # Save the processed music dataframe; the with-block closes the file handle
    with open('musicrec.pkl', 'wb') as f:
        pickle.dump(new_df, f)
    print("✅ Successfully saved music dataframe to 'musicrec.pkl'")
    
    # Check file size
    file_size = os.path.getsize('musicrec.pkl') / 1024  # KB
    print(f"📁 File size: {file_size:.2f} KB")
    
except Exception as e:
    print(f"❌ Error saving music dataframe: {e}")

✅ Successfully saved music dataframe to 'musicrec.pkl'
📁 File size: 3.22 KB

In [84]:
try:
    # Save the similarity matrix; the with-block closes the file handle
    with open('similarities.pkl', 'wb') as f:
        pickle.dump(similarity, f)
    print("✅ Successfully saved similarity matrix to 'similarities.pkl'")
    
    # Check file size
    file_size = os.path.getsize('similarities.pkl') / 1024  # KB
    print(f"📁 File size: {file_size:.2f} KB")
    
    print(f"\n🎯 Model files ready for Django integration!")
    print("Files created:")
    print("- musicrec.pkl (music dataframe)")
    print("- similarities.pkl (similarity matrix)")
    
except Exception as e:
    print(f"❌ Error saving similarity matrix: {e}")

✅ Successfully saved similarity matrix to 'similarities.pkl'
📁 File size: 22.09 KB

🎯 Model files ready for Django integration!
Files created:
- musicrec.pkl (music dataframe)
- similarities.pkl (similarity matrix)


## 9. Summary and Next Steps

### What We Built:
✅ **Content-based recommendation system** using song features  
✅ **Text vectorization** with CountVectorizer (capped at 2000 features; this dataset yields 108)  
✅ **Cosine similarity** for finding similar songs  
✅ **Error handling** and evaluation metrics  
✅ **Model persistence** for production deployment  

### Key Results:
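- Processes music metadata (artist, genre, album, rating) for 53 songs into a 108-term vocabulary; note that ratings such as `4.5` produce no features under CountVectorizer's default tokenizer, which drops single-character tokens
- Ranks recommendations by cosine similarity of tag vectors; quality varies by song (for 'Blinding Lights', only one recommendation has a non-zero score)
- Genre consistency was 100% (5/5) for the one song evaluated, 'Shape of You'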
- Ready for integration with Django web application

### Possible Improvements:
1. **Add more features**: Release year, tempo, mood, language
2. **Hybrid approach**: Combine content-based with collaborative filtering
3. **Deep learning**: Use embeddings for better similarity computation
4. **Real-time updates**: Handle new songs and user feedback
5. **A/B testing**: Compare different recommendation algorithms

### Integration with Django:
The saved pickle files (`musicrec.pkl` and `similarities.pkl`) can be loaded in the Django application for real-time recommendations.
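
As a minimal sketch of the loading side, the function below reads both files once and caches the result so that repeated requests do not hit the disk again. The name `load_artifacts` and the `model_dir` argument are hypothetical; how the directory is configured in the Django project is not covered by this notebook.

```python
import pickle
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_artifacts(model_dir: str):
    """Load and cache the pickled dataframe and similarity matrix.

    Only unpickle files you created yourself: pickle can execute
    arbitrary code while loading.
    """
    base = Path(model_dir)
    with open(base / 'musicrec.pkl', 'rb') as f:
        songs = pickle.load(f)
    with open(base / 'similarities.pkl', 'rb') as f:
        similarity = pickle.load(f)
    return songs, similarity
```

Unpickling the dataframe requires pandas to be installed in the web application's environment, ideally in a version compatible with the one used to create the file.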