### Example – Content-Based Filtering (Songs by Danceability & Valence)
We’ll simulate Spotify-style audio features for five songs: danceability and valence, both on a 0 to 1 scale.

In [4]:
# Install scikit-learn into the environment of the running Jupyter kernel
%pip install scikit-learn


In [5]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Example song dataset (Spotify API gives features like danceability, valence)
songs = pd.DataFrame({
    "song_id": ["S1", "S2", "S3", "S4", "S5"],
    "title": ["Happy Vibes", "Sad Piano", "Dance Floor", "Chill Beats", "Workout Energy"],
    "danceability": [0.8, 0.2, 0.9, 0.4, 0.95],
    "valence": [0.9, 0.1, 0.8, 0.5, 0.85]   # valence ~ how positive/happy a song feels
})

print("🎶 Song Dataset:\n", songs)

🎶 Song Dataset:
   song_id           title  danceability  valence
0      S1     Happy Vibes          0.80     0.90
1      S2       Sad Piano          0.20     0.10
2      S3     Dance Floor          0.90     0.80
3      S4     Chill Beats          0.40     0.50
4      S5  Workout Energy          0.95     0.85


In [6]:
# Select features
features = songs[["danceability", "valence"]]

# Compute similarity
similarity = cosine_similarity(features)

# Put into DataFrame for clarity
sim_df = pd.DataFrame(similarity, index=songs["title"], columns=songs["title"])
print("\n🎵 Song Similarity (Content-Based):\n", sim_df)


🎵 Song Similarity (Content-Based):
 title           Happy Vibes  Sad Piano  Dance Floor  Chill Beats  Workout Energy
title
Happy Vibes        1.000000   0.928477     0.993103     0.998653        0.993480
Sad Piano          0.928477   1.000000     0.965616     0.907959        0.964764
Dance Floor        0.993103   0.965616     1.000000     0.985684        0.999995
Chill Beats        0.998653   0.907959     0.985684     1.000000        0.986228
Workout Energy     0.993480   0.964764     0.999995     0.986228        1.000000
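Cosine similarity is the dot product of two feature vectors divided by the product of their lengths, so it measures the angle between them and ignores their magnitude. As a sanity check, this minimal NumPy sketch (the helper name `cosine` is our own, not a scikit-learn function) reproduces the Happy Vibes / Chill Beats entry of the matrix above and shows that scaling a vector does not change the score:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

happy_vibes = [0.8, 0.9]   # (danceability, valence)
chill_beats = [0.4, 0.5]

print(round(cosine(happy_vibes, chill_beats), 6))   # 0.998653

# Doubling every feature keeps the direction, so the similarity stays 1.0
print(round(cosine([0.4, 0.5], [0.8, 1.0]), 6))     # 1.0
```

This is why two songs with similar danceability-to-valence ratios score as near-identical even when one is far more energetic than the other.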


In [7]:
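# Example: Recommend songs similar to "Happy Vibes"
target_song = "Happy Vibes"
# Drop the song itself (rather than assuming it always sorts first), then keep the top 2
recommendations = sim_df[target_song].drop(target_song).sort_values(ascending=False).head(2)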
print(f"\n✅ Recommendations for '{target_song}':")
print(recommendations)


✅ Recommendations for 'Happy Vibes':
title
Chill Beats       0.998653
Workout Energy    0.993480
Name: Happy Vibes, dtype: float64


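Means: If a user liked “Happy Vibes”, recommend “Chill Beats” and “Workout Energy”, its two nearest neighbours by cosine similarity. Note that cosine similarity compares only the direction of the feature vectors, not their magnitude, so the mellow “Chill Beats” (0.4, 0.5) scores as almost identical to “Happy Vibes” (0.8, 0.9) because its danceability-to-valence ratio is similar.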
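Cosine similarity ignores how large the feature values are. If the goal is songs whose raw danceability and valence are closest to “Happy Vibes”, a distance measure may fit better. The sketch below swaps in Euclidean distance via scikit-learn's `euclidean_distances`; this is an alternative we are assuming for illustration, not the method used in this notebook:

```python
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

songs = pd.DataFrame({
    "title": ["Happy Vibes", "Sad Piano", "Dance Floor", "Chill Beats", "Workout Energy"],
    "danceability": [0.8, 0.2, 0.9, 0.4, 0.95],
    "valence": [0.9, 0.1, 0.8, 0.5, 0.85],
})

features = songs[["danceability", "valence"]]

# Pairwise Euclidean distances: smaller means more similar
dist_df = pd.DataFrame(euclidean_distances(features),
                       index=songs["title"], columns=songs["title"])

target_song = "Happy Vibes"
# Exclude the song itself, then take the 2 closest songs
nearest = dist_df[target_song].drop(target_song).nsmallest(2)
print(nearest)
```

With this measure the two closest songs are “Dance Floor” (distance about 0.141) and “Workout Energy” (about 0.158), while “Chill Beats” drops to about 0.566. Both features already share a 0 to 1 range here; with features on different scales you would normally standardize them first.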

### Example – Collaborative Filtering (Fake User-Song Ratings)
We’ll simulate explicit user ratings on a 1 to 5 scale, where 0 means the user has not rated the song.

In [8]:
import numpy as np

# Fake user-song rating matrix (rows = users, cols = songs); 0 means "not rated"
ratings = pd.DataFrame({
    "S1": [5, 4, 0, 0],
    "S2": [0, 0, 4, 5],
    "S3": [5, 0, 4, 0],
    "S4": [0, 3, 0, 4],
    "S5": [4, 0, 5, 0],
}, index=["User1", "User2", "User3", "User4"])

print("🎧 User-Song Ratings:\n", ratings)

🎧 User-Song Ratings:
        S1  S2  S3  S4  S5
User1   5   0   5   0   4
User2   4   0   0   3   0
User3   0   4   4   0   5
User4   0   5   0   4   0


In [9]:
# Compute cosine similarity between users
user_similarity = cosine_similarity(ratings)
user_sim_df = pd.DataFrame(user_similarity, index=ratings.index, columns=ratings.index)
print("\n👥 User Similarity (Collaborative):\n", user_sim_df)


👥 User Similarity (Collaborative):
           User1     User2     User3     User4
User1  1.000000  0.492366  0.652155  0.000000
User2  0.492366  1.000000  0.000000  0.374817
User3  0.652155  0.000000  1.000000  0.413714
User4  0.000000  0.374817  0.413714  1.000000
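Because unrated songs are stored as 0, cosine similarity on this matrix treats "not rated" as a rating of zero. A quick NumPy check of the User1 / User3 entry (a sketch, not part of the original notebook) shows where 0.652155 comes from: only S3 and S5 are rated by both users, so only they contribute to the dot product.

```python
import numpy as np

user1 = np.array([5, 0, 5, 0, 4])  # ratings for S1..S5, 0 = not rated
user3 = np.array([0, 4, 4, 0, 5])

dot = user1 @ user3                                    # 5*4 (S3) + 4*5 (S5) = 40
norms = np.linalg.norm(user1) * np.linalg.norm(user3)  # sqrt(66) * sqrt(57)
print(dot, round(dot / norms, 6))                      # 40 0.652155
```

A common refinement, not applied here, is to store unrated entries as missing values and compare users only on the songs both have rated.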


In [10]:
# Example: Recommend for User1 based on similar users
similar_users = user_sim_df["User1"].sort_values(ascending=False).iloc[1:3]
print("\nMost similar users to User1:\n", similar_users)


Most similar users to User1:
 User3    0.652155
User2    0.492366
Name: User1, dtype: float64


In [11]:
# Get recommendations from the most similar user
user1_ratings = ratings.loc["User1"]
top_user = similar_users.index[0]          # most similar user (User3)
recommend_candidates = ratings.loc[top_user]

# Keep songs User1 has not rated (0) that the similar user actually rated (> 0)
mask = (user1_ratings == 0) & (recommend_candidates > 0)
recommended_songs = recommend_candidates[mask].sort_values(ascending=False)

print("\n✅ Recommendations for User1 (Collaborative Filtering):")
print(recommended_songs)



✅ Recommendations for User1 (Collaborative Filtering):
S2    4
Name: User3, dtype: int64


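Means: User1 has not rated S2 or S4. User3, the most similar user, rated S2 a 4, so S2 is the recommendation. User3's 0 for S4 means "not rated", not "liked", so it is no evidence for S4; the only support for S4 comes from User2 (rating 3), the second most similar user.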
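Relying on a single neighbour ignores User2, who did rate S4. A common user-based variant predicts a score for each unrated song as the similarity-weighted average of the ratings given by the k most similar users who actually rated it. The sketch below assumes that scheme; the helper `predict_scores` and the choice k = 2 are hypothetical, chosen for illustration:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same toy ratings as above: rows = users, cols = songs, 0 = not rated
ratings = pd.DataFrame({
    "S1": [5, 4, 0, 0],
    "S2": [0, 0, 4, 5],
    "S3": [5, 0, 4, 0],
    "S4": [0, 3, 0, 4],
    "S5": [4, 0, 5, 0],
}, index=["User1", "User2", "User3", "User4"])

user_sim = pd.DataFrame(cosine_similarity(ratings),
                        index=ratings.index, columns=ratings.index)

def predict_scores(user, k=2):
    """Hypothetical helper: similarity-weighted average rating from the k most similar users."""
    neighbours = user_sim[user].drop(user).nlargest(k)   # top-k similar users, excluding self
    user_ratings = ratings.loc[user]
    unrated = user_ratings[user_ratings == 0].index       # songs this user has not rated
    scores = {}
    for song in unrated:
        # Only neighbours who actually rated the song (> 0) contribute
        weights = neighbours[ratings.loc[neighbours.index, song] > 0]
        if weights.sum() > 0:
            weighted = (weights * ratings.loc[weights.index, song]).sum()
            scores[song] = weighted / weights.sum()
    return pd.Series(scores, dtype=float).sort_values(ascending=False)

print(predict_scores("User1"))
```

For User1 this ranks S2 (about 4, from User3) above S4 (about 3, from User2), so both songs get a score backed by an actual rating.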