# **Introduction:**

**Research Question**

This paper endeavors to answer the following question: "How might we create a personalized music recommendation system for users based on their listening history, without being invasive or relying on personal data?"




**Background / Relevance for Study**

In today's digital age, music streaming services such as Spotify are increasingly popular. However, with such a vast catalog, users often struggle to find new music that suits their tastes. Personalized music recommendation systems have become a popular solution to this problem: given a user's listening history, they suggest new music the user is likely to enjoy. Unfortunately, many existing systems rely heavily on users' personal data (e.g., age and location), which raises privacy concerns. Our proposed model aims to create a personalized music recommendation system that relies on users' listening history without being invasive. At present, Spotify recommends content based both on the audio content of the songs a user likes and on the relationships between tracks, as determined by the listening behavior of a broader set of users.

Our proposed response to our main query is to create a novel music recommendation algorithm that differs from Spotify's. Spotify presently incorporates multiple recommendation methods, chiefly:


1. Content-based recommendation
2. Collaborative-based recommendation
3. Popularity-based recommendation
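Of the three methods listed above, content-based recommendation is the one our approach retains. As a minimal sketch (not Spotify's implementation, and not necessarily our final model), a user profile can be formed as the mean audio-feature vector of the tracks the user has played, and unheard tracks ranked by cosine similarity to that profile. The toy catalog, its track ids, and the function `recommend_content_based` are hypothetical; the feature names match columns in the Kaggle data.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalog; feature values are illustrative and already in [0, 1].
FEATURES = ["danceability", "energy", "valence"]
catalog = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d", "e"],
        "danceability": [0.9, 0.8, 0.2, 0.85, 0.1],
        "energy": [0.8, 0.9, 0.3, 0.75, 0.2],
        "valence": [0.6, 0.5, 0.1, 0.65, 0.2],
    }
)


def recommend_content_based(history_ids, catalog, k=2):
    """Rank tracks the user has not heard by cosine similarity to their profile.

    Only this user's own history and the tracks' audio features are used;
    no other user's listening data enters the computation.
    """
    heard = catalog["id"].isin(history_ids)
    # User profile: mean feature vector of the tracks already in the history.
    profile = catalog.loc[heard, FEATURES].mean().to_numpy().reshape(1, -1)
    candidates = catalog.loc[~heard].copy()
    candidates["score"] = cosine_similarity(profile, candidates[FEATURES].to_numpy())[0]
    return candidates.nlargest(k, "score")["id"].tolist()


print(recommend_content_based(["a", "b"], catalog))  # ['d', 'c']
```

Raw cosine similarity on non-negative features tends to score most tracks highly (in this toy example every candidate scores above 0.9), so centering or weighting the features may be worth exploring.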

Our objective is to create a new method that does not incorporate collaborative-based recommendation. The goal of this change is to enhance user privacy, so that a user's listening history is not shared with other users, directly or indirectly.

As an example of this use case, suppose a user (Bob) has one friend on Spotify (Rob). If Bob receives Norwegian death metal as a recommendation, he might infer that Rob is an avid fan of the genre. Rob may prefer to keep that private, and could reasonably choose to opt into our algorithm, which eschews collaborative-based recommendation in favor of his privacy.
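The leak in this example can be reproduced with a deliberately naive user-based collaborative filter. The sketch below uses hypothetical data (the play matrix, the track names, and `collaborative_recommend` are illustrative, not Spotify's system). Because Rob is Bob's only neighbor, every track Bob is recommended comes straight from Rob's history.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical play matrix: rows are users, columns are tracks (1 = played).
plays = pd.DataFrame(
    {"pop_hit": [1, 1], "rap_single": [1, 1], "norwegian_death_metal": [0, 1]},
    index=["Bob", "Rob"],
)


def collaborative_recommend(user, plays):
    """Naive user-based collaborative filtering.

    Each track the user has not played is scored by the other users' plays,
    weighted by how similar those users' play vectors are to this user's.
    """
    sim = pd.DataFrame(
        cosine_similarity(plays.to_numpy()), index=plays.index, columns=plays.index
    )
    neighbors = sim[user].drop(user)  # similarity of every other user to `user`
    scores = plays.loc[neighbors.index].T.dot(neighbors)
    unheard = plays.loc[user] == 0
    return scores[unheard & (scores > 0)].sort_values(ascending=False).index.tolist()


print(collaborative_recommend("Bob", plays))  # ['norwegian_death_metal']
```

A content-based recommender would instead score that track only against Bob's own history, so recommending it would reveal nothing about Rob.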


**Variables, Parameters, and Assumptions**

Our variables will include users' listening history, the genres and artists of the music they listen to, and their interactions with the music streaming service (such as liking, disliking, or skipping songs). We will assume that users' listening history reflects their music preferences to some extent. We will also assume that the music streaming service has access to a large enough database of music to make relevant recommendations.
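One way to turn these interactions into a single taste profile is a weighted average of track feature vectors, where each interaction contributes with a sign and magnitude reflecting its strength. The sketch below assumes hypothetical weights (`INTERACTION_WEIGHTS`) and a hypothetical helper `weighted_profile`; the values are illustrative, not tuned.

```python
import numpy as np

# Hypothetical interaction weights (assumptions, not tuned values):
# positive signals pull the profile toward a track, negative ones push it away.
INTERACTION_WEIGHTS = {"like": 1.0, "play": 0.5, "skip": -0.5, "dislike": -1.0}


def weighted_profile(features, interactions):
    """Combine a user's interactions into one feature vector.

    features: dict mapping track id -> feature vector (array-like).
    interactions: list of (track_id, event) pairs, event in INTERACTION_WEIGHTS.
    The weighted sum is divided by the total absolute weight.
    """
    dim = len(next(iter(features.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for track_id, event in interactions:
        w = INTERACTION_WEIGHTS[event]
        total += w * np.asarray(features[track_id], dtype=float)
        weight_sum += abs(w)
    return total / weight_sum if weight_sum else total


features = {"t1": [0.8, 0.6], "t2": [0.2, 0.9]}
print(weighted_profile(features, [("t1", "like"), ("t2", "skip")]))
```

Dividing by the total absolute weight keeps the profile on roughly the same scale as the features, and a skipped track actively lowers the profile along that track's features rather than simply being ignored.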


**Limitations of Data**

We source our data from a Kaggle dataset (https://www.kaggle.com/datasets/mrmorj/dataset-of-songs-in-spotify). While this dataset provides a large quantity of data, it does not contain every song on Spotify; thus, not every song on a user's playlist may be represented, which makes the recommendation algorithm less accurate because it has less information to work with. In particular, our data source only holds information on the following genres: **[Trap, Techno, Techhouse, Trance, Psytrance, Dark Trap, DnB (drum and bass), Hardstyle, Underground Rap, Trap Metal, Emo, Rap, RnB, Pop, Hiphop].**
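Because the dataset covers only part of Spotify's catalog, it is worth measuring how much of a user's history can actually be matched before recommending. A minimal sketch, assuming the user's history is available as a list of Spotify track ids (`history_coverage` and the id `"not_in_dataset"` are hypothetical; the other two ids come from the dataset's first rows):

```python
import pandas as pd


def history_coverage(history_ids, dataset):
    """Split a user's history into tracks found / missing in the dataset.

    Returns (matched, missing, coverage), where coverage is the fraction of
    the history that the recommender can actually use.
    """
    known = set(dataset["id"])
    matched = [t for t in history_ids if t in known]
    missing = [t for t in history_ids if t not in known]
    coverage = len(matched) / len(history_ids) if history_ids else 0.0
    return matched, missing, coverage


dataset = pd.DataFrame({"id": ["2Vc6NJ9PW9gD9q343XFRKx", "7pgJBLVz5VmnL7uGHmRj6p"]})
history = ["2Vc6NJ9PW9gD9q343XFRKx", "not_in_dataset", "7pgJBLVz5VmnL7uGHmRj6p"]
matched, missing, coverage = history_coverage(history, dataset)
print(missing, round(coverage, 2))  # ['not_in_dataset'] 0.67
```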

Each song has a set of accompanying audio features, such as danceability, energy, loudness, musical key, and instrumentalness (to name a few).
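These features sit on very different scales: danceability and energy lie in [0, 1], loudness is in decibels (for example, -7.364 for the first track), and tempo is in beats per minute, so an unscaled similarity or distance would be dominated by tempo. `MinMaxScaler` appears among the notebook's imports; the sketch below applies it to values from the first three rows of the dataset (the helper `scale_features` is hypothetical). Musical key is a pitch-class integer from 0 to 11 and arguably categorical, so it is left out here.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Values taken from the first three rows of genres_v2.csv.
tracks = pd.DataFrame(
    {
        "danceability": [0.831, 0.719, 0.85],
        "energy": [0.814, 0.493, 0.893],
        "loudness": [-7.364, -7.23, -4.783],
        "tempo": [156.985, 115.08, 218.05],
    }
)
CONTINUOUS = ["danceability", "energy", "loudness", "tempo"]


def scale_features(df, cols):
    """Rescale each column in `cols` to [0, 1]; other columns are left untouched."""
    scaled = df.copy()
    scaled[cols] = MinMaxScaler().fit_transform(df[cols])
    return scaled


print(scale_features(tracks, CONTINUOUS).round(3))
```

In practice the scaler would be fit on the full catalog so that every track, including the user's history, is scaled consistently.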

Our project and technical analysis consists of 5 major components, enumerated below:

1. Data Collection and Cleaning: 



## Imports

```python
import pandas as pd
import numpy as np
import json
import re 
import sys
import itertools

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt


import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import spotipy.util as util

import warnings
warnings.filterwarnings("ignore")
```

```python
%matplotlib inline
```

```python
# Widens the notebook container, which makes working in Jupyter on a laptop screen easier
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))
```

# Data Processing
Dataset link: https://www.kaggle.com/datasets/ektanegi/spotifydata-19212020

```python
spotify_df = pd.read_csv('genres_v2.csv')
```

```python
spotify_df.head()
```

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | genre | song_name | Unnamed: 0 | title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.831 | 0.814 | 2 | -7.364 | 1 | 0.42 | 0.0598 | 0.0134 | 0.0556 | 0.389 | 156.985 | audio_features | 2Vc6NJ9PW9gD9q343XFRKx | spotify:track:2Vc6NJ9PW9gD9q343XFRKx | https://api.spotify.com/v1/tracks/2Vc6NJ9PW9gD... | https://api.spotify.com/v1/audio-analysis/2Vc6... | 124539 | 4 | Dark Trap | Mercury: Retrograde | | |
| 1 | 0.719 | 0.493 | 8 | -7.23 | 1 | 0.0794 | 0.401 | 0.0 | 0.118 | 0.124 | 115.08 | audio_features | 7pgJBLVz5VmnL7uGHmRj6p | spotify:track:7pgJBLVz5VmnL7uGHmRj6p | https://api.spotify.com/v1/tracks/7pgJBLVz5Vmn... | https://api.spotify.com/v1/audio-analysis/7pgJ... | 224427 | 4 | Dark Trap | Pathology | | |
| 2 | 0.85 | 0.893 | 5 | -4.783 | 1 | 0.0623 | 0.0138 | 4e-06 | 0.372 | 0.0391 | 218.05 | audio_features | 0vSWgAlfpye0WCGeNmuNhy | spotify:track:0vSWgAlfpye0WCGeNmuNhy | https://api.spotify.com/v1/tracks/0vSWgAlfpye0... | https://api.spotify.com/v1/audio-analysis/0vSW... | 98821 | 4 | Dark Trap | Symbiote | | |
| 3 | 0.476 | 0.781 | 0 | -4.71 | 1 | 0.103 | 0.0237 | 0.0 | 0.114 | 0.175 | 186.948 | audio_features | 0VSXnJqQkwuH2ei1nOQ1nu | spotify:track:0VSXnJqQkwuH2ei1nOQ1nu | https://api.spotify.com/v1/tracks/0VSXnJqQkwuH... | https://api.spotify.com/v1/audio-analysis/0VSX... | 123661 | 3 | Dark Trap | ProductOfDrugs (Prod. The Virus and Antidote) | | |
| 4 | 0.798 | 0.624 | 2 | -7.668 | 1 | 0.293 | 0.217 | 0.0 | 0.166 | 0.591 | 147.988 | audio_features | 4jCeguq9rMTlbMmPHuO7S3 | spotify:track:4jCeguq9rMTlbMmPHuO7S3 | https://api.spotify.com/v1/tracks/4jCeguq9rMTl... | https://api.spotify.com/v1/audio-analysis/4jCe... | 123298 | 4 | Dark Trap | Venom | | |
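In the `head()` output above, the last two columns are empty and `type` holds the same value in every row shown, while `uri`, `track_href`, and `analysis_url` repeat the track id in another form. A cleaning step along the following lines is one option; the column list and the choice to keep only the first row per track id are assumptions, and `clean_tracks` is a hypothetical helper. On the real data it would be called as `clean_tracks(spotify_df)`.

```python
import numpy as np
import pandas as pd

# Columns that look empty or redundant in the head() output (assumption).
DROP_COLS = ["Unnamed: 0", "title", "type", "uri", "track_href", "analysis_url"]


def clean_tracks(df):
    """Drop redundant columns and keep one row per Spotify track id."""
    cleaned = df.drop(columns=[c for c in DROP_COLS if c in df.columns])
    # If a track is listed more than once (e.g. under several genres), keep the first row.
    cleaned = cleaned.drop_duplicates(subset="id", keep="first")
    return cleaned.reset_index(drop=True)


# Hypothetical toy frame mimicking the dataset's layout.
raw = pd.DataFrame(
    {
        "id": ["x", "y", "x"],
        "danceability": [0.8, 0.7, 0.8],
        "genre": ["Dark Trap", "Emo", "Trap Metal"],
        "uri": ["spotify:track:x", "spotify:track:y", "spotify:track:x"],
        "Unnamed: 0": [np.nan, np.nan, np.nan],
        "title": [np.nan, np.nan, np.nan],
    }
)
print(clean_tracks(raw))
```

Keeping only the first row discards any extra genre labels for a repeated track; aggregating genres per track id would be the alternative if multi-genre information matters for recommendation.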
