GitHub

Spotify Playlist Recommender

authors: Naga Sanka, David Hernandez and Sheila Pietono date: Dec 18, 2021

Description:

The purpose of this project was to build a recommendation system to allow users to discover music based on their listening preferences. Users can explore connections to music by providing playlist they already enjoy or favorites they already love.

Data:

We used the dataset provided by Spotify to enable research in music recommendations and can be accessed here
This dataset includes public playlists created by US Spotify users between January 2010 and November 2017.
It has 1 million Spotify playlists, over 2 million unique tracks, nearly 300,000 artists and 734,000 albums.
We obtained audio features for all tracks through Spotify's API. Features include danceability, tempo, liveness, speechiness, etc.

Features and Target Variables:

We used primarily an unsupervised learning clustering approach for this project, we predict the cluster for a given user, then use only that cluster to find the similar playlists and thereby use the playlists for song recommendations:

Data cleaning and pre-processing:

read playlists/track info from json files
EDA
extract audio features for each track
calculate average features for the playlist
normalize the features
project data into 2D space using TSNE
clustering using density and centroid models
identify optimum k using Silhouette/Davies-Bouldin/Calinski-Harabasz index

Code:

code/Get_MPD_Data.ipynb

This notebook is used to create the main .json file containing the playlists to train the model and to generate the recommendations.
The loop_slices() function will go through as many slices as desired to extract the necessary information from the playlists,
It is recommended to use 20 slices to run locally and scale it as needed with a bigger instances such as AWS.

code/Playlist_Recommendation.ipynb

This notebook will go through the entire analysis and development for the model and the recommendations. It describes what methods are used and how those were selected.
Seven different models were selected from different families and a 2D projection with TSNE was done. At 20,000 playlists, KMeans with k=17 is the best performer.
This notebook will generate the model and the playlist dataset to be used. All the models and datasets are saved locally.

code/read_spotify_million_playlists.py

This is the primary code that we used to read all the million playlists information
This code exports sqlite database tables that are eventually used in the streamlit app

streamlit/app.py

This is the code used to build the streamlit web application
This calls the class defined in spotify_client.py to get recommendations

streamlit/spotipy_client.py

This code was primarily used to generate the song recommendations based on user input
This code also has a class to connect to Spotify API using user access token
It takes machine learning models generated above and user input from web app to recommend top n songs
It also has functions to create visualizations

streamlit/style.css

This is used to define web app CSS styles

Summary:

The final product is a streamlit app which allows users to do the following:

explore the similar or dissimilar songs related to the users music
explore the top playlists that are similar to their preference
see the genres they listen to the most
obtain recommended songs to listen to based on users favorites collected in a time frame of last month, 6 months or all time
obtain recommended songs to listen to based on any playlist

Blogs:

Part 1: Create Development Environment
Part 2: Get the music dataset and perform Exploratory Data Analysis
Part 3: Build and train machine learning models
Part 4: Evaluate the effect of dataset size on machine learning models
PART 5: Pushing the Project to Cloud Computing (AWS Instance)
Final Part: Deploy ML Based Recommender System into Production

Contributions:

This project marks the completion of a Master's degree in Applied Data Science at Univeristy of Michigan.
Naga Sanka - Data extraction and manipulation, Update code to recommendation predictions. Deploy the model to a web app.
David Hernandez - Machine Learning Modeling and Recommender System for different dataset sizes.
Sheila Pietono - Exploratory data analysis and Scale the data to an AWS instance.

Future Improvements:

This data set has much potential to keep working on it and I would like to list all the potential ramifications and future work that can be done with it.
We performed quite a few QC checks of the data to make sure recommendations made sense by comparing average audio features between the playlists.

There is still future work to be done on this project as this app is currently still in beta mode. These are the proposed future improvements:

properly link to Spotify API so that users are not requied to copy/paste the access token
implement collaborative based recommendation in addition to content based recommendations
use deep learning for recommender system
use the user feedback to improve the recommendations

Contributing

Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the GNU GENERAL PUBLIC LICENSE. See LICENSE for more information.

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
code		code
data		data
images		images
models		models
streamlit		streamlit
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitpod.yml		.gitpod.yml
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

License

nsanka/RecSys

Folders and files

Latest commit

History

Repository files navigation

Spotify Playlist Recommender

Table of Contents

Description:

Data:

Features and Target Variables:

Data cleaning and pre-processing:

Code:

code/Get_MPD_Data.ipynb

code/Playlist_Recommendation.ipynb

code/read_spotify_million_playlists.py

streamlit/app.py

streamlit/spotipy_client.py

streamlit/style.css

Summary:

Blogs:

Contributions:

Future Improvements:

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Languages