nsanka/RecSys

Spotify Playlist Recommender

Authors: Naga Sanka, David Hernandez and Sheila Pietono
Date: Dec 18, 2021

Description:

The purpose of this project was to build a recommendation system that lets users discover music based on their listening preferences. Users can explore connections to music by providing a playlist they already enjoy or favorites they already love.

Data:

  • We used the dataset that Spotify released to enable research in music recommendation; it can be accessed here
  • This dataset includes public playlists created by US Spotify users between January 2010 and November 2017.
  • It has 1 million Spotify playlists, over 2 million unique tracks, nearly 300,000 artists and 734,000 albums.
  • We obtained audio features for all tracks through Spotify's API. Features include danceability, tempo, liveness, speechiness, etc.

Features and Target Variables:


We primarily used an unsupervised clustering approach for this project. We predict the cluster for a given user, search only that cluster for similar playlists, and then draw song recommendations from those playlists.
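The cluster-then-search idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's code: the function names, the dictionary layout of a playlist, and the use of Euclidean distance are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def nearest_centroid(vector, centroids):
    """Return the index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def recommend_tracks(user_vector, user_tracks, centroids, playlists,
                     top_playlists=2, top_n=3):
    """Recommend tracks from the most similar playlists in the user's cluster.

    `playlists` is a hypothetical list of dicts with the keys
    "cluster", "features" and "tracks".
    """
    # Step 1: predict the user's cluster.
    cluster = nearest_centroid(user_vector, centroids)
    # Step 2: rank only the playlists that belong to that cluster.
    candidates = [p for p in playlists if p["cluster"] == cluster]
    candidates.sort(key=lambda p: math.dist(user_vector, p["features"]))
    # Step 3: count tracks from the closest playlists that the user does not have yet.
    counts = Counter(
        track
        for p in candidates[:top_playlists]
        for track in p["tracks"]
        if track not in user_tracks
    )
    return [track for track, _ in counts.most_common(top_n)]


# Toy data: two clusters, three playlists, two averaged audio features each.
centroids = [(0.2, 0.2), (0.8, 0.8)]
playlists = [
    {"cluster": 0, "features": (0.1, 0.2), "tracks": ["a", "b"]},
    {"cluster": 1, "features": (0.7, 0.9), "tracks": ["c", "d", "e"]},
    {"cluster": 1, "features": (0.9, 0.8), "tracks": ["d", "f"]},
]
recommended = recommend_tracks((0.75, 0.85), {"c"}, centroids, playlists)
print(recommended)
```

Restricting the similarity search to one cluster keeps the number of distance computations small, which matters when the full dataset holds a million playlists.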

Data cleaning and pre-processing:

  • read playlist and track info from JSON files
  • perform exploratory data analysis (EDA)
  • extract audio features for each track
  • calculate average features for each playlist
  • normalize the features
  • project the data into 2D space using TSNE
  • cluster using density and centroid models
  • identify the optimum k using the Silhouette, Davies-Bouldin and Calinski-Harabasz indices
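The averaging and normalization steps above might look roughly like the sketch below. It assumes min-max scaling (the README does not say which normalization was used) and a hypothetical track layout of one dict per track keyed by audio feature name; the feature names are the ones listed in the Data section.

```python
from statistics import mean

# Audio features named in the Data section; the real feature set may be larger.
FEATURES = ("danceability", "tempo", "liveness", "speechiness")


def playlist_average(tracks):
    """Average each audio feature over the tracks of one playlist."""
    return {f: mean(t[f] for t in tracks) for f in FEATURES}


def min_max_normalize(rows):
    """Scale every feature to [0, 1] across all playlists (assumed scheme)."""
    scaled = [dict(r) for r in rows]
    for f in FEATURES:
        lo = min(r[f] for r in rows)
        hi = max(r[f] for r in rows)
        span = hi - lo
        for out, r in zip(scaled, rows):
            # A constant feature carries no information; map it to 0.0.
            out[f] = (r[f] - lo) / span if span else 0.0
    return scaled


playlist_a = [
    {"danceability": 0.4, "tempo": 100.0, "liveness": 0.1, "speechiness": 0.05},
    {"danceability": 0.8, "tempo": 140.0, "liveness": 0.3, "speechiness": 0.15},
]
playlist_b = [
    {"danceability": 0.2, "tempo": 90.0, "liveness": 0.1, "speechiness": 0.05},
]
averages = [playlist_average(playlist_a), playlist_average(playlist_b)]
normalized = min_max_normalize(averages)
```

Without normalization, tempo (tens to hundreds of beats per minute) would dominate any distance computed against features such as danceability, which lie between 0 and 1.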

Code:

code/Get_MPD_Data.ipynb

  • This notebook creates the main .json file containing the playlists used to train the model and to generate the recommendations.
  • The loop_slices() function goes through as many slices as desired to extract the necessary information from the playlists.
  • We recommend using 20 slices when running locally and scaling up as needed on a larger instance, such as one on AWS.
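For readers who want a feel for what reading slices involves, here is a minimal sketch. It is not the notebook's loop_slices() function, whose signature is not shown in this README; `read_slices` is a hypothetical stand-in, and the file naming pattern (`mpd.slice.*.json`) and JSON keys (`playlists`, `pid`, `name`, `tracks`, `track_uri`) are assumptions about the dataset layout.

```python
import json
from pathlib import Path


def read_slices(data_dir, max_slices=20):
    """Collect one row per playlist from up to `max_slices` slice files.

    Hypothetical helper: file pattern and JSON keys are assumed, not confirmed.
    """
    rows = []
    # Files are taken in lexicographic name order, which is fine for sampling.
    slice_paths = sorted(Path(data_dir).glob("mpd.slice.*.json"))
    for path in slice_paths[:max_slices]:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        for playlist in data.get("playlists", []):
            rows.append({
                "pid": playlist["pid"],
                "name": playlist.get("name", ""),
                "track_uris": [t["track_uri"] for t in playlist.get("tracks", [])],
            })
    return rows
```

Calling `read_slices("data", max_slices=20)` would match the recommended local setting; raising `max_slices` is the knob to turn on a larger machine.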

code/Playlist_Recommendation.ipynb

  • This notebook walks through the entire analysis and development of the model and the recommendations. It describes which methods are used and how they were selected.
  • Seven models from different families were compared, and the data was projected into 2D with TSNE. At 20,000 playlists, KMeans with k=17 is the best performer.
  • This notebook will generate the model and the playlist dataset to be used. All the models and datasets are saved locally.
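As a rough illustration of how a value of k can be chosen with the three indices mentioned earlier, the sketch below scores KMeans over a range of k using scikit-learn. It runs on synthetic blobs, not on the playlist data, and the helper name `score_k_values` and the range of k are assumptions made for the example; it does not reproduce the k=17 result.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def score_k_values(X, k_values, seed=0):
    """Fit KMeans for each k and record the three cluster-quality indices."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = {
            "silhouette": silhouette_score(X, labels),                 # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),         # lower is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),   # higher is better
        }
    return scores


# Synthetic stand-in for the normalized playlist features: three tight blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
scores = score_k_values(X, range(2, 7))
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k, scores[best_k])
```

The indices do not always agree on real data, so comparing all three, as the project did, is a reasonable guard against trusting a single metric.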

code/read_spotify_million_playlists.py

  • This is the primary code that we used to read the information for all one million playlists
  • This code exports the sqlite database tables that are eventually used in the streamlit app
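Exporting playlist data to sqlite can be done with the standard library alone. The sketch below is an assumption-laden illustration: the table names, columns, and the `export_playlists` helper are hypothetical and are not taken from read_spotify_million_playlists.py.

```python
import sqlite3


def export_playlists(db_path, playlists):
    """Write playlists and their tracks into two sqlite tables.

    Hypothetical schema; the real script may use different tables and columns.
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS playlists (pid INTEGER PRIMARY KEY, name TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS playlist_tracks "
            "(pid INTEGER, pos INTEGER, track_uri TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO playlists VALUES (?, ?)",
            [(p["pid"], p["name"]) for p in playlists],
        )
        conn.executemany(
            "INSERT INTO playlist_tracks VALUES (?, ?, ?)",
            [
                (p["pid"], pos, uri)
                for p in playlists
                for pos, uri in enumerate(p["track_uris"])
            ],
        )
    return conn
```

A web app can then open the same database file read-only and query single playlists by `pid` instead of loading every JSON slice into memory.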

streamlit/app.py

  • This is the code used to build the streamlit web application
  • This calls the class defined in spotipy_client.py to get recommendations

streamlit/spotipy_client.py

  • This code is primarily used to generate the song recommendations based on user input
  • It also has a class that connects to the Spotify API using the user's access token
  • It takes the machine learning models generated above and the user input from the web app to recommend the top n songs
  • It also has functions to create visualizations

streamlit/style.css

  • This is used to define web app CSS styles

Summary:

The final product is a streamlit app which allows users to do the following:

  • explore songs that are similar or dissimilar to the user's music
  • explore the top playlists that are similar to the user's preferences
  • see the genres they listen to the most
  • obtain recommended songs based on the user's favorites from the last month, the last 6 months, or all time
  • obtain recommended songs to listen to based on any playlist
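The three time frames line up roughly with the `time_range` values of Spotify's top-items endpoint. The sketch below assumes the app reads favorites through spotipy's `current_user_top_tracks` method, which the README does not confirm; the label strings and the `top_track_ids` helper are hypothetical.

```python
# Assumed mapping from the app's labels to Spotify's time_range values.
TIME_RANGES = {
    "last month": "short_term",   # roughly the last 4 weeks
    "6 months": "medium_term",    # roughly the last 6 months
    "all time": "long_term",      # the longest window Spotify offers
}


def top_track_ids(client, time_frame, limit=20):
    """Return the IDs of the user's top tracks for one of the labels above.

    `client` is expected to behave like spotipy.Spotify(auth=access_token).
    """
    response = client.current_user_top_tracks(
        limit=limit, time_range=TIME_RANGES[time_frame]
    )
    return [item["id"] for item in response["items"]]
```

Passing the client in as an argument keeps the function easy to exercise with a stub object, without a network connection or a real access token.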

Blogs:

Part 1: Create Development Environment
Part 2: Get the music dataset and perform Exploratory Data Analysis
Part 3: Build and train machine learning models
Part 4: Evaluate the effect of dataset size on machine learning models
Part 5: Pushing the Project to Cloud Computing (AWS Instance)
Final Part: Deploy ML Based Recommender System into Production

Contributions:


This project marks the completion of a Master's degree in Applied Data Science at the University of Michigan.
Naga Sanka - Data extraction and manipulation, updating the code for recommendation predictions, and deploying the model to a web app.
David Hernandez - Machine learning modeling and the recommender system for different dataset sizes.
Sheila Pietono - Exploratory data analysis and scaling the data to an AWS instance.

Future Improvements:


This dataset has considerable potential for further work, and the directions we see are listed below.
We performed several QC checks on the data, comparing average audio features between playlists, to make sure the recommendations made sense.

There is still work to be done on this project, as the app is currently in beta. These are the proposed future improvements:

  • properly link to the Spotify API so that users are not required to copy/paste the access token
  • implement collaborative-based recommendations in addition to content-based recommendations
  • use deep learning for the recommender system
  • use user feedback to improve the recommendations

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the GNU GENERAL PUBLIC LICENSE. See LICENSE for more information.
