# Estimated Spotify Plays

In this project I want to build a deep learning model that predicts how often I will listen to a song based on its audio features, which can be retrieved via the [Spotify Web API](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/).

## The Dataset

Let's have a look at the dataset I have to start with.

In [1]:
!ls dataset

hoergewohnheiten.csv  last_fm.csv


There are two CSV files that record when (as a UTC timestamp) I listened to a given song (identified by the combination of song title and artist name). One file is an export from [LastFM](https://mainstream.ghan.nl/export.html); the other comes from a side project I started a while ago called [Hoergewohnheiten](https://github.com/mymindwentblvnk/hoergewohnheiten).

Let's see what the files look like.

In [2]:
!head -5 dataset/last_fm.csv

uts,utc_time,artist,artist_mbid,album,album_mbid,track,track_mbid
"1486670769","09 Feb 2017, 20:06","Shy Glizzy","21354007-a91f-4460-934d-12d389de3a2a","Homieland, Vol. 2","","Woah",""
"1486658377","09 Feb 2017, 16:39","Dre","9ccc0b15-6506-46ba-8bca-6e35d40ebfa7","Rich & Lit","","5 Rounds",""
"1486658166","09 Feb 2017, 16:36","Dre","9ccc0b15-6506-46ba-8bca-6e35d40ebfa7","Rich & Lit","","Fine Ass Girls",""
"1486507369","07 Feb 2017, 22:42","Cosima","dcf74737-7d5b-4847-acdb-9edec6a7cea1","To Build A House","","To Build A House",""


In [3]:
!head -5 dataset/hoergewohnheiten.csv

timestamp,title,artist,album
1530516395,"Ella, elle l'a - Remasterisé",France Gall,Babacar ( Remasterisé)
1530516323,Get Down,Junglepussy,Jp3
1530516172,State of the Union,Junglepussy,Jp3
1530516088,Jammin That Screw,Trae Tha Truth,48 Hours Later


It looks like the LastFM export has unique identifiers (the `*_mbid` columns), but that does not help with the Hoergewohnheiten data, which has none. So now I want to build the following data structure with the help of Paul Lamere's [spotipy](https://github.com/plamere/spotipy), where every row represents one track:

| tempo | valence | energy | ... | danceability | plays |
|-------|---------|--------|-----|--------------|-------|
| 98.30 | 0.523   | 0.993  | ... | 0.7350       | 12    |
| 132.4 | 0.24    | 0.451  | ... | 0.99002      | 130   |
| 78.0  | 0.9     | 0.56   | ... | 0.12502      | 2     |
| ...   | ...     | ...    | ... | ...          | ...   |

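The table above can be assembled once each track has both an audio-features record and a play count. The following is only a minimal sketch, not the notebook's actual implementation: `build_rows`, `FEATURE_COLUMNS`, and the sample values are hypothetical, and it assumes each feature record is a dict shaped like the objects returned by the Spotify audio-features endpoint (including an `id` key), with `None` standing in for tracks that have no features.

```python
# Hypothetical subset of feature columns; the real table may use more.
FEATURE_COLUMNS = ["tempo", "valence", "energy", "danceability"]


def build_rows(audio_features, plays_by_id, columns=FEATURE_COLUMNS):
    """Combine audio-feature dicts and play counts into one row per track."""
    rows = []
    for features in audio_features:
        # Skip tracks without features or without a known play count.
        if features is None or features["id"] not in plays_by_id:
            continue
        row = {column: features[column] for column in columns}
        row["plays"] = plays_by_id[features["id"]]
        rows.append(row)
    return rows


# Made-up sample data, for illustration only.
sample_features = [
    {"id": "track-a", "tempo": 98.3, "valence": 0.523,
     "energy": 0.993, "danceability": 0.735},
    None,  # a track for which no features are available
]
sample_plays = {"track-a": 12}

rows = build_rows(sample_features, sample_plays)
print(rows)
```

In this notebook, the feature records would presumably come from spotipy's `audio_features` call and the play counts from the counting step that follows.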
### Count plays per Track

In [4]:
import csv
from collections import defaultdict

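# Separator used to join artist and title into a single dict key
splitter = '#*#*#*#*#*#*#'
# Maps "artist<splitter>title" to the number of times the track was played
play_data_dict = defaultdict(int)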

with open('dataset/hoergewohnheiten.csv', 'r') as hoergewohnheiten_in:
    reader = csv.DictReader(hoergewohnheiten_in)
    for row in reader:
        temp_identifier = "{artist}{splitter}{title}".format(title=row['title'],
                                                             artist=row['artist'],
                                                             splitter=splitter)
        play_data_dict[temp_identifier] += 1
        
with open('dataset/last_fm.csv', 'r') as last_fm_in:
    reader = csv.DictReader(last_fm_in)
    for row in reader:
        temp_identifier = "{artist}{splitter}{title}".format(title=row['track'],
                                                             artist=row['artist'],
                                                             splitter=splitter)
        play_data_dict[temp_identifier] += 1


In [5]:
# Turn the dict into a list of ([artist, title], plays) tuples
play_data = [(key.split(splitter), plays) for key, plays in play_data_dict.items()]

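As an alternative to joining artist and title with a separator string, the same counts can be kept under `(artist, title)` tuple keys, which removes the need to split the key again afterwards. This is a minimal sketch under stated assumptions: `count_plays` is a hypothetical helper, it accepts any iterable of CSV lines (an open file works too), and the sample lines are taken from the Hoergewohnheiten excerpt shown earlier, with one row repeated for illustration.

```python
import csv
from collections import Counter


def count_plays(lines, title_column, artist_column="artist"):
    """Count plays per (artist, title) pair from an iterable of CSV lines."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[(row[artist_column], row[title_column])] += 1
    return counts


# Sample lines in the Hoergewohnheiten layout (second row repeated on purpose).
sample_lines = [
    "timestamp,title,artist,album",
    "1530516323,Get Down,Junglepussy,Jp3",
    "1530516323,Get Down,Junglepussy,Jp3",
    "1530516172,State of the Union,Junglepussy,Jp3",
]

counts = count_plays(sample_lines, title_column="title")
for (artist, title), plays in counts.items():
    print(artist, "|", title, "|", plays)
```

With the real files, the helper would be called once per file (with `title_column='title'` for Hoergewohnheiten and `title_column='track'` for LastFM), and the two `Counter` results could be added together with `+`.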
In [11]:
# Each entry is one distinct (artist, title) pair, not a single play
print(len(play_data), "distinct tracks found.")
for i in range(10):
    print(play_data[i])

21501 distinct tracks found.
(['France Gall', "Ella, elle l'a - Remasterisé"], 2)
(['Junglepussy', 'Get Down'], 1)
(['Junglepussy', 'State of the Union'], 1)
(['Trae Tha Truth', 'Jammin That Screw'], 5)
(['Faithless', 'Insomnia'], 9)
(['Faithless', 'God Is a DJ - Radio Mix'], 2)
(['DJ Bobo', 'Everybody'], 6)
(['Robin S', 'Show Me Love'], 1)
(['Ricky Martin', "La Copa de la Vida (La Cancion Oficial de la Copa Mundial, Francia '98) - Spanglish Radio Edit"], 1)
(['Members Of Mayday', 'Sonic Empire - Short Mix'], 1)


### Get Spotify information from API

In [12]:
from spotipy import Spotify
import spotipy.util

try:
    import spotify_settings
    user_name = spotify_settings.USER_NAME
    client_id = spotify_settings.CLIENT_ID
    client_secret = spotify_settings.CLIENT_SECRET
    redirect_uri = spotify_settings.REDIRECT_URI
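except ImportError:
    # No settings module: pass None so that spotipy falls back to the
    # SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI
    # environment variables.
    user_name = None
    client_id = None
    client_secret = None
    redirect_uri = None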

token = spotipy.util.prompt_for_user_token(
    user_name,
    scope='user-library-read',
    client_id=client_id,
    client_secret=client_secret,
    redirect_uri=redirect_uri
)
spotify_client = Spotify(auth=token)

In [13]:
play_data_with_id = dict()

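for play in play_data[:10]:  # TODO Remove slicing
    # play is ([artist, title], plays); search for "<title> <artist>"
    q = '{} {}'.format(play[0][1], play[0][0])
    result = spotify_client.search(q=q, type='track', limit=1)

    # Keep the track only if the search returned a hit
    if len(result['tracks']['items']) == 1: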
        track_id = result['tracks']['items'][0]['id']
        name = result['tracks']['items'][0]['name']
        artist = result['tracks']['items'][0]['artists'][0]['name']
        
        play_data_with_id[track_id] = {
            'id':  track_id,
            'name': name,
            'artist': artist,
            'plays': play[1]
        }

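The search above sends title and artist as free text and accepts the first hit as-is, so the stored name can differ from the logged one (in the output below, "Insomnia" comes back as "Insomnia - Radio Edit"). One possible refinement, sketched here under assumptions, is to use the Web API's `track:` and `artist:` field filters and to check the returned artist before accepting a hit. `build_search_query` and `pick_match` are hypothetical helpers, the sample items are made up, and whether this actually improves matching on this dataset is untested.

```python
def build_search_query(artist, title):
    """Build a query that uses the track: and artist: field filters."""
    return 'track:{} artist:{}'.format(title, artist)


def pick_match(items, artist):
    """Return the first item whose first artist matches, ignoring case."""
    wanted = artist.strip().lower()
    for item in items:
        artists = item['artists']
        if artists and artists[0]['name'].strip().lower() == wanted:
            return item
    return None


# Made-up items in the shape of result['tracks']['items'].
sample_items = [
    {'id': 'id-1', 'name': 'Insomnia', 'artists': [{'name': 'Someone Else'}]},
    {'id': 'id-2', 'name': 'Insomnia - Radio Edit', 'artists': [{'name': 'Faithless'}]},
]

query = build_search_query('Faithless', 'Insomnia')
match = pick_match(sample_items, 'faithless')
print(query)
print(match['id'] if match else None)
```

In the loop above, this would mean searching with a `limit` greater than 1, for example `spotify_client.search(q=build_search_query(artist, title), type='track', limit=5)`, and passing `result['tracks']['items']` to `pick_match`.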
In [14]:
play_data_with_id

{'67CfbIKKTCKFqmQLJwUftX': {'id': '67CfbIKKTCKFqmQLJwUftX',
  'name': "Ella, elle l'a - Remasterisé",
  'artist': 'France Gall',
  'plays': 2},
 '1zLJktzNI18DO9MPFI6iCW': {'id': '1zLJktzNI18DO9MPFI6iCW',
  'name': 'Get Down',
  'artist': 'Junglepussy',
  'plays': 1},
 '4AZVT4epcaUxsAoGl1o1nE': {'id': '4AZVT4epcaUxsAoGl1o1nE',
  'name': 'State of the Union',
  'artist': 'Junglepussy',
  'plays': 1},
 '4IdMRBuYw4qyrKClsvIAK9': {'id': '4IdMRBuYw4qyrKClsvIAK9',
  'name': 'Jammin That Screw',
  'artist': 'Trae Tha Truth',
  'plays': 5},
 '3dX6WDwnHwYzB5t754oB4T': {'id': '3dX6WDwnHwYzB5t754oB4T',
  'name': 'Insomnia - Radio Edit',
  'artist': 'Faithless',
  'plays': 9},
 '2pHRKegE8YjSv0SWO07R7Z': {'id': '2pHRKegE8YjSv0SWO07R7Z',
  'name': 'God Is a DJ - Radio Mix',
  'artist': 'Faithless',
  'plays': 2},
 '29jPmPUIHfd8CoPEOOJ8Gg': {'id': '29jPmPUIHfd8CoPEOOJ8Gg',
  'name': 'Everybody',
  'artist': 'DJ Bobo',
  'plays': 6},
 '4t0UsYzmmmZRMTWn77jiGF': {'id': '4t0UsYzmmmZRMTWn77jiGF',
  'name':