# Last.fm Downloader

## Setup and Installation

* Copy credentials-sample.json to credentials.json
* Create a [Last.fm API Account](https://www.last.fm/api/account/create)
* Copy your API key and username into the matching fields in credentials.json
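
The cells below read `credentials['last_fm']['KEY']` and `credentials['last_fm']['USERNAME']`, so credentials.json presumably needs that nesting. A minimal sketch of writing and checking such a file, assuming credentials-sample.json uses the same layout; `write_credentials` and `read_credentials` are hypothetical helpers, not part of this notebook:

```python
import json

def write_credentials(path, key, username):
    """Write a credentials file in the nested layout the cells below read."""
    credentials = {"last_fm": {"KEY": key, "USERNAME": username}}
    with open(path, "w", encoding="utf-8") as file:
        json.dump(credentials, file, indent=4)

def read_credentials(path):
    """Return (key, username); raises KeyError if a field is missing."""
    with open(path, "r", encoding="utf-8") as file:
        last_fm_cr = json.load(file)["last_fm"]
    return last_fm_cr["KEY"], last_fm_cr["USERNAME"]
```

For example, `write_credentials("credentials.json", "YOUR_API_KEY", "your_username")` would create the file, with both values replaced by your own.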

Once finished, run the cells below to generate CSV exports of your Last.fm music history.

**NOTE: Depending on the size of your Last.fm listening history, this process may take some time. Please be patient.**

### Acknowledgements and Helpful Resources 

  - ["Analyzing Last.fm Listening History"](http://geoffboeing.com/2016/05/analyzing-lastfm-history/) and [Code](https://github.com/gboeing/data-visualization/tree/master/lastfm-listening-history) by Geoff Boeing
  - Last.fm API documentation: http://www.last.fm/api
  - For anything more complicated, you might use this Python wrapper for the API: https://github.com/pylast/pylast

-----

## Credentials and Authentication

In [1]:
import json

with open("credentials.json", "r") as file:
    credentials = json.load(file)
    last_fm_cr = credentials['last_fm']
    key = last_fm_cr['KEY']
    username = last_fm_cr['USERNAME']

In [2]:
# how long to pause between consecutive API requests
pause_duration = 0.2
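
The pause is there to space out consecutive requests. One way to apply it consistently is a small wrapper that sleeps before every call. This is only a sketch: `throttled` is a hypothetical helper, and the notebook itself simply calls `time.sleep(pause_duration)` inside its request loop.

```python
import time

pause_duration = 0.2  # seconds, same value as the cell above

def throttled(func, pause=pause_duration):
    """Return a version of func that waits `pause` seconds before each call."""
    def wrapper(*args, **kwargs):
        time.sleep(pause)
        return func(*args, **kwargs)
    return wrapper
```

With `requests` imported, `get = throttled(requests.get)` would give a drop-in replacement for `requests.get` that pauses before each request.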

## Dependencies

In [3]:
import requests
import json
import time
import pandas as pd

## First, get your all-time most played tracks, artists, and albums

In [4]:
url = 'https://ws.audioscrobbler.com/2.0/?method=user.get{}&user={}&api_key={}&limit={}&extended={}&page={}&format=json'
limit = 200  # API lets you retrieve up to 200 records per call
extended = 0  # API lets you retrieve extended data for each track, 0=no, 1=yes
page = 1  # page of results to start retrieving at

In [5]:
method = 'toptracks'
request_url = url.format(method, username, key, limit, extended, page)
artist_names = []
track_names = []
play_counts = []
response = requests.get(request_url).json()
for item in response[method]['track']:
    artist_names.append(item['artist']['name'])
    track_names.append(item['name'])
    play_counts.append(item['playcount'])

top_tracks = pd.DataFrame()
top_tracks['artist'] = artist_names
top_tracks['track'] = track_names
top_tracks['play_count'] = play_counts
top_tracks.to_csv('data/lastfm_top_tracks.csv', index=None, encoding='utf-8')
top_tracks.head()

Unnamed: 0,artist,track,play_count
0,Admo,Sparks,141
1,Soleil Soleil,I'm At The Bottom Of The Ocean,112
2,Men I Trust,Lauren,106
3,Redbone,Come and Get Your Love,100
4,Luis Fonsi,Despacito ft Daddy Yankee,100


In [6]:
method = 'topartists'
request_url = url.format(method, username, key, limit, extended, page)
artist_names = []
play_counts = []
response = requests.get(request_url).json()
for item in response[method]['artist']:
    artist_names.append(item['name'])
    play_counts.append(item['playcount'])

top_artists = pd.DataFrame()
top_artists['artist'] = artist_names
top_artists['play_count'] = play_counts
top_artists.to_csv('data/lastfm_top_artists.csv', index=None, encoding='utf-8')
top_artists.head()

Unnamed: 0,artist,play_count
0,Flight Facilities,251
1,The Shins,159
2,Admo,141
3,Ten Fé,122
4,Midlake,120


In [7]:
method = 'topalbums'
request_url = url.format(method, username, key, limit, extended, page)
artist_names = []
album_names = []
play_counts = []
response = requests.get(request_url).json()
for item in response[method]['album']:
    artist_names.append(item['artist']['name'])
    album_names.append(item['name'])
    play_counts.append(item['playcount'])

top_albums = pd.DataFrame()
top_albums['artist'] = artist_names
top_albums['album'] = album_names
top_albums['play_count'] = play_counts
top_albums.to_csv('data/lastfm_top_albums.csv', index=None, encoding='utf-8')
top_albums.head()

Unnamed: 0,artist,album,play_count
0,Generationals,BIRP! September 2014,217
1,Gems,BIRP! July 2014,161
2,Her,BIRP! November 2015,108
3,Burning Hotels,BIRP! September 2011,108
4,Journey,Greatest Hits,88


## Now get all your scrobbles

Last.fm provides the `recenttracks` API method for retrieving all scrobbles. However, its data from around 2007 and earlier appears to be spotty, so the cells above remain the best way to determine top tracks, artists, and albums. The code below retrieves time series data of all scrobbles, subject to that caveat.

Sample URL: https://ws.audioscrobbler.com/2.0/?method=user.getrecenttracks&user=gboeing&api_key={}&limit=1&extended=0&page=1&format=json
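
The same URL can be assembled from a dictionary of query parameters, which handles escaping if a username contains unusual characters. A sketch using only the standard library; `build_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder:

```python
from urllib.parse import urlencode

API_ROOT = 'https://ws.audioscrobbler.com/2.0/'

def build_url(method, user, api_key, limit=200, extended=0, page=1):
    """Build a user.get<method> request URL with URL-encoded parameters."""
    params = {
        'method': 'user.get{}'.format(method),
        'user': user,
        'api_key': api_key,
        'limit': limit,
        'extended': extended,
        'page': page,
        'format': 'json',
    }
    return '{}?{}'.format(API_ROOT, urlencode(params))

# reproduces the shape of the sample URL above, with a placeholder key
sample = build_url('recenttracks', 'gboeing', 'YOUR_API_KEY', limit=1)
```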

In [8]:
def get_scrobbles(method='recenttracks', username=username, key=key, limit=200, extended=0, page=1, pages=0):
    '''
    method: api method
    username/key: api credentials
    limit: api lets you retrieve up to 200 records per call
    extended: api lets you retrieve extended data for each track, 0=no, 1=yes
    page: page of results to start retrieving at
    pages: how many pages of results to retrieve. if 0, get as many as api can return.
    '''
    # initialize url and lists to contain response fields
    url = 'https://ws.audioscrobbler.com/2.0/?method=user.get{}&user={}&api_key={}&limit={}&extended={}&page={}&format=json'
    responses = []
    artist_names = []
    artist_mbids = []
    album_names = []
    album_mbids = []
    track_names = []
    track_mbids = []
    timestamps = []
    
    # make first request, just to get the total number of pages
    request_url = url.format(method, username, key, limit, extended, page)
    response = requests.get(request_url).json()
    total_pages = int(response[method]['@attr']['totalPages'])
    if pages > 0:
        total_pages = min(total_pages, pages)
        
    print('{} total pages to retrieve'.format(total_pages))
    
    # request each page of data one at a time
    for page in range(1, total_pages + 1):
        if page % 10 == 0: print(page, end=' ')
        time.sleep(pause_duration)
        request_url = url.format(method, username, key, limit, extended, page)
        responses.append(requests.get(request_url))
    
    # parse the fields out of each scrobble in each page (aka response) of scrobbles
    for response in responses:
        scrobbles = response.json()
        for scrobble in scrobbles[method]['track']:
            # only retain completed scrobbles (aka, with timestamp and not 'now playing')
            if 'date' in scrobble:
                artist_names.append(scrobble['artist']['#text'])
                artist_mbids.append(scrobble['artist']['mbid'])
                album_names.append(scrobble['album']['#text'])
                album_mbids.append(scrobble['album']['mbid'])
                track_names.append(scrobble['name'])
                track_mbids.append(scrobble['mbid'])
                timestamps.append(scrobble['date']['uts'])
                
    # create and populate a dataframe to contain the data
    df = pd.DataFrame()
    df['artist'] = artist_names
    df['artist_mbid'] = artist_mbids
    df['album'] = album_names
    df['album_mbid'] = album_mbids
    df['track'] = track_names
    df['track_mbid'] = track_mbids
    df['timestamp'] = timestamps
    df['datetime'] = pd.to_datetime(df['timestamp'].astype(int), unit='s')
    
    return df
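
The function above assumes every response has the expected keys. Last.fm appears to report failures as a JSON body with `error` and `message` fields, so a guard could surface those before parsing. The sketch below pulls the per-page parsing into a hypothetical helper, `parse_page`, that also skips 'now playing' entries. It keeps only a few of the fields, and the error format is an assumption worth checking against the API documentation.

```python
def parse_page(payload, method='recenttracks'):
    """Return a list of dicts, one per completed scrobble in one page of results."""
    # assumption: failed requests come back as {"error": <code>, "message": <text>}
    if 'error' in payload:
        raise RuntimeError('Last.fm API error {}: {}'.format(
            payload['error'], payload.get('message', 'no message')))
    rows = []
    for scrobble in payload[method]['track']:
        # 'now playing' entries have no date, so skip them
        if 'date' not in scrobble:
            continue
        rows.append({
            'artist': scrobble['artist']['#text'],
            'album': scrobble['album']['#text'],
            'track': scrobble['name'],
            'timestamp': int(scrobble['date']['uts']),
        })
    return rows
```

Inside `get_scrobbles`, something like `pd.DataFrame(parse_page(response.json()))` would then build the frame for a single page.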

In [9]:
# get all scrobbled tracks ever, in order of recency (pages=0 to get all)
scrobbles = get_scrobbles(pages=0)

54 total pages to retrieve
10 20 30 40 50 

In [10]:
# save the dataset
scrobbles.to_csv('data/lastfm_scrobbles.csv', index=None, encoding='utf-8')
print('{:,} total rows'.format(len(scrobbles)))
scrobbles.head()

10,684 total rows


Unnamed: 0,artist,artist_mbid,album,album_mbid,track,track_mbid,timestamp,datetime
0,Daniel Deluxe,20e17bb3-4b34-48d3-bc35-d8e6638e3b4d,"Desync (Original Soundtrack, Vol. 1)",,Breakout,,1527060513,2018-05-23 07:28:33
1,Occams Laser,,Take Your Time,,Just Give Me Your Love,,1527060296,2018-05-23 07:24:56
2,Droid Bishop,,End of Aquarius,,Sagan's Quest,,1527060009,2018-05-23 07:20:09
3,Arcade High,3fe8b0f1-e0b1-406e-a152-2cd6d5ec5145,Kingdom,,Badlands,,1527059765,2018-05-23 07:16:05
4,Kalax,,Kalax,,Soaring,,1527059474,2018-05-23 07:11:14
