# Explore Spotify (Friends)

In this notebook, I explore how much my music taste overlaps with that of my Spotify friends. I do this by scraping my public playlists and those of a few friends, loading all of that data into a graph structure, and visualizing it with Neo4j.

### Part 1: Scraping Spotify

In this section, we call the Spotify backend API with GET requests that reuse the headers from a browser session. To capture those headers, sign in to Spotify.com, open Chrome Developer Tools, and copy any GET request made to `api.spotify.com` as a cURL command. Then paste that cURL command into a tool like [Trillworks](https://curl.trillworks.com/) to convert the headers to Python syntax.
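
If you would rather not paste a token into a website, the header conversion can be approximated locally. This is a minimal sketch, assuming the copied command passes its headers with `-H` or `--header`; `headers_from_curl` is a hypothetical helper, and the URL and token in `example` are placeholders.

```python
import shlex


def headers_from_curl(curl_command):
    """Collect the -H/--header values of a copied cURL command into a dict."""
    # Join shell line continuations before tokenizing.
    tokens = shlex.split(curl_command.replace('\\\n', ' '))
    headers = {}
    # Each header flag is immediately followed by its "name: value" string.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ('-H', '--header'):
            name, _, content = value.partition(':')
            headers[name.strip().lower()] = content.strip()
    return headers


# Placeholder command; a real one would come from Chrome Developer Tools.
example = (
    "curl 'https://api.spotify.com/v1/me' "
    "-H 'accept: application/json' "
    "-H 'authorization: Bearer INSERT_TOKEN_HERE'"
)
print(headers_from_curl(example))
```

The resulting dict has the same shape as the `headers` dict used in the cells that follow.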

In [1]:
from tqdm.notebook import tqdm

In [2]:
import requests
from collections import defaultdict
import json

In [116]:
headers = {
    'authority': 'api.spotify.com',
    'accept': 'application/json',
    'authorization': 'Bearer INSERT_TOKEN_HERE',
    'accept-language': 'en',
    'app-platform': 'WebPlayer',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'spotify-app-version': '1591734700',
    'origin': 'https://open.spotify.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://open.spotify.com/user/MY_USER_ID_HERE',
}

In [117]:
def get_playlists(friend_id):
    params = (
        ('limit', '50'),
        ('market', 'from_token'),
    )

    response = requests.get(f'https://api.spotify.com/v1/users/{friend_id}/playlists', headers=headers, params=params)
    return response.json().get('items')

In [118]:
def get_tracks(base_url):
    # Playlists without a tracks URL contribute no tracks.
    if base_url is None:
        return []
    tracks = []
    # `params` only exists inside get_playlists, so the first request is sent
    # without query parameters and relies on the API defaults.
    r = requests.get(base_url, headers=headers).json()
    tracks.extend(r.get('items') or [])
    # Keep following the 'next' link until the last page is reached.
    while r.get('next') is not None:
        r = requests.get(r.get('next'), headers=headers).json()
        tracks.extend(r.get('items') or [])

    formatted = []
    for item in tracks:
        track = item.get('track')
        # Skip entries that carry no track payload.
        if track is None:
            continue
        album = track.get('album') or {}
        formatted.append({
            'added_at': item.get('added_at'),
            'added_by': (item.get('added_by') or {}).get('id'),
            'album': album.get('name'),
            'release_date': album.get('release_date'),
            'artists': [a.get('name') for a in track.get('artists') or []],
            'duration_ms': track.get('duration_ms'),
            'name': track.get('name'),
            'popularity': track.get('popularity')
        })
    return formatted
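
The `while` loop in `get_tracks` keeps requesting the `next` URL until the API stops returning one. Here is a minimal, offline sketch of that pattern. The `collect_pages` helper, the `fetch` callback, and the `fake_pages` data are all made up for illustration and are not part of the Spotify API.

```python
def collect_pages(first_url, fetch):
    """Follow 'next' links until a page reports no further URL."""
    items = []
    url = first_url
    while url is not None:
        page = fetch(url)
        # A page without 'items' adds nothing instead of raising.
        items.extend(page.get('items') or [])
        url = page.get('next')
    return items


# Hypothetical two-page response, kept in a dict instead of hitting the network.
fake_pages = {
    'page-1': {'items': ['a', 'b'], 'next': 'page-2'},
    'page-2': {'items': ['c'], 'next': None},
}
all_items = collect_pages('page-1', fake_pages.__getitem__)
print(all_items)
```

With the real API, `fetch` would be something like a function that calls `requests.get(url, headers=headers).json()`.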

In [119]:
playlists = {}

In [120]:
friends = ['krthaker1', '1279497886', '12120215048', 'immy15', 'khtqit04pn9l295uvsjy7r1j9', '22qdsjyotkicie3ak4ldgucia', 'z45z1rjvbvy33rv71csaewpm8']



In [121]:
for friend in friends:
    playlists[friend] = get_playlists(friend)

In [122]:
dataset = defaultdict(list)

In [123]:
# Use a separate loop variable so the `playlists` dict is not overwritten.
for friend, friend_playlists in tqdm(playlists.items(), leave=False, desc='Tracks'):
    for p in tqdm(friend_playlists or [], leave=False, desc='Playlists'):
        details = {'name': p.get('name'), 
                   'description': p.get('description'), 
                   'id': p.get('id'), 
                   'owner': p.get('owner', {}).get('display_name'),
                   'tracks': get_tracks(p.get('tracks', {}).get('href'))}
        dataset[friend].append(details)

In [126]:
# Write the scraped data to disk; the context manager closes the file.
with open('friend_data.json', 'w') as f:
    json.dump(dict(dataset), f)

### Part 2: Inserting Data into Neo4j for Visualization

In this section, we're going to insert our scraped data into Neo4j for visualization.

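**Nodes:**
- Users
- Playlists
- Tracks
- Artists

**Edges:**
- Users --> Playlists (`USER_PLAYLIST`)
- Users --> Tracks (`USER_TRACK`)
- Users --> Artists (`USER_ARTIST`)
- Tracks --> Playlists (`TRACK_PLAYLIST`)
- Tracks --> Artists (`TRACK_ARTIST`)
- Playlists --> Artists (`PLAYLIST_ARTIST`)

The scraped data also contains album names, but the insertion code in this section does not create album nodes.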
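
Before loading anything into the graph, a quick set intersection gives a rough sanity check on the overlap we expect to see. This is a minimal sketch, assuming the data has the same shape as `friend_data.json` (a user ID mapped to a list of playlists, each with a `tracks` list whose entries carry an `artists` list). `artists_by_user`, `shared_artists`, and the sample values are hypothetical.

```python
from itertools import combinations


def artists_by_user(dataset):
    """Map each user to the set of artist names found in their playlists."""
    artists = {}
    for user, playlists in dataset.items():
        names = set()
        for playlist in playlists:
            for track in playlist.get('tracks', []):
                names.update(track.get('artists', []))
        artists[user] = names
    return artists


def shared_artists(dataset):
    """Return the artists each pair of users has in common."""
    per_user = artists_by_user(dataset)
    return {
        (a, b): per_user[a] & per_user[b]
        for a, b in combinations(sorted(per_user), 2)
    }


# Made-up sample in the same shape as the scraped data.
sample = {
    'user_a': [{'tracks': [{'artists': ['Artist X', 'Artist Y']}]}],
    'user_b': [{'tracks': [{'artists': ['Artist Y']}, {'artists': ['Artist Z']}]}],
}
overlap = shared_artists(sample)
print(overlap)
```

Matching on artist name alone treats two different artists with the same name as one, which is the same simplification the graph makes.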

In [10]:
friend_map = {
    'krthaker1': 'Kanyes',
    '1279497886': 'Niky',
    '12120215048': 'Shubha',
    'khtqit04pn9l295uvsjy7r1j9': 'Shomil',
}

In [11]:
from neo4j import GraphDatabase

In [12]:
def clear_database(driver):
    # Delete all nodes (reset the database)
    with driver.session() as session:
        session.run('MATCH (n) DETACH DELETE n')

In [13]:
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', '123456'), encrypted=False)

In [14]:
clear_database(driver)

In [15]:
# Reload the scraped data; the context manager closes the file.
with open('friend_data.json') as f:
    dataset = json.load(f)

In [16]:
added_tracks = set()
added_artists = set()

added_edges = set()

In [17]:
with driver.session() as session:
    
    for friend, playlists in tqdm(dataset.items(), leave=False, desc='0'):
        # Only friends listed in friend_map are loaded into the graph.
        user_name = friend_map.get(friend)
        if user_name is None:
            continue
        session.run("CREATE (:User {name: $name})", name=user_name)
        
        for playlist in tqdm(playlists, leave=False, desc='1'):
            playlist_name = playlist.get('name')
            description = playlist.get('description')
            owner = playlist.get('owner')
            session.run("CREATE (:Playlist {name: $name, description: $description, owner: $owner})", name=playlist_name, description=description, owner=owner)
            
            session.run("MATCH (p:User {name: $user_name}) MATCH (t:Playlist {name: $playlist_name}) "
                        + "CREATE (p)-[:USER_PLAYLIST]->(t)", user_name=user_name, playlist_name=playlist_name)
            
            for track in tqdm(playlist.get('tracks'), leave=False, desc='2'):
                track_name = track.get('name')
                if track_name not in added_tracks:
                    session.run("CREATE (:Track {name: $name, duration_ms: $duration_ms, popularity: $popularity})", 
                                name=track_name, duration_ms=track.get('duration_ms'), popularity=track.get('popularity'))
                    added_tracks.add(track_name)
                    
                if str(user_name + track_name) not in added_edges:
                    session.run("MATCH (p:User {name: $user_name}) MATCH (t:Track {name: $track_name}) "
                                + "CREATE (p)-[:USER_TRACK]->(t)", user_name=user_name, track_name=track_name)
                    added_edges.add(user_name + track_name)

                session.run("MATCH (p:Track {name: $track_name}) MATCH (t:Playlist {name: $playlist_name}) "
                        + "CREATE (p)-[:TRACK_PLAYLIST]->(t)", track_name=track_name, playlist_name=playlist_name)
                    
                for artist in track.get('artists'):
                    artist_name = artist
                    if artist not in added_artists:
                        session.run("CREATE (:Artist {name: $name})", name=artist)
                        added_artists.add(artist)
                        
                    if str(track_name + artist_name) not in added_edges:
                        session.run("MATCH (p:Track {name: $track_name}) MATCH (t:Artist {name: $artist_name}) "
                        + "CREATE (p)-[:TRACK_ARTIST]->(t)", track_name=track_name, artist_name=artist_name)
                        added_edges.add(track_name + artist_name)
                    
                    if user_name + artist_name not in added_edges:
                        session.run("MATCH (p:User {name: $user_name}) MATCH (t:Artist {name: $artist_name}) "
                            + "CREATE (p)-[:USER_ARTIST]->(t)", user_name=user_name, artist_name=artist_name)
                        added_edges.add(user_name + artist_name)
                                    
                    if playlist_name + artist_name not in added_edges:
                        session.run("MATCH (p:Playlist {name: $playlist_name}) MATCH (t:Artist {name: $artist_name}) "
                            + "CREATE (p)-[:PLAYLIST_ARTIST]->(t)", playlist_name=playlist_name, artist_name=artist_name)
                        added_edges.add(playlist_name + artist_name)
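
The `added_tracks`, `added_artists`, and `added_edges` sets above exist to avoid creating the same node or relationship twice. Cypher's `MERGE` clause can do that bookkeeping inside the database, since it only creates a pattern when no match exists. Below is a minimal sketch for one relationship type, offered as an alternative rather than the approach used in this notebook. `link_user_to_artist` is a hypothetical helper, and `RecordingSession` is a stand-in that only records calls so the example runs without a Neo4j server.

```python
class RecordingSession:
    """Hypothetical stand-in for a Neo4j session; it only records queries."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))


def link_user_to_artist(session, user_name, artist_name):
    """Create the user, the artist, and their edge only if they are missing."""
    session.run(
        "MERGE (u:User {name: $user_name}) "
        "MERGE (a:Artist {name: $artist_name}) "
        "MERGE (u)-[:USER_ARTIST]->(a)",
        user_name=user_name,
        artist_name=artist_name,
    )


session = RecordingSession()
link_user_to_artist(session, 'Shomil', 'Some Artist')
print(session.calls[0][1])
```

With a real database, the object returned by `driver.session()` would be passed in place of `RecordingSession()`.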
