# 00: Feature Pull from Spotify

The starting point for this project, naturally, is extracting my music from Spotify via the Spotify API. Ultimately, I want a UI where I can input either a playlist or a collection (2+) of artists, from which the graph will expand outwards. So, I need to establish the pipeline for grabbing that information, and also grab some starter data.

In [3]:
import pandas as pd
import numpy as np
import json
import spotipy
import re
import os
from dotenv import load_dotenv
from spotipy.oauth2 import SpotifyClientCredentials

### Set up

In [4]:
load_dotenv()

True

In [5]:
CLIENT_ID = os.environ.get("SPOTIFY_CLIENT_ID")
CLIENT_SECRET = os.environ.get("SPOTIFY_CLIENT_SECRET")

In [6]:
auth_manager = SpotifyClientCredentials(client_id=CLIENT_ID,
                                        client_secret=CLIENT_SECRET)
sp = spotipy.Spotify(auth_manager=auth_manager)

### Test Playlist

In [10]:
hardnheavy_url = "https://open.spotify.com/playlist/142oZDOc1za2dkUwyonA1P?si=9477d01b58b34669"

In [65]:
# grab the primary artist of every track in my hard n heavy playlist
response = sp.playlist_tracks(hardnheavy_url, offset=0)
artists = [track["track"]["artists"][0] for track in response["items"]]
total = response["total"]

# the API returns at most 100 items per call, so page through the rest;
# stopping before `total` avoids an empty request when total is a multiple of 100
for offset in range(100, total, 100):
    response = sp.playlist_tracks(hardnheavy_url, offset=offset)
    artists += [track["track"]["artists"][0] for track in response["items"]]

artists[:5]

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/3Ri4H12KFyu98LMjSoij5V'},
  'href': 'https://api.spotify.com/v1/artists/3Ri4H12KFyu98LMjSoij5V',
  'id': '3Ri4H12KFyu98LMjSoij5V',
  'name': 'Bad Omens',
  'type': 'artist',
  'uri': 'spotify:artist:3Ri4H12KFyu98LMjSoij5V'},
 {'external_urls': {'spotify': 'https://open.spotify.com/artist/3Ri4H12KFyu98LMjSoij5V'},
  'href': 'https://api.spotify.com/v1/artists/3Ri4H12KFyu98LMjSoij5V',
  'id': '3Ri4H12KFyu98LMjSoij5V',
  'name': 'Bad Omens',
  'type': 'artist',
  'uri': 'spotify:artist:3Ri4H12KFyu98LMjSoij5V'},
 {'external_urls': {'spotify': 'https://open.spotify.com/artist/3Ri4H12KFyu98LMjSoij5V'},
  'href': 'https://api.spotify.com/v1/artists/3Ri4H12KFyu98LMjSoij5V',
  'id': '3Ri4H12KFyu98LMjSoij5V',
  'name': 'Bad Omens',
  'type': 'artist',
  'uri': 'spotify:artist:3Ri4H12KFyu98LMjSoij5V'},
 {'external_urls': {'spotify': 'https://open.spotify.com/artist/3Ri4H12KFyu98LMjSoij5V'},
  'href': 'https://api.spotify.com/v1/artis

Some things to consider about the above:

* For now I am taking only the primary artist of each song. Featured artists can also be included if I want; I can leave that as a toggle for the user when extracting the playlist artists.
  * The same goes for weighting artists by the proportion of their songs in the playlist: leave it as a user option. For now I will not worry about it.
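
The two options in the list above could be sketched roughly as follows. This is only a minimal sketch, not the pipeline used in this notebook: `extract_artists`, `artist_weights`, and the `include_features` flag are hypothetical names, and the sample `sample_items` list just mimics the shape of the `response["items"]` entries used earlier.

```python
from collections import Counter


def extract_artists(items, include_features=False):
    """Collect artist dicts from playlist items.

    `include_features` is a hypothetical toggle: when False only the first
    (primary) artist of each track is kept, when True every credited artist is.
    """
    artists = []
    for item in items:
        track = item.get("track")
        if not track or not track.get("artists"):
            continue  # skip entries without track data
        credited = track["artists"]
        artists.extend(credited if include_features else credited[:1])
    return artists


def artist_weights(artists):
    """Map artist URI -> share of the collected entries credited to that artist."""
    counts = Counter(artist["uri"] for artist in artists)
    total = sum(counts.values())
    return {uri: count / total for uri, count in counts.items()}


# Made-up items for illustration only (the URIs are placeholders, not real artists).
sample_items = [
    {"track": {"artists": [{"name": "A", "uri": "spotify:artist:a"}]}},
    {"track": {"artists": [{"name": "A", "uri": "spotify:artist:a"},
                           {"name": "B", "uri": "spotify:artist:b"}]}},
    {"track": None},
]

primary_only = extract_artists(sample_items)
with_features = extract_artists(sample_items, include_features=True)
weights = artist_weights(primary_only)
```

Keeping both choices as plain function arguments means the eventual UI only has to pass a flag through, and the weights can be ignored entirely until they are needed.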

In [66]:
total, len(artists)

(376, 376)
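
Since the counts match, the offset loop used above could be wrapped in a small helper so the same logic can be reused for other paged requests. A minimal sketch, assuming each page is a dict with `items` and `total` keys like the `sp.playlist_tracks` response; `fetch_all_items` and `fetch_page` are hypothetical names, and the fake pages below stand in for real API calls.

```python
def fetch_all_items(fetch_page, page_size=100):
    """Collect every item from a paged endpoint.

    `fetch_page(offset)` is any callable returning a dict with "items" and
    "total" keys (an assumption based on the playlist response above).
    """
    first = fetch_page(0)
    items = list(first["items"])
    # Stop before `total` so no empty trailing page is requested.
    for offset in range(page_size, first["total"], page_size):
        items.extend(fetch_page(offset)["items"])
    return items


# Stand-in for the API: 250 fake items served 100 at a time.
fake_items = list(range(250))
requested_offsets = []


def fake_fetch_page(offset):
    requested_offsets.append(offset)
    return {"items": fake_items[offset:offset + 100], "total": len(fake_items)}


all_items = fetch_all_items(fake_fetch_page)
```

With the client from the setup cells, this would presumably be called as `fetch_all_items(lambda offset: sp.playlist_tracks(hardnheavy_url, offset=offset))`.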

In [70]:
artist_names = [artist["name"] for artist in artists]
len(set(artist_names))

48

In [71]:
artist_uris = [artist["uri"] for artist in artists]

In [75]:
sp.artist(artist_uris[0])

{'external_urls': {'spotify': 'https://open.spotify.com/artist/3Ri4H12KFyu98LMjSoij5V'},
 'followers': {'href': None, 'total': 1776121},
 'genres': ['metalcore'],
 'href': 'https://api.spotify.com/v1/artists/3Ri4H12KFyu98LMjSoij5V',
 'id': '3Ri4H12KFyu98LMjSoij5V',
 'images': [{'url': 'https://i.scdn.co/image/ab6761610000e5eb3a62c74a31a446406a033926',
   'height': 640,
   'width': 640},
  {'url': 'https://i.scdn.co/image/ab676161000051743a62c74a31a446406a033926',
   'height': 320,
   'width': 320},
  {'url': 'https://i.scdn.co/image/ab6761610000f1783a62c74a31a446406a033926',
   'height': 160,
   'width': 160}],
 'name': 'Bad Omens',
 'popularity': 75,
 'type': 'artist',
 'uri': 'spotify:artist:3Ri4H12KFyu98LMjSoij5V'}

In [78]:
sp.artist_albums(artist_uris[0])["items"][0].keys()

dict_keys(['album_type', 'total_tracks', 'available_markets', 'external_urls', 'href', 'id', 'images', 'name', 'release_date', 'release_date_precision', 'type', 'uri', 'artists', 'album_group'])

There is really not much artist-level information that can be retrieved using the Spotify API. The only general information available is:

* number of followers
* popularity
* genre (now far less specific than it used to be; broad genres would be of very little use)
  * Regarding more specific genres: Spotify has deprecated many endpoints and limited the data returned by those still available, and genres were affected. Older applications may still have access to those features, though; note that [exportify](https://exportify.net/) is able to grab audio features and more specific genres, and someone built a CLI [here](https://github.com/donmerendolo/exportify-cli) that can be used to automate the process. Other APIs I can look into are MusicBrainz, Last.fm, and Discogs. The first two provide user-curated tags and so are not ideal, while Discogs has more formal tags, but they are attached to releases rather than artists (so they need to be aggregated, which is fine). The catch with all of the alternative APIs is that their artists need to be matched to the ones taken from Spotify. Try exportify first.
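
The three artist-level fields listed above could be flattened into one row per artist. A minimal sketch under stated assumptions: `artist_row` and `chunked` are hypothetical helpers, and the sample dict copies a few fields from the `sp.artist` output shown earlier.

```python
def artist_row(artist):
    """Keep only the general artist-level fields from an artist object."""
    return {
        "uri": artist["uri"],
        "name": artist["name"],
        "followers": artist["followers"]["total"],
        "popularity": artist["popularity"],
        "genres": artist["genres"],
    }


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Fields copied from the `sp.artist` response shown earlier in the notebook.
bad_omens = {
    "uri": "spotify:artist:3Ri4H12KFyu98LMjSoij5V",
    "name": "Bad Omens",
    "followers": {"href": None, "total": 1776121},
    "popularity": 75,
    "genres": ["metalcore"],
}

row = artist_row(bad_omens)
batches = chunked(list(range(120)), 50)
```

To cover the whole playlist, the unique URIs could be split with `chunked(..., 50)` and each batch passed to `sp.artists(batch)`, which returns a dict with an `artists` list. The batch size of 50 reflects my understanding of that endpoint's limit and should be checked against the current API documentation. The resulting rows can then go straight into `pd.DataFrame(rows)`.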