# Spotify API Access, Data Retrieval, and Graph Creation

## Load the libraries

In [1]:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from IPython.display import clear_output
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pickle

## Access to Spotify API
Set up Spotify API access by providing a client ID and a client secret.

In [2]:
import os

# Read the credentials from the environment rather than hard-coding them in the notebook
client_id = os.environ["SPOTIPY_CLIENT_ID"]
client_secret = os.environ["SPOTIPY_CLIENT_SECRET"]

credmanager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=credmanager)

## Define our main artist
In this part, we define the main artist as 'Ed Sheeran'. The main artist serves as a starting point for building the graph representation. The graph will be constructed by adding related artists and establishing connections between them and the main artist.

In [3]:
main_artist = 'Ed Sheeran'

## Demonstrating some features of Spotify API

This part demonstrates how to use the Spotify API to search for an artist, retrieve their information, and find related artists. The artist information, such as their name, ID, popularity, and genre, is extracted from the search results.

In [4]:
artist_search = sp.search(main_artist, type='artist')['artists']['items'][0]
print(artist_search)

{'external_urls': {'spotify': 'https://open.spotify.com/artist/6eUKZXaKkcviH0Ku9w2n3V'}, 'followers': {'href': None, 'total': 112147503}, 'genres': ['pop', 'uk pop'], 'href': 'https://api.spotify.com/v1/artists/6eUKZXaKkcviH0Ku9w2n3V', 'id': '6eUKZXaKkcviH0Ku9w2n3V', 'images': [{'height': 640, 'url': 'https://i.scdn.co/image/ab6761610000e5eb9e690225ad4445530612ccc9', 'width': 640}, {'height': 320, 'url': 'https://i.scdn.co/image/ab676161000051749e690225ad4445530612ccc9', 'width': 320}, {'height': 160, 'url': 'https://i.scdn.co/image/ab6761610000f1789e690225ad4445530612ccc9', 'width': 160}], 'name': 'Ed Sheeran', 'popularity': 92, 'type': 'artist', 'uri': 'spotify:artist:6eUKZXaKkcviH0Ku9w2n3V'}


### Searching Artist

The search above used the sp.search function and kept the first match. This section extracts the important features of that artist: name, ID, popularity, and first listed genre, with a placeholder value wherever a field is missing. The extracted features are stored in the artist_features dictionary.

In [5]:
def extract_artist_features(spotify_search_result):
    result = {
        'artist_name': spotify_search_result.get('name', 'artist_name_not_available'),
        'artist_id': spotify_search_result.get('id', 'artist_id_not_available'),
        'artist_popularity': spotify_search_result.get('popularity', 0),
        'artist_first_genre': (spotify_search_result.get('genres', ['genre_not_available']) + ['genre_not_available'])[0],
    }
    return result

In [6]:
artist_features = extract_artist_features(artist_search)
print(artist_features)

{'artist_name': 'Ed Sheeran', 'artist_id': '6eUKZXaKkcviH0Ku9w2n3V', 'artist_popularity': 92, 'artist_first_genre': 'pop'}
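
To illustrate the fallback values in extract_artist_features, here is a small offline sketch. The function is repeated so that the snippet runs on its own, and the two input dictionaries are made-up stand-ins for Spotify search results, not real API responses.

```python
def extract_artist_features(spotify_search_result):
    """Same logic as the cell above, repeated so this snippet is self-contained."""
    return {
        'artist_name': spotify_search_result.get('name', 'artist_name_not_available'),
        'artist_id': spotify_search_result.get('id', 'artist_id_not_available'),
        'artist_popularity': spotify_search_result.get('popularity', 0),
        # Appending the placeholder guarantees that index 0 exists even for an empty genre list
        'artist_first_genre': (spotify_search_result.get('genres', []) + ['genre_not_available'])[0],
    }

# Hypothetical inputs (not real Spotify data)
complete = {'name': 'Artist A', 'id': 'id_a', 'popularity': 50, 'genres': ['pop', 'uk pop']}
sparse = {'name': 'Artist B', 'genres': []}

features_complete = extract_artist_features(complete)
features_sparse = extract_artist_features(sparse)
print(features_complete)
print(features_sparse)
```

The sparse entry has no ID, no popularity, and an empty genre list, so every missing field falls back to its placeholder instead of raising a KeyError or IndexError.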


### Retrieving related artists

This section retrieves the artists related to the main artist using the sp.artist_related_artists function. It prints the number of related artists and the names of the first 20.

In [7]:
artist_related_artists = sp.artist_related_artists(artist_features['artist_id'])['artists']

print(main_artist, 'has', len(artist_related_artists), 'related artists. The first one is', artist_related_artists[0]['name'], '\n')

# Slice instead of range(20) so the loop cannot fail if fewer than 20 artists are returned
for related in artist_related_artists[:20]:
    print(related['name'])

Ed Sheeran has 20 related artists. The first one is James Arthur 

James Arthur
Shawn Mendes
James TW
Sam Smith
Charlie Puth
Hailee Steinfeld
Calum Scott
Liam Payne
Niall Horan
James Bay
Lewis Capaldi
Lukas Graham
Nick Jonas
Alessia Cara
The Vamps
DNCE
Meghan Trainor
Cheat Codes
Camila Cabello
The Script


Spotify returns at most 20 related artists for any artist.

## Create a graph

This part demonstrates the creation of a graph representation using the NetworkX library. An empty undirected graph, G, is initialized. The main artist is added as a node to the graph using the add_node() function. Connections between the main artist and related artists are established by utilizing the Spotify API to retrieve information about related artists. Each related artist's name is added as a node to the graph using add_node(), and edges are created between the main artist and each related artist using add_edge().

In [8]:
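G = nx.Graph()
# Store the main artist's features on its node; its related artists are linked in the next cell
G.add_node(main_artist, **artist_features, related_found=True)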

In [9]:
# Retrieve information about related artists using the Spotify API
related_artists = sp.artist_related_artists(artist_features['artist_id'])

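# Iterate through the related artists and add them as nodes to the graph
for related_artist in related_artists['artists']:
    related_features = extract_artist_features(related_artist)
    related_artist_name = related_features['artist_name']
    # Store the features and mark the node as not yet expanded, so that the later
    # expansion step can look up its artist_id instead of skipping it
    G.add_node(related_artist_name, **related_features, related_found=False)
    G.add_edge(main_artist, related_artist_name)  # Create an edge between the main artist and each related artist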
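
The resulting structure is a star: the main artist in the centre, one edge per related artist. As a minimal offline sketch, the following builds the same shape from the first three names printed earlier, hard-coded so that no API call is needed. The name G_demo is only used for this illustration.

```python
import networkx as nx

main_artist = 'Ed Sheeran'
# First three related artists from the output above, hard-coded for the demo
related_names = ['James Arthur', 'Shawn Mendes', 'James TW']

G_demo = nx.Graph()
G_demo.add_node(main_artist)
for name in related_names:
    # add_edge creates a missing endpoint automatically, so add_node is optional here
    G_demo.add_edge(main_artist, name)

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
print(G_demo.degree[main_artist])
print(sorted(G_demo.neighbors(main_artist)))
```

Because the graph is undirected, the edge between the main artist and a related artist is the same edge regardless of the argument order passed to add_edge.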

### Add some initial nodes to the graph

In this section, a list of artist names is read from a file. Each artist is searched for using the Spotify API, and the top match is added to the graph as a node whose attributes are the extracted features (name, ID, popularity, and first genre) plus a related_found flag set to False. The flag marks nodes whose related artists have not been fetched yet. Because it is only a boolean, it cannot tell whether a new artist is related to an existing node, so no edges are created at this stage; they are added in the next section.

In [10]:
with open('foreign.txt', 'r', encoding='utf-8') as file:
    artists_name_list = file.read().splitlines()

print('There are', len(artists_name_list), 'artists in the initial list.')

There are 21246 artists in the initial list.


In [12]:
for name in artists_name_list:
    try:
        search_results = sp.search(name, type='artist')['artists']['items']
    except spotipy.SpotifyException as e:
        print("An error occurred while searching for", name)
        print("Error message:", str(e))
        continue

    if not search_results:
        # No match on Spotify; indexing [0] would raise an IndexError here
        continue

    this_artist = extract_artist_features(search_results[0])
    this_artist_name = this_artist['artist_name']

    # related_found=False marks the node as not yet expanded. It is a boolean flag,
    # so it cannot be searched for artist names; edges are added in the next section.
    G.add_node(this_artist_name, **this_artist, related_found=False)
    print('The graph has', len(G), 'nodes now.')

The graph has 31 nodes now.
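
A boolean flag such as related_found records only whether a node has been expanded. If new artists should be linked to existing nodes as soon as they are added, one option is to store the related names on each node. The following is a minimal sketch under that assumption: the related_names attribute and the helper add_artist_and_link are hypothetical and do not appear in the notebook, and the artist names are placeholders.

```python
import networkx as nx

def add_artist_and_link(G, artist_name, **attrs):
    """Add artist_name to G and connect it to every node that lists it as related.

    Assumes nodes may carry a hypothetical 'related_names' set; nodes without
    that attribute are ignored. Returns the nodes the new artist was linked to.
    """
    G.add_node(artist_name, **attrs)
    linked = []
    for node, data in list(G.nodes(data=True)):
        if node != artist_name and artist_name in data.get('related_names', set()):
            G.add_edge(node, artist_name)
            linked.append(node)
    return linked

# Placeholder graph: A and B list C as related, D has no related_names attribute
G_demo = nx.Graph()
G_demo.add_node('Artist A', related_names={'Artist C', 'Artist X'})
G_demo.add_node('Artist B', related_names={'Artist C'})
G_demo.add_node('Artist D')

linked = add_artist_and_link(G_demo, 'Artist C', related_found=False)
print(sorted(linked))
```

Storing a set per node costs memory on a graph with tens of thousands of artists, so deferring edge creation to a separate expansion pass, as the notebook does, is a reasonable alternative.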

### Adding edges and more nodes to the graph

This section iterates through the nodes in the graph and finds their related artists. It adds these related artists as nodes to the graph and creates edges between the artists. The process continues until there are no new artists to add or the number of artists in the graph exceeds 25000.

In [None]:
# Iterate through the nodes in the graph and find their related artists
while True:
    num_nodes_before = len(G.nodes)
    for x in list(G):
        # Skip nodes without a Spotify ID and nodes whose related artists were already fetched
        if 'artist_id' not in G.nodes[x] or G.nodes[x].get('related_found', False):
            continue
        relateds = sp.artist_related_artists(G.nodes[x]['artist_id'])['artists']
        relateds = [extract_artist_features(r) for r in relateds]
        G.nodes[x]['related_found'] = True
        for rdict in relateds:
            rname = rdict['artist_name']
            if rname not in G:
                # New nodes start unexpanded so that the next pass fetches their related artists
                G.add_node(rname, **rdict, related_found=False)
                clear_output(wait=True)
                print('The graph has', len(G), 'nodes now.')
            G.add_edge(x, rname)

    # The cap is checked once per pass, so the final node count can exceed 25000
    num_nodes_after = len(G.nodes)
    if num_nodes_after == num_nodes_before or num_nodes_after > 25000:
        break

print("Number of edges:", len(G.edges))
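
The expansion logic can be tried without network access. The sketch below is an illustration, not the notebook's code: FAKE_RELATED is a toy dictionary standing in for sp.artist_related_artists, expand_graph is a hypothetical helper, and the cap is enforced before each new node is added rather than once per pass.

```python
import networkx as nx

# Toy stand-in for the related-artists lookup (made-up data)
FAKE_RELATED = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A'],
    'D': ['B', 'E'],
    'E': ['D'],
}

def expand_graph(G, get_related, max_nodes):
    """Expand every unexpanded node until nothing is pending; never exceed max_nodes."""
    while True:
        pending = [n for n, d in G.nodes(data=True) if not d.get('related_found', False)]
        if not pending:
            break
        for node in pending:
            for rname in get_related(node):
                if rname not in G:
                    if len(G) >= max_nodes:
                        continue  # cap reached: skip unseen artists, still link known ones
                    G.add_node(rname, related_found=False)
                G.add_edge(node, rname)
            G.nodes[node]['related_found'] = True
    return G

def lookup(name):
    return FAKE_RELATED.get(name, [])

capped = nx.Graph()
capped.add_node('A', related_found=False)
expand_graph(capped, lookup, max_nodes=4)

full = nx.Graph()
full.add_node('A', related_found=False)
expand_graph(full, lookup, max_nodes=100)

print(sorted(capped.nodes), capped.number_of_edges())
print(sorted(full.nodes), full.number_of_edges())
```

Each pass marks every pending node as expanded, so the loop ends once no unexpanded nodes remain, whether because the whole neighbourhood was explored or because the cap stopped new nodes from being added.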

In [None]:
plt.figure(figsize=(10, 10))
nx.draw_networkx(G, with_labels=True, node_size=100, font_size=5, alpha=0.7)
plt.axis('off')
plt.show()
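
With tens of thousands of nodes, a labelled drawing of the full graph is hard to read. One possible approach is to draw only the best-connected artists. The sketch below assumes a hypothetical helper, top_degree_subgraph, and demonstrates it on a toy graph rather than on the Spotify data.

```python
import networkx as nx

def top_degree_subgraph(G, k):
    """Return a copy of the subgraph induced by the k highest-degree nodes of G."""
    ranked = sorted(G.degree, key=lambda pair: pair[1], reverse=True)
    top_nodes = [node for node, _ in ranked[:k]]
    return G.subgraph(top_nodes).copy()

# Toy graph: a hub with five leaves, connected to a smaller hub with two leaves
G_demo = nx.Graph()
G_demo.add_edges_from(('hub', f'leaf{i}') for i in range(5))
G_demo.add_edges_from([('hub', 'mini'), ('mini', 'x'), ('mini', 'y')])

H = top_degree_subgraph(G_demo, 2)
print(sorted(H.nodes), H.number_of_edges())
```

The returned subgraph can be passed to nx.draw_networkx in place of G; the value of k is a matter of taste and should be tuned to the figure size.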