# Spotipy

Spotipy is a dedicated Python library for interacting with Spotify's Web API.

It was designed to make integrating Spotify with other applications easier, but we can also use it for data analysis!

You will need to set up a free Spotify account to generate the API credentials (a client ID and a client secret) needed to access the Spotify API.

Follow the steps <a href="https://developer.spotify.com/documentation/web-api">here</a> to generate your credentials.

<a href="https://spotipy.readthedocs.io/en/2.24.0/">SpotiPy Documentation</a>


In [None]:
# For this to work, you need to have the spotipy library installed.
! pip install spotipy

## Importing and Authorising
To run this you'll need your client ID and your client secret, both of which can be found in your Spotify developer dashboard.

Either paste your codes as strings directly into the variables below, or make an api_keys.py script whose variable names match the imports.

In [None]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from api_keys import SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET

cid = SPOTIPY_CLIENT_ID  # or paste your client ID here as a string
secret = SPOTIPY_CLIENT_SECRET  # or paste your client secret here as a string

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)
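
If you would rather not keep your codes in a file at all, one option is to read them from environment variables. The sketch below assumes you have set `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` in your environment (these are the names Spotipy itself looks for). The helper name `load_spotify_credentials` is made up for this example and is not part of Spotipy.

```python
import os


def load_spotify_credentials(environ=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    `load_spotify_credentials` is a hypothetical helper for this notebook;
    only the two variable names come from Spotipy.
    """
    try:
        return environ["SPOTIPY_CLIENT_ID"], environ["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Usage (uncomment once the variables are set in your environment):
# cid, secret = load_spotify_credentials()
```

The values returned can be passed to `SpotifyClientCredentials` exactly as in the cell above.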

## Loading Data
This code loops through the top 1000 songs from the year 2023, 50 results at a time, and stores the data in separate lists.

In [None]:
artist_name = []
track_name = []
popularity = []
track_id = []
images = []

for i in range(0, 1000, 50):
    # Hover over the search function to see its parameters.
    # Notice that we are searching for tracks from the year 2023.
    track_results = sp.search(q='year:2023', type='track', limit=50, offset=i)
    for t in track_results['tracks']['items']:
        artist_name.append(t['artists'][0]['name'])
        track_name.append(t['name'])
        track_id.append(t['id'])
        popularity.append(t['popularity'])
        images.append(t['album']['images'][0]['url'])
        
# Let's store the data in a dictionary
popular_tracks_dict = {'artist_name' : artist_name, 'track_name' : track_name, 'track_id' : track_id, 'popularity' : popularity, 'images' : images}
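
The loop above assumes every track has at least one album image. As a rough sketch of a more defensive version, the helper below pulls the same five fields out of a list of track objects and falls back to `None` when there is no artwork. The name `extract_track_rows` and the sample data are invented for illustration; the sample only mimics the fields used above, so the sketch runs without calling the API.

```python
def extract_track_rows(items):
    """Turn a list of Spotify track objects into flat dictionaries."""
    rows = []
    for t in items:
        album_images = t.get("album", {}).get("images", [])
        rows.append({
            "artist_name": t["artists"][0]["name"],
            "track_name": t["name"],
            "track_id": t["id"],
            "popularity": t["popularity"],
            # Fall back to None when an album has no artwork
            "images": album_images[0]["url"] if album_images else None,
        })
    return rows


# Hypothetical sample shaped like track_results['tracks']['items']
sample_items = [
    {"name": "Song A", "id": "id_a", "popularity": 90,
     "artists": [{"name": "Artist X"}],
     "album": {"images": [{"url": "https://example.com/a.jpg"}]}},
    {"name": "Song B", "id": "id_b", "popularity": 75,
     "artists": [{"name": "Artist Y"}],
     "album": {"images": []}},
]

rows = extract_track_rows(sample_items)
print(rows[1]["images"])  # the second sample track has no artwork
```

A list of dictionaries like `rows` can also be passed straight to `pd.DataFrame(rows)`.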

Let's print the data we've collected from the Spotify API.

In [None]:
print(popular_tracks_dict) 
# As we can see, there is a lot of data here. 
# It's difficult to analyse it in this format. 
# Let's convert it to a pandas dataframe...

## Loading Data into a DataFrame for Analysis
With the lists we created by calling the Spotify API, we can now generate a dataframe using pandas.

Here we create our dataframe from the lists we built in the cell above. If all 1000 tracks were returned, it has a shape of (1000, 5), meaning 1000 rows and 5 columns.

Then we sort the dataframe by popularity.

In [None]:
import pandas as pd
track_dataframe = pd.DataFrame({'artist_name' : artist_name, 'track_name' : track_name, 'track_id' : track_id, 'popularity' : popularity, 'images' : images})
print(track_dataframe.shape) # this should print (1000, 5) if you have 1000 rows
track_dataframe.head() # this will show the first 5 rows of the dataframe

In [None]:
# Let's sort the dataframe by popularity with the most popular songs at the top
df_sorted = track_dataframe.sort_values(by='popularity', ascending=False) # sort the dataframe by popularity in descending order
df_sorted_indexed = df_sorted.reset_index(drop=True) # this will reset the index of the dataframe and remove the old index
df_sorted_indexed.head(20)    # this will show the first 20 rows of the dataframe

## Quick example of what we can do with it...

We already have the most popular tracks and have successfully organised the data in descending order, but some artists have many popular songs.

You could argue that the most successful artists would have more than one track in Spotify's top 1000. So let's find which artist was the most successful on Spotify by number of tracks.

**Remember:** This all uses pandas and matplotlib, which we have been using in previous classes!

In [None]:
artist_counts = track_dataframe['artist_name'].value_counts() # this will count the number of times each artist appears in the dataframe
duplicated_artists = artist_counts[artist_counts > 1]  # this will keep only the artists that appear more than once
#duplicated_artists
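
To see what these two lines do without the full dataset, here is a small sketch on made-up data. The dataframe `toy` and its artist names are invented for this example.

```python
import pandas as pd

# Made-up data: artist "A" appears three times, "B" twice, "C" once
toy = pd.DataFrame({"artist_name": ["A", "B", "A", "C", "A", "B"]})

# value_counts returns a Series indexed by artist, sorted by count (highest first)
counts = toy["artist_name"].value_counts()

# A boolean mask keeps only the artists with more than one track
repeat_artists = counts[counts > 1]
print(repeat_artists)
```

Because `value_counts` sorts from the highest count down by default, calling `.head(n)` on the result gives the `n` artists with the most tracks.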

In [None]:
duplicated_artists.describe() # Before we plot the data, let's look at its summary statistics.

# We can see that there are 163 artists that appear more than once.
# The mean number of tracks for these artists is 4.3.
# One artist has 36 tracks! Bets on who that is? 🪡 🕊️

In [None]:
most_successful_artists = duplicated_artists.head(10) # this will get the top 10 most successful artists by number of songs in top 1000 in 2023

import matplotlib.pyplot as plt

# Plotting the artist counts
plt.figure(figsize=(12, 6))
plt.bar(most_successful_artists.index, most_successful_artists.values)
plt.xlabel('Artist')
plt.ylabel('Count')
plt.title('Number of Songs by Artist')
plt.xticks(rotation=90)
plt.show()

# Can you guess which artist has the most songs in the top 1000 in 2023?

## More data we can collect

Spotify has loads of data points that it uses to fuel its algorithm and select songs and playlists for you, based on the data it has about you...

We won't get into the ethics of this now, but let's try to view some of this data.

In [None]:
# Let's take the most popular track that we found earlier and get the audio features for it
most_popular_track = df_sorted_indexed.iloc[0]
most_popular_track 

Now let's use Spotipy's audio_features function to view some of the data Spotify has synthesised from analysing the track...

In [None]:
# get the track id of the most popular track
most_popular_track_id = most_popular_track['track_id'] 
# get the audio features of the most popular track (returned as a list of dictionaries)
track_audio_features = sp.audio_features(most_popular_track_id)
track_audio_features
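
The result is a list with one dictionary per track, and each dictionary carries many fields. As a rough sketch, the hypothetical helper below keeps just a few of them and skips any entry that comes back as `None`, which I assume can happen when an ID has no features. The name `summarise_audio_features` and the sample values are invented so the example runs without the API.

```python
def summarise_audio_features(features,
                             keys=("danceability", "energy", "valence", "tempo")):
    """Keep the track id plus a few selected audio features per track."""
    summaries = []
    for f in features:
        if f is None:
            # Assumption: an entry may be None when no features are available
            continue
        summaries.append({"id": f["id"], **{k: f[k] for k in keys}})
    return summaries


# Made-up sample shaped like the list returned by sp.audio_features(...)
sample_features = [
    {"id": "id_a", "danceability": 0.7, "energy": 0.8,
     "valence": 0.5, "tempo": 120.0, "key": 5},
    None,
]

summary = summarise_audio_features(sample_features)
print(summary)
```

With real data you could call `summarise_audio_features(track_audio_features)` and pass the result to `pd.DataFrame` for further analysis.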