# Spotify Song Data Scraping

### Acknowledgements
Based on the following tutorials:

- Max Hilsdorf, "How to Create Large Music Datasets Using Spotipy", *Towards Data Science*, 25 April 2020: https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6
- Max Tingle, "Getting Started with Spotify’s API & Spotipy", *Towards Data Science*, 3 October 2019: https://medium.com/@maxtingle/getting-started-with-spotifys-api-spotipy-197c3dc6353b

Uses the following dataset to aid in scraping:

- Elena Call, "Spotify Artists", *Kaggle*: https://www.kaggle.com/ehcall/spotify-artists

## Setup

### Import packages
#### Install Spotipy if necessary (`pip install spotipy`)

import spotipy
import spotipy.util as util
import pandas as pd

### Spotify API access

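import os
from spotipy.oauth2 import SpotifyClientCredentials
# Read app credentials from environment variables rather than hard-coding secrets in the notebook
CLIENT_ID = os.environ["SPOTIPY_CLIENT_ID"]
CLIENT_SECRET = os.environ["SPOTIPY_CLIENT_SECRET"]
# The auth manager requests (and refreshes) the client-credentials token itself,
# so there is no need to call token.get_access_token() manually
auth_manager = SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
sp = spotipy.Spotify(auth_manager=auth_manager)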

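Once the client is authorised, each artist ID can be passed to Spotipy's `artist_top_tracks` method. The sketch below is a minimal illustration rather than part of the original notebook: the helper name `top_track_rows`, the selected fields, and the `"US"` market are assumptions. It takes the client as a parameter, so it can be exercised with a stand-in object and without network access.

```python
from typing import Any


def top_track_rows(sp: Any, artist_id: str, country: str = "US") -> list[dict]:
    """Hypothetical helper: flatten an artist's top tracks into one dict per track.

    `sp` is expected to be a spotipy.Spotify client, or any object whose
    artist_top_tracks method returns the same response shape.
    """
    # artist_top_tracks returns a dict whose "tracks" key holds a list of track objects
    response = sp.artist_top_tracks(artist_id, country=country)
    rows = []
    for track in response["tracks"]:
        rows.append(
            {
                "artistID": artist_id,
                "trackID": track["id"],
                "trackName": track["name"],
                # .get() tolerates track objects that omit popularity
                "popularity": track.get("popularity"),
            }
        )
    return rows
```

With the `sp` client from the cell above, `pd.DataFrame(top_track_rows(sp, "39EHxSQAIaWusRqSI9xoyF"))` should give one row per top track for the first artist in the list.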


## Import & Wrangle Artist List dataset
### Import

artist_uris = pd.read_csv("artist-uris.csv")
artist_uris

| Unnamed: 0 | artistName | artistURI |
|---|---|---|
| 0 | 1:43 | spotify:artist:39EHxSQAIaWusRqSI9xoyF |
| 1 | 2:00 AM | spotify:artist:4tN3rZ7cChj4Wns2Wt2Nj6 |
| 2 | 2:15 | spotify:artist:4HsOm6VNKZtGh8W8GhdNu4 |
| 3 | 2:54 | spotify:artist:3LsQKoRgMc8VEkQn66jfAQ |
| 4 | 4:20 | spotify:artist:5KCG0FDMDPzQpxcohGUnyH |
| ... | ... | ... |
| 81318 | 黃曉明 | spotify:artist:53F8atvCmVFVOvnKwZXBd3 |
| 81319 | 黃玠瑋 | spotify:artist:6VI0p0xTjBKKxQN8i8vGpD |
| 81320 | 黃義達 | spotify:artist:7kaq0LysuRSgBZSorlZ7Vj |
| 81321 | 黃顯忠 (Huang Xianzhong) | spotify:artist:7MuuEryyseo5cvReO6gdPF |
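The `Unnamed: 0` column is most likely a row index that was written into the CSV when the file was saved, leaving its header cell empty. Assuming that is the case, passing `index_col=0` to `pd.read_csv` restores it as the index instead of loading it as an extra data column. In this sketch the inline CSV text is a hypothetical stand-in for `artist-uris.csv`, built from the first two rows shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for artist-uris.csv: the first header cell is empty
csv_text = """,artistName,artistURI
0,1:43,spotify:artist:39EHxSQAIaWusRqSI9xoyF
1,2:00 AM,spotify:artist:4tN3rZ7cChj4Wns2Wt2Nj6
"""

# Without index_col, pandas names the header-less first column "Unnamed: 0"
with_extra_col = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, that column becomes the DataFrame index instead
artist_uris = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(artist_uris.columns.tolist())
```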


### Recode Artist URI

# Keep a copy of the full URI before the artistURI column is overwritten
artist_uris['artistURIcall'] = artist_uris['artistURI']

# Strip the literal "spotify:artist:" prefix so only the bare artist ID remains
artist_uris['artistURI'] = artist_uris['artistURIcall'].str.replace('spotify:artist:', '', regex=False)
artist_uris

| Unnamed: 0 | artistName | artistURI | artistURIcall |
|---|---|---|---|
| 0 | 1:43 | 39EHxSQAIaWusRqSI9xoyF | spotify:artist:39EHxSQAIaWusRqSI9xoyF |
| 1 | 2:00 AM | 4tN3rZ7cChj4Wns2Wt2Nj6 | spotify:artist:4tN3rZ7cChj4Wns2Wt2Nj6 |
| 2 | 2:15 | 4HsOm6VNKZtGh8W8GhdNu4 | spotify:artist:4HsOm6VNKZtGh8W8GhdNu4 |
| 3 | 2:54 | 3LsQKoRgMc8VEkQn66jfAQ | spotify:artist:3LsQKoRgMc8VEkQn66jfAQ |
| 4 | 4:20 | 5KCG0FDMDPzQpxcohGUnyH | spotify:artist:5KCG0FDMDPzQpxcohGUnyH |
| ... | ... | ... | ... |
| 81318 | 黃曉明 | 53F8atvCmVFVOvnKwZXBd3 | spotify:artist:53F8atvCmVFVOvnKwZXBd3 |
| 81319 | 黃玠瑋 | 6VI0p0xTjBKKxQN8i8vGpD | spotify:artist:6VI0p0xTjBKKxQN8i8vGpD |
| 81320 | 黃義達 | 7kaq0LysuRSgBZSorlZ7Vj | spotify:artist:7kaq0LysuRSgBZSorlZ7Vj |
| 81321 | 黃顯忠 (Huang Xianzhong) | 7MuuEryyseo5cvReO6gdPF | spotify:artist:7MuuEryyseo5cvReO6gdPF |
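The copy-then-replace steps can also be collapsed into a single expression. As a sketch (the `artistID` column name is hypothetical, and every URI is assumed to have the `spotify:artist:<id>` form seen in the output above), splitting each URI once on its last colon keeps only the trailing ID and leaves `artistURI` untouched, so no temporary column needs to be created and later dropped.

```python
import pandas as pd

# Two rows taken from the artist list output above
artist_uris = pd.DataFrame(
    {
        "artistName": ["1:43", "2:00 AM"],
        "artistURI": [
            "spotify:artist:39EHxSQAIaWusRqSI9xoyF",
            "spotify:artist:4tN3rZ7cChj4Wns2Wt2Nj6",
        ],
    }
)

# Split once from the right on ":" and keep the last piece, i.e. the bare artist ID
artist_uris["artistID"] = artist_uris["artistURI"].str.rsplit(":", n=1).str[-1]
print(artist_uris)
```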


### Drop original column

# Drop the full-URI copy now that the bare ID has been extracted
artistURIs = artist_uris.drop('artistURIcall', axis=1)
artistURIs

| Unnamed: 0 | artistName | artistURI |
|---|---|---|
| 0 | 1:43 | 39EHxSQAIaWusRqSI9xoyF |
| 1 | 2:00 AM | 4tN3rZ7cChj4Wns2Wt2Nj6 |
| 2 | 2:15 | 4HsOm6VNKZtGh8W8GhdNu4 |
| 3 | 2:54 | 3LsQKoRgMc8VEkQn66jfAQ |
| 4 | 4:20 | 5KCG0FDMDPzQpxcohGUnyH |
| ... | ... | ... |
| 81318 | 黃曉明 | 53F8atvCmVFVOvnKwZXBd3 |
| 81319 | 黃玠瑋 | 6VI0p0xTjBKKxQN8i8vGpD |
| 81320 | 黃義達 | 7kaq0LysuRSgBZSorlZ7Vj |
| 81321 | 黃顯忠 (Huang Xianzhong) | 7MuuEryyseo5cvReO6gdPF |


#### Drop rows with #ERROR!

# Keep only rows where neither the artist name nor the ID is the '#ERROR!' placeholder
artistURIs = artistURIs[(artistURIs.artistName != '#ERROR!') & (artistURIs.artistURI != '#ERROR!')]
artistURIs

| Unnamed: 0 | artistName | artistURI |
|---|---|---|
| 0 | 1:43 | 39EHxSQAIaWusRqSI9xoyF |
| 1 | 2:00 AM | 4tN3rZ7cChj4Wns2Wt2Nj6 |
| 2 | 2:15 | 4HsOm6VNKZtGh8W8GhdNu4 |
| 3 | 2:54 | 3LsQKoRgMc8VEkQn66jfAQ |
| 4 | 4:20 | 5KCG0FDMDPzQpxcohGUnyH |
| ... | ... | ... |
| 81317 | 黃小琥 | 6KCusBln9NTESgcuI0DlUz |
| 81318 | 黃曉明 | 53F8atvCmVFVOvnKwZXBd3 |
| 81319 | 黃玠瑋 | 6VI0p0xTjBKKxQN8i8vGpD |
| 81320 | 黃義達 | 7kaq0LysuRSgBZSorlZ7Vj |
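The mask above checks the two columns by name. An alternative sketch, which is not taken from the original notebook, flags `#ERROR!` in any column with `DataFrame.isin` and keeps only rows where no cell matches. It also records the row count before filtering so the number of dropped rows can be checked. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample: the second and third rows contain the "#ERROR!" placeholder
artistURIs = pd.DataFrame(
    {
        "artistName": ["1:43", "#ERROR!", "2:15"],
        "artistURI": ["39EHxSQAIaWusRqSI9xoyF", "4tN3rZ7cChj4Wns2Wt2Nj6", "#ERROR!"],
    }
)

rows_before = len(artistURIs)

# isin marks every matching cell; any(axis=1) flags rows with at least one match
has_error = artistURIs.isin(["#ERROR!"]).any(axis=1)
artistURIs = artistURIs[~has_error]

print(f"Dropped {rows_before - len(artistURIs)} rows containing '#ERROR!'")
```

The original index labels are preserved after filtering, which is why gaps can appear in the index column; `reset_index(drop=True)` renumbers the rows if a contiguous index is preferred.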
