## Identification of Established and Emerging Artists

Using an unofficial Billboard API by GitHub user [guoguo12](https://github.com/guoguo12), I pulled the names of every artist featured on the Artist 100 and Emerging 50 charts dating back to September 2017 (when Billboard started populating the Emerging 50 chart). As of May 2019, after processing the data (which included removing artists who appear on both charts from the emerging artists list and deleting duplicates from both lists), I was able to identify 576 emerging artists and 626 established artists. These numbers will vary depending on the week the code is run.

In [None]:
import datetime
import billboard
import pickle
import pandas as pd

In [None]:
#set date today
date = datetime.datetime.today()

#create a list of dates, one per week going back 620 days, that can be fed to the API in order to pull names of charting artists per week
date_list = [date - datetime.timedelta(days=x) for x in range(0, 620, 7)]

#format the dates so that they are legible to the billboard API
date_list = [i.strftime('%Y-%m-%d') for i in date_list] 
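
As a quick check on the arithmetic above, the sketch below wraps the same date logic in a small function and runs it against a fixed end date. The function name `weekly_dates` and the end date of 18 May 2019 are assumptions chosen only for illustration; the notebook itself uses the day on which it is run.

```python
import datetime

def weekly_dates(end, span_days=620):
    """Return 'YYYY-MM-DD' strings stepping back one week at a time from `end`."""
    return [(end - datetime.timedelta(days=offset)).strftime("%Y-%m-%d")
            for offset in range(0, span_days, 7)]

# A fixed end date (hypothetical, for reproducibility) instead of today's date.
example_dates = weekly_dates(datetime.date(2019, 5, 18))

print(len(example_dates))    # 89 weekly dates
print(example_dates[0])      # 2019-05-18
print(example_dates[-1])     # 2017-09-09
```

Stepping 7 days at a time through 620 days yields 89 dates spanning 88 weeks, which reaches back to early September 2017 when run in mid-May 2019.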

### Pull List of Emerging Artists

The code below may take a few minutes to run.

In [None]:
emerging_artists = []
for chart_date in date_list:
    #pull the Emerging Artists chart for the given week
    chart_data = billboard.ChartData("emerging-artists", date=chart_date)
    for entry in chart_data.entries:
        emerging_artists.append(entry.artist)

#remove duplicates from the list
emerging_artists = list(set(emerging_artists))

In [None]:
#check first 5 artists in list
emerging_artists[:5]

In [None]:
#check length of list
len(emerging_artists)

### Pull List of Established Artists

The code below may take a few minutes to run.

In [None]:
established_artists = []
for chart_date in date_list:
    #pull the Artist 100 chart for the given week
    chart_data = billboard.ChartData("artist-100", date=chart_date)
    for entry in chart_data.entries:
        established_artists.append(entry.artist)

#remove duplicates from the list
established_artists = list(set(established_artists))

In [None]:
#check first 5 artists in list
established_artists[:5]

In [None]:
#check length of list
len(established_artists)

### Remove Overlap

Artists who were once considered emerging may have moved over to the Artist 100 chart during the ~88-week period covered by the code above, so some artists inevitably appear in both lists. The code below identifies the overlap between the lists and removes the overlapping artists from the **emerging artists** list.
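
To make the overlap step concrete, here is a minimal sketch on toy data. The helper `split_overlap` and the artist names are placeholders invented for illustration, not part of the notebook or of any library.

```python
def split_overlap(emerging, established):
    """Split `emerging` into names that also appear in `established` and names that do not."""
    established_set = set(established)  # set membership checks are fast
    overlap = [name for name in emerging if name in established_set]
    remaining = [name for name in emerging if name not in established_set]
    return overlap, remaining

# Placeholder names, not real chart data.
toy_emerging = ["Artist A", "Artist B", "Artist C"]
toy_established = ["Artist B", "Artist D"]

toy_overlap, toy_remaining = split_overlap(toy_emerging, toy_established)
print(toy_overlap)     # ['Artist B']
print(toy_remaining)   # ['Artist A', 'Artist C']
```

Unlike a plain set difference, this version keeps the remaining emerging artists in their original order.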

In [None]:
overlap = list(set(established_artists).intersection(emerging_artists))

In [None]:
#these are the artists who have crossed over in the last year and a half or so. 
overlap

In [None]:
#removing the now-established artists from the emerging artists lists.
for i in overlap:
    emerging_artists.remove(i)

In [None]:
#new length of the emerging artists list
len(emerging_artists)

### Pickle the Lists

In [None]:
with open('../established_artists.pkl', 'wb') as f:
    pickle.dump(established_artists, f)

In [None]:
with open('../emerging_artists.pkl', 'wb') as f:
    pickle.dump(emerging_artists, f)
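
Because the artist lists change from week to week, reloading the pickled lists is one way to keep later steps working on the same snapshot. The sketch below shows a save and load round trip in a temporary directory; the helper names and the file name `example_artists.pkl` are assumptions for illustration only.

```python
import os
import pickle
import tempfile

def save_list(items, path):
    """Pickle a list to `path`."""
    with open(path, "wb") as f:
        pickle.dump(items, f)

def load_list(path):
    """Load a pickled list from `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example_artists.pkl")
    save_list(["Artist A", "Artist B"], example_path)
    restored = load_list(example_path)

print(restored)   # ['Artist A', 'Artist B']
```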

## Identify Top Songs Per Artist on Spotify


The second stage involves looping through the artists in each list, feeding each name to the Spotify API as a search term, and capturing the first search result in order to identify each artist's Spotify ID. Then, using Spotipy's `artist_top_tracks` method, the top songs are pulled for each artist. The maximum is 10 songs per artist; if an artist has fewer than 10 tracks published on Spotify, all of them are pulled.

Spotify provides links to the track preview audio files, each of which is 30 seconds long. Track previews will be downloaded using the `urllib` library.
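
The ID lookup relies on the shape of the search response, where the first match sits at `results["artists"]["items"][0]["id"]`. The sketch below isolates that step in a helper and runs it against hand-written dictionaries rather than live API calls. The helper name `first_artist_id` and the sample values are assumptions for illustration.

```python
def first_artist_id(search_results):
    """Return the ID of the first artist in a search response, or None if there is no match."""
    items = search_results.get("artists", {}).get("items", [])
    return items[0]["id"] if items else None

# Hand-written stand-ins for search responses (not real Spotify data).
sample_match = {"artists": {"items": [{"id": "abc123", "name": "Artist A"},
                                      {"id": "def456", "name": "Artist A Tribute"}]}}
sample_no_match = {"artists": {"items": []}}

print(first_artist_id(sample_match))      # abc123
print(first_artist_id(sample_no_match))   # None
```

Taking the first result is a heuristic: when several artists share a similar name, the first match may not be the charting artist, so a spot check of the returned names is worthwhile.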

In [None]:
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials

In [None]:
#with no arguments, SpotifyClientCredentials reads the credentials from the
#SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables, which keeps the client secret out of the notebook
client_credentials_manager = SpotifyClientCredentials()
spotify = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

### Get Spotify IDs for Each Artist

In [None]:
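emerging_spotify_id = []
for artist_name in emerging_artists:
    try:
        #take the first search result as the artist's Spotify ID
        results = spotify.search(q=artist_name, type="artist")
        emerging_spotify_id.append(results["artists"]["items"][0]["id"])
    except Exception:
        #"null" marks artists for whom no ID could be retrieved
        emerging_spotify_id.append("null")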

In [None]:
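established_spotify_id = []
for artist_name in established_artists:
    try:
        #take the first search result as the artist's Spotify ID
        results = spotify.search(q=artist_name, type="artist")
        established_spotify_id.append(results["artists"]["items"][0]["id"])
    except Exception:
        #"null" marks artists for whom no ID could be retrieved
        established_spotify_id.append("null")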

### Create DataFrames that Contain the Artist-ID Combinations for Future Reference

In [None]:
df_established_artists = pd.DataFrame({"artist":established_artists, "id":established_spotify_id})
df_emerging_artists = pd.DataFrame({"artist":emerging_artists, "id":emerging_spotify_id})

In [None]:
#remove nulls
emerging_spotify_id = [x for x in emerging_spotify_id if x != "null"]
established_spotify_id = [x for x in established_spotify_id if x != "null"]

### Create DataFrames that Contain Song Information

In [None]:
em_artist_track = []
em_artist_name = []
em_artist_track_duration = []
em_artist_track_popularity = []
em_artist_track_release_date = []
em_artist_track_preview_url = []
em_artist_track_id = []
em_artist_id = []

for artist_id in emerging_spotify_id:
    data = spotify.artist_top_tracks(artist_id)
    #read every field from the current track, including the release date and preview URL
    for track in data["tracks"]:
        em_artist_track.append(track["name"])
        em_artist_name.append(track["artists"][0]["name"])
        em_artist_track_duration.append(track["duration_ms"])
        em_artist_track_popularity.append(track["popularity"])
        em_artist_track_release_date.append(track["album"]["release_date"])
        em_artist_track_preview_url.append(track["preview_url"])
        em_artist_track_id.append(track["id"])
        em_artist_id.append(track["artists"][0]["id"])
        
emerging_artists_songs = pd.DataFrame({"track_name": em_artist_track, 
                                       "artist": em_artist_name, 
                                       "duration": em_artist_track_duration, 
                                       "popularity": em_artist_track_popularity, 
                                       "release_date": em_artist_track_release_date, 
                                       "mp3": em_artist_track_preview_url, 
                                       "track_id": em_artist_track_id, 
                                       "artist_id": em_artist_id})

emerging_artists_songs.head()

In [None]:
es_artist_track = []
es_artist_name = []
es_artist_track_duration = []
es_artist_track_popularity = []
es_artist_track_release_date = []
es_artist_track_preview_url = []
es_artist_track_id = []
es_artist_id = []

for artist_id in established_spotify_id:
    data = spotify.artist_top_tracks(artist_id)
    #read every field from the current track, including the release date and preview URL
    for track in data["tracks"]:
        es_artist_track.append(track["name"])
        es_artist_name.append(track["artists"][0]["name"])
        es_artist_track_duration.append(track["duration_ms"])
        es_artist_track_popularity.append(track["popularity"])
        es_artist_track_release_date.append(track["album"]["release_date"])
        es_artist_track_preview_url.append(track["preview_url"])
        es_artist_track_id.append(track["id"])
        es_artist_id.append(track["artists"][0]["id"])
        
established_artists_songs = pd.DataFrame({"track_name": es_artist_track, 
                                          "artist": es_artist_name, 
                                          "duration": es_artist_track_duration, 
                                          "popularity": es_artist_track_popularity, 
                                          "release_date": es_artist_track_release_date, 
                                          "mp3": es_artist_track_preview_url, 
                                          "track_id": es_artist_track_id, 
                                          "artist_id": es_artist_id})

established_artists_songs.head()
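
Since the two cells above differ only in their variable prefixes, the per-track extraction could be shared. The sketch below is one possible refactor, not the notebook's method: the hypothetical helper `tracks_to_rows` turns a top-tracks response into a list of row dictionaries, using the same keys as the DataFrames above. It runs here on a hand-written response, so no API call is needed.

```python
def tracks_to_rows(top_tracks):
    """Flatten a top-tracks response into one dictionary per track."""
    rows = []
    for track in top_tracks["tracks"]:
        rows.append({
            "track_name": track["name"],
            "artist": track["artists"][0]["name"],
            "duration": track["duration_ms"],
            "popularity": track["popularity"],
            "release_date": track["album"]["release_date"],
            "mp3": track["preview_url"],
            "track_id": track["id"],
            "artist_id": track["artists"][0]["id"],
        })
    return rows

# Hand-written stand-in for a response (placeholder values, not real Spotify data).
sample_response = {"tracks": [
    {"name": "Song One", "artists": [{"name": "Artist A", "id": "a1"}],
     "duration_ms": 180000, "popularity": 55,
     "album": {"release_date": "2018-03-02"},
     "preview_url": "https://example.com/one.mp3", "id": "t1"},
    {"name": "Song Two", "artists": [{"name": "Artist A", "id": "a1"}],
     "duration_ms": 200000, "popularity": 40,
     "album": {"release_date": "2019-01-11"},
     "preview_url": None, "id": "t2"},
]}

sample_rows = tracks_to_rows(sample_response)
print(len(sample_rows))                 # 2
print(sample_rows[1]["release_date"])   # 2019-01-11
```

A list of dictionaries like `sample_rows` can be passed directly to `pd.DataFrame`, so rows from every artist could be collected in a single list and converted once.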

### Download the Songs

I created separate folders for songs from the two categories of artists. The code below loops through each category's DataFrame and downloads the songs into the appropriate folder.

In [None]:
import urllib.request

In [None]:
for index, row in established_artists_songs.iterrows():
    url = row['mp3']
    name = row['track_id']
    #some tracks have no preview available; skip them
    if pd.isna(url):
        continue
    urllib.request.urlretrieve(url, '../Downloads/Established_Artists_Songs/' + name + '.mp3')

In [None]:
for index, row in emerging_artists_songs.iterrows():
    url = row['mp3']
    name = row['track_id']
    #some tracks have no preview available; skip them
    if pd.isna(url):
        continue
    urllib.request.urlretrieve(url, '../Downloads/Emerging_Artists_Songs/' + name + '.mp3')
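
Spotify does not provide a preview for every track, in which case `preview_url` is null. Before starting a long download it can help to list which files would be fetched. The sketch below builds such a plan from plain dictionaries without touching the network. The helper `build_download_plan`, the folder name, and the sample rows are assumptions for illustration.

```python
import os

def build_download_plan(rows, folder):
    """Return (url, destination_path) pairs for rows that have a preview URL."""
    plan = []
    for row in rows:
        if row["mp3"]:  # skips None and empty strings
            plan.append((row["mp3"], os.path.join(folder, row["track_id"] + ".mp3")))
    return plan

# Placeholder rows using the same column names as the song DataFrames.
example_rows = [
    {"track_id": "t1", "mp3": "https://example.com/one.mp3"},
    {"track_id": "t2", "mp3": None},
]

example_plan = build_download_plan(example_rows, "Example_Songs")
print(len(example_plan))   # 1
```

Each pair in the plan can then be passed to `urllib.request.urlretrieve(url, path)`.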