# Getting the Top 50 Artists to Determine Genre 

The first step is to log in to Spotify, which requires an authorization token.
The `user-top-read` scope grants read access to the user's top artists and tracks.

In [5]:
# Authorize access to the user's top 50 artists
import sys
import spotipy
import spotipy.util as util

if len(sys.argv) > 1:
    username = sys.argv[1]
else:
    print("Usage: %s username" % (sys.argv[0],))
    sys.exit()

scope = 'user-top-read'
token = util.prompt_for_user_token(username, scope, client_id='c3bf1281ee3348aca441cc5e6c369f49',client_secret='2de133361d8246f29f458d12d8b5127c',redirect_uri='http://example.com/callback/')
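Hard-coding the client secret in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the credentials have been exported as environment variables beforehand: the variable names `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET`, and `SPOTIPY_REDIRECT_URI` are the ones Spotipy itself looks up, while the helper `load_spotify_credentials` is a hypothetical name introduced here for illustration.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return the Spotify app credentials as keyword arguments.

    Hypothetical helper (not part of Spotipy): raises RuntimeError
    if any of the expected environment variables is missing.
    """
    names = {
        "client_id": "SPOTIPY_CLIENT_ID",
        "client_secret": "SPOTIPY_CLIENT_SECRET",
        "redirect_uri": "SPOTIPY_REDIRECT_URI",
    }
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

With this in place, the call could be written as `util.prompt_for_user_token(username, scope, **load_spotify_credentials())`, keeping the secret out of the notebook itself.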

## Obtaining Data
For each time span (short, medium, and long term) we keep one list of artist names and one list of genres.
Spotipy is used to obtain the current top artists for the signed-in user. Each request is limited to 50 artists, as that is the most Spotify returns at once. Each artist comes with a list of genres associated with their music, which may be empty. Those genres will be used to determine the user's taste.

In [6]:
# empty artist lists, one pair per time range
short_artist_genre = []
short_artist_name = []

medium_artist_genre = []
medium_artist_name = []

long_artist_genre = []
long_artist_name = []

if token:
    sp = spotipy.Spotify(auth=token)
    sp.trace = False
    ranges = ['short_term', 'medium_term', 'long_term']
    # time_range is used as the loop variable to avoid shadowing the built-in range()
    for time_range in ranges:
        results = sp.current_user_top_artists(time_range=time_range, limit=50)
        for item in results['items']:
            if time_range == 'short_term':
                short_artist_genre.append(item['genres'])
                short_artist_name.append(item['name'])
            elif time_range == 'medium_term':
                medium_artist_genre.append(item['genres'])
                medium_artist_name.append(item['name'])
            else:
                long_artist_genre.append(item['genres'])
                long_artist_name.append(item['name'])
else:
    print("Can't get token for", username)
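The six parallel lists above can also be grouped in a single dictionary keyed by time range, which removes the `if`/`elif` chain. This is only a sketch: `collect_top_artists` and `fake_fetch` are hypothetical names, and `fake_fetch` returns made-up artists in the same shape as the response used above so that the example runs without network access.

```python
def collect_top_artists(fetch, ranges=("short_term", "medium_term", "long_term")):
    """Group artist names and genre lists by time range.

    `fetch` stands in for `sp.current_user_top_artists`: it accepts
    `time_range` and `limit` keyword arguments and returns a dict
    with an "items" list.
    """
    collected = {}
    for time_range in ranges:
        items = fetch(time_range=time_range, limit=50)["items"]
        collected[time_range] = {
            "names": [item["name"] for item in items],
            "genres": [item["genres"] for item in items],
        }
    return collected


# Stub with made-up artists, used here instead of a real Spotify call.
def fake_fetch(time_range, limit):
    return {"items": [
        {"name": "Artist A", "genres": ["pop", "art pop"]},
        {"name": "Artist B", "genres": ["rap"]},
    ]}


top = collect_top_artists(fake_fetch)
print(top["short_term"]["names"])  # ['Artist A', 'Artist B']
```

In the notebook, `fetch` would be `sp.current_user_top_artists`, and `top["short_term"]["genres"]` would play the role of `short_artist_genre`.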

Each artist contributes its own list of genres, so the genre lists are nested. We now flatten them into one-dimensional lists so that we have a single dataset of all the genres for each time span.

In [7]:
genre_list_short = []
genre_list_long = []
for genres in short_artist_genre:
    for genre in genres:
        genre_list_short.append(genre)
for genres in long_artist_genre:
    for genre in genres:
        genre_list_long.append(genre)
print(genre_list_short)

['canadian hip hop', 'canadian pop', 'hip hop', 'pop rap', 'rap', 'toronto rap', 'boy band', 'hip hop', 'pop rap', 'rap', 'art pop', 'pop', 'pop', 'pop rap', 'chicago rap', 'pop rap', 'rap', 'east coast hip hop', 'gangster rap', 'hip hop', 'pop rap', 'rap', 'southern hip hop', 'canadian contemporary r&b', 'canadian pop', 'pop', 'k-pop', 'k-pop girl group', 'escape room', 'conscious hip hop', 'hip hop', 'pop rap', 'rap', 'west coast rap', 'escape room', 'hip hop', 'pop rap', 'rap', 'trap', 'underground hip hop', 'alternative metal', 'alternative rock', 'el paso indie', 'garage rock', 'modern rock', 'rock', 'canadian contemporary r&b', 'pop', 'alternative r&b', 'art pop', 'escape room', 'indie r&b', 'indie soul', 'la pop', 'electropop', 'pop', 'hip hop', 'pop rap', 'rap', 'underground hip hop', 'art pop', 'dance pop', 'electropop', 'metropopolis', 'nz pop', 'pop', 'cali rap', 'melodic rap', 'pop rap', 'trap', 'underground hip hop', 'vapor trap', 'pop', 'uk pop', 'k-pop', 'k-pop girl grou
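The nested loops used for flattening can also be expressed with `itertools.chain.from_iterable` from the standard library. A small sketch with made-up input in the same shape as `short_artist_genre`:

```python
from itertools import chain

# Made-up nested input: one genre list per artist (an artist may have none).
nested = [["hip hop", "rap"], [], ["pop", "rap"]]

# chain.from_iterable concatenates the inner lists in order.
flat = list(chain.from_iterable(nested))
print(flat)  # ['hip hop', 'rap', 'pop', 'rap']
```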

We can then convert each list to a set to obtain the distinct genres, and convert it back to a list. Note that a set does not preserve the original order.

In [8]:
list_short = list(set(genre_list_short)) 
list_long = list(set(genre_list_long))
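Because `set` ordering is arbitrary, the distinct genres may come out in a different order from run to run. If a stable order matters, one option is `dict.fromkeys`, which keeps the first occurrence of each value in insertion order. A sketch with made-up data:

```python
genres = ["rap", "pop", "rap", "hip hop", "pop"]

# dict keys are unique and keep insertion order, so duplicates are
# dropped while the order of first appearance is preserved.
distinct = list(dict.fromkeys(genres))
print(distinct)  # ['rap', 'pop', 'hip hop']
```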

## Find Out How Often Each Genre Occurred
Now we can loop over each value in the distinct genres list and count how many times it occurs in the flattened list of all
genres. This gives us a list of counts aligned with each list of distinct genres.

In [14]:
counts_short = []
counts_long = []
percentage_short = []
percentage_long = []

for gen in list_short:
    count = genre_list_short.count(gen)
    counts_short.append(count)
for gen in list_long:
    count = genre_list_long.count(gen)
    counts_long.append(count)

# Share of the top artists tagged with each genre (a fraction between 0 and 1)
for count in counts_short:
    percentage_short.append(count/len(short_artist_genre))
for count in counts_long:
    percentage_long.append(count/len(long_artist_genre))
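Calling `list.count` once per distinct genre rescans the whole list each time. The same counts can be obtained in a single pass with `collections.Counter`. The sketch below uses made-up data and, as in the cell above, divides by the number of artists, so each value is the share of artists tagged with that genre.

```python
from collections import Counter

# Made-up input: one genre list per artist.
artist_genres = [["rap", "hip hop"], ["pop"], ["rap", "pop rap"], ["rap"]]

flat = [genre for genres in artist_genres for genre in genres]
counts = Counter(flat)

# Fraction of artists tagged with each genre.
n_artists = len(artist_genres)
shares = {genre: count / n_artists for genre, count in counts.items()}

print(counts.most_common(1))  # [('rap', 3)]
print(shares["rap"])          # 0.75
```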

## Pandas DataFrame
The lists obtained are then used to create a DataFrame for each time span, containing each distinct genre with its count and its share of artists.

In [15]:
import pandas as pd
df_genres_short = pd.DataFrame({'genre':list_short,'count':counts_short,'percentage':percentage_short})
df_genres_long = pd.DataFrame({'genre':list_long,'count':counts_long,'percentage':percentage_long})

df_genres_short.head()

             genre  count  percentage
0           nz pop      1        0.02
1              rap     24        0.48
2  k-pop boy group      1        0.02
3         boom bap      1        0.02
4         boy band      1        0.02


The data looks correct, so we can export each DataFrame to a CSV file.

In [None]:
df_genres_short.to_csv('SpotifyGenres12022019ShortTerm.csv')
df_genres_long.to_csv('SpotifyGenres12022019LongTerm.csv')
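By default `DataFrame.to_csv` also writes the row index as an unnamed first column, which reads back as a column named `Unnamed: 0`; passing `index=False` omits it. To show what the index-free file would contain, here is a standard-library sketch using the `csv` module rather than pandas. The rows are illustrative values, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io

# Illustrative rows of (genre, count, percentage).
rows = [("rap", 24, 0.48), ("nz pop", 1, 0.02)]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["genre", "count", "percentage"])  # header, no index column
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```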