# Tour Locations of Popular Genres
Jake Gluck, Nhien Theresa Phan

## Introduction
Do you have a favorite music artist or genre? Have they performed in your city? Top music charts tell us what people listen to, but they don't show where artists of a particular genre usually tour. This tutorial looks into the geographic tour patterns of different genres. First, we demonstrate how to get top artists and genres from Spotify listening data. Next, we use data from setlist.fm to map these artists' tour locations. Then, we perform exploratory analysis and visualization to come up with hypotheses about the data. Finally, we compare the tour data with the geographic distribution of actual listeners.

## Python dependencies

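You will need Python 3 and the following libraries. `itertools`, `json`, and `re` ship with Python; `numpy`, `pandas`, and `requests` must be installed separately: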

- itertools
- json
- numpy
- pandas
- re
- requests

In [2]:
import pandas as pd
import requests as rq
import json
import re

## Accessing the Spotify API

The Spotify Web API can be used to fetch the genres and other data associated with music artists. Its documentation can be found [here](https://developer.spotify.com/web-api/).

Accessing the API requires an OAuth token. Getting this token requires a Spotify account, which you can [create for free](https://www.spotify.com/signup/) if you do not already have one. Then, log in to [Spotify Developer](https://beta.developer.spotify.com/dashboard/) and click "Create an App" in the dashboard. Follow the instructions to create an application. The title and description can be anything you choose.

You can then generate an OAuth token [here](https://developer.spotify.com/web-api/console/get-search-item/). This token is valid for one hour, but you can generate a new token on the same page once the current token expires.

In [8]:
oauth_token = 'BQDmoQpJDw1lG6EZQ-enFmesw-sW4P6zoxk59Mh0LL2B08DL8ldQNnU3cqxoYvR8KGidjl4OXG3bGxb69pgCLDMJcEGHtOV9Jj_PvBBwSwsGvib4_2CRX9tXMpVTQm9eo05ahys49Lq3ljKYFw'
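
Pasting a token directly into a notebook makes it easy to share by accident. As a minimal sketch, the token could instead be read from an environment variable. The variable name `SPOTIFY_OAUTH_TOKEN` and the helper `load_token` are hypothetical choices for this tutorial, not something Spotify defines.

```python
import os

def load_token(var_name='SPOTIFY_OAUTH_TOKEN'):
    """Return the OAuth token stored in an environment variable.

    The default variable name is a hypothetical choice for this tutorial.
    """
    token = os.environ.get(var_name, '').strip()
    if not token:
        raise RuntimeError('Set ' + var_name + ' to a valid Spotify OAuth token')
    return token
```

With the variable set in your shell, `oauth_token = load_token()` would take the place of the hard-coded string.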

## Getting the Spotify data
We want to get the genres of the artists behind the most-streamed songs in the United States. Go to [spotifycharts.com](https://spotifycharts.com/) and use the filter options to get a weekly chart of the top 200 songs in the United States. We picked a week in early October to avoid lists dominated by holiday and seasonal music. Download the `.csv` and use `pandas` to read the file into a dataframe.

In [10]:
df = pd.read_csv('spotify.csv')
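
An artist with several charting songs appears on several rows, so the later steps work with one row per artist. The sketch below illustrates this on made-up rows. Only the `Artist` column name comes from this tutorial; the other column names and all values are assumptions for illustration.

```python
import io
import pandas as pd

# Made-up rows standing in for the downloaded chart; only the 'Artist'
# column name comes from the tutorial, everything else is assumed.
sample_csv = io.StringIO(
    "Position,Track Name,Artist,Streams\n"
    "1,Song A,Artist One,1000\n"
    "2,Song B,Artist Two,900\n"
    "3,Song C,Artist One,800\n"
)
sample_df = pd.read_csv(sample_csv)

# Keep the first row seen for each artist, then list the artist names
unique_artists = sample_df.drop_duplicates(['Artist'])['Artist'].tolist()
print(unique_artists)
```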

## Get artist details from the Spotify API
After dropping the duplicate artists from the dataframe, search each artist name using the Spotify API and the OAuth token we generated. The search returns various items, including a list of artists. Retrieve the list of genres associated with the first artist in the search results. We use this information to create a `dict` where each key is an artist, and each value is a `list` of associated genres. 

In [11]:
spotify_genres = {}
# common request headers, including the OAuth token
headers = {'Accept':'application/json',
           'Content-Type':'application/json',
           'Authorization':'Bearer ' + oauth_token}
# for each unique artist in the data
for artist in df.drop_duplicates(['Artist'])['Artist']:
    # search artist name; params lets requests URL-encode names such as 'Aminé'
    search = rq.get('https://api.spotify.com/v1/search',
                    params={'q': artist, 'type': 'artist'},
                    headers=headers)
    # fail early on an expired token or other HTTP error
    search.raise_for_status()
    # get genres of first artist in search results
    spotify_genres[artist] = search.json()['artists']['items'][0]['genres']

# dict of artists and their genres
spotify_genres

{'2 Chainz': ['dwn trap', 'pop rap', 'rap', 'southern hip hop', 'trap music'],
 '21 Savage': ['dwn trap', 'rap', 'trap music'],
 'A Boogie Wit da Hoodie': ['dwn trap',
  'rap',
  'southern hip hop',
  'trap music'],
 'AJR': ['pop'],
 'Adele': ['dance pop', 'pop', 'post-teen pop'],
 'Alessia Cara': ['dance pop', 'pop', 'post-teen pop'],
 'Aminé': ['dwn trap',
  'pop rap',
  'rap',
  'southern hip hop',
  'trap music',
  'underground hip hop'],
 'Andy Williams': ['adult standards',
  'brill building pop',
  'bubblegum pop',
  'cabaret',
  'christmas',
  'easy listening',
  'lounge',
  'mellow gold',
  'opera',
  'operatic pop',
  'rock-and-roll',
  'soft rock',
  'vocal jazz'],
 'Ariana Grande': ['dance pop', 'pop', 'pop christmas', 'post-teen pop'],
 "Auli'i Cravalho": ['hollywood'],
 'Bebe Rexha': ['dance pop', 'pop', 'post-teen pop', 'tropical house'],
 'Big Sean': ['detroit hip hop', 'pop rap', 'rap', 'trap music'],
 'Bing Crosby': ['adult standards',
  'big band',
  'cabaret',
  'ch
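
Indexing the first search result directly fails with an `IndexError` when a search returns no artists. A defensive variant might look like the sketch below. The helper name `first_artist_genres` is hypothetical, and the payload shape is assumed from the indexing used in the cell above.

```python
def first_artist_genres(payload):
    """Return the genres of the first artist in a search payload, or [].

    `payload` is the decoded JSON of a search response, assumed to have
    the shape {'artists': {'items': [{'genres': [...]}, ...]}}.
    """
    items = payload.get('artists', {}).get('items', [])
    if not items:
        # No artist matched the query
        return []
    return items[0].get('genres', [])
```

Inside the loop, `spotify_genres[artist] = first_artist_genres(search.json())` would then record an empty genre list for an unmatched name instead of stopping the run.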

## Determine top genres
Count how many artists each genre is listed for, and sort the counts in descending order.

In [12]:
count = {}

# for each artist
for key, value in spotify_genres.items():
    # for each genre in the artist's list of genres
    for genre in value:
        # increment count of that genre
        if genre in count:
            count[genre] += 1
        else:
            count[genre] = 1

top_genres = pd.DataFrame.from_dict(count, orient='index')
# sort genres descending
top_genres.sort_values(by=0, ascending=False).head()

| | 0 |
|---|---|
| pop | 51 |
| rap | 42 |
| pop rap | 41 |
| dance pop | 34 |
| trap music | 30 |
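
The same tally can be written more compactly with `collections.Counter` from the standard library. This is an alternative sketch, not the approach used in the cell above. The `sample_genres` dict is a small stand-in whose entries are copied from the `spotify_genres` output shown earlier.

```python
from collections import Counter

# Small stand-in for spotify_genres, using entries from the output above
sample_genres = {
    '21 Savage': ['dwn trap', 'rap', 'trap music'],
    'Big Sean': ['detroit hip hop', 'pop rap', 'rap', 'trap music'],
    'AJR': ['pop'],
}

# Flatten every artist's genre list and count how often each genre occurs
genre_counts = Counter(
    genre for genre_list in sample_genres.values() for genre in genre_list
)

# most_common returns (genre, count) pairs sorted by count, descending
print(genre_counts.most_common(2))
```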


## Simplifying the genre data
The data includes subgenres and various genre distinctions outside the scope of our project goals. We want to map the geographic distributions of hip hop, country, and rock artists' tour locations. Go through the data to sort artists into these categories. We collapse hip hop, rap, and subgenres containing these terms into one genre. Country and country subgenres are also collapsed into one, and the same is done for rock and rock subgenres. Artists of other genres are dropped.

In [8]:
# Get a single genre name for each artist
genres = {}
# \b word boundaries keep 'trap' from matching as 'rap'
hip_hop = re.compile(r'hip hop|\brap\b')
country = re.compile(r'country')
rock = re.compile(r'rock')

for key, value in spotify_genres.items():
    # Search genre list of each artist; a later match overwrites an earlier one
    for genre in value:
        # search (not match) finds the term anywhere in the subgenre name
        if hip_hop.search(genre):
            genres[key] = 'hip hop'
        elif country.search(genre):
            genres[key] = 'country'
        elif rock.search(genre):
            genres[key] = 'rock'
        # If not hip hop, country, or rock, discard
df2 = pd.DataFrame.from_dict(genres, orient='index')
df2

| | 0 |
|---|---|
| Post Malone | hip hop |
| Logic | hip hop |
| Cardi B | hip hop |
| 21 Savage | hip hop |
| Lil Uzi Vert | hip hop |
| French Montana | hip hop |
| Kendrick Lamar | hip hop |
| Gucci Mane | hip hop |
| Travis Scott | hip hop |
| A Boogie Wit da Hoodie | hip hop |
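
An artist whose genre list spans more than one category needs a tie-breaking rule. The sketch below makes that rule explicit by checking the categories in a fixed order. The order (hip hop, then country, then rock), the name `GENRE_PATTERNS`, and the helper `classify` are assumptions for illustration, not the authors' stated method.

```python
import re

# Checked in order, so an artist tagged with both rap and rock genres is
# labelled hip hop. This priority is an assumption for illustration.
GENRE_PATTERNS = [
    ('hip hop', re.compile(r'hip hop|\brap\b')),  # \b keeps 'trap' from matching
    ('country', re.compile(r'country')),
    ('rock', re.compile(r'rock')),
]

def classify(genre_list):
    """Return 'hip hop', 'country', 'rock', or None for a list of genres."""
    for label, pattern in GENRE_PATTERNS:
        # search finds the term anywhere in a subgenre name
        if any(pattern.search(genre) for genre in genre_list):
            return label
    # None of the three categories applies, so the artist is dropped
    return None
```

Applied to the full data, `{artist: classify(g) for artist, g in spotify_genres.items()}` would give one label per artist, with `None` marking the artists to discard.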
