In [4]:
!pip install folium

Collecting folium
  Downloading folium-0.5.0.tar.gz (79kB)
Collecting branca (from folium)
  Downloading branca-0.2.0-py3-none-any.whl
Building wheels for collected packages: folium
  Running setup.py bdist_wheel for folium ... done
  Stored in directory: /home/jovyan/.cache/pip/wheels/04/d0/a0/b2b8356443364ae79743fce0b9b6a5b045f7560742129fde22
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.2.0 folium-0.5.0


# Tour Locations of Popular Genres
Jake Gluck, Nhien Theresa Phan

## Introduction
Do you have a favorite music artist or genre? Have they performed in your city? We can learn about listening statistics from top music charts, but these lists don't show where artists of a particular genre usually tour. This tutorial looks into the geographic tour patterns of different genres. First, we demonstrate how to get top artists and genres from Spotify listening data. We use data from setlist.fm to map these artists' tour locations. Then, we perform exploratory analysis and visualization to come up with hypotheses about the data. We compare the tour data with actual listener distribution mapped geographically.

## Python dependencies

You will need Python 3 and the following libraries:

- bs4 (BeautifulSoup)
- folium
- itertools
- json
- numpy
- pandas
- re
- requests
- time

In [5]:
import pandas as pd
import requests as rq
import json
import re
import folium
import bs4
import time

## Method 1: Getting artists from the last.fm API
The [last.fm API](https://www.last.fm/api) can be used to fetch genres and other data associated with music artists and listenership.

Using the API methods requires an API key. Getting this key requires a last.fm account, which you can [create for free](https://www.last.fm/join) if you do not already have one. Then, [create an API account](https://www.last.fm/api/account/create). "Contact email" and "Application name" are the only required fields in the form; fill in the latter with whatever you wish. Once you complete and submit the form, you will receive an API key. Save this in a UTF-8 encoded text file named `api_key.txt`. We can now use this API key to access the last.fm API methods.

In [6]:
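# read the API key, dropping a UTF-8 byte order mark and surrounding whitespace
with open('api_key.txt', encoding='utf-8') as file:
    api_key = file.read().replace('\ufeff', '').strip()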

Artists on last.fm are given tags by users. We can use the API method [`tag.getTopArtists`](https://www.last.fm/api/show/tag.getTopArtists) to get a list of artists tagged with a genre we are interested in.

In [7]:
# number of artists per genre to return
limit = '50'
url = 'http://ws.audioscrobbler.com/2.0/?method=tag.gettopartists&api_key='+ api_key +'&limit='+ limit +'&format=json&tag='

rock_url = url + 'rock'
rock = rq.get(rock_url)

hiphop_url = url + 'hip+hop'
hiphop = rq.get(hiphop_url)
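
As a variation on the cell above, the query string can also be assembled with `urllib.parse.urlencode`, which escapes spaces and other special characters in tag names. This is only a sketch: `build_tag_url` is a hypothetical helper rather than part of the last.fm API, and the key shown is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = 'http://ws.audioscrobbler.com/2.0/'

def build_tag_url(tag, api_key, limit=50):
    """Build a tag.gettopartists request URL (hypothetical helper)."""
    params = {
        'method': 'tag.gettopartists',
        'tag': tag,
        'api_key': api_key,
        'limit': limit,
        'format': 'json',
    }
    # urlencode escapes spaces as '+', so 'hip hop' becomes 'hip+hop'
    return BASE_URL + '?' + urlencode(params)

print(build_tag_url('hip hop', 'YOUR_API_KEY'))
```

The returned string can be passed to `rq.get` exactly like `rock_url` and `hiphop_url`.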

In [8]:
for artist in json.loads(rock.text)['topartists']['artist']:
    print(artist['name'])
print()
time.sleep(1)
for artist in json.loads(hiphop.text)['topartists']['artist']:
    print(artist['name'])

Coldplay
Linkin Park
Red Hot Chili Peppers
David Bowie
Foo Fighters
Paramore
Kings of Leon
U2
Maroon 5
The White Stripes
Incubus
Panic! at the Disco
Weezer
Evanescence
R.E.M.
Rage Against the Machine
Nickelback
Bruce Springsteen
Papa Roach
Aerosmith
OneRepublic
Bon Jovi
Jimmy Eat World
The Velvet Underground
3 Doors Down
The Cranberries
Audioslave
Tenacious D
Garbage
The All-American Rejects
Seether
The Police
Simple Plan
No Doubt
Lostprophets
Sting
Anberlin
The Pretty Reckless
Lifehouse
Goo Goo Dolls
Black Rebel Motorcycle Club
Stereophonics
Shinedown
Manic Street Preachers
Coma
Counting Crows
Bloodhound Gang
Wolfmother
Alter Bridge
Guano Apes

Samy Deluxe
Absolute Beginner
Beginner
Curse
Nxworries
Dynamite Deluxe
D.R.A.M.
Eins Zwo
Teflon Vest
Injury Reserve
RICH CHIGGA
Torch
MADEINTYO
Lil' Kleine
Afrob
Ferris MC
Rockstah
Umse
2AM Club
Kent Jones
www.MzHipHop.com
Bezimienni
YONAS
Radical Something
Lance Butters
Creme De La Creme
Renee Elise Goldsberry
Kitty
Billy Blue
Blumio
NMZS
Das 

## Method 2: Scraping artists from the last.fm website
last.fm is a music website where users can share their listening data and tag artists. By scraping their tag pages, we can get a list of top artists in whatever genres we are interested in.

In [24]:
hiphop_page1 = 'https://www.last.fm/tag/hip-hop/artists'
hiphop_page2 = 'https://www.last.fm/tag/hip-hop/artists?page=2'
hiphop_page3 = 'https://www.last.fm/tag/hip-hop/artists?page=3'

rock_page1 = 'https://www.last.fm/tag/rock/artists'
rock_page2 = 'https://www.last.fm/tag/rock/artists?page=2'
rock_page3 = 'https://www.last.fm/tag/rock/artists?page=3'
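
The six URLs above follow one pattern, so they could also be generated. A minimal sketch, assuming the `?page=N` scheme shown above holds for every tag; `tag_page_urls` is a hypothetical helper name.

```python
def tag_page_urls(tag, pages=3):
    """Return the URLs of the first `pages` artist pages for a last.fm tag."""
    base = 'https://www.last.fm/tag/' + tag + '/artists'
    # page 1 has no query string; later pages use ?page=N
    return [base if n == 1 else base + '?page=' + str(n)
            for n in range(1, pages + 1)]

rock_pages = tag_page_urls('rock')
hiphop_pages = tag_page_urls('hip-hop')
print(rock_pages)
```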

In [27]:
def scrape_page(link):
    """Return the artist names listed on one last.fm tag page."""
    print(link)  # log each page as it is fetched
    page = rq.get(link)
    soup = bs4.BeautifulSoup(page.text, 'html.parser')
    elements = soup.find_all('h3', {'class': 'big-artist-list-title'})
    return [e.text for e in elements]

# collect the artists from all three pages of each tag
rock_artists = []
for link in [rock_page1, rock_page2, rock_page3]:
    rock_artists.extend(scrape_page(link))

hiphop_artists = []
for link in [hiphop_page1, hiphop_page2, hiphop_page3]:
    hiphop_artists.extend(scrape_page(link))

print(rock_artists)
print()
print(hiphop_artists)

https://www.last.fm/tag/rock/artists
https://www.last.fm/tag/rock/artists?page=2
https://www.last.fm/tag/rock/artists?page=3
https://www.last.fm/tag/hip-hop/artists
https://www.last.fm/tag/hip-hop/artists?page=2
https://www.last.fm/tag/hip-hop/artists?page=3
['Red Hot Chili Peppers', 'The Beatles', 'Muse', 'Coldplay', 'Nirvana', 'Radiohead', 'Foo Fighters', 'U2', 'Linkin Park', 'Led Zeppelin', 'Queen', 'Pink Floyd', 'The Killers', 'The White Stripes', 'The Rolling Stones', 'Green Day', 'Oasis', "Guns N' Roses", 'The Doors', 'System of a Down', 'AC/DC', 'Placebo', 'David Bowie', 'Franz Ferdinand', 'Aerosmith', 'Evanescence', 'Arctic Monkeys', 'Pearl Jam', 'Nickelback', 'Queens of the Stone Age', 'Rage Against the Machine', 'Jimi Hendrix', 'The Strokes', 'R.E.M.', 'Metallica', 'The Who', 'My Chemical Romance', 'The Smashing Pumpkins', '30 Seconds to Mars', 'Incubus', 'Audioslave', 'Paramore', 'Kings of Leon', 'The Cranberries', '3 Doors Down', 'The Offspring', 'Bon Jovi', 'The Cure',

In [41]:
# Data cleaning: replace "/" with a space and drop "!" in artist names

new_rock_artists = []
for artist in rock_artists:
    new_artist = artist.replace("/", " ")
    new_artist = new_artist.replace("!", "")
    new_rock_artists.append(new_artist)
    
rock_artists = new_rock_artists
print(new_rock_artists)



['Red Hot Chili Peppers', 'The Beatles', 'Muse', 'Coldplay', 'Nirvana', 'Radiohead', 'Foo Fighters', 'U2', 'Linkin Park', 'Led Zeppelin', 'Queen', 'Pink Floyd', 'The Killers', 'The White Stripes', 'The Rolling Stones', 'Green Day', 'Oasis', "Guns N' Roses", 'The Doors', 'System of a Down', 'AC DC', 'Placebo', 'David Bowie', 'Franz Ferdinand', 'Aerosmith', 'Evanescence', 'Arctic Monkeys', 'Pearl Jam', 'Nickelback', 'Queens of the Stone Age', 'Rage Against the Machine', 'Jimi Hendrix', 'The Strokes', 'R.E.M.', 'Metallica', 'The Who', 'My Chemical Romance', 'The Smashing Pumpkins', '30 Seconds to Mars', 'Incubus', 'Audioslave', 'Paramore', 'Kings of Leon', 'The Cranberries', '3 Doors Down', 'The Offspring', 'Bon Jovi', 'The Cure', 'Nine Inch Nails', 'Gorillaz', 'Marilyn Manson', 'Papa Roach', 'Weezer', 'Deep Purple', 'Blur', 'Tenacious D', 'Fall Out Boy', 'Garbage', 'Dire Straits', 'Rammstein', 'Bob Dylan', 'Three Days Grace', 'Avril Lavigne']
['Red Hot Chili Peppers', 'The Beatles', 'Mus

## Method 3: Getting artists from Spotify
### Accessing the Spotify API

The Spotify Web API can be used to fetch the genres and other data associated with music artists. Its documentation can be found [here](https://developer.spotify.com/web-api/).

Accessing the API requires an OAuth token. Getting this token requires a Spotify account, which you can [create for free](https://www.spotify.com/signup/) if you do not currently have an account. Then, log into [Spotify Developer](https://beta.developer.spotify.com/dashboard/) and click "Create an App" in the dashboard. Follow the instructions to create an application. The title and description can be anything of your choosing.

You can then generate an OAuth token [here](https://developer.spotify.com/web-api/console/get-search-item/). This token is valid for one hour, but you can generate a new token on the same page once the current token expires.

In [23]:
oauth_token = 'BQDmoQpJDw1lG6EZQ-enFmesw-sW4P6zoxk59Mh0LL2B08DL8ldQNnU3cqxoYvR8KGidjl4OXG3bGxb69pgCLDMJcEGHtOV9Jj_PvBBwSwsGvib4_2CRX9tXMpVTQm9eo05ahys49Lq3ljKYFw'
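
Because the console token expires after an hour, it may be more convenient to request tokens programmatically. One option is Spotify's client-credentials flow, which exchanges the Client ID and Client Secret shown on your app's dashboard page for a token. The sketch below only builds the request; `token_request` is a hypothetical helper, and the credentials are placeholders you would supply yourself.

```python
import base64

TOKEN_URL = 'https://accounts.spotify.com/api/token'

def token_request(client_id, client_secret):
    """Return (url, headers, data) for a client-credentials token request."""
    credentials = (client_id + ':' + client_secret).encode('ascii')
    # HTTP Basic auth: base64 of "client_id:client_secret"
    headers = {'Authorization': 'Basic ' + base64.b64encode(credentials).decode('ascii')}
    data = {'grant_type': 'client_credentials'}
    return TOKEN_URL, headers, data

# Usage with requests (needs network access and real credentials):
# url, headers, data = token_request(client_id, client_secret)
# oauth_token = rq.post(url, headers=headers, data=data).json()['access_token']
```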

### Getting the Spotify data
We want to get the genres of the artists behind the most-streamed songs in the United States. Go to [spotifycharts.com](https://spotifycharts.com/) and use the filter options to get a weekly chart of the top 200 songs in the United States. We picked a week in early October to avoid charts dominated by holiday and seasonal music. Download the `.csv` and use `pandas` to read the file into a dataframe.

In [28]:
df = pd.read_csv('spotify.csv')

### Get artist details from the Spotify API
After dropping the duplicate artists from the dataframe, search each artist name using the Spotify API and the OAuth token we generated. The search returns various items, including a list of artists. Retrieve the list of genres associated with the first artist in the search results. We use this information to create a `dict` where each key is an artist, and each value is a `list` of associated genres. 

In [29]:
spotify_genres = {}
headers = {'Accept':'application/json',
           'Content-Type':'application/json',
           'Authorization':'Bearer ' + oauth_token}
# for each unique artist in the data
for artist in df.drop_duplicates(['Artist'])['Artist']:
    # search artist name; params= URL-encodes spaces and special characters
    search = rq.get('https://api.spotify.com/v1/search',
                    params={'q': artist, 'type': 'artist'}, headers=headers)
    # get genres of first artist in search results
    spotify_genres[artist] = json.loads(search.text)['artists']['items'][0]['genres']

# dict mapping each artist to their genres
spotify_genres

KeyError: 'artists'
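
A `KeyError: 'artists'` like the one above is what you would expect when the response has no `artists` field. Spotify returns an `error` object instead when, for example, the one-hour token has expired. A search for an unknown name can also return an empty `items` list. The sketch below isolates that handling in a hypothetical helper, `first_artist_genres`; the sample responses are hand-written, not real API output.

```python
def first_artist_genres(result):
    """Return the genres of the first artist in a parsed search response.

    Returns None when the response carries an error or no artist matched.
    """
    if 'artists' not in result:
        # e.g. {'error': {'status': 401, 'message': ...}} for an expired token
        return None
    items = result['artists']['items']
    return items[0]['genres'] if items else None

# hand-written sample responses, not real API output
ok = {'artists': {'items': [{'name': 'Example Artist', 'genres': ['pop rap', 'rap']}]}}
expired = {'error': {'status': 401, 'message': 'The access token expired'}}
no_match = {'artists': {'items': []}}

for response in (ok, expired, no_match):
    print(first_artist_genres(response))
```

If the helper returns `None` for every artist, regenerating the token is the first thing to try.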

## Determine top genres
Count the total instances of all the genres that appear in the data, and sort descending.

In [33]:
count = {}

# for each artist
for key, value in spotify_genres.items():
    # for each genre in the artist's list of genres
    for genre in value:
        # increment count of that genre
        if genre in count:
            count[genre] += 1
        else:
            count[genre] = 1

top_genres = pd.DataFrame.from_dict(count, orient='index')
# sort genres descending
top_genres.sort_values(by=0, ascending=False).head()

KeyError: 0
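
The `KeyError: 0` above is consistent with an empty `count`: if no genres were collected, the dataframe has no column `0` to sort by. As an alternative sketch, `collections.Counter` from the standard library does the same tally and handles the empty case without raising. The sample data is made up for illustration, and `count_genres` is a hypothetical helper.

```python
from collections import Counter

def count_genres(artist_genres):
    """Count how many artists list each genre."""
    counts = Counter()
    for genres in artist_genres.values():
        counts.update(genres)
    return counts

# made-up sample in the same shape as spotify_genres
sample_genres = {'Artist A': ['rap', 'pop rap'], 'Artist B': ['rap'], 'Artist C': []}
print(count_genres(sample_genres).most_common(2))
```

`most_common(n)` returns the `n` most frequent genres in descending order, which corresponds to the `sort_values(...).head()` step.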

## Simplifying the genre data
The data includes subgenres and various genre distinctions outside the scope of our project goals. We want to map the geographic distributions of hip hop, country, and rock artists' tour locations. Go through the data to sort artists into these categories. We collapse hip hop, rap, and subgenres containing these terms into one genre. Country and country subgenres are also collapsed into one, and the same is done for rock and rock subgenres. Artists of other genres are dropped.

In [31]:
# Get single genre names
genres = {}
# \brap\b matches "rap" only as a whole word, so "trap" is filtered out
hip_hop = re.compile(r'hip hop|\brap\b')
country = re.compile(r'country')
rock = re.compile(r'rock')

for key, value in spotify_genres.items():
    # Search genre list of each artist
    for genre in value:
        # search() finds the term anywhere in the subgenre name;
        # match() would only find it at the very start
        if hip_hop.search(genre):
            genres[key] = 'hip hop'
        elif country.search(genre):
            genres[key] = 'country'
        elif rock.search(genre):
            genres[key] = 'rock'
        # If not hip hop, country, or rock, discard
df2 = pd.DataFrame.from_dict(genres, orient='index')
df2
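
To check the collapsing rules on a few subgenre names, the logic can be pulled into a small function. Note that `re.match` only matches at the start of a string, while `re.search` scans the whole string, which is what "subgenres containing these terms" calls for. This is a sketch with a hypothetical helper, `simplify`; unlike the loop above, it returns the first broad genre it finds rather than the last.

```python
import re

GENRE_PATTERNS = [
    ('hip hop', re.compile(r'hip hop|\brap\b')),
    ('country', re.compile(r'country')),
    ('rock', re.compile(r'rock')),
]

def simplify(genre_list):
    """Return the first broad genre matched by any subgenre, else None."""
    for genre in genre_list:
        for label, pattern in GENRE_PATTERNS:
            if pattern.search(genre):
                return label
    return None

for example in (['southern hip hop'], ['trap music'], ['alternative rock'], ['pop']):
    print(example, simplify(example))
```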

## Using the setlist.fm API

setlist.fm is a website that collects the setlists of music artists' live performances. The data includes the location and date of each performance and is also accessible through their API. The setlist.fm API is documented [here](https://api.setlist.fm/docs/1.0/index.html).
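
A request to the setlist search endpoint might look like the sketch below. The endpoint path, the `artistName` and `p` parameters, the `x-api-key` header, and the response fields (`setlist`, `eventDate`, `venue`, `city`, `name`) are assumptions based on the linked documentation and should be checked against it. The helper names and the sample response are made up for illustration.

```python
SETLIST_SEARCH_URL = 'https://api.setlist.fm/rest/1.0/search/setlists'

def setlist_request(artist, setlist_api_key, page=1):
    """Return (url, params, headers) for a setlist search (assumed endpoint)."""
    params = {'artistName': artist, 'p': page}
    headers = {'Accept': 'application/json', 'x-api-key': setlist_api_key}
    return SETLIST_SEARCH_URL, params, headers

def tour_stops(response):
    """Extract (date, city) pairs from a parsed search response."""
    stops = []
    for setlist in response.get('setlist', []):
        city = setlist.get('venue', {}).get('city', {}).get('name')
        if city:  # skip entries without a city name
            stops.append((setlist.get('eventDate'), city))
    return stops

# hand-written sample in the assumed response shape, not real API output
sample_response = {'setlist': [
    {'eventDate': '01-01-2017', 'venue': {'city': {'name': 'Paris'}}},
    {'eventDate': '02-01-2017', 'venue': {}},
]}
print(tour_stops(sample_response))

# Usage with requests (needs network access and a setlist.fm API key):
# url, params, headers = setlist_request('Muse', setlist_api_key)
# stops = tour_stops(rq.get(url, params=params, headers=headers).json())
```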

## Determine top cities
We are interested in exploring geographic differences in the tour locations of artists in various genres. However, the most popular cities can add noise to our data, because heavily populated cities are visited by almost every artist regardless of genre. We can identify the most popular cities by counting how often each one occurs in the data.

In [32]:
sample = pd.read_csv('sample.csv')
city_count = {}

# for each tour stop
for index, row in sample.iterrows():
    # increment count of that city
    if row['cities'] in city_count:
        city_count[row['cities']] += 1
    else:
        city_count[row['cities']] = 1

top_cities = pd.DataFrame.from_dict(city_count, orient='index')
# sort cities descending
top_cities.sort_values(by=0, ascending=False).head()

| | 0 |
|---|---|
| Paris | 4 |
| Toronto | 3 |
| Amsterdam | 3 |
| Boston | 3 |
| Glasgow | 2 |
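
One possible use of these counts, offered here as a suggestion rather than a step from the analysis above, is to set aside the most-visited cities before comparing genres. A minimal sketch using only the standard library; `drop_top_cities` is a hypothetical helper, and the cutoff `n` is an arbitrary choice.

```python
from collections import Counter

def drop_top_cities(stops, n=5):
    """Remove tour stops in the n most frequently visited cities.

    `stops` is a list of (artist, city) pairs.
    """
    counts = Counter(city for _, city in stops)
    top = {city for city, _ in counts.most_common(n)}
    return [(artist, city) for artist, city in stops if city not in top]

# made-up sample data
example_stops = [('A', 'Paris'), ('B', 'Paris'), ('A', 'Boise')]
print(drop_top_cities(example_stops, n=1))
```

With the dataframe above, the input could be built as `list(zip(sample['artists'], sample['cities']))`.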


## Mapping artists’ tour locations with `folium`
Now that we have the tour data of Spotify's top artists, we can plot the tour locations on a map using [`folium`](http://python-visualization.github.io/folium/docs-master/), a library that brings the [`leaflet.js`](http://leafletjs.com/) mapping library to the Python ecosystem. Installation instructions can be found [here](http://python-visualization.github.io/folium/docs-master/installing.html#installation). The tour data includes only the names of the cities that artists visited, so use a free dataset from [Simplemaps.com](https://simplemaps.com/data/world-cities) to get the latitude and longitude of each city. Use these coordinates to map the cities onto a `folium` map.

In [38]:

cities = pd.read_csv('simplemaps-worldcities-basic.csv')
sample = pd.read_csv('sample.csv')

# map centered on United States
# (named tour_map so it does not shadow Python's built-in map)
tour_map = folium.Map(location=[39.5, -98.35], zoom_start=4)

for index, row in sample.iterrows():
    # coordinates of every city in the dataset with this name
    lat = cities.loc[cities['city'] == row['cities'], 'lat'].values
    lng = cities.loc[cities['city'] == row['cities'], 'lng'].values
    # if city found in dataset, map the first match
    if len(lat) > 0:
        folium.Marker([lat[0], lng[0]], popup=row['artists'] + ', ' + row['cities'], icon=folium.Icon(color='red',icon='info-sign')).add_to(tour_map)
tour_map
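
The loop above scans the whole city table once per tour stop and takes the first row whose name matches, so cities that share a name (Paris, France and Paris, Texas, for example) resolve to whichever row comes first. A sketch of the same lookup with a dictionary built once up front is shown below; the helper names and the sample rows are made up for illustration.

```python
def build_coord_lookup(city_rows):
    """Map city name -> (lat, lng), keeping the first row seen for each name."""
    lookup = {}
    for name, lat, lng in city_rows:
        lookup.setdefault(name, (lat, lng))
    return lookup

def locate_stops(stops, lookup):
    """Return (artist, city, lat, lng) for every stop whose city is known."""
    return [(artist, city) + lookup[city]
            for artist, city in stops if city in lookup]

# made-up sample rows: two different cities named Paris
rows = [('Paris', 48.86, 2.35), ('Paris', 33.66, -95.55), ('Boston', 42.36, -71.06)]
lookup = build_coord_lookup(rows)
print(locate_stops([('A', 'Paris'), ('B', 'Nowhere')], lookup))
```

With the dataframes above, the lookup could be built from `zip(cities['city'], cities['lat'], cities['lng'])`, and each located stop passed to `folium.Marker`.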