# Tour Locations of Rock Artists vs. Hip-hop Artists
Jake Gluck, Nhien Theresa Phan

## Introduction
Do you have a favorite music artist or genre? Have they performed in your city? Rock and hip-hop are two very popular genres, and top music charts reflect this. However, these charts don't show where artists of a particular genre usually tour, nor do they show how widely a genre is listened to in a particular city. This tutorial looks at the geographic distribution of tour locations of rock artists versus hip-hop artists. First, we demonstrate how to scrape the top artists of specific genres from last.fm. We then use data from setlist.fm to find these artists' tour locations. Finally, we form hypotheses about the data and plot the tour locations with `folium` to analyze the geographic distribution of the genres, both worldwide and within cities, and to see whether certain areas are dominated by one genre.

## Python dependencies
You will need Python 3 and the following libraries:

- `bs4`
- `folium`
- `itertools`
- `json`
- `numpy`
- `pandas`
- `requests`
- `time`

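`itertools`, `json`, and `time` are part of the Python standard library. `bs4` is distributed on PyPI as `beautifulsoup4`. `folium` can be installed using `pip`: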

In [15]:
!pip install folium



In [16]:
import bs4
import folium
import json
import pandas as pd
import requests as rq

## Scraping artists from the last.fm website
First, we must retrieve the music artists whose tour dates we want to explore. last.fm is a music website where users can share their listening data and tag artists. By scraping their tag pages, we can get a list of top artists in whatever genres we are interested in.

For this tutorial, we will compare rock and hip-hop by scraping the first three pages of artist results for each genre. The rock list printed below contains 63 artists, or 21 per page.

In [19]:
# pages to be scraped
hiphop_page1 = 'https://www.last.fm/tag/hip-hop/artists'
hiphop_page2 = 'https://www.last.fm/tag/hip-hop/artists?page=2'
hiphop_page3 = 'https://www.last.fm/tag/hip-hop/artists?page=3'
rock_page1 = 'https://www.last.fm/tag/rock/artists'
rock_page2 = 'https://www.last.fm/tag/rock/artists?page=2'
rock_page3 = 'https://www.last.fm/tag/rock/artists?page=3'

def scrape_page(link):
    # download and parse the page
    soup = bs4.BeautifulSoup(rq.get(link).text, 'html.parser')
    # artist names are in <h3 class="big-artist-list-title"> elements
    elements = soup.find_all('h3', {'class': 'big-artist-list-title'})
    return [e.text for e in elements]

# concatenate list from each page
rock_artists = scrape_page(rock_page1) + scrape_page(rock_page2) + scrape_page(rock_page3)
hiphop_artists = scrape_page(hiphop_page1) + scrape_page(hiphop_page2) + scrape_page(hiphop_page3)

print(rock_artists)
print()
print(hiphop_artists)

['Red Hot Chili Peppers', 'The Beatles', 'Muse', 'Coldplay', 'Nirvana', 'Radiohead', 'Foo Fighters', 'U2', 'Linkin Park', 'Led Zeppelin', 'Queen', 'Pink Floyd', 'The Killers', 'The White Stripes', 'The Rolling Stones', 'Green Day', 'Oasis', "Guns N' Roses", 'The Doors', 'System of a Down', 'AC/DC', 'Placebo', 'David Bowie', 'Franz Ferdinand', 'Aerosmith', 'Evanescence', 'Arctic Monkeys', 'Pearl Jam', 'Nickelback', 'Queens of the Stone Age', 'Rage Against the Machine', 'Jimi Hendrix', 'The Strokes', 'R.E.M.', 'Metallica', 'The Who', 'My Chemical Romance', 'The Smashing Pumpkins', '30 Seconds to Mars', 'Incubus', 'Audioslave', 'Paramore', 'Kings of Leon', 'The Cranberries', '3 Doors Down', 'The Offspring', 'Bon Jovi', 'The Cure', 'Nine Inch Nails', 'Gorillaz', 'Marilyn Manson', 'Papa Roach', 'Weezer', 'Deep Purple', 'Blur', 'Tenacious D', 'Fall Out Boy', 'Garbage', 'Dire Straits', 'Rammstein', 'Bob Dylan', 'Three Days Grace', 'Avril Lavigne']

['Eminem', 'Kanye West', 'Gorillaz', 'Beasti
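
The six hard-coded URLs above follow one pattern, and the two printed lists share at least one name (Gorillaz appears under both tags). The following is a minimal sketch of generating the page URLs and finding artists tagged with both genres. `tag_page_urls` and `shared_artists` are hypothetical helper names, not part of the original notebook.

```python
def tag_page_urls(tag, pages=3):
    """Build last.fm tag-page URLs in the same form as the hard-coded ones."""
    base = 'https://www.last.fm/tag/{}/artists'.format(tag)
    # page 1 has no query string; later pages use ?page=N
    return [base if n == 1 else '{}?page={}'.format(base, n)
            for n in range(1, pages + 1)]


def shared_artists(first, second):
    """Return artists present in both lists, keeping the order of `first`."""
    second_set = set(second)
    return [name for name in first if name in second_set]


print(tag_page_urls('rock'))
print(shared_artists(['Muse', 'Gorillaz', 'Blur'],
                     ['Eminem', 'Kanye West', 'Gorillaz']))
```

Artists that appear under both tags need an explicit decision (count them in both genres, or drop them) before any per-city genre percentages are computed.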

## Using the setlist.fm API

setlist.fm is a website that collects the setlists of music artists' live performances. This data includes the location and date of the performance, which is also accessible through their API. The setlist.fm API is documented [here](https://api.setlist.fm/docs/1.0/index.html).
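
As a rough illustration of how the API data could be turned into the `venues`, `cities`, and `artists` columns used later, here is a hedged sketch. The endpoint path, the `x-api-key` header, the `artistName` and `p` parameters, and the response layout (`setlist` → `venue` → `city`) are assumptions to be checked against the API documentation linked above. `build_request` and `setlists_to_rows` are hypothetical helper names.

```python
SEARCH_URL = 'https://api.setlist.fm/rest/1.0/search/setlists'


def build_request(artist, api_key, page=1):
    """Return keyword arguments for requests.get() (assumed API shape)."""
    return {
        'url': SEARCH_URL,
        'headers': {'x-api-key': api_key, 'Accept': 'application/json'},
        'params': {'artistName': artist, 'p': page},
    }


def setlists_to_rows(payload):
    """Flatten one page of search results into venue/city/artist rows."""
    rows = []
    for item in payload.get('setlist', []):
        venue = item.get('venue', {})
        rows.append({
            'venues': venue.get('name', ''),
            'cities': venue.get('city', {}).get('name', ''),
            'artists': item.get('artist', {}).get('name', ''),
        })
    return rows


# hand-written payload mimicking the assumed response layout
example = {'setlist': [{'artist': {'name': 'Drake'},
                        'venue': {'name': 'Rod Laver Arena',
                                  'city': {'name': 'Melbourne'}}}]}
print(setlists_to_rows(example))
```

With the notebook's `rq` alias, `rq.get(**build_request('Drake', api_key)).json()` would produce a payload for `setlists_to_rows`, provided the endpoint behaves as assumed. The resulting rows can be passed to `pd.DataFrame` and saved as a CSV file.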

## Determine top cities
Count the number of tour dates that have occurred in each city that appears in the data. We will use this information later to calculate percentages of rock vs. hip-hop concerts.

In [57]:
sample = pd.read_csv('sample.csv')
city_count = {}

# for each tour stop, increment the count of its city
for city in sample['cities']:
    city_count[city] = city_count.get(city, 0) + 1

# one row per city; the count ends up in column 0
top_cities = pd.DataFrame.from_dict(city_count, orient='index')
# copy the index into a regular 'city' column
# (DataFrame.set_value was removed in pandas 1.0)
top_cities['city'] = top_cities.index
top_cities.sort_values(by=0, ascending=False).head()

| | 0 | city |
|---|---|---|
| Paris | 4 | Paris |
| Toronto | 3 | Toronto |
| Amsterdam | 3 | Amsterdam |
| Boston | 3 | Boston |
| Glasgow | 2 | Glasgow |
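
The same tally can be sketched with the standard library's `collections.Counter`. The `cities` list here is a hypothetical stand-in for the `cities` column of `sample.csv`, not the real data.

```python
from collections import Counter

# hypothetical stand-in for the 'cities' column of sample.csv
cities = ['Paris', 'Toronto', 'Paris', 'Boston', 'Toronto', 'Paris']

city_count = Counter(cities)
# most_common() sorts by count, highest first
print(city_count.most_common(2))  # [('Paris', 3), ('Toronto', 2)]
```

When the data is already in a dataframe, `sample['cities'].value_counts()` gives the same counts as a pandas Series.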


## Getting latitude and longitude using Google Maps Geocoding API
Before we can plot the visited cities on a map, we need to get the latitude and longitude from each city name using the [Google Maps Geocoding API](https://developers.google.com/maps/documentation/geocoding/start). You will need to log into your Google account and [get an API key](https://developers.google.com/maps/documentation/geocoding/start#auth). Save this API key in a UTF-8 encoded text file named `google_maps_api_key.txt`. We can now use this API key to access the Google Maps Geocoding API.

In [33]:
# utf-8-sig drops a leading byte-order mark; strip() removes a trailing newline
with open('google_maps_api_key.txt', encoding='utf-8-sig') as file:
    google_maps_api_key = file.read().strip()

Search each city name in the data to get the latitude and longitude of each city. Add this information to the dataframe.

In [60]:
url = 'https://maps.googleapis.com/maps/api/geocode/json'
tour_locs = pd.read_csv('sample.csv')
city_locs = {}
rock_count = {}
hiphop_count = {}

for index, row in tour_locs.iterrows():
    name = row['cities']
    # query the Google Maps Geocoding API once per distinct city
    if name not in city_locs:
        # passing params lets requests URL-encode the key and city name
        city = rq.get(url, params={'key': google_maps_api_key, 'address': name})
        # keep the latitude and longitude of the first match
        city_locs[name] = json.loads(city.text)['results'][0]['geometry']['location']
    # set latitude and longitude in tour date dataframe
    # (DataFrame.set_value was removed in pandas 1.0)
    tour_locs.loc[index, 'lat'] = city_locs[name]['lat']
    tour_locs.loc[index, 'lng'] = city_locs[name]['lng']
tour_locs.head()

| | index | venues | cities | artists | lat | lng |
|---|---|---|---|---|---|---|
| 0 | 0 | Rod Laver Arena | Melbourne | Drake | -37.813628 | 144.963058 |
| 1 | 1 | Brisbane Entertainment Centre | Brisbane | Drake | -27.469771 | 153.025124 |
| 2 | 2 | Qudos Bank Arena | Sydney | Drake | -33.86882 | 151.209295 |
| 3 | 3 | Qudos Bank Arena | Sydney | Drake | -33.86882 | 151.209295 |
| 4 | 4 | Spark Arena | Auckland | Drake | -36.84846 | 174.763331 |
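
Indexing `['results'][0]` raises an error when a city name cannot be geocoded. A minimal sketch of a more defensive lookup follows, relying on the `status` field that the Geocoding API returns alongside `results`. `extract_location` is a hypothetical helper name, and both payloads are hand-written examples rather than real API responses.

```python
def extract_location(payload):
    """Return (lat, lng) from a geocoding response, or None if nothing matched."""
    if payload.get('status') != 'OK' or not payload.get('results'):
        return None
    location = payload['results'][0]['geometry']['location']
    return location['lat'], location['lng']


# hand-written payloads for a successful and an empty lookup
found = {'status': 'OK',
         'results': [{'geometry': {'location': {'lat': 48.856614,
                                                'lng': 2.352222}}}]}
missing = {'status': 'ZERO_RESULTS', 'results': []}

print(extract_location(found))    # (48.856614, 2.352222)
print(extract_location(missing))  # None
```

Rows whose lookup returns `None` can then be skipped or logged instead of stopping the whole loop.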


We can also add the coordinate data to our `top_cities` dataframe.

In [61]:
for key, value in city_locs.items():
    top_cities.loc[top_cities['city'] == key, 'lat'] = value['lat']
    top_cities.loc[top_cities['city'] == key, 'lng'] = value['lng']
top_cities.sort_values(by=0, ascending=False).head()

| | 0 | city | lat | lng |
|---|---|---|---|---|
| Paris | 4 | Paris | 48.856614 | 2.352222 |
| Toronto | 3 | Toronto | 43.653226 | -79.383184 |
| Amsterdam | 3 | Amsterdam | 52.370216 | 4.895168 |
| Boston | 3 | Boston | 42.360082 | -71.05888 |
| Glasgow | 2 | Glasgow | 55.864237 | -4.251806 |


## Mapping artists’ tour locations with `folium`
Now that we have the latitude and longitude coordinates of our artists' tour dates, we can plot the tour locations on a map using [`folium`](http://python-visualization.github.io/folium/docs-master/), a library that brings the [`leaflet.js`](http://leafletjs.com/) mapping library to the Python ecosystem. We demonstrate how to install `folium` with `pip` in the "Python dependencies" section of this tutorial, but detailed installation instructions can be found [here](http://python-visualization.github.io/folium/docs-master/installing.html#installation).

In [64]:
# map centered on United States
map1 = folium.Map(location=[39.5, -98.35], zoom_start=4)

for index, row in tour_locs.iterrows():
    folium.Marker([row['lat'], row['lng']], popup=row['artists'] + ', ' + row['cities'], icon=folium.Icon(color='red',icon='info-sign')).add_to(map1)
map1

## Mapping individual cities
The second map plots one marker per city that appears in the data. The goal is for each marker's popup to show the percentage of rock concerts vs. hip-hop concerts that have occurred in that city; the version below labels each marker with its city name.

In [65]:
# map centered on United States
map2 = folium.Map(location=[39.5, -98.35], zoom_start=4)

for index, row in top_cities.iterrows():
    folium.Marker([row['lat'], row['lng']], popup=row['city'], icon=folium.Icon(color='red',icon='info-sign')).add_to(map2)
map2
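
The per-city percentages described above could be computed along these lines. This is a sketch under stated assumptions: `genre_share` is a hypothetical helper, the rows and artist lists are made-up examples, and an artist tagged with both genres is counted once in each.

```python
def genre_share(rows, rock_artists, hiphop_artists):
    """Map each city to (rock %, hip-hop %) of its classified concerts."""
    rock, hiphop = set(rock_artists), set(hiphop_artists)
    counts = {}
    for city, artist in rows:
        tally = counts.setdefault(city, [0, 0])
        if artist in rock:
            tally[0] += 1
        if artist in hiphop:
            tally[1] += 1
    shares = {}
    for city, (r, h) in counts.items():
        total = r + h
        if total:  # skip cities where no artist matched either list
            shares[city] = (100.0 * r / total, 100.0 * h / total)
    return shares


# made-up (city, artist) pairs standing in for the tour dataframe
rows = [('Paris', 'Muse'), ('Paris', 'Drake'), ('Paris', 'Muse'),
        ('Paris', 'Eminem'), ('Boston', 'Drake')]
shares = genre_share(rows, ['Muse'], ['Drake', 'Eminem'])

for city, (r, h) in shares.items():
    # text that could be passed as a marker's popup
    print('{}: {:.0f}% rock, {:.0f}% hip-hop'.format(city, r, h))
```

The formatted string for each city can be used as the `popup` argument when creating that city's `folium.Marker`.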

## Analysis