# Introduction

In this project I'll explore setlist data and ultimately try to recommend venues for a band to play at, based on how well the band's style matches a venue's attributes.

[Retrieving Data From Setlist.fm API](#Retrieving Data From Setlist.fm API)

[Exploring Data](#Exploring Data)

[Modeling Data](#Modeling Data)

[Model Validation](#Model Validation)

## Retrieving Data From Setlist.fm API

<a id='Retrieving Data From Setlist.fm API'></a>

In [1]:
import warnings

import numpy as np  # Math library
import pandas as pd  # Table library
import matplotlib.pyplot as plt  # Plotting library
import seaborn as sns  # Plotting library

warnings.filterwarnings('ignore')  # Suppresses popup warnings
%matplotlib nbagg

Here I'll first see if I can GET data from setlist.fm using its RESTful API. I know that one of its methods retrieves information by artist. Since I know Elliott Smith's MusicBrainz ID (MID), I can retrieve his data in JSON format and then parse it into a dictionary that I can organize to find useful information.

I'll use the get method of the requests library (urllib2 would also work) to communicate with the API. I'll start with a specific artist, Elliott Smith, whose MID is 03ad1736-b7c9-412a-b442-82536d63a5c4. Let's see what the JSON response contains.

In [110]:
import requests

raw_data = requests.get(
    'https://api.setlist.fm/rest/0.1/artist/03ad1736-b7c9-412a-b442-82536d63a5c4/setlists.json')

In [128]:
#Let's see what it contains
print raw_data.content[0:100]

{
 "setlists":{
  "@itemsPerPage":"20",
  "@page":"1",
  "@total":"300",
  "setlist":[
   {
    "@ev


So it looks like there are 300 setlists in total, served 20 per page, and we're only on page 1. We'll need to repeat the request several times to get all 300 setlists. The documentation mentions that the page number can be passed as an argument, so we'll increment the page number on every request, append it to our address, and collect the results in a new list until there are no more pages.

In [129]:
#Import json library for json.loads (turns a JSON string into a dictionary)
import json

#Initialize empty list to collect the parsed content of each page
totalData = []

#Set main body of address
address = 'https://api.setlist.fm/rest/0.1/artist/03ad1736-b7c9-412a-b442-82536d63a5c4/setlists.json'

#Initialize page counter, loop flag, and page query argument
i = 1
val = True
pageArg = '?p='

#Loop while val is True. Each iteration requests page i by appending
#the page argument to the address. If the response status is 200 (OK),
#parse the payload, append it to totalData, and move on to the next page.
#Any other status is taken to mean there are no more pages, so we stop.

while val:
    fetch = requests.get(address + pageArg + str(i))
    if fetch.status_code == 200:
        totalData.append(json.loads(fetch.content))
        i += 1
    else:
        val = False

In [136]:
print 'Number of elements in our list is: {}'.format(len(totalData))
print 'Page number of last element is: {}'.format(totalData[-1]['setlists']['@page'])

Number of elements in our list is: 15
Page number of last element is: 15


Excellent! I was able to iterate through every page returned by the setlist.fm API and pull out all of the setlist information for Elliott Smith! :)
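As a sanity check, the 15 pages agree with the pagination metadata reported on page 1: 300 setlists at 20 per page. Here is a minimal sketch of that arithmetic. The helper names `page_count` and `page_urls` are my own for illustration and not part of the setlist.fm API, and the example address is a placeholder. The code is written to run under both Python 2 and 3.

```python
import math


def page_count(total, items_per_page):
    """Number of pages needed to hold `total` items."""
    return int(math.ceil(float(total) / items_per_page))


def page_urls(address, total, items_per_page, page_arg='?p='):
    """Build one URL per page, with pages numbered from 1."""
    n = page_count(total, items_per_page)
    return [address + page_arg + str(i) for i in range(1, n + 1)]


# Pagination metadata as reported on page 1 (the API returns strings)
meta = {'@itemsPerPage': '20', '@page': '1', '@total': '300'}

n_pages = page_count(int(meta['@total']), int(meta['@itemsPerPage']))
print(n_pages)  # 15

# Hypothetical placeholder address, used only to show the URL shape
urls = page_urls('https://example.invalid/setlists.json', 41, 20)
print(urls[-1])  # https://example.invalid/setlists.json?p=3
```

Knowing the page count up front would let the loop stop after the last page instead of relying on a non-200 response, though I haven't checked how the API behaves for every out-of-range page.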

## Exploring Data

<a id='Exploring Data'></a>

OK, so I basically have three sources of information. First, I found the MusicBrainz database, which is a little complicated to access (I'd need to set up a PostgreSQL database from scratch). Second, I found the million song tweet database, much of which contains rich information about geography and the like. Finally, I have the setlist database, which has info on venues and artists' histories at these venues.

Let me now try to explore these data packets more closely. 

In [165]:
#I'll just fetch the data for the first event in our list

firstSet= totalData[0]['setlists']['setlist'][0]

In [166]:
firstSet.keys()

[u'artist',
 u'url',
 u'@lastUpdated',
 u'venue',
 u'@versionId',
 u'sets',
 u'@id',
 u'@eventDate']
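Since `totalData` holds one dictionary per page, it can be handy to collect every setlist into a single flat list before exploring further. Below is a minimal sketch. The function name `flatten_setlists` and the trimmed sample pages are hypothetical stand-ins, and the single-dict case is a defensive assumption on my part rather than documented API behavior.

```python
def flatten_setlists(pages):
    """Collect the setlist dicts from every page into one flat list."""
    flat = []
    for page in pages:
        entries = page['setlists']['setlist']
        # Assumption: a page with only one setlist might hold a dict
        # instead of a list, so wrap it to keep the output uniform.
        if isinstance(entries, dict):
            entries = [entries]
        flat.extend(entries)
    return flat


# Hypothetical, heavily trimmed pages standing in for totalData
sample_pages = [
    {'setlists': {'@page': '1', 'setlist': [{'@id': 'a'}, {'@id': 'b'}]}},
    {'setlists': {'@page': '2', 'setlist': {'@id': 'c'}}},
]

all_sets = flatten_setlists(sample_pages)
print(len(all_sets))  # 3
```

Applied to the real data, `flatten_setlists(totalData)` should return one entry per setlist, each with the keys listed above.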

In [167]:
#Let's look at the information available for the venue

firstSet['venue']

{u'@id': u'13d721b9',
 u'@name': u'Redfest',
 u'city': {u'@id': u'5780993',
  u'@name': u'Salt Lake City',
  u'@state': u'Utah',
  u'@stateCode': u'UT',
  u'coords': {u'@lat': u'40.7607794', u'@long': u'-111.8910474'},
  u'country': {u'@code': u'US', u'@name': u'United States'}},
 u'url': u'http://www.setlist.fm/venue/redfest-salt-lake-city-ut-usa-13d721b9.html'}
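The nested venue dictionary is awkward to put in a table. Here is a minimal sketch of flattening it into a single-level record. The function name `flatten_venue` and the output field names are my own choices, and the sample values are copied from the output above.

```python
def flatten_venue(venue):
    """Turn the nested venue dict into a flat record with numeric coords."""
    city = venue.get('city', {})
    coords = city.get('coords', {})
    country = city.get('country', {})

    def to_float(value):
        # Coordinates arrive as strings; missing ones stay None
        return float(value) if value is not None else None

    return {
        'venue_id': venue.get('@id'),
        'name': venue.get('@name'),
        'city': city.get('@name'),
        'state_code': city.get('@stateCode'),
        'country_code': country.get('@code'),
        'lat': to_float(coords.get('@lat')),
        'long': to_float(coords.get('@long')),
    }


# Venue of the first setlist, copied from the output above
venue = {
    '@id': '13d721b9',
    '@name': 'Redfest',
    'city': {
        '@id': '5780993',
        '@name': 'Salt Lake City',
        '@state': 'Utah',
        '@stateCode': 'UT',
        'coords': {'@lat': '40.7607794', '@long': '-111.8910474'},
        'country': {'@code': 'US', '@name': 'United States'},
    },
}

row = flatten_venue(venue)
print(row['name'])  # Redfest
```

A list of such records, one per setlist, could be passed straight to `pd.DataFrame` for the exploration steps.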

The name, city/state, and lat/long are a pretty good start. Now let's see whether the venue endpoint returns any additional info when queried directly.

In [168]:
venueAddress= 'https://api.setlist.fm/rest/0.1/venue/3d6358b.json'
fetchVenue= requests.get(venueAddress)
fetchVenue.content

'{\n "venue":{\n  "@id":"3d6358b",\n  "@name":"Bottom of the Hill",\n  "city":{\n   "@id":"5391959",\n   "@name":"San Francisco",\n   "@state":"California",\n   "@stateCode":"CA",\n   "coords":{\n    "@lat":"37.775",\n    "@long":"-122.419"\n   },\n   "country":{\n    "@code":"US",\n    "@name":"United States"\n   }\n  },\n  "url":"http:\\/\\/www.setlist.fm\\/venue\\/bottom-of-the-hill-san-francisco-ca-usa-3d6358b.html"\n }\n}'

Hmm, there's not much more than what we already had. OK, I may need to look for a different resource for venue information/descriptions.
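To check this more carefully than by eye, the key structure of the two payloads can be compared. Below is a minimal sketch. The helper `key_paths` is my own, the embedded dictionary is copied from the first setlist's venue, and the JSON string is a re-typed copy of the venue endpoint response shown above. The two are different venues, so only the keys are compared, not the values.

```python
import json


def key_paths(obj, prefix=''):
    """Return the set of nested key paths, e.g. 'city/coords/@lat'."""
    paths = set()
    for key, value in obj.items():
        path = prefix + key
        paths.add(path)
        if isinstance(value, dict):
            paths |= key_paths(value, path + '/')
    return paths


# Venue embedded in the first setlist (copied from the earlier output)
embedded = {
    '@id': '13d721b9',
    '@name': 'Redfest',
    'city': {
        '@id': '5780993', '@name': 'Salt Lake City',
        '@state': 'Utah', '@stateCode': 'UT',
        'coords': {'@lat': '40.7607794', '@long': '-111.8910474'},
        'country': {'@code': 'US', '@name': 'United States'},
    },
    'url': 'http://www.setlist.fm/venue/redfest-salt-lake-city-ut-usa-13d721b9.html',
}

# Response body of the venue endpoint (re-typed from the output above)
endpoint_payload = '''
{"venue": {
  "@id": "3d6358b",
  "@name": "Bottom of the Hill",
  "city": {
    "@id": "5391959", "@name": "San Francisco",
    "@state": "California", "@stateCode": "CA",
    "coords": {"@lat": "37.775", "@long": "-122.419"},
    "country": {"@code": "US", "@name": "United States"}
  },
  "url": "http://www.setlist.fm/venue/bottom-of-the-hill-san-francisco-ca-usa-3d6358b.html"
}}
'''

direct = json.loads(endpoint_payload)['venue']

# Keys the venue endpoint provides that the embedded venue lacks
extra = key_paths(direct) - key_paths(embedded)
print(sorted(extra))  # []
```

An empty result confirms that, at least for these two examples, the venue endpoint adds no fields beyond what each setlist already carries.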

## Formatting Data

<a id='Formatting Data'></a>
