# Music analysis

Spotify provides a [web API](https://developer.spotify.com/web-api/) that can be used to download data about its music. This data includes the [*audio features*](https://developer.spotify.com/web-api/object-model/#audio-features-object) of a track, a set of measures including 'acousticness', 'danceability', 'speechiness' and 'valence'. Spotify describes valence as follows:
>A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

In this exercise, we shall analyse data about music from different genres, pulled from the Spotify web API using the Python [spotipy](http://spotipy.readthedocs.io) library. The exercise was inspired by [this blog post](http://rcharlie.com/2017-02-16-fitteR-happieR/), which used R to try to identify the most depressing Radiohead songs.
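The notebook does not show how the data was downloaded, so the following is only a rough sketch of one way it could be done. It assumes `sp` is a `spotipy.Spotify` client and uses its `playlist_items` and `audio_features` methods; the helper name `playlist_rows` and the choice of columns are illustrative, not the author's actual script. Whether the audio-features endpoint is available also depends on your Spotify app's access.

```python
def playlist_rows(sp, playlist_id):
    """Return one dict per track, merging track metadata with audio features.

    `sp` is assumed to be a spotipy.Spotify client (or any object exposing
    playlist_items and audio_features with the same call shape).
    """
    # Only the first page (up to 100 tracks) is fetched in this sketch.
    page = sp.playlist_items(playlist_id, limit=100)
    tracks = [item['track'] for item in page['items'] if item.get('track')]

    # audio_features accepts a list of track ids (up to 100 per call).
    features = sp.audio_features([t['id'] for t in tracks])

    rows = []
    for track, feats in zip(tracks, features):
        if feats is None:  # no audio features returned for this track
            continue
        row = {
            'album': track['album']['name'],
            'artists': ', '.join(a['name'] for a in track['artists']),
            'name': track['name'],
            'popularity': track['popularity'],
        }
        row.update(feats)  # adds danceability, energy, valence, ...
        rows.append(row)
    return rows

# Hypothetical usage (needs spotipy installed and API credentials configured):
#   import spotipy
#   from spotipy.oauth2 import SpotifyClientCredentials
#   sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
#   pd.DataFrame(playlist_rows(sp, '<playlist id>')).to_csv('pop.csv')
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before touching the real API.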

In [2]:
import pandas as pd
import plotly
plotly.offline.init_notebook_mode(connected=True)
import plotly.plotly as py
import plotly.graph_objs as go
import plotly.tools as tls

Let's first import our data. These CSV files were created by downloading the track data for Spotify playlists from the respective genres.

In [3]:
dfs = {'indie': pd.read_csv('indie.csv'), 'pop': pd.read_csv('pop.csv'), 'country': pd.read_csv('country.csv'), 
       'metal': pd.read_csv('metal.csv'), 'house': pd.read_csv('house.csv'), 'rap': pd.read_csv('rap.csv')}

Let's look at some of this data:

In [40]:
dfs['pop']

Unnamed: 0.1,Unnamed: 0,album,artists,duration_ms,explicit,href,id,name,popularity,preview_url,...,liveness,loudness,mode,speechiness,tempo,time_signature,track_href,type.1,uri.1,valence
0,1,All Your Fault: Pt. 1,"Bebe Rexha, Ty Dolla $ign",197253,True,https://api.spotify.com/v1/tracks/4ZJPwET9Jrgp...,4ZJPwET9Jrgpkqi4Vo3Yg8,Bad Bitch (feat. Ty Dolla $ign),85,https://p.scdn.co/mp3-preview/9284a8a94b8c16ef...,...,0.2340,-6.252,1,0.0539,139.910,4,https://api.spotify.com/v1/tracks/4ZJPwET9Jrgp...,audio_features,spotify:track:4ZJPwET9Jrgpkqi4Vo3Yg8,0.364
1,2,Good Life (with G-Eazy & Kehlani),"G-Eazy, Kehlani",225525,False,https://api.spotify.com/v1/tracks/1Eck97uRMlpr...,1Eck97uRMlprKOOJN9oO1E,Good Life (with G-Eazy & Kehlani),83,https://p.scdn.co/mp3-preview/88d8d456dcf9b55e...,...,0.0568,-5.220,1,0.2120,168.385,4,https://api.spotify.com/v1/tracks/1Eck97uRMlpr...,audio_features,spotify:track:1Eck97uRMlprKOOJN9oO1E,0.551
2,3,13 Reasons Why (A Netflix Original Series Soun...,Lord Huron,206933,False,https://api.spotify.com/v1/tracks/3FsBtu3gdlfZ...,3FsBtu3gdlfZjBLXyDvmj1,The Night We Met,32,,...,0.6390,-9.560,1,0.0378,87.024,3,https://api.spotify.com/v1/tracks/3FsBtu3gdlfZ...,audio_features,spotify:track:3FsBtu3gdlfZjBLXyDvmj1,0.117
3,4,Obsession (feat. Jon Bellion),"Vice, Jon Bellion",221982,False,https://api.spotify.com/v1/tracks/542Xd5qDeLBv...,542Xd5qDeLBvgXZXhfW7LE,Obsession (feat. Jon Bellion),84,https://p.scdn.co/mp3-preview/16ab1dd04110aa3f...,...,0.2100,-7.775,1,0.0300,101.999,4,https://api.spotify.com/v1/tracks/542Xd5qDeLBv...,audio_features,spotify:track:542Xd5qDeLBvgXZXhfW7LE,0.441
4,5,Slow Hands,Niall Horan,188174,False,https://api.spotify.com/v1/tracks/27vTihlWXiz9...,27vTihlWXiz9f9lJM3XGVU,Slow Hands,83,,...,0.0574,-6.623,1,0.0519,85.899,4,https://api.spotify.com/v1/tracks/27vTihlWXiz9...,audio_features,spotify:track:27vTihlWXiz9f9lJM3XGVU,0.874
5,6,So Good,"Clean Bandit, Zara Larsson",214866,False,https://api.spotify.com/v1/tracks/4SPLWgCPoKwU...,4SPLWgCPoKwULz2UTM8TKg,Symphony,41,,...,0.2340,-4.699,0,0.0429,122.948,4,https://api.spotify.com/v1/tracks/4SPLWgCPoKwU...,audio_features,spotify:track:4SPLWgCPoKwULz2UTM8TKg,0.470
6,7,Memories...Do Not Open,The Chainsmokers,207520,True,https://api.spotify.com/v1/tracks/6cPyTS0Kk2sc...,6cPyTS0Kk2sc4xQwC93kOg,Break Up Every Night,84,https://p.scdn.co/mp3-preview/119d2078bf607422...,...,0.0872,-5.957,1,0.0437,149.999,4,https://api.spotify.com/v1/tracks/6cPyTS0Kk2sc...,audio_features,spotify:track:6cPyTS0Kk2sc4xQwC93kOg,0.536
7,8,I'm the One,"DJ Khaled, Justin Bieber, Quavo, Chance The Ra...",288876,True,https://api.spotify.com/v1/tracks/72Q0FQQo32KJ...,72Q0FQQo32KJloivv5xge2,I'm the One,100,https://p.scdn.co/mp3-preview/f6fdecfbaae1ed54...,...,0.1340,-4.267,1,0.0367,80.984,4,https://api.spotify.com/v1/tracks/72Q0FQQo32KJ...,audio_features,spotify:track:72Q0FQQo32KJloivv5xge2,0.811
8,9,Attention,Charlie Puth,211475,False,https://api.spotify.com/v1/tracks/4iLqG9SeJSnt...,4iLqG9SeJSnt0cSPICSjxv,Attention,94,https://p.scdn.co/mp3-preview/e20bdb50a10a7c5a...,...,0.0848,-4.432,0,0.0432,100.041,4,https://api.spotify.com/v1/tracks/4iLqG9SeJSnt...,audio_features,spotify:track:4iLqG9SeJSnt0cSPICSjxv,0.758
9,10,The Cure,Lady Gaga,211363,False,https://api.spotify.com/v1/tracks/51PIvodunv6N...,51PIvodunv6NmX5250zxAh,The Cure,88,,...,0.0869,-4.842,1,0.0356,99.977,4,https://api.spotify.com/v1/tracks/51PIvodunv6N...,audio_features,spotify:track:51PIvodunv6NmX5250zxAh,0.539


We can see that each dataframe contains 50-200 tracks, and for each track we have a variety of data.
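To check the number of tracks per genre directly, one option is to stack the per-genre frames into a single frame with a `genre` column and count the rows in each group. The sketch below uses tiny stand-in frames with made-up values so that it runs on its own; the helper name `combine_genres` is an assumption, and in the notebook you would pass the real `dfs` dictionary instead.

```python
import pandas as pd

def combine_genres(dfs):
    """Stack the per-genre frames into one frame with a 'genre' column."""
    return pd.concat(
        [df.assign(genre=genre) for genre, df in dfs.items()],
        ignore_index=True,
    )

# Stand-in data (made up for illustration, not taken from the CSV files).
demo = {
    'pop': pd.DataFrame({'name': ['a', 'b'], 'valence': [0.4, 0.6]}),
    'metal': pd.DataFrame({'name': ['c'], 'valence': [0.2]}),
}

tracks = combine_genres(demo)
counts = tracks.groupby('genre').size()  # number of tracks per genre
print(counts)
```

A single long frame like `tracks` is also convenient for grouped summaries later on.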

Now let's try investigating this data with some plots. First, let's plot the energy against the danceability. 

In [34]:
fig = tls.make_subplots(rows=1, cols=1)

for name, df in dfs.items():

    t = go.Scatter(x=df.danceability, y=df.energy, mode='markers', 
                       name=name, text=df.name + ' - ' + df.artists)
    fig.append_trace(t, 1, 1)

fig['layout']['xaxis1'].update(title='Danceability')
fig['layout']['yaxis1'].update(title='Energy')
fig['layout'].update(hovermode='closest')
plotly.offline.iplot(fig)




This is pretty interesting: if you select just the metal, country and house datasets, you can see that the data form three fairly distinct clusters. Surprisingly, the rap and house datasets occupy a similar region of the plot. The metal dataset is the most tightly clustered: all of its tracks are high energy but not very danceable.
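These visual impressions can be checked numerically. A minimal sketch, assuming the per-genre mean is a reasonable stand-in for the cluster centre and the standard deviation for its spread: the helper name `cluster_stats` is hypothetical, and the numbers below are toy values chosen for illustration, not results from the real data.

```python
import pandas as pd

def cluster_stats(dfs, features=('danceability', 'energy')):
    """Per-genre mean (cluster centre) and standard deviation (spread)."""
    cols = list(features)
    tracks = pd.concat(
        [df[cols].assign(genre=genre) for genre, df in dfs.items()],
        ignore_index=True,
    )
    return tracks.groupby('genre')[cols].agg(['mean', 'std'])

# Toy values (made up): a tight 'metal' cluster and a looser 'house' cluster.
demo = {
    'metal': pd.DataFrame({'danceability': [0.30, 0.32, 0.34],
                           'energy': [0.95, 0.96, 0.97]}),
    'house': pd.DataFrame({'danceability': [0.60, 0.75, 0.90],
                           'energy': [0.50, 0.70, 0.90]}),
}

stats = cluster_stats(demo)
print(stats)
```

With the real `dfs`, a genre whose `std` values are small on both features would correspond to a tight cluster in the scatter plot.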

Let's push our plotly skills further to see if there is any interesting behaviour in the other audio feature categories:

In [39]:
# make a list of traces (one per genre) from a dictionary of dataframes,
# plotting column x against column y
def make_traces(x, y, dfs):
    ts = []
    for name, df in dfs.items():
        ts.append(go.Scatter(x=df[x], y=df[y], mode='markers',
                             name=name, text=df['name'] + ' - ' + df['artists']))
    return ts

data = dict()

# define which categories we want to include
categories = ['duration_ms', 'popularity', 'acousticness', 'danceability', 
       'energy', 'instrumentalness', 'key', 'liveness', 'loudness',
       'mode', 'speechiness', 'tempo', 'time_signature', 'valence']

# for each category make a list of the data from each dataframe
for cat in categories:
    data[cat] = [df[cat] for df in dfs.values()]

# define behaviour of dropdown menus
# we've defined the buttons using list comprehensions - on selection, the x/y data and axis label are updated
updatemenus = list([
    dict(
         x=-0.05,
         y=0.8,
         buttons=list([   
            dict(label = cat,
                 method = 'update',
                 args = [{'x': data[cat]},
                         {'xaxis': dict(title = cat)}]) for cat in categories
        ])
    ),
    dict(
         x=-0.05,
         y=1,
         buttons=list([   
            dict(label = cat,
                 method = 'update',
                 args = [{'y': data[cat]},
                         {'yaxis': dict(title = cat)}]) for cat in categories
        ])
    )
])

# set the initial data (a plain list of traces; go.Data is deprecated in newer plotly versions)
initial_dat = make_traces('duration_ms', 'duration_ms', dfs)

# make the layout
layout = dict(title='Compare genres', showlegend=True,
              updatemenus=updatemenus)

fig = dict(data=initial_dat, layout=layout)
fig['layout'].update(hovermode='closest')
plotly.offline.iplot(fig)
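The two dropdown definitions above differ only in the axis they control, so the button lists could be built by a small helper. The sketch below uses plain dictionaries and made-up numbers so it runs without plotly; the helper name `axis_buttons` is an assumption. With the `'update'` method, the first element of `args` is applied to the traces and the second to the layout, and the data list holds one entry per trace.

```python
def axis_buttons(axis, categories, data):
    """Build one 'update' button per category for the given axis ('x' or 'y')."""
    return [
        dict(label=cat,
             method='update',
             # first dict updates the traces, second dict updates the layout
             args=[{axis: data[cat]},
                   {axis + 'axis': dict(title=cat)}])
        for cat in categories
    ]

# Made-up stand-in data: one inner list per trace (i.e. per genre).
categories = ['energy', 'valence']
data = {'energy': [[0.9, 0.8], [0.4]],
        'valence': [[0.2, 0.3], [0.7]]}

updatemenus = [
    dict(x=-0.05, y=0.8, buttons=axis_buttons('x', categories, data)),
    dict(x=-0.05, y=1, buttons=axis_buttons('y', categories, data)),
]
```

In the notebook, `data` and `categories` would be the objects defined in the cell above, and `updatemenus` would be passed to the layout unchanged.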