# Assessment - Music Genre Prediction

In [1]:
import pandas as pd
import numpy as np

import IPython.display as ipd

In [2]:
DATADIR = 'Data/fma_metadata/'

## Purpose

I'm doing some quick exploratory data analysis (EDA) on track statistics from the large Free Music Archive (FMA) dataset.

My goal is to ascertain (relatively quickly) whether there is a useful or interesting "signal" or pattern when treating "Genre" as the target.  That is, can I use the remaining data as features to provide meaningful predictions or insight relative to genre?

## Data Import

### Concerns

This one requires some thought.

There are at least a couple "oddities" or challenges here.

First, some of these CSV files have multi-line headers.

Second, and more important, it's probably prudent to use the highlighted "small" set of 8 balanced genres.  Trouble is, they didn't publish the list of these tracks, just the set of the sound files.  But they did publish the code used to generate that list.  Let's see if I can recreate that list...

**Got it!!**

There were a couple issues.  It wouldn't quite run as is since it expected the directory with all the sound files.  And there seems to have been a reference to something no longer existing in a module.  No worries, it was easy enough to recreate that piece and confirm the counts were the same at each step.

### The Small Set

In [3]:
fma_small = pd.read_pickle(DATADIR+'fma_small.pkl', compression='gzip')

In [10]:
fma_small.head(5)

Unnamed: 0_level_0,album,album,artist,artist,artist,track,track,track,track,track,...,artist,artist,artist,artist,artist,artist,artist,track,track,popularity_measure
Unnamed: 0_level_1,id,title,id,name,website,license,tags,bit_rate,comments,composer,...,latitude,location,longitude,members,related_projects,wikipedia_page,tags,genres_all,genre_top,Unnamed: 21_level_1
track_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
42377,8295,Directionless EP,9067,Broke For Free,http://brokeforfree.com/,Creative Commons Attribution,"['trip hop', 'tracks to sync', 'stellar']",320000,37,,...,36.974117,"Santa Cruz, CA",-122.030796,Tom Cascino,,,"['video', 'broke for free']","[184, 15]",Electronic,2.0
69170,12325,Enthusiast,14206,Tours,www.thesnakerecords.com,Creative Commons Attribution,['spotify'],320000,12,,...,39.95228,"Philadelphia, Pennsylvania",-75.162454,,,,['tours'],"[495, 468, 15]",Electronic,1.221671
24425,5347,Free Beats Sel. 3,6274,Black Ant,http://b-l-a-c-k.tumblr.com/,Creative Commons Attribution,['hip-hop'],128000,7,,...,26.010403,"Hollywood, Florida",-80.160084,,,,"['black ant', 'basssss', 'hip-hop']",[21],Hip-Hop,1.114242
54159,6480,Blue,11944,Mark Neil,,Attribution-Noncommercial-Share Alike 3.0 Unit...,[],320000,8,,...,,,,,,,"['edinburgh', 'mark neil']","[18, 1235]",Instrumental,0.678481
55718,10331,The agency of missing hearts,12166,et_,http://et-official.com/,Attribution-NonCommercial-ShareAlike 3.0 Inter...,[],217755,24,,...,57.786499,"Bezhetsk, Russia",36.701,,,,['et_'],"[26, 12]",Rock,0.429342


In [12]:
fma_small['track','genre_top'].value_counts()

Experimental     1000
Electronic       1000
Folk             1000
Rock             1000
Instrumental     1000
Pop              1000
Hip-Hop          1000
International    1000
Name: (track, genre_top), dtype: int64

What a perverse collection of genres!  What in the world is "Experimental"?  Or "International"?!?

I may want to use the code provided to hard-code my desired choice of genres.  Sheesh...

But it IS balanced.
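If I do end up hard-coding my own genre list, the selection could look roughly like the sketch below. This is not the FMA authors' procedure: `balanced_subset`, the toy frame, and the flat `genre_top` column name are all made up for illustration, and it simply samples the same number of tracks per chosen genre.

```python
import pandas as pd

def balanced_subset(tracks, genre_col, genres, n_per_genre, seed=0):
    """Return up to n_per_genre randomly chosen rows for each requested genre."""
    chosen = tracks[tracks[genre_col].isin(genres)]
    parts = [
        group.sample(n=min(n_per_genre, len(group)), random_state=seed)
        for _, group in chosen.groupby(genre_col)
    ]
    return pd.concat(parts)

# Toy stand-in for the track metadata (hypothetical ids, not real FMA rows).
toy = pd.DataFrame(
    {"genre_top": ["Rock"] * 5 + ["Folk"] * 4 + ["Jazz"] * 3 + ["Pop"] * 2},
    index=pd.RangeIndex(100, 114, name="track_id"),
)
subset = balanced_subset(toy, "genre_top", ["Rock", "Folk", "Jazz"], n_per_genre=3)
print(subset["genre_top"].value_counts())
```

Capping the sample at `len(group)` keeps the helper from failing on a genre with too few tracks, at the cost of a subset that is no longer perfectly balanced in that case.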

In [13]:
fma_small.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8000 entries, 42377 to 129879
Data columns (total 51 columns):
(album, id)                    8000 non-null int64
(album, title)                 8000 non-null object
(artist, id)                   8000 non-null int64
(artist, name)                 8000 non-null object
(artist, website)              6452 non-null object
(track, license)               7995 non-null object
(track, tags)                  8000 non-null object
(track, bit_rate)              8000 non-null int64
(track, comments)              8000 non-null int64
(track, composer)              195 non-null object
(track, date_created)          8000 non-null datetime64[ns]
(track, date_recorded)         465 non-null datetime64[ns]
(track, duration)              8000 non-null int64
(track, favorites)             8000 non-null int64
(track, genres)                8000 non-null object
(track, information)           174 non-null object
(track, interest)              8000 non-null int

### The Raw Set

In [7]:
tracks = pd.read_csv(DATADIR+'raw_tracks.csv', index_col=0)
albums = pd.read_csv(DATADIR+'raw_albums.csv', index_col=0)
artists = pd.read_csv(DATADIR+'raw_artists.csv', index_col=0)
genres = pd.read_csv(DATADIR+'raw_genres.csv', index_col=0)


In [9]:
N = 5
ipd.display(tracks.head(N))
ipd.display(albums.head(N))
ipd.display(artists.head(N))
ipd.display(genres.head(N))

Unnamed: 0_level_0,album_id,album_title,album_url,artist_id,artist_name,artist_url,artist_website,license_image_file,license_image_file_large,license_parent_id,...,track_information,track_instrumental,track_interest,track_language_code,track_listens,track_lyricist,track_number,track_publisher,track_title,track_url
track_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,5.0,...,,0,4656,en,1293,,3,,Food,http://freemusicarchive.org/music/AWOL/AWOL_-_...
3,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,5.0,...,,0,1470,en,514,,4,,Electric Ave,http://freemusicarchive.org/music/AWOL/AWOL_-_...
5,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,5.0,...,,0,1933,en,1151,,6,,This World,http://freemusicarchive.org/music/AWOL/AWOL_-_...
10,6.0,Constant Hitmaker,http://freemusicarchive.org/music/Kurt_Vile/Co...,6,Kurt Vile,http://freemusicarchive.org/music/Kurt_Vile/,http://kurtvile.com,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,,...,,0,54881,en,50135,,1,,Freeway,http://freemusicarchive.org/music/Kurt_Vile/Co...
20,4.0,Niris,http://freemusicarchive.org/music/Chris_and_Ni...,4,Nicky Cook,http://freemusicarchive.org/music/Chris_and_Ni...,,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,,...,,0,978,en,361,,3,,Spiritual Level,http://freemusicarchive.org/music/Chris_and_Ni...


Unnamed: 0_level_0,album_comments,album_date_created,album_date_released,album_engineer,album_favorites,album_handle,album_image_file,album_images,album_information,album_listens,album_producer,album_title,album_tracks,album_type,album_url,artist_name,artist_url,tags
album_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
1,0,11/26/2008 01:44:45 AM,1/05/2009,,4,AWOL_-_A_Way_Of_Life,https://freemusicarchive.org/file/images/album...,"[{'image_id': '1955', 'image_file': 'https://f...",<p></p>,6073,,AWOL - A Way Of Life,7,Album,http://freemusicarchive.org/music/AWOL/AWOL_-_...,AWOL,http://freemusicarchive.org/music/AWOL/,[]
100,0,11/26/2008 01:55:44 AM,1/09/2009,,0,On_Opaque_Things,https://freemusicarchive.org/file/images/album...,"[{'image_id': '4403', 'image_file': 'https://f...",,5613,,On Opaque Things,4,Album,http://freemusicarchive.org/music/Bird_Names/O...,Bird Names,http://freemusicarchive.org/music/Bird_Names/,[]
1000,0,12/04/2008 09:28:49 AM,10/26/2008,,0,DMBQ_Live_at_2008_Record_Fair_on_WFMU_Record_F...,https://freemusicarchive.org/file/images/album...,"[{'image_id': '31997', 'image_file': 'https://...",<p>http://blog.wfmu.org/freeform/2008/10/what-...,1092,,DMBQ Live at 2008 Record Fair on WFMU Record F...,4,Live Performance,http://freemusicarchive.org/music/DMBQ/DMBQ_Li...,DMBQ,http://freemusicarchive.org/music/DMBQ/,[]
10000,0,9/05/2011 04:42:57 PM,,,0,Live_at_CKUT_on_Montreal_Sessions_1434,https://freemusicarchive.org/file/images/album...,"[{'image_id': '12266', 'image_file': 'https://...",<p>Live Set on the Montreal Session February 2...,1001,,Live at CKUT on Montreal Sessions,1,Radio Program,http://freemusicarchive.org/music/Sundrips/Liv...,Sundrips,http://freemusicarchive.org/music/Sundrips/,[]
10001,0,9/06/2011 12:02:58 AM,1/01/2006,,0,Grounds_Dream_Cosmic_Love,https://freemusicarchive.org/file/images/album...,"[{'image_id': '24091', 'image_file': 'https://...","<p>Recorded in Linnavuori, Finland, 2005 (with...",504,,Ground's Dream Cosmic Love,1,Album,http://freemusicarchive.org/music/Uton/Grounds...,Uton,http://freemusicarchive.org/music/Uton/,[]


Unnamed: 0_level_0,artist_active_year_begin,artist_active_year_end,artist_associated_labels,artist_bio,artist_comments,artist_contact,artist_date_created,artist_donation_url,artist_favorites,artist_flattr_name,...,artist_location,artist_longitude,artist_members,artist_name,artist_paypal_name,artist_related_projects,artist_url,artist_website,artist_wikipedia_page,tags
artist_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,2006.0,,,"<p>A Way Of Life, A Collective of Hip-Hop from...",0,Brown Bum aka Choke,11/26/2008 01:42:32 AM,,9,,...,New Jersey,-74.405661,"Sajje Morocco,Brownbum,ZawidaGod,Custodian of ...",AWOL,,The list of past projects is 2 long but every1...,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,,['awol']
10,,,"Mistletone, Marriage Records","<p>""Lucky Dragons"" means any recorded or perfo...",3,Lukey Dargons,11/26/2008 01:43:35 AM,http://glaciersofnice.com/shop/,111,,...,"Los Angeles, CA",-118.243685,Luke Fischbeck\nSarah Rara,Lucky Dragons,,,http://freemusicarchive.org/music/Lucky_Dragons/,http://hawksandsparrows.org/,,['lucky dragons']
100,2004.0,,"Captcha Records (HBSP-2X), Pickled Egg (Europe)","<p><span style=""font-family:Verdana, Geneva, A...",1,Chris Kalis,11/26/2008 02:05:22 AM,,8,,...,"Chicago, IL",-87.629798,"Chris Kalis, Harry Brenner, Scott McGaughey, B...",Chandeliers,,"Killer Whales, \nMichael Columbia\nMandate\nMr...",http://freemusicarchive.org/music/Chandeliers/,thechandeliers.com,,['chandeliers']
1000,,,,"<p><a href=""http://marzipanmarzipan.com"">Marzi...",0,,12/04/2008 09:24:35 AM,,0,,...,,12.56738,,Marzipan Marzipan,,,http://freemusicarchive.org/music/Marzipan_Mar...,https://soundcloud.com/marzipanmarzipan,,[]
10000,,,,"<p><span style=""font-family:'Times New Roman',...",0,,1/21/2011 02:11:31 PM,,1,,...,,,Jack Hertz\nPHOBoS\nBlue Hell,"Jack Hertz, PHOBoS, Blue Hell",,,http://freemusicarchive.org/music/Jack_Hertz_P...,http://surrism.phonoethics.com/surrism-phonoet...,,['jack hertz phobos blue hell']


Unnamed: 0_level_0,genre_color,genre_handle,genre_parent_id,genre_title
genre_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,#006666,Avant-Garde,38.0,Avant-Garde
2,#CC3300,International,,International
3,#000099,Blues,,Blues
4,#990099,Jazz,,Jazz
5,#8A8A65,Classical,,Classical


OK. It seems clear that I no longer need anything from these raw files.

I'm leaving the output here, but I won't be running these cells again.

### Filtering the features

There are a couple of data files that hold values on a per-track basis.

I need to filter these down to just the subsets matching the "small" set.

Furthermore, this should be straightforward enough that I could repeat it with a different small set.

In [4]:
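# The file has three header rows; the fourth line holds only the index
# name ("track_id"), so skip it and rebuild the index by hand.
features = pd.read_csv(DATADIR+'features.csv', header=[0,1,2], skiprows=[3])
features['track_id'] = features.iloc[:,0]  # copy the ids out of the first column
features.iloc[:,0] = ''                    # blank the original id column
features.set_index('track_id', inplace=True)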
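
The cell above leaves a blanked-out placeholder column behind. A possibly simpler route is to let `read_csv` take the first column as the index directly. The sketch below uses a tiny inline stand-in for the file (made-up numbers) and assumes the line after the three header rows carries only the index name.

```python
import io
import pandas as pd

# Tiny stand-in with the same layout as the feature file:
# three header rows, then a row holding only the index name.
csv_text = """feature,chroma_cens,chroma_cens,zcr
statistics,kurtosis,kurtosis,mean
number,01,02,01
track_id,,,
2,7.18,5.23,0.085
3,1.88,0.76,0.084
"""

# header=[0, 1, 2] builds three-level column labels; index_col=0 turns
# the first column into the index and picks up its name from line four.
demo = pd.read_csv(io.StringIO(csv_text), header=[0, 1, 2], index_col=0)
print(demo.index.name)
print(demo.columns.nlevels)
print(demo.shape)
```

With this form there is no leftover column to blank out, and `skiprows` is not needed.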

In [5]:
features.head()

Unnamed: 0_level_0,feature,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,...,tonnetz,tonnetz,tonnetz,zcr,zcr,zcr,zcr,zcr,zcr,zcr
Unnamed: 0_level_1,statistics,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,...,std,std,std,kurtosis,max,mean,median,min,skew,std
Unnamed: 0_level_2,number,01,02,03,04,05,06,07,08,09,...,04,05,06,01,01,01,01,01,01,01
track_id,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3
2,,7.180653,5.230309,0.249321,1.34762,1.482478,0.531371,1.481593,2.691455,0.866868,...,0.054125,0.012226,0.012111,5.75889,0.459473,0.085629,0.071289,0.0,2.089872,0.061448
3,,1.888963,0.760539,0.345297,2.295201,1.654031,0.067592,1.366848,1.054094,0.108103,...,0.063831,0.014212,0.01774,2.824694,0.466309,0.084578,0.063965,0.0,1.716724,0.06933
5,,0.527563,-0.077654,-0.27961,0.685883,1.93757,0.880839,-0.923192,-0.927232,0.666617,...,0.04073,0.012691,0.014759,6.808415,0.375,0.053114,0.041504,0.0,2.193303,0.044861
10,,3.702245,-0.291193,2.196742,-0.234449,1.367364,0.998411,1.770694,1.604566,0.521217,...,0.074358,0.017952,0.013921,21.434212,0.452148,0.077515,0.071777,0.0,3.542325,0.0408
20,,-0.193837,-0.198527,0.201546,0.258556,0.775204,0.084794,-0.289294,-0.81641,0.043851,...,0.095003,0.022492,0.021355,16.669037,0.469727,0.047225,0.040039,0.000977,3.189831,0.030993


In [17]:
features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 106574 entries, 0 to 106573
Columns: 519 entries, (feature, statistics, number, track_id) to (zcr, std, 01, Unnamed: 518_level_3)
dtypes: float64(518), int64(1)
memory usage: 422.0 MB


In [12]:
features = features.loc[fma_small.index]
features.to_pickle(DATADIR+'features_small.pkl', compression='gzip')

In [13]:
# Same header handling as for features.csv above.
echonest = pd.read_csv(DATADIR+'echonest.csv', header=[0,1,2], skiprows=[3])
echonest['track_id'] = echonest.iloc[:,0]
echonest.iloc[:,0] = ''
echonest.set_index('track_id', inplace=True)

In [22]:
echonest.head()

Unnamed: 0_level_0,Unnamed: 0_level_0,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest,echonest
Unnamed: 0_level_1,Unnamed: 0_level_1,audio_features,audio_features,audio_features,audio_features,audio_features,audio_features,audio_features,audio_features,metadata,...,temporal_features,temporal_features,temporal_features,temporal_features,temporal_features,temporal_features,temporal_features,temporal_features,temporal_features,temporal_features
Unnamed: 0_level_2,Unnamed: 0_level_2,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,album_date,...,214,215,216,217,218,219,220,221,222,223
track_id,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3
42377,,0.688874,0.878591,0.562876,0.855447,0.105042,0.063445,128.999,0.404459,,...,-0.366513,-1.17398,0.244819,0.23147,0.02606,0.06635,2.46376,2.39741,7.345682,80.701736
69170,,,,,,,,,,,...,,,,,,,,,,
24425,,0.06576,0.786532,0.308971,0.810298,0.111487,0.160791,90.06,0.486769,,...,-3.536418,22.815329,0.304779,0.27361,0.447004,0.06014,14.56127,14.50113,20.668648,437.816803
54159,,,,,,,,,,,...,,,,,,,,,,
55718,,0.000672,0.512246,0.861985,0.874162,0.860731,0.034535,140.042,0.309477,2010.0,...,-1.98377,4.667251,0.257487,0.21492,0.02633,0.06458,1.28127,1.21669,1.81321,4.459955


In [15]:
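# Not every track in the small set has Echo Nest data. reindex() keeps those
# tracks as NaN rows; recent pandas raises a KeyError for .loc with missing labels.
echonest = echonest.reindex(fma_small.index)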
echonest.to_pickle(DATADIR+'echonest_small.pkl', compression='gzip')

### Load from pickled

For convenience, as needed, just hop here and load the pickled data.

In [18]:
df       = pd.read_pickle(DATADIR+'fma_small.pkl', compression='gzip')
features = pd.read_pickle(DATADIR+'features_small.pkl', compression='gzip')
echonest = pd.read_pickle(DATADIR+'echonest_small.pkl', compression='gzip')


### TODO

I need to clean up these goofy multi-line headers and possibly merge these dataframes.
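
One way this TODO might go, sketched on toy frames: join each multi-level column label into a single string, then join the frames on their shared `track_id` index. The helper name `flatten_columns`, the underscore separator, and the toy values are my own choices for illustration.

```python
import pandas as pd

def flatten_columns(frame, sep="_"):
    """Return a copy whose MultiIndex column labels are joined into strings."""
    flat = frame.copy()
    flat.columns = [
        sep.join(str(part) for part in col if str(part) != "")
        for col in frame.columns
    ]
    return flat

# Toy frames shaped like the metadata and feature tables (made-up values).
ids = pd.Index([1, 2], name="track_id")
meta_toy = pd.DataFrame(
    [["Rock"], ["Folk"]],
    index=ids,
    columns=pd.MultiIndex.from_tuples([("track", "genre_top")]),
)
features_toy = pd.DataFrame(
    [[0.02, 0.01], [0.08, 0.05]],
    index=ids,
    columns=pd.MultiIndex.from_tuples([("zcr", "mean", "01"), ("zcr", "std", "01")]),
)

merged = flatten_columns(meta_toy).join(flatten_columns(features_toy))
print(merged.columns.tolist())
```

Flat string labels are also easier to carry into tools that do not understand multi-level headers.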

## Exploratory Data Analysis

In [20]:
features.head()

Unnamed: 0_level_0,feature,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,chroma_cens,...,tonnetz,tonnetz,tonnetz,zcr,zcr,zcr,zcr,zcr,zcr,zcr
Unnamed: 0_level_1,statistics,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,kurtosis,...,std,std,std,kurtosis,max,mean,median,min,skew,std
Unnamed: 0_level_2,number,01,02,03,04,05,06,07,08,09,...,04,05,06,01,01,01,01,01,01,01
track_id,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3
42377,,-0.304694,-0.887206,-1.261704,-1.278487,3.638612,0.491679,-1.411359,-1.052128,-1.105154,...,0.044658,0.016154,0.017871,3.919198,0.179199,0.023434,0.013184,0.001465,1.861265,0.024401
69170,,-0.288247,0.175121,-0.661704,0.036623,-0.803657,-0.164536,-0.456644,-0.379067,0.222808,...,0.098469,0.03834,0.029368,0.347386,0.297363,0.085495,0.069824,0.001953,0.984716,0.055621
24425,,-0.849776,1.349135,2.915025,-0.045227,-0.768995,-0.914903,1.046403,5.155852,0.612009,...,0.120301,0.023772,0.031203,14.818503,0.132812,0.012246,0.008301,0.0,3.291799,0.015909
54159,,0.020482,-0.239337,-0.828526,-0.033525,0.228165,-1.083601,-0.927419,0.162571,-0.18318,...,0.066597,0.019572,0.02055,26.374187,0.181152,0.015719,0.013672,0.001465,3.734334,0.010352
55718,,-0.867856,-1.00815,2.155904,-0.164587,-0.389156,-0.404408,-0.675246,-1.165047,-0.570633,...,0.110615,0.023322,0.024289,17.184685,0.695312,0.056382,0.046875,0.001953,2.864365,0.044526


In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8000 entries, 42377 to 129879
Data columns (total 51 columns):
(album, id)                    8000 non-null int64
(album, title)                 8000 non-null object
(artist, id)                   8000 non-null int64
(artist, name)                 8000 non-null object
(artist, website)              6452 non-null object
(track, license)               7995 non-null object
(track, tags)                  8000 non-null object
(track, bit_rate)              8000 non-null int64
(track, comments)              8000 non-null int64
(track, composer)              195 non-null object
(track, date_created)          8000 non-null datetime64[ns]
(track, date_recorded)         465 non-null datetime64[ns]
(track, duration)              8000 non-null int64
(track, favorites)             8000 non-null int64
(track, genres)                8000 non-null object
(track, information)           174 non-null object
(track, interest)              8000 non-null int

In [40]:
with pd.option_context('display.max_rows', None):
    ipd.display(
        pd.concat([
            pd.get_dummies(df[('track','genre_top')]),
            features
        ],axis=1
        ).corr()
    )

Unnamed: 0,Electronic,Experimental,Folk,Hip-Hop,Instrumental,International,Pop,Rock,"(chroma_cens, kurtosis, 01)","(chroma_cens, kurtosis, 02)",...,"(tonnetz, std, 04)","(tonnetz, std, 05)","(tonnetz, std, 06)","(zcr, kurtosis, 01)","(zcr, max, 01)","(zcr, mean, 01)","(zcr, median, 01)","(zcr, min, 01)","(zcr, skew, 01)","(zcr, std, 01)"
Electronic,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,0.003738,-0.008584,...,-0.071869,-0.043146,-0.047878,-0.054487,0.080204,0.048369,-0.015825,-0.103478,-0.06422,0.153693
Experimental,-0.142857,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,0.034901,0.0917,...,-0.129275,-0.040718,-0.026481,0.031247,-0.034923,0.090152,0.097525,0.02853,0.000269,0.028001
Folk,-0.142857,-0.142857,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,0.007723,-0.020359,...,0.243497,0.215825,0.203829,0.064938,-0.061234,-0.197127,-0.175349,0.006253,0.132587,-0.128526
Hip-Hop,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.005957,0.009134,...,-0.177153,-0.172596,-0.159164,-0.085572,0.143866,0.093643,0.01293,-0.101589,-0.083435,0.214114
Instrumental,-0.142857,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.142857,-0.142857,-0.017885,-0.043836,...,0.233043,0.282035,0.274963,0.131872,-0.120761,-0.216097,-0.191052,0.013967,0.154382,-0.141248
International,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.142857,-0.02693,-0.030199,...,-0.055282,-0.110185,-0.109662,-0.047749,0.006109,0.081675,0.099441,0.151879,-0.070274,-0.03021
Pop,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.034641,-0.025401,...,0.0398,0.009593,0.000112,-0.006682,0.046249,0.005272,0.013448,-0.063018,0.01488,-0.002651
Rock,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,1.0,0.039052,0.027545,...,-0.082762,-0.140808,-0.135719,-0.033567,-0.059509,0.094112,0.158883,0.067455,-0.084189,-0.093172
"(chroma_cens, kurtosis, 01)",0.003738,0.034901,0.007723,-0.005957,-0.017885,-0.02693,-0.034641,0.039052,1.0,0.333799,...,-0.182373,-0.15199,-0.1366,-0.008573,-0.028872,0.033587,0.045052,-0.002953,-0.026875,-0.009319
"(chroma_cens, kurtosis, 02)",-0.008584,0.0917,-0.020359,0.009134,-0.043836,-0.030199,-0.025401,0.027545,0.333799,1.0,...,-0.196685,-0.160046,-0.144811,-0.009148,-0.030996,0.058468,0.068913,0.014279,-0.042154,-0.003265


In [41]:
with pd.option_context('display.max_rows', None):
    ipd.display(
        pd.concat([
            pd.get_dummies(df[('track','genre_top')]),
            echonest
        ],axis=1
        ).corr()
    )

Unnamed: 0,Electronic,Experimental,Folk,Hip-Hop,Instrumental,International,Pop,Rock,"(echonest, audio_features, acousticness)","(echonest, audio_features, danceability)",...,"(echonest, temporal_features, 214)","(echonest, temporal_features, 215)","(echonest, temporal_features, 216)","(echonest, temporal_features, 217)","(echonest, temporal_features, 218)","(echonest, temporal_features, 219)","(echonest, temporal_features, 220)","(echonest, temporal_features, 221)","(echonest, temporal_features, 222)","(echonest, temporal_features, 223)"
Electronic,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.225507,0.137061,...,-0.006453,0.002966,-0.175331,-0.196304,-0.025652,0.078236,0.035387,0.035205,0.113689,0.128479
Experimental,-0.142857,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,0.019177,0.035525,...,-0.011338,0.001471,0.013598,0.000524,-0.00612,0.030043,-0.01363,-0.013701,-7.6e-05,-0.018392
Folk,-0.142857,-0.142857,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,0.412454,-0.294444,...,0.143783,-0.089686,0.346892,0.34165,0.097325,-0.064308,0.120607,0.120759,0.009762,0.001036
Hip-Hop,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.142857,-0.142857,-0.142857,-0.293513,0.357631,...,0.003066,0.017897,-0.312689,-0.268216,-0.107717,-0.024614,-0.184397,-0.184343,-0.111896,-0.058958
Instrumental,-0.142857,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.142857,-0.142857,0.131912,0.01679,...,0.013197,-0.033719,0.040975,0.016365,0.080407,-0.026513,0.06245,0.062513,0.034697,0.017525
International,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.142857,0.247187,-0.030724,...,0.08696,-0.031157,-0.171193,-0.133596,-0.074335,-0.055506,-0.07936,-0.079232,0.013903,0.031147
Pop,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,1.0,-0.142857,-0.081477,0.128643,...,0.013245,-0.058284,0.031667,0.023093,0.025709,-0.010662,0.013622,0.013647,-0.005854,-0.039851
Rock,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,-0.142857,1.0,-0.136422,-0.27577,...,-0.231937,0.17066,0.154738,0.130003,0.015133,0.084767,0.043909,0.043712,-0.012742,-0.035434
"(echonest, audio_features, acousticness)",-0.225507,0.019177,0.412454,-0.293513,0.131912,0.247187,-0.081477,-0.136422,1.0,-0.257567,...,0.211417,-0.155337,0.285862,0.274421,0.133761,-0.009861,0.073904,0.073928,-0.062697,-0.084717
"(echonest, audio_features, danceability)",0.137061,0.035525,-0.294444,0.357631,0.01679,-0.030724,0.128643,-0.27577,-0.257567,1.0,...,0.288992,-0.236777,-0.490027,-0.369749,-0.190202,0.021947,-0.152127,-0.152181,0.125653,0.154411


## Summary

I don't have a clue how to interpret any of this!
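
One part of the matrices above can be read off directly: the constant -0.142857 between every pair of genre columns. With 8 balanced classes each dummy has mean p = 1/8 and variance p(1 - p), and two different dummies are never 1 on the same row, so their covariance is -p². The correlation is therefore -p/(1 - p) = -1/7, which says nothing about the music itself. A quick check on synthetic labels (not the FMA data):

```python
import numpy as np
import pandas as pd

n_classes, per_class = 8, 1000

# Synthetic balanced labels 0..7, 1000 rows each.
labels = pd.Series(np.repeat(np.arange(n_classes), per_class))
dummies = pd.get_dummies(labels).astype(float)
corr = dummies.corr()

# Every off-diagonal entry should equal -1 / (n_classes - 1).
off_diagonal = corr.values[~np.eye(n_classes, dtype=bool)]
expected = -1 / (n_classes - 1)
print(off_diagonal.min(), off_diagonal.max(), expected)
```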

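Most of these correlations are weak (in the portion shown, nothing much beyond |r| = 0.4).  But the patterns do differ by genre: acousticness, for example, correlates at about 0.41 with Folk and -0.29 with Hip-Hop, while danceability correlates at about 0.36 with Hip-Hop and -0.29 with Folk.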
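
Rather than scanning the full matrix by eye, the strongest per-genre correlations could be pulled out with something like the sketch below. The helper `top_correlated` and the toy data are hypothetical, and flat column names are assumed, so the multi-level headers would need flattening first.

```python
import numpy as np
import pandas as pd

def top_correlated(features, labels, k=3):
    """For each class, list the k features with the largest |Pearson r|."""
    dummies = pd.get_dummies(labels).astype(float)
    numeric = features.select_dtypes("number")
    rows = []
    for genre in dummies.columns:
        r = numeric.corrwith(dummies[genre])
        best = r.abs().sort_values(ascending=False).index[:k]
        for name in best:
            rows.append({"genre": genre, "feature": name, "r": r[name]})
    return pd.DataFrame(rows)

# Toy data: "signal" tracks the Folk label closely, "noise" is unrelated.
rng = np.random.default_rng(0)
labels = pd.Series(["Folk", "Rock"] * 50)
toy = pd.DataFrame({
    "signal": (labels == "Folk").astype(float) + rng.normal(0, 0.1, 100),
    "noise": rng.normal(0, 1, 100),
})

ranking = top_correlated(toy, labels, k=1)
print(ranking)
```

Ranking by absolute value keeps strong negative correlations in view, since those are as informative for telling genres apart as positive ones.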

I need to do more exploration to come to a better understanding of what this data represents.

Nonetheless, this data seems to be at a point where I could push it into PostgreSQL and work with it there.
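
A rough idea of what that push could look like, using an in-memory SQLite database as a stand-in so the sketch runs without a server. For PostgreSQL I would expect to pass a SQLAlchemy engine to `to_sql` instead of the `sqlite3` connection. The table name `tracks_small` and the toy rows are made up, and the column labels are assumed to be flattened to plain strings already.

```python
import sqlite3
import pandas as pd

# Toy frame with flat column names (made-up values, hypothetical ids).
toy = pd.DataFrame(
    {"track_genre_top": ["Rock", "Folk"], "zcr_mean_01": [0.05, 0.02]},
    index=pd.Index([2, 1], name="track_id"),
)

conn = sqlite3.connect(":memory:")
try:
    # index=True writes track_id as an ordinary column of the table.
    toy.to_sql("tracks_small", conn, if_exists="replace", index=True)
    back = pd.read_sql("SELECT * FROM tracks_small ORDER BY track_id", conn)
finally:
    conn.close()

print(back)
```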

I will bump this down in priority but continue with it.