# Reference

In [6]:
import pandas as pd
import seaborn as sns

In [3]:
filePath = "data-processed/tracks-tags.csv"
df = pd.read_csv(filePath, header=[0,1], index_col=0)
df.head()

| track_id | (artist, name) | (album, title) | (track, title) | (track, genres) | (track, genres_all) | (track, tags) |
| --- | --- | --- | --- | --- | --- | --- |
| 137 | Airway | Live at LACE | Side A | [1, 32] | [32, 1, 38] | ['lafms'] |
| 138 | Airway | Live at LACE | Side B | [1, 32] | [32, 1, 38] | ['lafms'] |
| 850 | Human Host | Exploding Demon | Tomb Of Science | [12] | [12] | ['baltimore'] |
| 851 | Human Host | Exploding Demon | Six Realms | [12] | [12] | ['baltimore'] |
| 852 | Human Host | Exploding Demon | Escape From the Organ Chamber | [12] | [12] | ['baltimore'] |


## Note: this is a MultiIndex DataFrame

Some examples of how to select columns:

In [4]:
df.columns # See what the column names are

MultiIndex([('artist',       'name'),
            ( 'album',      'title'),
            ( 'track',      'title'),
            ( 'track',     'genres'),
            ( 'track', 'genres_all'),
            ( 'track',       'tags')],
           )

The column names are [tuples](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences) (like lists that cannot be modified).
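
To make the tuple labels concrete, here is a minimal sketch on a toy DataFrame. The name `toy` and its two rows are stand-ins built by hand for illustration; they are not read from `tracks-tags.csv`.

```python
import pandas as pd

# Toy stand-in for the real data: two rows and three two-level columns.
columns = pd.MultiIndex.from_tuples(
    [("artist", "name"), ("album", "title"), ("track", "title")]
)
toy = pd.DataFrame(
    [
        ["Airway", "Live at LACE", "Side A"],
        ["Human Host", "Exploding Demon", "Six Realms"],
    ],
    index=pd.Index([137, 851], name="track_id"),
    columns=columns,
)

# Each column label is a plain Python tuple of (top level, second level).
labels = list(toy.columns)
first_label = labels[0]

# Indexing with one tuple returns that single column.
album_titles = toy[("album", "title")]

print(labels)
print(album_titles.tolist())
```

Because the labels are ordinary tuples, they can be stored in variables or lists and reused when selecting columns.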

To access a column:

In [5]:
df[("album", "title")]

track_id
137                    Live at LACE
138                    Live at LACE
850                 Exploding Demon
851                 Exploding Demon
852                 Exploding Demon
                    ...            
155269                     Volatile
155275                     Dog Wave
155276                     Dog Wave
155277                     Dog Wave
155320    What I Tell Myself Vol. 2
Name: (album, title), Length: 23496, dtype: object

To access multiple columns:

In [6]:
df[[("album", "title"), ("track", "tags")]]

| track_id | (album, title) | (track, tags) |
| --- | --- | --- |
| 137 | Live at LACE | ['lafms'] |
| 138 | Live at LACE | ['lafms'] |
| 850 | Exploding Demon | ['baltimore'] |
| 851 | Exploding Demon | ['baltimore'] |
| 852 | Exploding Demon | ['baltimore'] |
| ... | ... | ... |
| 155269 | Volatile | ['dark ambient', 'dark', 'ambient', 'noise', '... |
| 155275 | Dog Wave | ['noise', 'stretching is magic', 'free music',... |
| 155276 | Dog Wave | ['noise', 'stretching is magic', 'free music',... |
| 155277 | Dog Wave | ['noise', 'stretching is magic', 'free music',... |


In [7]:
df[("track", "genres")]  # a single tuple returns a pandas Series

track_id
137              [1, 32]
138              [1, 32]
850                 [12]
851                 [12]
852                 [12]
               ...      
155269    [42, 107, 183]
155275      [15, 32, 38]
155276      [15, 32, 38]
155277      [15, 32, 38]
155320     [10, 12, 169]
Name: (track, genres), Length: 23496, dtype: object

In [8]:
df[[("track", "genres")]]  # a list of tuples returns a pandas DataFrame

| track_id | (track, genres) |
| --- | --- |
| 137 | [1, 32] |
| 138 | [1, 32] |
| 850 | [12] |
| 851 | [12] |
| 852 | [12] |
| ... | ... |
| 155269 | [42, 107, 183] |
| 155275 | [15, 32, 38] |
| 155276 | [15, 32, 38] |
| 155277 | [15, 32, 38] |
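
The difference between the last two cells can be checked directly. The sketch below uses a hand-built toy DataFrame (the name `toy` and its rows are illustrative stand-ins, not the real file) to show that a single tuple gives a one-dimensional Series, while a list containing that tuple gives a two-dimensional DataFrame with one column.

```python
import pandas as pd

# Toy stand-in with the same kind of two-level column labels as df.
toy = pd.DataFrame(
    [["Side A", "[1, 32]"], ["Six Realms", "[12]"]],
    index=pd.Index([137, 851], name="track_id"),
    columns=pd.MultiIndex.from_tuples([("track", "title"), ("track", "genres")]),
)

as_series = toy[("track", "genres")]   # one tuple -> Series
as_frame = toy[[("track", "genres")]]  # list holding one tuple -> DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The Series form is usually what string methods such as `.str.contains` expect, while the DataFrame form keeps the column header, which can be convenient for display.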


### Filter rows

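We can create a 'mask' to filter rows. For example, check the `track genres` column for anything containing the word "Noise" (the match is case-sensitive). Note that `df2` is not created in the cells shown here; it is assumed to be a DataFrame with the same columns as `df`.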

In [8]:
mask = df2[('track', 'genres')].str.contains("Noise")
mask

track_id
137      NaN
138      NaN
850      NaN
851      NaN
852      NaN
          ..
155269   NaN
155275   NaN
155276   NaN
155277   NaN
155320   NaN
Name: (track, genres), Length: 23496, dtype: float64

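A mask normally holds `True` or `False` for each row and can be used to select rows. Here every value is `NaN`, which is what `.str.contains` gives for entries that are missing or not strings, so the selection fails: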

In [9]:
df2[mask].head()

KeyError: '[nan nan nan ... nan nan nan] not in index'

With a proper boolean mask, this returns only the tracks with "Noise" in the `track genres` column instead of raising the `KeyError` shown above.
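
One possible way to keep the mask strictly boolean is the `na=False` argument of `.str.contains`, which turns missing entries into `False`. The sketch below is an assumption-laden illustration: `toy`, the track id 999, and the genre names written as text are all made up for the example (the real `track genres` column shown earlier holds lists of numeric IDs).

```python
import pandas as pd

# Hypothetical toy data: genre names as text, with one missing value.
toy = pd.DataFrame(
    [
        ["Side A", "Noise, Avant-Garde"],
        ["Six Realms", "Rock"],
        ["Untitled", None],  # hypothetical row with no genre information
    ],
    index=pd.Index([137, 851, 999], name="track_id"),
    columns=pd.MultiIndex.from_tuples([("track", "title"), ("track", "genres")]),
)

# na=False makes missing entries count as "no match" instead of NaN.
mask = toy[("track", "genres")].str.contains("Noise", na=False)

# A boolean mask can be passed straight to the DataFrame to select rows.
noise_tracks = toy[mask]

print(mask.tolist())
print(noise_tracks.index.tolist())
```

If the column holds values that are not strings at all, such as real Python lists, `na=False` only hides the problem by marking every row as a non-match; in that case the values would need to be converted or tested another way.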

The mask and the selection can also be written as a single expression (for example, to get only the "Rock" genres):

In [10]:
df2[df2[('track', 'genres')].str.contains("Rock")].head()

KeyError: '[nan nan nan ... nan nan nan] not in index'

The same approach can be applied to the `track tags` column:

In [11]:
df2[df2[('track', 'tags')].str.contains("mellow")].head()

KeyError: '[nan nan nan ... nan nan nan] not in index'
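
Since the tags appear in the CSV as text that looks like a Python list (for example `['lafms']`), a substring search and an exact tag match can give different answers. The sketch below compares the two on a hand-built toy DataFrame; `toy`, its track ids other than 137, and the tag values are hypothetical, and parsing with `ast.literal_eval` is one option among several, not necessarily what the original notebook did.

```python
import ast

import pandas as pd

# Hypothetical toy data: tags stored as list-like strings, as in the CSV preview.
toy = pd.DataFrame(
    [
        ["Side A", "['lafms']"],
        ["Untitled", "['mellow', 'ambient']"],
        ["Another", "['mellowcore']"],
    ],
    index=pd.Index([137, 1001, 1002], name="track_id"),
    columns=pd.MultiIndex.from_tuples([("track", "title"), ("track", "tags")]),
)

# Substring search on the raw text: also matches 'mellowcore'.
substring_mask = toy[("track", "tags")].str.contains("mellow", na=False)

# Parse each string into a real list, then test for the exact tag.
tag_lists = toy[("track", "tags")].apply(ast.literal_eval)
exact_mask = tag_lists.apply(lambda tag_list: "mellow" in tag_list)

mellow_tracks = toy[exact_mask]

print(substring_mask.tolist())
print(exact_mask.tolist())
```

The substring version is shorter, while the parsed version avoids partial matches; which one is appropriate depends on how strictly a tag should match.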