In [1]:
import pandas as pd

## Read data

The FMA (Free Music Archive) repository is [here](https://github.com/mdeff/fma).

- Download the metadata file [here](https://os.unil.cloud.switch.ch/fma/fma_metadata.zip) and unzip it into the `data-FMA` folder
- `tracks.csv` contains artist names, album and track titles, genres, and tags
- only a portion of the tracks have complete tag information
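The notebook does not show how `data-processed/tracks-tags.csv` was produced from `tracks.csv`. One possible way is sketched below. It rests on three assumptions: the zip unpacks to `data-FMA/fma_metadata/`, empty tag lists are stored as the string `"[]"`, and the hypothetical helper `tracks_with_tags` keeps exactly the six columns that appear in the processed file. The demo rows are made up.

```python
import pandas as pd

# Column pairs kept in data-processed/tracks-tags.csv (as listed by df.columns)
COLUMNS = [
    ("artist", "name"),
    ("album", "title"),
    ("track", "title"),
    ("track", "genres"),
    ("track", "genres_all"),
    ("track", "tags"),
]


def tracks_with_tags(tracks: pd.DataFrame) -> pd.DataFrame:
    """Keep COLUMNS and drop tracks whose tag list is empty or missing.

    Assumes tags are stored as list-like strings, with "[]" meaning no tags.
    """
    subset = tracks[COLUMNS]
    tags = subset[("track", "tags")].fillna("[]").str.strip()
    return subset[tags != "[]"]


# Hypothetical usage on the raw file (path assumes the zip unpacks to fma_metadata/):
# tracks = pd.read_csv("data-FMA/fma_metadata/tracks.csv", header=[0, 1], index_col=0)
# tracks_with_tags(tracks).to_csv("data-processed/tracks-tags.csv")

# Tiny in-memory check with made-up values
demo = pd.DataFrame(
    [
        ["Artist A", "Album A", "Track 1", "[1]", "[1]", "['lafms']"],
        ["Artist B", "Album B", "Track 2", "[2]", "[2]", "[]"],
    ],
    index=pd.Index([1, 2], name="track_id"),
    columns=pd.MultiIndex.from_tuples(COLUMNS),
)
print(tracks_with_tags(demo).index.tolist())  # [1]
```

The filter compares strings rather than parsing every cell, which keeps it cheap on the full metadata file.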

## Read FMA tracks subset

In [3]:
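file_path = "data-processed/tracks-tags.csv"
# header=[0, 1]: the first two rows form a two-level column header
# index_col=0: use the first column (track_id) as the row index
df = pd.read_csv(file_path, header=[0, 1], index_col=0)
df.head()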

| track_id | artist / name | album / title | track / title | track / genres | track / genres_all | track / tags |
|---|---|---|---|---|---|---|
| 137 | Airway | Live at LACE | Side A | [1, 32] | [32, 1, 38] | ['lafms'] |
| 138 | Airway | Live at LACE | Side B | [1, 32] | [32, 1, 38] | ['lafms'] |
| 850 | Human Host | Exploding Demon | Tomb Of Science | [12] | [12] | ['baltimore'] |
| 851 | Human Host | Exploding Demon | Six Realms | [12] | [12] | ['baltimore'] |
| 852 | Human Host | Exploding Demon | Escape From the Organ Chamber | [12] | [12] | ['baltimore'] |


## Note: this DataFrame has MultiIndex (two-level) columns

Some examples of how to select columns:

In [10]:
df.columns # See what the column names are

MultiIndex([('artist',       'name'),
            ( 'album',      'title'),
            ( 'track',      'title'),
            ( 'track',     'genres'),
            ( 'track', 'genres_all'),
            ( 'track',       'tags')],
           )

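Each column label is a [tuple](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences) (like a list that cannot be modified) of the form `(level 0 name, level 1 name)`, e.g. `("album", "title")`.

To access a single column, pass its full tuple; the result is a Series: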

In [11]:
df[("album", "title")]

track_id
137                    Live at LACE
138                    Live at LACE
850                 Exploding Demon
851                 Exploding Demon
852                 Exploding Demon
                    ...            
155269                     Volatile
155275                     Dog Wave
155276                     Dog Wave
155277                     Dog Wave
155320    What I Tell Myself Vol. 2
Name: (album, title), Length: 23496, dtype: object
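A full tuple is not the only way to select from MultiIndex columns. The sketch below uses a tiny stand-in for `df`, with values copied from the outputs above, to show three other standard pandas selections: the top level alone, `.loc` with a tuple, and `DataFrame.xs` on the second level.

```python
import pandas as pd

# Tiny stand-in for df with the same two-level column layout
columns = pd.MultiIndex.from_tuples(
    [("artist", "name"), ("album", "title"), ("track", "title"), ("track", "tags")]
)
demo = pd.DataFrame(
    [
        ["Airway", "Live at LACE", "Side A", "['lafms']"],
        ["Human Host", "Exploding Demon", "Six Realms", "['baltimore']"],
    ],
    index=pd.Index([137, 851], name="track_id"),
    columns=columns,
)

# 1. Full tuple -> one column (Series)
album_titles = demo[("album", "title")]

# 2. Top level only -> every column under "track" (DataFrame with one column level)
track_cols = demo["track"]

# 3. .loc with a tuple selects the same column as option 1
tags = demo.loc[:, ("track", "tags")]

# 4. xs selects by the second level: all columns whose level-1 name is "title"
titles = demo.xs("title", axis=1, level=1)
print(titles)
```

Option 4 is handy here because both `album` and `track` have a `title` column.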

To access multiple columns, pass a list of tuples; the result is a DataFrame:

In [12]:
df[[("album", "title"), ("track", "tags")]]

| track_id | album / title | track / tags |
|---|---|---|
| 137 | Live at LACE | ['lafms'] |
| 138 | Live at LACE | ['lafms'] |
| 850 | Exploding Demon | ['baltimore'] |
| 851 | Exploding Demon | ['baltimore'] |
| 852 | Exploding Demon | ['baltimore'] |
| ... | ... | ... |
| 155269 | Volatile | ['dark ambient', 'dark', 'ambient', 'noise', '... |
| 155275 | Dog Wave | ['noise', 'stretching is magic', 'free music',... |
| 155276 | Dog Wave | ['noise', 'stretching is magic', 'free music',... |
| 155277 | Dog Wave | ['noise', 'stretching is magic', 'free music',... |
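If the tuple labels become tedious, one option is to flatten the header into single strings by joining the two levels. Iterating over a MultiIndex yields the label tuples, so a list comprehension is enough. The `_` separator is an arbitrary choice, and the demo uses one row copied from the output above.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("album", "title"), ("track", "tags")])
demo = pd.DataFrame(
    [["Live at LACE", "['lafms']"]],
    index=pd.Index([137], name="track_id"),
    columns=columns,
)

# Work on a copy so the original MultiIndex columns stay available
flat = demo.copy()
# ("album", "title") -> "album_title", ("track", "tags") -> "track_tags"
flat.columns = ["_".join(col) for col in flat.columns]
print(flat.columns.tolist())  # ['album_title', 'track_tags']
```

After flattening, `flat["track_tags"]` works like any single-level column. The trade-off is that level-based tools such as `xs` no longer apply.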


## TODO: replace or map genre codes with names

ref: pandas [replace](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html#pandas.DataFrame.replace) or [map](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.map.html#pandas.DataFrame.map) (`DataFrame.map` requires pandas 2.1 or later; earlier versions call it `applymap`)
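Each `genres` cell holds a string such as `"[1, 32]"`, not a single code. A plain `replace` on the cell value therefore never sees the individual IDs. One approach, sketched here under assumptions, is to parse each string with `ast.literal_eval` and look up every ID through `Series.map`. The helper `ids_to_names` is hypothetical, and so are the names in `id_to_name`, which are placeholders. The commented lines assume the real names come from `genres.csv` in the metadata zip, indexed by genre ID with a `title` column.

```python
import ast

import pandas as pd


def ids_to_names(cell, id_to_name: dict[int, str]) -> list[str]:
    """Turn a string like "[1, 32]" into a list of genre names.

    Unknown IDs are kept as their string form so nothing is silently dropped.
    """
    if not isinstance(cell, str):
        return []  # missing values (NaN) become an empty list
    ids = ast.literal_eval(cell)  # "[1, 32]" -> [1, 32]
    return [id_to_name.get(i, str(i)) for i in ids]


# Assumed source of the real lookup (file layout not confirmed here):
#   genres = pd.read_csv("data-FMA/fma_metadata/genres.csv", index_col=0)
#   id_to_name = genres["title"].to_dict()
# Placeholder names only:
id_to_name = {1: "genre_1", 12: "genre_12", 32: "genre_32"}

genres = pd.Series(["[1, 32]", "[12]", "[32, 1, 38]"])
named = genres.map(lambda cell: ids_to_names(cell, id_to_name))
print(named.tolist())
# [['genre_1', 'genre_32'], ['genre_12'], ['genre_32', 'genre_1', '38']]
```

On the notebook's data this would be `df[("track", "genres")].map(...)`. The same applies to `("track", "genres_all")`.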

## TODO: parse the cells in the `("track", "tags")` column

ref: Python [strings](https://docs.python.org/3/library/string.html#module-string) and the [re](https://docs.python.org/3/library/re.html#module-re) module

- practice regular expressions [here](https://regex101.com)
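Each tags cell is a string that looks like a Python list, e.g. `"['dark ambient', 'noise']"`. A minimal regex-based parser is sketched below, with `ast.literal_eval` alternative for comparison. Both helper names are hypothetical. The pattern also accepts double-quoted items, because Python writes a string in double quotes when it contains a single quote. The `"rock 'n' roll"` value is a made-up example of that case.

```python
import ast
import re

import pandas as pd

# One quoted item inside a list-like string: 'single' or "double" quoted
TAG_PATTERN = re.compile(r"'([^']*)'|\"([^\"]*)\"")


def parse_tags_regex(cell) -> list[str]:
    """Extract tags from a string such as "['dark ambient', 'noise']"."""
    if not isinstance(cell, str):
        return []  # missing values (NaN) become an empty list
    # Each match is a (single_quoted, double_quoted) pair; only one is filled
    return [single or double for single, double in TAG_PATTERN.findall(cell)]


def parse_tags_literal(cell) -> list[str]:
    """Alternative: let Python evaluate the list literal (also handles escapes)."""
    if not isinstance(cell, str):
        return []
    return list(ast.literal_eval(cell))


tags = pd.Series(["['lafms']", "['dark ambient', 'dark', 'ambient']", "[\"rock 'n' roll\"]"])
parsed = tags.map(parse_tags_regex)
print(parsed.tolist())
```

On the notebook's data, `df[("track", "tags")].map(parse_tags_literal)` gives one list per track. Chaining `.explode().value_counts()` onto that then counts how often each tag appears. The regex version is useful practice, but it does not handle backslash escapes inside tags. `ast.literal_eval` handles those escapes.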