# Making Sense of Data

Dear Diary, <br/>
It is Saturday, September 21. I am sitting in the Bean🥫.<br /><br/>
The purpose of this notebook is to make sense of two parts of the [UCI FMA Music Analysis Dataset](https://archive.ics.uci.edu/ml/datasets/FMA:+A+Dataset+For+Music+Analysis): **genres and tracks**. <br/>For genres, we are interested in exploring the **colors** associated with each sub-genre and the **hierarchy structure** that organizes the 164 genres. For tracks, we want to map each track to its genre(s) so we can find the genres with the most songs and use those for our initial model. We also want to explore associated track metadata, such as **year**.

<hr />

# Genres
The `raw_genres.csv` file was small enough that it was easier to analyze the data in Google Sheets. Sorry to betray the CS community by using layman's tools.

The file had 164 rows with the following columns:

| genre_id | genre_color | genre_handle | genre_parent_id | genre_title |
| :-: | :-: | :-: | :-: | :-: |
| 46 | #CC3300 | Latin_America | 2 | Latin America |
| ... | ... | ... | ... | ... |

### Comments:
- Parent (top-level) genres have an empty `genre_parent_id`.
- The rows were in no particular order. They were not sorted numerically by `genre_id` or `genre_parent_id`, nor alphabetically by `genre_handle` or `genre_title`.
- I did not check whether the rows were sorted by `genre_color`; an ordering by color would not be useful for this analysis anyway.
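The hierarchy can be rebuilt without relying on row order, because top-level genres are exactly the rows with an empty `genre_parent_id`. What follows is a minimal standard-library sketch, not the method used in this notebook. It assumes the column names from the table above. The file path passed to the hypothetical `read_genres` helper is also an assumption. It maps each parent genre's title to the titles of its direct sub-genres.

```python
from __future__ import annotations

import csv
import io


def build_hierarchy(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each parent genre title to the titles of its direct sub-genres.

    Top-level genres (empty genre_parent_id) are always included, even with
    no children. Row order does not matter: parents are looked up by genre_id.
    """
    title_by_id = {row["genre_id"]: row["genre_title"] for row in rows}
    tree: dict[str, list[str]] = {}
    for row in rows:
        parent_id = (row.get("genre_parent_id") or "").strip()
        if parent_id:
            # Fall back to the raw id if the parent row is missing from the file.
            parent_title = title_by_id.get(parent_id, parent_id)
            tree.setdefault(parent_title, []).append(row["genre_title"])
        else:
            # Top-level genre: make sure it appears even if it has no children.
            tree.setdefault(row["genre_title"], [])
    # Sort children alphabetically so the output is stable across runs.
    return {parent: sorted(children) for parent, children in tree.items()}


def read_genres(path: str) -> list[dict[str, str]]:
    """Hypothetical helper: read raw_genres.csv (path is an assumption) into row dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Toy rows in the raw_genres.csv layout, deliberately out of order.
    sample = io.StringIO(
        "genre_id,genre_color,genre_handle,genre_parent_id,genre_title\n"
        "46,#CC3300,Latin_America,2,Latin America\n"
        "12,#840000,Rock,,Rock\n"
        "2,#CC3300,International,,International\n"
    )
    print(build_hierarchy(list(csv.DictReader(sample))))
    # {'International': ['Latin America'], 'Rock': []}
```

Keying by title keeps the output readable. Keying by `genre_id` would be safer if two genres ever shared a title.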

### In Google Sheets, I did the following:
1. Sorted rows by `genre_parent_id` to get a sense of which genres had the most breadth (the most sub-genres).
2. This moved all the parent rows (which have no `genre_parent_id`) to the bottom, and I pulled them out into a separate sub-table.
3. I added a new column to the parent sub-table, `num sub-genres`.
4. For each parent genre, I counted the rows whose `genre_parent_id` matched its `genre_id` and entered that count in `num sub-genres`.
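The spreadsheet steps above boil down to a count-and-rank. What follows is a minimal standard-library sketch of that idea, not what was actually run. It assumes the rows have already been read into dicts, for example with `csv.DictReader`, using the column names shown earlier. The function name `rank_by_sub_genres` is hypothetical. Like step 4, it counts only direct children, so a sub-sub-genre is credited to its immediate parent rather than to the top-level genre.

```python
from __future__ import annotations

from collections import Counter


def rank_by_sub_genres(
    rows: list[dict[str, str]], top_n: int | None = None
) -> list[tuple[str, int]]:
    """Rank top-level genres by their number of direct sub-genres, most first."""

    def parent_of(row: dict[str, str]) -> str:
        return (row.get("genre_parent_id") or "").strip()

    # Step 4: how many rows point at each parent id.
    counts = Counter(parent_of(row) for row in rows if parent_of(row))
    # Steps 1-3: top-level genres are the rows with no parent; attach their counts.
    ranked = [
        (row["genre_title"], counts.get(row["genre_id"], 0))
        for row in rows
        if not parent_of(row)
    ]
    # list.sort is stable (even with reverse=True), so ties keep their row order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    # Toy rows for illustration only, not the real counts.
    toy = [
        {"genre_id": "2", "genre_parent_id": "", "genre_title": "International"},
        {"genre_id": "46", "genre_parent_id": "2", "genre_title": "Latin America"},
        {"genre_id": "12", "genre_parent_id": "", "genre_title": "Rock"},
        {"genre_id": "8", "genre_parent_id": "", "genre_title": "Old-Time / Historic"},
    ]
    print(rank_by_sub_genres(toy, top_n=2))  # [('International', 1), ('Rock', 0)]
```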

<hr />

### Results:

| Top Genres (number of sub-genres) | Graph |
| :- | :-: |
| <ol><li>International (15)</li><li>Rock (15)</li><li>Electronic (14)</li><li>Experimental (14)</li><li>Spoken (8) </li></ol> | <img src="images/sub_genre_pie_uci_fma.png" /> |

<!--
| genre_id | genre_color | genre_handle | genre_parent_id | genre_title | num sub-genres |
| :-: | :-: | :-: | :-: | :-: |	:-: |
| 2	|#CC3300|	International |	|	International	|15|
|3|	#000099	|Blues	|	|Blues	|1|
|4|	#990099	|Jazz	|	|Jazz	|6|
|5|	#8A8A65	|Classical|	|	Classical|	7|
|8|	#665666	|Old-Time__Historic	|	|Old-Time / Historic|	0|
|9|	#663366	|Country	|	|Country	|4|
|10| #009900|	Pop	|	|Pop	|2|
|12	|#840000|	Rock	| |	Rock|	15|
|14	|#330033|	Soul-RB	|	| Soul-RnB|	2|
|15	|#FF6600|	Electronic|	|	Electronic	|14|
|17	|#5E6D3F|	Folk	|	|Folk|	5|
|20	|#006699|	Spoken	| |	Spoken|	8|
|21	|#CC0000|	Hip-Hop	| 	|Hip-Hop|	7|
|38	|#dddd00|	Experimental|	|	Experimental|	14|
|1235|	#000000|	Instrumental|	|	Instrumental	|3|
-->    

# Tracks
The track data file (`raw_tracks.csv`) is too big to explore in Google Sheets (wah), so let's parse it with pandas instead:

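```python
import numpy as np
import pandas as pd

# Change this path to wherever fma_metadata/raw_tracks.csv lives on your machine.
tracks = pd.read_csv("/Users/mkarroqe/Desktop/github/dancing-screen/fma_metadata/raw_tracks.csv")

# Printing a large DataFrame shows only the first and last rows.
print(tracks)
```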

        track_id  album_id                                  album_title  \
0              2       1.0                         AWOL - A Way Of Life   
1              3       1.0                         AWOL - A Way Of Life   
2              5       1.0                         AWOL - A Way Of Life   
3             10       6.0                            Constant Hitmaker   
4             20       4.0                                        Niris   
5             26       4.0                                        Niris   
6             30       4.0                                        Niris   
7             46       4.0                                        Niris   
8             48       4.0                                        Niris   
9            134       1.0                         AWOL - A Way Of Life   
10           135      58.0                                          mp3   
11           136      58.0                                          mp3   
12           137      59.