
notebooks: wording and import ordering
mdeff committed Apr 25, 2017
1 parent 4ee18fc commit b262902
Showing 6 changed files with 79 additions and 111 deletions.
19 changes: 11 additions & 8 deletions analysis.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,15 @@
"\n",
"import IPython.display as ipd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import utils\n",
"\n",
"from sklearn.preprocessing import MultiLabelBinarizer\n",
"\n",
"sns.set_context(\"notebook\", font_scale=1.5)"
"import utils\n",
"\n",
"sns.set_context(\"notebook\", font_scale=1.5)\n",
"plt.rcParams['figure.figsize'] = (17, 5)"
]
},
{
Expand Down Expand Up @@ -639,8 +640,8 @@
"source": [
"### 4.1 Genre hierarchy\n",
"\n",
"1. As genres have parent genres, we can plot a tree using the [DOT] language.\n",
"2. Save the full genre tree as a PDF.\n",
"* As genres have parent genres, we can plot a tree using the [DOT] language.\n",
"* Save the full genre tree as a PDF.\n",
"\n",
"Todo:\n",
"* Color nodes according to FMA genre color.\n",
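The two bullets above can be sketched as a minimal DOT serializer; the `genres_to_dot` helper and the toy genre ids are illustrative assumptions, the notebook itself works from the real `genres.csv` parent column:

```python
def genres_to_dot(genres):
    # genres: {genre_id: (name, parent_id)}; a parent_id of 0 marks a root genre.
    lines = ['digraph genres {']
    for gid, (name, parent) in sorted(genres.items()):
        lines.append('    {} [label="{}"];'.format(gid, name))
        if parent != 0:
            lines.append('    {} -> {};'.format(parent, gid))
    lines.append('}')
    return '\n'.join(lines)

# Toy excerpt of the hierarchy; real ids come from genres.csv.
example = {38: ('Experimental', 0), 1: ('Avant-Garde', 38)}
dot = genres_to_dot(example)
```

The resulting DOT text can then be rendered to PDF with Graphviz, e.g. `dot -Tpdf genres.dot -o genres.pdf`.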
Expand Down Expand Up @@ -733,14 +734,16 @@
"source": [
"## 5 Audio\n",
"\n",
"e.g. audio features (echonest / librosa, spectrograms) to show diversity"
"Todo: e.g. audio features (echonest / librosa, spectrograms) to show diversity."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6 Features"
"## 6 Features\n",
"\n",
"Todo: understand features by listening to segments that have them, e.g. <http://musicinformationretrieval.com/feature_sonification.html>."
]
},
{
Expand Down
82 changes: 22 additions & 60 deletions baselines.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -10,17 +10,8 @@
"\n",
"## Baselines\n",
"\n",
"We explore three types of baselines:\n",
"1. simple algorithms,\n",
"2. state-of-the-art in genre recognition,\n",
"3. deep learning approaches,\n",
"\n",
"using different input features:\n",
"1. raw audio,\n",
"2. echonest features,\n",
"3. audio features from librosa or [kapre](https://github.com/keunwoochoi/kapre).\n",
"\n",
"We aim to show that, given sufficient data, DL approaches can outperform all the others without domain-specific / expert knowledge."
"* This notebook evaluates standard classifiers from scikit-learn on the provided features.\n",
"* Moreover, it evaluates Deep Learning models on both audio and spectrograms."
]
},
{
Expand All @@ -29,21 +20,15 @@
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline\n",
"import time\n",
"import os\n",
"\n",
"import utils\n",
"import IPython.display as ipd\n",
"from tqdm import tqdm_notebook\n",
"import numpy as np\n",
"import pandas as pd\n",
"import keras\n",
"from keras.layers import Activation, Dense, Conv1D, Conv2D, MaxPooling1D, Flatten, Reshape\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"import IPython.display as ipd\n",
"import time\n",
"import os\n",
"import ast\n",
"\n",
"from sklearn.utils import shuffle\n",
"from sklearn.preprocessing import MultiLabelBinarizer, LabelEncoder, LabelBinarizer, StandardScaler\n",
Expand All @@ -57,7 +42,9 @@
"from sklearn.neural_network import MLPClassifier\n",
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis\n",
"from sklearn.multiclass import OneVsRestClassifier"
"from sklearn.multiclass import OneVsRestClassifier\n",
"\n",
"import utils"
]
},
{
Expand All @@ -73,6 +60,7 @@
"echonest = utils.load('echonest.csv')\n",
"\n",
"np.testing.assert_array_equal(features.index, tracks.index)\n",
"assert echonest.index.isin(tracks.index).all()\n",
"\n",
"tracks.shape, features.shape, echonest.shape"
]
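The evaluation described above — fit standard scikit-learn classifiers on the provided features and compare test accuracy — can be sketched as follows; the `evaluate` helper, the classifier selection, and the toy data are assumptions for illustration, not the notebook's exact code:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate(classifiers, X_train, y_train, X_test, y_test):
    # Standardize features on the training set, then fit and score each classifier.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return {name: clf.fit(X_train, y_train).score(X_test, y_test)
            for name, clf in classifiers.items()}

classifiers = {
    'kNN': KNeighborsClassifier(n_neighbors=1),
    'DT': DecisionTreeClassifier(random_state=0),
}
# Toy, well-separated data in place of the real feature matrices.
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [10.5, 10.5]])
y_test = np.array([0, 1])
scores = evaluate(classifiers, X_train, y_train, X_test, y_test)
```

In the notebook the train/test partition comes from the dataset's pre-defined splits rather than a random one.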
Expand Down Expand Up @@ -182,15 +170,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Single genre\n",
"\n",
"`fma_small`:\n",
"* <36% for echonest_audio.\n",
"* <15% for echonest_social.\n",
"* <46% for echonest_temporal.\n",
"* <43% for mfcc.\n",
"* <44% for all except echonest.\n",
"* <45% for best non-echonest combination"
"### 1.2 Single genre"
]
},
{
Expand Down Expand Up @@ -276,14 +256,8 @@
"source": [
"### 1.3 Multiple genres\n",
"\n",
"Maximum observed on `fma_small` (was 7.6% on `fma_medium`).\n",
"* <15% for echonest_audio.\n",
"* <22% for echonest_temporal.\n",
"* <17% for mfcc.\n",
"* <20% for best non-echonest combination\n",
"\n",
"Todo:\n",
"* Eliminate rare genres. On small only the 10 selected genres are meaningful."
"* Ignore rare genres? Count them higher up in the genre tree? On the other hand, they concern few tracks."
]
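Since tracks carry several genres, the targets are multi-label: `MultiLabelBinarizer` turns genre lists into indicator matrices, which `OneVsRestClassifier` can then consume. A minimal sketch with made-up genre ids:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each track is tagged with a list of genre ids (toy values).
tracks_genres = [[21, 38], [38], [12, 21, 38]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tracks_genres)
# mlb.classes_ holds the sorted genre ids; each row of Y is an indicator vector.
```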
},
{
Expand Down Expand Up @@ -317,7 +291,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2 Deep learning on raw audio"
"## 2 Deep learning on raw audio\n",
"\n",
"Other architectures:\n",
"* [Learning Features of Music from Scratch (MusicNet)](https://arxiv.org/abs/1611.09827), John Thickstun, Zaid Harchaoui, Sham Kakade."
]
},
{
Expand All @@ -336,7 +313,7 @@
"source": [
"Load audio samples in parallel using `multiprocessing` to maximize CPU usage when decoding MP3s and performing optional pre-processing. There are multiple ways to load a waveform from a compressed MP3:\n",
"* librosa uses audioread as backend, which can use many native libraries, e.g. ffmpeg\n",
" * resampling is very slow\n",
" * resampling is very slow --> use `kaiser_fast`\n",
" * does not work with multi-processing, for keras `fit_generator()`\n",
"* pydub is a high-level interface for audio modification, uses ffmpeg to load\n",
" * store a temporary `.wav`\n",
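The librosa strategy can be sketched as below, assuming the FMA directory layout (six-digit track ids grouped in folders of 1000) and hypothetical helper names; `librosa.load` accepts `res_type='kaiser_fast'` to avoid the slow default resampler, and a process pool keeps all cores busy decoding:

```python
import multiprocessing
import os

def get_audio_path(audio_dir, track_id):
    # FMA layout: ids grouped by their first three digits, e.g. 000/000002.mp3.
    tid_str = '{:06d}'.format(track_id)
    return os.path.join(audio_dir, tid_str[:3], tid_str + '.mp3')

def load_clip(track_id, sampling_rate=22050):
    # librosa decodes through audioread (e.g. ffmpeg); kaiser_fast trades a
    # little resampling quality for a large speed-up.
    import librosa
    x, sr = librosa.load(get_audio_path('fma_small', track_id),
                         sr=sampling_rate, res_type='kaiser_fast')
    return x

# A process pool maximizes CPU usage (but does not combine with fit_generator):
# pool = multiprocessing.Pool(12)
# clips = pool.map(load_clip, track_ids)
```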
Expand Down Expand Up @@ -376,22 +353,7 @@
"\n",
"* Two layers with 10 hiddens is no better than random, ~11%.\n",
"\n",
"Optimize data loading to be CPU / GPU bound, not IO bound. Larger batches mean reduced training time, so increase batch size until memory exhaustion. Number of workers and queue size have no influence on speed.\n",
"\n",
"CPU\n",
"* batch 4, worker 8, queue 1, 600s\n",
"* batch 20, worker 24, queue 5, 190s\n",
"* batch 20, worker 12, queue 10, 185s\n",
"* batch 40, worker 12, queue 10, 135s\n",
"* batch 64, worker 12, queue 10, 110s\n",
"* batch 128, worker 12, queue 10, 100s\n",
"\n",
"GPU Tesla K40c\n",
"* batch 4, worker 12, queue 10, 250s\n",
"* batch 16, worker 12, queue 10, 100s\n",
"* batch 32, worker 12, queue 10, 90s\n",
"* batch 64, worker 12, queue 10, 70s\n",
"* batch 96-128 --> memory error"
"Optimize data loading to be CPU / GPU bound, not IO bound. Larger batches mean reduced training time, so increase batch size until memory exhaustion. Number of workers and queue size have no influence on speed."
]
},
{
Expand Down Expand Up @@ -431,7 +393,7 @@
"source": [
"### 2.2 Convolutional neural network\n",
"\n",
"* Architecture from [End-to-end learning for music audio](http://www.mirlab.org/conference_papers/International_Conference/ICASSP%202014/papers/p7014-dieleman.pdf) by Sander Dieleman, Benjamin Schrauwen.\n",
"* Architecture: [End-to-end learning for music audio](http://www.mirlab.org/conference_papers/International_Conference/ICASSP%202014/papers/p7014-dieleman.pdf), Sander Dieleman, Benjamin Schrauwen.\n",
"* Missing: track segmentation and class averaging (majority voting)\n",
"* Compared with log-scaled mel-spectrograms instead of strided convolution as first layer.\n",
"* Larger net: http://benanne.github.io/2014/08/05/spotify-cnns.html"
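One way to see why the strided first convolution works as a learned filterbank is to track the temporal dimension through 'valid' strided convolutions: the first layer turns raw samples into frames much like a spectrogram's hop size does. The filter sizes and strides below are illustrative assumptions, not the paper's exact hyper-parameters:

```python
def conv1d_length(n, kernel, stride):
    # Output length of a 1-D 'valid' convolution.
    return (n - kernel) // stride + 1

# Raw audio clip of ~3s at 22050 Hz (toy numbers).
n = 65536
n = conv1d_length(n, kernel=512, stride=256)  # strided conv: 512-sample frames, hop 256
n = conv1d_length(n, kernel=4, stride=1)      # regular conv on the frame sequence
n = n // 4                                    # max-pooling, pool size 4
```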
Expand Down Expand Up @@ -502,7 +464,7 @@
"source": [
"## 3 Deep learning on extracted audio features\n",
"\n",
"Todo:\n",
"Look at:\n",
"* Pre-processing in Keras: https://github.com/keunwoochoi/kapre\n",
"* Convolutional Recurrent Neural Networks for Music Classification: https://github.com/keunwoochoi/icassp_2017\n",
"* Music Auto-Tagger: https://github.com/keunwoochoi/music-auto_tagging-keras\n",
Expand All @@ -515,7 +477,7 @@
"source": [
"### 3.1 ConvNet on MFCC\n",
"\n",
"* Architecture from [Automatic Musical Pattern Feature Extraction Using Convolutional Neural Network](http://www.iaeng.org/publication/IMECS2010/IMECS2010_pp546-550.pdf) by Tom LH. Li, Antoni B. Chan and Andy HW. Chun\n",
"* Architecture: [Automatic Musical Pattern Feature Extraction Using Convolutional Neural Network](http://www.iaeng.org/publication/IMECS2010/IMECS2010_pp546-550.pdf), Tom LH. Li, Antoni B. Chan and Andy HW. Chun\n",
"* Missing: track segmentation and majority voting.\n",
"* Best seen: 17.6%"
]
Expand Down
43 changes: 19 additions & 24 deletions creation.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,15 @@
"\n",
"Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.\n",
"\n",
"## Generation / Collection / Creation\n",
"## Creation\n",
"\n",
"From `raw_*.csv`, this notebook generates:\n",
"* `tracks.csv`: per-track / album / artist metadata.\n",
"* `genres.csv`: genre hierarchy.\n",
"* `echonest.csv`: cleaned Echonest features.\n",
"\n",
"A companion script, [creation.py](creation.py):\n",
"1. Query the API and store metadata in `raw_tracks.csv`, `raw_albums.csv`, `raw_artists.csv` and `raw_genres.csv`.\n",
"1. Query the [API](https://freemusicarchive.org/api) and store metadata in `raw_tracks.csv`, `raw_albums.csv`, `raw_artists.csv` and `raw_genres.csv`.\n",
"2. Download the audio for each track.\n",
"3. Trim the audio to 30s clips.\n",
"4. Normalize the permissions and modification / access times.\n",
Expand All @@ -28,15 +29,14 @@
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import os\n",
"import ast\n",
"import pickle\n",
"\n",
"import IPython.display as ipd\n",
"import numpy as np\n",
"import pandas as pd\n",
"import IPython.display as ipd\n",
"\n",
"import utils\n",
"import creation"
]
Expand All @@ -60,16 +60,16 @@
"## 1 Retrieve metadata and audio from FMA\n",
"\n",
"1. Crawl the tracks, albums and artists metadata through their [API](https://freemusicarchive.org/api).\n",
"2. Download original `.mp3` by HTTPS for each track id (only if it does not exist already).\n",
"2. Download original `.mp3` by HTTPS for each track id (only if we don't have it already).\n",
"\n",
"Todo:\n",
"* Scrape curators.\n",
"* Download images (`track_image_file`, `album_image_file`, `artist_image_file`). Beware of the quality.\n",
"* Verify checksum for some random tracks.\n",
"\n",
"Examples:\n",
"Dataset update:\n",
"* To add new tracks: iterate from largest known track id to the most recent only.\n",
"* To update user data: get them all again."
"* To update user data: we need to get all tracks again."
]
},
{
Expand Down Expand Up @@ -160,7 +160,13 @@
"\n",
"Todo:\n",
"* Sanitize values, e.g. list of words for tags, valid links in `artist_wikipedia_page`, remove html markup in free-form text.\n",
"Fill metadata about encoding: length, number of samples, sample rate, bit rate, channels (mono/stereo), 16 bits?"
" * Clean tags. E.g. some tags are just artist names.\n",
"Fill metadata about encoding: length, number of samples, sample rate, bit rate, channels (mono/stereo), 16 bits?\n",
"* Update duration from audio\n",
" * 2624 is marked as 05:05:50 (18350s) although it is reported as 00:21:15.15 by ffmpeg.\n",
" * 112067: 3714s --> 01:59:55.06, 112808: 3718s --> 01:59:59.56\n",
" * ffmpeg: Estimating duration from bitrate, this may be inaccurate\n",
" * Solution, decode the complete mp3: `ffmpeg -i input.mp3 -f null -`"
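The fix above can be sketched by decoding the whole file and reading the last progress line ffmpeg prints on stderr; the `time=HH:MM:SS.ss` parsing and the helper names are assumptions about ffmpeg's usual output, not code from the dataset's scripts:

```python
import re
import subprocess

def parse_ffmpeg_time(stderr_text):
    # ffmpeg reports progress as e.g. "size=N/A time=00:21:15.15 bitrate=N/A";
    # after a full decode, the last occurrence is the exact duration.
    h, m, s = re.findall(r'time=(\d+):(\d+):(\d+(?:\.\d+)?)', stderr_text)[-1]
    return int(h) * 3600 + int(m) * 60 + float(s)

def real_duration(path):
    # Decode the complete mp3 (slow but exact): ffmpeg -i input.mp3 -f null -
    p = subprocess.run(['ffmpeg', '-i', path, '-f', 'null', '-'],
                       stderr=subprocess.PIPE, universal_newlines=True)
    return parse_ffmpeg_time(p.stderr)
```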
]
},
{
Expand Down Expand Up @@ -449,15 +455,7 @@
"source": [
"## 3 Data cleaning\n",
"\n",
"Todo\n",
"* Duplicates (metadata and audio)\n",
"* Kill some (top-level) genres ? Like Easy Listening for medium.\n",
"* Update duration from audio\n",
" * 2624 is marked as 05:05:50 (18350s) although it is reported as 00:21:15.15 by ffmpeg.\n",
" * 112067: 3714s --> 01:59:55.06, 112808: 3718s --> 01:59:59.56\n",
" * ffmpeg: Estimating duration from bitrate, this may be inaccurate\n",
" * Solution, decode the complete mp3: `ffmpeg -i input.mp3 -f null -`\n",
"* Clean tags. E.g. some tags are just artist names."
"Todo: duplicates (metadata and audio)"
]
},
{
Expand Down Expand Up @@ -594,7 +592,7 @@
"# --> listed as child of Sound Effects on website\n",
"genres.at[763, 'parent'] = 16\n",
"\n",
"# Todo: should novely be under Experimental? It is alone on website."
"# Todo: should novelty be under Experimental? It is alone on website."
]
},
{
Expand Down Expand Up @@ -1026,10 +1024,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8 Description\n",
"\n",
"Todo:\n",
"* verify all dtypes"
"## 8 Description"
]
},
{
Expand Down
