
notebooks: wording and import ordering
mdeff committed Apr 25, 2017
1 parent 4ee18fc commit b262902
Showing 6 changed files with 79 additions and 111 deletions.
19 changes: 11 additions & 8 deletions analysis.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,15 @@
"\n",
"import IPython.display as ipd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import utils\n",
"\n",
"from sklearn.preprocessing import MultiLabelBinarizer\n",
"\n",
"sns.set_context(\"notebook\", font_scale=1.5)"
"import utils\n",
"\n",
"sns.set_context(\"notebook\", font_scale=1.5)\n",
"plt.rcParams['figure.figsize'] = (17, 5)"
]
},
{
Expand Down Expand Up @@ -639,8 +640,8 @@
"source": [
"### 4.1 Genre hierarchy\n",
"\n",
"1. As genres have parent genres, we can plot a tree using the [DOT] language.\n",
"2. Save the full genre tree as a PDF.\n",
"* As genres have parent genres, we can plot a tree using the [DOT] language.\n",
"* Save the full genre tree as a PDF.\n",
"\n",
"Todo:\n",
"* Color nodes according to FMA genre color.\n",
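The two bullets above can be sketched as a minimal DOT serializer; the `genres_to_dot` helper and the toy genre ids are illustrative assumptions, the notebook itself works from the real `genres.csv` parent column:

```python
def genres_to_dot(genres):
    # genres: {genre_id: (name, parent_id)}; a parent_id of 0 marks a root genre.
    lines = ['digraph genres {']
    for gid, (name, parent) in sorted(genres.items()):
        lines.append('    {} [label="{}"];'.format(gid, name))
        if parent != 0:
            lines.append('    {} -> {};'.format(parent, gid))
    lines.append('}')
    return '\n'.join(lines)

# Toy excerpt of the hierarchy; real ids come from genres.csv.
example = {38: ('Experimental', 0), 1: ('Avant-Garde', 38)}
dot = genres_to_dot(example)
```

The resulting DOT text can then be rendered to PDF with Graphviz, e.g. `dot -Tpdf genres.dot -o genres.pdf`.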
Expand Down Expand Up @@ -733,14 +734,16 @@
"source": [
"## 5 Audio\n",
"\n",
"e.g. audio features (echonest / librosa, spectrograms) to show diversity"
"Todo: e.g. audio features (echonest / librosa, spectrograms) to show diversity."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6 Features"
"## 6 Features\n",
"\n",
"Todo: understand features by listening to segments that have them, e.g. <http://musicinformationretrieval.com/feature_sonification.html>."
]
},
{
Expand Down
82 changes: 22 additions & 60 deletions baselines.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -10,17 +10,8 @@
"\n",
"## Baselines\n",
"\n",
"We explore three types of baselines:\n",
"1. simple algorithms,\n",
"2. state-of-the-art in genre recognition,\n",
"3. deep learning approaches,\n",
"\n",
"using different input features:\n",
"1. raw audio,\n",
"2. echonest features,\n",
"3. audio features from librosa or [kapre](https://github.com/keunwoochoi/kapre).\n",
"\n",
"We aim to show that, given sufficient data, DL approaches can outperform all the others without domain-specific / expert knowledge."
"* This notebook evaluates standard classifiers from scikit-learn on the provided features.\n",
"* Moreover, it evaluates Deep Learning models on both audio and spectrograms."
]
},
{
Expand All @@ -29,21 +20,15 @@
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline\n",
"import time\n",
"import os\n",
"\n",
"import utils\n",
"import IPython.display as ipd\n",
"from tqdm import tqdm_notebook\n",
"import numpy as np\n",
"import pandas as pd\n",
"import keras\n",
"from keras.layers import Activation, Dense, Conv1D, Conv2D, MaxPooling1D, Flatten, Reshape\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"import IPython.display as ipd\n",
"import time\n",
"import os\n",
"import ast\n",
"\n",
"from sklearn.utils import shuffle\n",
"from sklearn.preprocessing import MultiLabelBinarizer, LabelEncoder, LabelBinarizer, StandardScaler\n",
Expand All @@ -57,7 +42,9 @@
"from sklearn.neural_network import MLPClassifier\n",
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis\n",
"from sklearn.multiclass import OneVsRestClassifier"
"from sklearn.multiclass import OneVsRestClassifier\n",
"\n",
"import utils"
]
},
{
Expand All @@ -73,6 +60,7 @@
"echonest = utils.load('echonest.csv')\n",
"\n",
"np.testing.assert_array_equal(features.index, tracks.index)\n",
"assert echonest.index.isin(tracks.index).all()\n",
"\n",
"tracks.shape, features.shape, echonest.shape"
]
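The evaluation described above — fit standard scikit-learn classifiers on the provided features and compare test accuracy — can be sketched as follows; the `evaluate` helper, the classifier selection, and the toy data are assumptions for illustration, not the notebook's exact code:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate(classifiers, X_train, y_train, X_test, y_test):
    # Standardize features on the training set, then fit and score each classifier.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return {name: clf.fit(X_train, y_train).score(X_test, y_test)
            for name, clf in classifiers.items()}

classifiers = {
    'kNN': KNeighborsClassifier(n_neighbors=1),
    'DT': DecisionTreeClassifier(random_state=0),
}
# Toy, well-separated data in place of the real feature matrices.
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [10.5, 10.5]])
y_test = np.array([0, 1])
scores = evaluate(classifiers, X_train, y_train, X_test, y_test)
```

In the notebook the train/test partition comes from the dataset's pre-defined splits rather than a random one.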
Expand Down Expand Up @@ -182,15 +170,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Single genre\n",
"\n",
"`fma_small`:\n",
"* <36% for echonest_audio.\n",
"* <15% for echonest_social.\n",
"* <46% for echonest_temporal.\n",
"* <43% for mfcc.\n",
"* <44% for all except echonest.\n",
"* <45% for best non-echonest combination"
"### 1.2 Single genre"
]
},
{
Expand Down Expand Up @@ -276,14 +256,8 @@
"source": [
"### 1.3 Multiple genres\n",
"\n",
"Maximum observed on `fma_small` (was 7.6% on `fma_medium`).\n",
"* <15% for echonest_audio.\n",
"* <22% for echonest_temporal.\n",
"* <17% for mfcc.\n",
"* <20% for best non-echonest combination\n",
"\n",
"Todo:\n",
"* Eliminate rare genres. On small only the 10 selected genres are meaningful."
"* Ignore rare genres? Count them higher up in the genre tree? On the other hand, they concern few tracks."
]
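Since tracks carry several genres, the targets are multi-label: `MultiLabelBinarizer` turns genre lists into indicator matrices, which `OneVsRestClassifier` can then consume. A minimal sketch with made-up genre ids:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each track is tagged with a list of genre ids (toy values).
tracks_genres = [[21, 38], [38], [12, 21, 38]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tracks_genres)
# mlb.classes_ holds the sorted genre ids; each row of Y is an indicator vector.
```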
},
{
Expand Down Expand Up @@ -317,7 +291,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2 Deep learning on raw audio"
"## 2 Deep learning on raw audio\n",
"\n",
"Other architectures:\n",
"* [Learning Features of Music from Scratch (MusicNet)](https://arxiv.org/abs/1611.09827), John Thickstun, Zaid Harchaoui, Sham Kakade."
]
},
{
Expand All @@ -336,7 +313,7 @@
"source": [
"Load audio samples in parallel using `multiprocessing` to maximize CPU usage when decoding MP3s and performing optional pre-processing. There are multiple ways to load a waveform from a compressed MP3:\n",
"* librosa uses audioread as backend, which can use many native libraries, e.g. ffmpeg\n",
" * resampling is very slow\n",
" * resampling is very slow --> use `kaiser_fast`\n",
" * does not work with multi-processing, for keras `fit_generator()`\n",
"* pydub is a high-level interface for audio modification, uses ffmpeg to load\n",
" * store a temporary `.wav`\n",
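The librosa strategy can be sketched as below, assuming the FMA directory layout (six-digit track ids grouped in folders of 1000) and hypothetical helper names; `librosa.load` accepts `res_type='kaiser_fast'` to avoid the slow default resampler, and a process pool keeps all cores busy decoding:

```python
import multiprocessing
import os

def get_audio_path(audio_dir, track_id):
    # FMA layout: ids grouped by their first three digits, e.g. 000/000002.mp3.
    tid_str = '{:06d}'.format(track_id)
    return os.path.join(audio_dir, tid_str[:3], tid_str + '.mp3')

def load_clip(track_id, sampling_rate=22050):
    # librosa decodes through audioread (e.g. ffmpeg); kaiser_fast trades a
    # little resampling quality for a large speed-up.
    import librosa
    x, sr = librosa.load(get_audio_path('fma_small', track_id),
                         sr=sampling_rate, res_type='kaiser_fast')
    return x

# A process pool maximizes CPU usage (but does not combine with fit_generator):
# pool = multiprocessing.Pool(12)
# clips = pool.map(load_clip, track_ids)
```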
Expand Down Expand Up @@ -376,22 +353,7 @@
"\n",
"* Two layers with 10 hiddens is no better than random, ~11%.\n",
"\n",
"Optimize data loading to be CPU / GPU bound, not IO bound. Larger batches mean reduced training time, so increase batch size until memory exhaustion. Number of workers and queue size have no influence on speed.\n",
"\n",
"CPU\n",
"* batch 4, worker 8, queue 1, 600s\n",
"* batch 20, worker 24, queue 5, 190s\n",
"* batch 20, worker 12, queue 10, 185s\n",
"* batch 40, worker 12, queue 10, 135s\n",
"* batch 64, worker 12, queue 10, 110s\n",
"* batch 128, worker 12, queue 10, 100s\n",
"\n",
"GPU Tesla K40c\n",
"* batch 4, worker 12, queue 10, 250s\n",
"* batch 16, worker 12, queue 10, 100s\n",
"* batch 32, worker 12, queue 10, 90s\n",
"* batch 64, worker 12, queue 10, 70s\n",
"* batch 96-128 --> memory error"
"Optimize data loading to be CPU / GPU bound, not IO bound. Larger batches mean reduced training time, so increase batch size until memory exhaustion. Number of workers and queue size have no influence on speed."
]
},
{
Expand Down Expand Up @@ -431,7 +393,7 @@
"source": [
"### 2.2 Convolutional neural network\n",
"\n",
"* Architecture from [End-to-end learning for music audio](http://www.mirlab.org/conference_papers/International_Conference/ICASSP%202014/papers/p7014-dieleman.pdf) by Sander Dieleman, Benjamin Schrauwen.\n",
"* Architecture: [End-to-end learning for music audio](http://www.mirlab.org/conference_papers/International_Conference/ICASSP%202014/papers/p7014-dieleman.pdf), Sander Dieleman, Benjamin Schrauwen.\n",
"* Missing: track segmentation and class averaging (majority voting)\n",
"* Compared with log-scaled mel-spectrograms instead of strided convolution as first layer.\n",
"* Larger net: http://benanne.github.io/2014/08/05/spotify-cnns.html"
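One way to see why the strided first convolution works as a learned filterbank is to track the temporal dimension through 'valid' strided convolutions: the first layer turns raw samples into frames much like a spectrogram's hop size does. The filter sizes and strides below are illustrative assumptions, not the paper's exact hyper-parameters:

```python
def conv1d_length(n, kernel, stride):
    # Output length of a 1-D 'valid' convolution.
    return (n - kernel) // stride + 1

# Raw audio clip of ~3s at 22050 Hz (toy numbers).
n = 65536
n = conv1d_length(n, kernel=512, stride=256)  # strided conv: 512-sample frames, hop 256
n = conv1d_length(n, kernel=4, stride=1)      # regular conv on the frame sequence
n = n // 4                                    # max-pooling, pool size 4
```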
Expand Down Expand Up @@ -502,7 +464,7 @@
"source": [
"## 3 Deep learning on extracted audio features\n",
"\n",
"Todo:\n",
"Look at:\n",
"* Pre-processing in Keras: https://github.com/keunwoochoi/kapre\n",
"* Convolutional Recurrent Neural Networks for Music Classification: https://github.com/keunwoochoi/icassp_2017\n",
"* Music Auto-Tagger: https://github.com/keunwoochoi/music-auto_tagging-keras\n",
Expand All @@ -515,7 +477,7 @@
"source": [
"### 3.1 ConvNet on MFCC\n",
"\n",
"* Architecture from [Automatic Musical Pattern Feature Extraction Using Convolutional Neural Network](http://www.iaeng.org/publication/IMECS2010/IMECS2010_pp546-550.pdf) by Tom LH. Li, Antoni B. Chan and Andy HW. Chun\n",
"* Architecture: [Automatic Musical Pattern Feature Extraction Using Convolutional Neural Network](http://www.iaeng.org/publication/IMECS2010/IMECS2010_pp546-550.pdf), Tom LH. Li, Antoni B. Chan and Andy HW. Chun\n",
"* Missing: track segmentation and majority voting.\n",
"* Best seen: 17.6%"
]
Expand Down
43 changes: 19 additions & 24 deletions creation.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,15 @@
"\n",
"Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.\n",
"\n",
"## Generation / Collection / Creation\n",
"## Creation\n",
"\n",
"From `raw_*.csv`, this notebook generates:\n",
"* `tracks.csv`: per-track / album / artist metadata.\n",
"* `genres.csv`: genre hierarchy.\n",
"* `echonest.csv`: cleaned Echonest features.\n",
"\n",
"A companion script, [creation.py](creation.py):\n",
"1. Query the API and store metadata in `raw_tracks.csv`, `raw_albums.csv`, `raw_artists.csv` and `raw_genres.csv`.\n",
"1. Query the [API](https://freemusicarchive.org/api) and store metadata in `raw_tracks.csv`, `raw_albums.csv`, `raw_artists.csv` and `raw_genres.csv`.\n",
"2. Download the audio for each track.\n",
"3. Trim the audio to 30s clips.\n",
"4. Normalize the permissions and modification / access times.\n",
Expand All @@ -28,15 +29,14 @@
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import os\n",
"import ast\n",
"import pickle\n",
"\n",
"import IPython.display as ipd\n",
"import numpy as np\n",
"import pandas as pd\n",
"import IPython.display as ipd\n",
"\n",
"import utils\n",
"import creation"
]
Expand All @@ -60,16 +60,16 @@
"## 1 Retrieve metadata and audio from FMA\n",
"\n",
"1. Crawl the tracks, albums and artists metadata through their [API](https://freemusicarchive.org/api).\n",
"2. Download original `.mp3` by HTTPS for each track id (only if it does not exist already).\n",
"2. Download original `.mp3` by HTTPS for each track id (only if we don't have it already).\n",
"\n",
"Todo:\n",
"* Scrape curators.\n",
"* Download images (`track_image_file`, `album_image_file`, `artist_image_file`). Beware of the quality.\n",
"* Verify checksum for some random tracks.\n",
"\n",
"Examples:\n",
"Dataset update:\n",
"* To add new tracks: iterate from largest known track id to the most recent only.\n",
"* To update user data: get them all again."
"* To update user data: we need to get all tracks again."
]
},
{
Expand Down Expand Up @@ -160,7 +160,13 @@
"\n",
"Todo:\n",
"* Sanitize values, e.g. list of words for tags, valid links in `artist_wikipedia_page`, remove html markup in free-form text.\n",
"Fill metadata about encoding: length, number of samples, sample rate, bit rate, channels (mono/stereo), 16 bits?"
" * Clean tags. E.g. some tags are just artist names.\n",
"Fill metadata about encoding: length, number of samples, sample rate, bit rate, channels (mono/stereo), 16 bits?\n",
"* Update duration from audio\n",
" * 2624 is marked as 05:05:50 (18350s) although it is reported as 00:21:15.15 by ffmpeg.\n",
" * 112067: 3714s --> 01:59:55.06, 112808: 3718s --> 01:59:59.56\n",
" * ffmpeg: Estimating duration from bitrate, this may be inaccurate\n",
" * Solution, decode the complete mp3: `ffmpeg -i input.mp3 -f null -`"
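The fix above can be sketched by decoding the whole file and reading the last progress line ffmpeg prints on stderr; the `time=HH:MM:SS.ss` parsing and the helper names are assumptions about ffmpeg's usual output, not code from the dataset's scripts:

```python
import re
import subprocess

def parse_ffmpeg_time(stderr_text):
    # ffmpeg reports progress as e.g. "size=N/A time=00:21:15.15 bitrate=N/A";
    # after a full decode, the last occurrence is the exact duration.
    h, m, s = re.findall(r'time=(\d+):(\d+):(\d+(?:\.\d+)?)', stderr_text)[-1]
    return int(h) * 3600 + int(m) * 60 + float(s)

def real_duration(path):
    # Decode the complete mp3 (slow but exact): ffmpeg -i input.mp3 -f null -
    p = subprocess.run(['ffmpeg', '-i', path, '-f', 'null', '-'],
                       stderr=subprocess.PIPE, universal_newlines=True)
    return parse_ffmpeg_time(p.stderr)
```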
]
},
{
Expand Down Expand Up @@ -449,15 +455,7 @@
"source": [
"## 3 Data cleaning\n",
"\n",
"Todo\n",
"* Duplicates (metadata and audio)\n",
"* Kill some (top-level) genres ? Like Easy Listening for medium.\n",
"* Update duration from audio\n",
" * 2624 is marked as 05:05:50 (18350s) although it is reported as 00:21:15.15 by ffmpeg.\n",
" * 112067: 3714s --> 01:59:55.06, 112808: 3718s --> 01:59:59.56\n",
" * ffmpeg: Estimating duration from bitrate, this may be inaccurate\n",
" * Solution, decode the complete mp3: `ffmpeg -i input.mp3 -f null -`\n",
"* Clean tags. E.g. some tags are just artist names."
"Todo: duplicates (metadata and audio)"
]
},
{
Expand Down Expand Up @@ -594,7 +592,7 @@
"# --> listed as child of Sound Effects on website\n",
"genres.at[763, 'parent'] = 16\n",
"\n",
"# Todo: should novely be under Experimental? It is alone on website."
"# Todo: should novelty be under Experimental? It is alone on website."
]
},
{
Expand Down Expand Up @@ -1026,10 +1024,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8 Description\n",
"\n",
"Todo:\n",
"* verify all dtypes"
"## 8 Description"
]
},
{
Expand Down
