# AMPLab assignment: text-based audio classification 
Original notebook written by Frederic Font, extended by Philip Tovstogan

## PART A - Prepare and explore your dataset

To run the cells in this notebook you'll need to install the following Python dependencies:
 * `numpy`
 * `sklearn`

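If you have not installed them, you should be able to do so by running `pip install numpy scikit-learn` (`scikit-learn` is the PyPI name of the package imported as `sklearn`). The cells below also import the Freesound Python client (`freesound`), `IPython` and a local `utils` module.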

Furthermore, you'll need **Freesound API** credentials to retrieve the sounds' metadata for building your dataset. Please create a Freesound account (if you don't already have one) and apply for an API key here: [http://www.freesound.org/apiv2/apply/](http://www.freesound.org/apiv2/apply/). Then set the `API_KEY` variable below to the API key you are given.
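Pasting an API key directly into a notebook exposes it to anyone who sees the file. A minimal alternative is to read the key from an environment variable. In the sketch below, the variable name `FREESOUND_API_KEY` and the `get_api_key` helper are hypothetical choices, not part of the Freesound client.

```python
import os

def get_api_key(env=None, var_name='FREESOUND_API_KEY'):
    """Read a Freesound API key from the environment (variable name is an assumption)."""
    env = os.environ if env is None else env
    key = env.get(var_name, '').strip()
    if not key:
        raise RuntimeError('Set the %s environment variable to your Freesound API key' % var_name)
    return key
```

Usage: `API_KEY = get_api_key()` followed by `c.set_token(API_KEY, "token")`, as in the cell below.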

*All comments specific to my experiment are italicized.*

In [1]:
from __future__ import print_function
import freesound
import random
import utils
import json
from IPython.display import display, HTML
from collections import defaultdict

API_KEY = 'YOUR_FREESOUND_API_KEY'  # Replace with your own Freesound API key; do not share it
c = freesound.FreesoundClient()
c.set_token(API_KEY, "token")
PAGE_SIZE = 50  # Page size for Freesound requests; no need to change it

### 1) Select your audio categories and get metadata from Freesound for each category

For the task in this session we provide a pre-selected set of audio categories, each linked to a list of Freesound sound IDs. The first step is to choose which classes to include in the dataset that we'll later use for classification, and to retrieve metadata for each sound in it.

 * Choose the categories you want to work with by editing the `DATASET_CLASSES` list below. 
 * You can limit the number of sounds chosen per class by setting the `N` parameter (`None` keeps all available sounds). 
 * The resulting dataset will be saved to a JSON file named after `DATASET_NAME` (here `bowed_vs_plucked.json`). This file will be loaded later in another notebook to carry out the classification task.
 
*The goal of my experiment is to compare the performance of a classifier that distinguishes bowed from plucked string instruments with and without audio features. Audio features potentially relevant to this distinction are the **spectral centroid** (expected to be higher for plucked sounds) and **HFC**, high-frequency content (expected to be larger for plucked sounds because of their attack transients).*
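To make this hypothesis concrete, here is a minimal numpy sketch of both features on a single frame. Assumptions: a 44.1 kHz sample rate, a 2048-sample Hann-windowed frame, and the HFC variant that weights the power $|X_k|^2$ by the bin index $k$ (one of several definitions in use). Freesound's own analysis may use a different HFC variant and normalisation, and it reports means over many frames (`.mean`). The two signals are synthetic toys, not Freesound data: a steady tone, and the same tone with a sharp click standing in for a pluck's attack.

```python
import numpy as np

SR = 44100          # assumed sample rate (Hz)
FRAME_SIZE = 2048   # assumed frame length (samples)

def magnitude_spectrum(frame):
    # Hann window limits spectral leakage before the real FFT
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame))))

def spectral_centroid(frame, sr=SR):
    # Magnitude-weighted mean frequency of the spectrum, in Hz
    mag = magnitude_spectrum(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    total = mag.sum()
    return float((freqs * mag).sum() / total) if total > 0 else 0.0

def hfc(frame):
    # High-frequency content: power in each bin weighted by its bin index k
    mag = magnitude_spectrum(frame)
    k = np.arange(len(mag))
    return float((k * mag ** 2).sum())

t = np.arange(FRAME_SIZE) / SR
bowed_like = 0.5 * np.sin(2 * np.pi * 220 * t)   # steady 220 Hz tone, no attack
plucked_like = bowed_like.copy()
plucked_like[FRAME_SIZE // 2] += 1.0             # add a click as a crude transient

for name, frame in [('bowed-like', bowed_like), ('plucked-like', plucked_like)]:
    print('%s: centroid = %.0f Hz, HFC = %.3g' % (name, spectral_centroid(frame), hfc(frame)))
```

The click spreads energy evenly over all frequency bins, which is why it raises both the centroid and HFC; real plucked attacks are far less idealised, so the effect on actual recordings should be smaller and noisier.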

In [2]:
# Configure dataset parameters and audio categories

DATASET_NAME = 'bowed_vs_plucked' # Dataset will be saved in a .json file with this name
N = None # Number of sounds per class (set 'None' to get all in dataset)
DATASET_CLASSES = [
    #'Percussion', 
    #'Brass instrument', 
    #'Wind instrument, woodwind instrument', 
    'Bowed string instrument', 
    'Plucked string instrument', 
    #'Singing', 
    #'Speech', 
    #'Domestic sounds, home sounds', 
    #'Vehicle', 
    #'Generic impact sounds', 
    #'Mechanisms'
]

# Get sound examples from Freesound
with open('data_index.json') as f:  # Close the file once the index is loaded
    data_index = json.load(f)
dataset = defaultdict(list)

for klass in DATASET_CLASSES:
    sound_ids = data_index.get(klass, None)
    if sound_ids is None:
        print('Skipping class %s as no data is available for it' % klass)
        continue
    random.shuffle(sound_ids)  # Shuffle order of sound ids
    sound_ids = sound_ids[:N]  # Limit number of selected sound ids
    
    print('Getting sounds\' data for class %s...' % klass)
    for i in range(0, len(sound_ids), PAGE_SIZE):
        current_sound_ids = sound_ids[i:i + PAGE_SIZE]
    
        results_pager = c.text_search(
            filter='id:(%s)' % ' OR '.join([str(sid) for sid in current_sound_ids]),
            page_size=PAGE_SIZE,
            fields="id,tags,description,username,analysis",
            descriptors="lowlevel.spectral_centroid.mean,lowlevel.hfc.mean"
        )
        dataset[klass] += results_pager.results
        
        # NOTE ON AUDIO FEATURES: Freesound returns pre-computed audio features when you pass a
        # 'descriptors' parameter to text_search and include 'analysis' in the fields list, as done above
        # (see http://www.freesound.org/docs/api/resources_apiv2.html#response-sound-list).
        # Another example of a descriptors value:
        #
        # descriptors = "lowlevel.spectral_centroid,lowlevel.barkbands.mean"

# Show information and save dataset to file so we can work with it later on
utils.save_to_json('%s.json' % DATASET_NAME, dataset)
print('\nDataset created with %i classes:' % len(dataset))
for klass, sounds in dataset.items():
    print('\t%s: %i sounds' % (klass, len(sounds)))
print('Saved to %s.json' % DATASET_NAME)

Getting sounds' data for class Bowed string instrument...
Getting sounds' data for class Plucked string instrument...

Dataset created with 2 classes:
	Plucked string instrument: 200 sounds
	Bowed string instrument: 200 sounds
Saved to bowed_vs_plucked.json
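Since the classification notebook will load this file, here is a minimal sketch of turning the saved sounds into a numeric feature matrix for the "with audio features" condition. It assumes (1) the JSON written by `utils.save_to_json` mirrors the `dataset` dict (class name mapped to a list of sound dicts), and (2) each sound's `analysis` field is a nested dict following the requested descriptor names, e.g. `{'lowlevel': {'hfc': {'mean': ...}}}`, or `None` when Freesound has no analysis for that sound. `get_descriptor` and `build_feature_matrix` are hypothetical helpers, not part of `utils`.

```python
import numpy as np

# Descriptor paths matching the 'descriptors' requested from Freesound above
DESCRIPTOR_PATHS = [
    ('lowlevel', 'spectral_centroid', 'mean'),
    ('lowlevel', 'hfc', 'mean'),
]

def get_descriptor(sound, path):
    """Walk the (assumed) nested 'analysis' dict; return None if any level is missing."""
    node = sound.get('analysis')
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def build_feature_matrix(dataset):
    """Return (X, y, skipped): one row of descriptor values per fully analysed sound."""
    rows, labels, skipped = [], [], 0
    for label, sounds in dataset.items():
        for sound in sounds:
            values = [get_descriptor(sound, path) for path in DESCRIPTOR_PATHS]
            if any(v is None for v in values):
                skipped += 1  # no (complete) analysis data for this sound
                continue
            rows.append(values)
            labels.append(label)
    X = np.array(rows, dtype=float).reshape(-1, len(DESCRIPTOR_PATHS))
    return X, np.array(labels), skipped
```

For example: `with open('bowed_vs_plucked.json') as f: X, y, skipped = build_feature_matrix(json.load(f))`. Reporting `skipped` shows how many sounds lack analysis data, which matters if the text-only and text-plus-audio classifiers should be compared on the same subset of sounds.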


### 2) Explore the dataset (know your data!)

The cell below displays some information about the newly created dataset so you can get an idea of its contents. Feel free to experiment with the data and add any plots you find useful for the upcoming classification task.
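Besides listening and looking at tags, a quick numerical check of the feature hypothesis is to compare per-class medians of the descriptors. The sketch below assumes a feature matrix `X` (one row per sound, columns in the order given by the hypothetical `FEATURE_NAMES`) and a label array `y`, for instance built from the sounds' `analysis` fields. The median is used rather than the mean because it is less sensitive to outliers such as mislabeled samples.

```python
import numpy as np

FEATURE_NAMES = ['spectral_centroid.mean', 'hfc.mean']  # assumed column order of X

def per_class_median(X, y):
    """Map each class label to the per-column median of its rows in X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {label: np.median(X[y == label], axis=0) for label in np.unique(y)}

def print_summary(X, y, feature_names=FEATURE_NAMES):
    # One line per class, e.g. "<label>: spectral_centroid.mean = ..., hfc.mean = ..."
    for label, medians in per_class_median(X, y).items():
        cols = ', '.join('%s = %.3g' % (name, value) for name, value in zip(feature_names, medians))
        print('%s: %s' % (label, cols))
```

If the hypothesis holds, the plucked class should show clearly higher medians for both columns; heavily overlapping values would suggest these two descriptors add little on top of the text features.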

In [3]:
# Pick some sounds from each category and show players (Freesound embeds) to listen to them
for count, (class_name, sounds) in enumerate(dataset.items()):
    html = "<h3 style='color:#bbb;'>%i) %s</h3>" % (count + 1, class_name)
    html += "<h4>Example sounds:</h4><br>"
    html += utils.generate_html_with_sound_examples([sound['id'] for sound in sounds[:6]])
    html += "<h4>Most common tags (tag cloud):</h4><br>"
    class_tags = utils.get_all_tags_from_class(class_name, dataset)
    html += utils.generate_html_tagcloud(class_tags, N=100, max_px=30, min_px=10, pow_scale=1.2)
    html += "<br><br>"
    display(HTML(html)) # <- This is pure jupyter notebook AWESOMENESS which renders the HTML in the output of the cell

*Here we can see that our data is not clean: there are some pizzicato samples in the bowed class. This is likely to limit the performance of our classifier, but since we are focusing on the relative improvement from adding audio features, it is acceptable for this experiment.*
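*To get a rough idea of how much of the bowed class is affected, one could scan the sounds' tags for pizzicato-related keywords. This is a heuristic sketch: the keyword set `SUSPECT_TAGS` and the `find_suspect_sounds` helper are hypothetical, untagged pizzicato sounds will be missed, and a match only flags a sound for listening rather than proving it is plucked.*

```python
# Hypothetical keyword list; extend it after inspecting the tag clouds above
SUSPECT_TAGS = {'pizzicato', 'pizz', 'plucked', 'pluck'}

def find_suspect_sounds(sounds, suspect_tags=SUSPECT_TAGS):
    """Return the ids of sounds whose tags (case-insensitive) include any suspect keyword."""
    suspects = []
    for sound in sounds:
        tags = {tag.lower() for tag in (sound.get('tags') or [])}
        if tags & suspect_tags:
            suspects.append(sound['id'])
    return suspects
```

*For example, `suspects = find_suspect_sounds(dataset['Bowed string instrument'])` gives the flagged ids, and `utils.generate_html_with_sound_examples(suspects[:6])` can be used to listen to a few of them.*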