# <center> MUSIC INFORMATION RETRIEVAL</center>
## <center>Simple Feature Extraction</center>      

In [None]:
%matplotlib inline

from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
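import librosa
import librosa.display  # explicit import needed for librosa.display.waveshow in some librosa versions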
from sklearn.preprocessing import MinMaxScaler
import IPython.display as ipd

### About this notebook

We will explore feature extraction for sound classification using a simple example. For sound classification, features are computed directly from the audio. Ideally, good features clearly separate your classes. However, instances of the same class still vary (e.g., they may be played at different pitches and loudness levels, and recorded in different conditions). We need to find features that are invariant to the characteristics that do not help distinguish between classes. For example, if we want to classify instruments by timbre, we want features that distinguish sounds by their timbre and not by their pitch.
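As a small illustration of invariance, the sketch below compares two frame-wise features under a change of loudness (gain). It assumes a hypothetical helper `frame_signal` and uses a synthetic decaying tone instead of a real drum sample. The mean zero-crossing rate (ZCR) is unchanged when the signal is scaled by a positive gain, whereas the mean root-mean-square (RMS) energy scales with it. These are simplified NumPy versions of what `librosa.feature.zero_crossing_rate` and `librosa.feature.rms` compute, without librosa's centering and padding.

```python
import numpy as np

def frame_signal(x, frame_length=2048, hop_length=512):
    # Split a 1-D signal into overlapping frames (one frame per row);
    # zero-pad short signals so there is always at least one frame.
    if len(x) < frame_length:
        x = np.pad(x, (0, frame_length - len(x)))
    n_frames = 1 + (len(x) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)
    return x[starts[:, None] + np.arange(frame_length)[None, :]]

def mean_rms(x):
    # Frame-wise root-mean-square energy, averaged over frames: scales with gain.
    frames = frame_signal(x)
    return float(np.mean(np.sqrt(np.mean(frames ** 2, axis=1))))

def mean_zcr(x):
    # Fraction of adjacent samples that change sign, averaged over all frames:
    # unchanged when the signal is multiplied by a positive gain.
    frames = frame_signal(x)
    return float(np.mean(np.signbit(frames[:, 1:]) != np.signbit(frames[:, :-1])))

# Synthetic test tone (decaying 100 Hz sine plus a little noise), not a real drum hit.
sr = 22050
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 100 * t) * np.exp(-5 * t) + 0.01 * rng.standard_normal(sr)

for gain in (1.0, 0.1):
    y = gain * x
    print(f"gain={gain}: mean RMS={mean_rms(y):.4f}, mean ZCR={mean_zcr(y):.4f}")
```

A loudness-dependent feature such as RMS can still be useful, but only if loudness actually differs between the classes rather than between recordings.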

**Note**: *this notebook is loosely based on a similar notebook available in [musicinformationretrieval.com](https://musicinformationretrieval.com/)*

### How to run the notebook
You can download the notebook and run it locally on your computer.

You can also run it in Google Colab by using the following link.

<table align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/mrocamora/mir_workshop/blob/main/notebooks/MIR_workshop_simple_feature_extraction.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

### Install required packages

You should install the following packages by running the corresponding cells.

In [None]:
!pip install librosa

If you experience issues with librosa caused by numpy, uncomment and run the following cells.

In [None]:
#!pip uninstall -y numpy

In [None]:
#!pip install numpy==1.20.0

### Download and load audio files

Let's download some audio files to work with. They are kick drum and snare drum samples, and each audio file contains one drum hit.

In [None]:
!wget https://iie.fing.edu.uy/~rocamora/downloads/tmp/drum_samples.zip
!unzip drum_samples.zip

Now we load the audio signals of each class into two different lists:

In [None]:
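# librosa.load returns (signal, sampling_rate); keep only the signal.
# sorted() makes the file order, and hence the indices used below, reproducible.
kick_signals = [
    librosa.load(str(p))[0] for p in sorted(Path().glob('drum_samples/train/kick_*.mp3'))
]
snare_signals = [
    librosa.load(str(p))[0] for p in sorted(Path().glob('drum_samples/train/snare_*.mp3'))
]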

Let's check the number of files of each class.

In [None]:
n_kick = len(kick_signals)
print(n_kick)

In [None]:
n_snare = len(snare_signals)
print(n_snare)

We now display some of the kick drum signals:

In [None]:
plt.figure(figsize=(15, 6))
for i, x in enumerate(kick_signals[:10]):
    plt.subplot(2, 5, i+1)
    librosa.display.waveshow(x[:10000])
    plt.ylim(-1, 1)

And some of the snare drum signals:

In [None]:
plt.figure(figsize=(15, 6))
for i, x in enumerate(snare_signals[:10]):
    plt.subplot(2, 5, i+1)
    librosa.display.waveshow(x[:10000])
    plt.ylim(-1, 1)

We can listen to some of the audio signals. Recall that `librosa.load` resamples audio to a sampling rate of `sr=22050` Hz by default.

In [None]:
ipd.display(ipd.Audio(snare_signals[2], rate=22050))

In [None]:
ipd.display(ipd.Audio(kick_signals[10], rate=22050))

### Part 1. Computing features

The following is a simple function that constructs a two-dimensional feature vector from a signal.

Add the code needed to compute two features that are relevant for distinguishing between the two classes.

You should get only **one number for each feature**, so combine the short-time (frame-wise) feature values into a single value in a suitable way (**temporal integration**).

In [None]:
def extract_features(signal):
    return [
        # compute feature 1,
        # YOUR CODE HERE


        # compute feature 2,
        # YOUR CODE HERE

    ]
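As one possible way to fill in the skeleton above, here is a sketch that assumes two common choices: the zero-crossing rate (noisy snares tend to cross zero more often than low-pitched kicks) and the spectral centroid (snares tend to have more high-frequency energy). Both are computed frame by frame with plain NumPy and then integrated over time by taking the mean. The function names (`frame_signal`, `zcr_frames`, `spectral_centroid_frames`, `extract_features_sketch`) and the frame/hop sizes (2048/512 samples) are illustrative assumptions, not part of the original notebook, and the synthetic signals only stand in for real drum hits.

```python
import numpy as np

def frame_signal(x, frame_length=2048, hop_length=512):
    # Split a 1-D signal into overlapping frames (one frame per row);
    # zero-pad short signals so there is always at least one frame.
    if len(x) < frame_length:
        x = np.pad(x, (0, frame_length - len(x)))
    n_frames = 1 + (len(x) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)
    return x[starts[:, None] + np.arange(frame_length)[None, :]]

def zcr_frames(x, frame_length=2048, hop_length=512):
    # Frame-wise zero-crossing rate: fraction of adjacent samples that change sign.
    frames = frame_signal(x, frame_length, hop_length)
    return np.mean(np.signbit(frames[:, 1:]) != np.signbit(frames[:, :-1]), axis=1)

def spectral_centroid_frames(x, sr=22050, frame_length=2048, hop_length=512):
    # Frame-wise spectral centroid in Hz: magnitude-weighted mean frequency
    # of each Hann-windowed frame.
    frames = frame_signal(x, frame_length, hop_length) * np.hanning(frame_length)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sr)
    return (mag @ freqs) / (mag.sum(axis=1) + 1e-10)

def extract_features_sketch(signal, sr=22050):
    # Temporal integration: summarize each frame-wise feature by its mean,
    # giving one number per feature.
    return [
        float(np.mean(zcr_frames(signal))),
        float(np.mean(spectral_centroid_frames(signal, sr))),
    ]

# Synthetic stand-ins: a decaying 60 Hz sine ("kick-like") and decaying noise ("snare-like").
sr = 22050
t = np.arange(sr // 2) / sr
rng = np.random.default_rng(0)
kick_like = np.sin(2 * np.pi * 60 * t) * np.exp(-8 * t)
snare_like = rng.standard_normal(len(t)) * np.exp(-8 * t)

f_kick = extract_features_sketch(kick_like, sr)
f_snare = extract_features_sketch(snare_like, sr)
print("kick-like  [ZCR, centroid Hz]:", f_kick)
print("snare-like [ZCR, centroid Hz]:", f_snare)
```

librosa provides frame-wise versions of both features, `librosa.feature.zero_crossing_rate(y=signal)` and `librosa.feature.spectral_centroid(y=signal, sr=sr)`, each returning an array of shape `(1, n_frames)` that can be summarized the same way. The mean is only one option for temporal integration; the standard deviation, maximum, or median are alternatives. Since near-silent frames at the end of a drum hit also contribute to the mean, you may want to weight frames by their energy or discard quiet ones.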

To compute the feature vectors for every signal in a collection, we can use a list comprehension as follows:

In [None]:
kick_features = np.array([extract_features(x) for x in kick_signals])
snare_features = np.array([extract_features(x) for x in snare_signals])

### Part 2. Visualizing features

To visualize the differences in the feature values, we can plot separate histograms for each class:

In [None]:
plt.figure(figsize=(14, 5))
plt.hist(kick_features[:,0], color='b', alpha=0.5, bins=20)
plt.hist(snare_features[:,0], color='r', alpha=0.5, bins=20)
plt.legend(('kicks', 'snares'))
plt.xlabel('Feature 1')
plt.ylabel('Count')

In [None]:
plt.figure(figsize=(14, 5))
plt.hist(kick_features[:,1], color='b', bins=30, alpha=0.6)
plt.hist(snare_features[:,1], color='r', bins=30, alpha=0.6)
plt.legend(('kicks', 'snares'))
plt.xlabel('Feature 2')
plt.ylabel('Count')

### Part 3. Feature scaling

The two features computed above are expressed in different units, so their values can lie in very different ranges. This discrepancy can cause problems during classification, because features with larger values dominate computations such as distances between feature vectors. Therefore, we will scale each feature dimension to a common range and store the scaling parameters for later use.
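To make the problem concrete, the sketch below uses two made-up feature vectors (hypothetical values, assuming feature 1 is a zero-crossing rate between 0 and 1 and feature 2 is a spectral centroid in Hz) and measures how much each feature contributes to their Euclidean distance.

```python
import numpy as np

# Two made-up feature vectors (hypothetical values, for illustration only):
# [zero-crossing rate, spectral centroid in Hz]
a = np.array([0.05, 1500.0])
b = np.array([0.40, 1600.0])

diff = a - b
distance = np.linalg.norm(diff)          # overall Euclidean distance
share = diff ** 2 / np.sum(diff ** 2)    # fraction of squared distance per feature

print("distance:", distance)
print("share per feature:", share)
```

Feature 1 differs by 0.35 out of a range of at most 1, yet feature 2 accounts for more than 99.99% of the squared distance simply because it is measured in Hz.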

Many techniques exist for scaling features. For now, we'll use [`sklearn.preprocessing.MinMaxScaler`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html). `MinMaxScaler` linearly maps each feature dimension to a given range. The default range is 0 to 1, so to get values in the range -1 to 1 you need to pass `feature_range=(-1, 1)`.
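Min-max scaling maps each value $x$ of a feature dimension to $x' = a + \frac{(x - x_{\min})(b - a)}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are that dimension's minimum and maximum and $(a, b) = (-1, 1)$ is the target range. The NumPy sketch below spells this out on a small hypothetical feature table; the helper names `fit_min_max` and `apply_min_max` are illustrative, not part of scikit-learn. It also shows why the parameters are stored: new data (e.g., a test set) must be scaled with the minimum and maximum found on the training data.

```python
import numpy as np

def fit_min_max(table):
    # Store the per-dimension minimum and maximum (the scaling parameters).
    return table.min(axis=0), table.max(axis=0)

def apply_min_max(table, data_min, data_max, feature_range=(-1.0, 1.0)):
    # Map each column linearly so that data_min -> low and data_max -> high.
    low, high = feature_range
    # Guard against constant features (zero range) to avoid division by zero.
    span = np.where(data_max > data_min, data_max - data_min, 1.0)
    return low + (table - data_min) * (high - low) / span

# Hypothetical feature table: rows are sounds, columns are [ZCR, centroid in Hz].
toy_table = np.array([[0.05, 1500.0],
                      [0.10, 1800.0],
                      [0.40, 4200.0],
                      [0.35, 3900.0]])

data_min, data_max = fit_min_max(toy_table)
toy_scaled = apply_min_max(toy_table, data_min, data_max)
print(toy_scaled.min(axis=0), toy_scaled.max(axis=0))

# The stored parameters are reused, unchanged, on new data.
new_sound = np.array([[0.20, 2500.0]])
print(apply_min_max(new_sound, data_min, data_max))
```

With scikit-learn, the equivalent is `MinMaxScaler(feature_range=(-1, 1))`: `fit_transform` learns and applies the parameters (available afterwards as `data_min_` and `data_max_`), and `transform` applies them to new data. Note that new values outside the training range fall outside -1 to 1.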

Let's concatenate all of our feature vectors into one *feature table*:

In [None]:
feature_table = np.vstack((kick_features, snare_features))
print(feature_table.shape)

You can check the minimum and maximum values of the features before scaling.

In [None]:
print(feature_table.min(axis=0))
print(feature_table.max(axis=0))

Scale each feature dimension to be in the range -1 to 1:

In [None]:
# YOUR CODE HERE

# scaled_features =



Now check that the features were actually scaled.

In [None]:
print(scaled_features.min(axis=0))
print(scaled_features.max(axis=0))

Plot the scaled features:

In [None]:
plt.scatter(scaled_features[:n_kick,0], scaled_features[:n_kick,1], c='b')
plt.scatter(scaled_features[n_kick:,0], scaled_features[n_kick:,1], c='r')
plt.legend(('kicks', 'snares'))
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

### Part 4. Analysis

Compute different features, plot their values, and try to answer the following:

1.   How can you do temporal integration?
2.   Which features are most useful for discriminating between the classes?
3.   Is it useful to combine features?
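One simple, standard way to quantify how well a single feature separates two classes (not prescribed by this notebook, just one option) is the Fisher discriminant ratio $F = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$, where $\mu_i$ and $\sigma_i^2$ are the mean and variance of the feature in class $i$. The sketch below computes it per feature; the function name `fisher_ratio` is hypothetical, and the two classes are random Gaussian stand-ins rather than real drum features.

```python
import numpy as np

def fisher_ratio(class_a, class_b):
    # Per-feature Fisher discriminant ratio: squared distance between the
    # class means divided by the sum of within-class variances.
    # Larger values mean the feature separates the classes better.
    mean_diff = class_a.mean(axis=0) - class_b.mean(axis=0)
    within = class_a.var(axis=0) + class_b.var(axis=0)
    return mean_diff ** 2 / (within + 1e-12)  # small constant avoids division by zero

# Random stand-in data: feature 1 is well separated, feature 2 overlaps heavily.
rng = np.random.default_rng(0)
kick_demo = np.column_stack([rng.normal(0.0, 1.0, 50), rng.normal(0.0, 1.0, 50)])
snare_demo = np.column_stack([rng.normal(4.0, 1.0, 50), rng.normal(0.5, 1.0, 50)])

ratios = fisher_ratio(kick_demo, snare_demo)
print("Fisher ratio per feature:", ratios)
```

In the notebook you could call `fisher_ratio(kick_features, snare_features)`. Because the ratio is unchanged by rescaling a feature, it gives the same result before and after min-max scaling. It looks at one feature at a time, however, so it cannot tell you whether combining features helps; the scatter plot of the scaled features is better suited to that question.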


#### YOUR RESPONSE HERE

Decide on the two most useful features for discriminating between the classes and justify your choice.


#### YOUR RESPONSE HERE