# AUDIO CLASSIFICATION BASED ON 'MODALITY' USING CHROMA FEATURES


#### This series of notebooks is a tutorial on classifying audio files according to the tonal concept of 'modality' in music. The concept of modality exists in various tonal music traditions around the world. In this tutorial, we apply supervised learning to chroma features.

There are 2 main steps for the audio classification: Feature Extraction and Classification.

---
#### This notebook shows the steps of Chroma Feature Based Automatic Makam Recognition.


#### PARAMETERS:
 - numBins(int) : Number of bins per octave in the chroma vectors. This is a parameter so that the microtonal intervals found in non-Western music traditions can be represented (12, 24, 36, 48, ...)
 - modality(str) : Name of the modality type specific to the music tradition:
 
 
         Classical Western Music / Jazz : Mode, tonality, scale
 
         Classical Turkish Music : Makam
 
         Classical North Indian Music : Rag, Raga
 
         Classical Arab-Andalusian Music : Tab
 

IMPORTANT : All the audio files must be placed in the dataset directory in sub-directories named after their modality type.
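
As a rough illustration of the directory requirement, the sketch below groups audio files by the sub-directory they sit in. It assumes one sub-directory per modality category (for example `data/Hicaz/`, `data/Rast/`); the helper `list_audio_by_modality` and the file extensions are hypothetical and are not part of the repository.

```python
import tempfile
from pathlib import Path


def list_audio_by_modality(data_dir, extensions=(".mp3", ".wav")):
    """Map each modality sub-directory name to the sorted audio file names inside it."""
    layout = {}
    for sub in sorted(Path(data_dir).iterdir()):
        if not sub.is_dir():
            continue  # skip loose files such as annotations.json or CSV outputs
        files = sorted(p.name for p in sub.iterdir() if p.suffix.lower() in extensions)
        if files:
            layout[sub.name] = files
    return layout


# Build a tiny throw-away dataset to demonstrate the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for makam, name in [("Hicaz", "a.mp3"), ("Hicaz", "b.mp3"), ("Rast", "c.mp3")]:
        (root / makam).mkdir(exist_ok=True)
        (root / makam / name).touch()
    (root / "annotations.json").write_text("[]")
    demo_layout = list_audio_by_modality(root)

print(demo_layout)
```

Running a check like this before feature extraction makes it easy to spot a category with no audio files.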

Note that the functions that are used in this tutorial can be found in modalityUtils/utilities.py, which is provided in the repository.


## STEP 1 : GATHERING DATA / DOWNLOADING THE AUDIO FILES

### Input : 
#### JSON file :
which contains the MBIDs of the audio files, the tonic information, and the modality category (e.g. makam, mode, rag, tab). If a song contains multiple modalities, the start and end times of the different sections need to be specified.
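
To make the description of the annotations file more concrete, here is a minimal sketch of what such a file might look like and how its sections could be read. The key names (`mbid`, `tonic`, `makam`, `sections`, `start`, `end`), the tonic being given in Hz, and the helper `iter_sections` are all assumptions made for illustration; the actual schema expected by `downloadDataset` may differ.

```python
import json

# Hypothetical annotation entries; key names and values are illustrative only.
example_annotations = """
[
  {"mbid": "<recording-mbid-1>", "tonic": 220.0, "makam": "Hicaz"},
  {"mbid": "<recording-mbid-2>", "tonic": 146.8,
   "sections": [
     {"makam": "Rast", "start": 0.0, "end": 95.0},
     {"makam": "Segah", "start": 95.0, "end": 210.5}
   ]}
]
"""


def iter_sections(entry, modality="makam"):
    """Yield (label, start, end) per annotated section; end is None for a whole recording."""
    if "sections" in entry:
        for sec in entry["sections"]:
            yield sec[modality], sec["start"], sec["end"]
    else:
        yield entry[modality], 0.0, None


annotations = json.loads(example_annotations)
# Flatten to one (mbid, label, start, end) tuple per section.
sections = [(e["mbid"], *s) for e in annotations for s in iter_sections(e)]
for row in sections:
    print(row)
```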

#### Directory :
Output directory for the dataset to be downloaded


In [23]:
from compmusic import dunya
from modalityUtils.utilities import downloadDataset

#token will be shared upon request
dunya.set_token('___yourTOKENhere___') 

### Please set the directory for the dataset to be downloaded and the name of the annotations file (JSON).
dataDir = 'data/'
annotationsFile = 'annotations.json'
modality = 'makam'

In [None]:
downloadDataset(annotationsFile,dataDir)

## STEP 2 : FEATURE EXTRACTION

### Input : 
 - JSON file :
    which contains the MBIDs of the audio files, the tonic information, and the modality category (e.g. makam, mode, rag, tab). If a song contains multiple modalities, the start and end times of the different sections need to be specified.

 - numBins(int) : Number of bins per octave in the chroma vectors. This is a parameter so that the microtonal intervals found in non-Western music traditions can be represented (12, 24, 36, 48, ...)
 - musicTradition (str) : Name of the music tradition, which determines the modality type:
 
 
         Classical Western Music / Jazz : Mode, tonality, scale
 
         Classical Turkish Music : Makam
 
         Classical North Indian Music : Rag, Raga
 
         Classical Arab-Andalusian Music : Tab
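
The effect of numBins can be illustrated with the generic way of folding a frequency onto a chroma (pitch-class) axis: measure its distance from a reference in octaves, scale by the number of bins per octave, and wrap around. The sketch below is only that generic mapping, with the tonic used as the reference; it is not the Essentia HPCP algorithm, which additionally applies weighting and harmonic handling. The function `chroma_bin` is hypothetical.

```python
import math


def chroma_bin(freq_hz, tonic_hz, num_bins=12):
    """Return the chroma bin index (0 .. num_bins - 1) of freq_hz relative to tonic_hz."""
    octaves = math.log2(freq_hz / tonic_hz)   # distance from the tonic in octaves
    return round(num_bins * octaves) % num_bins  # quantize, then fold into one octave


tonic = 220.0
fifth = tonic * 2 ** (7 / 12)          # 700 cents above the tonic
quarter_tone = tonic * 2 ** (1 / 24)   # 50 cents above the tonic

print(chroma_bin(fifth, tonic, 12))         # a perfect fifth with semitone resolution
print(chroma_bin(quarter_tone, tonic, 24))  # 50-cent resolution
print(chroma_bin(quarter_tone, tonic, 48))  # 25-cent resolution
```

Each bin spans 1200 / numBins cents, so 12 bins give semitone resolution while 24 or 48 bins can separate the smaller intervals found in makam music.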
 

### Output : 
#### PICKLE file :
which holds a list of dictionaries, one per audio file. Each dictionary contains the frame-based features, the global statistical features, and the ground truth information (tonic, modality type) of that file.

#### CSV file : 
which has the global statistical features of audio files and the ground truth modality type. This CSV file is generated in the proper format for further Machine Learning / Automatic Classification steps. Use this CSV file as the input for the second notebook.
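
As an illustration of how frame-based features relate to the global statistical features, the following sketch computes a per-bin mean and standard deviation over all frames of one recording and writes them as a single CSV row followed by the class label. The column names, the toy 4-bin vectors, and the use of the population standard deviation are assumptions; the files produced by the scripts in the repository may be laid out differently.

```python
import csv
import io
from statistics import mean, pstdev


def global_stats(frames):
    """Per-bin mean and standard deviation over all frames of one recording."""
    bins = list(zip(*frames))  # transpose: one tuple of values per chroma bin
    return [mean(b) for b in bins], [pstdev(b) for b in bins]


# Toy recording: 3 frames of a 4-bin chroma vector (real vectors have numBins entries).
frames = [
    [1.0, 0.0, 0.5, 0.0],
    [0.8, 0.2, 0.5, 0.0],
    [0.6, 0.4, 0.5, 0.0],
]
means, stds = global_stats(frames)

# One CSV row: mean features, std features, then the ground truth label.
header = [f"mean_{i}" for i in range(4)] + [f"std_{i}" for i in range(4)] + ["makam"]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow([round(v, 4) for v in means + stds] + ["Hicaz"])
csv_text = buffer.getvalue()
print(csv_text)
```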


IMPORTANT : All the audio files must be placed in the dataset directory in sub-directories named after their modality type.


Note that the functions that are used in this tutorial can be found in modalityUtils/utilities.py, which is provided in the repository.

In [4]:
!python3 ChromaFeatureExtraction.py -h

usage: ChromaFeatureExtraction.py [-h] -t TRADITION -n NUMBEROFBINS -o
                                  OUTPUT_DIRECTORY

A tool for Chroma (HPCP) Feature Extraction using Essentia library.

optional arguments:
  -h, --help            show this help message and exit
  -t TRADITION, --tradition TRADITION
                        Input music tradition to perform the modality
                        classification task
  -n NUMBEROFBINS, --numberofBins NUMBEROFBINS
                        Input number of bins per octave in chroma vectors
  -o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        Output directory for the pickle file that contains the
                        dataset with the extracted features


In [21]:
!python3 ChromaFeatureExtraction.py -t TurkishClassicalMusic -n 12 -o outDir

Analysis on TurkishClassicalMusic Tradition.

Number of bins per octave in the Chroma Vectors : 12
Modality categories in the dataset : 

{'Kurdilihicazkar', 'Mahur', 'Muhayyer', 'Sultaniyegah', 'Suzinak', 'Ussak', 'Huseyni', 'Beyati', 'Rast', 'Karcigar', 'Nihavent', 'Hicazkar', 'Bestenigar', 'Acemkurdi', 'Hicaz', 'Neva', 'Acemasiran', 'Saba', 'Segah', 'Huzzam'} 

Number of Categories in the dataset
20
Feature Extraction in Process. This might take a while...
extracting Features for modality :  Acemasiran
extracting Features for modality :  Acemkurdi
extracting Features for modality :  Bestenigar
^C
Traceback (most recent call last):
  File "ChromaFeatureExtraction.py", line 63, in <module>
    main(MusicTradition, numBins, outputDir)
  File "ChromaFeatureExtraction.py", line 57, in main
    dataslist = mu.FeatureExtraction(outputDir,dataDir,dataList,modality)
  File "/home/ds/notebooks/Supervised_Modality_Classification/modalityUtils.py", line 205, in FeatureExtraction
    computeHPCP

In [43]:
!python3 DataFormatting.py -h

usage: DataFormatting.py [-h] -i INPUT_FILENAME -f FEATURES_SET -r REGION -c
                         COMBINED

A tool for converting the feature data into proper format for the machine
learning pipeline

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILENAME, --input_filename INPUT_FILENAME
                        Name of the input pickle file which is the output file
                        of ChromaFeatureExtraction.py
  -f FEATURES_SET, --features_set FEATURES_SET
                        Set of features to include in the Feature Data.
                        options = {"mean","std","all"}
  -r REGION, --region REGION
                        Input region of audios (from the beginning) for
                        extracting local features. values = [0,1] (region = 1
                        is equivalent of using the Global Features)
  -c COMBINED, --combined COMBINED
                        Input number of bins per octave in

## STEP 3 :  MACHINE LEARNING & AUTOMATIC AUDIO CLASSIFICATION

### Input : 
#### PICKLE file : 
which is the output of the first notebook in the series (Feature Extraction).


### Output :
#### CSV file : 
The output CSV file that contains the FeatureData (X) and the class labels (Y), in appropriate format for the Machine Learning Pipeline.

#### Parameters:


 - numBins(int) : number of bins in the HPCP vectors ([12, 24, 36, 48])
 
 
 - region(float) : fraction of each audio file, measured from the beginning, to use for analysis (e.g. region = 0.3 selects the first 30%). The classification is performed using the features that are extracted locally from only the specified region of the songs.
 

 - combined (boolean) : if combined = 1, classification is performed using the combination of features that are extracted both locally and globally. If combined = 0, only locally obtained features are used for classification. The local region is specified by region. Default = 1.
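
The interplay of the local region and the combined option can be sketched as follows. This is a simplified illustration that uses only per-bin means and assumes that "combined" means concatenating the local and global feature vectors; the functions `mean_features` and `build_features` are hypothetical, and DataFormatting.py may implement this differently. A region of 0 is rejected here because it would select no frames.

```python
from statistics import mean


def mean_features(frames):
    """Per-bin mean over a list of chroma frames."""
    return [mean(b) for b in zip(*frames)]


def build_features(frames, region=1.0, combined=True):
    """Local features from the first `region` fraction of frames, optionally followed by global ones."""
    if not 0.0 < region <= 1.0:
        raise ValueError("region must be in (0, 1]")
    n_local = max(1, int(len(frames) * region))  # number of frames in the local region
    local = mean_features(frames[:n_local])
    if combined and region < 1.0:
        return local + mean_features(frames)  # local features followed by global features
    return local  # region = 1 already corresponds to the global features


# Ten toy frames of a 2-bin chroma vector.
frames = [[float(i), 1.0] for i in range(10)]
local_only = build_features(frames, region=0.3, combined=False)
local_and_global = build_features(frames, region=0.3, combined=True)
global_only = build_features(frames, region=1.0)
print(local_only, local_and_global, global_only)
```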
 

In [52]:
!python3 DataFormatting.py -i 'extractedFeatures_formakamtradition(48bins).pkl' -r 0.3 -c False

computing Local Features for the first0.3region of the audio files is COMPLETE 

Generating CSV file for the features meanLocal+stdGlobal is COMPLETE
Generating CSV file for the features stdLocal+meanGlobal is COMPLETE
Generating CSV file for the features meanLocal+meanGlobal is COMPLETE
Generating CSV file for the features stdLocal+stdGlobal is COMPLETE


In [17]:
!python3  Classification.py -h

usage: Classification.py [-h] -i INPUT_FILENAME -c COMBINED -r REGION

A tool for Chroma (HPCP) Feature Extraction using Essentia
library.(CLASSIFICATION)

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILENAME, --input_filename INPUT_FILENAME
                        Name of the input pickle file which is the output file
                        of ChromaFeatureExtraction.py
  -c COMBINED, --combined COMBINED
                        Input number of bins per octave in chroma vectors SET
                        combined = 1 FOR PERFORMING CLASSIFICATION USING THE
                        COMBINATION OF LOCAL AND GLOBAL FEATURES; combined = 0
                        FOR USING ONLY THE LOCAL FEATURES (except the case
                        region = 1, which ALREADY corresponds to global
                        features)
  -r REGION, --region REGION
                        Input region of audios (from the beginning) for
            

In [62]:
###CHOOSE THE CSV FILE WITH DESIRED FEATURE SET, TO INPUT FOR THE MACHINE LEARNING PIPELINE
import os
os.listdir(dataDir)

['Huzzam',
 'annotations.json',
 'DataCSVforstage_48bins_meanLocal+stdGlobal.csv',
 'DataCSVforstage_48bins_meanLocal+meanGlobal.csv',
 'extractedFeatures_formakamtradition(48bins).pkl',
 'scoresstdLocal+stdGlobal_48.txt',
 'scoresmeanLocal+stdGlobal_48.txt',
 'DataCSVforstage_48bins_stdLocal+meanGlobal.csv',
 'DataCSVforstage_48bins_stdLocal+stdGlobal.csv',
 'scoresstdLocal+meanGlobal_48.txt']

In [65]:
!python3 Classification.py -i 'DataCSVforstage_48bins_stdLocal+stdGlobal.csv' -m makam -r 0.3

This process might take a while (5-10 min) 
 CROSS-VALIDATION & TRAINING 
{'Ussak', 'Rast', 'Nihavent', 'Beyati', 'Karcigar', 'Suzinak', 'Neva', 'Huseyni', 'Saba', 'Acemasiran', 'Hicaz', 'Kurdilihicazkar', 'Muhayyer', 'Acemkurdi', 'Sultaniyegah', 'Mahur', 'Hicazkar', 'Huzzam', 'Bestenigar', 'Segah'}
Accuracy score for the Feature Set stdLocal+stdGlobal : 
F-measure (mean,std) --- FINAL
0.76 0.0348500070129
Accuracy (mean,std) FINAL
0.77 0.0366196668472
Confusion matrix, without normalization
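
For readers unfamiliar with the reported quantities, the sketch below shows how an unnormalized confusion matrix and an accuracy value are obtained from true and predicted labels, and how a (mean, std) pair summarizes scores over cross-validation folds. The labels, predictions, and fold scores are made up for illustration, and the helper functions are hypothetical; this is not the code used by Classification.py.

```python
from collections import Counter
from statistics import mean, pstdev


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes; entries are raw counts."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


labels = ["Hicaz", "Rast", "Saba"]
y_true = ["Hicaz", "Hicaz", "Rast", "Rast", "Saba", "Saba"]
y_pred = ["Hicaz", "Rast", "Rast", "Rast", "Saba", "Hicaz"]
cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)

# Summarize made-up per-fold scores as (mean, std), as in the printed results.
fold_scores = [0.75, 0.80, 0.70]
summary = (mean(fold_scores), pstdev(fold_scores))

for label, row in zip(labels, cm):
    print(label, row)
print(acc, summary)
```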
