# PHONOS
## Analyze source collection, plot and cluster

This notebook contains the code to analyze the collection of sounds compiled in the previous notebook, which will later be used as the source collection in our audio mosaicing code. It also contains the code to analyze the target audio file that will later be reconstructed using sound chunks from the source collection.

The audio analysis carried out in this notebook uses the Python bindings of the Essentia library, which was introduced in the first session of AMPLAB. Please make sure you have gone through the [Essentia Python tutorial](https://essentia.upf.edu/documentation/essentia_python_tutorial.html) to get familiar with using Essentia in Python. It is also useful to always keep a browser tab open with Essentia's [Algorithms Reference](https://essentia.upf.edu/documentation/algorithms_reference.html) documentation page.

# AVOID REPEATING EXPENSIVE CODE:

In [54]:
generate_track_json = False
generate_track_analysis = False
generate_codebook = False # this is the most expensive process, like +30min
generate_track_encoding = True
generate_next_previous_frames = True
generate_by_word_json = True
generate_frame_by_id_json = True

In [2]:
!pip3 install librosa



In [3]:
import os 
from os import listdir
from os.path import isfile, join
import codecs, json 
import pandas as pd
import essentia
import essentia.standard as estd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display, Audio

### build sound collection

In [4]:
DATASET_LOCAL_DIR = './phonos_dataset_local/' # notebook can't use absolute paths...
DATASET_ABSOLUTE_DIR = '/Users/lluissuros/Documents/Datasets/phonos_dataset/' # ... but SC needs them
DATAFRAME_CSV_TRACKS = './files/tracks.csv'
JSON_TRACKS_TO_PATH = './files/tracks_to_path.json'


def is_sound(file):
    return file.lower().endswith(('.wav', '.aiff'))

def make_pandas_record(sound_name, file_path = DATASET_LOCAL_DIR): 
    """Create a dictionary with the metadata that we want to store for each sound.""" 
    record = {}
#    record = {key: sound_name.as_dict()[key] for key in METADATA_FIELDS}
    record['track_id'] = sound_name # name will be id
    record['path'] = file_path + sound_name
    record['absolute_path'] = DATASET_ABSOLUTE_DIR + sound_name
    return record

def build_sound_collection(sound_files_path, csv_filename):
    # Make a Pandas DataFrame with the metadata of our sound collection and save it
    tracks = [f for f in listdir(sound_files_path) if isfile(join(sound_files_path, f)) and is_sound(join(sound_files_path, f))]
    df =  pd.DataFrame([make_pandas_record(tr, DATASET_LOCAL_DIR) for tr in tracks])
    df.sort_values('track_id', inplace=True) #alphabetically

    df.to_csv(csv_filename)
    print('Saved DataFrame with {0} entries! {1}'.format(len(df), csv_filename))    
    return df, tracks


Save collection to json:

In [5]:
def create_tracks_dict(tracks_df, labels):
    print('creating tracks dictionary ...')
    track_ids = tracks_df.loc[:,'track_id'].values
    tracks = tracks_df.set_index('track_id')
    tracks_to_path = {}
    for track_id in track_ids:
        tracks_to_path[track_id] = tracks.loc[track_id, 'absolute_path']
        

    #df_clean = tracks_df.loc[:, labels] # only desired labels
    #tracks_to_path = df_clean.to_dict('index')
    
    print(len(tracks_to_path.keys()), '... tracks_to_path entries created!')
    return tracks_to_path


def save_dict_to_json(dictionary, filename = './files/no_name.json'):
    print('saving dict ...')
    with open(filename, 'w') as fp:
        json.dump(dictionary, fp, sort_keys=True, indent=4) #pretty json
    print('... dict saved as', filename)
    
    
def load_json_dict(filename = './files/no_name.json'):
    with open(filename) as json_file:  
        data = json.load(json_file)
    return data
    
    
if(generate_track_json):
    df_tracks, _ = build_sound_collection(DATASET_LOCAL_DIR, DATAFRAME_CSV_TRACKS)
    #df_tracks.head(5)
    tracks_to_path = create_tracks_dict(df_tracks, ['absolute_path', 'track_id'])
    save_dict_to_json(tracks_to_path, JSON_TRACKS_TO_PATH)

    
load_json_dict(JSON_TRACKS_TO_PATH)

{'733.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/733.wav',
 '222217.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/222217.wav',
 '97737.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/97737.wav',
 '185124.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/185124.wav',
 '186347.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/186347.wav',
 '358550.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/358550.wav',
 '144449.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/144449.wav',
 '92951.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/92951.wav',
 '201609.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/201609.wav',
 '180894.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/180894.wav',
 '37849.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/37849.wav',
 '357067.wav': '/Users/lluissuros/Documents/Datasets/phonos_dataset/357067.wav',
 '332179.wav': '/Users/lluissuros/Docume

## analyse sounds:
# TODO: I would like to analyse PITCH, see freesound exercise

In [6]:
# Define here our sound analysis function
# NOTE: remember that if you update this function and want to run a new analysis you'll need to re-run both 
# this cell and the cells below that carry out the audio analysis and that call the analysis function. 
# After analyzing the source collection or the target file, make sure the correct descriptors have been 
# extracted by checking the DataFrame contents. DataFrame contents can be printed on screen as a table 
# using 'display(data_frame_object)'


#TODO: at the moment there is no hop size (frames do not overlap)!!!
def analyze_sound(audio_path, absolute_path='missing', frame_size=None, audio_id=None):
    """Analyze the audio file given in 'audio_path'.
    Use the parameter 'frame_size' to set the size of the chunks in which the audio will 
    be split for analysis. If no frame_size is given, the whole audio will be analyzed as 
    a single frame.
    Use the 'audio_id' parameter to pass a custom identifier for the audio that will be 
    included in the analysis results. This can be useful to later identify to which file an analysis belongs.
    """
    analysis_output = []  # Here we'll store the analysis results for each chunk (frame) of the audio file
    
    # Load audio file
    #loader = estd.MonoLoader(filename=audio_path)
    loader = estd.EqloudLoader(filename=audio_path) #normalizes gain 

    audio = loader()
    
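    # Some processing of frame_size parameter to avoid later problems
    if frame_size is None:
        # If no frame_size is given use no frames (analyze all audio at once).
        # Round down to an even length so that the single frame fits inside the audio.
        frame_size = len(audio) - (len(audio) % 2)
    if frame_size % 2 != 0:
        frame_size = frame_size + 1 # Make frame size even
    
    # Calculate the start and end samples for each equally-spaced audio frame.
    # The range stops at len(audio) + 1 so that a frame ending exactly on the last sample is kept;
    # a trailing chunk shorter than frame_size is still discarded.
    frame_start_samples = range(0, len(audio) + 1, frame_size)
    frame_start_end_samples = zip(frame_start_samples[:-1], frame_start_samples[1:])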

    # extract key and scale
    key_algo = estd.KeyExtractor()
    key, scale, key_strength = key_algo(audio)
    
    # Loudness extractor
    loudness_algo = estd.Loudness()
    
    # MFCC coefficients extractor
    w_algo = estd.Windowing(type = 'hann')
    spectrum_algo = estd.Spectrum()
    mfcc_algo = estd.MFCC()
    
    
    # Iterate over audio frames and analyze each one
    for count, (fstart, fend) in enumerate(frame_start_end_samples):
        
        # Get corresponding audio chunk and initialize dictionary to store analysis results with some basic metadata
        frame = audio[fstart:fend]
        frame_output = {
            'track_id': audio_id,
            'frame_id': '{0}_f{1}'.format(audio_id, count),
            'path': audio_path,
            'absolute_path': absolute_path,
            'start_sample': fstart,
            'end_sample': fend,
        }
        
        # Extract loudness
        loudness = loudness_algo(frame)
        frame_output['loudness'] = loudness / len(frame)  # Normalize by length of frame

        # Extract MFCC coefficients
        spec = spectrum_algo(w_algo(frame))
        _, mfcc_coeffs = mfcc_algo(spec)
        frame_output.update({'mfcc_{0}'.format(j): mfcc_coeffs[j] for j in range(0, len(mfcc_coeffs))})
        
        # Other tonal features
        key, scale, key_strength = key_algo(frame)
        frame_output.update({'scale': scale,
                             'key_strength': key_strength})

        
        # Add frame analysis results to output
        analysis_output.append(frame_output)

    return analysis_output
    

## Analyze source collection

In [7]:
#DATAFRAME_FILENAME = 'dataframe.csv'  # DataFrame file of the sound source collection to analyze
#DATAFRAME_SOURCE_FILENAME = 'dataframe_source.csv'  # DataFrame file where to store the results of our analysis
#FRAME_SIZE = 8820

#TODO: at the moment there is no hop size (frames do not overlap)!!!
def analyze_source_collection(dataframe_source, dataframe_results, frame_size=8820):
    #print('frame size ', frame_size )
    # Load the DataFrame of the sound source collection created in previous notebook and analyze all sound files in it
    df = pd.read_csv(open(dataframe_source), index_col=0)
    analyses = []
    for i in range(0, len(df)):
        sound = df.iloc[i]  # Get DataFrame sound at position 'i'
        try:
            #print('Analyzing sound with id {0} [{1}/{2}]'.format(sound['track_id'], i + 1, len(df)))
            # Split audio in chunks of frame_size samples (the default, 8820, is 200 ms at 44100 Hz)
            analysis_output = analyze_sound(sound['path'], sound['absolute_path'], frame_size=frame_size, audio_id=sound['track_id'])
            analyses += analysis_output
        except Exception as error:  # Keep going if a file fails, but report why
            print('ERROR on sound with id {0} [{1}/{2}]: {3}'.format(sound['track_id'], i + 1, len(df), error))


    # Store analysis results in a new Pandas DataFrame and save it
    df_source = pd.DataFrame(analyses)
    df_source.to_csv(dataframe_results)
    print('Saved source DataFrame with {0} entries! {1}'.format(len(df_source), dataframe_results))

    display(df_source)  # Show DataFrame contents
    display(df_source.describe())  # Show some statistics of numerical fields in the DataFrame
    return df


#analyze_source_collection(DATAFRAME_FILENAME, DATAFRAME_SOURCE_FILENAME)



## Analyze source files and the target sound file

In [8]:
FRAME_SIZE_LOW_LEVEL = 2048
FRAME_SIZE_TONAL = 4096 #TODO how to deal with different frame sizes when bulking on the same dataframe

# DataFrame file where to store the results of our analysis
DATAFRAME_CSV_ANALYSIS = './files/tracks_analysis.csv'  


if(generate_track_analysis):
    analyze_source_collection(
        DATAFRAME_CSV_TRACKS,
        DATAFRAME_CSV_ANALYSIS, 
        FRAME_SIZE_TONAL)


df = pd.read_csv(DATAFRAME_CSV_ANALYSIS) 


# DOUBT: Maybe I need to scale the MFCCs before clustering?
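On the scaling doubt: k-means uses Euclidean distance, so a coefficient with a much larger range than the others (usually `mfcc_0`) can dominate the clustering. A minimal sketch of per-column standardization, assuming plain z-scoring is what we want; `standardize_columns` and the toy matrix are hypothetical names and values for illustration, not part of the pipeline above.

```python
import numpy as np

def standardize_columns(features):
    """Scale each column to zero mean and unit variance (z-score)."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero on constant columns
    return (features - mean) / std, mean, std

# Toy stand-in for the MFCC matrix: the first column has a much larger range
toy_mfccs = np.array([[-1065.0, 48.0, 5.0],
                      [-760.0, 100.0, -57.0],
                      [-870.0, 183.0, 10.0]])
scaled, mean, std = standardize_columns(toy_mfccs)
```

If this were adopted, the same `mean` and `std` would have to be stored with the codebook and applied to every frame before encoding it, otherwise frames and centroids would live in different spaces.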

## Plot audio and features

In [9]:
#utilities for pandas

#retrieve freesound_id for a given audio file path
def get_fs_id_from_path(df, sound_path):
    return df.loc[df['path'] == sound_path].iloc[0]['freesound_id']

def feature_values_by_path(df, sound_path, feature):
    return df.loc[df['path'] == sound_path][feature].values


#TODO do it with external library if possible?
def scale_list(mylist):
    '''normalize against the maximum absolute value'''
    maximum = max([abs(min(mylist)), abs(max(mylist))])
    return [float(i)/maximum for i in mylist]
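About the TODO on using an external library: a possible NumPy version of the same scaling, as a sketch only. `scale_array` is a hypothetical name, and returning the input unchanged when everything is zero is an assumption (the list version above would divide by zero in that case).

```python
import numpy as np

def scale_array(values):
    """Divide by the largest absolute value so the result lies in [-1, 1]."""
    values = np.asarray(values, dtype=float)
    peak = np.max(np.abs(values))
    if peak == 0:
        return values  # all zeros: nothing to scale
    return values / peak

scaled_example = scale_array([2, -4, 1])
```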




## PLOT clusters and centroids with k-means
#### TODO Is it better to apply PCA before or after clustering?

In [10]:
#https://musicinformationretrieval.com/kmeans.html
from sklearn.cluster import KMeans
from itertools import cycle
from matplotlib import colors


'''
TODO reuse commented code above to PCA

PCA_components = finalDf[['principal component 1', 'principal component 2']].values
print(PCA_components.shape)
features_2D = PCA_components # like this we can change PCA for any other thing
'''

df = pd.read_csv(DATAFRAME_CSV_ANALYSIS) 

if(generate_codebook):
    N_WORDS = 128 #256 was killing kernel...why? 

    features = [ 'mfcc_0', 'mfcc_1', 'mfcc_2' , 'mfcc_3', 'mfcc_4', 'mfcc_5', 'mfcc_6', 'mfcc_7', 'mfcc_8', 'mfcc_9', 'mfcc_10', 'mfcc_11', 'mfcc_12']
    features_df = df.loc[:, features].values # Separating out the features

    print("creating codebook of ", N_WORDS, " words, might take some minutes, especially with a high number of words")
    print("centroids (words) will have", len(features) , " dimensions")


    #time to cluster! with k-means
    k_means = KMeans(n_clusters=N_WORDS)
    k_means.fit_predict(features_df)
    labels = k_means.labels_ 
    centroids = k_means.cluster_centers_ #take the cluster center

    print("codebook finished! ")


    print(labels)
    print(centroids)




creating codebook of  128  words, might some minutes, specially with high number of words
centroids (words) will have 13  dimensions
codebook finished! 
[ 60  60 112 ...,  59  59  33]
[[-1065.72536196    47.98095815     5.46425294 ...,    -4.81822202
     -5.40275586    -6.43064385]
 [ -761.7011111    100.47332777   -57.42371326 ...,    -5.05069564
     -6.15523563    -4.87300673]
 [ -869.46906111   182.79412705    10.61827434 ...,    -7.02568045
     -4.98587384    -4.86468704]
 ..., 
 [ -815.54322154   141.24434964   -31.65335425 ...,    -6.735424
     -5.72144171    -3.77270581]
 [ -870.19160674   160.82134536    -6.25295976 ...,   -13.34323455
    -13.48633166   -13.14074138]
 [ -806.82436389   161.07850118   -62.57183722 ...,   -12.18214716
    -10.95974422   -10.81136733]]


### Save the codebook
This way we don't need to recompute the codebook every time.
#### NOTE: Remember to convert it to numpy arrays when loading

In [11]:
# Save Codebook
CODEBOOK_FILENAME = './files/codebook.json'  # defined outside the 'if' so the loading cell can always use it

if(generate_codebook):
    codebook_to_save = {'centroids': centroids.tolist(), 'labels': labels.tolist()}

    save_dict_to_json(codebook_to_save, CODEBOOK_FILENAME)

    print('\n codebook is saved as ', CODEBOOK_FILENAME)



saving dict ...
... dict saved as ./files/codebook.json

 codebook is saved as  ./files/codebook.json


Load Codebook (very important if we don't want to recompute): 

In [13]:
codebook_saved_dict = load_json_dict(CODEBOOK_FILENAME)

centroids = np.array(codebook_saved_dict['centroids'])
labels = np.array(codebook_saved_dict['labels'])

print('\n codebook is loaded from disc', CODEBOOK_FILENAME)


 codebook is loaded from disc ./files/codebook.json


## Encode tracks with the obtained CODEBOOK, and compute histograms
This will create two dictionaries keyed by track id: one with the encoded frames and another with the histogram of those encoded frames.
A histogram represents the distribution of values; here, how many frames of a track are assigned to each codeword.


### We are getting the 3 nearest centroids at the moment:
The 1-nn is the codeword for that frame!!
We are also appending the new info to the analysis dataframe and saving it.


## TODO: If interested in histograms, should we use 1-nn?
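On this TODO: `np.histogram` flattens its input, so with 3 neighbours per frame every frame contributes three counts. A sketch of a histogram that only counts the nearest centroid, assuming the encoded array keeps its `(n_frames, k)` shape with the nearest centroid in the first column (i.e. without the squeeze step); `codeword_histogram` is a hypothetical helper, not the function used below.

```python
import numpy as np

def codeword_histogram(encoded_frames, n_words):
    """Count how many frames of a track fall on each codeword.

    encoded_frames: (n_frames, k) array of nearest-centroid indices, nearest first,
    or a 1-D array when only the nearest centroid was kept.
    """
    encoded_frames = np.asarray(encoded_frames)
    if encoded_frames.ndim == 2:
        encoded_frames = encoded_frames[:, 0]  # keep the 1-NN column only
    return np.bincount(encoded_frames, minlength=n_words)

# Three frames, three neighbours each; only the first column is counted
toy_encoding = np.array([[2, 0, 1],
                         [2, 1, 3],
                         [0, 2, 3]])
toy_histogram = codeword_histogram(toy_encoding, n_words=4)
```

`np.bincount` with `minlength` gives exactly one bin per codeword, so no bin-edge trick is needed for integer word ids.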


In [15]:
from sklearn.neighbors import NearestNeighbors


def create_histograms(encoded_tracks_by_id, codebook):
    '''returns a dictionary with id and the corresponding histogram of encoded frames'''
    bins = create_bins_for_histogram(codebook)
    histograms_by_id = {} 
    for track_id in encoded_tracks_by_id.keys():
        histogram, _ = np.histogram(encoded_tracks_by_id[track_id], bins)
        histograms_by_id[track_id] = histogram
    return histograms_by_id


def create_bins_for_histogram(codebook):
    '''
    https://stackoverflow.com/questions/30112420/histogram-for-discrete-values-with-matplotlib
    The default bins will not be centered around the integers,
    so the trick is to set up bin edges centered on the integers, i.e.
    -0.5, 0.5, 1.5, 2.5, ... up to max(data) + 0.5, which here is len(codebook) - 0.5.
    This gives exactly one bin per codeword, with no extra empty bin at the end.
    '''
    return np.arange(0, len(codebook) + 1) - 0.5


def encode_all_audios(codebook, dataframe, features):
    '''
    Codebook: array of n-dimensional points corresponding to codewords (cluster centroids)
       
    Will encode all audios, finding the N_NEIGHBORS nearest cluster centroids (code vectors) for each frame.
    The first neighbour (1-nearest-neighbour) is the closest centroid, i.e. the codeword of the frame.
    Returns a dictionary by id and the corresponding array of encoded frames.
    '''
    nbrs = NearestNeighbors(n_neighbors=N_NEIGHBORS, algorithm='ball_tree').fit(codebook) 
    unique_ids = dataframe['track_id'].unique() #get unique ids
    encoded_tracks_by_id = {}
    for count,track_id in enumerate(unique_ids):    
        encoded_track = encode_track_frames(track_id, dataframe, features, nbrs)
        encoded_tracks_by_id[track_id] = encoded_track
        #print('encoded track ', track_id , ' , number:' , count, '/', len(unique_ids))
    return encoded_tracks_by_id


def encode_track_frames(track_id, dataframe, features, nbrs):
    file_frames = dataframe.loc[dataframe['track_id'] == track_id] # frames belonging the track_id
    frames_values = file_frames.loc[:, features].values
    _, indices = nbrs.kneighbors(frames_values) # obtain the nearest centroids for each frame of the input
    indices = indices.squeeze() # squeeze gets rid of unnecessary dimensions (a single-frame track becomes 1-D)
    return indices 


analysis_df = pd.read_csv(DATAFRAME_CSV_ANALYSIS) 

if(generate_track_encoding):
    N_NEIGHBORS = 3
    print('encoding, getting ', N_NEIGHBORS,  '-nn... \n')

    #features used in encoding 
    #TODO: dry, only one place!!!!
    features = [ 'mfcc_0', 'mfcc_1', 'mfcc_2' , 'mfcc_3', 'mfcc_4', 'mfcc_5', 'mfcc_6', 'mfcc_7', 'mfcc_8', 'mfcc_9', 'mfcc_10', 'mfcc_11', 'mfcc_12']

    #encode all tracks:
    encoded_tracks_by_id = encode_all_audios(centroids, analysis_df, features)    
    print('encoded_tracks_by_id was computed')

    #create histograms:
    histograms_by_id = create_histograms(encoded_tracks_by_id, centroids)
    print('histograms_by_id was computed')



encoding, getting  3 -nn... 

encoded_tracks_by_id was computed
histograms_by_id was computed


### Add nearest neighbours columns, to save it later in the json

In [48]:
#print(encoded_tracks_by_id['125278.wav'])
#print(encoded_tracks_by_id['125169.wav'])

len(encoded_tracks_by_id['125169.wav'].shape)
len(encoded_tracks_by_id['125278.wav'].shape)
encoded_tracks_by_id['125169.wav'][:,1]


encoded_tracks_by_id['125278.wav'][0]
#analysis_df.head(5)

83

In [52]:
def add_nearest_neighbours_columns(dataframe, encoded_tracks_by_id):
    #Add new columns to dataframe:
    for column in range(N_NEIGHBORS):
        new_column = 'word_' + str(column+1) +  '_nearest'
        dataframe[new_column] = 0 
        print('Add new column:' , new_column)

    #fill values for each track:
    for track_id in dataframe['track_id'].unique():
        idx = dataframe.index[dataframe['track_id'] == track_id]
        for n_neighbour in range(N_NEIGHBORS):
            nearest_n_column = 'word_'+ str(n_neighbour+1) +'_nearest'
            #print("trackid: ", track_id, " n_neigbour ", n_neighbour)
            # .loc (not .at) because idx holds the labels of several rows
            if len(encoded_tracks_by_id[track_id].shape) == 1:
                print("weird track with only one frame, trackId :", track_id, "  index: ", idx)
                dataframe.loc[idx, nearest_n_column] = encoded_tracks_by_id[track_id][n_neighbour]
            else:
                dataframe.loc[idx, nearest_n_column] = encoded_tracks_by_id[track_id][:,n_neighbour]
       
    
if(generate_track_encoding):
    add_nearest_neighbours_columns(analysis_df, encoded_tracks_by_id)
    analysis_df.to_csv(DATAFRAME_CSV_ANALYSIS)
    print('nearest neighbours added and csv saved')
    analysis_df.head(10)



analysis_df = pd.read_csv(DATAFRAME_CSV_ANALYSIS) 

Add new column: word_1_nearest
Add new column: word_2_nearest
Add new column: word_3_nearest
weird track with only one frame, trackId : 125278.wav   index:  Int64Index([27390], dtype='int64')
weird track with only one frame, trackId : 125278.wav   index:  Int64Index([27390], dtype='int64')
weird track with only one frame, trackId : 125278.wav   index:  Int64Index([27390], dtype='int64')
weird track with only one frame, trackId : 125285.wav   index:  Int64Index([27391], dtype='int64')
weird track with only one frame, trackId : 125285.wav   index:  Int64Index([27391], dtype='int64')
weird track with only one frame, trackId : 125285.wav   index:  Int64Index([27391], dtype='int64')
weird track with only one frame, trackId : 128275.wav   index:  Int64Index([30005], dtype='int64')
weird track with only one frame, trackId : 128275.wav   index:  Int64Index([30005], dtype='int64')
weird track with only one frame, trackId : 128275.wav   index:  Int64Index([30005], dtype='int64')
weird track with

weird track with only one frame, trackId : 87731.wav   index:  Int64Index([392620], dtype='int64')
weird track with only one frame, trackId : 87731.wav   index:  Int64Index([392620], dtype='int64')
weird track with only one frame, trackId : 87731.wav   index:  Int64Index([392620], dtype='int64')
nearest neighbours added and csv saved


### Add next_frame_id and previous_frame_id columns for json

In [None]:
def add_next_previous_columns(dataframe):
    #Add new columns to dataframe:
    dataframe['previous_frame_id'] = 'TODO'
    dataframe['next_frame_id'] = 'TODO'
    print('Add new column:', 'previous_frame_id')
    print('Add new column:', 'next_frame_id')
    
    #fill values for each track:
    for track_id in dataframe['track_id'].unique():
        idxs = dataframe.index[dataframe['track_id'] == track_id]
        for count, idx in enumerate(idxs):
            previous_idx = idx -1
            next_idx = idx +1 
            #exceptions:
            if count == 0:
                previous_idx = idxs[-1] #go to the last one
            if count == len(idxs)-1:
                next_idx = idxs[0] #back to first one
                
            previous_frame_id = dataframe.loc[previous_idx, 'frame_id']
            next_frame_id = dataframe.loc[next_idx, 'frame_id']
            dataframe.at[idx, 'previous_frame_id'] = previous_frame_id
            dataframe.at[idx, 'next_frame_id'] = next_frame_id        

            
if(generate_next_previous_frames):   
    add_next_previous_columns(analysis_df)
    analysis_df.to_csv(DATAFRAME_CSV_ANALYSIS)
    print('previous and next frame_id added and csv saved')


analysis_df = pd.read_csv(DATAFRAME_CSV_ANALYSIS) 
analysis_df.head(10)

Add new column: previous_frame_id
Add new column: next_frame_id
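The wrap-around logic (first frame points back to the last one, last frame points to the first) can also be written without an explicit loop. A sketch with `np.roll`, assuming the frame ids of one track are already in playback order; `circular_neighbours` is a hypothetical helper, not part of the cell above.

```python
import numpy as np

def circular_neighbours(frame_ids):
    """Return (previous, next) ids for each frame of one track, wrapping around at both ends."""
    frame_ids = np.asarray(frame_ids)
    previous_ids = np.roll(frame_ids, 1)   # element i receives element i-1; the first gets the last
    next_ids = np.roll(frame_ids, -1)      # element i receives element i+1; the last gets the first
    return previous_ids, next_ids

prev_ids, next_ids = circular_neighbours(['a_f0', 'a_f1', 'a_f2'])
```

Applied per track, this does not depend on the DataFrame index being contiguous, which the `idx - 1` / `idx + 1` arithmetic does.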


### create by_word dictionary:
This creates a dictionary that maps every cluster id (word) to the list of frame_ids encoded with that word.
SuperCollider will use it to find other frames that share the same encoded "word".

In [None]:
def create_by_word_id_dict(dataframe = analysis_df):
    by_word = {}
    for word_id in dataframe['word_1_nearest'].unique():
        word_idx = dataframe.index[dataframe['word_1_nearest'] == word_id]
        frame_ids = dataframe.loc[word_idx, 'frame_id'].values
        by_word[str(word_id)] = frame_ids.tolist()
    return by_word


if(generate_by_word_json):
    BY_WORD_DICT_FILENAME = './files/by_word.json'
    by_word = create_by_word_id_dict(analysis_df)

    save_dict_to_json(by_word, BY_WORD_DICT_FILENAME)

    print('\n frames_id list by word dictionary was saved in ', BY_WORD_DICT_FILENAME)




## create by_frame_id dictionary
Instead of a csv, it will be much more comfortable to deal with objects in SuperCollider, so I will create the JSON files:


## TODO: more fields could come later like embeddings, most_similars, codeword, knn(3), pitch ...

In [None]:
def create_by_frame_id_dict(tracks_df, analysis_df, labels):
    print('creating by_frame_id ...')
    
    track_ids = tracks_df['track_id'].unique() #just in case
    df_clean = analysis_df.loc[:, labels] # only desired labels
    df_clean.set_index('frame_id', inplace=True)
    by_frame_id = df_clean.to_dict('index')
    
    print(len(by_frame_id.keys()), '... by_frame_id entries created!')
    return by_frame_id
   
    
# list(df.columns.values) # all labels

columns_to_keep = [
 'absolute_path',   
 'end_sample',
 'frame_id',
 'previous_frame_id',
 'next_frame_id',
 'loudness',
 'mfcc_0',
 'mfcc_1',
 'mfcc_10',
 'mfcc_11',
 'mfcc_12',
 'mfcc_2',
 'mfcc_3',
 'mfcc_4',
 'mfcc_5',
 'mfcc_6',
 'mfcc_7',
 'mfcc_8',
 'mfcc_9',
 'word_1_nearest',
 'word_2_nearest',
 'word_3_nearest',
# 'path',
 'scale',
 'start_sample',
 'track_id']


if(generate_frame_by_id_json):

    BY_FRAME_ID_JSON = './files/by_frame_id.json'
    tracks_df = pd.read_csv(DATAFRAME_CSV_TRACKS) 
    analysis_df = pd.read_csv(DATAFRAME_CSV_ANALYSIS)

    by_frame_id = create_by_frame_id_dict(tracks_df, analysis_df, columns_to_keep)

    save_dict_to_json(by_frame_id, BY_FRAME_ID_JSON)

    test_id = 'spot1.wav_f1146'
    print('\n test frame_id: ', test_id)
    by_frame_id[test_id] #test


=====================


### Plot some histograms:
https://matplotlib.org/gallery/subplots_axes_and_figures/subplots_demo.html


## TODO (from Xavier):
 * codebook encoding
 * histogram ()
 * similarity function/matrix over the "codebook features" (probably several options)
 * plot: plot features and their mean to show how different they are

next week with xavier:
 * degree clustering xavier technique

## SOTA:
 * Codebook:

https://www.sciencedirect.com/topics/engineering/vector-quantization 
Each input vector can be viewed as a point in an n-dimensional space. The vector quantizer is defined by a partition of this space into a set of nonoverlapping n-dimensional regions. The vector is encoded by comparing it with a codebook consisting of a set of stored reference vectors known as codevectors. 
 The optimality criterion is that a quantization region should consist of all vectors that are closer to its codevector than any of the other codevectors, and the codevector should be the average of all vectors that are in the quantization region.
 //Ali Grami, in Introduction to Digital Communications, 2016, chapter 5.2.3 Vector Quantization
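The two optimality conditions quoted above (each region holds the vectors closest to its codevector, and each codevector is the average of its region) are what one k-means / Lloyd iteration enforces. A minimal sketch, assuming Euclidean distance; `lloyd_step` and the toy data are hypothetical and only meant to make the two conditions concrete.

```python
import numpy as np

def lloyd_step(vectors, codebook):
    """One iteration enforcing both optimality conditions of a vector quantizer."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Nearest-neighbour condition: assign each vector to its closest codevector
    distances = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    assignment = distances.argmin(axis=1)
    # Centroid condition: each codevector becomes the mean of its region
    new_codebook = codebook.copy()
    for index in range(len(codebook)):
        members = vectors[assignment == index]
        if len(members) > 0:  # leave empty regions untouched
            new_codebook[index] = members.mean(axis=0)
    return assignment, new_codebook

toy_vectors = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
toy_codebook = np.array([[1.0, 1.0], [9.0, 9.0]])
assignment, updated = lloyd_step(toy_vectors, toy_codebook)
```

Repeating the step until the assignment stops changing is the usual way to reach a codebook that satisfies both conditions at once.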


Vector quantization (VQ) provides an efficient technique for data compression. Compression is achieved by transmitting the index of the codeword instead of the vector itself.
 VQ can be defined as a mapping that assigns each vector x=(x0,x1,…,xn-1)T in the n-dimensional space Rn to a codeword from a finite subset of Rn. The subset Y={yi:i=1,2,…,M} representing the set of possible reconstruction vectors is called a codebook of size M. Its members are called the codewords. In the encoding process, a distance measure is evaluated to locate the closest codeword for each input vector x. Then, the address corresponding to the codeword is assigned to x and transmitted. 
 A vector quantizer achieving a minimum encoding error is referred to as a Voronoi quantizer. Figure 7.9 shows an input data space partitioned into four different regions, called Voronoi cells, and the corresponding Voronoi vectors. These regions describe the collection of only those input vectors that are very close to the respective Voronoi vector.
 //Anke Meyer-Baese, Volker Schmid, in Pattern Recognition and Signal Analysis in Medical Imaging (Second Edition), 2014
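To make the compression idea concrete: encoding keeps only the index of the closest codeword, and decoding looks that index up again, at the cost of some encoding error. A sketch under the assumption of Euclidean distance; `vq_encode`, `vq_decode` and the toy values are hypothetical.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Replace each vector by the index of its closest codeword."""
    distances = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return distances.argmin(axis=1)

def vq_decode(indices, codebook):
    """Rebuild an approximation of the vectors by looking the indices up in the codebook."""
    return codebook[indices]

toy_codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_vectors = np.array([[1.0, -1.0], [9.0, 11.0], [0.5, 0.5]])
indices = vq_encode(toy_vectors, toy_codebook)
reconstruction = vq_decode(indices, toy_codebook)
# Mean squared distance between each vector and its reconstruction
encoding_error = np.mean(np.sum((toy_vectors - reconstruction) ** 2, axis=1))
```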
 
 
  Thus the entire space Sb is divided into a finite number of cells and a code point is associated with each one. The code point is used to represent all of the points in that cell during the clustering process. The point with the smallest function value of a cell is the most suitable code point. Further, code points need not be sample points; they can be generated independently. They may also be centroids of the cells. Identification of a cluster is done using vector quantization of the reduced sample points. 
  // Jasbir Singh Arora, in Introduction to Optimum Design (Fourth Edition), 2017
 
 
 
 
 ----
 More on this but different:
 * Learning Vector Quantisation: 
 Recent developments in neural network architectures have led to a new VQ concept, the so-called learning vector quantization (LVQ). It represents an unsupervised learning algorithm associated with a competitive neural network consisting of one input and one output layer. The algorithm permits only the update of the winning prototype, that is, the closest prototype (Voronoi vector) of the LVQ network.
  LVQ procedures are intuitively clear and easy to implement. The classification of data is based on a comparison with a number of so-called prototype vectors.
The relative simplicity of the LVQ and its ability to work in unsupervised mode have made it a useful tool for image segmentation problems [190]
 
 
 
 //k-means for example can be used for vector quantization, taking the centroid of each cluster as the codebook.
In computer vision, the bag-of-words model (BoW model) can be applied to image classification, by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features.
https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision#cite_note-feifeicvpr2005-1
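Carried over to audio, each track becomes a histogram of codeword counts, and two tracks can be compared through their histograms. A sketch of one possible similarity, assuming length-normalized histograms and cosine similarity (only one of several options); all names and the toy word sequences are hypothetical.

```python
import numpy as np

def normalized_histogram(word_ids, n_words):
    """Bag-of-words vector: fraction of frames assigned to each codeword."""
    counts = np.bincount(word_ids, minlength=n_words).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def cosine_similarity(hist_a, hist_b):
    """1.0 for identical word distributions, 0.0 when no word is shared."""
    norm = np.linalg.norm(hist_a) * np.linalg.norm(hist_b)
    return float(np.dot(hist_a, hist_b) / norm) if norm > 0 else 0.0

track_a = normalized_histogram(np.array([0, 0, 1, 2]), n_words=4)
track_b = normalized_histogram(np.array([0, 0, 1, 2, 0, 0, 1, 2]), n_words=4)  # same mix, twice as long
track_c = normalized_histogram(np.array([3, 3, 3]), n_words=4)                 # disjoint vocabulary
```

Because the histograms are normalized, `track_a` and `track_b` come out as identical even though one track is twice as long, while `track_c` shares no word with either.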
 
 
------- 
------- 
( 
https://machinelearningmastery.com/implement-learning-vector-quantization-scratch-python/
LVQ is a supervised version of vector quantization that can be used when we have labelled input data.
A limitation of k-Nearest Neighbors is that you must keep a large database of training examples in order to make predictions.

The Learning Vector Quantization algorithm addresses this by learning a much smaller subset of patterns that best represent the training data.
Predictions are made by finding the best match among a library of patterns. The difference is that the library of patterns is learned from training data, rather than using the training patterns themselves

(In LVQ) The library of patterns are called codebook vectors and each pattern is called a codebook. The codebook vectors are initialized to randomly selected values from the training dataset. Then, over a number of epochs, they are adapted to best summarize the training data using a learning algorithm.
 )
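The supervised update described in the notes above can be sketched as a single LVQ1 step: find the closest codebook vector, then pull it towards the sample if their labels agree and push it away otherwise. This is a minimal illustration, not the linked tutorial's code; `lvq1_update`, the labels and the learning rate are hypothetical.

```python
import numpy as np

def lvq1_update(codebook, codebook_labels, sample, sample_label, learning_rate=0.1):
    """Apply one LVQ1 step in place and return the index of the winning codebook vector."""
    distances = np.linalg.norm(codebook - sample, axis=1)
    winner = int(distances.argmin())
    direction = 1.0 if codebook_labels[winner] == sample_label else -1.0
    # Attract the winner towards the sample when the labels agree, repel it otherwise
    codebook[winner] += direction * learning_rate * (sample - codebook[winner])
    return winner

toy_codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_labels = ['speech', 'music']  # hypothetical class labels, one per codebook vector
winner = lvq1_update(toy_codebook, toy_labels, np.array([2.0, 0.0]), 'speech', learning_rate=0.5)
```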



---
(Importance of Vector Quantization in Audio Signals Processing)
https://www.dsprelated.com/thread/3543/importance-of-vector-quantization-in-audio-signals-processing
The mathematical construction of GMMs allows people to apply fancy training criterion, especially those that make statistical sense, e.g. max likelihood or max a posteriori or minimum phoneme error rate. That also brings in a lot of computational complexity though. On the contrary, plain VQ requires just a discrete HMM, trainable with the textbook version of EM algorithm, an order of magnitude faster than HMM-GMM.

By 2000s computers are fast enough that none of those presents a problem unless you're doing the so called deep learning stuffs.

## QUESTIONS FOR XAVIER:
* Fuzzy clustering for the codebook: instead of fuzzy clustering, could we build the codebook with "hard" clustering (e.g. k-means) and then do k-nn <1? (then maybe we would not know the percentage?)
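One way to still get a "percentage" out of a hard codebook is to turn the distances to the k nearest centroids into weights that sum to 1. A sketch assuming inverse-distance weighting, which is an arbitrary choice and not the membership function of fuzzy c-means; `soft_assignment` and the toy centroids are hypothetical.

```python
import numpy as np

def soft_assignment(frame, centroids, k=3, eps=1e-9):
    """Return the k nearest codeword indices and membership weights that sum to 1."""
    distances = np.linalg.norm(centroids - frame, axis=1)
    nearest = np.argsort(distances)[:k]
    inverse = 1.0 / (distances[nearest] + eps)  # closer centroids get larger weights
    return nearest, inverse / inverse.sum()

toy_centroids = np.array([[0.0], [1.0], [4.0], [10.0]])
nearest, weights = soft_assignment(np.array([3.0]), toy_centroids, k=2)
```

In the toy case the frame is at distance 1 from one centroid and 2 from the next, so the closest centroid gets roughly two thirds of the membership.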