# Comparison of F0 Estimation Algorithms for Monophonic Vocal Signals

The following cell contains helper functions that will be used for the evaluation of the CREPE and pYIN algorithms.

get_files_to_process walks the MedleyDB directory and uses an accompanying CSV file to retrieve the list of songs that have vocals. I select only the raw vocal files from stems whose instrument label is one of 'male singer', 'female singer', or 'vocalists' (as noted in the MedleyDB metadata YAML files, "./metadata/*_METADATA.yaml").
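As a rough illustration of how a raw file is matched to its metadata entry, the sketch below derives the song name and the stem and raw keys from a file name. The helper name `parse_raw_filename` is hypothetical, and the sketch assumes file names follow the `<Song>_RAW_<stem>_<raw>.wav` pattern seen in the cell outputs.

```python
import os

def parse_raw_filename(path):
    """Split a raw file path into (song name, stem key, raw key).

    Hypothetical helper; assumes names look like <Song>_RAW_<stem>_<raw>.wav.
    """
    base = os.path.splitext(os.path.basename(path))[0]
    # Everything before '_RAW_' is the song name; the rest is '<stem>_<raw>'.
    song_name, _, numbers = base.partition('_RAW_')
    stem_number, raw_number = numbers.split('_')
    return song_name, 'S' + stem_number, 'R' + raw_number

example = ('./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/'
           'AClassicEducation_NightOwl_RAW_08_01.wav')
song, stem_key, raw_key = parse_raw_filename(example)
print(song, stem_key, raw_key)
```

The stem and raw keys are then used to look up the instrument label in the song's metadata, which is compared against the vocal keywords.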

does_estimation_exist determines whether or not the output estimation already exists.

write_estimation creates a txt file containing the timestamp and f0 pairs for the given song. The left column is the timestamp and the right column is the estimated f0. The columns are separated by a single space.

In [18]:
import os
import csv
import numpy
import yaml

def get_files_to_process(medleyDB_directory, medley_db_metadata_csv_path):
    metaDataDirectory = './metadata'
    # Reference the short list of songs that have vocals
    with open(medley_db_metadata_csv_path) as f:
        reader = csv.DictReader(f)
        # store meta data in memory
        meta_data = list(reader)

    # These are the instrument names in the MedleyDB annotations that apply to singing voices
    keywords = ['male singer', 'female singer', 'vocalists']
    files_to_process = []
    for dir_paths, dir_names, files in os.walk(medleyDB_directory):
        for file in files:
            if '._LizNelson_Rainfall' in file:
                # getting a weird bug with this hidden file that doesn't appear in medley db, Skip it
                continue
            full_path = os.path.join(dir_paths, file)
            song_name = full_path.split('/')[2]
            for song_metadata in meta_data:
                # Only process songs that are marked as having vocals 
                if song_metadata['Song'] == song_name and int(song_metadata['Has_Vocals']):
                    if 'RAW' in full_path:
                        metaDataPath = os.path.join(metaDataDirectory, song_name + '_METADATA.yaml')
                        with open(metaDataPath, 'r') as yaml_in:
                            metaDataObject = yaml.safe_load(yaml_in)
                        stemNumber = 'S' + file.split('_')[-2]
                        rawNumber = 'R' + file.split('_')[-1].split('.')[0]
                        stem = metaDataObject['stems'][stemNumber]
                        raw = stem['raw'][rawNumber]
                        if raw['instrument'] in keywords:
                            files_to_process.append(full_path)
    return files_to_process

def write_estimation(estimations_dir, path, times, f0):
    path_to_write = estimations_dir + '/' + path.split('/')[-1].split('.')[0] + '.txt'
    numpy.savetxt(path_to_write, numpy.c_[numpy.asarray(times), numpy.asarray(f0)])

def does_estimation_exist(estimations_dir, path):
    # The estimation file shares the audio file's base name, with a .txt extension
    path_to_write = estimations_dir + '/' + path.split('/')[-1].split('.')[0] + '.txt'
    return os.path.isfile(path_to_write)

Define the list of files to process, using the MedleyDB path and the accompanying metadata CSV.

In [19]:
files_to_process = get_files_to_process('./V1/', 'MedleyDBV1_Metadata.csv')

## pYIN

This cell creates estimations with the pYIN algorithm. If an estimation for a file already exists, the file is skipped; otherwise the estimation is computed with pYIN and saved with the write_estimation function. Because these calculations take a significant amount of time, this check lets the task be split across several runs without losing the progress already made. librosa is used to load the audio and to invoke its pyin estimation function. NaN values, which pYIN returns for unvoiced frames, are converted to 0 before saving so that the estimation files are consistent and can be used directly in the later evaluation.
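The skip-if-exists pattern described here can be sketched without any audio dependencies. In the sketch below, `estimation_path`, `process_all`, and `fake_estimate` are hypothetical names, and `fake_estimate` is only a placeholder standing in for the pYIN call.

```python
import os
import tempfile

def estimation_path(estimations_dir, audio_path):
    """Map an audio path to the .txt file that will hold its estimation."""
    base = os.path.splitext(os.path.basename(audio_path))[0]
    return os.path.join(estimations_dir, base + '.txt')

def process_all(audio_paths, estimations_dir, estimate):
    """Run `estimate` only for files without an output; return the paths processed."""
    os.makedirs(estimations_dir, exist_ok=True)
    processed = []
    for audio_path in audio_paths:
        out_path = estimation_path(estimations_dir, audio_path)
        if os.path.isfile(out_path):
            continue  # already written by an earlier run
        with open(out_path, 'w') as out_file:
            for timestamp, f0 in estimate(audio_path):
                out_file.write('{} {}\n'.format(timestamp, f0))
        processed.append(audio_path)
    return processed

def fake_estimate(audio_path):
    # Placeholder for the real estimator; returns (time, f0) pairs.
    return [(0.0, 0.0), (0.1, 220.0)]

with tempfile.TemporaryDirectory() as tmp:
    paths = ['a_RAW_01_01.wav', 'b_RAW_02_01.wav']
    first_run = process_all(paths, tmp, fake_estimate)
    second_run = process_all(paths, tmp, fake_estimate)
print(len(first_run), len(second_run))
```

One caveat of this design is that a run interrupted in the middle of a write leaves a partial file behind, which later runs will then skip.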

In [20]:
import librosa

### pYIN
### All f0 estimations will go in this folder
estimations_dir = './pyin_estimations'

for path in files_to_process:
    if does_estimation_exist(estimations_dir, path):
        print('skipping:', path)
    else:
        print('processing:', path)
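        # librosa.load resamples to its default rate; pass sr on so the frame times match the audio
        (y, sr) = librosa.load(path)
        (f0, voiced_flag, voiced_probs) = librosa.pyin(y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
        times = librosa.times_like(f0, sr=sr)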
        f0[numpy.isnan(f0)] = 0
        write_estimation(estimations_dir, path, times, f0)


skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_08_01.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_10_01.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_10_02.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_01.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_02.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_03.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_04.wav
skipping: ./V1/AimeeNorwich_Child/AimeeNorwich_Child_RAW/AimeeNorwich_Child_RAW_04_01.wav
skipping: ./V1/AimeeNorwich_Child/AimeeNorwich_Child_RAW/AimeeNorwich_Child_RAW_04_02.wav
skipping: ./V1/AimeeNo

## CREPE

The cell below is similar to the one above, but it uses the CREPE algorithm for the predictions. It likewise checks whether the estimation already exists and writes a new one only if needed. One of the main differences from the pYIN implementation is the use of a confidence threshold for voicing: a frequency estimate is set to zero if the confidence of that prediction is less than 50%.
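The thresholding rule can be illustrated on a few made-up frames. This is a dependency-free sketch of the logic only; the function name `apply_voicing_threshold` and the sample values are assumptions for illustration, not CREPE output.

```python
import math

def apply_voicing_threshold(frequency, confidence, threshold=0.5):
    """Return a copy of `frequency` with low-confidence or NaN frames set to 0."""
    cleaned = []
    for f0, conf in zip(frequency, confidence):
        # A NaN frequency or confidence is treated like an unvoiced frame.
        if math.isnan(f0) or math.isnan(conf) or conf < threshold:
            cleaned.append(0.0)
        else:
            cleaned.append(f0)
    return cleaned

# Hypothetical frames: the second is low confidence, the third has no pitch.
frequency = [220.0, 221.5, float('nan'), 440.0]
confidence = [0.92, 0.31, 0.80, 0.50]
voiced = apply_voicing_threshold(frequency, confidence)
print(voiced)
```

Note that a frame whose confidence equals the threshold exactly is kept, because the comparison is strictly "less than".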

In [25]:
import crepe

### Crepe 
### All f0 estimations will go in this folder
estimations_dir = './crepe_estimations'

for path in files_to_process:
    if does_estimation_exist(estimations_dir, path):
        print('skipping:', path)
    else:
        print('processing:', path)
        (y, sr) = librosa.load(path)
        time, frequency, confidence, activation = crepe.predict(y, sr, viterbi=True)
        # Frames below this confidence are treated as unvoiced
        confidence_threshold = 0.5
        # Convert NaNs to 0 so that they also count as unvoiced
        frequency[numpy.isnan(frequency)] = 0
        confidence[numpy.isnan(confidence)] = 0
        # Zero out every low-confidence frame with a boolean mask
        frequency[confidence < confidence_threshold] = 0
        write_estimation(estimations_dir, path, time, frequency)

skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_08_01.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_10_01.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_10_02.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_01.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_02.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_03.wav
skipping: ./V1/AClassicEducation_NightOwl/AClassicEducation_NightOwl_RAW/AClassicEducation_NightOwl_RAW_13_04.wav
skipping: ./V1/AimeeNorwich_Child/AimeeNorwich_Child_RAW/AimeeNorwich_Child_RAW_04_01.wav
skipping: ./V1/AimeeNorwich_Child/AimeeNorwich_Child_RAW/AimeeNorwich_Child_RAW_04_02.wav
skipping: ./V1/AimeeNo

## Evaluation

The following cell iterates over each file in the estimation directory of both algorithms. For each file, I determine whether it is the raw file of the predominant melody. This information is stored in the Ranking folder of the MedleyDB annotations: the line for the predominant melody in the song's ranking file contains ',1'. If the stem of the file being processed is not the predominant melody, I skip it. Otherwise, I load the ground-truth annotation that corresponds to the file and use mir_eval.melody.evaluate() to calculate the raw pitch accuracy (RPA), raw chroma accuracy (RCA), voicing recall (VR), voicing false alarm (VFA), and overall accuracy (OA) for each estimation. The results are also saved to a JSON file.
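To give a feel for what the pitch metrics measure, here is a simplified sketch of raw pitch accuracy and raw chroma accuracy. It is not mir_eval's implementation: it assumes the reference and the estimate already share one time grid (mir_eval.melody.evaluate handles the resampling), treats a frequency of 0 as unvoiced, and uses a 50 cent tolerance. The function names and sample frames are hypothetical.

```python
import math

def cents_between(est_hz, ref_hz):
    """Signed distance from ref_hz to est_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(est_hz / ref_hz)

def raw_pitch_and_chroma_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Simplified RPA and RCA over frames that share one time grid."""
    voiced_ref = pitch_hits = chroma_hits = 0
    for ref, est in zip(ref_hz, est_hz):
        if ref <= 0:
            continue  # only voiced reference frames are scored
        voiced_ref += 1
        if est <= 0:
            continue  # an unvoiced estimate counts as a miss in this sketch
        diff = cents_between(est, ref)
        if abs(diff) <= tolerance_cents:
            pitch_hits += 1
        # Fold the difference into [-600, 600) cents so octave errors are ignored.
        folded = (diff + 600.0) % 1200.0 - 600.0
        if abs(folded) <= tolerance_cents:
            chroma_hits += 1
    if voiced_ref == 0:
        return 0.0, 0.0
    return pitch_hits / voiced_ref, chroma_hits / voiced_ref

# Hypothetical frames: a close match, an octave error, silence, and a missed note.
ref = [220.0, 220.0, 0.0, 440.0]
est = [221.0, 440.0, 0.0, 0.0]
rpa, rca = raw_pitch_and_chroma_accuracy(ref, est)
print(rpa, rca)
```

In this example the octave error counts against the raw pitch accuracy but not against the raw chroma accuracy, which is why the chroma score is never lower than the pitch score.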

In [22]:
import mir_eval
import pandas
import json

evaluations = ['crepe', 'pyin']
# Store values for each file in a csv
for algorithm in evaluations:
    algorithmEstimationsPath = './{0}_estimations/'.format(algorithm)
    for dir_paths, dir_names, files in os.walk(algorithmEstimationsPath):
        for file in files:

            songName = file.split('_RAW')[0]
            if 'Grants_PunchDrunk' in songName: # skipping a rap song that got included as vocals
                continue

            # get ranking
            stemNumber = file.split('_')[-2]
            rankingPath = './annotations/Ranking/{0}_RANKING.txt'.format(songName)
            # Use a context manager so the ranking file is always closed
            with open(rankingPath, "r") as f:
                lines = f.readlines()
            index = [idx for idx, s in enumerate(lines) if ',1' in s][0] # get ranking = 1
            melodyPath = lines[index]
            # Melody number corresponds to the stem number of the predominant melody according to the rankings
            melodyNumber = melodyPath.split('_')[-1].split('.')[0]
            if melodyNumber != stemNumber:
                # skip song if not predominant melody
                continue

            # Get annotation
            annotationPath = './annotations/Melody1/{0}_MELODY1.csv'.format(songName)
            df = pandas.read_csv(annotationPath, header=None)
            annotationAsNPArray = df.to_numpy()
            referenceTime = annotationAsNPArray[:,0]
            referenceFreq = annotationAsNPArray[:,1]

            # Get estimation
            estimationPath = os.path.join(dir_paths, file)
            # Each line holds "<timestamp> <f0>" separated by a single space
            with open(estimationPath, "r") as f:
                lines = f.readlines()
            estimationTime = []
            estimationFreq = []
            for x in lines:
                estimationTime.append(float(x.split(' ')[0]))
                estimationFreq.append(float(x.split(' ')[1]))

            # calculate evaluation
            evaluation = mir_eval.melody.evaluate(referenceTime, referenceFreq, numpy.asarray(estimationTime), numpy.asarray(estimationFreq))

            # write evaluation to file
            with open('./{0}_evaluations/{1}.json'.format(algorithm, file.split('.')[0]), "w") as fp:
                json.dump(evaluation, fp)

### Averages

This last cell averages each metric over all of the per-file evaluations saved in the previous step.

In [34]:
algorithms = {'crepe': {}, 'pyin': {}}
for algorithm in algorithms:
    numberOfSongs = 0
    algorithmEvaluationsPath = './{0}_evaluations/'.format(algorithm)
    for dir_paths, dir_names, files in os.walk(algorithmEvaluationsPath):
        for file in files:
            numberOfSongs += 1
            with open(os.path.join(dir_paths, file)) as f:
                evaluation = json.load(f)
            if not algorithms[algorithm]:
                algorithms[algorithm] = evaluation
            else:
                for key in evaluation:
                    algorithms[algorithm][key] += evaluation[key]
    for key in algorithms[algorithm]:
        algorithms[algorithm][key] /= numberOfSongs
print(algorithms)
            

{'crepe': {'Voicing Recall': 0.9032354884602297, 'Voicing False Alarm': 0.23387268590438198, 'Raw Pitch Accuracy': 0.8060665978353974, 'Raw Chroma Accuracy': 0.8084688557679225, 'Overall Accuracy': 0.803104728629202}, 'pyin': {'Voicing Recall': 0.8994599671642168, 'Voicing False Alarm': 0.3129949987499292, 'Raw Pitch Accuracy': 0.7112585582536881, 'Raw Chroma Accuracy': 0.7199449247978366, 'Overall Accuracy': 0.7121402578693595}}
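
The averages above are easier to compare side by side. The sketch below copies the printed values into a dictionary (the variable names `results` and `differences` are my own) and prints the CREPE minus pYIN difference for each metric.

```python
# Averages copied from the output of the previous cell.
results = {
    'crepe': {
        'Voicing Recall': 0.9032354884602297,
        'Voicing False Alarm': 0.23387268590438198,
        'Raw Pitch Accuracy': 0.8060665978353974,
        'Raw Chroma Accuracy': 0.8084688557679225,
        'Overall Accuracy': 0.803104728629202,
    },
    'pyin': {
        'Voicing Recall': 0.8994599671642168,
        'Voicing False Alarm': 0.3129949987499292,
        'Raw Pitch Accuracy': 0.7112585582536881,
        'Raw Chroma Accuracy': 0.7199449247978366,
        'Overall Accuracy': 0.7121402578693595,
    },
}

# Positive values mean CREPE scored higher than pYIN on that metric.
differences = {
    metric: results['crepe'][metric] - results['pyin'][metric]
    for metric in results['crepe']
}

for metric, diff in differences.items():
    print('{:<20} crepe={:.3f}  pyin={:.3f}  diff={:+.3f}'.format(
        metric, results['crepe'][metric], results['pyin'][metric], diff))
```

Voicing False Alarm is the one metric where lower is better, so a negative difference there favors CREPE; for the other four, a positive difference does.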
