[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vSPf0npfBJLW-ZV8dXD292VVpP6V1vbW?usp=drive_link)


We are working with audio files, which we plan to feed to the neural network as spectrograms. We begin by installing and importing the dependencies.

In [1]:
# install packages
!pip install gdown
!pip install librosa



In [2]:
# import packages
import pandas as pd
import numpy as np
import gdown
import os
import librosa.display
import librosa
import zipfile
from sklearn.model_selection import train_test_split
from itertools import chain

The files are supplied with labels that identify the bird species present in each recording. There are 264 different categories in the provided dataset. We plan to use one-hot arrays as target values. The class below creates these arrays from the labels, using the list of categories in the header of sample_submission.csv from the provided dataset.

In [3]:
# Class for generating one-hot codes from the labels
class LabelCoder:
    # Extract the categories from the header of a CSV file (sample_submission.csv in this case).
    # Required arguments: the file path and the index of the first header column that holds a valid label
    def __init__(self, path_to_file, labels_start_in_header):
        self.encoding_list = pd.read_csv(path_to_file, header=None).loc[0, labels_start_in_header:].tolist()

    # Convert labels to a one-hot array. A file may have a single label or a list of labels
    def encode(self, labels):
        # Treat a single label as a list with one element
        if not isinstance(labels, list):
            labels = [labels]
        # Gather the indices of the known labels in the list of categories; unknown labels are ignored
        indices = [self.encoding_list.index(label) for label in labels if label in self.encoding_list]

        # Create an all-zero array with one entry per category
        encoded_info = np.zeros(len(self.encoding_list))
        # Set the entries of the present categories to one
        encoded_info[indices] = 1
        return encoded_info

    # Convert a one-hot array back to a list of categories
    def decode(self, array):
        # Gather the indices of the non-zero entries in the array
        indices = np.where(array != 0)[0]
        # Retrieve the matching label for each index from the original list of categories
        labels_list = [self.encoding_list[index] for index in indices]
        return labels_list
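As a minimal illustration of the two steps `LabelCoder` performs, the sketch below reads the categories from a hypothetical three-category header (the real sample_submission.csv lists all 264 categories; the `row_id` name of its first column is an assumption here) and builds one-hot arrays with a standalone `one_hot` helper, which is illustrative and not part of the notebook.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for sample_submission.csv: the first header column is an ID column,
# the remaining columns are the category labels (only three are shown here)
csv_text = "row_id,abethr1,abhori1,abythr1\nsoundscape_1_5,0,0,0\n"

# header=None keeps the header as row 0; .loc[0, 1:] skips the ID column
categories = pd.read_csv(io.StringIO(csv_text), header=None).loc[0, 1:].tolist()


def one_hot(labels, categories):
    # Accept a single label or a list of labels, like LabelCoder.encode
    if not isinstance(labels, list):
        labels = [labels]
    encoded = np.zeros(len(categories))
    # Set a one at the position of every known label; unknown labels are skipped
    encoded[[categories.index(label) for label in labels if label in categories]] = 1
    return encoded


print(categories)                       # ['abethr1', 'abhori1', 'abythr1']
print(one_hot('abhori1', categories))   # [0. 1. 0.]
```

Decoding works in reverse: the positions of the non-zero entries index back into `categories`.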

There are around 17,000 audio files in the provided training dataset, and about 2,000 of them have multiple labels. We decided to exclude these from the training data to keep the network's input simple, and because there is plenty of data at hand. The metadata also contains a rating that describes the quality of each recording, and we decided to discard the files with low ratings (0.0 or 0.5). Finally, we remove a handful of recordings that are shorter than one second.
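The filtering rules can be sketched on a hypothetical toy metadata table. The notebook's `cleandata` detects secondary labels by checking whether the string in `secondary_labels` is longer than `"[]"`; the sketch below swaps in `ast.literal_eval` to parse the string into a real list instead. It also assumes ratings come in steps of 0.5, so that `rating <= 0.5` selects the same rows as the notebook's `isin([0.0, 0.5])`.

```python
import ast

import pandas as pd

# Hypothetical toy metadata; secondary_labels is stored as a string, as in train_metadata.csv
meta = pd.DataFrame({
    'filename': ['a/XC1.ogg', 'a/XC2.ogg', 'b/XC3.ogg', 'b/XC4.ogg'],
    'secondary_labels': ["[]", "['b']", "[]", "[]"],
    'rating': [4.0, 3.5, 0.5, 0.0],
})

# Parse the string into a Python list and flag rows with at least one secondary label
has_secondary = meta['secondary_labels'].apply(lambda s: len(ast.literal_eval(s)) > 0)
# Flag low-quality recordings (assumes ratings are multiples of 0.5)
low_rating = meta['rating'] <= 0.5

# Keep single-label recordings with an acceptable rating
kept = meta[~has_secondary & ~low_rating]
print(kept['filename'].tolist())  # ['a/XC1.ogg']
```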

In [4]:
def cleandata(df):
    # Work on a copy so that the DataFrame passed in is not modified
    df = df.copy()

    print("Fill missing data")

    # get columns with missing data (longitude and latitude)
    cols_w_missing_data = df.columns[df.isnull().any()]

    # fill missing data with the average of the other values in the column
    for column in cols_w_missing_data:
        df[column] = df[column].fillna(df[column].mean())

    print("Cleaning data...")

    # the folder which contains the audio files
    trainpath = 'train_audio'

    # delete data with low rating (0.0 or 0.5) from the dataframe
    low_rating = df['rating'].isin([0.0, 0.5])
    # the names of the files with a rating of 0.0 or 0.5 - these are the files we want to get rid of
    poorRatingFilenames = set(df.loc[low_rating, 'filename'])
    df.drop(df[low_rating].index, inplace=True)

    # collect the full paths of the poor-rating files that are present on disk
    FilesToDelete = [os.path.join(subdir, file) for subdir, dirs, files in os.walk(trainpath)
                     for file in files if os.path.basename(subdir) + '/' + file in poorRatingFilenames]

    # delete files with more than one label; secondary_labels is stored as a string,
    # so anything longer than the empty list "[]" has at least one secondary label
    rows_with_multiple_labels = df[df['secondary_labels'].apply(lambda x: len(x) > 2)]
    df.drop(rows_with_multiple_labels.index, inplace=True)
    FilesToDelete += [os.path.join(trainpath, file) for file in rows_with_multiple_labels['filename']]

    # also delete files that are shorter than 1 sec
    # searching for these takes a lot of time (about 45 mins), so we hard-coded the list
    short_files = ["categr/XC368933.ogg","categr/XC368934.ogg","eubeat1/XC647701.ogg","gargan/XC310912.ogg","gobbun1/XC200993.ogg","greegr/XC338469.ogg","litegr/XC147857.ogg","piekin1/XC601791.ogg","rerswa1/XC191112.ogg","strher/XC255388.ogg"]
    df.drop(df[df['filename'].isin(short_files)].index, inplace=True)
    FilesToDelete += [os.path.join(trainpath, file) for file in short_files]

    # remove duplicates and keep only paths that still exist, so re-running the cleaning does not fail
    FilesToDelete = [file for file in np.unique(FilesToDelete) if os.path.exists(file)]

    # if there aren't any files to delete, then we don't need to do anything - assuming the data path is right
    if len(FilesToDelete) == 0:
        print("Data has already been cleaned")
    else:
        for file in FilesToDelete:
            os.remove(file)
        print(f"Deleted {len(FilesToDelete)} files")

    return df

Storing spectrograms requires a lot of memory (about 10 MB each), and with around 14,000 audio files remaining after cleaning, storing all of them at once would take around 140 GB. To avoid this, we feed the files to the neural network in batches and convert each file to a spectrogram only right before it is passed to the network; until then, we refer to the files by their paths. We handle the labels the same way and only create the one-hot arrays right before passing the data to the network.
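The 10 MB figure can be checked with a rough estimate. The sketch below assumes librosa's `stft` defaults (`n_fft=2048`, `hop_length=512`, `center=True`), which give `1 + n_fft // 2` frequency bins and `1 + n_samples // hop_length` frames, and float32 values (the dtype shown in the example output further down). The 40-second clip length and 32 kHz sample rate are hypothetical inputs, not measured averages of the dataset.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=512):
    # With center=True the signal is padded so there is one frame per hop, plus one
    n_freq_bins = 1 + n_fft // 2
    n_frames = 1 + n_samples // hop_length
    return n_freq_bins, n_frames


def spectrogram_bytes(duration_s, sample_rate, bytes_per_value=4, n_fft=2048, hop_length=512):
    # Number of values in the spectrogram times the size of one value (4 bytes for float32)
    bins, frames = stft_shape(int(duration_s * sample_rate), n_fft, hop_length)
    return bins * frames * bytes_per_value


# Hypothetical example: a 40-second clip sampled at 32 kHz
print(spectrogram_bytes(40, 32000))  # 10254100 bytes, i.e. about 10 MB

# Around 14,000 files at about 10 MB each, in GB
print(14000 * 10 * 10**6 / 10**9)  # 140.0
```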

The following function reads the file paths and sorts them alphabetically, since the metadata is sorted the same way; this keeps each path in the same position as its label.

In [5]:
def read_file_paths():
    main_directory = 'train_audio'
    file_paths = []

    # go through all folders and get the paths of all .ogg audio files
    for root, directories, files in os.walk(main_directory):
        for file in files:
            if file.endswith('.ogg'):
                file_path = os.path.join(root, file)
                file_paths.append(file_path)

    # os.walk may not go in alphabetical order thus it needs to be sorted
    file_paths.sort()
    return file_paths
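Pairing the sorted paths with `df['primary_label']` relies on the metadata being sorted by filename and on every remaining file having exactly one metadata row. A hedged alternative, sketched below with a hypothetical helper `paths_and_labels_from_metadata`, builds each path from the same metadata row that provides its label, so the two lists cannot drift apart.

```python
import os

import pandas as pd


def paths_and_labels_from_metadata(df, audio_dir='train_audio'):
    # Build each path from the same row that provides its label
    paths = [os.path.join(audio_dir, name) for name in df['filename']]
    labels = df['primary_label'].tolist()
    return paths, labels


# Two rows taken from the metadata shown further down
meta = pd.DataFrame({
    'primary_label': ['abethr1', 'yewgre1'],
    'filename': ['abethr1/XC128013.ogg', 'yewgre1/XC752974.ogg'],
})
paths, labels = paths_and_labels_from_metadata(meta)
print(paths)   # ['train_audio/abethr1/XC128013.ogg', 'train_audio/yewgre1/XC752974.ogg']
print(labels)  # ['abethr1', 'yewgre1']
```

After cleaning, comparing this list with the output of `read_file_paths()` is a quick way to confirm that the sorted-order assumption holds.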

This function loads one batch of audio files, converts each file to a spectrogram and normalizes it, and converts the corresponding labels to one-hot arrays.

In [6]:
# Read the sound files of one batch, convert them to normalized spectrograms and convert their labels to one-hot arrays
def get_batch(file_paths, labels, batch_number, batch_size, data_size):
    lc = LabelCoder('sample_submission.csv', 1)
    batch = []
    encoded_labels = []
    # The last batch may be shorter than batch_size
    end_of_batch = min((batch_number + 1) * batch_size, data_size)
    for i in range(batch_number * batch_size, end_of_batch):
        filename = file_paths[i]
        print(filename)
        # Load the audio at its native sample rate
        samples, sample_rate = librosa.load(filename, sr=None)
        # Magnitude spectrogram in decibels, relative to the loudest time-frequency bin
        spectrogram = librosa.amplitude_to_db(np.abs(librosa.stft(samples)), ref=np.max)
        # Standardize each spectrogram to zero mean and unit variance
        mean = np.mean(spectrogram)
        std = np.std(spectrogram)
        batch.append((spectrogram - mean) / std)
        encoded_labels.append(lc.encode(labels[i]))

    return batch, encoded_labels
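The normalization step is a per-spectrogram z-score. With `ref=np.max`, `librosa.amplitude_to_db` maps the loudest bin to 0 dB and, with its default `top_db=80`, clips everything below -80 dB, which is most likely why a single value (-1.999567) repeats in the quiet rows of the example output further down. The sketch below adds a small `eps` to the denominator, an assumption not present in `get_batch`, so that a constant (for example silent) spectrogram becomes zeros instead of NaN.

```python
import numpy as np


def standardize(spectrogram, eps=1e-8):
    # Subtract the mean and divide by the standard deviation of the whole spectrogram;
    # eps avoids a division by zero when every value is the same
    mean = np.mean(spectrogram)
    std = np.std(spectrogram)
    return (spectrogram - mean) / (std + eps)


# Hypothetical decibel values in the 0 to -80 dB range
toy = np.array([[0.0, -20.0], [-40.0, -80.0]], dtype=np.float32)
z = standardize(toy)
print(z.mean(), z.std())  # approximately 0 and 1

# A constant spectrogram maps to all zeros instead of NaN
print(standardize(np.full((2, 2), -80.0, dtype=np.float32)))
```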

We download the data, unzip the archive, delete the unnecessary files, and remove their entries from the metadata CSV.

In [7]:
# download data
url = 'https://drive.google.com/u/0/uc?id=1y3XTDabEW5vhhh2Seh3FYsdMA3yNodMz&export=download'
output = 'database.zip'

gdown.download(url,output)

Downloading...
From: https://drive.google.com/u/0/uc?id=1y3XTDabEW5vhhh2Seh3FYsdMA3yNodMz&export=download
To: /content/database.zip
100%|██████████| 5.27G/5.27G [00:46<00:00, 114MB/s] 


'database.zip'

In [8]:
# unzip data
with zipfile.ZipFile('database.zip', 'r') as zip_ref:
    zip_ref.extractall()
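Re-running the extraction would restore the files that the cleaning step deletes. A minimal sketch of a guard, using a hypothetical helper `extract_if_needed` and assuming the archive unpacks a top-level `train_audio` folder (the folder the rest of the notebook reads from), skips extraction when that folder already exists:

```python
import os
import zipfile


def extract_if_needed(zip_path, target_dir, extract_to='.'):
    # Skip extraction when the expected folder is already present (e.g. when re-running the notebook)
    if os.path.isdir(os.path.join(extract_to, target_dir)):
        return False
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_to)
    return True
```

In this notebook the call would be `extract_if_needed('database.zip', 'train_audio')`.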

In [9]:
# print dataframe
df = pd.read_csv('train_metadata.csv')
df

| | primary_label | secondary_labels | type | latitude | longitude | scientific_name | common_name | author | license | rating | url | filename |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | abethr1 | [] | ['song'] | 4.3906 | 38.2788 | Turdus tephronotus | African Bare-eyed Thrush | Rolf A. de By | Creative Commons Attribution-NonCommercial-Sha... | 4.0 | https://www.xeno-canto.org/128013 | abethr1/XC128013.ogg |
| 1 | abethr1 | [] | ['call'] | -2.9524 | 38.2921 | Turdus tephronotus | African Bare-eyed Thrush | James Bradley | Creative Commons Attribution-NonCommercial-Sha... | 3.5 | https://www.xeno-canto.org/363501 | abethr1/XC363501.ogg |
| 2 | abethr1 | [] | ['song'] | -2.9524 | 38.2921 | Turdus tephronotus | African Bare-eyed Thrush | James Bradley | Creative Commons Attribution-NonCommercial-Sha... | 3.5 | https://www.xeno-canto.org/363502 | abethr1/XC363502.ogg |
| 3 | abethr1 | [] | ['song'] | -2.9524 | 38.2921 | Turdus tephronotus | African Bare-eyed Thrush | James Bradley | Creative Commons Attribution-NonCommercial-Sha... | 5.0 | https://www.xeno-canto.org/363503 | abethr1/XC363503.ogg |
| 4 | abethr1 | [] | ['call', 'song'] | -2.9524 | 38.2921 | Turdus tephronotus | African Bare-eyed Thrush | James Bradley | Creative Commons Attribution-NonCommercial-Sha... | 4.5 | https://www.xeno-canto.org/363504 | abethr1/XC363504.ogg |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16936 | yewgre1 | [] | [''] | -1.2502 | 29.7971 | Eurillas latirostris | Yellow-whiskered Greenbul | András Schmidt | Creative Commons Attribution-NonCommercial-Sha... | 3.0 | https://xeno-canto.org/703472 | yewgre1/XC703472.ogg |
| 16937 | yewgre1 | [] | [''] | -1.2489 | 29.7923 | Eurillas latirostris | Yellow-whiskered Greenbul | András Schmidt | Creative Commons Attribution-NonCommercial-Sha... | 4.0 | https://xeno-canto.org/703485 | yewgre1/XC703485.ogg |
| 16938 | yewgre1 | [] | [''] | -1.2433 | 29.7844 | Eurillas latirostris | Yellow-whiskered Greenbul | András Schmidt | Creative Commons Attribution-NonCommercial-Sha... | 4.0 | https://xeno-canto.org/704433 | yewgre1/XC704433.ogg |
| 16939 | yewgre1 | [] | [''] | 0.0452 | 36.3699 | Eurillas latirostris | Yellow-whiskered Greenbul | Lars Lachmann | Creative Commons Attribution-NonCommercial-Sha... | 4.0 | https://xeno-canto.org/752974 | yewgre1/XC752974.ogg |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16941 entries, 0 to 16940
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   primary_label     16941 non-null  object 
 1   secondary_labels  16941 non-null  object 
 2   type              16941 non-null  object 
 3   latitude          16714 non-null  float64
 4   longitude         16714 non-null  float64
 5   scientific_name   16941 non-null  object 
 6   common_name       16941 non-null  object 
 7   author            16941 non-null  object 
 8   license           16941 non-null  object 
 9   rating            16941 non-null  float64
 10  url               16941 non-null  object 
 11  filename          16941 non-null  object 
dtypes: float64(3), object(9)
memory usage: 1.6+ MB
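The summary above shows that only latitude and longitude have missing values: 16,714 of 16,941 rows are non-null, so 227 rows lack coordinates. A minimal sketch of the mean imputation used in `cleandata`, on a hypothetical toy frame, assigns the filled column back instead of calling `fillna(..., inplace=True)` on a selected column, a pattern that recent pandas versions warn about under chained assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical toy metadata with one missing value in each coordinate column
meta = pd.DataFrame({
    'latitude': [4.0, np.nan, -2.0],
    'longitude': [38.0, 36.0, np.nan],
    'rating': [4.0, 3.5, 5.0],
})

# Columns that contain at least one missing value
cols_with_missing = meta.columns[meta.isnull().any()]

# Replace each missing value with the mean of the column's non-missing values
for column in cols_with_missing:
    meta[column] = meta[column].fillna(meta[column].mean())

print(meta)
```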


In [11]:
# Cleaning the data
df = cleandata(df)

Fill missing data
Cleaning data...
Deleted 2735 files


We read the filenames and the labels as the X and Y values and split the data into training, test and validation sets.


In [12]:
# Reading the filenames and the respective labels
X = read_file_paths()
Y = list(df['primary_label'])

In [13]:
# Split the data into training (80%), test (10%) and validation (10%) sets
X_train, X_test_val, Y_train, Y_test_val = train_test_split(X, Y, test_size=0.2, random_state=42)

# Split the held-out 20% in half: one half for testing, the other for validation
X_test, X_val, Y_test, Y_val = train_test_split(X_test_val, Y_test_val, test_size=0.5, random_state=42)
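The two-stage split can be checked on hypothetical stand-in data: with 100 items, the first call keeps 80 for training and holds out 20, and the second call halves the held-out part into 10 test and 10 validation items, with no item appearing in more than one set.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the file paths and labels
items = [f'file_{i}.ogg' for i in range(100)]
targets = [i % 4 for i in range(100)]

# First split: 80% training, 20% held out
X_tr, X_rest, Y_tr, Y_rest = train_test_split(items, targets, test_size=0.2, random_state=42)
# Second split: the held-out 20% is halved into test and validation sets
X_te, X_va, Y_te, Y_va = train_test_split(X_rest, Y_rest, test_size=0.5, random_state=42)

print(len(X_tr), len(X_te), len(X_va))  # 80 10 10
```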

The following cell shows how we plan to pass the data to the neural network.
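A training loop would call `get_batch` once per batch number. The sketch below, with a hypothetical generator `batch_bounds`, lists the index range each call covers, using the same bounds as `get_batch`: batch `b` spans `[b * batch_size, min((b + 1) * batch_size, data_size))`, so the last batch may be smaller.

```python
import math


def batch_bounds(data_size, batch_size):
    # Number of batches needed to cover every item once
    n_batches = math.ceil(data_size / batch_size)
    for batch_number in range(n_batches):
        start = batch_number * batch_size
        end = min((batch_number + 1) * batch_size, data_size)
        yield batch_number, start, end


print(list(batch_bounds(10, 4)))  # [(0, 0, 4), (1, 4, 8), (2, 8, 10)]
```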

In [14]:
# Reading the batches and preparing the data for the neural network will be part of the learn function; this cell only demonstrates how it will work
batch_number = 0
X_for_learn, Y_for_learn = get_batch(file_paths=X,labels=Y,batch_number=batch_number,batch_size=1,data_size=len(X))
X_for_learn, Y_for_learn

train_audio/abethr1/XC128013.ogg


([array([[-1.999567 ,  0.7775941,  3.3069386, ...,  2.4107227,  4.0941734,
           3.8054092],
         [-1.8875908,  1.5222849,  3.4738145, ...,  3.585174 ,  4.2658105,
           3.6609228],
         [-1.3954798,  2.043205 ,  3.5988164, ...,  3.9832664,  4.206435 ,
           3.440446 ],
         ...,
         [-1.999567 , -1.999567 , -1.999567 , ..., -1.999567 , -1.999567 ,
          -1.999567 ],
         [-1.999567 , -1.999567 , -1.999567 , ..., -1.999567 , -1.999567 ,
          -1.999567 ],
         [-1.999567 , -1.999567 , -1.999567 , ..., -1.999567 , -1.999567 ,
          -1.999567 ]], dtype=float32)],
 [array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,