This notebook is copied from https://www.kaggle.com/thailssonclementino/melspectrograms-with-librosa

I only translated the text with Google Translate.

This notebook generates mel spectrograms using the librosa library.


In [None]:
# Import the libraries used in this notebook
import librosa
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import librosa.display
import warnings
import cv2
import timeit
import pandas as pd

# Silence all warnings to keep the notebook output clean
warnings.filterwarnings('ignore')

The function below takes a mel spectrogram generated by librosa (here, the dB-scaled one) and returns a uint8 array that can be saved as a grayscale image. It is used in the loop further down.

In [None]:
def mono_to_color(X: np.ndarray,
                  mean=None,
                  std=None,
                  norm_max=None,
                  norm_min=None,
                  eps=1e-6):
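    """
    Convert a 2-D array (here, a dB mel spectrogram) into a uint8 image in [0, 255].
    Adapted from https://www.kaggle.com/daisukelab/creating-fat2019-preprocessed-data
    """
    # Optionally stack X as [X, X, X] to get a 3-channel image
    #X = np.stack([X, X, X], axis=-1)

    # Standardize; "is None" keeps an explicitly passed 0 instead of replacing it
    mean = X.mean() if mean is None else mean
    X = X - mean
    std = X.std() if std is None else std
    Xstd = X / (std + eps)
    _min, _max = Xstd.min(), Xstd.max()
    norm_max = _max if norm_max is None else norm_max
    norm_min = _min if norm_min is None else norm_min
    if (_max - _min) > eps:
        # Clip to [norm_min, norm_max], then normalize to [0, 255]
        V = np.clip(Xstd, norm_min, norm_max)
        V = (V - norm_min) / (norm_max - norm_min) * 255
        V = V.astype(np.uint8)
    else:
        # Constant input: return an all-zero (black) image
        V = np.zeros_like(Xstd, dtype=np.uint8)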
    return V
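
A minimal numpy-only sketch of the same standardize-then-rescale steps, useful for checking the behaviour of mono_to_color without loading any audio. The function name standardize_to_uint8 and the synthetic input are illustrative assumptions; the input mimics librosa.power_to_db(S, ref=np.max), whose values are at most 0 dB and, with its default top_db=80, no lower than -80 dB.

```python
import numpy as np


def standardize_to_uint8(X, mean=None, std=None, norm_max=None, norm_min=None, eps=1e-6):
    """Standardize X, then map [norm_min, norm_max] linearly onto 0..255 (uint8)."""
    X = np.asarray(X, dtype=np.float32)
    # Zero mean and unit variance, using X's own statistics unless given explicitly
    mean = X.mean() if mean is None else mean
    X = X - mean
    std = X.std() if std is None else std
    Xstd = X / (std + eps)
    _min, _max = Xstd.min(), Xstd.max()
    if (_max - _min) <= eps:
        # Constant input: nothing to stretch, return an all-black image
        return np.zeros_like(Xstd, dtype=np.uint8)
    norm_min = _min if norm_min is None else norm_min
    norm_max = _max if norm_max is None else norm_max
    # Clip to the normalization range, then stretch it to 0..255
    V = np.clip(Xstd, norm_min, norm_max)
    V = (V - norm_min) / (norm_max - norm_min) * 255
    return V.astype(np.uint8)


# Synthetic "dB spectrogram" with values from -80 dB up to 0 dB
S_dB = np.linspace(-80.0, 0.0, 12).reshape(3, 4)
img = standardize_to_uint8(S_dB)
print(img.dtype, img.min(), img.max())  # uint8 0 255
```

Dividing by the range before multiplying by 255 makes the largest value land exactly on 255, so truncation in astype(np.uint8) cannot turn it into 254.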

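The code below loads audio faster than librosa.load by calling librosa's internal audioread loader (librosa.core.audio.__audioread_load) directly, then downmixing to mono and resampling to 22050 Hz with the 'kaiser_fast' resampler. The speed-up presumably comes from skipping librosa.load's soundfile attempt and from using a faster resampler. Note that __audioread_load is a private function and may change between librosa versions.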

In [None]:
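def get_clip_sr(path, offset=0, duration=None):
    # Decode with librosa's private audioread loader
    clip, sr_native = librosa.core.audio.__audioread_load(path, offset=offset, duration=duration, dtype=np.float32)
    # Average the channels into a single mono signal
    clip = librosa.to_mono(clip)
    sr = 22050
    if sr_native > 0:
        # Recent librosa versions (0.10+) require orig_sr/target_sr as keyword arguments
        clip = librosa.resample(clip, orig_sr=sr_native, target_sr=sr, res_type='kaiser_fast')
    return clip, sr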
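
For readers checking array shapes, here is a small numpy-only sketch of the bookkeeping get_clip_sr performs. It assumes, as in current librosa releases, that librosa.to_mono averages over the channel axis of (channels, samples) input and that librosa.resample (with its default fix=True) returns ceil(n * target_sr / orig_sr) samples. The helper names are hypothetical, and the resampling filter itself is not reproduced.

```python
import math

import numpy as np


def downmix_to_mono(y):
    """Average a (channels, samples) array over its channels; 1-D input is returned as is."""
    y = np.asarray(y)
    return y.mean(axis=0) if y.ndim > 1 else y


def resampled_length(n_samples, orig_sr, target_sr=22050):
    """Output length enforced by librosa.resample with fix=True."""
    ratio = target_sr / orig_sr
    return math.ceil(n_samples * ratio)


# Hypothetical 5 s stereo clip at 44.1 kHz: left channel at 1.0, right channel silent
stereo = np.zeros((2, 5 * 44100), dtype=np.float32)
stereo[0] = 1.0
mono = downmix_to_mono(stereo)
print(mono.shape)                                # (220500,)
print(resampled_length(mono.shape[-1], 44100))   # 110250
```

So every 5 s clip that reaches the mel spectrogram step should hold 5 * 22050 = 110250 samples, whatever its native sample rate was (up to rounding in the ratio).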

Below, I iterate over the rows of train.csv and, for every recording of the first species (aldfly) that lasts at least 5 s, compute the mel spectrogram of its first 5 s and save it as an image.
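
As a sanity check on the image size, the number of spectrogram frames per clip can be worked out by hand. This sketch assumes librosa's melspectrogram defaults, which the loop does not override (n_fft=2048, hop_length=512, center=True), and a clip of exactly 5 * 22050 samples; the helper n_stft_frames is hypothetical.

```python
def n_stft_frames(n_samples, hop_length=512):
    """Frame count of a centered STFT.

    With center=True the signal is padded by n_fft // 2 on each side, so for an
    even n_fft the count reduces to 1 + n_samples // hop_length.
    """
    return 1 + n_samples // hop_length


sr = 22050
n_samples = 5 * sr      # first 5 s of each recording
n_mels = 310            # value passed to melspectrogram in the loop
shape = (n_mels, n_stft_frames(n_samples))
print(shape)            # (310, 216)
```

cv2.resize(im, (224, 224)) then squeezes the 310 mel bands into 224 rows and stretches the 216 frames to 224 columns (cv2 takes the target size as (width, height), which does not matter here because both are 224).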


In [None]:
# Define the audio path and open the csv file
PATH_TRAIN = "../input/birdsong-recognition/train_audio/"
train_info = pd.read_csv('../input/birdsong-recognition/train.csv')

# Start the timer
start = timeit.default_timer()

# Counter used to name the output images (0.png, 1.png, ...)
cnt = 0

# Iterate over the rows of the csv file
for index, row in train_info.iterrows():

    # Stop at the first row that is not the first species (aldfly);
    # this assumes train.csv lists all aldfly recordings first
    ebird_code = row['ebird_code']
    if ebird_code != 'aldfly':
        break
    
    # Skip recordings shorter than 5 s
    duration = row['duration']
    if duration < 5:
        continue

    # Build the audio path
    full_path = PATH_TRAIN + ebird_code + '/' + row['filename']

    # Load the first 5 s of the audio
    y, sr = get_clip_sr(full_path, 0, 5)
    
    # Compute the mel spectrogram and convert power to dB relative to its maximum
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=310, fmin=160, fmax=10300)
    S_dB = librosa.power_to_db(S, ref=np.max)

    # Convert to an 8-bit image, resize to 224x224 and flip vertically
    # so that low frequencies end up at the bottom of the image
    im = mono_to_color(S_dB)
    im = cv2.resize(im, (224, 224))
    im = cv2.flip(im, 0)

    # Write the image as <cnt>.png in the working directory
    cv2.imwrite(f'{cnt}.png', im)
    cnt += 1

# Stop the timer and report the elapsed time in seconds
stop = timeit.default_timer()

print('Time:', stop - start)
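
The loop above stops at the first row whose ebird_code is not aldfly, so it only processes every aldfly recording if train.csv lists that species first and contiguously. A sketch of an order-independent way to select the same rows uses a boolean mask on the columns the loop already reads (ebird_code, duration, filename). The toy DataFrame toy_info stands in for train_info and its rows are invented for illustration.

```python
import pandas as pd

# Toy stand-in for train_info (invented rows; only the columns the loop reads)
toy_info = pd.DataFrame({
    "ebird_code": ["aldfly", "aldfly", "amecro", "aldfly"],
    "duration": [12, 3, 20, 7],
    "filename": ["a.mp3", "b.mp3", "c.mp3", "d.mp3"],
})
path_train = "../input/birdsong-recognition/train_audio/"

# Keep aldfly recordings lasting at least 5 s, wherever they appear in the file
mask = (toy_info["ebird_code"] == "aldfly") & (toy_info["duration"] >= 5)
selected = toy_info[mask]

# Same path construction as the loop: <path_train><ebird_code>/<filename>
paths = [path_train + code + "/" + name
         for code, name in zip(selected["ebird_code"], selected["filename"])]
print(paths)
```

In the notebook, the same mask applied to train_info would let the loop iterate over selected.iterrows() with neither the break nor the duration check.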

Example of visualizing one of the generated images.

In [None]:
import matplotlib.image as mpimg
# Load the first generated image; it is single-channel, so matplotlib applies its default colormap
img = mpimg.imread('./0.png')
imgplot = plt.imshow(img)