# Contents
* [Introduction](#Introduction)
* [Imports and configuration](#Imports-and-configuration)
* [Data](#Data)
  * [Loading](#Loading)
* [Results](#Results)
  * [Trial 1](#Trial-1)
  * [Trial 2](#Trial-2)
  * [Trial 3](#Trial-3)
* [Discussion](#Discussion)

## Introduction

In this notebook, we compare how fast `torchaudio` and `librosa` load the .wav files in my dataset. A slow loader is an avoidable drag on development. `torchaudio` has the lower mean load time in all 3 trials, though only by a small margin.

## Imports and configuration

In [1]:
```python
# set random seeds

from os import environ
from random import seed as random_seed
from numpy.random import seed as np_seed
from tensorflow.random import set_seed


def reset_seeds(seed: int) -> None:
    """Utility function for resetting random seeds"""
    # PYTHONHASHSEED is read only at interpreter start-up, so setting it here
    # affects child processes, not string hashing in the running kernel
    environ["PYTHONHASHSEED"] = str(seed)
    random_seed(seed)
    np_seed(seed)
    set_seed(seed)


reset_seeds(SEED := 2021)
```
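Setting `PYTHONHASHSEED` from inside a running kernel, as `reset_seeds` does, does not change string hashing in that kernel, because the interpreter reads the variable once at start-up. It does take effect in child processes. A minimal stdlib sketch of this behavior (the helper `child_hash` is hypothetical, not part of the notebook):

```python
import os
import subprocess
import sys


def child_hash(text: str, seed: int) -> int:
    """Hash `text` in a fresh interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# With the same seed, separate processes agree on the hash
print(child_hash("aesdd", 2021) == child_hash("aesdd", 2021))  # prints True
```

To make hashing reproducible inside the notebook itself, the variable would have to be set before Jupyter starts the kernel.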

In [2]:
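```python
# extensions
%load_ext autotime
# code formatters from the nb_black package: lab_black targets JupyterLab,
# nb_black the classic Notebook; only the one matching your frontend is needed
%load_ext lab_black
%load_ext nb_black
```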

In [3]:
```python
# core
import numpy as np
import pandas as pd

# loading methods
import librosa
import torchaudio
from librosa import load as librosa_load
from torchaudio import load as torchaudio_load

# utility
from gc import collect as gc_collect

# visualization
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_context("notebook")
sns.set_style("whitegrid")
%matplotlib inline

# display outputs w/o print calls
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"
```

```
time: 3.13 s
```


In [4]:
```python
# Location of pickled dataframes
PICKLED_DF_FOLDER = (
    "../1.0-mic-divide_data_by_duration"
)

# The preprocessed data from the Unified Multilingual Dataset of Emotional Human utterances
WAV_DIRECTORY = "../../../unified_multilingual_dataset_of_emotional_human_utterances/data/preprocessed"
```

```
time: 1 ms
```


## Data

### Loading

In [5]:
```python
# read medium dataframe
files = pd.read_pickle(f"{PICKLED_DF_FOLDER}/medium.pkl").file
files.describe()
```

```
count                                          81099
unique                                         81099
top       00000+aesdd+aesdd.1+f+ang+-1+ell+el-gr.wav
freq                                               1
Name: file, dtype: object

time: 146 ms
```


Let's look at what both load methods return for the first five files. The outputs should be comparable before we time them.

In [6]:
```python
sample_files = files.head(5)
```

```
time: 0 ns
```


In [7]:
```python
# sr=None keeps each file's native sample rate (librosa resamples to 22,050 Hz by default);
# librosa.load returns (samples, sample_rate), so [0] keeps only the samples
librosa_load_sample = sample_files.apply(
    lambda row: librosa_load(path=f"{WAV_DIRECTORY}/{row}", sr=None)[0]
)
librosa_load_sample
librosa_load_sample.iloc[0]
len(librosa_load_sample.iloc[0])
```

```
id
00000    [0.0, 0.0007324219, 0.0012207031, 0.002380371,...
00001    [0.0020446777, 0.0004272461, -9.1552734e-05, -...
00002    [-0.0011291504, -0.0012512207, -0.0014953613, ...
00003    [0.0012207031, 0.0010375977, 0.0008239746, 0.0...
00004    [0.0, -9.1552734e-05, 0.0, 0.0, 0.0, 0.0, 0.0,...
Name: file, dtype: object

array([ 0.        ,  0.00073242,  0.0012207 , ..., -0.00030518,
       -0.0005188 , -0.00030518], dtype=float32)

66064

time: 20 ms
```


In [8]:
```python
# torchaudio.load returns (waveform, sample_rate) with waveform shaped (channels, frames);
# [0][0] keeps the first channel of the waveform
torchaudio_load_sample = sample_files.apply(
    lambda row: torchaudio_load(filepath=f"{WAV_DIRECTORY}/{row}")[0][0]
)
torchaudio_load_sample
torchaudio_load_sample.iloc[0]
len(torchaudio_load_sample.iloc[0])
```

```
id
00000    [tensor(0.), tensor(0.0007), tensor(0.0012), t...
00001    [tensor(0.0020), tensor(0.0004), tensor(-9.155...
00002    [tensor(-0.0011), tensor(-0.0013), tensor(-0.0...
00003    [tensor(0.0012), tensor(0.0010), tensor(0.0008...
00004    [tensor(0.), tensor(-9.1553e-05), tensor(0.), ...
Name: file, dtype: object

tensor([ 0.0000,  0.0007,  0.0012,  ..., -0.0003, -0.0005, -0.0003])

66064

time: 1.7 s
```


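The resulting `Series` are comparable: the first file has 66064 samples under both loaders, and the sample values agree. `librosa` returns `numpy` arrays while `torchaudio` returns `torch` tensors. Both hold `float32` values; the tensors only look less precise because PyTorch prints 4 decimal places by default.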
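The printed `librosa` values are exact multiples of 1/32768 (for example 0.0007324219 ≈ 24/32768 and 0.002380371 ≈ 78/32768), which suggests the files are 16-bit PCM and that both libraries scale the integer samples by 1/32768 into [-1, 1). As a sketch of that conversion only, assuming 16-bit little-endian PCM, here is a stdlib reader built on the `wave` module. It is not how either library is implemented, and `load_pcm16_wav`/`write_pcm16_wav` are hypothetical helpers; the 16 kHz default rate is arbitrary.

```python
import io
import struct
import wave


def load_pcm16_wav(fileobj):
    """Return (channels, sample_rate) for a 16-bit PCM WAV, samples scaled to [-1, 1)."""
    with wave.open(fileobj, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # "<h" = little-endian signed 16-bit integer, as stored in PCM WAV files
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    # frames are interleaved, so channel c is every n_channels-th sample from c
    channels = [[s / 32768.0 for s in ints[c::n_channels]] for c in range(n_channels)]
    return channels, sample_rate


def write_pcm16_wav(samples, sample_rate=16000, n_channels=1):
    """Build an in-memory WAV from interleaved int16 samples (for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    buf.seek(0)
    return buf


channels, sr = load_pcm16_wav(write_pcm16_wav([0, 24, 40, 78]))
print(sr, channels[0])  # first channel, like torchaudio's [0][0]
```

Taking `channels[0]` mirrors the `[0][0]` indexing used with `torchaudio`; `librosa.load` instead downmixes multi-channel files by default, which gives the same result for mono recordings.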

## Results

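We first record the library versions and free the sample. Each trial then re-reads the file list and draws a fresh random 5% sample (with a different `random_state` per trial), so that files read in earlier steps are less likely to benefit from caching under the hood.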
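`%timeit` reports the mean and population standard deviation of per-loop times across runs. A rough stdlib equivalent for use outside IPython can be sketched with `timeit.repeat`. The untimed warm-up call is an assumption added here, not something the trials below do: it lets each loader start from the same warm OS file cache. `benchmark` is a hypothetical helper.

```python
import statistics
import timeit
from typing import Callable, Tuple


def benchmark(
    func: Callable[[], object], repeat: int = 7, number: int = 1, warmup: bool = True
) -> Tuple[float, float]:
    """Return (mean, std) seconds per call of `func`, like %timeit's summary line."""
    if warmup:
        func()  # untimed call so every loader reads files from a warm OS cache
    # each entry is the total time for `number` consecutive calls
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    # %timeit uses the population standard deviation
    return statistics.mean(per_call), statistics.pstdev(per_call)


mean, std = benchmark(lambda: sum(range(100_000)))
print(f"{mean * 1e3:.2f} ms ± {std * 1e3:.2f} ms per loop")
```

With the notebook's variables, a call would look like `benchmark(lambda: files.apply(lambda row: torchaudio_load(filepath=f"{WAV_DIRECTORY}/{row}")[0][0]))`.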

In [9]:
```python
print(torchaudio.__version__)
print(librosa.__version__)
del sample_files
_ = gc_collect()
```

```
0.10.0+cpu
0.8.1
time: 117 ms
```


### Trial 1

In [10]:
```python
files = pd.read_pickle(f"{PICKLED_DF_FOLDER}/medium.pkl").file.sample(
    frac=0.05, random_state=SEED + 1
)
```

```
time: 136 ms
```


In [11]:
```python
_ = gc_collect()
%timeit files.apply(lambda row: librosa_load(path=f"{WAV_DIRECTORY}/{row}", sr=None)[0])
_ = gc_collect()
%timeit files.apply(lambda row: torchaudio_load(filepath=f"{WAV_DIRECTORY}/{row}")[0][0])
```

```
2.67 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.51 s ± 17.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time: 41.4 s
```


In [12]:
```python
del files
```

```
time: 3 ms
```


### Trial 2

In [13]:
```python
files = pd.read_pickle(f"{PICKLED_DF_FOLDER}/medium.pkl").file.sample(
    frac=0.05, random_state=SEED + 2
)
```

```
time: 149 ms
```


In [14]:
```python
_ = gc_collect()
%timeit files.apply(lambda row: librosa_load(path=f"{WAV_DIRECTORY}/{row}", sr=None)[0])
_ = gc_collect()
%timeit files.apply(lambda row: torchaudio_load(filepath=f"{WAV_DIRECTORY}/{row}")[0][0])
```

```
2.59 s ± 170 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.54 s ± 60.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time: 41 s
```


In [15]:
```python
del files
```

```
time: 2 ms
```


### Trial 3

In [16]:
```python
files = pd.read_pickle(f"{PICKLED_DF_FOLDER}/medium.pkl").file.sample(
    frac=0.05, random_state=SEED + 3
)
```

```
time: 157 ms
```


In [17]:
```python
_ = gc_collect()
%timeit files.apply(lambda row: librosa_load(path=f"{WAV_DIRECTORY}/{row}", sr=None)[0])
_ = gc_collect()
%timeit files.apply(lambda row: torchaudio_load(filepath=f"{WAV_DIRECTORY}/{row}")[0][0])
```

```
2.54 s ± 132 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.53 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time: 40.5 s
```


In [18]:
```python
del files
```

```
time: 3 ms
```


## Discussion

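`torchaudio` (0.10.0) has a lower mean load time than `librosa` (0.8.1) in 3 out of 3 trials. The margins are small, though: 160 ms, 50 ms and 10 ms per sample of 5% of the files (about 6%, 2% and 0.4%), each smaller than the standard deviation of the corresponding `librosa` timing. `torchaudio`'s timings are also more consistent (standard deviations of 10 to 61 ms versus 132 to 199 ms for `librosa`). Because `librosa` is always timed first in each trial, part of its extra time and spread may come from reading files that are not yet in the OS cache.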
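As a rough check of the size of these margins, the sketch below recomputes them from the `%timeit` means and standard deviations reported in the three trials. Comparing each difference with `librosa`'s standard deviation is a heuristic, not a significance test. `N_FILES` assumes `Series.sample(frac=0.05)` rounds 5% of the 81,099 files to 4,055; `RESULTS` and `summarize` are illustrative names.

```python
# (mean, std) in seconds per loop, copied from the %timeit outputs of the trials
RESULTS = {
    "Trial 1": {"librosa": (2.67, 0.199), "torchaudio": (2.51, 0.0179)},
    "Trial 2": {"librosa": (2.59, 0.170), "torchaudio": (2.54, 0.0609)},
    "Trial 3": {"librosa": (2.54, 0.132), "torchaudio": (2.53, 0.0103)},
}
N_FILES = round(0.05 * 81099)  # files per trial (assumed rounding of frac=0.05)


def summarize(results, n_files):
    """Per trial: time saved by torchaudio, relative speedup, and per-file cost."""
    rows = []
    for trial, timings in results.items():
        lib_mean, lib_std = timings["librosa"]
        ta_mean, _ = timings["torchaudio"]
        diff = lib_mean - ta_mean
        rows.append(
            {
                "trial": trial,
                "saved_ms": 1000 * diff,
                "speedup_pct": 100 * diff / lib_mean,
                "torchaudio_ms_per_file": 1000 * ta_mean / n_files,
                "within_librosa_std": diff < lib_std,
            }
        )
    return rows


for row in summarize(RESULTS, N_FILES):
    print(
        f"{row['trial']}: {row['saved_ms']:.0f} ms saved ({row['speedup_pct']:.1f}%), "
        f"{row['torchaudio_ms_per_file']:.3f} ms/file with torchaudio, "
        f"within librosa std: {row['within_librosa_std']}"
    )
```

At roughly 0.6 ms per file for either loader, a full pass over the 81,099-file medium dataset differs by only a few seconds between the two libraries.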

[^top](#Contents)