Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MP3Compressor introduces audible glitches #127

Closed
iCorv opened this issue Jul 27, 2022 · 4 comments · Fixed by #129
Closed

MP3Compressor introduces audible glitches #127

iCorv opened this issue Jul 27, 2022 · 4 comments · Fixed by #129

Comments

@iCorv
Copy link
Contributor

iCorv commented Jul 27, 2022

MP3Compressor: Introduces audible glitches in multiprocessing environment, in particular tf.data.

Expected behavior

  • Releases Python's Global Interpreter Lock (GIL) to allow use of multiple CPU cores
  • Tested compatibility with TensorFlow - can be used in tf.data pipelines!

Therefore, would expect that multiple instances of MP3Compressor are ok.

Actual behavior

When calling MP3Compressor from a tf.numpy_function() using tf.data:

def rand_mp3compression(signal, sample_rate):
    return MP3Compressor(np.random.uniform(1.0, 9.5)).process(signal, sample_rate)

dataset.map(
    lambda audio: tf.numpy_function(rand_mp3compression, [audio, sample_rate], tf.float32), 
    num_parallel_calls=2
)

This results in audible glitches which are also visible in the spectrum:
Screenshot 2022-07-27 at 18 54 50

The issue vanishes when setting num_parallel_calls=1, which indicates a problem with multiprocessing. Using the GSM compression does not show a similar issue, so maybe it is connected to the Lame mp3 implementation?

Steps to reproduce the behavior

Working example:

import pedalboard
import tensorflow as tf
import numpy as np
from pedalboard import MP3Compressor
import librosa
from librosa import display
import matplotlib.pyplot as plt

AUTOTUNE = tf.data.experimental.AUTOTUNE

sample_rate = 24000
audio_len = 24000 * 5

audio, sr = librosa.load(librosa.example('brahms'), sr=sample_rate)

def data_gen():
  yield audio[audio_len*3:audio_len*4]

def rand_mp3compression(signal, sample_rate):
  return MP3Compressor(np.random.uniform(1.0, 9.5)).process(signal, sample_rate)

dataset = tf.data.Dataset.from_generator(
    data_gen,
    output_signature=(
        tf.TensorSpec(shape=(int(audio_len),), dtype=tf.float32)
        )
    )
dataset = dataset.repeat()

dataset = dataset.map(
    lambda audio: (tf.numpy_function(rand_mp3compression, [audio, sample_rate], tf.float32), audio), num_parallel_calls=2
)

dataset = dataset.batch(32)

for elem in dataset.take(1):
  elem = elem[0][4]
  print(elem.shape)

y = elem.numpy()
D = librosa.stft(y)  
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

plt.figure(figsize=(10,10))
display.specshow(S_db)
plt.colorbar()
@psobot
Copy link
Member

psobot commented Jul 27, 2022

Thanks for the detailed repro @iCorv! This is really odd; MP3Compressor just links against LAME, and LAME itself "should" be thread-safe. I'll dig into this to figure out if we're unintentionally sharing global or class-level state somehow, or if we're using LAME in a way that removes its thread safety.

@psobot
Copy link
Member

psobot commented Jul 28, 2022

This has been fixed in v0.5.8, which should be on PyPI within the next couple of hours. (Root cause: LAME's MP3 encoding libraries are thread-safe, but its MP3 decoding interface is not.)

@iCorv
Copy link
Contributor Author

iCorv commented Jul 29, 2022

Great and fast fix, just tested it in a tf.data pipeline, and it works like a charm!

@iCorv
Copy link
Contributor Author

iCorv commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants