Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in wave normalization. #49

Closed
sweetcocoa opened this issue Jan 16, 2020 · 7 comments
Closed

Bug in wave normalization. #49

sweetcocoa opened this issue Jan 16, 2020 · 7 comments

Comments

@sweetcocoa
Copy link

sweetcocoa commented Jan 16, 2020

Hi, thank you for sharing this awesome work.

I've encountered this code below, and I dont' think this is correct.

crepe/core.py

    frames = as_strided(audio, shape=(1024, n_frames),
                        strides=(audio.itemsize, hop_length * audio.itemsize))
    frames = frames.transpose()

    # normalize each frame -- this is expected by the model
    frames -= np.mean(frames, axis=1)[:, np.newaxis]
    frames /= np.std(frames, axis=1)[:, np.newaxis]

This code is meant to normalize wave by each frame, but numpy's as_strided function does not allocate new memory, but does just create a new view of the array. (for example, the memory of first frame and the memory of second frame is not separated.)

See numpy documentation for more details.

So when do "in-place" calculation on that array, unintented result may raise. Here is an example.

np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )

# below code is from capre/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
                    strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose()

# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]

print(frames.sum())
146.7856165123937

Since frame is normalized, its sum should be about zero. but it's not.

To avoid this behavior, target array should be copied or something, to do operation correctly as intended. here is an example. only one line is changed. (.copy())

np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )

# below code is from capre/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
                    strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose().copy()

# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]

print(frames.sum())
-2.384759056894836e-13
  • This code is tested with numpy version 1.17.4
@jongwook
Copy link
Member

Thank you for the detailed report! This is a great catch, and it's striking that the code works regardless.

I'll push the fix shortly.

@angusturner
Copy link

angusturner commented Feb 20, 2020

Just a quick question, was the model trained on audio normalized the old / incorrect way?

I am using a PyTorch port of this repo with your weights, and am wondering whether I should change or update the normalization code? Or if the model requires re-training with the fixed normalization?

@jongwook
Copy link
Member

The pre-processing code I used for training is different from this, which did not have the normalization bug in this issue. So it shouldn't require retraining and you can use the same pre-trained model, unless you want to fine-tune the model with some other dataset.

@adarob
Copy link
Contributor

adarob commented Mar 25, 2020

@jongwook will you be releasing an updated pip package with this fix?

@justinsalamon
Copy link
Collaborator

Yes :) @jongwook let me know if you're slammed right now and I can make a release. Cheers!

@jongwook
Copy link
Member

jongwook commented Mar 26, 2020

@adarob @justinsalamon Thanks for the heads up! Just pushed v0.0.11 :)

Btw you can use:

git+https://github.com/marl/crepe.git

in requirements.txt to pull the latest commit from Github, or:

git+https://github.com/marl/crepe.git@7f0bf0593847f38df47d2760934d585ad2f618c0

for a specific commit.

@HudsonHuang
Copy link

Just a quick question, was the model trained on audio normalized the old / incorrect way?

I am using a PyTorch port of this repo with your weights, and am wondering whether I should change or update the normalization code? Or if the model requires re-training with the fixed normalization?

@angusturner
Hi, I‘m looking for the pytorch version of CREPE, but found nothing. Does the "PyTorch port" you mentioned is public available? or would you mind about releasing it? Thank you very much in advance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants