
save_dataset_statistics broken with DDSP 3.2.0? #427

Open · vvolhejn opened this issue Mar 7, 2022 · 8 comments

Comments

vvolhejn commented Mar 7, 2022

Hi, I'm trying to run the code from the train_autoencoder notebook with a newer DDSP version (3.2.0). I made a little script that wraps save_dataset_statistics:

import argparse
import os

from ddsp.colab import colab_utils
import ddsp.training

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "train_tfrecord_filepattern", help="Glob of the .tfrecord files to analyze"
    )
    parser.add_argument("save_dir", help="In which directory to save the statistics")
    args = parser.parse_args()

    data_provider = ddsp.training.data.TFRecordProvider(args.train_tfrecord_filepattern)
    dataset = data_provider.get_dataset(shuffle=False)

    filename = "dataset_statistics.pkl"
    pickle_path = os.path.join(args.save_dir, filename)

    if os.path.exists(pickle_path):
        raise ValueError(f"The file {pickle_path} already exists.")

    _ = colab_utils.save_dataset_statistics(data_provider, pickle_path, batch_size=1)

This produces the following output:

Calculating dataset statistics for <ddsp.training.data.TFRecordProvider object at 0x2b3b88aecc70>
Traceback (most recent call last):
  File "[...]/get_dataset_statistics.py", line 24, in <module>
    _ = colab_utils.save_dataset_statistics(data_provider, pickle_path, batch_size=1)
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/colab/colab_utils.py", line 20
4, in save_dataset_statistics
    ds_stats = ddsp.training.postprocessing.compute_dataset_statistics(
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/training/postprocessing.py", l
ine 285, in compute_dataset_statistics
    spectral_ops.compute_power(batch['audio'],
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/spectral_ops.py", line 248, in
 compute_power
    rms_energy = compute_rms_energy(
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/spectral_ops.py", line 234, in
 compute_rms_energy
    audio = pad(audio, frame_size, hop_size, padding=padding)
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/spectral_ops.py", line 201, in
 pad
    raise ValueError(f'During padding, frame_size ({frame_size})'
ValueError: During padding, frame_size (256) must be greater than hop_size (320).

I'm running on the same violin files mentioned in this paper.
My guess is that padding was added after this notebook was written, and the default arguments now break it somehow.

I would submit a pull request to fix this, but I'm not familiar enough with the codebase to know which fix to choose: disable padding in this case, or change the arguments so that the hop size becomes smaller?
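
For context, here is a minimal sketch of where the two conflicting numbers plausibly come from, assuming the hop size is derived as sample_rate // frame_rate (an assumption about the internals, not verified against the library):

# Sketch (assumption): a 16 kHz dataset prepared at frame_rate=50 implies a
# hop of 16000 // 50 = 320 samples between frames, which is larger than the
# power frame_size of 256 seen in the traceback.
sample_rate = 16000
frame_rate = 50
power_frame_size = 256

hop_size = sample_rate // frame_rate
print(hop_size)  # 320
assert power_frame_size >= hop_size, (
    f"frame_size ({power_frame_size}) is smaller than hop_size ({hop_size})")

That would explain the exact values in the ValueError: 256 vs. 320.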

@sharp-trickster

I have the same exact problem.
I'm trying to run it locally, and I honestly don't know how it still runs in Colab, but it crashes here with the same values.
I tried increasing the frame size to 512 and to 320 (just to see if it works), and then I get another error:

File "...\ddsp\training\postprocessing.py", line 330, in get_stats
    max_list.append(np.max(x_i[m]))
IndexError: boolean index did not match indexed array along dimension 0; dimension is 181 but corresponding boolean dimension is 980

I'm not thrilled to see that this happened to you in March and there are still no replies.
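
A possible reading of those mismatched dimensions (an assumption based purely on the numbers, not on reading the internals): the note mask is computed from loudness_db stored in the tfrecord at the dataset's frame rate, while power is recomputed at a different rate, so after the same trim the two arrays end up with different lengths.

# Hypothetical arithmetic behind "dimension is 181 but corresponding boolean
# dimension is 980", assuming 4-second clips, loudness stored at 250 frames/s,
# and power recomputed at 50 frames/s:
seconds = 4
dataset_frame_rate = 250
power_frame_rate = 50
trim_end = 20

mask_len = dataset_frame_rate * seconds - trim_end     # 1000 - 20 = 980
power_len = power_frame_rate * seconds + 1 - trim_end  # 201 - 20 = 181
print(mask_len, power_len)  # a 980-long mask cannot index a 181-long array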

@theloni-monk

I just tried the same thing locally and I'm getting the same error.


vvolhejn commented Dec 3, 2022

@sharp-trickster @theloni-monk I'm no longer working on this, but if I recall correctly, the issue is that the TFRecordProvider must be initialized with the exact same arguments that the dataset was created with in ddsp_prepare_tfrecord; otherwise you get mysterious errors like this. For example,

data_provider = ddsp.training.data.TFRecordProvider(
    args.train_tfrecord_filepattern,
    # Make sure these arguments match what the dataset was created with!
    # sample_rate=44100,
    frame_rate=50,
    centered=True,
    with_jukebox=False,
)

Look for ddsp_prepare_tfrecord.py to see what the defaults are.

See also the updated script I made for my thesis.
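
One way to double-check, sketched under the assumption that ddsp_prepare_tfrecord stores 'audio' and 'f0_hz' as flat float lists (verify against your own records before relying on it), is to infer the frame rate baked into a record from the ratio of feature lengths:

import tensorflow as tf

sample_rate = 16000  # assumption: must match what the data was prepared with
paths = tf.io.gfile.glob("train.tfrecord*")  # hypothetical file pattern

raw = next(iter(tf.data.TFRecordDataset(paths)))
example = tf.train.Example.FromString(raw.numpy())
n_samples = len(example.features.feature["audio"].float_list.value)
n_frames = len(example.features.feature["f0_hz"].float_list.value)

# If this prints ~250 but your TFRecordProvider was constructed with
# frame_rate=50 (or vice versa), the provider arguments don't match the data.
print("inferred frame_rate:", n_frames / (n_samples / sample_rate))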

@olaviinha

Anybody ever solved this?

@vvolhejn (Author)

@olaviinha see my comment above: the problem is the arguments to the TFRecordProvider not matching what the data was created with. The frame_rate and centered arguments must match; otherwise you get mysterious errors like this.

@olaviinha

olaviinha commented Feb 11, 2023

Granted, I am having this problem with version 3.5.0, but I thought the issue might be the same. I saw your comment and tried it, sadly to no avail. The default values of these parameters seem to match. I tried matching them manually in the notebook, as well as changing them in both places to the values in your example, but I am still getting the same frame_size vs. hop_size error.

@myersjm

myersjm commented Feb 12, 2024

I am experiencing the same issue. The arguments match, so I don't know what is wrong. My audio is 16 kHz also. Did anyone figure out a solution to this?

@JinFoolish

JinFoolish commented Feb 18, 2024

> I am experiencing the same issue. The arguments match, so I don't know what is wrong. My audio is 16 kHz also. Did anyone figure out a solution to this?

I fixed it by copying the code out and modifying the parameters of compute_dataset_statistics in save_dataset_statistics.
Replace the block with the following:

import os
import pickle

from ddsp.colab import colab_utils
import ddsp.training
from ddsp import spectral_ops
from ddsp.core import hz_to_midi
from ddsp.training.postprocessing import detect_notes, fit_quantile_transform
import numpy as np
import tensorflow.compat.v2 as tf
def save_dataset_statistics(data_provider,
                            file_path=None,
                            batch_size=1,
                            power_frame_size=256):
  # 250 is the frame_rate, which should equal the value passed to
  # ddsp_prepare_tfrecord.
  ds_stats = compute_dataset_statistics(
      data_provider, batch_size, power_frame_size, 250)

  # Save.
  if file_path is not None:
    with tf.io.gfile.GFile(file_path, 'wb') as f:
      pickle.dump(ds_stats, f)
    print(f'Done! Saved dataset statistics to: {file_path}')

  return ds_stats

def compute_dataset_statistics(data_provider,
                               batch_size=1,
                               power_frame_size=1024,
                               power_frame_rate=50):
  print('Calculating dataset statistics for', data_provider)
  ds = data_provider.get_batch(batch_size, repeats=1)

  # Unpack dataset.
  i = 0
  loudness = []
  power = []
  f0 = []
  f0_conf = []
  audio = []

  batch = next(iter(ds))
  audio_key = 'audio_16k' if 'audio_16k' in batch.keys() else 'audio'

  for batch in iter(ds):
    loudness.append(batch['loudness_db'])
    power.append(
        spectral_ops.compute_power(batch[audio_key],
                                   frame_size=power_frame_size,
                                   frame_rate=power_frame_rate))
    f0.append(batch['f0_hz'])
    f0_conf.append(batch['f0_confidence'])
    audio.append(batch[audio_key])
    i += 1

  print(f'Computing statistics for {i * batch_size} examples.')

  loudness = np.vstack(loudness)
  power = np.vstack(power)
  f0 = np.vstack(f0)
  f0_conf = np.vstack(f0_conf)
  audio = np.vstack(audio)

  # Fit the transform.
  trim_end = 20
  f0_trimmed = f0[:, :-trim_end]
  pitch_trimmed = hz_to_midi(f0_trimmed)
  power_trimmed = power[:, :-trim_end]
  loudness_trimmed = loudness[:, :-trim_end]
  f0_conf_trimmed = f0_conf[:, :-trim_end]

  # Detect notes.
  mask_on, _ = detect_notes(loudness_trimmed, f0_conf_trimmed)

  # If no notes detected, just default to using full signal.
  mask_on = np.logical_or(
      mask_on, np.logical_not(np.any(mask_on, axis=1, keepdims=True)))

  quantile_transform = fit_quantile_transform(loudness_trimmed, mask_on)

  # Pitch statistics.
  def get_stats(x, prefix='x', note_mask=None):
    if note_mask is None:
      mean_max = np.mean(np.max(x, axis=-1))
      mean_min = np.mean(np.min(x, axis=-1))
    else:
      max_list = []
      for x_i, m in zip(x, note_mask):
        if np.sum(m) > 0:
          max_list.append(np.max(x_i[m]))
      mean_max = np.mean(max_list)

      min_list = []
      for x_i, m in zip(x, note_mask):
        if np.sum(m) > 0:
          min_list.append(np.min(x_i[m]))
      mean_min = np.mean(min_list)

      x = x[note_mask]

    return {
        f'mean_{prefix}': np.mean(x),
        f'max_{prefix}': np.max(x),
        f'min_{prefix}': np.min(x),
        f'mean_max_{prefix}': mean_max,
        f'mean_min_{prefix}': mean_min,
        f'std_{prefix}': np.std(x)
    }

  ds_stats = {}
  ds_stats.update(get_stats(pitch_trimmed, 'pitch'))
  ds_stats.update(get_stats(power_trimmed, 'power'))
  ds_stats.update(get_stats(loudness_trimmed, 'loudness'))
  # power_trimmed shape is (n, 981) and mask_on shape is (n, 980)
  power_trimmed = power_trimmed[:,:-1]
  ds_stats.update(get_stats(pitch_trimmed, 'pitch_note', mask_on))
  ds_stats.update(get_stats(power_trimmed, 'power_note', mask_on))
  ds_stats.update(get_stats(loudness_trimmed, 'loudness_note', mask_on))

  ds_stats['quantile_transform'] = quantile_transform
  return ds_stats

data_provider = ddsp.training.data.TFRecordProvider(TRAIN_TFRECORD_FILEPATTERN)
dataset = data_provider.get_dataset(shuffle=False)
PICKLE_FILE_PATH = os.path.join(SAVE_DIR, 'dataset_statistics.pkl')

_ = save_dataset_statistics(data_provider, PICKLE_FILE_PATH, batch_size=1)
