
save_dataset_statistics broken with DDSP 3.2.0? #427

Open · vvolhejn opened this issue Mar 7, 2022 · 8 comments

Comments

vvolhejn commented Mar 7, 2022

Hi, I'm trying to run the code from the train_autoencoder notebook with a newer DDSP version (3.2.0). I made a little script that wraps save_dataset_statistics:

import argparse
import os

from ddsp.colab import colab_utils
import ddsp.training

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "train_tfrecord_filepattern", help="Glob of the .tfrecord files to analyze"
    )
    parser.add_argument("save_dir", help="In which directory to save the statistics")
    args = parser.parse_args()

    data_provider = ddsp.training.data.TFRecordProvider(args.train_tfrecord_filepattern)
    dataset = data_provider.get_dataset(shuffle=False)

    filename = "dataset_statistics.pkl"
    pickle_path = os.path.join(args.save_dir, filename)

    if os.path.exists(pickle_path):
        raise ValueError(f"The file {pickle_path} already exists.")

    _ = colab_utils.save_dataset_statistics(data_provider, pickle_path, batch_size=1)

This produces the following output:

Calculating dataset statistics for <ddsp.training.data.TFRecordProvider object at 0x2b3b88aecc70>
Traceback (most recent call last):
  File "[...]/get_dataset_statistics.py", line 24, in <module>
    _ = colab_utils.save_dataset_statistics(data_provider, pickle_path, batch_size=1)
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/colab/colab_utils.py", line 20
4, in save_dataset_statistics
    ds_stats = ddsp.training.postprocessing.compute_dataset_statistics(
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/training/postprocessing.py", l
ine 285, in compute_dataset_statistics
    spectral_ops.compute_power(batch['audio'],
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/spectral_ops.py", line 248, in
 compute_power
    rms_energy = compute_rms_energy(
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/spectral_ops.py", line 234, in
 compute_rms_energy
    audio = pad(audio, frame_size, hop_size, padding=padding)
  File "[...]/venv/lib64/python3.8/site-packages/ddsp/spectral_ops.py", line 201, in
 pad
    raise ValueError(f'During padding, frame_size ({frame_size})'
ValueError: During padding, frame_size (256) must be greater than hop_size (320).

I'm running on the same violin files mentioned in this paper.
My guess is that padding was added after this notebook was written, and the default arguments now break it somehow.

I would submit a pull request to fix this, but I'm not familiar enough with the codebase to know which fix to choose: disable padding in this case, or change the arguments so that the hop size becomes smaller?
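
For context, here is a minimal sketch of where the two conflicting numbers plausibly come from, assuming the hop size is derived as sample_rate // frame_rate (an assumption about the internals, not verified against the library):

# Sketch (assumption): a 16 kHz dataset prepared at frame_rate=50 implies a
# hop of 16000 // 50 = 320 samples between frames, which is larger than the
# power frame_size of 256 seen in the traceback.
sample_rate = 16000
frame_rate = 50
power_frame_size = 256

hop_size = sample_rate // frame_rate
print(hop_size)  # 320
assert power_frame_size >= hop_size, (
    f"frame_size ({power_frame_size}) is smaller than hop_size ({hop_size})")

That would explain the exact values in the ValueError: 256 vs. 320.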

@sharp-trickster

I have the same exact problem.
I'm trying to run it locally, and I honestly don't know how it still runs in Colab, but it crashes here with the same values.
I tried increasing the frame size to 512 and to 320 (just to see if it works), and then I get another error:

File "...\ddsp\training\postprocessing.py", line 330, in get_stats
    max_list.append(np.max(x_i[m]))
IndexError: boolean index did not match indexed array along dimension 0; dimension is 181 but corresponding boolean dimension is 980

I'm not thrilled to see that this happened to you in March and there are still no replies.
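
A possible reading of those mismatched dimensions (an assumption based purely on the numbers, not on reading the internals): the note mask is computed from loudness_db stored in the tfrecord at the dataset's frame rate, while power is recomputed at a different rate, so after the same trim the two arrays end up with different lengths.

# Hypothetical arithmetic behind "dimension is 181 but corresponding boolean
# dimension is 980", assuming 4-second clips, loudness stored at 250 frames/s,
# and power recomputed at 50 frames/s:
seconds = 4
dataset_frame_rate = 250
power_frame_rate = 50
trim_end = 20

mask_len = dataset_frame_rate * seconds - trim_end     # 1000 - 20 = 980
power_len = power_frame_rate * seconds + 1 - trim_end  # 201 - 20 = 181
print(mask_len, power_len)  # a 980-long mask cannot index a 181-long array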

@theloni-monk

I just tried the same thing locally and I'm getting the same error.


vvolhejn commented Dec 3, 2022

@sharp-trickster @theloni-monk I'm no longer working on this, but if I recall correctly, the issue is that the TFRecordProvider must be initialized with the exact same arguments that the dataset was created with in ddsp_prepare_tfrecord; otherwise you get mysterious errors like this. For example,

data_provider = ddsp.training.data.TFRecordProvider(
    args.train_tfrecord_filepattern,
    # Make sure these arguments match what the dataset was created with!
    # sample_rate=44100,
    frame_rate=50,
    centered=True,
    with_jukebox=False,
)

Look for ddsp_prepare_tfrecord.py to see what the defaults are.

See also the updated script I made for my thesis.
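
One way to double-check, sketched under the assumption that ddsp_prepare_tfrecord stores 'audio' and 'f0_hz' as flat float lists (verify against your own records before relying on it), is to infer the frame rate baked into a record from the ratio of feature lengths:

import tensorflow as tf

sample_rate = 16000  # assumption: must match what the data was prepared with
paths = tf.io.gfile.glob("train.tfrecord*")  # hypothetical file pattern

raw = next(iter(tf.data.TFRecordDataset(paths)))
example = tf.train.Example.FromString(raw.numpy())
n_samples = len(example.features.feature["audio"].float_list.value)
n_frames = len(example.features.feature["f0_hz"].float_list.value)

# If this prints ~250 but your TFRecordProvider was constructed with
# frame_rate=50 (or vice versa), the provider arguments don't match the data.
print("inferred frame_rate:", n_frames / (n_samples / sample_rate))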

@olaviinha

Anybody ever solved this?

@vvolhejn (Author)

@olaviinha see my comment above: the problem is the arguments to the TFRecordProvider not matching what the data was created with. The frame_rate and centered arguments must match; otherwise you get mysterious errors like this.

@olaviinha

olaviinha commented Feb 11, 2023

Granted, I am having this problem with version 3.5.0, but I thought the issue might be the same. I saw your comment and tried it, sadly to no avail. The default values of these parameters seem to match. I tried matching them manually in the notebook, as well as changing them in both places to the values in your example, but I am still getting the same frame_size vs. hop_size error.

@myersjm

myersjm commented Feb 12, 2024

I am experiencing the same issue. The arguments match, so I don't know what is wrong. My audio is 16 kHz also. Did anyone figure out a solution to this?

@JinFoolish

JinFoolish commented Feb 18, 2024

> I am experiencing the same issue. The arguments match, so I don't know what is wrong. My audio is 16 kHz also. Did anyone figure out a solution to this?

I fixed it by copying the code out and modifying the parameters of compute_dataset_statistics in save_dataset_statistics.
Replace the block with the following:

import os
import pickle

from ddsp.colab import colab_utils
import ddsp.training
from ddsp import spectral_ops
from ddsp.core import hz_to_midi
from ddsp.training.postprocessing import detect_notes, fit_quantile_transform
import numpy as np
import tensorflow.compat.v2 as tf
def save_dataset_statistics(data_provider,
                            file_path=None,
                            batch_size=1,
                            power_frame_size=256):
  # 250 is the frame_rate, which should equal the value passed to
  # ddsp_prepare_tfrecord.
  ds_stats = compute_dataset_statistics(
      data_provider, batch_size, power_frame_size, 250)

  # Save.
  if file_path is not None:
    with tf.io.gfile.GFile(file_path, 'wb') as f:
      pickle.dump(ds_stats, f)
    print(f'Done! Saved dataset statistics to: {file_path}')

  return ds_stats

def compute_dataset_statistics(data_provider,
                               batch_size=1,
                               power_frame_size=1024,
                               power_frame_rate=50):
  print('Calculating dataset statistics for', data_provider)
  ds = data_provider.get_batch(batch_size, repeats=1)

  # Unpack dataset.
  i = 0
  loudness = []
  power = []
  f0 = []
  f0_conf = []
  audio = []

  batch = next(iter(ds))
  audio_key = 'audio_16k' if 'audio_16k' in batch.keys() else 'audio'

  for batch in iter(ds):
    loudness.append(batch['loudness_db'])
    power.append(
        spectral_ops.compute_power(batch[audio_key],
                                   frame_size=power_frame_size,
                                   frame_rate=power_frame_rate))
    f0.append(batch['f0_hz'])
    f0_conf.append(batch['f0_confidence'])
    audio.append(batch[audio_key])
    i += 1

  print(f'Computing statistics for {i * batch_size} examples.')

  loudness = np.vstack(loudness)
  power = np.vstack(power)
  f0 = np.vstack(f0)
  f0_conf = np.vstack(f0_conf)
  audio = np.vstack(audio)

  # Fit the transform.
  trim_end = 20
  f0_trimmed = f0[:, :-trim_end]
  pitch_trimmed = hz_to_midi(f0_trimmed)
  power_trimmed = power[:, :-trim_end]
  loudness_trimmed = loudness[:, :-trim_end]
  f0_conf_trimmed = f0_conf[:, :-trim_end]

  # Detect notes.
  mask_on, _ = detect_notes(loudness_trimmed, f0_conf_trimmed)

  # If no notes detected, just default to using full signal.
  mask_on = np.logical_or(
      mask_on, np.logical_not(np.any(mask_on, axis=1, keepdims=True)))

  quantile_transform = fit_quantile_transform(loudness_trimmed, mask_on)

  # Pitch statistics.
  def get_stats(x, prefix='x', note_mask=None):
    if note_mask is None:
      mean_max = np.mean(np.max(x, axis=-1))
      mean_min = np.mean(np.min(x, axis=-1))
    else:
      max_list = []
      for x_i, m in zip(x, note_mask):
        if np.sum(m) > 0:
          max_list.append(np.max(x_i[m]))
      mean_max = np.mean(max_list)

      min_list = []
      for x_i, m in zip(x, note_mask):
        if np.sum(m) > 0:
          min_list.append(np.min(x_i[m]))
      mean_min = np.mean(min_list)

      x = x[note_mask]

    return {
        f'mean_{prefix}': np.mean(x),
        f'max_{prefix}': np.max(x),
        f'min_{prefix}': np.min(x),
        f'mean_max_{prefix}': mean_max,
        f'mean_min_{prefix}': mean_min,
        f'std_{prefix}': np.std(x)
    }

  ds_stats = {}
  ds_stats.update(get_stats(pitch_trimmed, 'pitch'))
  ds_stats.update(get_stats(power_trimmed, 'power'))
  ds_stats.update(get_stats(loudness_trimmed, 'loudness'))
  # power_trimmed shape is (n, 981) and mask_on shape is (n, 980)
  power_trimmed = power_trimmed[:,:-1]
  ds_stats.update(get_stats(pitch_trimmed, 'pitch_note', mask_on))
  ds_stats.update(get_stats(power_trimmed, 'power_note', mask_on))
  ds_stats.update(get_stats(loudness_trimmed, 'loudness_note', mask_on))

  ds_stats['quantile_transform'] = quantile_transform
  return ds_stats

data_provider = ddsp.training.data.TFRecordProvider(TRAIN_TFRECORD_FILEPATTERN)
dataset = data_provider.get_dataset(shuffle=False)
PICKLE_FILE_PATH = os.path.join(SAVE_DIR, 'dataset_statistics.pkl')

_ = save_dataset_statistics(data_provider, PICKLE_FILE_PATH, batch_size=1)
