@Bhavay-2001
Contributor

What does this PR do?

Reduces the model sizes in the Dance Diffusion tests.

Fixes #7677

Before submitting

Who can review?

Tagging: @sayakpaul

@Bhavay-2001
Contributor Author

Hi @sayakpaul, could you please review it?
Thanks


assert audio.shape == (1, 2, components["unet"].sample_size)
expected_slice = np.array([-0.7265, 1.0000, -0.8388, 0.1175, 0.9498, -1.0000])
print(", ".join([str(round(x, 4)) for x in audio_slice.flatten().tolist()]))
Member

This should go away.

Contributor Author

Yep, sorry. Corrected.

@Bhavay-2001
Contributor Author

Also, I am trying to change block_out_channels and extra_in_channels, but I keep running into shape errors. Could you please let me know how to fix that?

@Bhavay-2001
Contributor Author

Hi @ariG23498, I am working on this test file. When I change the block_out_channels and extra_in_channels parameters, I get stuck on shape-related errors. Could you please let me know how you changed these parameters?

@ariG23498
Contributor

@Bhavay-2001 you would also need to update the norm_num_groups parameter when changing block_out_channels. I am looking at something like this:

from diffusers import UNet1DModel

unet = UNet1DModel(
    block_out_channels=(8, 8, 16),
    norm_num_groups=8,
    extra_in_channels=16,
    sample_size=8,
    sample_rate=16_000,
    in_channels=2,
    out_channels=2,
    flip_sin_to_cos=True,
    use_timestep_embedding=False,
    time_embedding_type="fourier",
    mid_block_type="UNetMidBlock1D",
    down_block_types=("DownBlock1DNoSkip", "DownBlock1D", "AttnDownBlock1D"),
    up_block_types=("AttnUpBlock1D", "UpBlock1D", "UpBlock1DNoSkip"),
)

Does this solve the issue?

@Bhavay-2001
Contributor Author

I tried this, but it fails with a shape error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.

@Bhavay-2001
Contributor Author

Hi @sayakpaul, do you have any suggestions on how to change the block_out_channels and extra_in_channels parameters?

@sayakpaul
Member

You will need to investigate the error a bit more deeply here. More specifically, which component leads to:

RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96
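The arithmetic behind this message can be checked without running the model. A view/reshape may leave one size as `-1` and infer it from the element count; here the other sizes multiply to 1 × 6 × 0 = 0, so no value for `-1` can account for 96 elements. A `0` in a requested shape often means some size computed upstream (for example, a head count derived by integer division of a small channel width) evaluated to zero; that is a hint about where to look, not something confirmed in this thread. The helper `infer_view_shape` below is hypothetical: a pure-Python sketch of the size check, not PyTorch's implementation.

```python
from math import prod


def infer_view_shape(shape, numel):
    """Resolve at most one -1 in `shape` for a tensor with `numel` elements.

    Hypothetical helper: mirrors the size check a view/reshape performs and
    raises ValueError with a message modelled on PyTorch's RuntimeError text.
    """
    shape = list(shape)
    error = f"shape '{shape}' is invalid for input of size {numel}"
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in shape if d != -1)
    if -1 not in shape:
        # Fully specified shape: the element counts must match exactly.
        if known != numel:
            raise ValueError(error)
        return shape
    # A zero among the known sizes leaves -1 unsolvable: impossible for a
    # non-empty input, ambiguous for an empty one.
    if known == 0 or numel % known != 0:
        raise ValueError(error)
    return [numel // known if d == -1 else d for d in shape]


print(infer_view_shape([1, 6, 2, -1], 96))  # [1, 6, 2, 8]
try:
    infer_view_shape([1, 6, 0, -1], 96)
except ValueError as err:
    print(err)  # shape '[1, 6, 0, -1]' is invalid for input of size 96
```

Tracing which layer requests a shape with a `0` in it (for example with a debugger or a print before the failing `view`) is one way to find the component asked about above.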

@Bhavay-2001
Contributor Author

I only took a quick look, and the error seems to come from somewhere in the model implementation. So should we change that too if needed, or leave it as is?

@Bhavay-2001
Contributor Author

Hi @ariG23498, how did you figure out the relationship between block_out_channels and norm_num_groups?

@ariG23498
Contributor

Mostly by reading the code and the error messages.
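One rule that ties these two parameters together comes from PyTorch itself: `torch.nn.GroupNorm(num_groups, num_channels)` requires `num_channels` to be divisible by `num_groups`. Assuming `norm_num_groups` is used as the group count for GroupNorm layers over the `block_out_channels` widths (which layers actually use it should be confirmed in the `UNet1DModel` source), a pre-flight check like the hypothetical `incompatible_widths` helper below flags a bad combination before the model is built.

```python
def incompatible_widths(block_out_channels, norm_num_groups):
    """Return the widths that cannot be split evenly into norm_num_groups groups.

    Hypothetical helper: torch.nn.GroupNorm(num_groups, num_channels) requires
    num_channels % num_groups == 0, and this applies that rule to each width.
    """
    if norm_num_groups <= 0:
        raise ValueError("norm_num_groups must be a positive integer")
    return [c for c in block_out_channels if c % norm_num_groups != 0]


# The reduced config from this thread passes the check.
print(incompatible_widths((8, 8, 16), 8))  # []
# Shrinking the widths without touching norm_num_groups does not.
print(incompatible_widths((4, 4, 8), 8))  # [4, 4]
```

The design choice is to report every offending width rather than stop at the first one, so a single run shows everything that needs to change.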

@ariG23498
Contributor

Interesting!

Using the code quoted in this comment, I don't see any failing tests on my local system.

@Bhavay-2001
Contributor Author

The batch_size of 8 is failing in my case. Apart from that, I am not able to decrease it further.

@Bhavay-2001
Contributor Author

Hi @sayakpaul, can you please check this?

@Bhavay-2001
Contributor Author

Bhavay-2001 commented May 9, 2024

Hi @ariG23498, could you please share your complete test_dance_diffusion.py file? I think I may have changed some variable somewhere.

@Bhavay-2001
Contributor Author

Hi @ariG23498, could you please send the file? Thanks

@ariG23498
Contributor

This is the entire script.

# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import gc
import unittest

import numpy as np
import torch

from diffusers import DanceDiffusionPipeline, IPNDMScheduler, UNet1DModel
from diffusers.utils.testing_utils import enable_full_determinism, nightly, require_torch_gpu, skip_mps, torch_device

from ..pipeline_params import UNCONDITIONAL_AUDIO_GENERATION_BATCH_PARAMS, UNCONDITIONAL_AUDIO_GENERATION_PARAMS
from ..test_pipelines_common import PipelineTesterMixin


enable_full_determinism()


class DanceDiffusionPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = DanceDiffusionPipeline
    params = UNCONDITIONAL_AUDIO_GENERATION_PARAMS
    required_optional_params = PipelineTesterMixin.required_optional_params - {
        "callback",
        "latents",
        "callback_steps",
        "output_type",
        "num_images_per_prompt",
    }
    batch_params = UNCONDITIONAL_AUDIO_GENERATION_BATCH_PARAMS
    test_attention_slicing = False

    def get_dummy_components(self):
        torch.manual_seed(0)
        unet = UNet1DModel(
            block_out_channels=(8, 8, 16),
            norm_num_groups=8,
            extra_in_channels=16,
            sample_size=8,
            sample_rate=16_000,
            in_channels=2,
            out_channels=2,
            flip_sin_to_cos=True,
            use_timestep_embedding=False,
            time_embedding_type="fourier",
            mid_block_type="UNetMidBlock1D",
            down_block_types=("DownBlock1DNoSkip", "DownBlock1D", "AttnDownBlock1D"),
            up_block_types=("AttnUpBlock1D", "UpBlock1D", "UpBlock1DNoSkip"),
        )
        scheduler = IPNDMScheduler()

        components = {
            "unet": unet,
            "scheduler": scheduler,
        }
        return components

    def get_dummy_inputs(self, device, seed=0):
        if str(device).startswith("mps"):
            generator = torch.manual_seed(seed)
        else:
            generator = torch.Generator(device=device).manual_seed(seed)
        inputs = {
            "batch_size": 1,
            "generator": generator,
            "num_inference_steps": 4,
        }
        return inputs

    def test_dance_diffusion(self):
        device = "cpu"  # ensure determinism for the device-dependent torch.Generator
        components = self.get_dummy_components()
        pipe = DanceDiffusionPipeline(**components)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        inputs = self.get_dummy_inputs(device)
        output = pipe(**inputs)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, components["unet"].sample_size)
        expected_slice = np.array([-0.7265, 1.0000, -0.8388, 0.1175, 0.9498, -1.0000])
        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

    @skip_mps
    def test_save_load_local(self):
        return super().test_save_load_local()

    @skip_mps
    def test_dict_tuple_outputs_equivalent(self):
        return super().test_dict_tuple_outputs_equivalent(expected_max_difference=3e-3)

    @skip_mps
    def test_save_load_optional_components(self):
        return super().test_save_load_optional_components()

    @skip_mps
    def test_attention_slicing_forward_pass(self):
        return super().test_attention_slicing_forward_pass()

    def test_inference_batch_single_identical(self):
        super().test_inference_batch_single_identical(expected_max_diff=3e-3)


@nightly
@require_torch_gpu
class PipelineIntegrationTests(unittest.TestCase):
    def setUp(self):
        # clean up the VRAM before each test
        super().setUp()
        gc.collect()
        torch.cuda.empty_cache()

    def tearDown(self):
        # clean up the VRAM after each test
        super().tearDown()
        gc.collect()
        torch.cuda.empty_cache()

    def test_dance_diffusion(self):
        device = torch_device

        pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k")
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.manual_seed(0)
        output = pipe(generator=generator, num_inference_steps=100, audio_length_in_s=4.096)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, pipe.unet.config.sample_size)
        expected_slice = np.array([-0.0192, -0.0231, -0.0318, -0.0059, 0.0002, -0.0020])

        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

    def test_dance_diffusion_fp16(self):
        device = torch_device

        pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k", torch_dtype=torch.float16)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.manual_seed(0)
        output = pipe(generator=generator, num_inference_steps=100, audio_length_in_s=4.096)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, pipe.unet.config.sample_size)
        expected_slice = np.array([-0.0367, -0.0488, -0.0771, -0.0525, -0.0444, -0.0341])

        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

As you can see, I have only changed the UNet model, as already mentioned in this comment.

@ariG23498
Contributor

Also note: I have not changed the asserts, so please take care of them.

Running the tests on this file, I do not get the reshape error you mentioned.
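Since the asserts in test_dance_diffusion still reflect the old model, here is a sketch of how the slice check works when updating them. With `out_channels=2`, `audio[0, -3:, -3:]` keeps both channels (there are only two) and the last three samples, so the slice holds 2 × 3 = 6 values, matching the six entries in `expected_slice`. The sketch uses plain nested lists in place of the NumPy array, and every helper name is hypothetical; the real replacement values must come from running the test with the new config, not from the toy data here.

```python
def last_slice(audio, n=3):
    """Mimic audio[0, -n:, -n:] on nested lists shaped (batch, channels, samples)."""
    return [channel[-n:] for channel in audio[0][-n:]]


def flatten(rows):
    """Row-major flatten, like ndarray.flatten() on a 2-D slice."""
    return [x for row in rows for x in row]


def format_expected_slice(values, ndigits=4):
    """Same formatting as the print line from the review: round, then join with ', '."""
    return ", ".join(str(round(x, ndigits)) for x in values)


def max_abs_diff(actual, expected):
    """Pure-Python stand-in for np.abs(actual - expected).max()."""
    if len(actual) != len(expected):
        raise ValueError("slice lengths differ")
    return max(abs(a - e) for a, e in zip(actual, expected))


# Toy output shaped (1, 2, 8): batch_size=1, out_channels=2, sample_size=8.
audio = [[[i / 10 for i in range(8)], [-i / 10 for i in range(8)]]]
values = flatten(last_slice(audio))
print(len(values))  # 6 (2 channels x 3 samples)
print(format_expected_slice(values))  # 0.5, 0.6, 0.7, -0.5, -0.6, -0.7
# The test compares against expected_slice with the same 1e-2 tolerance.
assert max_abs_diff(values, values) < 1e-2
```

The print step is only for regenerating values locally; as noted in the review, it should not stay in the committed test.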

@Bhavay-2001
Contributor Author

Bhavay-2001 commented May 14, 2024

Hi, even with your code I am still facing issues with sample_size and tensor shapes. Would you like to work on this, or could you help me out here?

@Bhavay-2001
Contributor Author

Hi @ariG23498, would you like to work on this? I am not able to figure out the error.

@sayakpaul
Member

Hi @Bhavay-2001, I would like to request that you stop repeatedly pinging the authors who have already helped you significantly. If they haven't replied in seven days, let's just assume they are busy and don't have the bandwidth to look into this further.

With that, I encourage you to look into the errors a bit more deeply, figure out where they originate, and take appropriate steps to resolve them.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 14, 2024
Labels

stale Issues that haven't received updates

Development

Successfully merging this pull request may close these issues.

[Tests] help us speed up the fast pipeline tests

3 participants