Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error When variation_degree_schedule Value Exceeds 10 #4

Closed
RoufaidaLaidi opened this issue Feb 12, 2024 · 1 comment
Closed

Error When variation_degree_schedule Value Exceeds 10 #4

RoufaidaLaidi opened this issue Feb 12, 2024 · 1 comment

Comments

@RoufaidaLaidi
Copy link

Context

  • Dataset: Brain Tumor MRI Dataset with 4 classes, 5713 training samples, and 1312 testing samples. Images are labeled as "label_objectNumber" and converted to RGB. More details can be found here.
  • Environment: PyTorch 1.12.0, CUDA 11.7.0
  • Script Parameters:
    • Feature Extractor: inception_v3
    • FID Model Name: inception_v3
    • Dataset Name for FID: brain
    • Image Size: 64x64
    • Batch Size: 500
    • Variation Degree Schedule: 0 to 42 in steps of 2, with an error occurring for values > 10

Issue Description

The script runs successfully for many iterations when the variation_degree_schedule parameter values are below 10. However, exceeding this value results in the following error during the image variation phase:
"Traceback (most recent call last):
File "/cluster/home/laidir/DPSDA/main.py", line 468, in
main()
File "/cluster/home/laidir/DPSDA/main.py", line 361, in main
packed_samples = api.image_variation(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 255, in image_variation
sub_variations = self._image_variation(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 268, in _image_variation
samples, _ = sample(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 307, in sample
sample = sampler(
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 354, in forward
sample = sample_fn(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion/gaussian_diffusion.py", line 223, in ddim_sample_loop
for sample in self.ddim_sample_loop_progressive(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion/gaussian_diffusion.py", line 269, in ddim_sample_loop_progressive
t_batch = th.tensor([indices[0]] * img.shape[0], device=device)
IndexError: list index out of range
"

This error appears to originate from an IndexError in the ddim_sample_loop within the improved diffusion API, specifically when attempting to index a list beyond its range.

Steps to Reproduce

  1. Run the provided script with the variation_degree_schedule parameter set to include values greater than 10.
  2. Observe the IndexError as described above during the image variation phase.

Additional Information

@fjxmlzn
Copy link
Contributor

fjxmlzn commented Feb 12, 2024

Thank you for your message. The reason of the error is as follows.

In your script. timestep_respacing is set to ddim10. This means that there is only 10 steps in the diffusion sample generation process.

Let's say the i-th number in variation_degree_schedule is a. It means that at i-th iteration of PE, the generation will only use the last 10-a diffusion sampling steps. Therefore, all the numbers in variation_degree_schedule should be <=10.

If you want to use the variation_degree_schedule you set there, you can consider changing timestep_respacing to a larger number, such as 100 and ddim100. (For the difference between {x} and ddim{x}, please refer to this repo https://github.com/openai/improved-diffusion?tab=readme-ov-file#sampling.)

Feel free to reopen the issue if you have further questions.

@fjxmlzn fjxmlzn closed this as completed Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants