Error When `variation_degree_schedule` Value Exceeds 10 #4

RoufaidaLaidi · 2024-02-12T13:35:01Z

Context

Dataset: Brain Tumor MRI Dataset with 4 classes, 5713 training samples, and 1312 testing samples. Images are labeled as "label_objectNumber" and converted to RGB. More details can be found here.
Environment: PyTorch 1.12.0, CUDA 11.7.0
Script Parameters:
- Feature Extractor: inception_v3
- FID Model Name: inception_v3
- Dataset Name for FID: brain
- Image Size: 64x64
- Batch Size: 500
- Variation Degree Schedule: 0 to 42 in steps of 2, with an error occurring for values > 10

Issue Description

The script runs successfully for many iterations while the variation_degree_schedule values do not exceed 10. Once a value exceeds 10, the image variation phase fails with the following error:
"Traceback (most recent call last):
File "/cluster/home/laidir/DPSDA/main.py", line 468, in <module>
main()
File "/cluster/home/laidir/DPSDA/main.py", line 361, in main
packed_samples = api.image_variation(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 255, in image_variation
sub_variations = self._image_variation(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 268, in _image_variation
samples, _ = sample(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 307, in sample
sample = sampler(
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 354, in forward
sample = sample_fn(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion/gaussian_diffusion.py", line 223, in ddim_sample_loop
for sample in self.ddim_sample_loop_progressive(
File "/cluster/home/laidir/DPSDA/apis/improved_diffusion/gaussian_diffusion.py", line 269, in ddim_sample_loop_progressive
t_batch = th.tensor([indices[0]] * img.shape[0], device=device)
IndexError: list index out of range
"

This error appears to originate in ddim_sample_loop_progressive (called from ddim_sample_loop) within the improved diffusion API: the indices list is empty when indices[0] is accessed.

Steps to Reproduce

Run the provided script with the variation_degree_schedule parameter set to include values greater than 10.
Observe the IndexError as described above during the image variation phase.

Additional Information

The issue occurs specifically when the variation_degree_schedule parameter exceeds 10.
Attached is the script used to reproduce this error.
main_improved_diffusion_brainTumor_conditional.txt

fjxmlzn · 2024-02-12T16:03:44Z

Thank you for your message. The reason for the error is as follows.

In your script, timestep_respacing is set to ddim10. This means that there are only 10 steps in the diffusion sampling process.

Let's say the i-th number in variation_degree_schedule is a. Then, at the i-th iteration of PE, the generation uses only the last 10-a diffusion sampling steps. Therefore, all the numbers in variation_degree_schedule should be <= 10.

If you want to keep the variation_degree_schedule you set there, consider changing timestep_respacing to a larger value, such as 100 or ddim100. (For the difference between {x} and ddim{x}, please refer to this repo: https://github.com/openai/improved-diffusion?tab=readme-ov-file#sampling.)
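The rule described in this reply can be checked before launching a run. The following is only a minimal sketch, not code from DPSDA: the helper names `total_sampling_steps` and `check_schedule` are hypothetical, and the parsing assumes the improved-diffusion convention that `ddimN` means N steps while any other value is a comma-separated list of per-section step counts whose sum is the total.

```python
def total_sampling_steps(timestep_respacing: str) -> int:
    """Number of sampling steps implied by a respacing string.

    Assumption: "ddimN" means N DDIM steps; otherwise the string is a
    comma-separated list of per-section step counts (e.g. "100" or
    "10,15,20") and the total is their sum.
    """
    if timestep_respacing.startswith("ddim"):
        return int(timestep_respacing[len("ddim"):])
    return sum(int(part) for part in timestep_respacing.split(","))


def check_schedule(variation_degree_schedule, timestep_respacing):
    """Return the schedule entries that exceed the number of sampling steps."""
    steps = total_sampling_steps(timestep_respacing)
    return [a for a in variation_degree_schedule if a > steps]


# Schedule from the issue: 0 to 42 in steps of 2.
schedule = list(range(0, 43, 2))

for respacing in ("ddim10", "ddim100"):
    too_large = check_schedule(schedule, respacing)
    if too_large:
        print(f"{respacing}: entries above the step count: {too_large}")
    else:
        print(f"{respacing}: schedule fits within the step count")
```

With the schedule from the issue, this sketch flags every entry from 12 to 42 under ddim10 and none under ddim100.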

Feel free to reopen the issue if you have further questions.

fjxmlzn closed this as completed Feb 12, 2024
