Conversation

Contributor

@tolgacangoz tolgacangoz commented Jun 5, 2024

This pull request refactors the code to remove unnecessary calls to `to(torch_device)` and `to("cuda")` before CPU offloading is enabled. These calls are redundant, consume extra memory, and can be removed without affecting functionality.
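
For illustration, here is a minimal sketch of the pattern being removed (assuming a stock Stable Diffusion checkpoint; the model name and dtype are examples, not taken from this PR):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Redundant: the offloading hooks manage device placement themselves,
# so moving the whole pipeline to the GPU first only increases peak memory.
# pipe.to("cuda")

# Sufficient on its own: submodules are moved to the accelerator on demand.
pipe.enable_model_cpu_offload()
```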

The test files also compare `output_without_offload` and `output_with_offload`. I tried this with SD-1.5 in fp16 on Colab: after the two forward passes (with and without offloading), the occupied system RAM is ~5.1 GB. But if I initialize the pipeline again right before `pipeline.enable_sequential_cpu_offload()`, the occupied system RAM is ~2.4 GB (the system itself already occupies 1-1.5 GB of RAM at the start). For `pipeline.enable_model_cpu_offload()` the difference is ~0.5 GB. I couldn't see much difference in GPU VRAM, and the time cost of the second initialization was almost zero. What should be done for these places (a rough sketch of the idea follows the test excerpt below):

def test_sequential_cpu_offload_forward_pass(self, expected_max_diff=1e-4):
    import accelerate

    components = self.get_dummy_components()
    pipe = self.pipeline_class(**components)
    for component in pipe.components.values():
        if hasattr(component, "set_default_attn_processor"):
            component.set_default_attn_processor()
    pipe.to(torch_device)
    pipe.set_progress_bar_config(disable=None)

    generator_device = "cpu"
    inputs = self.get_dummy_inputs(generator_device)
    output_without_offload = pipe(**inputs)[0]

    pipe.enable_sequential_cpu_offload()
    assert pipe._execution_device.type == "cuda"

    inputs = self.get_dummy_inputs(generator_device)
    output_with_offload = pipe(**inputs)[0]

    max_diff = np.abs(to_np(output_with_offload) - to_np(output_without_offload)).max()
    self.assertLess(max_diff, expected_max_diff, "CPU offloading should not affect the inference results")
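
As a rough sketch of the re-initialization idea described above (reusing the test-helper methods from the excerpt; this is not code from the PR), the pipeline could be rebuilt right before sequential offloading is enabled so the first copy's weights can be freed:

```python
# Rebuild the pipeline so the weights loaded for the non-offloaded pass
# do not stay resident in system RAM next to the offloaded copy.
components = self.get_dummy_components()
pipe = self.pipeline_class(**components)
pipe.set_progress_bar_config(disable=None)

pipe.enable_sequential_cpu_offload()
assert pipe._execution_device.type == "cuda"

inputs = self.get_dummy_inputs(generator_device)
output_with_offload = pipe(**inputs)[0]
```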

@sayakpaul @yiyixuxu @DN6

@tolgacangoz tolgacangoz closed this Jun 5, 2024
@tolgacangoz tolgacangoz reopened this Jun 5, 2024
@tolgacangoz tolgacangoz marked this pull request as draft June 5, 2024 15:25
@tolgacangoz tolgacangoz marked this pull request as ready for review June 6, 2024 09:58
@tolgacangoz tolgacangoz changed the title from "Fix CPU-Offloading Usage" to "Optimize test files by fixing CPU-offloading usage" Jun 6, 2024
@sayakpaul sayakpaul requested a review from DN6 June 6, 2024 11:57
Collaborator

@yiyixuxu yiyixuxu left a comment


ohh thanks!

Collaborator

yiyixuxu commented Jun 6, 2024

@tolgacangoz
I'm not sure I understand what you mean here - are you proposing a PR or a testing script?

> The test files also compare `output_without_offload` and `output_with_offload`. I tried this with SD-1.5 in fp16 on Colab: after the two forward passes (with and without offloading), the occupied system RAM is ~5.1 GB. But if I initialize the pipeline again right before `pipeline.enable_sequential_cpu_offload()`, the occupied system RAM is ~2.4 GB (the system itself already occupies 1-1.5 GB of RAM at the start). For `pipeline.enable_model_cpu_offload()` the difference is ~0.5 GB. I couldn't see much difference in GPU VRAM, and the time cost of the second initialization was almost zero. What should be done for these places:

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu yiyixuxu merged commit ec1aded into huggingface:main Jun 6, 2024
@tolgacangoz tolgacangoz deleted the fix-offloading branch July 27, 2024 11:18
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* Refactor code to remove unnecessary calls to `to(torch_device)`

* Refactor code to remove unnecessary calls to `to("cuda")`

* Update pipeline_stable_diffusion_diffedit.py