Regional Prompting (Node only) #5916

Merged · 23 commits from ryan/regional-prompting-naive into main · Apr 9, 2024

Conversation

@RyanJDick (Collaborator) commented Mar 10, 2024

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Community Node Submission

Have you discussed this change with the InvokeAI team?

  • Yes
  • No, because:

Have you updated all relevant documentation?

  • Yes
  • No. Not yet.

Description

This branch makes the following changes to support regional prompting:

  • Adds a "Create Rectangle Mask" invocation for easily creating masks with nodes (see the sketch after this list).
  • Adds a mask input to the compel prompt invocations (for both SD and SDXL).
  • Changes the "Denoise Latents" invocation to accept a collection of positive and negative text conditioning inputs.
  • Updates the UNet attention modules to apply regional masking in the cross-attention layers.
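
For illustration, here's a minimal sketch of the kind of boolean tensor a rectangle-mask node might produce. The helper name and parameters are assumptions for the example, not the actual invocation code:

```python
import torch

def make_rectangle_mask(
    height: int, width: int, y_top: int, x_left: int, rect_h: int, rect_w: int
) -> torch.Tensor:
    """Build a (1, 1, H, W) boolean mask with a single rectangle set to True."""
    mask = torch.zeros((1, 1, height, width), dtype=torch.bool)
    mask[:, :, y_top : y_top + rect_h, x_left : x_left + rect_w] = True
    return mask

# e.g. mask the left half of a 512x512 image:
left_half = make_rectangle_mask(512, 512, y_top=0, x_left=0, rect_h=512, rect_w=256)
```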

Here's a sample workflow using the new nodes:

[image: sample workflow using the new nodes]

Note about Memory Usage:

When IP-Adapter and/or regional prompting are used, we use a custom attention processor. This attention processor does not currently support xformers or sliced attention, so it will use more memory than standard attention when those options are enabled.

The custom attention processor currently uses torch.scaled_dot_product_attention().

If there is enough demand, we could add support for xformers and sliced implementations. But it probably makes sense to re-think our attention configuration strategy in the context of the latest improvements to torch (which supports low-memory and flash-attention modes).
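
To make the attention mechanics concrete, here's a hedged sketch of how a per-region mask can be passed to torch.scaled_dot_product_attention as a boolean attention mask. Shapes are illustrative and the step that downscales the spatial masks to the latent token count is omitted; this is not the PR's actual processor code:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: B=batch, H=heads, Lq=image tokens (64x64 latent), Lk=text tokens.
B, H, Lq, Lk, D = 1, 8, 4096, 77, 64
q = torch.randn(B, H, Lq, D)
k = torch.randn(B, H, Lk, D)
v = torch.randn(B, H, Lk, D)

# region_mask[b, 0, lq, lk] is True where image token lq may attend to text token lk.
# A real implementation would derive this from the downscaled prompt masks; every
# query row must keep at least one True entry to avoid NaNs.
region_mask = torch.ones(B, 1, Lq, Lk, dtype=torch.bool)

# For boolean masks, True means "take part in attention"; False positions get -inf.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=region_mask)
```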

QA Instructions, Screenshots, Recordings

Completed Tests

Basic functionality:

  • CUDA, SDXL, Standard
  • CUDA, SD, Standard
  • CUDA, SDXL, Sequential
  • CUDA, SD, Sequential
  • CPU, SDXL, Standard

Speed regression tests (run on an RTX 4090):

  • No regression in speed for standard global prompting
  • The speed cost of using multiple prompts in DenoiseLatents is small. With 3 prompts, the denoise latents time increases from ~9.6s to ~10.2s. The more significant cost is having to run the text encoder N times.
  • Speed of IP-adapter is unchanged.

Compatibility:

  • Regional negative prompts
  • No global prompt, regions do not cover 100% of the image.
  • Compatible with IP-Adapter
  • Compatible with T2I-Adapter
  • Compatible with ControlNet

Remaining tests

  • ROCm
  • mps

Added/updated tests?

  • Yes
  • No

@github-actions bot added labels python, invocations, backend, python-tests — Mar 10, 2024
@RyanJDick mentioned this pull request Mar 10, 2024
@RyanJDick force-pushed the ryan/regional-prompting-naive branch from 5be7ca4 to 86ca81c on March 11, 2024 at 13:12
@RyanJDick marked this pull request as ready for review March 11, 2024 15:41
@lstein (Collaborator) left a comment

Impressive work. A couple of questions.

  • We now have several masks being used in multiple places, including the bitmap inpainting mask, the boolean mask used in regional prompting, and the Segment Anything mask used in wip: Segment Anything #5829. Are these compatible with each other? For example, can I use a Segment Anything mask as a regional prompting mask?
  • What is the use case for a rectangular prompting mask?

@github-actions bot added labels Root, docs, python-deps — Mar 29, 2024
@hipsterusername (Member) commented

> Impressive work. A couple of questions.
>
> • We now have several masks being used in multiple places, including the bitmap inpainting mask, the boolean mask used in regional prompting, and the Segment Anything mask used in wip: Segment Anything #5829 (https://github.com/invoke-ai/InvokeAI/pull/5829). Are these compatible with each other? For example, can I use a segment anything mask as a regional prompting mask?
>
> • What is the use case for a rectangular prompting mask?

Re: Masks - That is the intent. They should just be alpha-channel-only images, interchangeable across contexts. The system does not currently treat "Mask" and "Image" as distinct types; they are more like useful labels on the same kind of object.
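
As a minimal sketch of that interchangeability (assuming PIL + numpy + torch; the helper is hypothetical, not existing InvokeAI code):

```python
import numpy as np
import torch
from PIL import Image

def image_alpha_to_bool_mask(image: Image.Image, threshold: int = 128) -> torch.Tensor:
    """Interpret any image's alpha channel as a boolean region mask."""
    # Fully transparent pixels fall outside the region; opaque pixels are inside.
    alpha = np.array(image.convert("RGBA").getchannel("A"))
    return torch.from_numpy(alpha >= threshold)
```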

The Rectangular Mask is primarily for creating a Mask from a workflow without importing an existing image. It's obviously not the ideal UX, but the Workflow editor currently lacks a better interface for users who don't want to leave it to create a mask. The nature of regional prompting ultimately doesn't require an exact shape, so a rectangle is "good enough" for many use cases.

@RyanJDick (Collaborator, Author) commented

From a quick skim, all of our new mask formats are likely not compatible. I'll look into fixing that.

@RyanJDick force-pushed the ryan/regional-prompting-naive branch from 4355842 to 9da0336 on April 3, 2024 at 13:05
@RyanJDick force-pushed the ryan/regional-prompting-naive branch from 9da0336 to 4f97192 on April 3, 2024 at 13:38
@RyanJDick (Collaborator, Author) commented

I started a thread here to discuss consolidating the mask types: https://discord.com/channels/1020123559063990373/1225113849489915925/1225113851121504416

@psychedelicious (Collaborator) commented Apr 8, 2024

I've got a branch with a WIP UI (not hooked up to any graphs). It outputs a segmented mask image like this, which I'm using for testing:
[image: canvas — segmented test mask]

I've added a couple nodes to this branch:

  • ExtractMasksAndPromptsInvocation: Given a segmented mask image and a list of prompt-color pairs, derives a mask for each prompt-color pair, outputting a list of prompt-mask pairs (see the sketch after this list). There's a default value for the list of pairs that matches the above test image.
  • SplitMaskPromptPair: Splits a single prompt-mask pair into a prompt and MaskField.
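
For reference, a rough sketch of the color-to-mask derivation such a node might perform. The function and its signature are illustrative assumptions, not the node's actual code:

```python
import numpy as np
import torch
from PIL import Image

def masks_from_segmented_image(
    image: Image.Image, prompt_colors: dict[str, tuple[int, int, int]]
) -> dict[str, torch.Tensor]:
    """Derive one boolean mask per prompt from a flat-color segmented image."""
    rgb = np.array(image.convert("RGB"))
    # A pixel belongs to a prompt's region iff it exactly matches that prompt's color.
    return {
        prompt: torch.from_numpy(np.all(rgb == np.array(color), axis=-1))
        for prompt, color in prompt_colors.items()
    }

# Hypothetical prompt-color pairs for a two-region test image:
masks = masks_from_segmented_image(
    Image.new("RGB", (64, 64), (255, 0, 0)),
    {"a red apple": (255, 0, 0), "a blue sky": (0, 0, 255)},
)
```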

I haven't been able to generate with regional prompts, though.

When I recreate @RyanJDick's example workflow, I get this error:

[2024-04-08 18:00:51,743]::[InvokeAI]::ERROR --> Error while invoking session e714ce61-bc0a-493b-ab9d-f6888d06bea7, invocation 8386e799-d3ca-4456-a1f1-24d6e17ba051 (denoise_latents):
Linear.forward() takes 2 positional arguments but 3 were given
[2024-04-08 18:00:51,743]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/session_processor/session_processor_default.py", line 179, in _process
    outputs = self._invocation.invoke_internal(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/baseinvocation.py", line 281, in invoke_internal
    output = self.invoke(context)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 937, in invoke
    result_latents = pipeline.latents_from_embeddings(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 359, in latents_from_embeddings
    latents = self.generate_latents_from_embeddings(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 452, in generate_latents_from_embeddings
    step_output = self.step(
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 573, in step
    uc_noise_pred, c_noise_pred = self.invokeai_diffuser.do_unet_step(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusion/shared_invokeai_diffusion.py", line 241, in do_unet_step
    ) = self._apply_standard_conditioning(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusion/shared_invokeai_diffusion.py", line 376, in _apply_standard_conditioning
    both_results = self.model_forward_callback(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 646, in _unet_forward
    return self.unet(
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1216, in forward
    sample, res_samples = downsample_block(
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 1279, in forward
    hidden_states = attn(
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py", line 397, in forward
    hidden_states = block(
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/diffusers/models/attention.py", line 329, in forward
    attn_output = self.attn1(
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusion/custom_atttention.py", line 115, in __call__
    query = attn.to_q(hidden_states, *args)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: Linear.forward() takes 2 positional arguments but 3 were given

I adapted the workflow to SD1.5 and get the same error:
[screenshot: the workflow adapted to SD1.5]

Rectangle Regional Prompts - SD1.5.json

[2024-04-08 18:24:55,244]::[InvokeAI]::ERROR --> Error while invoking session bc2f25fa-59bc-4fed-b1b0-7ed133382ef8, invocation 3add961c-2fe6-4db8-96b2-53364d02d064 (denoise_latents):
Linear.forward() takes 2 positional arguments but 3 were given
[traceback identical to the one above]

I get a different error if I use the new nodes I created. Here's the workflow:

Arbitrary Regional Prompts.json

[image: Arbitrary Regional Prompts workflow]

And the error:

[2024-04-08 18:16:47,497]::[InvokeAI]::ERROR --> Error while invoking session 76149546-6732-4097-ad5a-365914372f4a, invocation eede7508-9af7-41d4-89bf-0a0668eef7c1 (denoise_latents):

[2024-04-08 18:16:47,497]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/session_processor/session_processor_default.py", line 179, in _process
    outputs = self._invocation.invoke_internal(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/baseinvocation.py", line 281, in invoke_internal
    output = self.invoke(context)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 909, in invoke
    conditioning_data = self.get_conditioning_data(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 517, in get_conditioning_data
    cond_text_embedding, cond_regions = self._concat_regional_text_embeddings(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 472, in _concat_regional_text_embeddings
    regions = TextConditioningRegions(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusion/conditioning_data.py", line 86, in __init__
    assert self.masks.shape[1] == len(self.ranges)
AssertionError

I think I need a "global" positive prompt, so I added that and got a different error:

Arbitrary Regional Prompts with global prompt.json

[image: workflow with a global positive prompt added]

[2024-04-08 18:28:11,117]::[InvokeAI]::ERROR --> Error while invoking session af3ddd99-2f63-49c0-8b37-c6cd5bcf5c15, invocation 7afcbe9e-8d12-42b6-9727-6989651fbb3b (denoise_latents):
Tensors must have same number of dimensions: got 3 and 4
[2024-04-08 18:28:11,117]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/session_processor/session_processor_default.py", line 179, in _process
    outputs = self._invocation.invoke_internal(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/baseinvocation.py", line 281, in invoke_internal
    output = self.invoke(context)
  File "/home/bat/.invokeai/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 909, in invoke
    conditioning_data = self.get_conditioning_data(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 517, in get_conditioning_data
    cond_text_embedding, cond_regions = self._concat_regional_text_embeddings(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 473, in _concat_regional_text_embeddings
    masks=torch.cat(processed_masks, dim=1),
RuntimeError: Tensors must have same number of dimensions: got 3 and 4

I had merged main into this branch, but switching to the last pre-merge commit and re-installing dependencies doesn't change the errors.

I tried with attention_type: torch-sdp and xformers, same error on both.

Maybe I'm missing some detail in the workflow, and/or creating the mask tensors incorrectly in my node.

@psychedelicious (Collaborator) commented

Here's the very rough regional prompt UI (no feedback needed at this point, it's just functional enough for testing):

[video: Screen.Recording.2024-04-08.at.6.34.03.pm.mov]

Commit: …mAttnProcessor2_0. This fixes a bug in CustomAttnProcessor2_0 that was being triggered when peft was not installed. The bug was present in a block of code that was previously copied from diffusers. The bug seems to have been introduced during diffusers' migration to PEFT for their LoRA handling. The upstream bug was fixed in huggingface/diffusers@531e719.
@RyanJDick (Collaborator, Author) commented Apr 8, 2024

@psychedelicious I have addressed your first error (TypeError: Linear.forward() takes 2 positional arguments but 3 were given) in 98900a7. See that commit's message for more info. I wasn't hitting it, because I had peft installed, which was triggering slightly different behaviour.
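
For anyone following along, a minimal standalone repro of that TypeError with plain torch, no diffusers involved:

```python
import torch

linear = torch.nn.Linear(8, 8)
x = torch.randn(1, 8)

# nn.Linear.forward(input) takes exactly one positional argument, so forwarding
# an extra LoRA scale argument (which diffusers' LoRACompatibleLinear tolerated)
# fails once the layer is a plain nn.Linear:
try:
    linear(x, 1.0)
except TypeError as err:
    print(err)  # Linear.forward() takes 2 positional arguments but 3 were given
```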

Moving on to the others now.

Commit: …Also, added a clearer error message in case the same error is introduced in the future.
@RyanJDick (Collaborator, Author) commented

@psychedelicious The errors coming from your new nodes are addressed in 826f3d6.

@psychedelicious (Collaborator) left a comment

Confirmed errors are resolved.

SD1.5 is pretty iffy in terms of putting the prompted things in the specified regions, but SDXL works great.

I've reverted my added node. It was built on incorrect assumptions about how regional prompting worked. I'll add a different node for arbitrary mask images in the PR that adds the UI.

@psychedelicious mentioned this pull request Apr 9, 2024
@hipsterusername merged commit fe38625 into main Apr 9, 2024
14 checks passed
@hipsterusername deleted the ryan/regional-prompting-naive branch April 9, 2024 12:12
Comment on lines +819 to +822
# At some point, someone decided that schedulers that accept a generator should use the original seed with
# all bits flipped. I don't know the original rationale for this, but now we must keep it like this for
# reproducibility.
scheduler_step_kwargs = {"generator": torch.Generator(device=device).manual_seed(seed ^ 0xFFFFFFFF)}
@keturn (Contributor) commented Jun 15, 2024

Stochastic schedulers needed an explicitly seeded generator passed for reproducibility.

Nobody wanted to add a new stochastic_scheduler_seed or generator field to DenoiseLatentsInvocation, but they figured out how to get a seed from one of its input LatentFields.

However, I feared that re-using the seed already used to generate that input would produce the same tensor, which is not the random behavior those schedulers expect.

So the compromise was, when you have an operation that needs a seed but the only seed defined has already been used earlier in the execution graph, you do some reproducible transformation like this to get a new seed for your new operation.

FWIW, I recommend factoring this seed ^ 0xFFFFFFFF up out of this method to the place that calls get_scheduler() and init_scheduler(). That makes this method simpler: the seed passed to init_scheduler is, in fact, the seed used for them; and the quirky logic about deriving the scheduler_seed goes with the code that's trying to pull a seed out of the InvocationContext's LatentFields.
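
A minimal sketch of that hoisted derivation; the helper name and call site are hypothetical:

```python
import torch

def derive_scheduler_seed(seed: int) -> int:
    # Flip all 32 bits of the generation seed: reproducible, but distinct from
    # the seed already consumed earlier in the execution graph.
    return seed ^ 0xFFFFFFFF

# Caller-side usage: the quirky derivation lives with the code that pulls a seed
# out of the graph, and the scheduler setup just receives a plain seed.
generator = torch.Generator(device="cpu").manual_seed(derive_scheduler_seed(123))
```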
