-
Notifications
You must be signed in to change notification settings - Fork 6.5k
[feat]: implement "local" caption upsampling for Flux.2 #12718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
| @@ -0,0 +1,29 @@ | |||
| """ | |||
| These system prompts come from: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed internally, this new-line character thingy messes up the quality a bit. Hence, I have decided to keep these system messages one-to-one same as the original implementation linked above.
If we run make style && make quality, this order will be completely destroyed. We can change the pyproject.toml to exclude this path from getting formatted. But before we do that, let's see if this is the best we have.
yiyixuxu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot on working on this!
I left some feedbacks!
|
|
||
| @staticmethod | ||
| def _resize_to_target_area(image: PIL.Image.Image, target_area: int = 1024 * 1024) -> Tuple[int, int]: | ||
| def _resize_to_target_area( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh do you want to add a new method called something like _resize_if_exceeds_area? or rename this one if we only use it this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
I created _resize_if_exceeds_area() which is basically:
def _resize_if_exceeds_area(image, target_area=1024 * 1024) -> PIL.Image.Image:
image_width, image_height = image.size
pixel_count = image_width * image_height
if pixel_count <= target_area:
return image
return Flux2ImageProcessor._resize_to_target_area(image, target_area)|
|
||
| # Adapted from | ||
| # https://github.com/black-forest-labs/flux2/blob/5a5d316b1b42f6b59a8c9194b77c8256be848432/src/flux2/text_encoder.py#L49C5-L66C19 | ||
| def _validate_and_process_images( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have a seperate step to validate and process image and then run format_input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We now first _validate_and_process_images() and then pass the resultant images to format_input().
|
@sayakpaul this is really nice. Do you want me to start working on an Endpoint? We can take this conversation to private slack and see how this works. Update: After inspecting, it turns out that the model needs upwards of 300 GBs to run. (from the official model card)
Building a free Inference Endpoint does not seem to be feasible by me and @sayakpaul and hence we are benching this project. Another option would be to route through Inference Providers, but we have not seen a need (other than this specific one) to let our providers host this model. |
What does this PR do?
Test code:
Generated upsampled prompt:
Output
Notes
system_messages.pyscript undersrc/diffusers/pipelines/flux2so that other pipelines derived from Flux2 can easily use it.caption_upsample_temperatureis set (defaults toNone), we perform the process.