Add interior design ControlNet pipeline readme #150
Conversation
Thanks @jamesbraniganml6!
- Can we add a visual example of the data?
- Can we add a section on how to reuse this pipeline for a different use case? They should only recreate the prompt generation component.
- The pipeline is currently drawn with parallel captioning and segmentation, which is not the case. For clarity, it might be better to make these sequential.
PR that adds the missing data types required for defining the nested data types needed by the embedding component.

Changes:
* The string values of the enum types have been changed to pyarrow types to make it easier to define complex schemas.
* The utf8 types defined in the components have been changed to strings to make them more intuitive.

We will need to make more changes in the future to handle different nested data types, as suggested by @GeorgesLorre: https://swagger.io/docs/specification/data-models/data-types/#:~:text=the%20null%20value.-,Arrays,-Arrays%20are%20defined

Enums allow us to define nice constants that are typed, but we'll need to define many of them to accommodate all the different kinds of nested structures. We might need to move to dynamically typed data types with a dictionary, but that would require quite some changes to the JSON schemas and the code, so it's better left for later.
PR that adds the image embedding component. Largely inspired by Niel's PR #111 (inference and batching with dask).
Added the logo SVGs 🎉
This branch is based on the image-embedding branch, which has a lot of changes. I would suggest merging that PR first, which will give a much smaller diff here.

This component implements the LAION image retrieval component, which uses CLIP embeddings from the input subset to query the LAION database. It returns an images subset with URLs, similar to the other prompt-based CLIP retrieval component. These URLs should then be downloaded by the already-made image-downloading component.

---------

Co-authored-by: Philippe Moussalli <philippe.moussalli95@gmail.com>
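The retrieval idea (query a database of image embeddings with CLIP embeddings and return the URLs of the nearest matches) can be sketched with a toy in-memory index. The real component queries the LAION index through a retrieval service, so the function name, shapes, and data below are all illustrative:

```python
import numpy as np


def retrieve_nearest_urls(query_embeddings, index_embeddings, index_urls, k=2):
    """Return the k most similar URLs for each query embedding (cosine similarity)."""
    # Normalize so the dot product equals cosine similarity.
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    d = index_embeddings / np.linalg.norm(index_embeddings, axis=1, keepdims=True)
    sims = q @ d.T                            # shape: (n_queries, n_index)
    top_k = np.argsort(-sims, axis=1)[:, :k]  # best-first index positions
    return [[index_urls[i] for i in row] for row in top_k]


# Toy "LAION index": 4 embeddings with their image URLs.
rng = np.random.default_rng(0)
index = rng.normal(size=(4, 8))
urls = [f"https://example.com/img{i}.jpg" for i in range(4)]

# Query with an embedding close to index entry 2: it should rank first.
query = index[2:3] + 0.01 * rng.normal(size=(1, 8))
print(retrieve_nearest_urls(query, index, urls, k=2))
```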
This PR contains the image cropping component. The component looks for the most common color in the image border and uses it to calculate how much of the border can be cropped out. If the resulting crop is not square, it pads the shortest side with a border again to make it square.

![d4e35776-3ce1-4157-ac1f-5b2f18ff2ad4](https://github.com/ml6team/fondant/assets/92580873/314ec0d3-3ab6-418e-8051-d9f464496b0e) ![82eeae2d-c63c-42cb-881c-3707971d043c](https://github.com/ml6team/fondant/assets/92580873/6754b418-7922-4744-8ef3-59978b07ee9d)

---------

Co-authored-by: Philippe Moussalli <philippe.moussalli95@gmail.com>
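The border-cropping idea described above can be sketched in a few lines of numpy. The actual component works on PIL images and also handles the pad-to-square step, so this is only an illustration of the "find the most common border color and crop it away" logic:

```python
import numpy as np


def crop_border(image):
    """Crop rows/columns that consist entirely of the most common border color.

    `image` is an (H, W) array of color ids; the most common value along the
    outer border is treated as the background to crop away.
    """
    border = np.concatenate([image[0], image[-1], image[:, 0], image[:, -1]])
    values, counts = np.unique(border, return_counts=True)
    bg = values[np.argmax(counts)]  # most common border color

    keep_rows = np.where(~(image == bg).all(axis=1))[0]
    keep_cols = np.where(~(image == bg).all(axis=0))[0]
    return image[keep_rows[0]:keep_rows[-1] + 1, keep_cols[0]:keep_cols[-1] + 1]


# A 6x8 "image": background color 0 with a 2x3 block of content.
img = np.zeros((6, 8), dtype=int)
img[2:4, 3:6] = 7
print(crop_border(img).shape)  # -> (2, 3)
```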
Thanks for the feedback @RobbeSneyders.
Nice James!
Since we focus on reusability and want to inspire people to use Fondant, we could still make it more visual by adding more data examples or example pictures (like the resizing, captioning, etc.). But that is maybe something we can still improve outside of this PR.
The image is not telling much IMO; maybe add a small sentence or two per step to explain how the data is being extended and enriched.
(Also, "database" is very vague; maybe call it "dataset ready for fine-tuning" or something.)
## Introduction
This example demonstrates an end-to-end fondant pipeline to collect and process data for the training of a [ControlNet](https://github.com/lllyasviel/ControlNet) model, focusing on images related to interior design.
### What is Controlnet?
Suggested change: `### What is Controlnet?` → `### What is ControlNet?`
Controlnet is an image generation model developed by https://arxiv.org/abs/2302.05543 that gives the user more control over the image generation process. It is based on the Stable Diffusion model, which generates images based on a caption and an image. The Controlnet model adds a third input, a conditioning image, that can be used for specifying specific wanted elements in the generated image.
Suggested change:
ControlNet is an image generation model developed by [Zhang et al., 2023](https://arxiv.org/abs/2302.05543) that gives the user more control over the image generation process. It is based on the [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) model, which generates images based on text and an optional image. The ControlNet model adds a third input, a conditioning image, that can be used for specifying specific wanted elements in the generated image.
Useful links:
* https://github.com/lllyasviel/ControlNet
* https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/controlnet
* https://arxiv.org/abs/2302.05543
Suggested change (add a blank line after "Useful links:" so the bullets render as a list):
Useful links:

* https://github.com/lllyasviel/ControlNet
* https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/controlnet
* https://arxiv.org/abs/2302.05543
You might need to include a line break here for this to render properly.
Thanks @ChristiaensBert!
Looks good, left some comments. You'll also have to rebase / merge since some of the components have moved on main.
1. Building the images for each of the pipeline components
```
bash build_images.sh -c all
```
They should also set the `--namespace` and `--repo` flags to push the images to their own GitHub container registry.
@philippe-ml6 Do you have the full command that they have to use?
It's in the bash script's help function:

bash build_images.sh --help
Usage: build_images.sh [options]
Options:
  -c, --component <value>  Set the component name. Pass the component folder name to build a certain component or 'all' to build all components in the current directory (required)
  -n, --namespace <value>  Set the namespace (default: ml6team)
  -r, --repo <value>       Set the repo (default: fondant)
  -t, --tag <value>        Set the tag (default: latest)
  -h, --help               Display this help message
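Putting the flags together, a full invocation pushing to one's own registry might look like the following; `my-namespace`, `my-repo`, and the `dev` tag are placeholders, not values from this repository:

```shell
# Build all component images and tag them for your own registry
# (replace my-namespace / my-repo with your GitHub org and repository).
bash build_images.sh -c all -n my-namespace -r my-repo -t dev
```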
# caption_images

### Description
This component captions inputted images using [BLIP](https://huggingface.co/docs/transformers/model_doc/blip).
This component takes a model id as input, so it can use any HF Hub model
Is it a database or the hub that we should have at the end?
| Input image | Output image |
|----------------------------------------------------------------|------------------------------------------------------------------|
| ![input image](docs/art/interior_design_controlnet_input1.png) | ![output image](docs/art/interior_design_controlnet_output1.jpg) |
Those images are not rendered properly
Thanks Bert!
Can you remove the images that are not used? I see you added some more in the `docs/art` folder.
First draft done. Feedback on the image is particularly welcome.

---------

Co-authored-by: Philippe Moussalli <philippe.moussalli95@gmail.com>
Co-authored-by: khaerensml6 <92426912+khaerensml6@users.noreply.github.com>
Co-authored-by: ChristiaensBert <92580873+ChristiaensBert@users.noreply.github.com>
Co-authored-by: Bert Christiaens <bert.christiaens@ml6.eu>