-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Alpha pipeline] Add prompt generating component #84
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jamesbraniganml6 and @NielsRogge!
Left some comments.
examples/controlnet-interior-design/components/generate_prompts/Dockerfile
Outdated
Show resolved
Hide resolved
examples/controlnet-interior-design/components/generate_prompts/fondant_component.yaml
Outdated
Show resolved
Hide resolved
examples/controlnet-interior-design/components/generate_prompts/src/main.py
Outdated
Show resolved
Hide resolved
examples/controlnet-interior-design/components/generate_prompts/src/main.py
Outdated
Show resolved
Hide resolved
examples/controlnet-interior-design/components/generate_prompts/src/main.py
Outdated
Show resolved
Hide resolved
examples/controlnet-interior-design/components/generate_prompts/src/main.py
Outdated
Show resolved
Hide resolved
|
||
df = dd.from_pandas( | ||
df, npartitions=1 | ||
) # need to decide how npartitions should be set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be based on the amount of cores available and data size.
https://stackoverflow.com/a/46646366/4098821
I think we should handle this as Fondant because:
- This is the complexity we want to take out of the user's hands
- The
dask
partitioning translates toparquet
partitioning and so is also relevant for downstream components
We can do this after the user returns the dataframe leveraging repartition
, but then the user still needs to provide some placeholder value in the meantime.
Another option is that we provide some utility functionality for this.
CC: @GeorgesLorre
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should handle this in Fondant. the argument is optional so I would just not use it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is optional, but only if you specify chunksize
instead.
https://github.com/dask/dask/blob/980fd92a46f101f161d21e0c4f486024bafbfad3/dask/dataframe/io/io.py#L263-L264
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far :) thanks!
Left a few coments
logger = logging.getLogger(__name__) | ||
|
||
|
||
interior_styles = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those prompts seem to be currently hardcoded within the main function. They should be moved to a general config and passed as parameters to the kubeflow component.
This way you avoid having to re-build your images everytime you want to change prompt lists and it also enables you to do experiment tracking by tracking the parameters that went into a certain kubeflow pipeline run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the question is if we even need this component, or if you shouldn't just pass your prompts to the LAION retriever directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still create a parquet file out of them to potentially link the retrieved images with a given prompt based on ID
type: utf8 | ||
|
||
args: | ||
dataset_name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would add the prompt lists as arguments here. Although we still need to make sure lists are parsed properly. I created a ticket for it #85
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do want to add them as arguments, maybe file-based arguments make more sense for this data size?
examples/controlnet-interior-design/components/generate_prompts/src/main.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after remaining comments are addressed
Also could you clarify:
Normally all components of a given pipeline should be placed in a |
examples/controlnet-interior-design/components/generate_prompts/fondant_component.yaml
Outdated
Show resolved
Hide resolved
Can you pick this up @NielsRogge? I just split this PR from the other branch.
Indeed, I would add all other components as native / reusable components. So we should get the following directory structure:
And the pipeline should use the reusable components. |
examples/controlnet-interior-design/components/generate_prompts/Dockerfile
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Left a few open comments to address
examples/controlnet-interior-design/components/generate_prompts/Dockerfile
Outdated
Show resolved
Hide resolved
examples/controlnet-interior-design/components/generate_prompts/src/main.py
Outdated
Show resolved
Hide resolved
examples/controlnet-interior-design/components/generate_prompts/src/main.py
Outdated
Show resolved
Hide resolved
b28038d
to
bc216d6
Compare
...ples/pipelines/controlnet-interior-design/components/generate_prompts/fondant_component.yaml
Outdated
Show resolved
Hide resolved
examples/pipelines/controlnet-interior-design/controlnet_pipeline:dev.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this Niels 👍 ! One small comment but otherwise good to go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NielsRogge!
LGTM (Can't approve my own PR)
Co-authored-by: Niels Rogge <niels.rogge@ml6.eu> Co-authored-by: Robbe Sneyders <robbe.sneyders@gmail.com>
6507229
to
b884936
Compare
Fixes #84 Split this from #71 so we can review this component by component. This PR contains the first component, which generates captions to query LAION-5B with. This is also the only component that is specific to this example, so I would place all other components in a `components/` directory. I also think we can put examples directly under the `examples/` directory instead of `examples/pipelines`, as the example pipelines contain example components, and the reusable components in the future `components/` directory can serve as examples as well. --------- Co-authored-by: James Branigan <james.branigan@ml6.eu> Co-authored-by: Niels Rogge <niels.rogge@ml6.eu> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Fixes #94
Split this from #71 so we can review this component by component.
This PR contains the first component, which generates captions to query LAION-5B with. This is also the only component that is specific to this example, so I would place all other components in a
components/
directory.I also think we can put examples directly under the
examples/
directory instead ofexamples/pipelines
, as the example pipelines contain example components, and the reusable components in the futurecomponents/
directory can serve as examples as well.