[Alpha pipeline] Add prompt generating component #84

Merged
merged 6 commits into from
May 10, 2023

Member

@RobbeSneyders RobbeSneyders commented May 8, 2023

Fixes #94

Split this from #71 so we can review this component by component.

This PR contains the first component, which generates captions to query LAION-5B with. This is also the only component that is specific to this example, so I would place all other components in a components/ directory.

I also think we can put examples directly under the examples/ directory instead of examples/pipelines, as the example pipelines contain example components, and the reusable components in the future components/ directory can serve as examples as well.

Member Author

@RobbeSneyders RobbeSneyders left a comment

Thanks @jamesbraniganml6 and @NielsRogge!

Left some comments.


df = dd.from_pandas(
    df, npartitions=1
)  # need to decide how npartitions should be set
Member Author

This should be based on the number of cores available and the data size.
https://stackoverflow.com/a/46646366/4098821

I think we should handle this in Fondant because:

  • This is the complexity we want to take out of the user's hands
  • The dask partitioning translates to parquet partitioning and so is also relevant for downstream components

We can do this after the user returns the dataframe by leveraging repartition, but the user would then still need to provide a placeholder value in the meantime.

Another option is that we provide some utility functionality for this.

CC: @GeorgesLorre
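
A minimal sketch of what such a utility could look like. The function name `suggest_npartitions`, the 100 MiB target partition size, and the one-partition-per-core floor are assumptions for illustration; none of them is Fondant's API or a decision taken in this thread.

```python
import math
import os
from typing import Optional


def suggest_npartitions(
    data_size_bytes: int,
    n_cores: Optional[int] = None,
    target_partition_bytes: int = 100 * 1024**2,  # assumed target: ~100 MiB
) -> int:
    """Hypothetical heuristic: enough partitions to keep every core busy,
    and enough to keep each partition near the target size."""
    if data_size_bytes < 0:
        raise ValueError("data_size_bytes must be non-negative")
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    by_size = math.ceil(data_size_bytes / target_partition_bytes)
    return max(1, n_cores, by_size)


# Small data on a 4-core machine: one partition per core.
print(suggest_npartitions(10 * 1024**2, n_cores=4))  # 4
# 1 GiB on the same machine: the partition count is driven by data size.
print(suggest_npartitions(1024**3, n_cores=4))  # 11
```

The result could then be passed to Dask's `DataFrame.repartition(npartitions=...)` once the user's component has returned its dataframe, so the user would not have to pick a value.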

Collaborator

I agree we should handle this in Fondant. The argument is optional, so I would just not use it here.

Member Author

@RobbeSneyders RobbeSneyders May 8, 2023

Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Looking good so far :) thanks!
Left a few comments.

logger = logging.getLogger(__name__)


interior_styles = [
Contributor

Those prompts currently seem to be hardcoded within the main function. They should be moved to a general config and passed as parameters to the Kubeflow component.

This way you avoid having to rebuild your images every time you want to change the prompt lists. It also enables experiment tracking, since the parameters that went into a given Kubeflow pipeline run are recorded.
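
As a rough sketch of this suggestion, the prompt lists can become function parameters rather than module-level constants. The names `generate_prompts` and `room_types`, the prompt template, and the example values are illustrative assumptions, not the component's actual code.

```python
import itertools
from typing import Iterable, List


def generate_prompts(
    interior_styles: Iterable[str],
    room_types: Iterable[str],
    template: str = "{style} {room} interior",  # hypothetical template
) -> List[str]:
    """Build one prompt per (style, room) combination from the given lists."""
    return [
        template.format(style=style, room=room)
        for style, room in itertools.product(interior_styles, room_types)
    ]


# The lists now come from the caller (e.g. pipeline arguments), so changing
# them does not require touching the component code.
prompts = generate_prompts(["art deco", "bohemian"], ["kitchen", "bedroom"])
print(prompts)
```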

Member Author

Then the question is whether we even need this component, or whether you should just pass your prompts to the LAION retriever directly.

Contributor

I would still create a Parquet file out of them, so that the retrieved images can potentially be linked to a given prompt based on ID.
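
One way to make that link possible, sketched under assumptions: give every prompt a deterministic ID and carry it along with each retrieved image. The field names `id`, `prompt` and `prompt_id` and the choice of a truncated SHA-256 hash are hypothetical, and writing the records to Parquet is left out to keep the sketch dependency-free.

```python
import hashlib
from typing import Dict, List


def prompt_id(prompt: str) -> str:
    """Deterministic ID: the same prompt always maps to the same ID."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]


def to_records(prompts: List[str]) -> List[Dict[str, str]]:
    """One record per prompt, ready to be written out as a table."""
    return [{"id": prompt_id(p), "prompt": p} for p in prompts]


records = to_records(["art deco kitchen", "bohemian bedroom"])

# A downstream component can store the prompt ID next to each image
# (the URL below is a placeholder) and join back on it later.
images = [{"url": "https://example.com/1.jpg", "prompt_id": records[0]["id"]}]
by_id = {r["id"]: r["prompt"] for r in records}
print(by_id[images[0]["prompt_id"]])  # art deco kitchen
```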

type: utf8

args:
  dataset_name:
Contributor

@PhilippeMoussalli PhilippeMoussalli May 8, 2023

I would add the prompt lists as arguments here, although we still need to make sure lists are parsed properly. I created a ticket for it: #85

Member Author

If we do want to add them as arguments, maybe file-based arguments make more sense for this data size?
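
A hedged sketch of what a file-based argument could look like: the component receives a path and reads the lists from a JSON file, which sidesteps parsing long lists from the command line. The flag name `--prompts-file` and the JSON layout are assumptions for illustration, not an existing Fondant argument.

```python
import argparse
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_prompt_lists(path: Path) -> Dict[str, List[str]]:
    """Read prompt lists from a JSON file mapping list name -> list of strings."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, values in data.items():
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"{name!r} must be a list of strings")
    return data


parser = argparse.ArgumentParser()
parser.add_argument("--prompts-file", type=Path, required=True)  # hypothetical flag

# Demo: write a small file and pass its path the way a pipeline would.
with tempfile.TemporaryDirectory() as tmp:
    prompts_file = Path(tmp) / "prompts.json"
    prompts_file.write_text(json.dumps({"interior_styles": ["art deco", "bohemian"]}))
    args = parser.parse_args(["--prompts-file", str(prompts_file)])
    prompt_lists = load_prompt_lists(args.prompts_file)

print(prompt_lists["interior_styles"])
```

Only the path travels through the pipeline arguments, so the size of the prompt lists no longer matters for argument parsing.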

Contributor

@NielsRogge NielsRogge left a comment

LGTM after remaining comments are addressed

@NielsRogge
Contributor

Also could you clarify:

> This is also the only component that is specific to this example, so I would place all other components in a components/ directory.

Normally all components of a given pipeline should be placed in a components folder, otherwise it would become hard to see which components a certain pipeline consists of. I thought the plan was to have native components directly available in the Fondant SDK, not in the examples folder?

@RobbeSneyders
Member Author

> LGTM after remaining comments are addressed

Can you pick this up @NielsRogge? I just split this PR from the other branch.

> Normally all components of a given pipeline should be placed in a components folder, otherwise it would become hard to see which components a certain pipeline consists of. I thought the plan was to have native components directly available in the Fondant SDK, not in the examples folder?

Indeed, I would add all other components as native / reusable components. So we should get the following directory structure:

- examples
  |- controlnet-interior-design
     |- components
        |- generate_prompts
     |- pipeline.py
- components
  |- prompt-based-LAION-retriever
  |- image-downloader
  |- image-resolution-filter
  ...

And the pipeline should use the reusable components.

Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Thanks! Left a few open comments to address

@NielsRogge NielsRogge changed the title from "Add prompt generating component for interior design controlnet example" to "[Alpha pipeline] Add prompt generating component" May 9, 2023
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Thanks for adding this, Niels 👍! One small comment, but otherwise good to go.

Member Author

@RobbeSneyders RobbeSneyders left a comment

Thanks @NielsRogge!
LGTM (Can't approve my own PR)

fondant/component_spec.py (outdated, resolved)
j-branigan and others added 5 commits May 10, 2023 09:52
Co-authored-by: Niels Rogge <niels.rogge@ml6.eu>
Co-authored-by: Robbe Sneyders <robbe.sneyders@gmail.com>
@NielsRogge NielsRogge merged commit cbfd34a into main May 10, 2023
@RobbeSneyders RobbeSneyders deleted the feature/interior-prompts branch May 15, 2023 16:28
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
Fixes #84

---------

Co-authored-by: James Branigan <james.branigan@ml6.eu>
Co-authored-by: Niels Rogge <niels.rogge@ml6.eu>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Development

Successfully merging this pull request may close these issues.

Prompt generating component
5 participants