Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use evolve in Component class #56

Merged
merged 27 commits into from
May 3, 2023
Merged

Use evolve in Component class #56

merged 27 commits into from
May 3, 2023

Conversation

NielsRogge
Copy link
Contributor

@NielsRogge NielsRogge commented Apr 28, 2023

This PR uses the evolve functionality (which updates the manifest based on a component spec) in the Component class.

Fixes #45

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @NielsRogge.

I would propose a different approach for the Dataset class, see my comments.

fondant/test.py Outdated Show resolved Hide resolved
fondant/component.py Outdated Show resolved Hide resolved
fondant/component.py Outdated Show resolved Hide resolved
fondant/test.py Outdated Show resolved Hide resolved
fondant/manifest.py Outdated Show resolved Hide resolved
@GeorgesLorre
Copy link
Collaborator

I see some potential conflict with the join and merge subset PR's, how would you advice we proceed ?

@NielsRogge
Copy link
Contributor Author

@GeorgesLorre this PR can be merged after the join and merge PRs

1 similar comment
@NielsRogge
Copy link
Contributor Author

@GeorgesLorre this PR can be merged after the join and merge PRs

evolved_manifest = self.copy()

# Update `component_id` of the metadata
component_id = component_spec.name.lower().replace(" ", "_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we take this from the image field directly?

Copy link
Contributor Author

@NielsRogge NielsRogge May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the image field currently isn't used, and should point to a URL like ghcr.io/ml6team/image_filtering:latest, right?

In that case, we could split the string to get the component ID from that. For now I just used the name property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it to build the kubeflow_component.yaml example the FondantSpec so it should be properly defined for each component and pointing to a valid url.
Splitting it should be easy though so would rather do it now or add a TODO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What benefit would using the image provide over using the name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could include the docker tag of the images that you've used to run a certain a component in the manifest which can be used for versioning. (Figuring out what component version generated which manifest)

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @NielsRogge! Some minor comments left.

fondant/component.py Outdated Show resolved Hide resolved
dataset = FondantDataset(manifest)
dataset = self._process_dataset(dataset)
input_manifest = self._load_or_create_manifest()
dataset = FondantDataset(input_manifest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this dataset is used by the LoadComponent. We might be able to restructure a bit so we don't create it in this case.

fondant/component.py Show resolved Hide resolved
tests/test_manifest_evolution.py Outdated Show resolved Hide resolved
fondant/test.py Outdated Show resolved Hide resolved
fondant/manifest.py Show resolved Hide resolved
fondant/component.py Outdated Show resolved Hide resolved
@@ -74,27 +74,37 @@ def _load_or_create_manifest(self) -> Manifest:
"""Abstract method that returns the dataset manifest"""

@abstractmethod
def _process_dataset(self, dataset: FondantDataset) -> FondantDataset:
def _process_dataset(self, dataset: FondantDataset) -> dd.DataFrame:
"""Abstract method that processes the input dataframe and updates the Fondant Dataset
with the new or loaded subsets"""

def run(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a long descriptive docstring describing in-details what the run() method is responsible for. Alternatively, you can include this doc at the class definition level (FondantComponent(ABC)) to provide a more comprehensive explanation of the component's functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we might need to refactor the run and process_dataset methods in a follow-up PR as the load component doesn't use the dataset argument.

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this up Niels! Left a few comments mainly related to docstrings but the overall logic of handling the manifest evolve and component spec in the Component seems to be valid.

fondant/component.py Outdated Show resolved Hide resolved
fondant/component.py Outdated Show resolved Hide resolved
fondant/test.py Outdated Show resolved Hide resolved
@NielsRogge
Copy link
Contributor Author

Feel free to approve the PR, I've addressed all comments.

Copy link
Collaborator

@GeorgesLorre GeorgesLorre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work

@PhilippeMoussalli PhilippeMoussalli changed the title Use evolve in Dataset Use evolve in Component class May 3, 2023
@NielsRogge NielsRogge mentioned this pull request May 3, 2023
@RobbeSneyders RobbeSneyders merged commit d20ff39 into main May 3, 2023
@RobbeSneyders RobbeSneyders deleted the evolve_manifest branch May 4, 2023 07:34
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
This PR uses the `evolve` functionality (which updates the manifest
based on a component spec) in the `Component` class.

Fixes #45

---------

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use manifest evolution in Dataset
4 participants