Pass processing step to worker #779

Merged: 4 commits, merged Feb 7, 2023


severo (Collaborator) commented Feb 7, 2023

No description provided.

The worker factory creates more than just dataset-based workers, so it
should not be named DatasetsBasedWorkerFactory. Also, to reduce the
amount of code, the processing step is now passed to the worker when it
is created, instead of being retrieved afterwards.
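A minimal sketch of the second change (injecting the processing step at construction). It assumes simplified stand-in types: `ProcessingStep`, `Worker`, `job_info`, `endpoint`, `job_type` and `process()` are hypothetical illustrations, not the actual classes of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingStep:
    """Hypothetical stand-in for one step (node) of the processing graph."""

    endpoint: str  # illustrative value, e.g. "/splits"
    job_type: str


class Worker:
    """Hypothetical worker that receives its processing step at creation."""

    def __init__(self, job_info: dict, processing_step: ProcessingStep) -> None:
        self.job_info = job_info
        # The step is injected once, here, so the worker never has to look it
        # up afterwards (for example from the app config).
        self.processing_step = processing_step

    def process(self) -> str:
        # Placeholder for the real work: just report what would be computed.
        return f"processing {self.job_info['dataset']} for {self.processing_step.endpoint}"
```

For example, `Worker({"dataset": "squad"}, ProcessingStep(endpoint="/splits", job_type="/splits")).process()` returns `"processing squad for /splits"`. Constructor injection keeps the worker free of any lookup logic, which is where the code reduction comes from.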

HuggingFaceDocBuilder (Collaborator) commented Feb 7, 2023

The documentation is not available anymore as the PR was closed or merged.

severo (Collaborator, Author) commented Feb 7, 2023

The CI error is being addressed in #778. We will have to wait for that fix before merging this PR.

albertvillanova (Member) reviewed

Awesome!!! Thanks!

It definitely makes much more sense to define WorkerLoopConfig within workers/, where WorkerLoop is defined as well.

Also, as I commented in another PR (#778 (comment)):

Maybe, we could use BaseWorkerFactory to define the interface, and rename DatasetBasedWorkerFactory to WorkerFactory...

I think it is much clearer this way.

And I think introducing the processing_graph attribute in WorkerFactory will make it more flexible and help it generalize to more complex processing.
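The interface/implementation split suggested above could look roughly like this. It is a minimal sketch, not the code merged in this PR: `BaseWorkerFactory`, `WorkerFactory`, `AppConfig`, `ProcessingGraph`, `app_config` and `processing_graph` come from the discussion, but every field and method shown (`steps`, `get_step_by_job_type`, `create_worker`, `cache_url`, the `job_info` keys) is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessingStep:
    job_type: str


@dataclass
class ProcessingGraph:
    """Hypothetical graph: maps a job type to its processing step."""

    steps: dict[str, ProcessingStep] = field(default_factory=dict)

    def get_step_by_job_type(self, job_type: str) -> ProcessingStep:
        return self.steps[job_type]


@dataclass
class AppConfig:
    """Hypothetical app config; it no longer carries the processing graph."""

    cache_url: str = "mongodb://localhost"


class Worker:
    def __init__(self, job_info: dict, app_config: AppConfig, processing_step: ProcessingStep) -> None:
        self.job_info = job_info
        self.app_config = app_config
        self.processing_step = processing_step


class BaseWorkerFactory(ABC):
    """Interface: turn a job description into a worker."""

    @abstractmethod
    def create_worker(self, job_info: dict) -> Worker: ...


class WorkerFactory(BaseWorkerFactory):
    """Concrete factory, no longer restricted to dataset-based workers."""

    def __init__(self, app_config: AppConfig, processing_graph: ProcessingGraph) -> None:
        self.app_config = app_config
        # Held separately from app_config, so the graph is not passed twice.
        self.processing_graph = processing_graph

    def create_worker(self, job_info: dict) -> Worker:
        # Resolve the step from the graph once, then inject it into the worker.
        step = self.processing_graph.get_step_by_job_type(job_info["type"])
        return Worker(job_info=job_info, app_config=self.app_config, processing_step=step)
```

Because `BaseWorkerFactory` only declares `create_worker`, it cannot be instantiated directly; callers depend on the interface while `WorkerFactory` owns the graph lookup.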

Just some questions below.

self.app_config = app_config
self.processing_graph = processing_graph
albertvillanova (Member):

There is one thing I don't understand: is processing_graph still part of app_config? If so, aren't you passing it here twice (separately and within app_config)?

albertvillanova (Member):

I mean, if you are trying to decouple them, shouldn't ProcessingGraph be removed from AppConfig (in workers/datasets_based/src/datasets_based/config.py)?

severo (Collaborator, Author):

Oh, you're right.

This PR is a part I extracted from a bigger PR in which I'm moving processing_graph out of app_config, and I forgot to remove it here. I'll fix this now, for the sake of coherence.

lhoestq (Member) reviewed

better this way, thanks :)

severo (Collaborator, Author) commented Feb 7, 2023

Look @albertvillanova, I merged; I didn't rebase :P. Following your best practice!

[Screenshot, 2023-02-07 at 13:28:11]

severo merged commit 769bc3a into main on Feb 7, 2023
severo deleted the pass-processing-step-to-worker branch on February 7, 2023, 12:35

Labels: none
Projects: none
Linked issues: none
5 participants