More tests on dataset state #1094
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for the huge contribution. In principle, all proposed changes seem sensible.
Thanks!
There are some commented-out lines; will those be removed?
I don't quite understand the advantage of having the processing graph in all job runners just to make one call to get the children, but maybe in the future it will be done by the orchestrator.
instead of hard coding them
relying on a test processing graph (a->b->c). Also: remove .as_dict() methods that are not used.
it's just .ancestors
in ProcessingStep
in ProcessingStep.
also, remove /admin/job_duration
also: renaming step -> processing_step nearly everywhere
also: small refactorings proposed by sourcery (inverting logic to avoid double negations...)
also add docstrings
it makes the code in services/api a lot simpler. Also: fix the CI.
Co-authored-by: Andrea Francis Soria Jimenez <andrea@huggingface.co>
Yes, the PR is too big, I should have broken it into some smaller parts... Thanks for reviewing it all (same @AndreaFrancis)!
Yes! I removed them. And part of them was due to me having forgotten to push the last commit (finishing the tests in tests/state). I'll let you review the last changes, before merging.
Yes, in the future (the PR is |
- The provides_dataset_config_names field indicates whether the step provides the list of config names for a dataset.
- The provides_config_split_names field indicates whether the step provides the list of split names for a config.
- The requires field has been renamed to triggered_by: indeed, the relation between steps is that, when step B is triggered_by step A, if A has been updated, a new job will be created for step B.
- Removed the /admin/jobs_duration endpoint, because the finished jobs are now deleted quickly, so this statistic does not mean much now.
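To make the triggered_by relation concrete, here is a minimal sketch on the a->b->c test graph mentioned in the commits. It is not the project's actual code: the SPEC dict layout and the helper names children, ancestors and jobs_to_create are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical specification: each step lists the steps that trigger it.
SPEC = {
    "a": {"triggered_by": []},
    "b": {"triggered_by": ["a"]},
    "c": {"triggered_by": ["b"]},
}


def children(spec, name):
    """Return the steps that are directly triggered by `name`."""
    return sorted(step for step, cfg in spec.items() if name in cfg["triggered_by"])


def ancestors(spec, name):
    """Return every step that `name` depends on, directly or transitively."""
    seen = set()
    queue = deque(spec[name]["triggered_by"])
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(spec[parent]["triggered_by"])
    return sorted(seen)


def jobs_to_create(spec, updated_step):
    """When `updated_step` has been updated, plan one new job per child step."""
    return [{"step": child, "triggered_by": updated_step} for child in children(spec, updated_step)]
```

With this layout, updating step a plans a single job for step b, and step c is only reached once b itself has been updated. Computing children by scanning the whole spec is fine for a small graph; a real implementation would probably precompute the reverse edges.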