start: intro to experiment checkpoints #2518
I wrote a first draft of the document in #2528. I think we discussed this; at least, I mentioned it as my plan during last week's meeting.
These are the different ways of using checkpoints, and there are also tags for them in the get-started-checkpoints repository. DVCLive is covered in the UG document, but the other three ways are not covered in detail. For the GS level, this basic material should be OK. We may need to update the code in the UG document to conform to…
@iesahin There was an issue in #2292, which was why the…
I think your comment in #2292 is still valid, @dberenbaum, and…
Also, I'm not sure that a typical user will need more than one checkpoint in the pipeline. At the end, I'll add a note along the lines of "if you want to use checkpoints in such-and-such a way, you can do so with other methods": (1) DVCLive provides automated metrics tracking, (2) you can save arbitrary checkpoints with… It might add confusion, but I think the other methods are too much for introductory material. I wouldn't add…
My concern would be that…
Does it require making your code run one (or some number of) epochs at a time?
Ah, yes, that may be necessary if the code has no way to resume training where it left off. Nevertheless, I think this is easier to explain than…
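For reference, a "basic" checkpoint script that resumes where training left off might look like the minimal sketch below. It assumes the checkpoint output (here a hypothetical `model.json`) is left in place between `dvc exp run` invocations, so each invocation loads the previous state, trains exactly one more epoch, and saves it again. The one-weight toy "model", the file name, and all helper names are illustrative only, not part of DVC.

```python
import json
import os

# Hypothetical checkpoint path; in a DVC project this would be the file
# declared as a checkpoint output (e.g. via `dvc stage add -c model ...`).
CHECKPOINT_PATH = "model.json"


def load_state(path):
    """Resume from the last saved state if the checkpoint file exists."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    # No checkpoint yet: start training from scratch.
    return {"epoch": 0, "weight": 0.0}


def train_one_epoch(state, lr=0.1, target=1.0):
    """Toy 'training': one gradient step on 0.5 * (weight - target) ** 2."""
    gradient = state["weight"] - target
    return {"epoch": state["epoch"] + 1, "weight": state["weight"] - lr * gradient}


def save_state(path, state):
    # Write to a temporary file, then rename it over the checkpoint, so an
    # interrupted run never leaves a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def main(path=CHECKPOINT_PATH):
    # Each invocation = load previous state, train one epoch, save.
    state = load_state(path)
    state = train_one_epoch(state)
    save_state(path, state)
    return state


if __name__ == "__main__":
    print(main())
```

Running it twice produces epoch 1 and then resumes from that state to produce epoch 2. The temp-file-and-rename write is a design choice so that a run killed mid-save does not corrupt the checkpoint the next run resumes from.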
@dberenbaum, do you think we still want to introduce checkpoints at the Get Started level? To me the feature doesn't sound like it is at that level of maturity, but I'm not sure. If the answer is no, feel free to close this. Thanks!
Some comments for the record (maybe we can address these points at least):
Indeed, signal files are barely mentioned in https://dvc.org/doc/user-guide/experiment-management/running-experiments#checkpoint-experiments, not even mentioned in https://dvc.org/doc/user-guide/experiment-management/checkpoints, and mentioned a bit more (but still not explained) in https://dvc.org/doc/dvclive/dvclive-with-dvc#dvclive-with-dvc. Is this something we want to document going forward, though?
I have not seen anyone ask about language-agnostic checkpoints, so I wouldn't prioritize "signal files" until someone does.
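For the record, the signal-file handshake could be sketched as below. This is based on our reading of DVC 2.x's `dvc.api.make_checkpoint()` and should be verified against the DVC source: the assumptions are that DVC sets a `DVC_CHECKPOINT` environment variable when running a checkpoint stage, that the program requests a checkpoint by creating an empty `.dvc/tmp/DVC_CHECKPOINT` file under the repo root (taken from `DVC_ROOT` if set), and that DVC deletes that file once the checkpoint is recorded. The helper names `signal_checkpoint`, `default_signal_file`, and `maybe_checkpoint` are hypothetical. Because it is just "create a file, wait for it to disappear", any language could implement the same thing.

```python
import os
import time


def signal_checkpoint(signal_file, poll_interval=0.1, timeout=None):
    """Create an empty signal file, then block until the runner deletes it.

    Returns True once the file is gone, or False if `timeout` seconds pass first.
    """
    os.makedirs(os.path.dirname(signal_file) or ".", exist_ok=True)
    with open(signal_file, "w", encoding="utf-8") as f:
        f.write("")
        f.flush()
        os.fsync(f.fileno())  # make sure the file really exists on disk
    deadline = None if timeout is None else time.monotonic() + timeout
    while os.path.exists(signal_file):
        if deadline is not None and time.monotonic() > deadline:
            return False
        time.sleep(poll_interval)
    return True


def default_signal_file():
    # Assumed location: <repo root>/.dvc/tmp/DVC_CHECKPOINT.
    root = os.getenv("DVC_ROOT", os.getcwd())
    return os.path.join(root, ".dvc", "tmp", "DVC_CHECKPOINT")


def maybe_checkpoint():
    # Assumed: DVC_CHECKPOINT is only set when running under a checkpoint
    # stage, so outside DVC this is a no-op.
    if os.getenv("DVC_CHECKPOINT") is None:
        return False
    return signal_checkpoint(default_signal_file())
```

A training loop would call `maybe_checkpoint()` at the end of each epoch, after saving the model file.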
The basic checkpoints (without `dvclive`, `make_checkpoint`, or `signal-file`) seem to be under-covered in the docs. We can add them to experiments using `dvc stage add -c model ...` or by editing `dvc.yaml`, and this is probably the easiest way to start with checkpoints. The Checkpoints Tutorial covers the `dvclive` usage. We also need documents for `signal-file` and `make_checkpoint`, but they may be considered advanced.

This is related to #2496
Related iterative/katacoda-scenarios#62