
[tune] False Checkpoint Warning with tune.with_parameters() #13998

Closed
RaedShabbir opened this issue Feb 8, 2021 · 2 comments · Fixed by #14306
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks tune Tune-related issues

Comments

@RaedShabbir (Contributor)

What is the problem?

Detected at https://discuss.ray.io/t/tune-performance-bottlenecks/520/3

False warning:
2021-02-04 18:13:22,924 WARNING function_runner.py:541 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be func(config, checkpoint_dir=None).

However, I suspect this warning is faulty. I manually verified that checkpoints had in fact been saved; my call to tune had more parameters passed in after checkpoint_dir=None.
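For reference, the signature shape the warning message asks for can be sketched as follows (`train_fn` is a generic placeholder name, not code from the report):

```python
import inspect

# The trainable signature Tune's warning refers to: a `checkpoint_dir`
# keyword argument (defaulting to None) alongside `config`.
def train_fn(config, checkpoint_dir=None):
    if checkpoint_dir:
        pass  # restore trial state from checkpoint_dir here
    # ... training loop, periodically writing checkpoints ...

# A signature check like Tune's detector would find the parameter:
print("checkpoint_dir" in inspect.signature(train_fn).parameters)
```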

Ray version and other system information (Python version, TensorFlow version, OS):
ray latest

@RaedShabbir RaedShabbir added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Feb 8, 2021
@richardliaw richardliaw changed the title False Checkpoint Warning with tune.with_parameters() [tune] [tune] False Checkpoint Warning with tune.with_parameters() Feb 9, 2021
@richardliaw richardliaw added P1 Issue that should be fixed within a few weeks tune Tune-related issues and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Feb 9, 2021
@krfricke (Contributor)

Hi @RaedShabbir, looking at the discussion I am wondering what models.trainers.train_ptl_checkpoint looks like. Can you share some context here?

Also, you are binding the checkpoint_dir parameter in tune.with_parameters:

        tune.with_parameters(
            models.trainers.train_ptl_checkpoint,
            checkpoint_dir=model_config["checkpoint_dir"],  # None
            model_config=model_config,  # model-specific parameters
            num_epochs=num_epochs,
            num_gpus=gpus_per_trial,
            report_on=report_on,  # reporting frequency
            checkpoint_on=report_on,  # checkpointing frequency, if different from reporting freq
        ),

This makes it look to the function signature detector like there is no checkpoint_dir parameter anymore, since you've already assigned it. If you remove that line, it might already work.
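As a toy illustration of the effect described above (this is a hypothetical sketch, not Ray's actual implementation): a wrapper that pre-binds keyword arguments and exposes only `config` hides `checkpoint_dir` from a signature-based check.

```python
import inspect

def bind(fn, **bound):
    """Naive binding wrapper: pre-supplies kwargs and exposes only `config`,
    mimicking how pre-binding checkpoint_dir hides it from a signature check."""
    def wrapper(config):
        return fn(config, **bound)
    return wrapper

def train_fn(config, checkpoint_dir=None):
    pass  # placeholder trainable

def supports_checkpointing(fn):
    # Sketch of a detector that looks for a checkpoint_dir parameter.
    return "checkpoint_dir" in inspect.signature(fn).parameters

print(supports_checkpointing(train_fn))                             # True
print(supports_checkpointing(bind(train_fn, checkpoint_dir=None)))  # False
```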

@krfricke (Contributor)

This should probably be fixed by #14306
