
Reproducing a Hydra run from previous Hydra job configs #1805

Closed
Tracked by #2070
ashleve opened this issue Aug 31, 2021 · 20 comments
@ashleve

ashleve commented Aug 31, 2021

Hi,

Currently Hydra generates the following logs:
[screenshot: the run's generated log directory, including the .hydra config files]

When I want to reproduce the experiment, I can simply load the config.yaml:
python run.py --config-path /logs/runs/.../.hydra --config-name config.yaml

The problem is that config.yaml doesn't contain the Hydra output path configuration, which means that after running the line above, the logging path reverts to the default.

Is there a workaround for that?

Is it possible to additionally load hydra specific config from some other path with a simple oneliner?

Thanks

@omry
Collaborator

omry commented Aug 31, 2021

This is not an officially supported workflow.

Try overriding the run path as well from the command line.

python run.py --config-path /logs/runs/.../.hydra --config-name config hydra.run.dir=/logs/runs/.../ 

@ashleve ashleve closed this as completed Aug 31, 2021
@Yevgnen

Yevgnen commented Feb 10, 2022

@omry

Hi, how can I load the config with properly resolved placeholders like ${abc.xyz} in scripts? Can I do this without wrapping another main function? I'm not trying to re-run, but to use the saved output. To instantiate some classes related to the output, I'd like to read the parameters saved in .hydra/config.yaml, but some of them are placeholders.

@Jasha10
Collaborator

Jasha10 commented Feb 10, 2022

@Yevgnen see OmegaConf.load.

@Yevgnen

Yevgnen commented Feb 18, 2022

config/config.yaml

work_dir: ${hydra:runtime.cwd}
when: ${now:%Y-%m-%d}/${now:%H-%M-%S}

main.py

# -*- coding: utf-8 -*-

import hydra
from omegaconf import DictConfig


@hydra.main(config_path="config", config_name="config")
def main(cfg: DictConfig) -> None:
    print(cfg.work_dir)
    print(cfg.when)


if __name__ == "__main__":
    main()

@Jasha10 Hi, I tried to load the .hydra/config.yaml of a run with OmegaConf.load, but got the following error when accessing config.work_dir:

UnsupportedInterpolationType: Unsupported interpolation type hydra
    full_key: work_dir
    object_type=dict

And I don't see any information related to the interpolated values of work_dir and when in the .hydra/config.yaml, .hydra/hydra.yaml, or .hydra/overrides.yaml of that run.

@jieru-hu
Contributor

@Yevgnen thank you for trying it out and for the repro code. Could you provide a bit more detail on what you tried with OmegaConf.load so we can repro? Many thanks!

@jieru-hu jieru-hu reopened this Feb 25, 2022
@jieru-hu jieru-hu changed the title Reproducing the run Reproducing a Hydra run from previous Hydra job configs Feb 25, 2022
@jieru-hu
Contributor

Reopening; we should look into this and at least provide some documentation on the approach.

@Jasha10
Collaborator

Jasha10 commented Mar 2, 2022

I've had success reproducing the config by using the saved outputs/.../.hydra/overrides.yaml file. Specifically, suppose I run my_app.py as follows:

$ python my_app.py a=b +c/d=e

This produces a file outputs/.../.hydra/overrides.yaml with the following contents:

- a=b
- +c/d=e

This list of overrides can be used to reproduce the config. The following recipe has worked for me:

import hydra
from omegaconf import OmegaConf

with hydra.initialize(config_path=config_path):  # same config_path as used by @hydra.main
    recomposed_config = hydra.compose(
        config_name="main",  # same config_name as used by @hydra.main
        overrides=OmegaConf.load("outputs/.../.hydra/overrides.yaml"),
    )

There are a few caveats here:

  • Currently the ${hydra:...} resolver cannot be used outside the context of @hydra.main, so compose does not play well with configs that rely on it.
  • If your config uses non-deterministic resolvers, e.g. to generate a random number or get the current time of day, those resolvers will be run anew (and will probably give different values from the original run). If you need to re-use the values from the initial run, you could try something like this:
    config = OmegaConf.merge(recomposed_config, OmegaConf.load("outputs/.../.hydra/config.yaml"))

    EDIT: I think that config = OmegaConf.merge(recomposed_config, OmegaConf.load("outputs/.../.hydra/config.yaml")) will not work because .hydra/config.yaml may contain unresolved interpolations/OmegaConf resolvers. A yaml file containing resolved values could be useful here, as @jieru-hu pointed out below.

@jieru-hu
Contributor

jieru-hu commented Mar 4, 2022

How about we just save the final composed config of everything related to a job in a separate YAML file (everything, including the Hydra and job configs, fully resolved, with no OmegaConf syntax in the YAML)? We could also provide an experimental CLI option for rerunning an application from that finalized composed config. We would only support single runs for now and see how it goes.

Saving the finalized YAML would also help with debugging; I've heard from users that the saved YAML files have OmegaConf resolvers in them, which sometimes makes it hard to figure out what's going on.

@Jasha10
Collaborator

Jasha10 commented Mar 5, 2022

Saving a resolved yaml config (e.g. outputs/.../.hydra/resolved.yaml) could certainly be useful! I've just edited the end of my previous comment since this made me realize that loading .hydra/config.yaml is not good enough to reconstruct the resolved values.

That being said, creating resolved.yaml could be expensive, e.g. if the user has defined custom resolvers to do some complicated calculation. For this reason, I think it might make sense to have automatic saving of resolved.yaml be an opt-in feature (or at least have the option to disable saving resolved.yaml).

Another potential complication is that some application use-cases could rely on interpolations or OmegaConf resolvers being unresolved, e.g. if the user app wants to call these resolvers at runtime.

For re-running the application, more information is needed than just what is in resolved.yaml. The resolved yaml config will not contain typing information. Info about structured config types or enum types cannot be reconstructed from the resolved.yaml file. This is why the compose+overrides.yaml approach is useful: it produces a typed config.

@jieru-hu
Contributor

Thanks @Jasha10, those are great points! I agree that eagerly resolving all configs is not the best way to go, especially when user applications also play a role here.
Arguably, we should be able to repro a run if the task_cfg and hydra_cfg passed into the task_function stay the same; the rest is left up to the user application. I think one lightweight way of enabling this could be pickling JobReturn both before and after the run of the user application.
Right now JobReturn (which can be very helpful) is not saved anywhere. I think we can make this opt-in by adding an experimental Callback which dumps the JobReturn for each run.
The second part of this would be providing an experimental CLI option that takes in a JobReturn pickle and runs the application as a single run.
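A rough, Hydra-agnostic sketch of what such a callback could look like (the hook name mirrors Hydra's experimental Callback interface, but the output_dir parameter and file name are assumptions for illustration):

```python
import pickle
from pathlib import Path
from tempfile import mkdtemp


class PickleConfigCallback:
    """Sketch of the proposed callback: dump the composed config at the
    end of each job so it can be reloaded later with pickle.load()."""

    def on_job_end(self, config, output_dir=".", **kwargs):
        # In Hydra, this hook would receive the composed config object;
        # here we just pickle whatever is passed in.
        out = Path(output_dir) / "config.pickle"
        with out.open("wb") as f:
            pickle.dump(config, f)
        return out


# Demo with a plain dict standing in for the composed config:
out_path = PickleConfigCallback().on_job_end({"lr": 0.1}, output_dir=mkdtemp())
```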

@Jasha10
Collaborator

Jasha10 commented Mar 14, 2022

I think that using a Callback to pickle the JobReturn (and later using a CLI option to load the pickle) sounds like a great idea.

I think one lightweight way of enabling this could be pickling JobReturn both before and after the run of user application.

To pickle before the run, we might need to change the signature of the on_job_start hook, since on_job_start currently does not accept a JobReturn argument.

@jieru-hu
Contributor

I think that using a Callback to pickle the JobReturn (and later using a CLI option to load the pickle) sounds like a great idea.

I think one lightweight way of enabling this could be pickling JobReturn both before and after the run of user application.

To pickle before the run, we might need to change the signature of the on_job_start hook, since on_job_start currently does not accept a JobReturn argument.

I think for now we can just pickle the config object instead of an incomplete JobReturn; that seems like it should be enough (both the task and Hydra configs are included in the config object).

@jieru-hu
Contributor

jieru-hu commented Mar 17, 2022

Hi @ashleve @Yevgnen
I just created #2098 to address this. It is not a perfect solution but could be useful for what you are trying to achieve.
If you are interested, here is the documentation on how to use this particular experimental CLI option.

@addisonklinke

I see three potential approaches suggested in this thread:

  1. @ashleve: use the built-in Hydra --config-path and --config-name flags to point the script to an old config [comment]
  2. @Jasha10: use the Compose API to merge config and overrides [comment]
  3. @jieru-hu: use the newly added --experimental-rerun flag [comment]

I have some questions to better understand the differences between these approaches

  1. Do approaches 1 and 2 end up generating the same config object?
  2. If the above answer is yes, should approach 1 be preferred whenever @hydra.main is applicable? I say that based on this docs recommendation

Please avoid using the Compose API in cases where @hydra.main() can be used. Doing so forfeits many of the benefits of Hydra (e.g., Tab completion, Multirun, Working directory management, Logging management and more)

  3. Is approach 3 only necessary if you want to preserve the original output directory and append to the old log file? Are there other advantages over approaches 1 or 2 that I'm missing?

Thanks everyone for the clarification!

@Jasha10
Collaborator

Jasha10 commented Nov 8, 2022

Do approaches 1 and 2 end up generating the same config object?

Approach 1 involves a round-trip through yaml, which does not always capture the original config in full fidelity. Meanwhile, the approach 2 config should be essentially identical to the original config.

To see some of the differences, you can experiment with round-tripping a config through yaml as follows:

approach1 = OmegaConf.create(OmegaConf.to_yaml(original_config))  # round-trip through yaml

Approach 1 will produce an untyped config (i.e. a config that is not structured). This means you'll miss out on the runtime type-safety that comes with structured configs. Also, you'll see a difference when calling routines such as OmegaConf.to_object (since OmegaConf.to_object treats structured configs differently from unstructured configs). Another pitfall with the round-trip through yaml comes up if you use Enum values in your config:

>>> from enum import Enum
>>> from omegaconf import OmegaConf
>>> class MyEnum(Enum):
...     foo = 1
...
>>> original_config = OmegaConf.create({"key": MyEnum.foo})
>>> assert isinstance(original_config.key, MyEnum)
>>> approach1 = OmegaConf.create(OmegaConf.to_yaml(original_config))
>>> assert approach1.key == "foo"
>>> assert isinstance(approach1.key, str)
>>> assert not isinstance(approach1.key, MyEnum)  # approach 1 doesn't preserve enum values

If the above answer is yes, should approach 1 be preferred whenever @hydra.main is applicable?

If you use structured configs or enum values, I'd recommend against approach 1.

Another difference between approaches 1 and 2 is that, if you refactor your python dataclasses or change your input yaml files, the result of approach 2 will change accordingly. Approach 2 reconstructs the config using Hydra's compose process; the output of that process can change if your dataclasses or your input yaml files change. Meanwhile, the result of approach 1 is always the same (as it depends only on the static config.yaml file that was the output of your previous experiment).


I say that based on this docs recommendation

The docs do recommend using @hydra.main over compose if possible, since @hydra.main can do several things that compose cannot (including sweeping, launching, etc).

Actually, approach 2 can be adapted to work with @hydra.main if you don't mind monkeying with sys.argv:

import sys

import hydra
from omegaconf import OmegaConf

# approach2 adapted to @hydra.main:
overrides: list[str] = list(OmegaConf.load("outputs/.../.hydra/overrides.yaml"))
sys.argv += overrides

@hydra.main(config_path=..., config_name="main")  # use same config_path and config_name as you used in the original run
def my_app(cfg):
    ...

I wonder if it would make sense for @hydra.main to expose an overrides parameter so that we could avoid this sys.argv hack...


Is approach 3 only necessary if you want to preserve the original output directory and append to the old log file? Are there other advantages over approaches 1 or 2 that I'm missing?

Yes, using the --experimental-rerun flag is only necessary if you want to preserve the original output directory and append to the old log file.

There is also a fourth approach that combines a few of these ideas: you can use PickleJobInfoCallback to produce a config.pickle file, and then directly call pickle.load(open("outputs/.../.hydra/config.pickle", "rb")) to load that pickled config later. This fourth approach circumvents the need to call @hydra.main or compose.
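A self-contained sketch of this fourth approach (the pickle is fabricated inline so the snippet runs on its own; in practice the file would come from PickleJobInfoCallback under the run's .hydra directory):

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    # Stand-in for the config.pickle written during a previous run;
    # a plain dict is used here instead of a real composed config.
    pkl = Path(tmp) / "config.pickle"
    pkl.write_bytes(pickle.dumps({"model": {"lr": 0.01}, "seed": 42}))

    # Approach 4: load the pickled config directly, no @hydra.main or compose.
    with pkl.open("rb") as f:
        cfg = pickle.load(f)

print(cfg["model"]["lr"])  # -> 0.01
```

Unlike the yaml round-trip, unpickling preserves the original object types.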


In summary, here are the four discussed approaches for re-animating a config from a previous experiment:

Approach 1: Run Hydra with config_name="outputs/../.hydra/config.yaml"

This approach comes from the OP.

  • First, run your experiment, which should produce some log files including an outputs/.../.hydra/config.yaml file.
  • You can later point to this config.yaml to re-run experiments using the --config-name and --config-path flags:
python run.py --config-path /outputs/.../.hydra --config-name config.yaml

Pros:

  • easy to set up

Cons:

  • produces an untyped (non-structured) config

Approach 1 always produces the same config even if you refactor your application / input yaml files. This may or may not be desirable.

Approach 2: Recompose the config with overrides="outputs/../.hydra/overrides.yaml"

This approach comes from my comment above.

  • First, run your experiment, which should produce some log files including an outputs/.../.hydra/overrides.yaml file.
  • You can then re-run the experiment by loading overrides.yaml and using those overrides as input to compose or @hydra.main.

Pros:

  • reconstructs structured configs

Cons:

  • use with @hydra.main requires monkeying with sys.argv.

The config produced by approach 2 will change if you refactor your dataclasses or your input yaml files. This may or may not be desirable.

Approach 3: PickleJobInfoCallback + --experimental-rerun

This approach is only necessary if you want to re-use the same output directory and append to the log file from your previous experiment.

Approach 4: PickleJobInfoCallback + pickle.load(open("config.pickle", "rb")).

This approach avoids using @hydra.main and compose.

@addisonklinke

Approach 1 involves a round-trip through yaml, which does not always capture the original config in full fidelity

Thanks for clarifying this subtle difference - that makes a lot of sense

Approach 1 always produces the same config even if you refactor your application / input yaml files. This may or may not be desirable.

I'll have to think more about whether this is desirable for my ML experiments. Typically, I would plan to check out the git commit of the original run before reproducing, in which case there couldn't be any changes to the application code.

I wonder if it would make sense for @hydra.main to expose an overrides parameter so that we could avoid this sys.argv hack...

I agree it could definitely be convenient to officially support that pattern. Worth opening a new issue?

@Jasha10
Collaborator

Jasha10 commented Nov 8, 2022

I agree it could definitely be convenient to officially support that pattern. Worth opening a new issue?

Yes, that would be great, thanks!

@npuichigo

@Jasha10 For Approach 2, how can we pass the overrides path from the command line, since we may want the overrides to be dynamic?

@asimukaye

@Jasha10 For Approach 2, how can we pass the overrides path from the command line, since we may want the overrides to be dynamic?

@npuichigo I haven't tried it, but #1022 seems relevant to your question and might help.
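One possible pattern (a sketch, not an official Hydra feature; the minimal parser below assumes the flat list format of .hydra/overrides.yaml, and OmegaConf.load would work just as well) is to take the run directory as a plain argv argument and hand the loaded list to compose:

```python
import sys
from pathlib import Path


def load_overrides(path: str) -> list[str]:
    # .hydra/overrides.yaml is a flat YAML list of strings ("- a=b" per
    # line), so a minimal line-based parse is enough for this sketch.
    return [
        line.strip()[2:]
        for line in Path(path).read_text().splitlines()
        if line.strip().startswith("- ")
    ]


# Hypothetical usage: pass the run directory as an argv argument, then
# hand the loaded list to hydra.compose (or append it to sys.argv for
# @hydra.main, as in the earlier comment):
#
#   overrides = load_overrides(f"{sys.argv[1]}/.hydra/overrides.yaml")
#   cfg = hydra.compose(config_name="main", overrides=overrides)
```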

@bkmi

bkmi commented Jan 2, 2024

The compose + overrides approach looks OK for my purposes, but is there any way to also include the ${hydra:...} resolvers in it at the current time?
