Fails restoring weights #41508

Open
Finebouche opened this issue Nov 29, 2023 · 6 comments
Labels
bug (Something that is supposed to be working; but isn't) · P2 (Important issue, but not time-critical) · rllib (RLlib related issues) · tune (Tune-related issues)

Comments

@Finebouche

Finebouche commented Nov 29, 2023

What happened + What you expected to happen

The code of examples/restore_1_of_n_agents_from_checkpoint.py does not seem to be working (at least in my case).

The weights are not restored but re-initialized. The way I see this is that, instead of getting the same policy reward means (in Wandb) as before, I get re-initialized values.

Maybe the example is not up to date, or maybe I am doing something wrong here. I am using tune.Tuner().fit() and not tune.train() as in the example, but I am not sure why this would fail...

Versions / Dependencies

Python 3.10
Ray 2.8

Reproduction script

import os

import ray
from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.policy import Policy, PolicySpec
from ray.tune.registry import get_trainable_cls

config = (
    get_trainable_cls("PPO").get_default_config()
    ...
    ...
    .multi_agent(
        policies={
            "prey": PolicySpec(
                policy_class=None,  # infer automatically from Algorithm
                observation_space=env.observation_space[0],  # if None, infer automatically from env
                action_space=env.action_space[0],  # if None, infer automatically from env
                config={"gamma": 0.85},  # use main config plus <- this override here
            ),
            "predator": PolicySpec(
                policy_class=None,
                observation_space=env.observation_space[0],
                action_space=env.action_space[0],
                config={"gamma": 0.85},
            ),
        },
        policy_mapping_fn=lambda agent_id, *args, **kwargs: (
            "prey" if env.agents[agent_id].agent_type == 0 else "predator"
        ),
        policies_to_train=["prey", "predator"],
    )
)

path_to_checkpoint = "/blablabla/ray_results/PPO_2023-11-29_02-51-09/PPO_CustomEnvironment_c4c87_00000_0_2023-11-29_02-51-09/checkpoint_000008"

def restore_weights(path, policy_type):
    # Load a single policy from the "policies/<policy_id>" sub-directory of the
    # algorithm checkpoint and return its weights.
    checkpoint_path = os.path.join(path, f"policies/{policy_type}")
    restored_policy = Policy.from_checkpoint(checkpoint_path)
    return restored_policy.get_weights()

class RestoreWeightsCallback(DefaultCallbacks):
    def on_algorithm_init(self, *, algorithm: "Algorithm", **kwargs) -> None:
        # Right after the Algorithm is built, overwrite the freshly initialized
        # policy weights with the ones loaded from the checkpoint.
        algorithm.set_weights({"predator": restore_weights(path_to_checkpoint, "predator")})
        algorithm.set_weights({"prey": restore_weights(path_to_checkpoint, "prey")})

config.callbacks(RestoreWeightsCallback)

ray.init()

# Define experiment    
tuner = tune.Tuner(
    "PPO",                                  
    param_space=config,                         
    run_config=train.RunConfig(         
        stop={                                    
            "training_iteration": 1,
            "timesteps_total": 20000,
        },
        verbose=3,
        callbacks=[WandbLoggerCallback(       
            project="ppo_marl", 
            group="PPO",
            api_key="blabla",
            log_config=True,
        )],
        checkpoint_config=train.CheckpointConfig(        
            checkpoint_at_end=True,
            checkpoint_frequency=1
        ),
    ),
)

# Run the experiment 
results = tuner.fit()

ray.shutdown()

Issue Severity

High: It blocks me from completing my task.

@Finebouche added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage (eg: priority, bug/not-bug, and owning component)) labels Nov 29, 2023
@Finebouche changed the title from "[<Ray component: Core|RLlib|etc...>]" to "'RestoreWeightsCallback' example script" Nov 29, 2023
@Finebouche changed the title from "'RestoreWeightsCallback' example script" to "'RestoreWeightsCallback' example script doesn't seem to work anymore ?" Nov 29, 2023
@Finebouche changed the title from "'RestoreWeightsCallback' example script doesn't seem to work anymore ?" to "Fails restoring weights" Nov 29, 2023
@Finebouche

I should add that I checked that the checkpoint was correctly saved. If I use


path_to_checkpoint = "/blablabla/ray_results/PPO_2023-11-29_02-51-09/PPO_CustomEnvironment_c4c87_00000_0_2023-11-29_02-51-09/checkpoint_000008"

from ray.rllib.algorithms.algorithm import Algorithm

algo = Algorithm.from_checkpoint(path_to_checkpoint)

and then use algo.compute_single_action(), run the environment for several steps, and visualize the agents, I get the correct output.

It's really only when trying to keep training those previous policies with the method described above that it fails.
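A minimal sketch of that verification rollout, continuing from the snippet above. It assumes a gymnasium-style MultiAgentEnv with dict observations keyed by agent ID and reuses the env and the "prey"/"predator" mapping from the reproduction script; the loop details are illustrative.

# Sketch of the verification rollout: assumes `env` is a gymnasium-style
# MultiAgentEnv with dict observations keyed by agent ID.
obs, _ = env.reset()
done = False
while not done:
    actions = {
        agent_id: algo.compute_single_action(
            agent_obs,
            policy_id="prey" if env.agents[agent_id].agent_type == 0 else "predator",
        )
        for agent_id, agent_obs in obs.items()
    }
    obs, rewards, terminateds, truncateds, infos = env.step(actions)
    done = terminateds["__all__"] or truncateds["__all__"]

If the agents behave as in the original run, the checkpoint itself is fine, which matches the observation above.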

@Finebouche

Finebouche commented Nov 30, 2023

I feel like it might be due to me using tune.Tuner().fit() and not tune.run(). In this other example with train(), it seems to work for the person who tried it. Could it be that fit() reinitializes the weights? Can you actually prevent that?
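For reference, a sketch of what the tune.run() variant could look like with the same settings as the Tuner call in the reproduction script (an assumed translation, not a verified fix; the WandB callback is omitted for brevity):

# Sketch: tune.run() with the same stop criteria and checkpoint settings as the
# tune.Tuner() example above.
analysis = tune.run(
    "PPO",
    config=config.to_dict(),  # AlgorithmConfig -> plain config dict for tune.run
    stop={"training_iteration": 1, "timesteps_total": 20000},
    checkpoint_freq=1,
    checkpoint_at_end=True,
)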

@Finebouche

Related to #40777, #32751, #36761, #36830, #41290 and #40347, all about loading previously trained Models/Policies.

@Finebouche

The trick of passing the checkpoint via the "start_from_checkpoint" parameter to tune.Tuner()'s param_space, found here, doesn't work either :/

@Finebouche

Finebouche commented Dec 1, 2023

I was able to use tune.run() instead of tune.Tuner().fit(), but it still doesn't seem to work. The way I assess that is by visualizing an episode run in 3 environments:

  1. The initial one I want to retrieve
  2. An environment after the attempt to restore the weights
  3. An environment after one step

2. and 3. show similar behavior, different from 1.

A side problem is that tune.run is absent from the documentation, so I first thought it was being deprecated. I finally found the info I needed in the function's implementation in the repo, but that wasn't straightforward at all.

Questions still remain:

  1. Is tune.run absent from the docs because it's being deprecated?
  2. Why does weight retrieval still not work with tune.run or tune.Tuner().fit() + callbacks, while it does work with Algorithm.from_checkpoint(path_to_checkpoint)? (See the diagnostic sketch below.)
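A hypothetical diagnostic that could help narrow this down: log, inside the callback, whether set_weights() actually changes the freshly initialized weights. This check is a suggestion, not part of the original example; it reuses restore_weights() and path_to_checkpoint from the reproduction script above.

import numpy as np

class RestoreAndCheckCallback(DefaultCallbacks):
    def on_algorithm_init(self, *, algorithm, **kwargs) -> None:
        for policy_id in ["prey", "predator"]:
            # Compare the policy's weights before and after set_weights().
            before = algorithm.get_weights([policy_id])[policy_id]
            algorithm.set_weights(
                {policy_id: restore_weights(path_to_checkpoint, policy_id)}
            )
            after = algorithm.get_weights([policy_id])[policy_id]
            changed = any(
                not np.array_equal(before[k], after[k]) for k in before
            )
            print(f"{policy_id}: weights changed after restore -> {changed}")

If this prints False, the weights never get overwritten; if it prints True but the behavior still looks re-initialized, something later in the setup may be overwriting the restored weights again.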

@anyscalesam added the rllib (RLlib related issues) label Dec 2, 2023
@Finebouche

Finebouche commented Dec 11, 2023

Also linked to: #40626, #40777 and #37515.

The documentation should clearly explain how to do this.

@sven1977 added the P2 (Important issue, but not time-critical) label and removed the triage (Needs triage (eg: priority, bug/not-bug, and owning component)) label Dec 18, 2023
@justinvyu added the tune (Tune-related issues) label Jan 11, 2024