
[rllib][tune] How to save and later use the agent/model #7983

Closed
Rockyyost opened this issue Apr 11, 2020 · 6 comments
Labels
question Just a question :)

Comments

@Rockyyost

What is your question?

I've successfully used Tune to train an RL model with checkpoints. I'd like to be able to save the model so that I can use it later for inference. For example, I will use an API to bring in data that is effectively my observations, and I'd like the saved model to give me the actions.

I've been reading through the documentation and I think I'm starting to build the intuition on how to do this, but I feel like I'm not quite there yet.

Any help will be greatly appreciated!

Rockyyost added the question label on Apr 11, 2020
@Carlz182

What I am using at the moment is a combination of the tune.run call and a customizable trainer function that specifies how often checkpoints are saved, implements a training curriculum, and so on.

from ray.rllib.agents.ppo import PPOTrainer  # RLlib PPO trainer (import path for older Ray versions)

def train_ppo(config, reporter):
    agent = PPOTrainer(config)
    agent.restore("/path/checkpoint_41/checkpoint-41")  # continue training from a checkpoint
    # training curriculum, start with phase 0
    phase = 0
    agent.workers.foreach_worker(
        lambda ev: ev.foreach_env(
            lambda env: env.set_phase(phase)))
    i = 0
    while True:
        result = agent.train()
        if reporter is not None:  # reporter is None when calling this function directly
            reporter(**result)
        if i % 10 == 0:  # save every 10th training iteration
            checkpoint_path = agent.save()
            print(checkpoint_path)
        i += 1
        # you can also change the curriculum here

You can use this function either directly by calling train_ppo(config, None) or inside a tune call:

from ray import tune

training_steps = 1000000
trials = tune.run(
    train_ppo,
    config=config,
    resources_per_trial={
        "cpu": 7,
        "gpu": 1,
        "extra_cpu": 0,
    },
    stop={
        "training_iteration": training_steps,
    },
    return_trials=True)

Note that when using Tune you will end up with two folders in your directory: one generated by Tune and the other by the trainer function. The latter contains your checkpoints. You can also wrap the training configuration in a Trainer class that implements train, setup, and save functions, but I have not tried that successfully.
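
For reference, here is a rough, untested sketch of that wrapping idea using Tune's Trainable class. The underscore-prefixed hook names are an assumption based on older Ray versions; verify them against the Ray version you are running.

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

class PPOTrainable(tune.Trainable):
    # minimal wrapper around a PPOTrainer; _setup/_train/_save/_restore
    # were the Trainable hooks in older Ray releases
    def _setup(self, config):
        self.agent = PPOTrainer(config)

    def _train(self):
        return self.agent.train()

    def _save(self, checkpoint_dir):
        return self.agent.save(checkpoint_dir)

    def _restore(self, checkpoint_path):
        self.agent.restore(checkpoint_path)

# usage sketch:
# tune.run(PPOTrainable, config=config, checkpoint_freq=10,
#          stop={"training_iteration": 1000})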

For inference you generate the agent again and load the checkpoints:

agent = PPOTrainer(ppo_config)
agent.restore(checkpoint_path)

Then you can get the action using the pre-trained model like this: agent.compute_action(observation)

Often you don't want to use the training environment for inference, because you want to avoid running a physics simulation. The easiest way is to create a fake environment with the same observation and action spaces and provide it to your agent via the config: ppo_config['env'] = FakeEnv. Then use the real-world data as observations.
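
A minimal sketch of such a fake environment (the spaces and shapes below are placeholders; match them to your real training environment, and real_world_observation is a made-up name for your API data):

import gym
import numpy as np
from gym import spaces

class FakeEnv(gym.Env):
    # dummy env whose only job is to expose the same spaces as the training env,
    # so RLlib can rebuild the policy for inference
    def __init__(self, env_config=None):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)  # placeholder shape
        self.action_space = spaces.Discrete(2)  # placeholder

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, True, {}

# ppo_config["env"] = FakeEnv
# agent = PPOTrainer(ppo_config)
# agent.restore(checkpoint_path)
# action = agent.compute_action(real_world_observation)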

@Rockyyost
Author

Awesome! This advice worked out great for me.

Thank you!

@stefanbschneider
Member

stefanbschneider commented Jun 23, 2020

Any way to change the directory to which the agent is saved when calling .save()?

Update: OK, got it; just pass the path as an argument. The path seems to be relative to the experiment directory, which contains some hash code. Any idea how to get the absolute path?
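
For reference, a minimal sketch of the "pass the path as arg" part (the directory name is made up; passing an absolute path sidesteps the relative-path question):

# save() accepts a directory; using an absolute path avoids the
# "relative to the experiment directory" ambiguity mentioned above
checkpoint_path = agent.save("/absolute/path/to/my_checkpoints")
print(checkpoint_path)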

@stefanbschneider
Member

Is it possible to restore an agent and continue training it with tune.run() without any custom train_ppo function?

I'm looking for something like this (which doesn't work):

# restore the agent outside of tune.run()
agent = PPOTrainer(config)
agent.restore("/path/checkpoint_41/checkpoint-41")
# then continue training it; this breaks
tune.run(agent, config)

@Catypad

Catypad commented Aug 20, 2020

Hello @stefanbschneider! Can you explain how to change the directory where the agent is saved when you define the agent yourself and then train with agent.train()? Thank you!

@stefanbschneider
Member

By setting the local_dir argument in tune.run(): #9123 (comment)
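
A minimal sketch of that (local_dir redirects where Tune writes its experiment folders; the default is ~/ray_results):

from ray import tune

tune.run(
    "PPO",
    config=config,
    checkpoint_freq=10,
    local_dir="/absolute/path/to/results",
)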

But I think I already resolved my question regarding restoring a trained agent. Thanks anyways!
