
Regarding inferencing the learnt policy #69

Closed
SiddSS opened this issue Nov 27, 2022 · 4 comments

@SiddSS

SiddSS commented Nov 27, 2022

Hi,
We have created a custom environment and wrapped it in a Gym class. After training with MAPPO, we obtained the .pkl files. Could you elaborate on how to run inference with the learned policy?
We already have a visualization of the environment using pygame and just want to load the learned policies and watch them play.

Thanks in advance.

@Theohhhu
Collaborator

Theohhhu commented Nov 29, 2022

It is doable. However, MARLlib has chosen not to incorporate loading and rendering functions, as we find it hard to unify all ten environments to render in a similar pattern.

We would like to provide some instructions on how to implement this.
You can find an example of rendering here.
To load a checkpoint, add a restore path in tune.run(restore=YourCheckPointPath).
The complete configuration can be found in Trainer.
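
As a minimal sketch (the algorithm name, config, and checkpoint path below are placeholders, not MARLlib specifics):

```python
from ray import tune

# Minimal placeholder config; in practice, reuse the exact config from training.
my_config = {"env": "CartPole-v1", "framework": "torch"}

# Placeholder path to a checkpoint produced by a previous run.
checkpoint_path = "~/ray_results/my_experiment/checkpoint_000100/checkpoint-100"

tune.run(
    "PPO",                    # placeholder algorithm name
    config=my_config,
    restore=checkpoint_path,  # restore trainer state before continuing
    stop={"training_iteration": 1},
)
```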

There is a thorough solution provided by Sven: multiagent-load-only-one-policy-from-checkpoint.
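
The gist of that thread, as a rough sketch against the RLlib 1.x API of that era (trainer class, config, and policy ID are placeholders):

```python
from ray.rllib.agents.ppo import PPOTrainer  # placeholder trainer class

my_config = {"env": "CartPole-v1", "framework": "torch"}  # placeholder config
checkpoint_path = "~/ray_results/my_experiment/checkpoint_000100/checkpoint-100"

# Rebuild the trainer with the training-time config, then load the weights.
trainer = PPOTrainer(config=my_config)
trainer.restore(checkpoint_path)

# Pull out only the policy you need (a single-policy setup
# uses the ID "default_policy").
policy = trainer.get_policy("default_policy")
```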

Any further questions are welcome. We are happy to help you out.

@SiddSS
Author

SiddSS commented Dec 11, 2022

Hi, thanks for the previous answer. However, we have been unable to use the learned policy to compute actions for our agents. Our objective is to compute the agents' actions based on the learnt policy, but when we call agent.compute_single_action(obs), where obs = env.reset(),
we get an error that seq_lens is of None type. We cannot find where to obtain the sequence lengths to resolve this error. We added some print statements during training and could see the sequence lengths being printed there, but compute_single_action does not seem to work with them.

It would be really helpful if you could provide some insight into this. Also, kindly let us know if we should be using a function other than agent.compute_single_action for this purpose.

@Theohhhu
Collaborator

Theohhhu commented Jan 17, 2023

Hi SiddSS,

Sorry for the late reply. Check out the mamujoco example and the mpe example for loading a checkpoint and rendering the environment in MARLlib.
You are welcome to ask any further questions.
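
On the seq_lens error specifically: that usually means the model is recurrent, in which case compute_single_action needs the RNN state passed in explicitly. A minimal sketch (assuming a single-agent-style loop and the default policy ID; for a multi-agent env you would keep one state per agent):

```python
# "agent" is your restored trainer; "env" is your wrapped gym environment.
obs = env.reset()

# Recurrent models need an initial hidden state.
state = agent.get_policy("default_policy").get_initial_state()

done = False
while not done:
    # When a state is passed, compute_single_action returns
    # (action, new_state, extra_fetches) instead of just the action.
    action, state, _ = agent.compute_single_action(
        obs, state=state, policy_id="default_policy"
    )
    obs, reward, done, info = env.step(action)
    env.render()
```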

Siyi

@Theohhhu
Collaborator

Theohhhu commented Mar 3, 2023

New APIs are now available that show how to render a pretrained model.
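
A rough sketch of the new usage (environment name, map, and checkpoint paths are placeholders, and argument names may differ slightly across MARLlib versions):

```python
from marllib import marl

# Build the env, algorithm, and model exactly as for training.
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
mappo = marl.algos.mappo(hyperparam_source="mpe")
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-128"})

# Render using a pretrained checkpoint (placeholder paths).
mappo.render(
    env,
    model,
    restore_path={
        "params_path": "exp_results/mappo_mlp_simple_spread/params.json",
        "model_path": "exp_results/mappo_mlp_simple_spread/checkpoint_000010/checkpoint-10",
    },
    local_mode=True,
    share_policy="all",
)
```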
