Using Saved Model as Enemy Policy in Custom Environment (while training in a SubprocVecEnv) #835
Comments
Hello,

I did something similar here, where the opponent in the env loads up a new policy at random times upon environment reset. You can use …
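A minimal sketch of that reload-on-reset pattern, assuming stable-baselines and a toy environment (the class name, path handling, and dynamics here are placeholders, not the linked code):

```python
import numpy as np
import gym
from gym import spaces
from stable_baselines import A2C

class OpponentReloadEnv(gym.Env):
    """Toy single-agent view of a two-player game; the opponent policy
    is reloaded from disk with some probability on each reset."""

    def __init__(self, policy_path, reload_prob=0.2):
        super().__init__()
        self.observation_space = spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(4)
        self.policy_path = policy_path
        self.reload_prob = reload_prob
        self.opponent = None
        self.obs = np.zeros(4, dtype=np.float32)

    def reset(self):
        # Load lazily and reload at random so each worker only pays the
        # TensorFlow import/load cost occasionally, not on every episode.
        if self.opponent is None or np.random.rand() < self.reload_prob:
            self.opponent = A2C.load(self.policy_path)
        self.obs = self.observation_space.sample()
        return self.obs

    def step(self, action):
        opp_action, _ = self.opponent.predict(self.obs, deterministic=True)
        # Real game dynamics would combine `action` and `opp_action` here.
        self.obs = self.observation_space.sample()
        return self.obs, 0.0, False, {}
```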
Yeah, we've done something similar; the most relevant class is CurryVecEnv.

Cheers,
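For context on what "currying" an agent means here: the wrapper holds a fixed policy for one agent and answers that agent's moves inside the parent process, so the learner only sees the remaining agents. A rough sketch of the mechanism, not the actual CurryVecEnv code; it assumes a VecMultiEnv whose observations, actions, and rewards are per-agent tuples of batched arrays:

```python
class CurriedVecEnvSketch:
    """Rough sketch of the CurryVecEnv idea, not the real class: fix
    agent `agent_idx` to `policy` and expose one fewer agent."""

    def __init__(self, venv, policy, agent_idx=0):
        self.venv = venv            # VecMultiEnv: per-agent tuples of batches
        self.policy = policy        # e.g. a loaded stable-baselines model
        self.agent_idx = agent_idx
        self._obs = None            # last observations for the fixed agent

    def _drop(self, per_agent):
        return tuple(x for i, x in enumerate(per_agent) if i != self.agent_idx)

    def reset(self):
        obs = self.venv.reset()     # tuple: one observation batch per agent
        self._obs = obs[self.agent_idx]
        return self._drop(obs)

    def step(self, actions):
        # One batched predict call serves every vectorized environment, so
        # the fixed policy lives in the parent process, not in the workers.
        fixed, _ = self.policy.predict(self._obs, deterministic=True)
        all_actions = list(actions)
        all_actions.insert(self.agent_idx, fixed)
        obs, rews, dones, infos = self.venv.step(tuple(all_actions))
        self._obs = obs[self.agent_idx]
        return self._drop(obs), self._drop(rews), dones, infos
```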
@AdamGleave your repository seems like the best way forward for my task. I have a few questions.
@lukepolson
The way we have it set up is that there are multiple players per environment. The observation and action spaces are n-tuples where n is the number of players. So each environment is still independent; we just use multiple environments to speed up training.
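As a concrete illustration of the n-tuple spaces (a sketch using Gym's Tuple space; the shapes are made up for the snake example, not taken from the repository):

```python
import numpy as np
from gym import spaces

n_players = 4
single_obs = spaces.Box(0.0, 1.0, shape=(11, 11, 3), dtype=np.float32)
single_act = spaces.Discrete(4)

# One slot per player; the environment as a whole is still one Gym env,
# and vectorization just runs several such envs in parallel.
observation_space = spaces.Tuple([single_obs] * n_players)
action_space = spaces.Tuple([single_act] * n_players)
```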
@AdamGleave thank you for the clarification. So if I'm correct, the venv object is just a collection of objects, each of which is a multi-agent environment? If this is correct, then in the line …, observations would be an array that looks like …, supposing that there were 3 agents (and the length of axis=0 would be the number of vectorized environments). Is this correct?
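If that reading is right, the shapes would work out roughly as below (illustrative numbers, assuming the per-agent-tuple layout sketched above):

```python
import numpy as np

n_envs, n_agents, obs_shape = 8, 3, (11, 11, 3)

# A VecMultiEnv-style reset() returns one observation batch per agent:
obs = tuple(np.zeros((n_envs,) + obs_shape, dtype=np.float32)
            for _ in range(n_agents))

assert len(obs) == n_agents        # one entry per agent
assert obs[0].shape[0] == n_envs   # axis 0 indexes the vectorized envs
```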
Yes, each environment in the venv is a multi-agent environment.
@AdamGleave I just updated my comment above.
Yeah, you have the right idea. You can see more information in our definition of MultiAgentEnv. Each environment in the VecMultiEnv is a MultiAgentEnv.
Thanks a bunch for all your help @AdamGleave! I appreciate the organization of your code as well. Before I go on a coding spree, I want to make sure I have the architecture of your package well summarized.

Now say I want to train all 4 agents at once using one or more different policies in an adversarial sort of way. Furthermore, suppose that some agents can die (i.e. a snake dies and is removed from the board), so that at some points in the game you might only have 2 or 3 agents on the board at a time. Is your code well structured for this sort of problem? If not, I'm debating switching to an alternative software package like RLlib. If you think there's some other architecture that is particularly well suited for this task, let me know as well :)
My code is set up to expect a fixed number of agents.
Yeah, that's right.
Kind of.
The code isn't really set up for training multiple agents simultaneously. It wouldn't be hard to make your own interface to do this, though. I've heard RLlib has good multi-agent support. Note that Stable Baselines doesn't implement multi-agent RL algorithms. If you have 4 agents, I'm not sure self-play is going to converge to anything (the usual guarantees only hold in 2-player zero-sum games).
@Miffyli I was looking at your solution and implemented it in my code. I'm finding that doing this for 4 agents is relatively slow.

Is the …
@AdamGleave you mention that after one application of CurryVecEnv you fix one agent and get a VecMultiEnv with one fewer agent. I suppose then the proper protocol for three enemies on the board would be to apply CurryVecEnv three times?

I suspect I'll just modify the CurryVecEnv to take multiple agent indices and multiple policies so that I can get rid of all three at once (unless you already have a class for this). Now @Miffyli mentions that the CurryVecEnv uses one agent to create actions for all environments in the SubprocVecEnv (for each enemy). How does this work with parallelizing code? Would 16 environments in a SubprocVecEnv then take twice as long as 8? Thanks again for answering all my questions. I'm still relatively new to machine learning and reinforcement learning in general. I tried using tf_agents for a month or two but found the documentation somewhat lacking. The help here has been much appreciated!
Yeah, my solution is not very optimized, as there are separate agents for each env, each running at their own pace. I recommend going Adam's way here and trying to create one agent that computes actions for all environments at once.
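The multi-index modification described above might look roughly like this; it is a sketch under the same per-agent-tuple assumptions as before, not a class that exists in either codebase:

```python
class MultiCurryVecEnvSketch:
    """Fix several agents at once: `policies` maps agent index -> policy.
    Each fixed policy makes one batched predict call per step, covering
    all vectorized environments, so adding envs should scale well."""

    def __init__(self, venv, policies):
        self.venv = venv
        self.policies = policies
        self._obs = None

    def _drop(self, per_agent):
        return tuple(x for i, x in enumerate(per_agent) if i not in self.policies)

    def reset(self):
        self._obs = self.venv.reset()   # tuple of per-agent observation batches
        return self._drop(self._obs)

    def step(self, actions):
        remaining = list(actions)
        all_actions = []
        for i, obs_i in enumerate(self._obs):
            if i in self.policies:
                act, _ = self.policies[i].predict(obs_i, deterministic=True)
                all_actions.append(act)
            else:
                all_actions.append(remaining.pop(0))
        obs, rews, dones, infos = self.venv.step(tuple(all_actions))
        self._obs = obs
        return self._drop(obs), self._drop(rews), dones, infos
```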
I am currently training in an environment that has multiple agents: multiple snakes all on the same 11x11 grid, moving around and eating food. There is one "player" snake and three "enemy" snakes. Every 1 million training steps I want to save the player model and update the enemies so that they use that model to make their movements.

I can (sort of) do this in a SubprocVecEnv by importing TensorFlow each time I call the update-policy method of the Snake game class (which inherits from the gym environment class).

I consider this a hackish method because it imports TensorFlow into each child process (of the SubprocVecEnv) every time the enemy policy is updated.
I use this hackish approach because I cannot simply pass the loaded model (model = A2C.load(policy_path)) into some sort of callback, as these models can't be pickled. Is there a standard solution for this sort of problem?
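One pattern that sidesteps the pickling issue, sketched under the assumption of stable-baselines 2.x (update_enemy_policy and the checkpoint path are hypothetical names): model.get_parameters() returns a plain dict of numpy arrays, which pickles fine, so it can be sent to the workers with VecEnv.env_method; each worker builds its enemy model once and thereafter only swaps in new weights with load_parameters.

```python
from stable_baselines import A2C

class EnemyPolicyMixin:
    """Hypothetical mixin for the Snake env running inside a SubprocVecEnv."""

    enemy_model = None
    initial_policy_path = "enemy_policy.zip"  # assumed starting checkpoint

    def update_enemy_policy(self, params):
        # `params` comes from model.get_parameters() in the parent process:
        # an OrderedDict of numpy arrays, which pickles cleanly.
        if self.enemy_model is None:
            # Built once per worker; TensorFlow is imported/loaded only here.
            self.enemy_model = A2C.load(self.initial_policy_path)
        self.enemy_model.load_parameters(params)

# Parent process, e.g. every 1 million steps:
#   venv.env_method("update_enemy_policy", model.get_parameters())
```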