Replies: 2 comments
-
I'm tagging Matteo, who is our point of contact (PoC) for MARL things and the owner of the mappo_ippo script!
-
It could be that the reward for not colliding is taking over and preventing navigation success. An increasing reward is generally a good sign, provided the reward function makes sense. But I am not able to make diagnostic comments about your custom environment, sorry.
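If you want to check whether the collision term is dominating, here is a minimal sketch of logging each reward term separately. The component names ("goal", "collision") and where you hook this into your env are assumptions on my part, not part of the script:

```python
# Minimal sketch: accumulate each reward term per episode so their
# relative magnitudes can be compared after training.
from collections import defaultdict
import csv

class RewardBreakdown:
    """Tracks per-component reward totals for one episode."""

    def __init__(self):
        self.totals = defaultdict(float)

    def add(self, **components):
        # e.g. breakdown.add(goal=goal_term, collision=collision_term)
        # called wherever your env computes its reward terms
        for name, value in components.items():
            self.totals[name] += float(value)

    def dump(self, path="reward_breakdown.csv"):
        # One row per episode; if |collision| >> |goal|, the avoidance
        # term is likely drowning out the navigation objective.
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow(
                [self.totals[k] for k in sorted(self.totals)]
            )
        self.totals.clear()
```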
-
I've designed a custom navigation env that has obstacles in it, and the agents don't hit each other.
When I run my mappo_ippo.py I get strange outputs. Could my model be overfitted?
My custom_env is:
https://drive.google.com/file/d/1yw1rOpJcmoU99zcz-2wEGqV_ZnfOT1qF/view?usp=sharing
My config of mappo_ippo is (also summarized as a dict below):
max_steps: 200
n_iters: 625
n_agents and n_targets: 3
backend: csv
entropy_eps: 0.0001
The remaining configs are the same.
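For reference, here are the same overrides written out as a plain Python dict. The key names mirror my list above; whether they map one-to-one onto the script's own config structure is my assumption:

```python
# My changed settings, collected in one place for reference.
config_overrides = {
    "max_steps": 200,      # episode horizon
    "n_iters": 625,        # number of training iterations
    "n_agents": 3,         # one target per agent
    "n_targets": 3,
    "backend": "csv",      # logging backend
    "entropy_eps": 1e-4,   # entropy coefficient
}
```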
When I look at my CSV and my videos, the results are surprising:
In the videos, for the first 20 epochs the agents reach the goals very easily, but after that they stop.
In my CSV, the train_mean_reward keeps increasing non-stop, but my critic loss is also increasing.
Does this mean my model is overfitted?
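In case it helps, this is roughly how I plot the two curves from the CSV logs. The column names and file path below are from my setup and may differ in yours:

```python
# Sketch for eyeballing mean reward and critic loss side by side.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("logs/train.csv")  # path to the csv backend output (mine)

fig, ax1 = plt.subplots()
ax1.plot(df.index, df["train_mean_reward"], color="tab:blue")
ax1.set_xlabel("iteration")
ax1.set_ylabel("train_mean_reward", color="tab:blue")

# Critic loss on a second y-axis: with PPO, a rising critic loss alongside
# a rising reward can simply reflect growing return magnitudes.
ax2 = ax1.twinx()
ax2.plot(df.index, df["critic_loss"], color="tab:red")
ax2.set_ylabel("critic_loss", color="tab:red")

fig.tight_layout()
plt.savefig("reward_vs_critic_loss.png")
```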
@matteobettini