Actually, I have some personal thoughts about the evaluation worker, which should work quite differently from normal rollout workers.
First, the evaluation worker should not sample data asynchronously, since that wastes computing resources. Instead, it should wait for the training manager to send a signal along with the policy parameters to be evaluated. It may need to maintain a local parameter buffer to handle new signals that arrive while an evaluation is still running.
Second, the information received from the training manager should include the corresponding training epoch number, so that the evaluation worker can log evaluation metrics against the training epoch rather than the evaluator's local sample epoch.
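To make the two points above concrete, here is a rough sketch of what I have in mind. It is not based on the current MALib code: `EvalRequest`, `EvaluationWorker`, `submit`, `step`, and `evaluate_fn` are all hypothetical names, and the "drop the oldest request when the buffer is full" policy is just one assumption about how the local parameter buffer could behave.

```python
import queue
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class EvalRequest:
    """Hypothetical message sent by the training manager."""

    epoch: int              # training epoch the parameters belong to
    params: Dict[str, Any]  # policy parameters to evaluate


class EvaluationWorker:
    """Sketch of a signal-driven evaluator (not the actual MALib worker)."""

    def __init__(
        self,
        evaluate_fn: Callable[[Dict[str, Any]], Dict[str, float]],
        buffer_size: int = 8,
    ) -> None:
        self._evaluate_fn = evaluate_fn
        # Local parameter buffer for requests that arrive during evaluation.
        self._buffer: "queue.Queue[EvalRequest]" = queue.Queue(maxsize=buffer_size)
        # (training_epoch, metrics) pairs, keyed by the trainer's epoch.
        self.logs: List[Tuple[int, Dict[str, float]]] = []

    def submit(self, request: EvalRequest) -> None:
        """Called by the training manager; never blocks the trainer."""
        while True:
            try:
                self._buffer.put_nowait(request)
                return
            except queue.Full:
                # Assumed policy: discard the oldest pending request.
                try:
                    self._buffer.get_nowait()
                except queue.Empty:
                    pass

    def step(self, timeout: Optional[float] = None) -> bool:
        """Wait for one request and evaluate it; return False on timeout."""
        try:
            request = self._buffer.get(timeout=timeout)
        except queue.Empty:
            return False
        metrics = self._evaluate_fn(request.params)
        # Log against the training epoch, not a local sample counter.
        self.logs.append((request.epoch, metrics))
        return True
```

With this shape the worker does no sampling until a request arrives, and every logged metric carries the epoch number supplied by the training manager.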
The changes from PR #12 added a new policy evaluation feature, but it has been neglected since then and needs polishing.
settings
malib/malib/settings.py
Line 95 in 9efda1b
parameter
malib/malib/rollout/rollout_worker.py
Line 38 in 9efda1b
some related logic
malib/malib/rollout/base_worker.py
Line 162 in 9efda1b
malib/malib/rollout/base_worker.py
Line 104 in 9efda1b
malib/malib/rollout/base_worker.py
Line 315 in 9efda1b