Some questions about population-based algorithms #1114
Specifically, I want to know how the performance of the algorithm can be evaluated, similar to how reinforcement learning algorithms can be evaluated by the reward obtained from the environment. Taking Leduc poker as an example, how can we show that the algorithm is effective? And after training is completed, what should we save: the RL model or the policy? I am not completely familiar with this, so I hope you can give some advice. Thank you!
Hi @Root970103. If you're using PSRO or some form of fictitious play, the thing you save is either the average strategy, or the entire set of policies coupled with the meta-strategy. The latter can be turned into one policy using the policy_aggregator (if the game is small enough). A good place to start is this example: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/examples/psro_v2_example.py Hope this helps, but please don't hesitate to ask more questions if it's not clear.
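The idea behind the aggregation can be illustrated without the open_spiel API: a population of tabular policies plus a meta-strategy is combined into one policy by taking the meta-strategy-weighted mixture of the per-state action probabilities. This is only a minimal sketch — the real `policy_aggregator` in open_spiel additionally accounts for reach probabilities in extensive-form games, which this toy version ignores, and the policy/state names here are made up for illustration.

```python
# Illustrative sketch (NOT the open_spiel policy_aggregator): mix a
# population of tabular policies according to meta-strategy weights.
# Each policy maps an information state -> {action: probability}.

def aggregate_policies(policies, meta_strategy):
    """Return the meta-strategy-weighted mixture of tabular policies."""
    assert abs(sum(meta_strategy) - 1.0) < 1e-9, "weights must sum to 1"
    aggregate = {}
    for policy, weight in zip(policies, meta_strategy):
        for state, action_probs in policy.items():
            bucket = aggregate.setdefault(state, {})
            for action, prob in action_probs.items():
                bucket[action] = bucket.get(action, 0.0) + weight * prob
    return aggregate

# Two hypothetical population members over one information state "s".
pi_a = {"s": {"call": 1.0, "fold": 0.0}}
pi_b = {"s": {"call": 0.0, "fold": 1.0}}
mixed = aggregate_policies([pi_a, pi_b], [0.25, 0.75])
# mixed["s"] -> {"call": 0.25, "fold": 0.75}
```

The aggregated policy is what you would then pass to NashConv computation or simulate against other agents.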
Thank you for your reply! I have run this example script and observed the changes in nash_conv. I also wonder whether I can use the trained model (or policy) against other algorithms in the Leduc poker environment. For example, if I want to test the trained model against CFR, should the entire set of policies or the aggregated policy be used? In addition, in an adversarial scenario, is it appropriate to use the Q-value to evaluate the algorithms? It's very kind of you to give this advice.
Yes, you can extract the policy (that is what the NashConv computation needs) and you can simulate the policy against CFR's policy.
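The head-to-head comparison boils down to estimating the expected payoff of one policy against the other. Here is a minimal sketch using a made-up zero-sum matrix game as a stand-in for simulating full Leduc poker episodes; the payoff matrix and the strategy vectors are assumptions for illustration only, not output of any real training run.

```python
# Sketch: expected payoff of one mixed strategy against another in a
# zero-sum matrix game (a stand-in for simulating episodes of Leduc poker).

def expected_payoff(row_strategy, col_strategy, payoff_matrix):
    """Expected row-player payoff when both players play mixed strategies."""
    total = 0.0
    for i, p in enumerate(row_strategy):
        for j, q in enumerate(col_strategy):
            total += p * q * payoff_matrix[i][j]
    return total

payoffs = [[0.0, 1.0],
           [-1.0, 0.5]]          # hypothetical row-player payoffs
psro_strategy = [0.6, 0.4]       # e.g. the aggregated PSRO policy
cfr_strategy = [0.5, 0.5]        # e.g. the CFR average policy
value = expected_payoff(psro_strategy, cfr_strategy, payoffs)
# value -> 0.2 in this toy example
```

In an actual Leduc experiment you would instead sample many episodes with each policy controlling one seat (swapping seats to remove positional bias) and average the returns.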
Q-values are just estimates of the value of a state and action. You can turn them into a policy by choosing argmax_a Q(s, a), but the result will be deterministic. So if the environment requires any kind of mixing, you would lose that by taking the argmax over the Q-values.
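The loss-of-mixing point can be made concrete with rock-paper-scissors, where the equilibrium is the uniform mixture: any deterministic argmax policy is maximally exploitable, while the uniform policy is not. The Q-value estimates below are hypothetical numbers chosen for illustration.

```python
# Sketch: why argmax over Q-values loses mixing. In rock-paper-scissors
# the equilibrium strategy is uniform; a deterministic policy is exploitable.

RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]  # row-player payoffs: rock, paper, scissors

def exploitability(strategy):
    """Value the opponent's best response achieves against `strategy`."""
    col_values = [sum(p * RPS[i][j] for i, p in enumerate(strategy))
                  for j in range(3)]
    return -min(col_values)  # opponent picks the column worst for us

q_values = [0.05, 0.0, -0.05]  # hypothetical (noisy) learned Q-estimates
greedy = [0.0, 0.0, 0.0]
greedy[q_values.index(max(q_values))] = 1.0  # argmax -> deterministic
uniform = [1 / 3, 1 / 3, 1 / 3]
# exploitability(greedy) -> 1.0 (opponent always wins by best-responding)
# exploitability(uniform) -> 0.0 (uniform play cannot be exploited)
```

This is why, for evaluation in adversarial settings, exploitability-style metrics (like the NashConv already printed by the PSRO example) are more informative than raw Q-values.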
Thank you for your contribution in providing population-based algorithms such as fictitious play, PSRO, and so on. The examples you provide show the nash_conv value during the training process. I still have a question about how to evaluate the algorithm.