Some questions about population-based algorithms #1114

Closed
Root970103 opened this issue Aug 30, 2023 · 4 comments
Labels
question Further information is requested

Comments

@Root970103

Thank you for providing population-based algorithms such as fictitious play, PSRO, and so on. The examples you provide show the nash_conv value during training, but I still have a question about how to evaluate the algorithm.

@Root970103
Author

Specifically, I want to know how the performance of the algorithm can be evaluated. Reinforcement learning algorithms, for example, can be evaluated by the reward obtained from the environment. Taking Leduc poker as an example, how can we show that the algorithm is effective? After training is complete, what should we save: the RL model or the policy? I am not completely familiar with this, so I hope you can give some advice. Thank you!

@lanctot
Collaborator

lanctot commented Aug 31, 2023

Hi @Root970103,

If you're using PSRO or some form of fictitious play, the thing you save is either the average strategy, or the entire set of policies coupled with the meta-strategy. The latter can be turned into one policy using the policy_aggregator (if the game is small enough).

A good place to start is this example: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/examples/psro_v2_example.py
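For concreteness, here is a rough sketch of that aggregation step, assuming the PolicyAggregator and NashConv APIs used in that example script; `game`, `policies` (the per-player policy sets), and `meta_probabilities` are illustrative names standing in for the outputs of a finished PSRO run:

```python
# Sketch only, not a drop-in snippet: `policies` and `meta_probabilities`
# stand in for the policy sets and meta-strategy produced by a PSRO run,
# as in psro_v2_example.py.
from open_spiel.python.algorithms import exploitability
from open_spiel.python.algorithms import policy_aggregator

# Collapse the set of policies plus meta-strategy into a single joint policy.
aggregator = policy_aggregator.PolicyAggregator(game)
aggr_policy = aggregator.aggregate(
    range(game.num_players()), policies, meta_probabilities)

# NashConv of the aggregated policy; lower is better, 0 is an exact equilibrium.
print("NashConv:", exploitability.nash_conv(game, aggr_policy))
```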

Hope this helps, but please don't hesitate to ask more questions if it's not clear.

@Root970103
Author

Thank you for your reply! I have run this example script and observed the changes in nash_conv.

I also wonder if I can use the trained model (or policy) against other algorithms in the Leduc poker environment. For example, if I want to test the trained model against CFR, should the entire set of policies or the aggregated policy be used?

In addition, in an adversarial scenario, is it appropriate to use the Q-value to evaluate the algorithms?

It's very kind of you to give this advice.

lanctot added the question (Further information is requested) label on Oct 7, 2023
@lanctot
Collaborator

lanctot commented Oct 7, 2023

I also wonder if I can use the trained model (or policy) against other algorithms in the Leduc poker environment. For example, if I want to test the trained model against CFR, should the entire set of policies or the aggregated policy be used?

Yes, you can extract the policy (that is what the NashConv computation needs) and you can simulate the policy against CFR's policy.
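As a minimal sketch of such a head-to-head comparison on Leduc poker, assuming `aggr_policy` is the aggregated PSRO policy from the aggregation step above (an illustrative name) and using CFR's average policy as the opponent:

```python
import pyspiel
from open_spiel.python.algorithms import cfr
from open_spiel.python.algorithms import expected_game_score

game = pyspiel.load_game("leduc_poker")

# Train a CFR opponent and take its average policy.
cfr_solver = cfr.CFRSolver(game)
for _ in range(1000):
  cfr_solver.evaluate_and_update_policy()
cfr_policy = cfr_solver.average_policy()

# Expected returns with the PSRO policy in seat 0 and CFR in seat 1.
# `aggr_policy` is assumed to come from the policy_aggregator step above.
values = expected_game_score.policy_value(
    game.new_initial_state(), [aggr_policy, cfr_policy])
print("Expected returns (PSRO vs CFR):", values)
```

Since Leduc poker is not symmetric between seats, you would normally also swap the seats and average the two results.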

In addition, in an adversarial scenario, is it appropriate to use the Q-value to evaluate the algorithms?

Q-values are just estimates of the value of a state and action. You can turn them into a policy by choosing argmax_a Q(s,a), but the result will be deterministic. So if the environment requires any kind of mixing, you would lose that by taking the argmax over the Q-values.
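As a toy illustration (plain NumPy, not tied to any particular OpenSpiel agent), the argmax step collapses any mixed strategy:

```python
import numpy as np

# Hypothetical Q-values for one state with two actions.
q_values = np.array([0.01, 0.0])

# Greedy policy derived from the Q-values: all probability on action 0.
greedy = np.zeros_like(q_values)
greedy[np.argmax(q_values)] = 1.0
print(greedy)  # [1. 0.]

# In a game such as matching pennies the equilibrium strategy mixes 50/50;
# that mixing cannot be recovered from the argmax alone.
print(np.array([0.5, 0.5]))
```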

lanctot closed this as completed on Oct 23, 2023