Implementation of discounted cfr and linear cfr #80
Hi @ai-gamer, thanks! That file (I will ask @jblespiau to do the same with CFR-BR, which I think should also be separate) |
I tried to implement it in discounted_cfr.py and test it in the new cfr_example.py. It gets an error: Sorry, this might be a stupid question, but I don't know why I can't import discounted_cfr. I also tried copying the original cfr.py, and the copy can't be imported either. |
I found the reason. I forked the repository when making the pull request and forgot to add the new Python path. I will commit discounted_cfr.py again and add a test file for it. |
I have committed discounted_cfr.py and goofspiel4_discounted_cfr.py again. |
Thanks. I am planning on merging this this week (maybe I will pull the few changed lines directly into CFR, not sure yet). Could you just confirm the paper I have is the correct one for DCFR? Likewise, could you add a reference for LCFR? Thanks! |
It is this one for both: https://arxiv.org/abs/1809.04040 (we should cite it in the header if we are not already). I remember @noambrown telling me his implementation was slightly different from what was in the paper. Noam, could you take a look and recommend any particular settings? |
As @lanctot mentioned, https://arxiv.org/abs/1809.04040 covers both DCFR and LCFR. @jblespiau It might be confusing to put the few changed lines into cfr.py directly. I will edit the code comments and cite the paper. |
I glanced at the code and it looks fine, but I haven't investigated thoroughly or tested it. |
Hi. I tried merging it; because CFRBase changed, I needed to make a few changes and also add tests. While reading and documenting the changes from the base class to check that I had understood them correctly, I may have found an error in the implemented algorithm.
So what we do is:
but this is not true (as our implementation does everything at the same time). To rephrase everything using a sentence from the paper: "The first algorithm, which we refer to as linear CFR (LCFR), is identical
In our implementation, we should multiply by t, not by t / (t + 1). To do the second version, we need to make the update step outside of the recursive function.
Does what I am trying to explain make sense? Is it indeed correct that this CL implements DCFR and LCFR incorrectly, that the cumulative values are multiplied several times per traversal, and that this is interleaved with the updates?
I think we can fix that for the cumulative policy by doing:
info_state_node.cumulative_policy[action] += (reach_prob * action_prob * (self._iteration**self.gamma))
For the cumulative_regrets, we should do a second step after the recursive function, as in the RegretMatching Plus implementation for CFR+ (e.g. the
Here is what I have written to give an idea of what I think should work: https://paste.ofcode.org/6sPLn6Q4p27w5gPkJgvvza (see line 133 and lines 154 to 161). One thing is incorrect: the update on line 154 should be done only for the current player's nodes. What do you think? |
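A minimal sketch of the two fixes proposed in the comment above, not the code that was merged. It assumes a hypothetical `InfoStateNode` stand-in holding per-action cumulative regrets and policy, and the helper names `discount_regrets` and `add_policy_contribution` are illustrative only. The default exponents (alpha = 1.5, beta = 0, gamma = 2) are assumed to be the settings recommended in https://arxiv.org/abs/1809.04040; the caller is assumed to pass only the nodes of the player being updated.

```python
import dataclasses
from typing import Dict


@dataclasses.dataclass
class InfoStateNode:
    """Hypothetical stand-in for a CFR info-state node (not the OpenSpiel class)."""
    cumulative_regret: Dict[int, float]
    cumulative_policy: Dict[int, float]


def discount_regrets(nodes: Dict[str, InfoStateNode], iteration: int,
                     alpha: float = 1.5, beta: float = 0.0) -> None:
    """Discounts cumulative regrets once, after the recursive traversal.

    Positive regrets are scaled by t^alpha / (t^alpha + 1) and the others by
    t^beta / (t^beta + 1), so each value is discounted exactly once per
    iteration instead of once per visit inside the recursion.
    """
    positive_factor = iteration**alpha / (iteration**alpha + 1)
    negative_factor = iteration**beta / (iteration**beta + 1)
    for node in nodes.values():
        for action, regret in node.cumulative_regret.items():
            factor = positive_factor if regret > 0 else negative_factor
            # Overwriting an existing key is safe while iterating over items().
            node.cumulative_regret[action] = regret * factor


def add_policy_contribution(node: InfoStateNode, action: int,
                            reach_prob: float, action_prob: float,
                            iteration: int, gamma: float = 2.0) -> None:
    """Adds this iteration's policy contribution, weighted by t^gamma."""
    node.cumulative_policy[action] += reach_prob * action_prob * iteration**gamma
```

Weighting each new policy contribution by t^gamma gives the same normalized average policy as multiplying the accumulated sum by (t / (t + 1))^gamma after every iteration, which is why the policy update can stay inside the recursion while the regret discount is moved outside it.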
Sorry for the late response, @jblespiau. I am away on a trip. I think what you said makes sense. Thanks for taking the time to correct the mistake. I will go through the code carefully when I am back and send you a message if I find anything confusing. |
Perfect. No rush, you can take your time :) I will be OOO for 2 weeks too. |
I edited the code @jblespiau shared a bit so that the update is applied only for the current player. I tested it on goofspiel5 with discounted CFR and CFR+, and it seems to get results similar to those in the original paper. Could you please have a look at the code to check it? https://paste.ofcode.org/fXpz5u78pWwUSQTK3nCSf8. Please focus mainly on lines 149 to 157. @lanctot |
Thanks @ai-gamer, unfortunately I am very busy at the moment. I will let @jblespiau continue to handle this one. He is still on vacation, but I will remind him when he is back and we'll merge it then. Thanks a lot for your contribution! |
Thanks a lot for the reply. No problem. It's not urgent at all. @lanctot |
Thanks. I merged it internally, and the PR will be closed automatically on our next push to GitHub. I improved the suggested code by: |
Great!! |
PiperOrigin-RevId: 276643426 Change-Id: I9eba53eadfebb38582feb9cbdc04b160df82886d
This is an implementation of discounted cfr and linear cfr. It is tested on goofspiel4.