
Implementation of discounted cfr and linear cfr #80

Merged
merged 10 commits on Oct 28, 2019

Conversation

ai-gamer
Contributor

This is an implementation of discounted CFR and linear CFR. It is tested on goofspiel4.

@lanctot
Collaborator

lanctot commented Sep 28, 2019

Hi @ai-gamer, thanks! That file cfr.py is getting quite crowded. Could you put the implementation in a different file, like maybe discounted_cfr.py? It will both keep cfr.py clean and make it easier for people to find discounted + linear CFR.

(I will ask @jblespiau to do the same with CFR-BR, which I think should also be separate.)

@ai-gamer
Contributor Author

I tried to put it in discounted_cfr.py and test it in the new cfr_example.py. It gets an error:
from open_spiel.python.algorithms import discounted_cfr
ImportError: cannot import name 'discounted_cfr' from 'open_spiel.python.algorithms' (/Users/Desktop/open_spiel/open_spiel/python/algorithms/__init__.py)

Sorry, this might be a stupid question, but I don't know why I can't import discounted_cfr. I tried copying the original cfr.py, and it can't be imported either.

@ai-gamer
Contributor Author

ai-gamer commented Sep 28, 2019

I found the reason: I forked when making the pull request and forgot to add the new Python file to the path. I will commit discounted_cfr.py again and make a test file for it.

@ai-gamer
Contributor Author

I have committed discounted_cfr.py and goofspiel4_discounted_cfr.py again.

@jblespiau
Collaborator

Thanks. I am planning on merging this this week (maybe I will pull the few changed lines directly into CFR, not sure yet).

Could you just confirm that the paper I have is the correct one for DCFR? Similarly, could you add a reference for LCFR?

Thanks!

@lanctot
Collaborator

lanctot commented Sep 30, 2019

It is this one for both: https://arxiv.org/abs/1809.04040 (we should cite it in the header if we are not doing so already).

I remember @noambrown telling me his implementation was slightly different from what was in the paper. Noam, could you take a look and recommend any particular settings?

@ai-gamer
Contributor Author

As @lanctot mentioned, https://arxiv.org/abs/1809.04040 includes both DCFR and LCFR. @jblespiau It might be a bit confusing to put the few changed lines into cfr.py directly. I will edit the code comments and cite the paper.

@noambrown

I glanced at the code and it looks fine, but I haven't investigated thoroughly or tested it.

@jblespiau
Collaborator

Hi.

I tried merging it; since CFRBase changed, I needed to make a few changes and also add tests.
See
https://paste.ofcode.org/tYV35KDgJD5TQgcLhgF7zh and
https://paste.ofcode.org/35cz639DNGctUANiUXFep2A

While reading the changes from the base class and documenting them, to check that I had understood correctly, I may have found an error in the implemented algorithm.

  1. Note that the cumulative regrets and cumulative policy are incremented in the recursive function _compute_counterfactual_regret_for_player, which walks the full tree of histories.
    Thus, writing S for an information state and |S| for the number of histories going through it, we:
  • add the contribution to the cumulative regrets of each h in S during the tree traversal;
  • add to the cumulative policy |S| times, because we enter S multiple times, each time adding the same value. At the end we normalize the cumulative policy, and |S| is independent of the iteration for a given information state S, so it is a constant factor.

So what we do is:
cumulative_policy(T)(s, a) = |S| * \sum_{t=1}^{T} policy(t)(s, a) * player_reach_prob(t) * t
and then we normalize.

  2. I think that what this CL is doing is assuming we do the following sequentially:
  • update the cumulative regrets and average policy;
  • then multiply these by the factor in the paper, using alpha, beta and gamma.

but this is not true (the implementation we have does everything at the same time).

To rephrase everything using a sentence from the paper:

"The first algorithm, which we refer to as linear CFR (LCFR), is identical
to CFR, except on iteration t the updates to the regrets and
average strategies are given weight t. That is, the iterates
are weighed linearly. (Equivalently, one could multiply the
accumulated regret by t
t+1 on each iteration. We do this in
our experiments to reduce the risk of numerical instability."

In our implementation, we should multiply by t, not by t/(t+1). To do the second version, we would need to move the update step outside of the recursive function.
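
For concreteness, here is a minimal sketch of the two equivalent weightings (the field and function names below are illustrative only, not the actual OpenSpiel ones):

# Hypothetical illustration of the two equivalent LCFR weightings.

# Form 1: during the traversal of iteration t, weight each new regret
# contribution by t.
def add_regret_weighted(node, action, instant_regret, t):
  node.cumulative_regret[action] += t * instant_regret

# Form 2: accumulate regrets unweighted during the traversal, then, once the
# whole iteration t is finished, discount the running totals by t / (t + 1).
# Both forms yield the same regret-matching policy (they differ only by a
# positive constant factor); form 2 keeps the numbers smaller.
def discount_after_iteration(info_state_nodes, t):
  for node in info_state_nodes:
    for action in node.cumulative_regret:
      node.cumulative_regret[action] *= t / (t + 1.0)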

Does what I am trying to explain make sense? Is it then correct that this CL is implementing DCFR and LCFR incorrectly, in that the cumulative values are multiplied several times per traversal and the discounting is interleaved with the updates?

I think we can fix that for the cumulative policy, doing:

info_state_node.cumulative_policy[action] += (reach_prob * action_prob * ( self._iteration**self.gamma))

For the cumulative regrets, we should do a second step after the recursive function, as in the regret-matching+ implementation for CFR+ (e.g. the _apply_regret_matching_plus_reset(self._info_state_nodes) line).

Here is what I have been writing to give an idea of what I think should be working:

https://paste.ofcode.org/6sPLn6Q4p27w5gPkJgvvza (see lines 133 and 154 to 161). One thing is incorrect: the update on line 154 should be done only for the current player's nodes.
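
To make that intended fix concrete, here is a minimal sketch of such a post-traversal discounting step restricted to the updating player's nodes (this is not the content of the paste; the alpha/beta/gamma exponents follow https://arxiv.org/abs/1809.04040, but the node structure and names are hypothetical):

def discount_current_player_nodes(info_state_nodes, player, t, alpha, beta, gamma):
  # Hypothetical post-traversal DCFR discount, applied only to `player`'s nodes.
  # Positive cumulative regrets are scaled by t^alpha / (t^alpha + 1), negative
  # ones by t^beta / (t^beta + 1), and the cumulative policy by (t / (t + 1))^gamma.
  pos = t**alpha / (t**alpha + 1)
  neg = t**beta / (t**beta + 1)
  pol = (t / (t + 1)) ** gamma
  for node in info_state_nodes:
    if node.player != player:  # only discount the player whose updates were just applied
      continue
    for action in node.cumulative_regret:
      regret = node.cumulative_regret[action]
      node.cumulative_regret[action] = regret * (pos if regret > 0 else neg)
    for action in node.cumulative_policy:
      node.cumulative_policy[action] *= pol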

What do you think?

@jblespiau added the invalid (This doesn't seem right) and jblespiau (This is being reviewed by jblespiau@) labels on Oct 1, 2019
@ai-gamer
Contributor Author

ai-gamer commented Oct 4, 2019

Sorry for the late response @jblespiau, I am traveling at the moment. What you said makes sense to me. Thanks for taking the time to correct the mistake. I will go through the code carefully when I am back and send you a message if I find something confusing.

@jblespiau
Collaborator

Perfect. No rush, you can take your time :) I will be OOO for 2 weeks too.

@ai-gamer
Contributor Author

ai-gamer commented Oct 12, 2019

I edited the code @jblespiau shared a bit so that the update is applied for the current player. I tested it on goofspiel5 with discounted CFR and CFR+, and it seems to get results similar to the original paper. Could you please have a look at the code to check it? https://paste.ofcode.org/fXpz5u78pWwUSQTK3nCSf8. Please focus mainly on lines 149 to 157. @lanctot

@lanctot
Collaborator

lanctot commented Oct 12, 2019

Thanks @ai-gamer, unfortunately I am very busy at the moment. I will let @jblespiau continue to handle this one. He is still on vacation, but I will remind him when he is back and we'll merge it then. Thanks a lot for your contribution!

@ai-gamer
Contributor Author

Thanks a lot for the reply. No problem. It's not urgent at all. @lanctot

@jblespiau removed the invalid (This doesn't seem right) label on Oct 23, 2019
@jblespiau
Collaborator

Thanks. I merged it internally, and the PR will be closed automatically on our next push to GitHub.

I improved the suggested code by:
(a) keeping a per-player list of the nodes to update, to avoid iterating over all the nodes (see the sketch below);
(b) using a specific field of the information_state to get the player was a hack that would not work with other games, so I am now calling state.CurrentPlayer() instead.
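
As an illustration of point (a), a small sketch (hypothetical bookkeeping, not the merged code) of grouping info-state nodes per player during the traversal so that the discount loop only touches the relevant player's nodes:

from collections import defaultdict

# Group info-state nodes by the acting player while walking the tree, so the
# post-traversal discount only iterates over the nodes of the player being
# updated rather than over every node in the game.
info_state_nodes_per_player = defaultdict(list)

def register_node(state, node):
  # state.current_player() is the Python-side equivalent of state.CurrentPlayer().
  info_state_nodes_per_player[state.current_player()].append(node)

def discount_for_player(player, factor):
  for node in info_state_nodes_per_player[player]:
    for action in node.cumulative_regret:
      node.cumulative_regret[action] *= factor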

@ai-gamer
Contributor Author

Great!!

OpenSpiel pushed a commit that referenced this pull request Oct 28, 2019
PiperOrigin-RevId: 276643426
Change-Id: I9eba53eadfebb38582feb9cbdc04b160df82886d
@OpenSpiel OpenSpiel merged commit fcc3f7e into google-deepmind:master Oct 28, 2019