
Implementation of discounted cfr and linear cfr #80

Merged
merged 10 commits on Oct 28, 2019

Conversation

ai-gamer
Contributor

This is an implementation of discounted CFR and linear CFR. It is tested on goofspiel4.

@lanctot
Collaborator

lanctot commented Sep 28, 2019

Hi @ai-gamer, thanks! That file cfr.py is getting quite crowded. Could you put the implementation in a different file, like maybe discounted_cfr.py? It will both keep cfr.py clean and make it easier for people to find discounted + linear CFR.

(I will ask @jblespiau to do the same with CFR-BR, which I think should also be separate.)

@ai-gamer
Contributor Author

I tried to put it in discounted_cfr.py and test it in the new cfr_example.py. It gets an error:
from open_spiel.python.algorithms import discounted_cfr
ImportError: cannot import name 'discounted_cfr' from 'open_spiel.python.algorithms' (/Users/Desktop/open_spiel/open_spiel/python/algorithms/__init__.py)

Sorry, this might be a stupid question, but I don't know why I can't import discounted_cfr. I tried copying the original cfr.py, and it can't be imported either.

@ai-gamer
Contributor Author

ai-gamer commented Sep 28, 2019

I found the reason: I forked when making the pull request and forgot to add the new Python file to the path. I will commit discounted_cfr.py again and make a test file for it.

@ai-gamer
Contributor Author

I have committed discounted_cfr.py and goofspiel4_discounted_cfr.py again.

@jblespiau
Collaborator

Thanks. I am planning on merging this this week (maybe I will pull the few changed lines directly into CFR, not sure yet).

Could you just confirm that the paper I have is the correct one for DCFR? Similarly, could you add a reference for LCFR?

Thanks!

@lanctot
Collaborator

lanctot commented Sep 30, 2019

It is this one for both: https://arxiv.org/abs/1809.04040 (we should cite it in the header if we are not doing so already).

I remember @noambrown telling me his implementation was slightly different from what was in the paper. Noam, could you take a look and recommend any particular settings?

@ai-gamer
Contributor Author

As @lanctot mentioned, https://arxiv.org/abs/1809.04040 includes both DCFR and LCFR. @jblespiau It might be a bit confusing to put the few changed lines into cfr.py directly. I will edit the code comments and cite the paper.

@noambrown

I glanced at the code and it looks fine, but I haven't investigated thoroughly or tested it.

@jblespiau
Collaborator

Hi.

I tried merging it; since CFRBase changed, I needed to make a few changes and also add tests.
See
https://paste.ofcode.org/tYV35KDgJD5TQgcLhgF7zh and
https://paste.ofcode.org/35cz639DNGctUANiUXFep2A

While reading the changes from the base class and documenting them, to check that I had understood correctly, I may have found an error in the implemented algorithm.

  1. Note that the cumulative regrets and cumulative policy are incremented in the recursive function _compute_counterfactual_regret_for_player, which walks the full tree of histories.
    Thus, writing S for an information state and |S| for the number of histories going through it, we:
  • add the contribution to the cumulative regrets of each h in S during the tree traversal;
  • add to the cumulative policy |S| times, because we enter S multiple times, each time adding the same value. At the end we normalize the cumulative policy, and |S| is independent of the iteration for a given information state S, so it is a constant factor.

So what we do is:
cumulative_policy(T)(s, a) = |S| * \sum_{t=1}^{T} policy(t)(s, a) * player_reach_prob(t) * t
and then we normalize.

  2. I think that what this CL is doing is assuming we do the following sequentially:
  • update the cumulative regrets and average policy;
  • then multiply these by the factor in the paper, using alpha, beta and gamma.

but this is not true (the implementation we have does everything at the same time).

To rephrase everything using a sentence from the paper:

"The first algorithm, which we refer to as linear CFR (LCFR), is identical
to CFR, except on iteration t the updates to the regrets and
average strategies are given weight t. That is, the iterates
are weighed linearly. (Equivalently, one could multiply the
accumulated regret by t
t+1 on each iteration. We do this in
our experiments to reduce the risk of numerical instability."

In our implementation, we should multiply by t, not by t/(t+1). To do the second version, we would need to move the update step outside of the recursive function.
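
For concreteness, here is a minimal sketch of the two equivalent weightings (the field and function names below are illustrative only, not the actual OpenSpiel ones):

# Hypothetical illustration of the two equivalent LCFR weightings.

# Form 1: during the traversal of iteration t, weight each new regret
# contribution by t.
def add_regret_weighted(node, action, instant_regret, t):
  node.cumulative_regret[action] += t * instant_regret

# Form 2: accumulate regrets unweighted during the traversal, then, once the
# whole iteration t is finished, discount the running totals by t / (t + 1).
# Both forms yield the same regret-matching policy (they differ only by a
# positive constant factor); form 2 keeps the numbers smaller.
def discount_after_iteration(info_state_nodes, t):
  for node in info_state_nodes:
    for action in node.cumulative_regret:
      node.cumulative_regret[action] *= t / (t + 1.0)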

Does what I am trying to explain make sense? Is it then correct that this CL is implementing DCFR and LCFR incorrectly, in that the cumulative values are multiplied several times per traversal and the discounting is interleaved with the updates?

I think we can fix that for the cumulative policy, doing:

info_state_node.cumulative_policy[action] += (reach_prob * action_prob * ( self._iteration**self.gamma))

For the cumulative regrets, we should do a second step after the recursive function, as in the regret-matching+ implementation for CFR+ (e.g. the _apply_regret_matching_plus_reset(self._info_state_nodes) line).

Here is what I have been writing to give an idea of what I think should be working:

https://paste.ofcode.org/6sPLn6Q4p27w5gPkJgvvza (see lines 133 and 154 to 161). One thing is incorrect: the update on line 154 should be done only for the current player's nodes.
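
To make that intended fix concrete, here is a minimal sketch of such a post-traversal discounting step restricted to the updating player's nodes (this is not the content of the paste; the alpha/beta/gamma exponents follow https://arxiv.org/abs/1809.04040, but the node structure and names are hypothetical):

def discount_current_player_nodes(info_state_nodes, player, t, alpha, beta, gamma):
  # Hypothetical post-traversal DCFR discount, applied only to `player`'s nodes.
  # Positive cumulative regrets are scaled by t^alpha / (t^alpha + 1), negative
  # ones by t^beta / (t^beta + 1), and the cumulative policy by (t / (t + 1))^gamma.
  pos = t**alpha / (t**alpha + 1)
  neg = t**beta / (t**beta + 1)
  pol = (t / (t + 1)) ** gamma
  for node in info_state_nodes:
    if node.player != player:  # only discount the player whose updates were just applied
      continue
    for action in node.cumulative_regret:
      regret = node.cumulative_regret[action]
      node.cumulative_regret[action] = regret * (pos if regret > 0 else neg)
    for action in node.cumulative_policy:
      node.cumulative_policy[action] *= pol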

What do you think?

@jblespiau added the invalid (This doesn't seem right) and jblespiau (This is being reviewed by jblespiau@) labels on Oct 1, 2019
@ai-gamer
Contributor Author

ai-gamer commented Oct 4, 2019

Sorry for the late response @jblespiau, I am traveling at the moment. What you said makes sense to me. Thanks for taking the time to correct the mistake. I will go through the code carefully when I am back and send you a message if I find something confusing.

@jblespiau
Collaborator

Perfect. No rush, you can take your time :) I will be OOO for 2 weeks too.

@ai-gamer
Contributor Author

ai-gamer commented Oct 12, 2019

I edited the code @jblespiau shared a bit so that the update is applied for the current player. I tested it on goofspiel5 with discounted CFR and CFR+, and it seems to get results similar to the original paper. Could you please have a look at the code to check it? https://paste.ofcode.org/fXpz5u78pWwUSQTK3nCSf8. Please focus mainly on lines 149 to 157. @lanctot

@lanctot
Collaborator

lanctot commented Oct 12, 2019

Thanks @ai-gamer, unfortunately I am very busy at the moment. I will let @jblespiau continue to handle this one. He is still on vacation, but I will remind him when he is back and we'll merge it then. Thanks a lot for your contribution!

@ai-gamer
Contributor Author

Thanks a lot for the reply. No problem. It's not urgent at all. @lanctot

@jblespiau removed the invalid (This doesn't seem right) label on Oct 23, 2019
@jblespiau
Collaborator

Thanks. I merged it internally, and the PR will be closed automatically on our next push to GitHub.

I improved the suggested code by:
(a) keeping a per-player list of the nodes to update, to avoid iterating over all the nodes (see the sketch below);
(b) using a specific field of the information_state to get the player was a hack that would not work with other games, so I am now calling state.CurrentPlayer() instead.
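
As an illustration of point (a), a small sketch (hypothetical bookkeeping, not the merged code) of grouping info-state nodes per player during the traversal so that the discount loop only touches the relevant player's nodes:

from collections import defaultdict

# Group info-state nodes by the acting player while walking the tree, so the
# post-traversal discount only iterates over the nodes of the player being
# updated rather than over every node in the game.
info_state_nodes_per_player = defaultdict(list)

def register_node(state, node):
  # state.current_player() is the Python-side equivalent of state.CurrentPlayer().
  info_state_nodes_per_player[state.current_player()].append(node)

def discount_for_player(player, factor):
  for node in info_state_nodes_per_player[player]:
    for action in node.cumulative_regret:
      node.cumulative_regret[action] *= factor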

@ai-gamer
Contributor Author

Great!!

OpenSpiel pushed a commit that referenced this pull request Oct 28, 2019
PiperOrigin-RevId: 276643426
Change-Id: I9eba53eadfebb38582feb9cbdc04b160df82886d
@OpenSpiel OpenSpiel merged commit fcc3f7e into google-deepmind:master Oct 28, 2019