"9:35am.
```
type Actions =
    | Pass
    | Bet

type Cards =
    | One
    | Two
    | Three

let rng = System.Random()

let knuth_shuffle (ar: _[]) =
    let swap i j =
        let item = ar.[i]
        ar.[i] <- ar.[j]
        ar.[j] <- item

    // Fisher-Yates: walk i down from the end, swapping slot i with a random slot in 0..i.
    for i=Array.length ar - 1 downto 1 do swap (rng.Next(i+1)) i

let ar = [|One; Two; Three|]
knuth_shuffle ar; ar
```

The Knuth shuffle is easy enough. I am using the version I saw in the Pharo library here.
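For comparison, here is a small variant of the same Fisher-Yates (Knuth) shuffle that takes the `System.Random` as a parameter, so that a fixed seed reproduces the same order. Passing the generator in and the seed `42` are assumptions for illustration, not part of the Pharo version.

```fsharp
// Same Fisher-Yates (Knuth) shuffle, but with the generator passed in (an assumption)
// so that a fixed seed gives a reproducible order.
let knuth_shuffle (rng: System.Random) (ar: _[]) =
    for i = Array.length ar - 1 downto 1 do
        // Pick j uniformly from 0..i; the upper bound of Next is exclusive.
        let j = rng.Next(i + 1)
        let item = ar.[i]
        ar.[i] <- ar.[j]
        ar.[j] <- item

let deck = [| 1; 2; 3 |]
knuth_shuffle (System.Random(42)) deck
```

The `i + 1` matters: `Random.Next(Int32)` excludes its upper bound, so slot `i` can swap with itself, and that is what keeps all permutations equally likely.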

Let me move on.

```
compare One Three
```

Actually, I am quite surprised that this gives me -2. I thought that `compare` only returned -1, 0, and 1. Quite interesting.

Maybe it only does that for degenerate unions.
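Whatever the magnitude turns out to be, a comparer is only obliged to get the sign right (that is the `IComparable.CompareTo` contract in .NET), so a minimal, safe sketch tests the sign rather than comparing against -1 or 1. `is_higher` is a hypothetical helper.

```fsharp
type Card = One | Two | Three

// Union cases compare in declaration order; rely only on the sign of the result.
let is_higher (a: Card) (b: Card) = compare a b > 0

// Collapse any negative/zero/positive result to -1/0/1 when an exact value is needed.
let normalized = sign (compare One Three)
```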

9:50am.

```
type State =
    | Pass
    | PassPass
    | PassBet
    | PassBetPass
    | PassBetBet
    | Bet
    | BetPass
    | BetBet
```

Let me put this here for the time being.

I haven't decided what the states should be yet.
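One option, sketched here under the assumption that a history is an `Action list` (which is what the later `kuhn_poker.fsx` ends up using), is to skip a separate `State` type and classify histories directly. `is_terminal` and `all_histories` are hypothetical helpers.

```fsharp
type Action = Pass | Bet

// The five terminal sequences of Kuhn poker.
let is_terminal (history: Action list) =
    match history with
    | [Pass; Pass] | [Pass; Bet; Pass] | [Pass; Bet; Bet]
    | [Bet; Pass] | [Bet; Bet] -> true
    | _ -> false

// Every history reachable from the root: extend non-terminal histories with each action.
let rec all_histories (history: Action list) =
    if is_terminal history then [history]
    else history :: List.collect (fun a -> all_histories (history @ [a])) [Pass; Bet]

let histories = all_histories []
```

Enumerating from `[]` gives nine histories: the root plus eight non-empty ones, which line up with the eight cases of `State` above; five of them are terminal.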

10:15am.

```
type Node =
    {
        strategy_sum: float[]
        regret_sum: float[]
    }

let normalize array =
    let temp, normalizing_sum =
        Array.mapFold (fun s x ->
            let strategy = max x 0.0 // negative regrets count as zero
            strategy, strategy + s
            ) 0.0 array

    let inline f g = for i=0 to temp.Length-1 do temp.[i] <- g temp.[i]
    if normalizing_sum > 0.0 then f (fun x -> x / normalizing_sum)
    else f (fun _ -> 1.0 / float temp.Length) // uniform when nothing is positive
    temp // return the normalized array

let add_strategy_sum (agent: Node) realization_weight strategy =
    let sum = agent.strategy_sum
    Array.iteri (fun i x -> sum.[i] <- sum.[i] + realization_weight * x) strategy
```

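Now `normalize` will be optimized: `mapFold` clamps and sums in a single pass, and the inlined `f` rescales `temp` in place instead of allocating another array.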
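As a quick check of what `normalize` does (this is the regret-matching step: clamp negative regrets to zero, divide by the sum, fall back to uniform), here is a self-contained copy with `temp.Length` standing in for `actions.Length`. The input arrays are made up for illustration.

```fsharp
// Self-contained copy of normalize; temp.Length replaces actions.Length.
let normalize (array: float[]) =
    let temp, normalizing_sum =
        Array.mapFold (fun s x ->
            let strategy = max x 0.0
            strategy, strategy + s) 0.0 array
    let mutate_temp f = for i = 0 to temp.Length - 1 do temp.[i] <- f temp.[i]
    if normalizing_sum > 0.0 then mutate_temp (fun x -> x / normalizing_sum)
    else mutate_temp (fun _ -> 1.0 / float temp.Length)
    temp

let from_regrets = normalize [| 1.0; -2.0; 3.0 |] // [|0.25; 0.0; 0.75|]
let uniform = normalize [| -1.0; -5.0 |]          // [|0.5; 0.5|]
```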

10:25am. Renaming is just so good in VS. It is amazing.

In terms of stability and ease of use, VS is actually better than Pharo's IDE. Pharo needs to take some lessons from it.

Pharo irks:
- Lack of dedent with Shift + Tab
- Tabs rather than spaces
- [Undo](pharo-project/pharo#2814)
- Autocomplete sinking
- [System browser sinking](pharo-project/pharo#2800)
- GToolkit crashing on delete
- Lack of variable popup in debugger
- Meta taking only half of the screen
- [Class revert does not revert methods](pharo-project/pharo#2853)
- All the buggy Roassal examples

I am just gathering some ammo in case I get challenged in the next PL monthly thread.

Note: Highlight my recent renaming experience in VS.

10:35am. Now, enough of that. Let me start work on the CFR function before I get distracted any further.

As expected, this example is quite hard, mostly because the code in the paper is so crappy.

I am going to have to do it roughly and then I am going to have to do it cleanly. I am not sure why there aren't two nodeMaps, one for each player.
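A single map looks sufficient, since the key already encodes whose node it is. A sketch, assuming the paper's key format (the acting player's card followed by the history string, with player = `history.Length % 2`): the first player only acts after even-length histories and the second only after odd-length ones, so their keys cannot collide. `info_sets` is a hypothetical helper.

```fsharp
// Non-terminal histories of Kuhn poker, written with 'p' and 'b' as in the Java.
let non_terminal_histories = [ ""; "p"; "b"; "pb" ]

// Keys a player can look up: their card followed by a history where it is their turn.
let info_sets player =
    [ for card in 1 .. 3 do
        for h in non_terminal_histories do
            if h.Length % 2 = player then yield string card + h ]

let first = info_sets 0
let second = info_sets 1
let overlap = Set.intersect (Set.ofList first) (Set.ofList second)
```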

10:45am.

```
private double cfr(int[] cards, String history, double p0, double p1) {
    int plays = history.length();
    int player = plays % 2;
    int opponent = 1 - player;
    // Return payoff for terminal states
    String infoSet = cards[player] + history;
    // Get information set node or create it if nonexistent
    // For each action, recursively call cfr with additional history and probability
    // For each action, compute and accumulate counterfactual regret
    return nodeUtil;
}
```

No, this is just so crappy. I cannot possibly abide by this. I will do it differently.

Let me just ask: is this example really intended to have two players, or is it just a single player playing against itself?

10:55am.

```
if (plays > 1) {
    boolean terminalPass = history.charAt(plays - 1) == 'p';
    boolean doubleBet = history.substring(plays - 2, plays).equals("bb");
    boolean isPlayerCardHigher = cards[player] > cards[opponent];
    if (terminalPass)
        if (history.equals("pp")) return isPlayerCardHigher ? 1 : -1;
        else return 1;
    else if (doubleBet) return isPlayerCardHigher ? 2 : -2;
}
```

I will have to express this in terms of pattern matching.

11am.

```
let cfr history (one : Semblance) (two : Semblance) =
    // Payoffs are from the first player's (`one`) point of view.
    match history with
    | [Pass; Pass] -> if one.card > two.card then 1.0 else -1.0
    | [Pass; Bet; Pass] -> -1.0
    | [Pass; Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | [Bet; Pass] -> 1.0
    | [Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | _ ->
        // Non-terminal histories: recurse over the actions (not written yet).
        failwith "TODO"
```

Something like this should be decent.
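One way to make sure the pattern match says the same thing as the paper's Java, sketched under the assumption that cards are the ints 1 < 2 < 3: port the Java terminal check, which scores from the viewpoint of the player whose turn it would be (`plays % 2`), flip the sign when that is the second player, and compare against the match for every deal. The helper names are hypothetical.

```fsharp
type Action = Pass | Bet

// The pattern match from above, with cards as ints; always the first player's view.
let payoff_first_player (history: Action list) (one: int) (two: int) =
    match history with
    | [Pass; Pass] -> if one > two then 1.0 else -1.0
    | [Pass; Bet; Pass] -> -1.0
    | [Pass; Bet; Bet] -> if one > two then 2.0 else -2.0
    | [Bet; Pass] -> 1.0
    | [Bet; Bet] -> if one > two then 2.0 else -2.0
    | _ -> failwith "not a terminal history"

// Direct port of the Java terminal check; scores for player = plays % 2.
let payoff_java (cards: int[]) (history: string) =
    let plays = history.Length
    let player = plays % 2
    let opponent = 1 - player
    let terminal_pass = history.[plays - 1] = 'p'
    let double_bet = history.EndsWith("bb")
    let higher = cards.[player] > cards.[opponent]
    if terminal_pass then
        if history = "pp" then (if higher then 1.0 else -1.0) else 1.0
    elif double_bet then (if higher then 2.0 else -2.0)
    else failwith "not a terminal history"

let to_string (history: Action list) =
    history |> List.map (function Pass -> "p" | Bet -> "b") |> String.concat ""

let terminals = [ [Pass; Pass]; [Pass; Bet; Pass]; [Pass; Bet; Bet]; [Bet; Pass]; [Bet; Bet] ]

let agrees () =
    [ for one in 1 .. 3 do
        for two in 1 .. 3 do
            if one <> two then
                for h in terminals do
                    let java = payoff_java [| one; two |] (to_string h)
                    // Java scores for player (length % 2); flip when that is the second player.
                    let first = if List.length h % 2 = 0 then java else -java
                    yield first = payoff_first_player h one two ]
    |> List.forall id
```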

11:15am.

```
double[] strategy = node.getStrategy(player == 0 ? p0 : p1);
double[] util = new double[NUM_ACTIONS];
double nodeUtil = 0;
for (int a = 0; a < NUM_ACTIONS; a++) {
    String nextHistory = history + (a == 0 ? "p" : "b");
    util[a] = player == 0
        ? - cfr(cards, nextHistory, p0 * strategy[a], p1)
        : - cfr(cards, nextHistory, p0, p1 * strategy[a]);
    nodeUtil += strategy[a] * util[a];
}
```

This is so convoluted. I have no idea which player is supposed to act here. I am going to have to do it as originally intended.
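For reference, here is a sketch of the whole recursion, following the structure of the paper's Java `cfr` as quoted above. `player = plays % 2` is the player to act at `history`, and each recursive value is negated because the child returns the opponent's utility in this zero-sum game. The acting player's reach is multiplied by `strategy.[a]`, while the regret is weighted by the opponent's reach. Assumptions: cards are ints, action 0 is pass and 1 is bet, and instead of shuffling, `train` cycles through all six deals. All names are hypothetical.

```fsharp
open System.Collections.Generic

// Action 0 = pass, action 1 = bet; cards are the ints 1 < 2 < 3.
let num_actions = 2

type Node = { regret_sum: float[]; strategy_sum: float[] }

// One table for both players: the key (card + history) already fixes who acts.
let node_map = Dictionary<string, Node>()

// Regret matching; also accumulates the strategy weighted by the actor's own reach.
let get_strategy (node: Node) (realization_weight: float) =
    let positive = Array.map (fun r -> max r 0.0) node.regret_sum
    let total = Array.sum positive
    let strategy =
        if total > 0.0 then Array.map (fun r -> r / total) positive
        else Array.create num_actions (1.0 / float num_actions)
    for a in 0 .. num_actions - 1 do
        node.strategy_sum.[a] <- node.strategy_sum.[a] + realization_weight * strategy.[a]
    strategy

// Returns the utility for the player to act, player = history.Length % 2.
let rec cfr (cards: int[]) (history: string) (p0: float) (p1: float) : float =
    let plays = history.Length
    let player = plays % 2
    let opponent = 1 - player
    let higher = cards.[player] > cards.[opponent]
    if plays > 1 && history.[plays - 1] = 'p' then
        // "pp" is a showdown; any other trailing pass is a fold by the previous player.
        if history = "pp" then (if higher then 1.0 else -1.0) else 1.0
    elif plays > 1 && history.EndsWith("bb") then
        if higher then 2.0 else -2.0
    else
        let info_set = string (cards.[player]) + history
        let node =
            match node_map.TryGetValue(info_set) with
            | true, n -> n
            | false, _ ->
                let n = { regret_sum = Array.zeroCreate num_actions; strategy_sum = Array.zeroCreate num_actions }
                node_map.[info_set] <- n
                n
        let strategy = get_strategy node (if player = 0 then p0 else p1)
        let util = Array.create num_actions 0.0
        let mutable node_util = 0.0
        for a in 0 .. num_actions - 1 do
            let next = history + (if a = 0 then "p" else "b")
            let child =
                if player = 0 then cfr cards next (p0 * strategy.[a]) p1
                else cfr cards next p0 (p1 * strategy.[a])
            // The child's value is the opponent's utility, hence the minus sign.
            util.[a] <- -child
            node_util <- node_util + strategy.[a] * util.[a]
        // Regrets are weighted by the opponent's reach probability.
        let counterfactual_reach = if player = 0 then p1 else p0
        for a in 0 .. num_actions - 1 do
            node.regret_sum.[a] <- node.regret_sum.[a] + counterfactual_reach * (util.[a] - node_util)
        node_util

// All six deals (first player's card, second player's card).
let deals = [| for i in 1 .. 3 do for j in 1 .. 3 do if i <> j then yield [| i; j |] |]

// Average root value over all iterations and deals, from the first player's view.
let train iterations =
    let mutable total = 0.0
    for _ in 1 .. iterations do
        for cards in deals do
            total <- total + cfr cards "" 1.0 1.0
    total / float (iterations * deals.Length)

let average_strategy (node: Node) =
    let total = Array.sum node.strategy_sum
    if total > 0.0 then Array.map (fun s -> s / total) node.strategy_sum
    else Array.create num_actions (1.0 / float num_actions)
```

With enough iterations the average value should approach -1/18, the known value of Kuhn poker for the first player.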

Let me take off for a bit here."
mrakgr committed Mar 18, 2019
1 parent c52f2df commit 16a397d
Showing 1 changed file with 77 additions and 1 deletion.
78 changes: 77 additions & 1 deletion Learning/CFR/kuhn_poker.fsx
@@ -1 +1,77 @@

```
open System.Collections.Generic

type Action =
    | Pass
    | Bet

type Card =
    | One
    | Two
    | Three

let rng = System.Random()

let knuth_shuffle (ar: _[]) =
    let swap i j =
        let item = ar.[i]
        ar.[i] <- ar.[j]
        ar.[j] <- item

    for i=Array.length ar - 1 downto 1 do swap (rng.Next(i+1)) i

let cards = [|One; Two; Three|]

type Node =
    {
        strategy_sum: float[]
        regret_sum: float[]
    }

type Agent = Dictionary<Action list * Card, Node>

let node_map : Agent = Dictionary()

let actions = [|Bet;Pass|]

let normalize array =
    let temp, normalizing_sum =
        Array.mapFold (fun s x ->
            let strategy = max x 0.0
            strategy, strategy + s
            ) 0.0 array

    let inline mutate_temp f = for i=0 to temp.Length-1 do temp.[i] <- f temp.[i]
    if normalizing_sum > 0.0 then mutate_temp (fun x -> x / normalizing_sum)
    else mutate_temp (fun _ -> 1.0 / float actions.Length)
    temp

let add_strategy_sum agent realization_weight x =
    let sum = agent.strategy_sum
    Array.iteri (fun i x -> sum.[i] <- sum.[i] + realization_weight * x) x

type Particle = {card: Card; probability: float}

let cfr history (one : Particle) (two : Particle) =
    match history with
    | [Pass; Pass] -> if one.card > two.card then 1.0 else -1.0
    | [Pass; Bet; Pass] -> -1.0
    | [Pass; Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | [Bet; Pass] -> 1.0
    | [Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | _ ->
        let node =
            match node_map.TryGetValue((history, one.card)) with
            | true, v -> v
            | false, _ -> {strategy_sum=Array.zeroCreate actions.Length; regret_sum=Array.zeroCreate actions.Length}

        let action_distribution =
            0.0
        // Work in progress at this commit: the non-terminal case is unfinished.
        failwith "TODO"

let train num_iterations =
    let cards = [|One; Two; Three|]
    let mutable util = 0.0
    for i=1 to num_iterations do
        knuth_shuffle cards
        util <- util + cfr [] {card=cards.[0]; probability=1.0} {card=cards.[1]; probability=1.0}
    printfn "Average game value: %f" (util / float num_iterations)
    printfn "%A" node_map
```
