"9:35am.

```
type Actions =
    | Pass
    | Bet

type Cards =
    | One
    | Two
    | Three

let rng = System.Random()

let knuth_shuffle (ar: _[]) =
    let swap i j =
        let item = ar.[i]
        ar.[i] <- ar.[j]
        ar.[j] <- item

    for i=Array.length ar - 1 downto 1 do swap (rng.Next(i+1)) i

let ar = [|One; Two; Three|]
knuth_shuffle ar; ar
```

The Knuth shuffle is easy enough. I am using the version I saw in the Pharo library here. Let me move on.

```
compare One Three
```

Actually, I am quite surprised that this gives me -2. I thought that compare only gave -1, 0 and 1. Quite interesting. Maybe it only does that for degenerate unions, the ones whose cases carry no fields.

9:50am.

```
type State =
    | Pass
    | PassPass
    | PassBet
    | PassBetPass
    | PassBetBet
    | Bet
    | BetPass
    | BetBet
```

Let me put this here for the time being. I haven't decided what the states should be yet.

10:15am.

```
let normalize array =
    // One pass: clamp each regret at zero and accumulate the sum.
    let temp, normalizing_sum =
        Array.mapFold (fun s x ->
            let strategy = max x 0.0
            strategy, strategy + s
            ) 0.0 array

    // Rescale in place; fall back to uniform when nothing is positive.
    // `actions` is the action array defined elsewhere in the script.
    let inline f g = for i=0 to temp.Length-1 do temp.[i] <- g temp.[i]
    if normalizing_sum > 0.0 then f (fun x -> x / normalizing_sum)
    else f (fun _ -> 1.0 / float actions.Length)
    temp

let add_strategy_sum agent realization_weight x =
    let sum = agent.strategy_sum
    Array.iteri (fun i x -> sum.[i] <- sum.[i] + realization_weight * x) x
```

Now `normalize` will be optimized: a single `mapFold` pass clamps and sums the regrets, and then the result array is rescaled in place.

10:25am.

Renaming is just so good in VS. It is amazing. In terms of stability and ease of use, VS is actually better than Pharo's IDE. Pharo needs to take some lessons from it.

Pharo irks:

- Lack of dedent with Shift + Tab
- Tabs rather than spaces
- [Undo](pharo-project/pharo#2814)
- Autocomplete sinking
- [System browser sinking](pharo-project/pharo#2800)
- GToolkit crashing on delete
- Lack of variable popup in debugger
- Meta taking only half of the screen
- [Class revert does not revert methods](pharo-project/pharo#2853)
- All the buggy Roassal examples

I am just gathering some ammo in case I get challenged in the next PL monthly thread.
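The `normalize` plus `add_strategy_sum` pair above is regret matching. The current strategy is the positive part of the regrets, rescaled to sum to 1, and it is accumulated into a running sum weighted by the acting player's own reach probability. Below is a minimal standalone sketch of that step. `Node`, `new_node`, `get_strategy` and `average_strategy` are hypothetical names modeled on the paper's `getStrategy`/`getAverageStrategy`, not code from this repo.

```fsharp
// Sketch only: names are hypothetical, modeled on the paper's
// Node.getStrategy / Node.getAverageStrategy.
type Node = { regret_sum: float[]; strategy_sum: float[] }

let new_node n = { regret_sum = Array.zeroCreate n; strategy_sum = Array.zeroCreate n }

/// Regret matching: clamp at zero, rescale to sum to 1, uniform if nothing is positive.
let normalize (xs: float[]) =
    let pos = Array.map (fun x -> max x 0.0) xs
    let total = Array.sum pos
    if total > 0.0 then Array.map (fun x -> x / total) pos
    else Array.create xs.Length (1.0 / float xs.Length)

/// Current strategy at a node. As a side effect it adds the strategy into
/// strategy_sum, weighted by the acting player's own reach probability.
let get_strategy (node: Node) (realization_weight: float) =
    let strategy = normalize node.regret_sum
    for i in 0 .. strategy.Length - 1 do
        node.strategy_sum.[i] <- node.strategy_sum.[i] + realization_weight * strategy.[i]
    strategy

/// The average strategy (not the current one) is what approaches equilibrium.
let average_strategy (node: Node) = normalize node.strategy_sum
```

A node whose regrets are `[|3.0; -1.0|]` plays the first action with probability 1. Once every regret is non-positive, it falls back to the uniform distribution.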
Note: Highlight my recent renaming experience in VS.

10:35am.

Now, enough of that. Let me start work on the CFR function before I get distracted any further. As expected, this example is quite hard, mostly because the code in the paper is so crappy. I am going to have to do it roughly first and then do it cleanly.

I am not sure why there aren't two nodeMaps, one for each player.

10:45am.

```
private double cfr(int[] cards, String history, double p0, double p1) {
    int plays = history.length();
    int player = plays % 2;
    int opponent = 1 - player;
    // <Return payoff for terminal states>
    String infoSet = cards[player] + history;
    // <Get information set node or create it if nonexistent>
    // <For each action, recursively call cfr with additional history and probability>
    // <For each action, compute and accumulate counterfactual regret>
    return nodeUtil;
}
```

No, this is just so crappy. I cannot possibly abide by this. I will do it differently. Let me just ask: is this example really intended to have two players, or is it just a single player playing against itself?

10:55am.

```
if (plays > 1) {
    boolean terminalPass = history.charAt(plays - 1) == 'p';
    boolean doubleBet = history.substring(plays - 2, plays).equals("bb");
    boolean isPlayerCardHigher = cards[player] > cards[opponent];
    if (terminalPass)
        if (history.equals("pp"))
            return isPlayerCardHigher ? 1 : -1;
        else
            return 1;
    else if (doubleBet)
        return isPlayerCardHigher ? 2 : -2;
}
```

I will have to express this in terms of pattern matching.

11am.

```
let cfr history (one : Semblance) (two : Semblance) =
    match history with
    | [Pass; Pass] -> if one.card > two.card then 1.0 else -1.0
    | [Pass; Bet; Pass] -> -1.0
    | [Pass; Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | [Bet; Pass] -> 1.0
    | [Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | _ ->
```

Something like this should be decent.

11:15am.

```
double[] strategy = node.getStrategy(player == 0 ? p0 : p1);
double[] util = new double[NUM_ACTIONS];
double nodeUtil = 0;
for (int a = 0; a < NUM_ACTIONS; a++) {
    String nextHistory = history + (a == 0 ? "p" : "b");
    util[a] = player == 0
        ? - cfr(cards, nextHistory, p0 * strategy[a], p1)
        : - cfr(cards, nextHistory, p0, p1 * strategy[a]);
    nodeUtil += strategy[a] * util[a];
}
```

This is so convoluted. I have no idea which player is supposed to act here. I am going to have to do it as originally intended. Let me take off for a bit here."
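On the question of which player acts: in the Java code, `player = plays % 2` is the player about to move. Its terminal payoffs are returned from that player's point of view, which is why every recursive call is negated. The 11am pattern match instead returns payoffs for player one on every terminal history. Here is a small sketch that converts one convention into the other. `payoff_player_one` and `payoff_to_act` are hypothetical helper names, and the types are redeclared so the block stands alone.

```fsharp
// Hypothetical helpers: `payoff_player_one` is the 11am match with the
// showdown factored out; `payoff_to_act` converts it to the Java convention.
type Action = Pass | Bet
type Card = One | Two | Three

/// Terminal payoff for player one, or None if the history is not terminal.
let payoff_player_one (history: Action list) (c1: Card) (c2: Card) =
    let showdown stake = if c1 > c2 then stake else -stake
    match history with
    | [Pass; Pass] -> Some (showdown 1.0)
    | [Pass; Bet; Pass] -> Some (-1.0)   // player one folded
    | [Pass; Bet; Bet] -> Some (showdown 2.0)
    | [Bet; Pass] -> Some 1.0            // player two folded
    | [Bet; Bet] -> Some (showdown 2.0)
    | _ -> None

/// Java convention: the payoff belongs to player (plays % 2), the one who would
/// act next. Player one moves on even-length histories, player two on odd ones.
let payoff_to_act (history: Action list) c1 c2 =
    let player = List.length history % 2
    payoff_player_one history c1 c2
    |> Option.map (fun u -> if player = 0 then u else -u)
```

For example, after `pbp` the Java code returns `1` because player two, who would move next, collects the pot. The player-one version returns `-1` for the same history.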
mrakgr · Mar 18, 2019 · 16a397d
1 parent c52f2df
Showing 1 changed file with 77 additions and 1 deletion.
diff --git a/Learning/CFR/kuhn_poker.fsx b/Learning/CFR/kuhn_poker.fsx
@@ -1 +1,77 @@
-
+open System.Collections.Generic
+
+type Action =
+    | Pass
+    | Bet
+
+type Card =
+    | One
+    | Two
+    | Three
+
+let rng = System.Random()
+
+let knuth_shuffle (ar: _[]) =
+    let swap i j =
+        let item = ar.[i]
+        ar.[i] <- ar.[j]
+        ar.[j] <- item
+
+    for i=Array.length ar - 1 downto 1 do swap (rng.Next(i+1)) i
+
+let cards = [|One; Two; Three|]
+
+type Node = 
+    {
+    strategy_sum: float[]
+    regret_sum: float[]
+    }
+
+type Agent = Dictionary<Action list * Card, Node>
+
+let node_map : Agent = Dictionary()
+
+let actions = [|Bet;Pass|]
+
+let normalize array = 
+    let temp, normalizing_sum =
+        Array.mapFold (fun s x ->
+            let strategy = max x 0.0
+            strategy, strategy + s
+            ) 0.0 array
+
+    let inline mutate_temp f = for i=0 to temp.Length-1 do temp.[i] <- f temp.[i]
+    if normalizing_sum > 0.0 then mutate_temp (fun x -> x / normalizing_sum)
+    else mutate_temp (fun _ -> 1.0 / float actions.Length)
+    temp
+
+let add_strategy_sum agent realization_weight x = 
+    let sum = agent.strategy_sum
+    Array.iteri (fun i x -> sum.[i] <- sum.[i] + realization_weight * x) x
+
+type Particle = {card: Card; probability: float}
+
+let cfr history (one : Particle) (two : Particle) =
+    match history with
+    | [Pass; Pass] -> if one.card > two.card then 1.0 else -1.0
+    | [Pass; Bet; Pass] -> -1.0
+    | [Pass; Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
+    | [Bet; Pass] -> 1.0
+    | [Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
+    | _ ->
+        let node =
+            match node_map.TryGetValue((history, one.card)) with
+            | true, v -> v
+            | false, _ -> {strategy_sum=Array.zeroCreate actions.Length; regret_sum=Array.zeroCreate actions.Length}
+
+        // let action_distribution = (left unfinished in this commit)
+        0.0
+
+let train num_iterations =
+    let cards = [|One; Two; Three|]
+    let mutable util = 0.0
+    for i=1 to num_iterations do
+        knuth_shuffle cards
+        util <- util + cfr [] {card=cards.[0]; probability=1.0} {card=cards.[1]; probability=1.0}
+    printfn "Average game value: %f" (util / float num_iterations)
+    printfn "%A" node_map
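For reference, here is one way the unfinished `| _ ->` branch could be completed. It is a sketch under stated assumptions, not the code that eventually landed in this repo. It follows the paper's chance-sampled CFR but keeps this commit's convention of returning utilities from player one's point of view, so player two's regrets are sign-flipped instead of negating every recursive call. A single `node_map` keyed by `(history, card)` is enough for both players, because the history length already determines who is acting. Histories grow with `history @ [a]` so they match the chronological patterns. Unlike the commit's version, this `train` returns the average game value instead of printing it.

```fsharp
// Sketch, not the repo's eventual code: chance-sampled CFR in the commit's style.
// Utilities are always from player one's point of view.
open System.Collections.Generic

type Action = Pass | Bet
type Card = One | Two | Three
type Node = { strategy_sum: float[]; regret_sum: float[] }
type Particle = { card: Card; probability: float }   // probability = own reach

let actions = [| Bet; Pass |]
let node_map = Dictionary<Action list * Card, Node>()
let rng = System.Random(0)   // fixed seed so runs repeat

let knuth_shuffle (ar: _[]) =
    for i = ar.Length - 1 downto 1 do
        let j = rng.Next(i + 1)
        let item = ar.[i]
        ar.[i] <- ar.[j]
        ar.[j] <- item

// Regret matching: positive part, normalized; uniform if nothing is positive.
let normalize (xs: float[]) =
    let pos = Array.map (fun x -> max x 0.0) xs
    let total = Array.sum pos
    if total > 0.0 then Array.map (fun x -> x / total) pos
    else Array.create xs.Length (1.0 / float xs.Length)

let rec cfr (history: Action list) (one: Particle) (two: Particle) : float =
    let showdown stake = if one.card > two.card then stake else -stake
    match history with
    | [Pass; Pass] -> showdown 1.0
    | [Pass; Bet; Pass] -> -1.0
    | [Pass; Bet; Bet] -> showdown 2.0
    | [Bet; Pass] -> 1.0
    | [Bet; Bet] -> showdown 2.0
    | _ ->
        // Player one acts on even-length histories, player two on odd ones.
        let one_acts = List.length history % 2 = 0
        let actor = if one_acts then one else two
        let node =
            match node_map.TryGetValue((history, actor.card)) with
            | true, v -> v
            | false, _ ->
                let v = { strategy_sum = Array.zeroCreate actions.Length; regret_sum = Array.zeroCreate actions.Length }
                node_map.Add((history, actor.card), v)
                v
        let strategy = normalize node.regret_sum
        // Average-strategy accumulator, weighted by the actor's own reach.
        for i in 0 .. actions.Length - 1 do
            node.strategy_sum.[i] <- node.strategy_sum.[i] + actor.probability * strategy.[i]
        // Recurse, scaling only the actor's reach by the chosen action's probability.
        let util =
            actions |> Array.mapi (fun i a ->
                let next = history @ [a]
                if one_acts then cfr next { one with probability = one.probability * strategy.[i] } two
                else cfr next one { two with probability = two.probability * strategy.[i] })
        let node_util = Array.fold2 (fun acc p u -> acc + p * u) 0.0 strategy util
        // Counterfactual regret is weighted by the opponent's reach. Player two
        // minimizes player one's utility, so its regrets are sign-flipped.
        let perspective = if one_acts then 1.0 else -1.0
        let opp_reach = if one_acts then two.probability else one.probability
        for i in 0 .. actions.Length - 1 do
            node.regret_sum.[i] <- node.regret_sum.[i] + opp_reach * perspective * (util.[i] - node_util)
        node_util

/// Returns the average game value for player one over the sampled deals.
let train num_iterations =
    let cards = [| One; Two; Three |]
    let mutable util = 0.0
    for _ in 1 .. num_iterations do
        knuth_shuffle cards
        util <- util + cfr [] { card = cards.[0]; probability = 1.0 } { card = cards.[1]; probability = 1.0 }
    util / float num_iterations
```

The equilibrium value of Kuhn poker for player one is -1/18 (about -0.0556), so `train 100000` should land close to that. The average strategies in `node_map`, not the current ones, are the approximate equilibrium.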