Commit
```
type Actions =
    | Pass
    | Bet

type Cards =
    | One
    | Two
    | Three

let rng = System.Random()

let knuth_shuffle (ar: _[]) =
    let swap i j =
        let item = ar.[i]
        ar.[i] <- ar.[j]
        ar.[j] <- item
    for i=Array.length ar - 1 downto 1 do swap (rng.Next(i+1)) i

let ar = [|One; Two; Three|]
knuth_shuffle ar; ar
```

The Knuth shuffle is easy enough. I am using the version I saw in the Pharo library here. Let me move on.

```
compare One Three
```

Actually, I am quite surprised that this gives me -2. I thought that compare only gave -1, 0 and 1. Quite interesting. Maybe it only does that for degenerate unions.

9:50am.

```
type State =
    | Pass
    | PassPass
    | PassBet
    | PassBetPass
    | PassBetBet
    | Bet
    | BetPass
    | BetBet
```

Let me put this here for the time being. I haven't decided what the states should be yet.

10:15am.

```
let normalize array =
    let temp, normalizing_sum =
        Array.mapFold (fun s x ->
            let strategy = max x 0.0
            strategy, strategy + s
            ) 0.0 array

    let inline f g = for i=0 to temp.Length-1 do temp.[i] <- g temp.[i]
    if normalizing_sum > 0.0 then f (fun x -> x / normalizing_sum)
    else f (fun _ -> 1.0 / float actions.Length)

let add_strategy_sum agent realization_weight x =
    let sum = agent.strategy_sum
    Array.iteri (fun i x -> sum.[i] <- sum.[i] + realization_weight * x) x
```

Now `normalize` will be optimized.

10:25am. Renaming is just so good in VS. It is amazing. In terms of stability and ease of use, VS is actually better than Pharo's IDE. Pharo needs to take some lessons from it.

Pharo irks:

- Lack of dedent with Shift + Tab
- Tabs rather than spaces
- [Undo](pharo-project/pharo#2814)
- Autocomplete sinking
- [System browser sinking](pharo-project/pharo#2800)
- GToolkit crashing on delete
- Lack of variable popup in debugger
- Meta taking only half of the screen
- [Class revert does not revert methods](pharo-project/pharo#2853)
- All the buggy Roassal examples

I am just gathering some ammo in case I get challenged in the next PL monthly thread.
Note: Highlight my recent renaming experience in VS.

10:35am. Now, enough of that. Let me start work on the CFR function before I get distracted any further.

As expected this example is quite hard, mostly because the code in the paper is so crappy. I am going to have to do it roughly and then I am going to have to do it cleanly.

I am not sure why there aren't two nodeMaps, one for each player.

10:45am.

```
private double cfr(int[] cards, String history, double p0, double p1) {
    int plays = history.length();
    int player = plays % 2;
    int opponent = 1 - player;
    // <Return payoff for terminal states>
    String infoSet = cards[player] + history;
    // <Get information set node or create it if nonexistent>
    // <For each action, recursively call cfr with additional history and probability>
    // <For each action, compute and accumulate counterfactual regret>
    return nodeUtil;
}
```

No, this is just so crappy. I cannot possibly abide by this. I will do it differently.

Let me just ask, is this example really intended to have two players or is it just a single player playing against itself?

10:55am.

```
if (plays > 1) {
    boolean terminalPass = history.charAt(plays - 1) == 'p';
    boolean doubleBet = history.substring(plays - 2, plays).equals("bb");
    boolean isPlayerCardHigher = cards[player] > cards[opponent];
    if (terminalPass)
        if (history.equals("pp"))
            return isPlayerCardHigher ? 1 : -1;
        else
            return 1;
    else if (doubleBet)
        return isPlayerCardHigher ? 2 : -2;
}
```

I will have to express this in terms of pattern matching.

11am.

```
let cfr history (one : Semblance) (two : Semblance) =
    match history with
    | [Pass; Pass] -> if one.card > two.card then 1.0 else -1.0
    | [Pass; Bet; Pass] -> -1.0
    | [Pass; Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | [Bet; Pass] -> 1.0
    | [Bet; Bet] -> if one.card > two.card then 2.0 else -2.0
    | _ ->
```

Something like this should be decent.

11:15am.

```
double[] strategy = node.getStrategy(player == 0 ? p0 : p1);
double[] util = new double[NUM_ACTIONS];
double nodeUtil = 0;
for (int a = 0; a < NUM_ACTIONS; a++) {
    String nextHistory = history + (a == 0 ? "p" : "b");
    util[a] = player == 0
        ? - cfr(cards, nextHistory, p0 * strategy[a], p1)
        : - cfr(cards, nextHistory, p0, p1 * strategy[a]);
    nodeUtil += strategy[a] * util[a];
}
```

This is so convoluted. I have no idea which player is supposed to act here. I am going to have to do it as originally intended. Let me take off for a bit here.
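The terminal cases in the 11am pattern match can be checked in isolation with a small self-contained sketch. This is only an illustration under assumptions: `Action`, `Card` and `payoff` are hypothetical names, the cards are passed directly instead of through `Semblance` records, and every utility is taken from the first player's point of view (the Java version instead reports it for whoever is due to act). Non-terminal histories return `None` rather than recursing.

```fsharp
// Hypothetical names for illustration; they mirror, but are not, the types above.
type Action = Pass | Bet
type Card = One | Two | Three

/// Terminal utility for the first player, or None if the history is not terminal.
let payoff (history : Action list) (one : Card) (two : Card) : float option =
    // Showdown: the higher card wins `stake`. Union cases compare in
    // declaration order, so One < Two < Three.
    let showdown stake = if one > two then stake else -stake
    match history with
    | [Pass; Pass] -> Some (showdown 1.0)
    | [Pass; Bet; Pass] -> Some (-1.0)      // first player folds to the bet
    | [Pass; Bet; Bet] -> Some (showdown 2.0)
    | [Bet; Pass] -> Some 1.0               // second player folds to the bet
    | [Bet; Bet] -> Some (showdown 2.0)
    | _ -> None                             // play continues

// Usage: a called bet is a showdown for 2, a lone bet is not terminal yet.
let calledBet = payoff [Pass; Bet; Bet] Three One
let stillOpen = payoff [Bet] One Two
```

Keeping the payoff in one fixed perspective means the sign flip that the Java code does with `- cfr(...)` would have to happen wherever the acting player alternates, which is a design choice rather than something the paper's code prescribes.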