Add some more strategies #1
Conversation
Before:
## Winning strategy
- With probability 18.7500%: Binary search, first guess is 1. On each step, guess the leftmost element in the interval that won't increase the worst-case complexity.
- With probability 12.5000%: Binary search, first guess is 2. On each step, guess the leftmost element in the interval that won't increase the worst-case complexity.
- With probability 28.1250%: Binary search, first guess is 3. On each step, guess the leftmost element in the interval that won't increase the worst-case complexity.
- With probability 21.8750%: Binary search, first guess is 2. On each step, guess the rightmost element in the interval that won't increase the worst-case complexity.
- With probability 18.7500%: Binary search, first guess is 0. On each step, guess the rightmost element in the interval that won't increase the worst-case complexity.
## Average wins for each number
0: $3.6875 (stdev $0.8764)
1: $3.6875 (stdev $0.9911)
2: $3.6875 (stdev $0.9746)
3: $3.6875 (stdev $1.0361)
4: $3.6875 (stdev $0.6808)
Avg win if Ballmer chooses randomly: $3.6875
Win if Ballmer chooses adversarially: $3.6875
After:
## Winning strategy
- With probability 27.5862%: Binary search, first guess is 1. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
- With probability 27.5862%: Binary search, first guess is 3. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
- With probability 3.4483%: Binary search, first guess is 0. On each step, guess the rightmost element in the interval that won't increase the worst-case complexity.
- With probability 3.4483%: Binary search, first guess is 4. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
- With probability 37.9310%: Binary search, first guess is 2. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
## Average wins for each number
0: $3.7586 (stdev $0.7086)
1: $3.7586 (stdev $0.9666)
2: $3.7586 (stdev $0.9851)
3: $3.7586 (stdev $0.9666)
4: $3.7586 (stdev $0.7086)
Avg win if Ballmer chooses randomly: $3.758620689655172
Win if Ballmer chooses adversarially: $3.758620689655172
## Winning strategy
- With probability 22.5000%: Binary search, first guess is 1. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
- With probability 22.5000%: Binary search, first guess is 3. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
- With probability 42.5000%: Linear search, first guess is 2, then walk linearly inward from the endpoint.
- With probability 5.0000%: Linear search, first guess is 1, then walk linearly inward from the endpoint.
- With probability 7.5000%: Linear search, first guess is 3, then walk linearly inward from the endpoint.
## Average wins for each number
0: $3.7750 (stdev $0.6462)
1: $3.7750 (stdev $0.9226)
2: $3.7750 (stdev $1.0410)
3: $3.7750 (stdev $0.9670)
4: $3.7750 (stdev $0.6462)
Avg win if Ballmer chooses randomly: $3.7749999999999995
Win if Ballmer chooses adversarially: $3.7749999999999995
After this patch, the computed results for n=5 agree with my manual calculations.
## Winning strategy
- With probability 22.2222%: Binary search, first guess is 1. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
- With probability 22.2222%: Binary search, first guess is 3. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
- With probability 44.4444%: Linear search, first guess is 2, then walk linearly inward from the endpoint.
- With probability 5.5556%: Linear search, first guess is 1, then walk linearly inward from the endpoint.
- With probability 5.5556%: Linear search, first guess is 3, then walk linearly inward from the endpoint.
## Average wins for each number
0: $3.7778 (stdev $0.6448)
1: $3.7778 (stdev $0.9238)
2: $3.7778 (stdev $1.0645)
3: $3.7778 (stdev $0.9238)
4: $3.7778 (stdev $0.6448)
Avg win if Ballmer chooses randomly: $3.7777777777777772
Win if Ballmer chooses adversarially: $3.7777777777777772
Amazing! Getting really close to the optimal numbers.
Wouldn't this be equivalent to:

```python
from random import Random

def choose_random(left_incl, right_incl, rand: Random):
    """Select a random element in the interval."""
    return rand.randint(left_incl, right_incl)
```
No, because that's not a pure strategy; a pure strategy has to pick its guess deterministically. The actual mathematical Nash equilibrium would be found by putting all ~100-factorial of those pure strategies into the thing you pass to scipy; we just can't do that, because there aren't enough bits in the universe. :)
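Roughly, the distinction in code (an illustrative sketch only; these names aren't the repo's actual API): each fixed rule that deterministically maps the current interval to a guess is one pure strategy, and scipy mixes over those.

```python
def make_offset_chooser(k):
    """One pure strategy per fixed k: always take the k-th element of the
    current interval, clamped to the interval's width. Deterministic, so it
    corresponds to a single row of the payoff matrix."""
    def choose(left_incl, right_incl):
        width = right_incl - left_incl + 1
        return left_incl + min(k, width - 1)
    return choose

# choose_random above is instead a *behavioral* (randomized) strategy: it
# re-rolls the dice on every call, so it isn't any single pure strategy
# that the linear program could assign a probability to.
```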
I played around a little more with the above kind of "explicit list" strategies. Here's a mixed strategy involving 86 pure strategies; even if an adversarial Ballmer knows you're using this mixed strategy, you can still earn 15.79 cents per game this way. But this is still not a perfect Nash equilibrium, as we see from the fact that Ballmer's expected payout is higher for some numbers than for others.

"Avoiding [numbers]" means that you call the appropriate binary search to get your candidate guess, but then if your candidate guess is on the blacklist, you adjust your actual guess rightward until it's no longer on the blacklist (or, if that's not possible, then leftward; or, if that's not possible, just guess the blacklisted number now). Obviously there's room to fiddle with that procedure too.
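A sketch of that adjustment as I read it (the helper name, and the assumption that the nudged guess stays inside the current interval, are mine):

```python
def adjust_for_blacklist(candidate, left_incl, right_incl, blacklist):
    """Nudge the binary-search candidate rightward past blacklisted numbers;
    if that runs off the interval, try leftward instead; otherwise give up
    and guess the blacklisted candidate itself."""
    g = candidate
    while g in blacklist and g < right_incl:
        g += 1
    if g not in blacklist:
        return g
    g = candidate
    while g in blacklist and g > left_incl:
        g -= 1
    if g not in blacklist:
        return g
    return candidate
```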
Oh, I meant that that function accepts a

Although I especially like your representation of the strategy as a permutation, not only for generating random pure strategies (I'd assume truly random strategies have very negative EV and don't contribute meaningfully), but for storing pure strategies in a succinct format. It's basically a pre-order traversal (after a bit of normalization) of the decision tree, also used in this article by Eric Farmer. It's human-readable and easy on the storage if we were to gather many pure strategies like Bo did with the
Heh, yes, but not an easy one to describe succinctly, anymore. :)
Yes. And you could write a "pretty-printer" that would prune down the list until the remaining moves were "obvious." For example, in the 5-number game, instead of talking about the strategy [1,5,4,3,2] you can just say [1,5,4,3]; instead of [2,3,4,1,5] you can just say [2,3,4]; and instead of [2,4,1,3,5] you can just say [2,4].
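A minimal sketch of that pretty-printer, assuming the 1-based "guess the first element of the permutation that's still legal" encoding described further down, and treating a move as "obvious" exactly when the remaining interval has a single element (the function name is mine):

```python
def prune(perm, n):
    """Shortest prefix of `perm` (a permutation of 1..n) from which the rest
    of the strategy is forced."""
    def needed_prefix_len(hidden):
        lo, hi, used = 1, n, 0
        for pos, g in enumerate(perm):
            if not (lo <= g <= hi):
                continue               # not a legal guess right now
            if lo != hi:
                used = pos + 1         # this guess genuinely consulted the list
            if g == hidden:
                return used
            if g < hidden:
                lo = g + 1
            else:
                hi = g - 1
        return used

    return perm[:max(needed_prefix_len(h) for h in range(1, n + 1))]

# prune([1,5,4,3,2], 5) == [1,5,4,3]
# prune([2,3,4,1,5], 5) == [2,3,4]
# prune([2,4,1,3,5], 5) == [2,4]
```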
As you already know, the "optimality" of your equilibrium depends on the range of your imagination in coming up with pure strategies for the second player to mix together. If the second player's imagination is limited, then Ballmer gets an advantage.
The range of strategies imagined by the current code doesn't suffice to model even what's needed for the equilibrium solution to n=5. Here I've added the two missing strategies for n=5: "outward-leaning" binary search, and linear search inward from the endpoint. I'm gratified to see that when I add these, scipy's computed results agree with my tedious manual calculations for n=5!
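For concreteness, here's my reading of "linear search, first guess k, then walk linearly inward from the endpoint," stated over the numbers 0..n-1 to match the printed output above; take it as a sketch rather than the exact code in this patch:

```python
def linear_inward_order(first, n):
    """Preference list for the linear-search strategy: guess `first`, then
    sweep the high side from n-1 back toward `first`, then the low side
    from 0 up toward `first`. After the first reply we know which side the
    answer is on, so the other side is never legal and plain concatenation
    is enough."""
    return [first] + list(range(n - 1, first, -1)) + list(range(0, first))

# linear_inward_order(2, 5) == [2, 4, 3, 0, 1]
```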
This also improves the computed (non-optimal) result for n=100:
I dunno if you want to accept this PR, or close it, or what; this just seemed like the easiest way to bring it to your attention. :)
Another way to throw in some weird strategies would be to express each strategy as a permutation of the numbers from 1 to n. That is, if your strategy is (50 25 75 12 37...) then that means you should guess 50 if it's a legal guess; otherwise guess 25 if it's legal; otherwise guess 75 if it's legal; etc. I think all possible pure strategies can be expressed in that form. So you could enumerate all your "preconceived" searching strategies, and then throw in a few hundred randomly generated permutations just to see if mixing them in helps at all.
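Something like the following sketch would decode such a permutation into guesses (the function names are mine, and I'm assuming the usual higher/lower feedback):

```python
import random

def first_legal(perm, lo, hi):
    """Guess the first element of the permutation that lies in [lo, hi]."""
    for g in perm:
        if lo <= g <= hi:
            return g
    raise ValueError("permutation does not cover the interval")

def play(perm, hidden, n):
    """Guess sequence the permutation strategy produces against `hidden`."""
    lo, hi, guesses = 1, n, []
    while True:
        g = first_legal(perm, lo, hi)
        guesses.append(g)
        if g == hidden:
            return guesses
        if g < hidden:
            lo = g + 1
        else:
            hi = g - 1

# A randomly generated pure strategy to throw into the mix:
# random.sample(range(1, 101), 100)
```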