Add some more strategies #1

Merged
merged 3 commits into from
Sep 6, 2024

Quuxplusone
Contributor

As you already know, the "optimality" of your equilibrium depends on the range of your imagination in coming up with pure strategies for the second player to mix together. If the second player's imagination is limited, then Ballmer gets an advantage.

The range of strategies imagined by the current code doesn't suffice to model even what's needed for the equilibrium solution to n=5. Here I've added the two missing strategies for n=5: "outward-leaning" binary search, and linear search inward from the endpoint. I'm gratified to see that when I add these, scipy's computed results agree with my tedious manual calculations for n=5!

This also improves the computed (non-optimal) result for n=100:

## Winning strategy
[snip 74 lines]
Avg win if Ballmer chooses randomly: $0.16247848000093376
Win if Ballmer chooses adversarially: $0.14657033010415976

I dunno if you want to accept this PR, or close it, or what; this just seemed like the easiest way to bring it to your attention. :)

Another way to throw in some weird strategies would be to express each strategy as a permutation of the numbers from 1 to n. That is, if your strategy is (50 25 75 12 37...) then that means you should guess 50 if it's a legal guess; otherwise guess 25 if it's legal; otherwise guess 75 if it's legal; etc. I think all possible pure strategies can be expressed in that form. So you could enumerate all your "preconceived" searching strategies, and then throw in a few hundred randomly generated permutations just to see if mixing them in helps at all.

Before:

    ## Winning strategy
    - With probability 18.7500%: Binary search, first guess is 1. On each step, guess the leftmost element in the interval that won't increase the worst-case complexity.
    - With probability 12.5000%: Binary search, first guess is 2. On each step, guess the leftmost element in the interval that won't increase the worst-case complexity.
    - With probability 28.1250%: Binary search, first guess is 3. On each step, guess the leftmost element in the interval that won't increase the worst-case complexity.
    - With probability 21.8750%: Binary search, first guess is 2. On each step, guess the rightmost element in the interval that won't increase the worst-case complexity.
    - With probability 18.7500%: Binary search, first guess is 0. On each step, guess the rightmost element in the interval that won't increase the worst-case complexity.
    ## Average wins for each number
    0: $3.6875 (stdev $0.8764)
    1: $3.6875 (stdev $0.9911)
    2: $3.6875 (stdev $0.9746)
    3: $3.6875 (stdev $1.0361)
    4: $3.6875 (stdev $0.6808)
    Avg win if Ballmer chooses randomly: $3.6875
    Win if Ballmer chooses adversarially: $3.6875

After:

    ## Winning strategy
    - With probability 27.5862%: Binary search, first guess is 1. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    - With probability 27.5862%: Binary search, first guess is 3. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    - With probability 3.4483%: Binary search, first guess is 0. On each step, guess the rightmost element in the interval that won't increase the worst-case complexity.
    - With probability 3.4483%: Binary search, first guess is 4. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    - With probability 37.9310%: Binary search, first guess is 2. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    ## Average wins for each number
    0: $3.7586 (stdev $0.7086)
    1: $3.7586 (stdev $0.9666)
    2: $3.7586 (stdev $0.9851)
    3: $3.7586 (stdev $0.9666)
    4: $3.7586 (stdev $0.7086)
    Avg win if Ballmer chooses randomly: $3.758620689655172
    Win if Ballmer chooses adversarially: $3.758620689655172
    ## Winning strategy
    - With probability 22.5000%: Binary search, first guess is 1. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    - With probability 22.5000%: Binary search, first guess is 3. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    - With probability 42.5000%: Linear search, first guess is 2, then walk linearly inward from the endpoint.
    - With probability 5.0000%: Linear search, first guess is 1, then walk linearly inward from the endpoint.
    - With probability 7.5000%: Linear search, first guess is 3, then walk linearly inward from the endpoint.
    ## Average wins for each number
    0: $3.7750 (stdev $0.6462)
    1: $3.7750 (stdev $0.9226)
    2: $3.7750 (stdev $1.0410)
    3: $3.7750 (stdev $0.9670)
    4: $3.7750 (stdev $0.6462)
    Avg win if Ballmer chooses randomly: $3.7749999999999995
    Win if Ballmer chooses adversarially: $3.7749999999999995
After this patch, the computed results for n=5 agree with my manual calculations.

    ## Winning strategy
    - With probability 22.2222%: Binary search, first guess is 1. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    - With probability 22.2222%: Binary search, first guess is 3. On each step, guess the endmost element in the interval that won't increase the worst-case complexity.
    - With probability 44.4444%: Linear search, first guess is 2, then walk linearly inward from the endpoint.
    - With probability 5.5556%: Linear search, first guess is 1, then walk linearly inward from the endpoint.
    - With probability 5.5556%: Linear search, first guess is 3, then walk linearly inward from the endpoint.
    ## Average wins for each number
    0: $3.7778 (stdev $0.6448)
    1: $3.7778 (stdev $0.9238)
    2: $3.7778 (stdev $1.0645)
    3: $3.7778 (stdev $0.9238)
    4: $3.7778 (stdev $0.6448)
    Avg win if Ballmer chooses randomly: $3.7777777777777772
    Win if Ballmer chooses adversarially: $3.7777777777777772
@gukoff
Owner

gukoff commented Sep 6, 2024

Amazing! Getting really close to the optimal numbers.

> Another way to throw in some weird strategies would be to express each strategy as a permutation of the numbers from 1 to n. That is, if your strategy is (50 25 75 12 37...) then that means you should guess 50 if it's a legal guess; otherwise guess 25 if it's legal; otherwise guess 75 if it's legal; etc. I think all possible pure strategies can be expressed in that form. So you could enumerate all your "preconceived" searching strategies, and then throw in a few hundred randomly generated permutations just to see if mixing them in helps at all.

Wouldn't this be equivalent to:

from random import Random

def choose_random(left_incl, right_incl, rand: Random):
    """Select a random element in the interval."""
    return rand.randint(left_incl, right_incl)

@gukoff gukoff merged commit cc47323 into gukoff:main Sep 6, 2024
@Quuxplusone
Contributor Author

> > Another way to throw in some weird strategies would be to express each strategy as a permutation of the numbers from 1 to n. That is, if your strategy is (50 25 75 12 37...) then that means you should guess 50 if it's a legal guess; otherwise guess 25 if it's legal; otherwise guess 75 if it's legal; etc. I think all possible pure strategies can be expressed in that form. So you could enumerate all your "preconceived" searching strategies, and then throw in a few hundred randomly generated permutations just to see if mixing them in helps at all.
>
> Wouldn't this be equivalent to:
>
> def choose_random(left_incl, right_incl, rand: Random):
>     """Select the random element in the interval."""
>     return rand.randint(left_incl, right_incl)

No, because that's not a pure strategy. The pure-strategy version is like this:

def binsearch_straight_left(left_incl, right_incl):
    return (left_incl + right_incl) // 2

def list_then_binsearch(guesses, binsearch):
    def f(left_incl, right_incl):
        for i in guesses:
            if left_incl <= i <= right_incl:
                return i
        return binsearch(left_incl, right_incl)
    return f

invoked with something like list_then_binsearch([50, 25, 75, 12], binsearch_straight_left) or list_then_binsearch([33, 66, 10, 20, 45, 89], binsearch_straight_left) or in general list_then_binsearch(random.sample(range(n), n), binsearch_straight_left). The important thing is that the strategy cannot itself contain randomness. We are constructing a pure strategy at random; but the strategy itself is pure once it's constructed.
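To make the "pure once constructed" point concrete, here is a minimal sketch that plays one such strategy against every secret number and counts the guesses. The two strategy functions are copied from the comment above; `guesses_needed` is a hypothetical helper written for this illustration and is not part of the repository.

```python
import random

def binsearch_straight_left(left_incl, right_incl):
    return (left_incl + right_incl) // 2

def list_then_binsearch(guesses, binsearch):
    def f(left_incl, right_incl):
        for i in guesses:
            if left_incl <= i <= right_incl:
                return i
        return binsearch(left_incl, right_incl)
    return f

def guesses_needed(strategy, secret, n):
    """Hypothetical helper: count guesses until `strategy` hits `secret` in 0..n-1."""
    lo, hi = 0, n - 1
    count = 0
    while True:
        guess = strategy(lo, hi)
        count += 1
        if guess == secret:
            return count
        if guess < secret:
            lo = guess + 1   # answer was "higher"
        else:
            hi = guess - 1   # answer was "lower"

n = 100
fixed = list_then_binsearch([50, 25, 75, 12], binsearch_straight_left)

# Randomness is used only here, while constructing the strategy.
rng = random.Random(0)
shuffled = list_then_binsearch(rng.sample(range(n), n), binsearch_straight_left)

# Evaluating the same strategy twice gives identical guess counts.
first_pass = [guesses_needed(shuffled, s, n) for s in range(n)]
second_pass = [guesses_needed(shuffled, s, n) for s in range(n)]
```

Because the shuffled list is fixed before play starts, `first_pass` and `second_pass` are equal; a strategy that called `randint` during play would not have that property.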

The actual mathematical Nash equilibrium would be found by putting all ~100-factorial of those pure strategies into the thing you pass to scipy; we just can't do that because there aren't enough bits in the universe. :)

@Quuxplusone
Contributor Author

I played around a little more with the above kind of "explicit list" strategies. Here's a mixed strategy involving 86 pure strategies; even if an adversarial Ballmer knows you're using this mixed strategy, you can still earn 15.79 cents per game this way. But this is still not a perfect Nash equilibrium, as we see from the fact that Ballmer's expected payout is higher for 32, 33, and 34 than it is for most other numbers (so he should avoid choosing those, so we should avoid guessing those).

## Winning strategy
- With probability 0.0152%: Guess [58, 19, 16, 74, 70], then left-straight binary search avoiding [34, 10, 8, 89, 60].
- With probability 0.0219%: Guess [54, 41, 21, 81, 72], then left-straight binary search avoiding [34, 11, 10, 95, 60].
- With probability 0.0268%: Guess [52, 21, 12, 86, 74], then right-straight binary search avoiding [31, 10, 2, 87, 42].
- With probability 0.0318%: Guess [39, 17, 9, 78, 70], then left-leaning binary search avoiding [34, 10, 8, 95, 60].
- With probability 0.0398%: Guess [45, 22, 7, 77, 56], then right-straight binary search avoiding [34, 10, 8, 95, 60].
- With probability 0.0408%: Guess [37, 18, 9, 69, 45], then left-straight binary search avoiding [34, 10, 8, 89, 60].
- With probability 0.0489%: Guess [53, 27, 18, 77, 65], then right-straight binary search avoiding [31, 14, 1, 85, 69].
- With probability 0.0669%: Guess [58, 26, 11, 83, 74], then right-straight binary search avoiding [92, 60, 32, 97, 93].
- With probability 0.1024%: Guess [33, 16, 8, 64, 48], then right-straight binary search avoiding [34, 20, 11, 60, 46].
- With probability 0.1228%: Guess [45, 18, 13, 75, 58], then left-straight binary search avoiding [34, 20, 11, 97, 60].
- With probability 0.1279%: Guess [58, 26, 11, 83, 74], then right-leaning binary search avoiding [92, 60, 32, 97, 93].
- With probability 0.1385%: Guess [47, 28, 16, 78, 66], then right-straight binary search avoiding [68, 31, 11, 97, 95].
- With probability 0.1549%: Guess [54, 27, 13, 84, 72], then outward-leaning binary search avoiding [68, 31, 11, 92, 87].
- With probability 0.1559%: Guess [49, 21, 8, 69, 61], then left-leaning binary search avoiding [32, 20, 18, 95, 60].
- With probability 0.1567%: Guess [35], then left-straight binary search.
- With probability 0.1762%: Guess [48, 19, 10, 74, 64], then outward-leaning binary search avoiding [34, 31, 1, 98, 69].
- With probability 0.1929%: Guess [45, 23, 16, 62, 54], then right-straight binary search avoiding [68, 31, 1, 98, 69].
- With probability 0.1991%: Guess [62, 35, 9, 78, 69], then left-straight binary search avoiding [31, 20, 11, 97, 60].
- With probability 0.2076%: Guess [51, 23, 15, 63, 60], then right-straight binary search avoiding [66, 31, 11, 98, 68].
- With probability 0.2198%: Guess [37, 17, 7, 77, 56], then right-straight binary search avoiding [53, 42, 31, 89, 59].
- With probability 0.2205%: Guess [42, 10, 2, 74, 58], then outward-leaning binary search avoiding [60, 59, 32, 88, 68].
- With probability 0.2288%: Guess [52, 30, 20, 74, 64], then right-straight binary search avoiding [53, 44, 31, 97, 88].
- With probability 0.2574%: Guess [57], then outward-leaning binary search.
- With probability 0.2739%: Guess [49, 17, 0, 70, 55], then right-leaning binary search avoiding [68, 34, 31, 98, 85].
- With probability 0.3160%: Guess [58, 26, 11, 83, 74], then left-straight binary search avoiding [92, 60, 32, 97, 93].
- With probability 0.3524%: Guess [50, 20, 13, 78, 72], then left-straight binary search avoiding [87, 31, 10, 97, 92].
- With probability 0.4230%: Guess [56, 24, 16, 81, 69], then right-straight binary search avoiding [33, 31, 1, 92, 38].
- With probability 0.4272%: Guess [44, 25, 9, 75, 56], then right-straight binary search avoiding [31, 2, 1, 91, 50].
- With probability 0.4356%: Guess [59, 28, 14, 78, 69], then left-straight binary search avoiding [87, 42, 31, 93, 92].
- With probability 0.4357%: Guess [44, 29, 12, 72, 58], then right-straight binary search avoiding [65, 31, 1, 98, 68].
- With probability 0.4449%: Guess [54, 20, 13, 79, 63], then left-straight binary search avoiding [42, 32, 31, 92, 88].
- With probability 0.4544%: Guess [49, 17, 15, 83, 67], then left-straight binary search avoiding [91, 69, 31, 97, 95].
- With probability 0.4687%: Guess [53], then left-leaning binary search.
- With probability 0.4785%: Guess [47, 17, 9, 81, 59], then left-straight binary search avoiding [88, 68, 31, 98, 97].
- With probability 0.4965%: Guess [52, 28, 20, 83, 67], then outward-leaning binary search avoiding [87, 68, 31, 93, 92].
- With probability 0.5171%: Guess [48, 32, 16, 80, 64], then left-leaning binary search avoiding [31, 11, 1, 96, 68].
- With probability 0.5399%: Guess [50, 22, 16, 72, 57], then left-leaning binary search avoiding [92, 31, 11, 98, 95].
- With probability 0.5553%: Guess [53, 21, 13, 70, 62], then left-straight binary search avoiding [31, 3, 1, 97, 68].
- With probability 0.5596%: Guess [45, 30, 18, 66, 61], then left-straight binary search avoiding [92, 31, 11, 98, 95].
- With probability 0.6146%: Guess [50, 22, 16, 72, 57], then left-straight binary search avoiding [92, 31, 11, 98, 95].
- With probability 0.6251%: Guess [51, 24, 19, 82, 67], then left-straight binary search avoiding [68, 34, 31, 98, 85].
- With probability 0.6932%: Guess [50], then right-straight binary search.
- With probability 0.7377%: Guess [50, 26, 18, 66, 56], then right-straight binary search avoiding [34, 31, 1, 98, 69].
- With probability 0.7463%: Guess [49, 27, 15, 79, 65], then right-straight binary search avoiding [31, 11, 2, 95, 68].
- With probability 0.8100%: Guess [39, 16, 9, 69, 57], then left-straight binary search avoiding [42, 31, 1, 97, 68].
- With probability 0.8724%: Guess [48, 16, 0, 80, 64], then left-leaning binary search avoiding [68, 34, 32, 98, 85].
- With probability 0.8959%: Guess [39, 16, 7, 28, 64], then left-straight binary search avoiding [31, 2, 1, 91, 86].
- With probability 0.9243%: Guess [39], then right-straight binary search.
- With probability 0.9458%: Guess [50, 18, 13, 71, 66], then outward-leaning binary search avoiding [68, 31, 14, 98, 85].
- With probability 0.9562%: Guess [54], then left-straight binary search.
- With probability 1.0104%: Guess [54], then outward-leaning binary search.
- With probability 1.0638%: Guess [38], then left-straight binary search.
- With probability 1.0822%: Guess [49, 25, 12, 70, 55], then right-straight binary search avoiding [31, 14, 1, 85, 69].
- With probability 1.1653%: Guess [43], then left-straight binary search.
- With probability 1.2116%: Guess [49], then left-straight binary search.
- With probability 1.2216%: Guess [45], then left-straight binary search.
- With probability 1.3687%: Guess [46], then right-leaning binary search.
- With probability 1.3801%: Guess [56], then left-leaning binary search.
- With probability 1.4976%: Guess [57], then left-straight binary search.
- With probability 1.5744%: Guess [51], then right-leaning binary search.
- With probability 1.6917%: Guess [56], then right-straight binary search.
- With probability 1.7000%: Guess [39], then left-straight binary search.
- With probability 1.7407%: Guess [36], then left-straight binary search.
- With probability 1.7761%: Guess [52], then left-straight binary search.
- With probability 1.7949%: Guess [60], then left-straight binary search.
- With probability 1.8790%: Guess [42], then right-straight binary search.
- With probability 2.0065%: Guess [41], then left-straight binary search.
- With probability 2.0105%: Guess [51], then left-straight binary search.
- With probability 2.0375%: Guess [42], then outward-leaning binary search.
- With probability 2.0446%: Guess [58], then right-straight binary search.
- With probability 2.3210%: Guess [60], then right-straight binary search.
- With probability 2.3754%: Guess [48], then right-straight binary search.
- With probability 2.4180%: Guess [54], then right-straight binary search.
- With probability 2.4293%: Guess [37], then right-straight binary search.
- With probability 2.4350%: Guess [47], then right-straight binary search.
- With probability 2.4589%: Guess [53], then right-straight binary search.
- With probability 2.5128%: Guess [50], then outward-leaning binary search.
- With probability 2.5692%: Guess [63], then right-straight binary search.
- With probability 2.9470%: Guess [47], then right-leaning binary search.
- With probability 3.1326%: Guess [45], then outward-leaning binary search.
- With probability 3.3570%: Guess [52], then left-leaning binary search.
- With probability 3.4873%: Guess [62], then left-straight binary search.
- With probability 3.4919%: Guess [49], then outward-leaning binary search.
- With probability 3.6914%: Guess [46], then left-straight binary search.
- With probability 4.6474%: Guess [56], then outward-leaning binary search.
- With probability 5.5147%: Guess [43], then outward-leaning binary search.
## Average wins for each number
0: $0.1579 (stdev $1.0746) 1: $0.1579 (stdev $1.0650) 2: $0.1579 (stdev $1.0265) 3: $0.1579 (stdev $1.0848) 4: $0.1579 (stdev $1.0900) 5: $0.1579 (stdev $1.0637) 6: $0.1579 (stdev $1.0739) 7: $0.1587 (stdev $1.1237) 8: $0.1635 (stdev $1.0943) 9: $0.1579 (stdev $1.1534)
10: $0.1634 (stdev $1.1472) 11: $0.1586 (stdev $1.1901) 12: $0.1579 (stdev $1.1263) 13: $0.1579 (stdev $1.1830) 14: $0.1579 (stdev $1.1515) 15: $0.1579 (stdev $1.1541) 16: $0.1579 (stdev $1.1499) 17: $0.1579 (stdev $1.1697) 18: $0.1579 (stdev $1.2229) 19: $0.1579 (stdev $1.0994)
20: $0.1579 (stdev $1.1709) 21: $0.1579 (stdev $1.1408) 22: $0.1579 (stdev $1.1881) 23: $0.1579 (stdev $1.1052) 24: $0.1579 (stdev $1.2639) 25: $0.1579 (stdev $1.1565) 26: $0.1579 (stdev $1.1667) 27: $0.1579 (stdev $1.1696) 28: $0.1579 (stdev $1.1819) 29: $0.1579 (stdev $1.1614)
30: $0.1579 (stdev $1.2033) 31: $0.1579 (stdev $1.2238) 32: $0.1591 (stdev $1.0660) 33: $0.1605 (stdev $1.1279) 34: $0.1729 (stdev $1.1040) 35: $0.1579 (stdev $1.1068) 36: $0.1579 (stdev $1.1482) 37: $0.1579 (stdev $1.1853) 38: $0.1579 (stdev $1.1439) 39: $0.1579 (stdev $1.2260)
40: $0.1579 (stdev $1.1310) 41: $0.1579 (stdev $1.1385) 42: $0.1579 (stdev $1.2204) 43: $0.1579 (stdev $1.2578) 44: $0.1579 (stdev $1.1065) 45: $0.1579 (stdev $1.2550) 46: $0.1579 (stdev $1.2189) 47: $0.1579 (stdev $1.2751) 48: $0.1579 (stdev $1.1997) 49: $0.1579 (stdev $1.2744)
50: $0.1579 (stdev $1.2405) 51: $0.1579 (stdev $1.2190) 52: $0.1579 (stdev $1.2527) 53: $0.1585 (stdev $1.1927) 54: $0.1579 (stdev $1.2409) 55: $0.1579 (stdev $1.0635) 56: $0.1579 (stdev $1.3120) 57: $0.1579 (stdev $1.1580) 58: $0.1579 (stdev $1.1765) 59: $0.1579 (stdev $1.1494)
60: $0.1654 (stdev $1.2286) 61: $0.1579 (stdev $1.1314) 62: $0.1579 (stdev $1.1946) 63: $0.1579 (stdev $1.1726) 64: $0.1579 (stdev $1.1348) 65: $0.1579 (stdev $1.0894) 66: $0.1579 (stdev $1.1703) 67: $0.1579 (stdev $1.0721) 68: $0.1579 (stdev $1.2041) 69: $0.1579 (stdev $1.2089)
70: $0.1579 (stdev $1.1710) 71: $0.1579 (stdev $1.1588) 72: $0.1579 (stdev $1.1546) 73: $0.1579 (stdev $1.1334) 74: $0.1579 (stdev $1.2156) 75: $0.1579 (stdev $1.2360) 76: $0.1579 (stdev $1.0836) 77: $0.1579 (stdev $1.2152) 78: $0.1579 (stdev $1.1584) 79: $0.1579 (stdev $1.1649)
80: $0.1579 (stdev $1.1460) 81: $0.1579 (stdev $1.2108) 82: $0.1579 (stdev $1.1719) 83: $0.1579 (stdev $1.1395) 84: $0.1579 (stdev $1.1585) 85: $0.1579 (stdev $1.1513) 86: $0.1579 (stdev $1.1183) 87: $0.1579 (stdev $1.1349) 88: $0.1601 (stdev $1.1422) 89: $0.1607 (stdev $1.1178)
90: $0.1579 (stdev $1.1783) 91: $0.1579 (stdev $1.1065) 92: $0.1579 (stdev $1.1092) 93: $0.1579 (stdev $1.0873) 94: $0.1597 (stdev $1.0260) 95: $0.1611 (stdev $1.1152) 96: $0.1579 (stdev $1.0568) 97: $0.1579 (stdev $1.0260) 98: $0.1601 (stdev $1.0400) 99: $0.1579 (stdev $1.0606)
Avg win if Ballmer chooses randomly: $0.1584188797812923
Win if Ballmer chooses adversarially: $0.15790066548568113

"Avoiding [numbers]" means that you call the appropriate binary search to get your candidate guess, but then if your candidate guess is on the blacklist, you adjust your actual guess rightward until it's no longer on the blacklist (or, if that's not possible, then leftward; or if that's not possible, just guess the blacklisted number now). Obviously there's room to fiddle with that procedure too.
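A minimal sketch of that "avoiding" wrapper, as I read the description above. The name `avoiding` is made up for this example, and the actual code behind the output may differ in detail.

```python
def binsearch_straight_left(left_incl, right_incl):
    return (left_incl + right_incl) // 2

def avoiding(binsearch, blacklist):
    """Wrap `binsearch` so that blacklisted guesses are avoided when possible."""
    banned = frozenset(blacklist)

    def f(left_incl, right_incl):
        candidate = binsearch(left_incl, right_incl)
        if candidate not in banned:
            return candidate
        # First try moving rightward within the interval.
        for guess in range(candidate + 1, right_incl + 1):
            if guess not in banned:
                return guess
        # Otherwise try moving leftward.
        for guess in range(candidate - 1, left_incl - 1, -1):
            if guess not in banned:
                return guess
        # Everything in the interval is blacklisted: guess the candidate anyway.
        return candidate

    return f

strategy = avoiding(binsearch_straight_left, [34, 10, 8, 89, 60])
```

The wrapper is still a pure strategy: the guess depends only on the interval and on the blacklist fixed at construction time.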

@gukoff
Owner

gukoff commented Sep 10, 2024

> No, because that's not a pure strategy

Oh, I meant that that function accepts a rand: Random object with a fixed random seed :) Then it's a pure strategy.

Although randint makes a choice based only on the interval size, which makes it unable to generate totally random pure strategies. One could fix it by re-seeding rand for every guess that was "higher". But then it feels like overkill.
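One way to sketch the re-seeding idea, under my own assumption about how to do it: derive a fresh `Random` from a base seed plus the interval bounds on every call, so the guess becomes a fixed function of the interval. This is a variation on the suggestion above (it re-seeds on every guess, not only after "higher"), and `seeded_pure_strategy` is a hypothetical name.

```python
from random import Random

def seeded_pure_strategy(seed):
    """Build a pure strategy whose guess is a fixed pseudo-random function of the interval."""
    def f(left_incl, right_incl):
        # Assumption: a string seed combining the base seed and the bounds.
        # String seeds are deterministic across runs, unlike hash()-based ones.
        rand = Random(f"{seed}:{left_incl}:{right_incl}")
        return rand.randint(left_incl, right_incl)
    return f

strategy = seeded_pure_strategy(42)
same_twice = strategy(0, 99) == strategy(0, 99)
```

Different base seeds give different decision trees, and the same seed always reproduces the same tree, at the cost of a description that is no longer human-readable.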


I especially like your representation of the strategy as a permutation not only for generating random pure strategies (I'd assume truly random strategies have very negative EV and don't contribute meaningfully), but for storing pure strategies in a succinct format. It's basically a pre-order traversal (after a bit of normalization) of the decision tree, also used in this article by Eric Farmer.

It's human-readable and easy on the storage if we were to gather many pure strategies like Bo did with the Multiplicative Weights algorithm (it generates a new pure strat on every algorithm step, then averages them all to get an almost-optimal mixed strategy).
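For reference, a rough sketch of the multiplicative-weights idea on a generic payoff matrix. This is not Bo's implementation: it assumes a small fixed pool of pure strategies (the rows) instead of generating a new one each step, assumes payoffs scaled to roughly [-1, 1], and the step size and round count are arbitrary choices.

```python
import math

def multiplicative_weights(payoff, rounds=5000, eta=0.05):
    """Approximate the guesser's best mix over the rows of `payoff`.

    payoff[i][j] is the guesser's win with pure strategy i against secret j.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    weights = [1.0] * n_cols   # Ballmer's weights over secrets
    plays = [0] * n_rows       # how often each pure strategy was the best response
    for _ in range(rounds):
        total = sum(weights)
        q = [w / total for w in weights]
        # Guesser's best pure response to Ballmer's current mix.
        best = max(range(n_rows),
                   key=lambda i: sum(q[j] * payoff[i][j] for j in range(n_cols)))
        plays[best] += 1
        # Ballmer shifts weight toward the secrets that paid the guesser least.
        weights = [q[j] * math.exp(-eta * payoff[best][j]) for j in range(n_cols)]
    # Averaging the best responses gives the approximate mixed strategy.
    mix = [count / rounds for count in plays]
    guaranteed = min(sum(mix[i] * payoff[i][j] for i in range(n_rows))
                     for j in range(n_cols))
    return mix, guaranteed

# Toy check on matching pennies, whose value is 0 with a 50/50 mix.
pennies = [[1, -1], [-1, 1]]
mix, guaranteed = multiplicative_weights(pennies)
```

`guaranteed` is the payoff of the averaged mix against an adversarial opponent, which corresponds to the "Win if Ballmer chooses adversarially" line in the output above.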

@Quuxplusone
Contributor Author

> I meant that that function accepts a random: Random object with a fixed random seed :) Then it's a pure strategy.

Heh, yes, but not an easy one to describe succinctly, anymore. :)

> I especially like your representation of the strategy as a permutation not only for generating random pure strategies (I'd assume truly random strategies have very negative EV and don't contribute meaningfully), but for storing pure strategies in a succinct format. It's basically a pre-order traversal (after a bit of normalization) of the decision tree, also used in this article by Eric Farmer.

Yes. And you could write a "pretty-printer" that would prune down the list until the remaining moves were "obvious." For example, in the 5-number game, instead of talking about the strategy [1,5,4,3,2] you can just say [1,5,4,3]; instead of [2,3,4,1,5] you can just say [2,3,4]; and instead of [2,4,1,3,5] you can just say [2,4].
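A possible sketch of such a pruner, assuming "obvious" means "forced": a guess is forced when the interval it falls in has shrunk to a single number by the time the list reaches it. The function name and that interpretation are mine; a real pretty-printer might prune more aggressively.

```python
from bisect import bisect_left, insort

def prune_forced_tail(perm, lo, hi):
    """Drop trailing entries of `perm` whose guesses are forced anyway."""
    seen = []        # sorted list of the guesses processed so far
    last_free = -1   # index of the last guess that was a real choice
    for idx, x in enumerate(perm):
        pos = bisect_left(seen, x)
        # The interval containing x, bounded by the earlier guesses on each side.
        left = seen[pos - 1] + 1 if pos > 0 else lo
        right = seen[pos] - 1 if pos < len(seen) else hi
        if left < right:
            last_free = idx   # more than one legal guess here, so x was a choice
        insort(seen, x)
    return perm[:last_free + 1]

pruned = [
    prune_forced_tail([1, 5, 4, 3, 2], 1, 5),
    prune_forced_tail([2, 3, 4, 1, 5], 1, 5),
    prune_forced_tail([2, 4, 1, 3, 5], 1, 5),
]
```

On the three examples above this yields [1, 5, 4, 3], [2, 3, 4], and [2, 4]. Forced guesses in the middle of the list are kept, since only the tail can be dropped without changing which guess comes first in each interval.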
