https://fivethirtyeight.com/features/whats-your-best-scrabble-string/

## Express

You have two buckets and 100 ping-pong balls, 50 of which are red and 50 of which are blue. You get to arrange the balls into the two buckets however you’d like, but each bucket needs at least one ball. Your friend will blindly choose one of the two buckets and then select a ball at random from the chosen bucket.

How can you arrange the balls to maximize the probability that your friend chooses a red ball? What probability of success do you achieve?

Extra credit: What probability of success do you get with 25 balls of each color? 200 balls of each color?

### Solution

In any case, have only 1 ball in one bucket, and have that ball be red. For n blue and n red balls, for a random chance of choosing either bucket, you will get a probability of picking a red ball as (0.5 * 1) + (0.5 * (n-1)/n) which asymptotically approaches 0.75 as n goes to infinity.

In [1]:
import numpy as np
import pandas as pd
import itertools

NUM_BLUE = 50
NUM_RED = 50

In [2]:
# Step 1: Make all possible combinations of number of balls in left bucket + number of balls that are blue
# If you have 1 ball in the left bucket, 0 or 1 of them can be blue
# If you have 2 balls in the left bucket, 0, 1, or 2 can be blue and so on
# This should be symmetrical if you switch blue and red or left and right

setup_list = [(x, y) for x in range(1, (NUM_BLUE + NUM_RED)) for y in range(NUM_BLUE) if y <= x]
df = pd.DataFrame({
    'num_balls_l': list(zip(*setup_list))[0],
    'num_balls_l_blue': list(zip(*setup_list))[1]
})

df.head()

Unnamed: 0,num_balls_l,num_balls_l_blue
0,1,0
1,1,1
2,2,0
3,2,1
4,2,2


In [3]:
# Construct the rest of the columns containing number of blue/red balls in left/right buckets
df['num_balls_r'] = NUM_BLUE + NUM_RED - df['num_balls_l']
df['num_balls_l_red'] = df['num_balls_l'] - df['num_balls_l_blue']
df['num_balls_r_blue'] = NUM_BLUE - df['num_balls_l_blue']
df['num_balls_r_red'] = NUM_RED - df['num_balls_l_red']

# Filter out cases where there are >50 red or blue balls in either bucket
df = df.loc[df['num_balls_l_red'] <= NUM_RED]

df.head()

Unnamed: 0,num_balls_l,num_balls_l_blue,num_balls_r,num_balls_l_red,num_balls_r_blue,num_balls_r_red
0,1,0,99,1,50,49
1,1,1,99,0,49,50
2,2,0,98,2,50,48
3,2,1,98,1,49,49
4,2,2,98,0,48,50


In [4]:
# Rearrange columns for readability
df = df[['num_balls_l', 'num_balls_r', 'num_balls_l_blue', 'num_balls_l_red', 'num_balls_r_blue', 'num_balls_r_red']]

# Construct probabilities of choosing red ball if left or right bucket
df['prob_l_red'] = df['num_balls_l_red'] / df['num_balls_l']
df['prob_r_red'] = df['num_balls_r_red'] / df['num_balls_r']

# Construct overall probability assuming 50-50 chance of choosing either bucket
df['prob_red'] = (df['prob_l_red'] + df['prob_r_red']) / 2

df.head()

Unnamed: 0,num_balls_l,num_balls_r,num_balls_l_blue,num_balls_l_red,num_balls_r_blue,num_balls_r_red,prob_l_red,prob_r_red,prob_red
0,1,99,0,1,50,49,1.0,0.494949,0.747475
1,1,99,1,0,49,50,0.0,0.505051,0.252525
2,2,98,0,2,50,48,1.0,0.489796,0.744898
3,2,98,1,1,49,49,0.5,0.5,0.5
4,2,98,2,0,48,50,0.0,0.510204,0.255102


In [5]:
# Arrange results by prob_red
df.sort_values(by=['prob_red'], ascending=False).head()

Unnamed: 0,num_balls_l,num_balls_r,num_balls_l_blue,num_balls_l_red,num_balls_r_blue,num_balls_r_red,prob_l_red,prob_r_red,prob_red
0,1,99,0,1,50,49,1.0,0.494949,0.747475
2,2,98,0,2,50,48,1.0,0.489796,0.744898
5,3,97,0,3,50,47,1.0,0.484536,0.742268
9,4,96,0,4,50,46,1.0,0.479167,0.739583
14,5,95,0,5,50,45,1.0,0.473684,0.736842
