In [1]:
import math
import numpy as np
import scipy.stats  # rv_continuous (used below) lives in the scipy.stats submodule
import collections

# Problem statement

With a large school of goldfish visiting, an opportunity arises to acquire some top grade `SCUBA_GEAR`. You only have two chances to offer a good price. Each one of the goldfish will accept the lowest bid that is over their reserve price. You know there’s a constant desire for scuba gear on the archipelago. So, at the end of the round, you’ll be able to sell them for 1000 SeaShells a piece.

Whilst not every goldfish has the same reserve price, you know the distribution of their reserve prices. The reserve price will be no lower than 900 and no higher than 1000. The probability scales linearly from 0 at 900 to most likely at 1000.

You only trade with the goldfish. Bids of other participants will not affect your results.

Think hard about how you want to set your two bids in order to walk away with some serious SeaShell profit.

# Solution

Let $n$ denote the number of fish and assume each fish has a single unit of scuba gear to sell.  
For each $i\in \{1,\ldots,n\}$, fish $i$ has a reserve price $R_i$: fish $i$ will sell its gear iff the bid $p$ satisfies $p\geq R_i$.
Note that each $R_i$ is unknown to us.

Denoting by $p_l$ and $p_h$ our low and high bids,
the final profit is 
$$\sum_{i=1}^n \Big[(1000-p_l)1_{R_i \leq p_l} + (1000-p_h)1_{p_l < R_i \leq p_h} +  0\times 1_{R_i > p_h}\Big],$$
and our task is to choose our bids in order to maximize this sum, or equivalently, to maximize the average 
$$A \coloneqq \frac 1n \sum_{i=1}^n \Big[(1000-p_l)1_{R_i \leq p_l} + (1000-p_h)1_{p_l < R_i \leq p_h}\Big].$$

Since $n$ is assumed to be large and since we know the distribution of the $R_i$, 
a heuristic suggested by the law of large numbers is to maximize the expected value $\mathbb E[A]$.  
Assuming the $R_i$ are i.i.d. with the same distribution as a random variable $R$, the optimization problem is to maximize
$$(1000-p_l)\mathbb P(R \leq p_l) + (1000-p_h)\mathbb P(p_l < R \leq p_h)$$
under the constraints $p_l, p_h\in \mathbb N$ and $900 \leq p_l \leq p_h \leq 1000$.

It was clarified by the [organizer](https://discord.com/channels/1001852729725046804/1004051976759296022/1226850690387673228) that
the distribution of reserve prices is continuous.   
The pdf of $R$ can therefore be written as $f_R(r) = (\alpha r + \beta)1_{900\leq r\leq 1000}$ for some constants $\alpha$ and $\beta$.  
According to this [message](https://discord.com/channels/1001852729725046804/1004051976759296022/1226842470281773096),
$f_R(r)=0$ for every $r\leq 900$, and we also have $\int_{900}^{1000}f_R(r) dr = 1$, 
hence $\alpha = \frac 1{5000}$ and $\beta = -\frac{9}{50}$.  
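For completeness, here is a short derivation of these two constants from the two conditions just stated (continuity at $900$ and normalization). The CDF $F_R$ is written out as well, since it is the quantity used when evaluating the probabilities in the objective:
$$
\begin{aligned}
f_R(900) = 0 &\implies 900\alpha + \beta = 0 \implies f_R(r) = \alpha (r-900) \text{ on } [900, 1000],\\
\int_{900}^{1000} \alpha (r-900)\, dr = \alpha\,\frac{100^2}{2} = 1 &\implies \alpha = \frac{1}{5000}, \quad \beta = -900\alpha = -\frac{9}{50},\\
F_R(p) = \int_{900}^{p} \frac{r-900}{5000}\, dr &= \frac{(p-900)^2}{10000} \quad \text{for } 900 \leq p \leq 1000.
\end{aligned}
$$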
The surrogate objective function $\mathbb E[A]$ can therefore be written as 
$$(1000-p_l)\int_{900}^{p_l} (\alpha r + \beta) dr + (1000-p_h)\int_{p_l}^{p_h} (\alpha r + \beta) dr
= \frac{1}{10000}\big((1000-p_l)(p_l-900)^2 + (1000-p_h) (p_h-p_l)(p_h+p_l-1800)\big)
.$$
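As a sanity check on the closed form above, the sketch below compares it with a direct numerical integration of the density using `scipy.integrate.quad`. The function names (`pdf`, `expected_profit_integral`, `expected_profit_closed_form`) and the test points are illustrative choices, not part of the original notebook.

```python
from scipy.integrate import quad

ALPHA, BETA = 1 / 5000, -9 / 50  # constants of the linear density


def pdf(r):
    """Density of the reserve price on [900, 1000]."""
    return ALPHA * r + BETA


def expected_profit_integral(low, high):
    """Expected per-fish profit, computed by numerical integration."""
    p_low, _ = quad(pdf, 900, low)    # P(R <= low)
    p_mid, _ = quad(pdf, low, high)   # P(low < R <= high)
    return (1000 - low) * p_low + (1000 - high) * p_mid


def expected_profit_closed_form(low, high):
    """Same quantity, using the closed-form expression."""
    return ((1000 - low) * (low - 900) ** 2
            + (1000 - high) * (high - low) * (high + low - 1800)) / 10000


# Arbitrary admissible bid pairs; both columns should agree.
for low, high in [(900, 1000), (930, 960), (952, 978), (1000, 1000)]:
    numeric = expected_profit_integral(low, high)
    closed = expected_profit_closed_form(low, high)
    print(low, high, round(numeric, 6), round(closed, 6))
```

If the two columns disagree for some pair, the algebra in the closed form would need to be rechecked before trusting the grid search.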

The following Python code performs an exhaustive search over the finite grid $\{(p_l, p_h)\in \mathbb N^2: 900 \leq p_l \leq p_h \leq 1000\}$:

In [2]:
def objective(low, high):
    """Compute the value of the surrogate objective function.

    Parameters
    ----------
    low : int
        Value of the low bid.
    high : int
        Value of the high bid.
    
    Returns
    -------
    float
        Value of the surrogate objective function.
    """
    lhs = (1000-low) * (low-900)**2
    rhs = (1000-high) * (high-low) * (high+low-1800)
    return 1/10000 * (lhs + rhs)

argmax = []
val_max = 0
for low in range(900, 1001):
    for high in range(low, 1001):
        comp = objective(low, high)
        if math.isclose(comp, val_max):
            argmax.append((low, high))
        elif comp > val_max:
            val_max = comp
            argmax = [(low, high)]
print('Maximizers:', argmax)

Maximizers: [(952, 978)]


The maximizer is $(p_l,p_h) = (952,978)$.

This is also confirmed by the following Mathematica command:
$$
a\text{:=}\frac{1}{5000};b\text{:=}-\frac{9}{50};\text{Maximize}\left[\left\{(1000-h) \int_l^h (a r+b) \, dr+(1000-l) \int_{900}^l (a r+b) \, dr,900\leq l\leq h\leq 1000\right\},\{l,h\},\mathbb{Z}\right]
$$
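As a further cross-check, not part of the original analysis, the continuous relaxation (dropping the integrality constraint) can be solved by hand. With the shifted variables $x = p_l - 900$ and $y = p_h - 900$, which are introduced here only for convenience, the surrogate objective multiplied by $10000$ becomes $g(x,y) = (100-x)x^2 + (100-y)(y^2-x^2)$, and the first-order conditions read
$$
\begin{aligned}
\frac{\partial g}{\partial x} &= 200x - 3x^2 - 2x(100-y) = x(2y - 3x) = 0,\\
\frac{\partial g}{\partial y} &= 2y(100-y) - (y^2 - x^2) = 200y - 3y^2 + x^2 = 0.
\end{aligned}
$$
For $x > 0$ the first equation gives $y = \tfrac 32 x$, and substituting into the second gives $300x - \tfrac{23}{4}x^2 = 0$, that is, $x = \tfrac{1200}{23} \approx 52.17$ and $y = \tfrac{1800}{23} \approx 78.26$.
The unique interior critical point is therefore $(p_l, p_h) \approx (952.17, 978.26)$, which is consistent with the integer maximizer $(952, 978)$ found by the grid search.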

# Results

There were actually $n=5000$ fish.
While $(952,978)$ is the solution that maximizes the surrogate objective function, 
it was suboptimal for the online judge.

<p float="center">
  <img src="https://i.imgur.com/EfE7rbg.png" width="1200" />
</p>

Below, we investigate whether the suboptimality of $(952,978)$ was a one-off. We repeat the experiment many times.  
Reserve prices are sampled using inverse transform sampling, since the data is univariate and the CDF can be inverted explicitly.
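To make the sampling step concrete, here is a minimal standalone sketch of inverse transform sampling for this distribution, using NumPy directly rather than the `rv_continuous` subclass defined in the next cell. The helper name `sample_reserve_prices`, the seed, and the sample size are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility


def sample_reserve_prices(size, rng):
    """Draw reserve prices by inverse transform sampling.

    The CDF is F(r) = (r - 900)**2 / 10000 on [900, 1000],
    so its inverse is F^{-1}(u) = 900 + 100 * sqrt(u) for u in [0, 1].
    """
    u = rng.uniform(0.0, 1.0, size=size)
    return 900 + 100 * np.sqrt(u)


samples = sample_reserve_prices(100_000, rng)
# The theoretical mean is 900 + 100 * E[sqrt(U)] = 900 + 200/3.
print(samples.min(), samples.max(), samples.mean())
```

The expression `900 + 100 * np.sqrt(u)` is the same map as the `_ppf` method below, written as `100 * (9 + np.sqrt(u))` there.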

In [3]:
def objective2(low, high, reserve_prices):
    """Compute the value of the objective function.

    Parameters
    ----------
    low : int
        Value of the low bid.
    high : int
        Value of the high bid.
    reserve_prices : ndarray
        Reserve price for each fish
        
    Returns
    -------
    float
        Value of the objective function.
    """
    arr = (1000-low) * (reserve_prices <= low) + (1000-high) * ((low < reserve_prices) & (reserve_prices <= high))
    return np.sum(arr)

def maximize(reserve_prices):
    """Compute maximizers of the objective function.

    Parameters
    ----------
    reserve_prices : ndarray
        Reserve price for each fish
        
    Returns
    -------
    argmax : list of tuple
        Maximizers.
    """
    argmax = []
    val_max = 0
    for low in range(900, 1001):
        for high in range(low, 1001):
            comp = objective2(low, high, reserve_prices)
            if math.isclose(comp, val_max):
                argmax.append((low, high))
            elif comp > val_max:
                val_max = comp
                argmax = [(low, high)]
    return argmax

class reserve_price_gen(scipy.stats.rv_continuous):
    "Reserve price distribution"
    def _pdf(self, r):
        return 1/5000 * r - 9/50
    def _ppf(self, u):
        return 100 * (9+np.sqrt(u))

def repeat(m):
    """Simulate the experiment several times.

    Parameters
    ----------
    m : int
        Number of repetitions
        
    Returns
    -------
    res : list of list
        Maximizers for each experiment.
    """
    res = []
    reserve_price_dist = reserve_price_gen(a=900, b=1000)
    for _ in range(m):
        reserve_prices = reserve_price_dist.rvs(size=5000)
        res.append(maximize(reserve_prices))
    return res
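
The nested loop in `maximize` evaluates `objective2` on the full array of reserve prices for each of the 5151 admissible bid pairs, which becomes slow over many repetitions. As a possible speed-up, the sketch below computes the same maximizers from the 101 counts $\#\{i : R_i \leq p\}$ using `np.searchsorted` and broadcasting. The name `maximize_vectorized` is hypothetical and this is an alternative to, not a description of, the code used to produce the results reported here.

```python
import numpy as np


def maximize_vectorized(reserve_prices):
    """Return all integer maximizers (low, high) of the empirical profit."""
    bids = np.arange(900, 1001)
    # counts[k] = number of fish whose reserve price is <= bids[k]
    sorted_prices = np.sort(reserve_prices)
    counts = np.searchsorted(sorted_prices, bids, side="right")
    margin = 1000 - bids  # profit per unit bought at each bid
    # profit[i, j] = margin[i] * counts[i] + margin[j] * (counts[j] - counts[i])
    profit = ((margin * counts)[:, None]
              + margin[None, :] * (counts[None, :] - counts[:, None]))
    # Only pairs with low <= high are admissible; admissible profits are >= 0.
    profit = np.where(bids[:, None] <= bids[None, :], profit, -1)
    lows, highs = np.nonzero(profit == profit.max())
    return [(int(bids[i]), int(bids[j])) for i, j in zip(lows, highs)]


# Usage on one simulated school of 5000 fish (seed chosen arbitrarily).
rng = np.random.default_rng(0)
prices = 900 + 100 * np.sqrt(rng.uniform(size=5000))
print(maximize_vectorized(prices))
```

The maximizers are returned in the same order as in the loop version (sorted by low bid, then by high bid), so the first element can be used in the same way.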

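Counting the first maximizer of each experiment, the output below indicates that $(952,978)$ is the most frequent optimum, yet it comes out on top in only 94 of the 1000 experiments (9.4%). 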

In [4]:
res = repeat(1000)
collections.Counter([el[0] for el in res])

Counter({(952, 978): 94,
         (953, 979): 78,
         (954, 979): 68,
         (953, 978): 65,
         (951, 978): 63,
         (952, 979): 63,
         (950, 977): 50,
         (951, 977): 45,
         (954, 980): 42,
         (951, 979): 37,
         (952, 977): 36,
         (950, 978): 33,
         (955, 980): 30,
         (953, 980): 25,
         (954, 978): 21,
         (953, 977): 20,
         (949, 977): 18,
         (950, 976): 17,
         (955, 981): 15,
         (952, 980): 15,
         (951, 976): 15,
         (949, 976): 14,
         (955, 979): 13,
         (952, 976): 12,
         (954, 981): 9,
         (950, 979): 8,
         (948, 976): 8,
         (956, 981): 7,
         (955, 978): 6,
         (949, 975): 6,
         (948, 977): 6,
         (953, 981): 6,
         (949, 978): 6,
         (956, 980): 5,
         (956, 979): 5,
         (954, 977): 4,
         (948, 978): 4,
         (948, 975): 4,
         (947, 975): 4,
         (950, 975): 4,
         (952, 9