# Exploring regret and optimization

With $i$ indexing alternatives, $t$ indexing individuals, and $I_t$ denoting the choice of individual $t$, regret is

$$
R_n = \sum_{t \leq n} \max_i U_{i, t} - \sum_{t \leq n} U_{{I_t}, t}
$$
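As a minimal sketch of this definition, the snippet below computes the realized regret from a table of utilities. The function name `regret`, the 0-based indexing, and the numbers are illustrative assumptions, not part of the original notes.

```python
def regret(utilities, choices):
    """Realized regret: sum of per-individual best utilities minus chosen utilities.

    utilities[t][i] plays the role of U_{i,t}; choices[t] plays the role of I_t.
    """
    best = sum(max(row) for row in utilities)
    realized = sum(row[i] for row, i in zip(utilities, choices))
    return best - realized


# Hypothetical utilities for n = 3 individuals and 2 alternatives.
U = [[1.0, 2.0], [0.5, 0.0], [3.0, 3.5]]
I = [1, 1, 0]  # individual 0 picks alternative 1, and so on
R = regret(U, I)
```

Here individuals 1 and 2 each pick the worse alternative, so the regret is the sum of the two utility gaps; choosing the best alternative every time would give zero regret.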

Letting $U_{i, t} = \phi_{i, t}^\top \beta + \epsilon_{i, t}$ be the utility derived by individual $t$ making choice $i$,

$$
R_n = \sum_{t \leq n} \max_i \left\lbrace \phi_{i, t}^\top \beta + \epsilon_{i, t} \right\rbrace - \sum_{t \leq n} \left( \phi_{I_t, t}^\top \beta + \epsilon_{I_t, t} \right)
$$

---

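Taking expectations, treating the choices $I_t$ as given and assuming the errors have zero mean so that the $\epsilon_{I_t, t}$ term vanishes: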

$$
\begin{align}
    \mathbb{E} [ R_n ] &= \mathbb{E} \left[ \sum_{t \leq n} \max_i \left\lbrace \phi_{i, t}^\top \beta + \epsilon_{i, t} \right\rbrace - \sum_{t \leq n} \left( \phi_{I_t, t}^\top \beta + \epsilon_{I_t, t} \right) \right] \\
    &= \sum_{t \leq n} \mathbb{E} \left[ \max_i \left\lbrace \phi_{i, t}^\top \beta + \epsilon_{i, t} \right\rbrace \right] - \sum_{t \leq n} \phi_{I_t, t}^\top \beta
\end{align}
$$

Minimizing over $\hat U_{i, t} = \phi_{i, t}^\top \beta$, let $p_t$ be the vector indexed by $i$ with entries $\mathbb{P} \left( \hat U_{i, t} + \epsilon_{i, t} > \hat U_{j, t} + \epsilon_{j, t} \: \forall j \neq i \right)$, and let $\hat s$ be the vector indexed by $i$ whose entries are the number of individuals choosing $i$ (that is, with $I_t = i$). The first-order conditions are then:

$$
\sum_{t \leq n} p_t = \hat s
$$

That is, the observed number of individuals choosing each alternative should equal the number expected under the model.
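The first-order conditions rest on the fact that the derivative of the expected maximum with respect to $\hat U_i$ is the probability that $i$ is chosen. The sketch below checks this numerically in one special case, assuming i.i.d. standard Gumbel errors (an assumption added here for illustration; the notes do not specify an error distribution). Under that assumption the expected maximum is the log-sum-exp of the $\hat U_i$ up to an additive constant, and the choice probabilities are the softmax. All function names are hypothetical.

```python
import math


def expected_max(u):
    """E[max_i (u_i + eps_i)] for i.i.d. standard Gumbel eps, up to an additive constant."""
    m = max(u)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in u))


def choice_probabilities(u):
    """Logit choice probabilities (softmax), valid under the Gumbel assumption."""
    m = max(u)
    weights = [math.exp(x - m) for x in u]
    total = sum(weights)
    return [w / total for w in weights]


def numerical_gradient(f, u, h=1e-6):
    """Central finite-difference gradient of f at the point u."""
    grad = []
    for i in range(len(u)):
        up = u[:i] + [u[i] + h] + u[i + 1:]
        down = u[:i] + [u[i] - h] + u[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


u_hat = [0.2, -0.5, 1.0]  # hypothetical systematic utilities
p = choice_probabilities(u_hat)
g = numerical_gradient(expected_max, u_hat)
```

The vectors `p` and `g` agree up to finite-difference error, which is the identity the first-order conditions rely on in this special case.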

---

If we minimize over $\beta$ before taking any expectation, we can discard the error term $\epsilon_{I_t, t}$, which does not depend on $\beta$, and our problem is:

$$
\min_\beta \sum_{t \leq n} \max_i \left\lbrace \phi_{i, t}^\top \beta + \epsilon_{i, t} \right\rbrace - \sum_{t \leq n} \phi_{I_t, t}^\top \beta
$$

We may introduce the variable $-U_t$ to move the $\max$ over a collection of linear expressions into the constraints:

$$
\begin{align}
    \min_{\beta, U_t} \quad & \sum_{t \leq n} -U_t - \sum_{t \leq n} \phi_{I_t, t}^\top \beta \\
    \text{s.t.} \quad & -U_t \geq \phi_{i, t}^\top \beta + \epsilon_{i, t} \quad \forall i, t
\end{align}
$$

To simplify further, we must assume that the $\phi_{i, t}$ do not depend on $t$, so that $\phi_{i, t} = \phi_i$. This amounts to saying that everybody has the same criteria for choice; it is then more appropriate to consider individuals as members of a sample rather than as distinct agents making choices. Introducing $\hat s_i = \sum_t \mathbb{1}_{I_t = i}$ and letting $\hat U_i = \phi_i^\top \beta$, the problem is now

$$
\begin{align}
    \max_{U_t, \hat U_i} \quad & \sum_t U_t + \sum_i \hat s_i \hat U_i \\
    \text{s.t.} \quad & U_t + \hat U_i \leq - \epsilon_{i, t} \quad \forall i, t
\end{align}
$$

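Here the minimization has been rewritten as the maximization of the negated objective. The result is the dual of an optimal transport problem with transport cost $-\epsilon_{i, t}$, which is equivalent to the surplus maximization: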

$$
\begin{align}
    \max_{\pi \geq 0} \quad & \sum_{i, t} \pi_{i, t} \epsilon_{i, t}  \\
    \text{s.t.} \quad & \sum_i \pi_{i, t} = 1 \quad \forall t \\
               & \sum_t \pi_{i, t} = \hat s_i \quad \forall i
\end{align}
$$
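To make the transport problem concrete, here is a small brute-force sketch. It relies on the standard fact that a transportation problem with integer marginals has an integral optimal solution, so it is enough to enumerate assignments of individuals to alternatives with the prescribed counts. The function names, the error draws, and the dual point `[0.0, 0.875]` are hypothetical, and the enumeration is exponential in $n$, so this is an illustration rather than a practical solver.

```python
from itertools import product


def solve_transport_brute_force(eps, counts):
    """Maximize sum_t eps[t][a[t]] over assignments a with counts[i] individuals on alternative i.

    eps[t][i] plays the role of epsilon_{i,t}; counts[i] plays the role of s_hat_i.
    """
    n, k = len(eps), len(counts)
    best_value, best_assignment = None, None
    for a in product(range(k), repeat=n):
        # Keep only assignments that respect the marginal counts.
        if any(a.count(i) != counts[i] for i in range(k)):
            continue
        value = sum(eps[t][a[t]] for t in range(n))
        if best_value is None or value > best_value:
            best_value, best_assignment = value, a
    return best_value, best_assignment


def dual_objective(eps, counts, u_hat):
    """Dual value for a given u_hat, using the largest feasible U_t.

    Feasibility requires U_t <= -eps[t][i] - u_hat[i] for all i,
    so the best choice is U_t = -max_i (eps[t][i] + u_hat[i]).
    """
    k = len(counts)
    sum_u_t = sum(-max(row[i] + u_hat[i] for i in range(k)) for row in eps)
    return sum_u_t + sum(counts[i] * u_hat[i] for i in range(k))


# Hypothetical error draws for n = 4 individuals and 2 alternatives.
eps = [[0.5, -0.25], [1.0, 0.0], [-0.5, 0.75], [0.25, 0.5]]
s_hat = [1, 3]
value, assignment = solve_transport_brute_force(eps, s_hat)
dual_value = dual_objective(eps, s_hat, [0.0, 0.875])
```

In this example the dual value at the chosen point is the negative of the surplus-maximization value, which is consistent with the sign flip between the two formulations.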

---

For the multi-armed bandit problem, regret is defined with the $\max$ outside of the sum. One arm is assumed to be optimal, and a strategy can only regret not exploiting that arm (as opposed to regretting errors outside of its control).
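The difference between the two definitions can be seen on a toy table of utilities. The sketch below, with hypothetical function names and numbers, computes regret with the $\max$ inside the sum (best alternative per individual) and with the $\max$ outside the sum (best single arm overall).

```python
def regret_max_inside(utilities, choices):
    """Benchmark: the best alternative for each individual separately."""
    best = sum(max(row) for row in utilities)
    return best - sum(row[i] for row, i in zip(utilities, choices))


def regret_max_outside(utilities, choices):
    """Benchmark: the single arm with the largest total utility."""
    k = len(utilities[0])
    best_arm_total = max(sum(row[i] for row in utilities) for i in range(k))
    return best_arm_total - sum(row[i] for row, i in zip(utilities, choices))


# Hypothetical utilities: utilities[t][i] plays the role of U_{i,t}.
U = [[1.0, 2.0], [0.5, 0.0], [3.0, 3.5]]
I = [1, 1, 0]
inside = regret_max_inside(U, I)
outside = regret_max_outside(U, I)
```

Because a sum of maxima is never smaller than the maximum of sums, `inside` is always at least `outside`; the gap is the part of the per-individual benchmark that no fixed arm could have captured.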

The expectation of this regret is

$$
\begin{align}
    \mathbb{E} [ R_n ] &= \mathbb{E} \left[ \max_i \left\lbrace \sum_{t \leq n} \left( \phi_{i, t}^\top \beta + \epsilon_{i, t} \right) \right\rbrace - \sum_{t \leq n} \left( \phi_{I_t, t}^\top \beta + \epsilon_{I_t, t} \right) \right] \\
    &= \mathbb{E} \left[ \max_i \left\lbrace \left( \sum_{t \leq n} \phi_{i, t}^\top \right) \beta + \sum_{t \leq n} \epsilon_{i, t} \right\rbrace \right] - \sum_{t \leq n} \phi_{I_t, t}^\top \beta
\end{align}
$$

In the stochastic bandit setting, $\phi_{i, t}$ does not depend on $t$, so we can let $\phi_{i, t} = \phi_i$ and $\epsilon_i = \frac{1}{n} \sum_t \epsilon_{i, t}$ to find

$$
\mathbb{E} [ R_n ] = n \mathbb{E} \left[ \max_i \left\lbrace \phi_i^\top \beta + \epsilon_i \right\rbrace \right] - \sum_{t \leq n} \phi_{I_t}^\top \beta
$$

The first-order conditions when minimizing over $\hat U_i = \phi_i^\top \beta$ are then

$$
p_i = \frac{1}{n} \hat s_i
$$

where $p_i = \mathbb{P} \left( \hat U_i + \epsilon_i > \hat U_j + \epsilon_j \: \forall j \neq i \right)$ and $\hat s_i = \sum_t \mathbb{1}_{I_t = i}$. This means the observed choice shares must equal the choice probabilities implied by the model. Unlike the result obtained with the $\max$ inside the sum, this one requires assuming that $\phi_{i, t}$ does not depend on $t$.
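As an illustration of what solving $p_i = \hat s_i / n$ can look like, the sketch below assumes, purely for the sake of example, that the $\epsilon_i$ are i.i.d. standard Gumbel. The notes do not claim this, and it does not follow from averaging the $\epsilon_{i, t}$. Under that assumption the choice probabilities are a softmax, and the condition is solved by $\hat U_i = \log(\hat s_i / n)$ up to a common additive constant. The function names and counts are hypothetical, and every count must be positive.

```python
import math


def softmax(u):
    """Logit choice probabilities under the illustrative Gumbel assumption."""
    m = max(u)
    weights = [math.exp(x - m) for x in u]
    total = sum(weights)
    return [w / total for w in weights]


def utilities_matching_shares(counts):
    """Return U_hat_i = log(s_hat_i / n), one solution of p_i = s_hat_i / n.

    Any common constant can be added to all entries without changing p.
    Requires every count to be strictly positive.
    """
    n = sum(counts)
    return [math.log(c / n) for c in counts]


s_hat = [5, 3, 2]  # hypothetical counts of pulls per arm, n = 10
u_hat = utilities_matching_shares(s_hat)
p = softmax(u_hat)
shares = [c / sum(s_hat) for c in s_hat]
```

The probabilities `p` reproduce the observed shares, so these $\hat U_i$ satisfy the first-order conditions in this special case.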

---

Minimizing over $\beta$ before taking expectations,

$$
\min_{\beta} \: n \max_i \left\lbrace \phi_i^\top \beta + \epsilon_i \right\rbrace - \sum_t \phi_{I_t}^\top \beta
$$

Dividing the objective by $n$ and introducing $-U$,

$$
\begin{align}
    \min_{\beta, U} \quad & - U - \frac{1}{n} \sum_t \phi_{I_t}^\top \beta  \\
    \text{s.t.} \quad & - U \geq \phi_i^\top \beta + \epsilon_i \quad \forall i
\end{align}
$$

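and, with $\hat U_i = \phi_i^\top \beta$ and $\hat s_i = \sum_t \mathbb{1}_{I_t = i}$, maximizing the negated objective gives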

$$
\begin{align}
    \max_{U, \hat U_i} \quad & U + \frac{1}{n} \sum_i \hat s_i \hat U_i  \\
    \text{s.t.} \quad & U + \hat U_i \leq - \epsilon_i \quad \forall i
\end{align}
$$

This is also the dual of a trivial optimal transport problem, namely

$$
\begin{align}
    \max_{\pi \geq 0} \quad & \sum_i \pi_i \epsilon_i  \\
    \text{s.t.} \quad & \sum_i \pi_i = 1  \\
         \quad & \pi_i = \frac{\hat s_i}{n} \quad \forall i
\end{align}
$$
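As a quick check of why this problem is trivial, using only the definitions above: the second constraint fixes every $\pi_i$, and the first then holds automatically because $\sum_i \hat s_i = n$. The feasible set is a single point, so the optimal value is simply

$$
\sum_i \pi_i \epsilon_i = \frac{1}{n} \sum_i \hat s_i \epsilon_i = \frac{1}{n} \sum_{t \leq n} \epsilon_{I_t}
$$

where $\epsilon_{I_t}$ denotes the averaged error of the arm chosen at step $t$.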