We are interested in understanding the [Estimation Game](https://www.quantifiedintuitions.org/estimation-game)'s scoring method. In particular, we want to understand which confidence interval is the "correct" one to use.

The estimation game involves multiple questions, each with a numeric answer. For each question, users provide a lower and an upper bound and are scored on that interval. In the back end, each question also gets a challenge-discount `C` that's higher for more challenging questions.

The scoring function follows Spencer Greenberg's [Calibration Scoring Rules for Practical Prediction Training](https://www.semanticscholar.org/reader/5779a041d387f7301ed79e873dae48d7c80e63d3). 

The JavaScript code (with TypeScript type annotations) for the scoring method:
```js
export const challengeScore = (
  lowerBound: number,
  upperBound: number,
  answer: number,
  confidenceInterval: number,
  useLogScoring: boolean = false,
  C: number
) => {
  const SMAX = 10;
  const SMIN = -10; // higher lower bound for challenge questions to be more forgiving
  const DELTA = 0.4;
  const EPSILON = 0.0000000001;
  const B = confidenceInterval / 100;

  return greenbergScoring(lowerBound,
    upperBound,
    answer,
    useLogScoring,
    C,
    SMAX,
    SMIN,
    DELTA,
    EPSILON,
    B,
  )
}

const greenbergScoring = (
  lowerBound: number,
  upperBound: number,
  answer: number,
  useLogScoring: boolean = false,
  C: number,
  SMAX: number,
  SMIN: number,
  DELTA: number,
  EPSILON: number,
  B: number
) => {
  if (!useLogScoring) {
    lowerBound -= EPSILON;
    upperBound += EPSILON;
    let r = (lowerBound - answer) / C;
    let s = (upperBound - lowerBound) / C;
    let t = (answer - upperBound) / C;
    console.log("r: " + r);
    console.log("s: " + s);
    console.log("t: " + t);
    if (answer < lowerBound) {
      return Math.max(SMIN, (-2 / (1 - B)) * r - (r / (1 + r)) * s);
    } else if (answer > upperBound) {
      return Math.max(SMIN, (-2 / (1 - B)) * t - (t / (1 + t)) * s);
    }
    lowerBound -= DELTA;
    upperBound += DELTA;
    r = (lowerBound - answer) / C;
    s = (upperBound - lowerBound) / C;
    t = (answer - upperBound) / C;
    return ((4 * SMAX * r * t) / (s * s)) * (1 - s / (1 + s));
  } else {
    lowerBound /= 10 ** EPSILON;
    upperBound *= 10 ** EPSILON;
    let r = Math.log(lowerBound / answer) / Math.log(C);
    let s = Math.log(upperBound / lowerBound) / Math.log(C);
    let t = Math.log(answer / upperBound) / Math.log(C);
    console.log("r: " + r);
    console.log("s: " + s);
    console.log("t: " + t);
    if (answer < lowerBound) {
      return Math.max(SMIN, (-2 / (1 - B)) * r - (r / (1 + r)) * s);
    } else if (answer > upperBound) {
      return Math.max(SMIN, (-2 / (1 - B)) * t - (t / (1 + t)) * s);
    }
    lowerBound /= 10 ** DELTA;
    upperBound *= 10 ** DELTA;
    r = Math.log(lowerBound / answer) / Math.log(C);
    s = Math.log(upperBound / lowerBound) / Math.log(C);
    t = Math.log(answer / upperBound) / Math.log(C);
    return ((4 * SMAX * r * t) / (s * s)) * (1 - s / (1 + s));
  }
};
```

First, we see that there are two variants, selected by `useLogScoring`. They are equivalent up to applying a base-10 logarithm to the inputs (the bounds, the answer, and `C`). So for simplicity let's drop the log-scale version (and sneakily rewrite the rest in Python).

```python
def challengeScore(
    lowerBound,
    upperBound,
    answer,
    confidenceInterval,
    C,
):
    SMAX = 10
    SMIN = -10  # higher lower bound for challenge questions to be more forgiving
    DELTA = 0.4
    EPSILON = 0.0000000001
    B = confidenceInterval / 100

    return greenbergScoring(
        lowerBound,
        upperBound,
        answer,
        C,
        SMAX,
        SMIN,
        DELTA,
        EPSILON,
        B,
    )


def greenbergScoring(
    lowerBound,
    upperBound,
    answer,
    C,
    SMAX,
    SMIN,
    DELTA,
    EPSILON,
    B,
):
    # Widen the interval by a tiny EPSILON before testing whether the answer is outside.
    lowerBound -= EPSILON
    upperBound += EPSILON
    r = (lowerBound - answer) / C
    s = (upperBound - lowerBound) / C
    t = (answer - upperBound) / C
    # Debug output, as in the original (f-strings: str + float would raise a TypeError).
    print(f"r: {r}")
    print(f"s: {s}")
    print(f"t: {t}")
    if answer < lowerBound:
        # Answer below the interval: penalty, floored at SMIN.
        return max(SMIN, (-2 / (1 - B)) * r - (r / (1 + r)) * s)
    elif answer > upperBound:
        # Answer above the interval: penalty, floored at SMIN.
        return max(SMIN, (-2 / (1 - B)) * t - (t / (1 + t)) * s)
    # Answer inside the interval: widen by DELTA and apply the inside-interval formula.
    lowerBound -= DELTA
    upperBound += DELTA
    r = (lowerBound - answer) / C
    s = (upperBound - lowerBound) / C
    t = (answer - upperBound) / C
    return ((4 * SMAX * r * t) / (s * s)) * (1 - s / (1 + s))
```
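As a sanity check on the equivalence of the two variants, the sketch below ports both branches of the JavaScript `greenbergScoring` to Python (debug output omitted) and compares log scoring of `(lowerBound, upperBound, answer, C)` with linear scoring of their base-10 logarithms. The function names `linear_score` and `log_score` and the sample inputs are my own choices for illustration, not part of the game's code.

```python
import math

SMAX, SMIN, DELTA, EPSILON = 10, -10, 0.4, 1e-10


def linear_score(lower, upper, answer, B, C):
    """Port of the `useLogScoring == false` branch."""
    lower -= EPSILON
    upper += EPSILON
    r = (lower - answer) / C
    s = (upper - lower) / C
    t = (answer - upper) / C
    if answer < lower:
        return max(SMIN, (-2 / (1 - B)) * r - (r / (1 + r)) * s)
    if answer > upper:
        return max(SMIN, (-2 / (1 - B)) * t - (t / (1 + t)) * s)
    lower -= DELTA
    upper += DELTA
    r = (lower - answer) / C
    s = (upper - lower) / C
    t = (answer - upper) / C
    return ((4 * SMAX * r * t) / (s * s)) * (1 - s / (1 + s))


def log_score(lower, upper, answer, B, C):
    """Port of the `useLogScoring == true` branch."""
    lower /= 10 ** EPSILON
    upper *= 10 ** EPSILON
    r = math.log(lower / answer) / math.log(C)
    s = math.log(upper / lower) / math.log(C)
    t = math.log(answer / upper) / math.log(C)
    if answer < lower:
        return max(SMIN, (-2 / (1 - B)) * r - (r / (1 + r)) * s)
    if answer > upper:
        return max(SMIN, (-2 / (1 - B)) * t - (t / (1 + t)) * s)
    lower /= 10 ** DELTA
    upper *= 10 ** DELTA
    r = math.log(lower / answer) / math.log(C)
    s = math.log(upper / lower) / math.log(C)
    t = math.log(answer / upper) / math.log(C)
    return ((4 * SMAX * r * t) / (s * s)) * (1 - s / (1 + s))


# Arbitrary sample inputs: answer inside, below, above, and far above [100, 1000].
B, C = 0.9, 100
for answer in (300, 50, 2000, 1e6):
    a = log_score(100, 1000, answer, B, C)
    b = linear_score(2, 3, math.log10(answer), B, math.log10(C))
    print(f"answer={answer}: log={a:.6f}  linear-on-log10={b:.6f}")
```

The two columns should agree up to floating-point error, because `EPSILON` and `DELTA` enter the log branch as powers of 10, which become plain offsets after taking `log10`.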


According to page 19 of Greenberg's paper, the formula implemented here (apart from the `EPSILON`/`DELTA` adjustments and the `SMIN` floor) is

$$
S^{0}(x, L, U)= \begin{cases}\frac{-2}{1-\beta} r-\frac{r}{1+r} s & , \text { when } x<L \\ 4 s_{\max } \frac{r t}{s^{2}}\left(1-\frac{s}{1+s}\right) & , \text { when } L \leq x \leq U \\ \frac{-2}{1-\beta} t-\frac{t}{1+t} s & , \text { when } x>U\end{cases}
$$

which is a version of the following simpler formula:

$$
S(x, L, U)= \begin{cases}\frac{-2}{1-\beta} r-s & \text { when } x<L \\ -s & \text { when } L \leq x \leq U \\ \frac{-2}{1-\beta} t-s & \text { when } x>U\end{cases}
$$

In these formulas, $x$ is `answer` and $L$, $U$ are the `lowerBound` and `upperBound` respectively. Also, $s,r,t$ are defined as in the code above so that
$$
\begin{align*}
s &= \frac{U-L}{C} \\
r &= \frac{L-x}{C} \\
t &= \frac{x-U}{C}
\end{align*}
$$
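To make the definitions concrete, the sketch below computes $r$, $s$, $t$ and evaluates $S^0$ exactly as written in the formula, that is, without the `EPSILON`/`DELTA` adjustments and without the `SMIN` floor that the code applies. The function names and the sample question (bounds $[40, 60]$, a 90% interval, $C = 10$) are made up for illustration.

```python
def rst(x, L, U, C):
    """Return (r, s, t) for answer x, bounds [L, U], and challenge-discount C."""
    return (L - x) / C, (U - L) / C, (x - U) / C


def s0(x, L, U, beta, C, s_max=10):
    """S^0 as in the formula: no EPSILON/DELTA adjustments, no SMIN floor."""
    r, s, t = rst(x, L, U, C)
    if x < L:
        return -2 / (1 - beta) * r - r / (1 + r) * s
    if x > U:
        return -2 / (1 - beta) * t - t / (1 + t) * s
    return 4 * s_max * r * t / s**2 * (1 - s / (1 + s))


# Hypothetical question: bounds [40, 60], 90% interval, C = 10.
L, U, beta, C = 40, 60, 0.9, 10
for x in (50, 45, 38, 62):
    r, s, t = rst(x, L, U, C)
    print(f"x={x}: r={r:+.1f}  s={s:.1f}  t={t:+.1f}  S0={s0(x, L, U, beta, C):.3f}")
```

Inside the interval both $r$ and $t$ are non-positive, so their product is non-negative and the score is too. At the midpoint $r = t = -s/2$, and the middle branch reduces to $s_{\max} / (1 + s)$, so among intervals centered on the answer, narrower ones score higher.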



The first thing to note is that we can usually approximate away the factors $\frac{\lambda}{1+\lambda}$, where $\lambda$ is $s$, $r$, or $t$: each is close to 1 when $\lambda$ is large and close to 0 when $\lambda$ is small. In particular, when $x$ lies far outside the credible interval, $r$ (or $t$) is large, so $\frac{r}{1+r} \approx 1$ and the term $\frac{r}{1+r} s$ reduces to $s$. This explains why the two formulas are basically the same when $x$ is outside the credible interval and sufficiently far away from it.
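A quick numerical look at this approximation: the sketch below evaluates the outside-interval branch of both formulas as the miss distance $r$ grows. It works directly in terms of $r$ and $s$ (so $C$ drops out), ignores the `SMIN` floor, and assumes a 90% interval with $s = 2$ purely for illustration. The function names are mine.

```python
def smoothed_outside(r, s, beta):
    """Outside-interval branch of S^0 (the x < L case)."""
    return -2 / (1 - beta) * r - r / (1 + r) * s


def simple_outside(r, s, beta):
    """Outside-interval branch of the simpler formula S."""
    return -2 / (1 - beta) * r - s


beta, s = 0.9, 2.0  # illustrative values, not taken from the game
for r in (0.01, 0.1, 1.0, 10.0, 100.0):
    smooth = smoothed_outside(r, s, beta)
    simple = simple_outside(r, s, beta)
    factor = r / (1 + r)
    print(f"r={r:>6}: factor={factor:.3f}  S0={smooth:10.3f}  S={simple:10.3f}  gap={smooth - simple:.3f}")
```

The gap between the two branches is exactly $s / (1 + r)$. It vanishes for large misses, while for an answer just outside the bound it approaches $s$, since $S^0$ goes to 0 there and $S$ stays at $-s$.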

In the 
