## Loss function

We use a [Pseudo-Huber](https://en.wikipedia.org/wiki/Huber_loss#Pseudo-Huber_loss_function) loss function.
The Pseudo-Huber function is defined as $\hat{h}_{\delta}\left(x\right) = \delta^2 \left(\sqrt{1 + \left(x/\delta\right)^2} - 1\right)$, where the parameter $\delta$ sets where the loss transitions from being quadratic (L2-like) in $x$ for $\left|x\right| \ll \delta$ to linear (L1-like) in $x$ for $\left|x\right| \gg \delta$.
We actually use the scaled Pseudo-Huber function $h_{\delta}\left(x\right) = \hat{h}_{\delta}\left(x\right)/\delta$, so that the slope of the loss has magnitude one in the linear range.
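As a quick numerical sanity check of the scaling, the sketch below evaluates $h_{\delta}$ and compares it with its small-residual approximation $x^2/(2\delta)$ and its large-residual approximation $\left|x\right| - \delta$. It is plain Python, and the function name `pseudo_huber_scaled` is our own, not from any library.

```python
import math


def pseudo_huber_scaled(x: float, delta: float) -> float:
    """Scaled Pseudo-Huber loss h_delta(x) = hat{h}_delta(x) / delta."""
    return delta * (math.sqrt(1.0 + (x / delta) ** 2) - 1.0)


delta = 0.1

# Small residual: h is close to the quadratic x**2 / (2 * delta).
x_small = 0.001
print(pseudo_huber_scaled(x_small, delta), x_small**2 / (2 * delta))

# Large residual: h is close to the line |x| - delta, i.e. slope one.
x_large = 100.0
print(pseudo_huber_scaled(x_large, delta), abs(x_large) - delta)
```

Dividing by $\delta$ is what turns the asymptotic slope $\delta$ of $\hat{h}_{\delta}$ into a slope of one.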

Specifically, let $r_v\left(c\right) = p_v\left(c\right) - y_{v,c}$ be the residual of the predicted escape probability $p_v\left(c\right)$ for variant $v$ at concentration $c$, where $y_{v,c}$ denotes the measured value.
Then the loss for variant $v$ at concentration $c$ is $L_{\delta}\left(r_v\left(c\right)\right) = h_{\delta}\left(r_v\left(c\right)\right)$, and the overall loss is:
$$ L = \sum_{v,c} h_{\delta}\left(r_v\left(c\right)\right).$$
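A minimal sketch of the overall loss, assuming predictions and measurements are stored in dictionaries keyed by `(variant, concentration)` tuples. This data layout, the function names, and the example values are hypothetical and chosen only for illustration.

```python
import math


def pseudo_huber_scaled(x, delta):
    """Scaled Pseudo-Huber loss h_delta(x)."""
    return delta * (math.sqrt(1.0 + (x / delta) ** 2) - 1.0)


def total_loss(predicted, measured, delta):
    """L = sum over (v, c) of h_delta(p_v(c) - y_{v,c}).

    `predicted` and `measured` map (variant, concentration) keys to
    p_v(c) and y_{v,c}; this dict layout is a hypothetical choice.
    """
    return sum(
        pseudo_huber_scaled(predicted[key] - measured[key], delta)
        for key in measured
    )


# Hypothetical example values.
measured = {("v1", 0.5): 0.85, ("v1", 1.0): 0.75, ("v2", 0.5): 0.10}
predicted = {("v1", 0.5): 0.90, ("v1", 1.0): 0.70, ("v2", 0.5): 0.40}
print(total_loss(predicted, measured, delta=0.1))
```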

## Gradient of loss function
The gradients of the loss with respect to the parameters $\beta_{m,e}$ and $a_{\rm{wt},e}$ are:

$$
\frac{\partial L}{\partial \beta_{m,e}} =
\sum_{v,c}
\frac{r_v\left(c\right)}{h_{\delta}\left(r_v\left(c\right)\right) + \delta}
p_v\left(c\right) \left[1 - U_e\left(v, c\right)\right] b\left(v\right)_m
$$

$$
\frac{\partial L}{\partial a_{\rm{wt},e}} =
-\sum_{v,c}
\frac{r_v\left(c\right)}{h_{\delta}\left(r_v\left(c\right)\right) + \delta}
p_v\left(c\right) \left[1 - U_e\left(v, c\right)\right]
$$
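To check these formulas end to end, the sketch below implements them for a tiny toy model and compares them against central finite differences of $L$. It assumes $p_v\left(c\right) = \prod_e U_e\left(v, c\right)$, $U_e\left(v, c\right) = 1/\left[1 + c\exp\left(-\phi_e\left(v\right)\right)\right]$, and $\phi_e\left(v\right) = -a_{\rm{wt},e} + \sum_m \beta_{m,e} b\left(v\right)_m$, which are consistent with the derivation that follows; all function names, data, and parameter values are hypothetical.

```python
import math

# Assumed toy model (consistent with the derivation in this section):
#   phi_e(v)  = -a_wt[e] + sum_m beta[m][e] * b_v[m]
#   U_e(v, c) = 1 / (1 + c * exp(-phi_e(v)))
#   p_v(c)    = prod_e U_e(v, c)


def unbound(b_v, c, beta, a_wt, e):
    phi_e = -a_wt[e] + sum(beta[m][e] * b_v[m] for m in range(len(b_v)))
    return 1.0 / (1.0 + c * math.exp(-phi_e))


def p_escape(b_v, c, beta, a_wt):
    return math.prod(unbound(b_v, c, beta, a_wt, e) for e in range(len(a_wt)))


def h(r, delta):
    return delta * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)


def loss(data, beta, a_wt, delta):
    """Total loss; `data` is a list of (b_v, c, y_vc) tuples."""
    return sum(h(p_escape(b_v, c, beta, a_wt) - y, delta) for b_v, c, y in data)


def gradients(data, beta, a_wt, delta):
    """Analytic dL/dbeta[m][e] and dL/da_wt[e] from the gradient formulas."""
    g_beta = [[0.0] * len(a_wt) for _ in beta]
    g_a = [0.0] * len(a_wt)
    for b_v, c, y in data:
        p = p_escape(b_v, c, beta, a_wt)
        r = p - y
        dh_dr = r / (h(r, delta) + delta)
        for e in range(len(a_wt)):
            # Shared factor r / (h + delta) * p * (1 - U_e).
            common = dh_dr * p * (1.0 - unbound(b_v, c, beta, a_wt, e))
            g_a[e] -= common
            for m in range(len(beta)):
                g_beta[m][e] += common * b_v[m]
    return g_beta, g_a


def numeric_grad_a(data, beta, a_wt, delta, e, eps=1e-6):
    """Central finite difference of L in a_wt[e]."""
    plus, minus = list(a_wt), list(a_wt)
    plus[e] += eps
    minus[e] -= eps
    return (loss(data, beta, plus, delta) - loss(data, beta, minus, delta)) / (2 * eps)


def numeric_grad_beta(data, beta, a_wt, delta, m, e, eps=1e-6):
    """Central finite difference of L in beta[m][e]."""
    plus = [row[:] for row in beta]
    minus = [row[:] for row in beta]
    plus[m][e] += eps
    minus[m][e] -= eps
    return (loss(data, plus, a_wt, delta) - loss(data, minus, a_wt, delta)) / (2 * eps)


# Hypothetical toy data: two mutations, two epitopes.
data = [([1, 0], 0.5, 0.8), ([0, 1], 1.0, 0.3), ([1, 1], 2.0, 0.1)]
beta = [[0.4, -0.2], [0.1, 0.3]]
a_wt = [1.0, 0.5]
delta = 0.1

g_beta, g_a = gradients(data, beta, a_wt, delta)
print(g_a, [numeric_grad_a(data, beta, a_wt, delta, e) for e in range(2)])
print(g_beta[0][1], numeric_grad_beta(data, beta, a_wt, delta, 0, 1))
```

A finite-difference comparison like this is a cheap way to catch sign or indexing mistakes in hand-derived gradients before using them in an optimizer.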

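These follow from the chain rule: since $\partial r_v\left(c\right) / \partial p_v\left(c\right) = 1$, each term is $\frac{\partial \left[h_{\delta}\left(r_v\left(c\right)\right)\right]}{\partial r_v\left(c\right)}$ multiplied by the derivative of $p_v\left(c\right)$ with respect to the parameter. The sub-components are derived below.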

#### Calculating $\frac{\partial \left[h_{\delta}\left(r\right)\right]}{\partial r}$

We have
$$ \frac{\partial \left[h_{\delta}\left(r\right)\right]}{\partial r}
= \delta \frac{\partial \left(\sqrt{1 + \left(r/\delta\right)^2} - 1\right)}{\partial r}
= \frac{\delta}{2 \sqrt{1 + \left(r/\delta\right)^2}} \frac{2r}{\delta^2}
= \frac{r}{h_{\delta}\left(r\right) + \delta}
$$
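The closed form can be spot-checked numerically. The sketch below (plain Python, hypothetical helper names) compares $r/\left(h_{\delta}\left(r\right) + \delta\right)$ with a central finite difference of $h_{\delta}$ at a few residuals. Since $h_{\delta}\left(r\right) + \delta = \sqrt{\delta^2 + r^2} > \left|r\right|$, the derivative is always smaller than one in magnitude.

```python
import math


def h(r, delta):
    """Scaled Pseudo-Huber loss h_delta(r)."""
    return delta * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)


def dh_dr(r, delta):
    """Closed-form derivative r / (h_delta(r) + delta)."""
    return r / (h(r, delta) + delta)


def dh_dr_numeric(r, delta, eps=1e-6):
    """Central finite-difference approximation of dh/dr."""
    return (h(r + eps, delta) - h(r - eps, delta)) / (2.0 * eps)


for r in (-2.0, -0.05, 0.0, 0.03, 5.0):
    print(r, dh_dr(r, 0.1), dh_dr_numeric(r, 0.1))
```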

#### Calculating $\frac{\partial p_v\left(c\right)}{\partial \beta_{m,e}}$

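First, because $p_v\left(c\right) = \prod_{e} U_e\left(v, c\right)$ is a product over epitopes in which only $U_e\left(v, c\right)$ depends on $\beta_{m,e}$,

$$
\frac{\partial p_v\left(c\right)}{\partial \beta_{m,e}} = \frac{\partial U_e\left(v, c\right)}{\partial \beta_{m,e}} \frac{p_v\left(c\right)}{U_e\left(v, c\right)}.
$$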

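Next, since $U_e\left(v, c\right) = 1 / \left[1 + c \exp\left(-\phi_e\left(v\right)\right)\right]$, the chain rule gives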
$$
\frac{\partial U_e\left(v, c\right)}{\partial \beta_{m,e}} = \frac{\partial \phi_e\left(v\right)}{\partial \beta_{m,e}}\frac{c \exp\left(-\phi_e\left(v\right)\right) }{\left[1 + c \exp\left(-\phi_e\left(v\right)\right)\right]^2} = \frac{\partial \phi_e\left(v\right)}{\partial \beta_{m,e}} U_e\left(v, c\right) \left[1 - U_e\left(v, c\right)\right]
$$
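where the last step uses $1 - U_e\left(v, c\right) = \frac{c \exp\left(-\phi_e\left(v\right)\right)}{1 + c \exp\left(-\phi_e\left(v\right)\right)}$ (see the simplification [here](https://math.stackexchange.com/a/1225116)).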
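The identity can be confirmed numerically. The sketch below treats $U$ as a function of $\phi$ at fixed $c$ and compares the quotient-rule form, the simplified form $U\left(1 - U\right)$, and a central finite difference; the function names and test points are hypothetical.

```python
import math


def unbound(phi, c):
    """U = 1 / (1 + c * exp(-phi)) as a function of phi at fixed c."""
    return 1.0 / (1.0 + c * math.exp(-phi))


def dU_dphi_quotient(phi, c):
    """Quotient-rule form: c exp(-phi) / (1 + c exp(-phi))**2."""
    t = c * math.exp(-phi)
    return t / (1.0 + t) ** 2


def dU_dphi_simplified(phi, c):
    """Simplified form: U * (1 - U)."""
    u = unbound(phi, c)
    return u * (1.0 - u)


def dU_dphi_numeric(phi, c, eps=1e-6):
    """Central finite difference, for comparison."""
    return (unbound(phi + eps, c) - unbound(phi - eps, c)) / (2.0 * eps)


for phi in (-3.0, 0.0, 2.5):
    for c in (0.1, 1.0, 10.0):
        print(phi, c, dU_dphi_quotient(phi, c), dU_dphi_simplified(phi, c), dU_dphi_numeric(phi, c))
```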

Finally, note
$$\frac{\partial \phi_e\left(v\right)}{\partial \beta_{m,e}} = b\left(v\right)_m.$$

Putting it all together, we have:
$$
\frac{\partial p_v\left(c\right)}{\partial \beta_{m,e}} = p_v\left(c\right) \left[1 - U_e\left(v, c\right)\right] b\left(v\right)_m.
$$
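The combined result can be checked on a toy variant. The sketch below assumes $p_v\left(c\right) = \prod_e U_e\left(v, c\right)$, $U_e\left(v, c\right) = 1/\left[1 + c\exp\left(-\phi_e\left(v\right)\right)\right]$, and $\phi_e\left(v\right) = -a_{\rm{wt},e} + \sum_m \beta_{m,e} b\left(v\right)_m$, forms consistent with the steps above; the parameter values and function names are made up for illustration.

```python
import math


def unbound_fractions(b_v, c, beta, a_wt):
    """U_e(v, c) for each epitope e under the assumed linear phi_e(v)."""
    out = []
    for e in range(len(a_wt)):
        phi_e = -a_wt[e] + sum(beta[m][e] * b_v[m] for m in range(len(b_v)))
        out.append(1.0 / (1.0 + c * math.exp(-phi_e)))
    return out


def p_escape(b_v, c, beta, a_wt):
    """p_v(c) as the product of the per-epitope unbound fractions."""
    return math.prod(unbound_fractions(b_v, c, beta, a_wt))


def dp_dbeta(b_v, c, beta, a_wt, m, e):
    """Closed form p_v(c) * [1 - U_e(v, c)] * b(v)_m."""
    u = unbound_fractions(b_v, c, beta, a_wt)
    return math.prod(u) * (1.0 - u[e]) * b_v[m]


def dp_dbeta_numeric(b_v, c, beta, a_wt, m, e, eps=1e-6):
    """Central finite difference of p_v(c) in beta[m][e]."""
    plus = [row[:] for row in beta]
    minus = [row[:] for row in beta]
    plus[m][e] += eps
    minus[m][e] -= eps
    return (p_escape(b_v, c, plus, a_wt) - p_escape(b_v, c, minus, a_wt)) / (2.0 * eps)


# Hypothetical variant with mutations 0 and 2, three sites, two epitopes.
b_v = [1.0, 0.0, 1.0]
beta = [[0.5, -0.3], [1.2, 0.0], [-0.4, 0.8]]
a_wt = [2.0, 1.0]
for m in range(3):
    for e in range(2):
        print(m, e, dp_dbeta(b_v, 0.7, beta, a_wt, m, e), dp_dbeta_numeric(b_v, 0.7, beta, a_wt, m, e))
```

Mutations absent from the variant ($b\left(v\right)_m = 0$) get a zero derivative, as the formula predicts.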

#### Calculating $\frac{\partial p_v\left(c\right)}{\partial a_{\rm{wt},e}}$
The only difference from above is that $\partial \phi_e\left(v\right) / \partial a_{\rm{wt},e} = -1$ rather than $b\left(v\right)_m$, so:
$$
\frac{\partial p_v\left(c\right)}{\partial a_{\rm{wt},e}} = -p_v\left(c\right) \left[1 - U_e\left(v, c\right)\right].
$$
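A similar spot check for $a_{\rm{wt},e}$, written directly in terms of the $\phi_e\left(v\right)$ values. It assumes, consistent with the sign flip above, that increasing $a_{\rm{wt},e}$ by a small amount lowers $\phi_e\left(v\right)$ by the same amount, and that $p_v\left(c\right) = \prod_e 1/\left[1 + c\exp\left(-\phi_e\left(v\right)\right)\right]$. The $\phi_e\left(v\right)$ values and helper names are hypothetical.

```python
import math


def p_escape(phis, c):
    """p_v(c) = prod_e 1 / (1 + c * exp(-phi_e)) for a list of phi_e(v) values."""
    return math.prod(1.0 / (1.0 + c * math.exp(-phi)) for phi in phis)


def dp_da_wt(phis, c, e):
    """Closed form -p_v(c) * [1 - U_e(v, c)]."""
    u_e = 1.0 / (1.0 + c * math.exp(-phis[e]))
    return -p_escape(phis, c) * (1.0 - u_e)


def dp_da_wt_numeric(phis, c, e, eps=1e-6):
    """Finite difference: raising a_wt[e] by eps lowers phi_e(v) by eps."""
    plus, minus = list(phis), list(phis)
    plus[e] -= eps   # phi_e at a_wt[e] + eps
    minus[e] += eps  # phi_e at a_wt[e] - eps
    return (p_escape(plus, c) - p_escape(minus, c)) / (2.0 * eps)


phis = [0.3, -1.1]  # hypothetical phi_e(v) values for two epitopes
for e in range(2):
    print(e, dp_da_wt(phis, 2.0, e), dp_da_wt_numeric(phis, 2.0, e))
```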