# Old text

Define $N_t$ as the number of tests conducted on day $t$;
       $C_t$ as the number of observed positive cases on day $t$;
       $H_t$ as the number of new hospital admissions on day $t$;
       $D_t$ as the number of new deaths on day $t$.
       
Also define two unobserved variables: $C_t^*$, the true number of cases on day $t$ (proportional to the whole population), and $S_t^*$, the severity, which is related to the case fatality ratio or the hospitalization ratio.

$C_t^*$ is not affected by $N_t$, but $C_t$ is. More specifically, if we double the number of tests on day $t$, i.e. increase $N_t$ to $2 N_t$, the observed number of positive cases will also increase, although the increment is likely to be less than $C_t$ (that is, observed cases are unlikely to double). So $C_t$ by itself is not a good estimator of $C_t^*$. We therefore want to take $N_t$ into account and construct a more reliable estimate of $C_t^*$.

The proposed model is

$\hat{C_t^*} = C_t \cdot \left( \frac{N}{N_t} \right)^{\beta}$

where $\beta > 0$. 
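
As a minimal sketch of this adjustment, the snippet below evaluates the formula for one day. The function name `adjusted_cases` and all numbers are illustrative assumptions, and $N$ is treated as a fixed reference number of tests, which is also an assumption since the text does not define it.

```julia
# Hypothetical helper: scale observed cases C_t by (N / N_t)^β.
# N is assumed to be a fixed reference number of tests; β > 0 controls
# how strongly observed cases respond to testing volume.
adjusted_cases(C_t, N_t, N, β) = C_t * (N / N_t)^β

# Illustrative values only: 100 observed cases from 1_000 tests,
# rescaled to a reference level of 4_000 tests with β = 0.5.
est = adjusted_cases(100, 1_000, 4_000, 0.5)
println(est)  # approximately 200, since 100 * 4^0.5 = 200
```

With $\beta < 1$, quadrupling the testing level only doubles the adjusted count, which matches the intuition that observed cases grow more slowly than tests.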

Taking logs, we have

$\log \hat{C_t^*}= \log C_t + \beta \cdot \log \frac{N}{N_t}$

Taking first differences (the term in $N$ does not depend on $t$ and drops out), we get

$\Delta \log \hat{C_t^*} = \Delta \log C_t - \beta \cdot \Delta \log N_t$.

Rearranging and treating $\Delta \log \hat{C_t^*}$ as the error term, we obtain a regression without intercept and the OLS estimate of $\beta$:

$\Delta \log C_t = \beta \cdot \Delta \log N_t + \Delta \log \hat{C_t^*}$,

$\hat{\beta} = \frac{\sum \Delta \log N_t  \Delta \log C_t}{\sum \left( \Delta \log N_t\right)^2}$.
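
The estimator above can be sketched in a few lines of Julia. The helper name `ols_beta` and the two short series are made up for illustration; they are not the data used in this notebook.

```julia
# No-intercept OLS slope: β̂ = Σ x_t y_t / Σ x_t², where
# x_t = Δ log N_t and y_t = Δ log C_t.
ols_beta(x, y) = sum(x .* y) / sum(x .^ 2)

# Synthetic daily series (made-up numbers, for illustration only).
N = [1000.0, 1200.0, 1500.0, 1400.0, 1800.0]   # tests per day
C = [100.0, 110.0, 123.0, 119.0, 135.0]        # positive cases per day

x = diff(log.(N))   # Δ log N_t
y = diff(log.(C))   # Δ log C_t

β_hat = ols_beta(x, y)
println("β_hat = ", β_hat)
```

Because the regression has no intercept, the estimate is just the ratio of the cross-product sum to the sum of squares; no matrix algebra is needed.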

In [1]:
using Optim, DataFrames;   # DataFrames provides nrow
function ScoreBeta(θ, data)
    x = data.diff_logNt;
    y = data.diff_logCt;
    T = nrow(data);
    ω = θ[1]; 
    ϕ = θ[2];
    α = θ[3]; 
    σ = θ[4];
    β0 = θ[5];
    # β is rebuilt on every call, so nothing carries over between optimizer iterations.
    # Starting from [β0] keeps the vector concretely typed; an untyped [] would be a Vector{Any}.
    β = [β0];
    
    for i in 1:(T-1)
        v = ω + ϕ * β[i] + α * sign(x[i]) * (y[i] - β[i] * x[i]) / σ
        push!(β, v)
    end
    
    return β
end

function logL(θ, data)
    x = data.diff_logNt;
    y = data.diff_logCt;
    T = nrow(data);
    β = ScoreBeta(θ, data);
    σ = θ[4];
    sum((y .- β .* x).^2)/(σ^2) + T*log(σ^2)   # Note: this is -2 * logL up to an additive constant (to be minimized), not logL.
end

logL (generic function with 1 method)

# Old score model
Assume normal errors, $\Delta \log \hat{C_t^*} \sim N(0, \sigma^2)$. 

To simplify notation, let $y_t = \Delta \log C_t$ and $x_t = \Delta \log N_t$.

The log-likelihood is:

$$\ell(\beta_t) = \text{const} - \frac{T}{2} \log(\sigma^2) - \sum_{t=1}^T \frac{(y_t - \beta_t x_t)^2}{2 \sigma^2};$$

The score is:

$$\frac{\partial \ell}{\partial \beta_t} = \frac{1}{\sigma^2} (y_t - \beta_t x_t) x_t;$$

The Hessian is:

$$\frac{\partial^2 \ell}{\partial \beta_t^2} = - \frac{1}{\sigma^2} x_t^2;$$

Define
$$\psi(\beta_t) =
\frac{\frac{\partial \ell}{\partial \beta_t}}{\sqrt{- \frac{\partial^2 \ell}{\partial \beta_t^2}}}
=
\frac{\frac{1}{\sigma^2} (y_t - \beta_t x_t) x_t}{\sqrt{\frac{1}{\sigma^2} x_t^2}}
= 
\frac{1}{\sigma} \text{sign}(x_t) (y_t - \beta_t x_t)
$$
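
The simplification above can be checked numerically. This is a small sketch with arbitrary values; the function names are hypothetical, and it assumes $\sigma > 0$ and $x_t \neq 0$ (the ratio is undefined when $x_t = 0$).

```julia
# Score and negative Hessian of the Gaussian log-likelihood with respect to β_t.
score(y, x, β, σ) = (y - β * x) * x / σ^2
neg_hess(x, σ) = x^2 / σ^2

# Scaled score: score divided by the square root of the negative Hessian.
ψ_ratio(y, x, β, σ) = score(y, x, β, σ) / sqrt(neg_hess(x, σ))

# Closed form from the derivation above.
ψ_closed(y, x, β, σ) = sign(x) * (y - β * x) / σ

# Arbitrary illustrative values; x < 0 exercises the sign term.
y, x, β, σ = 0.3, -0.2, 0.5, 1.5
println(ψ_ratio(y, x, β, σ) ≈ ψ_closed(y, x, β, σ))
```

The $|x_t|$ in the denominator cancels the magnitude of $x_t$ in the score, so only its sign survives in $\psi$.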

The model is (new version):

$$\beta_{t+1} = \omega + \phi \beta_t + \alpha \psi(\beta_t)
=
\omega + \phi \beta_t + \alpha \frac{\text{sign}(x_t) (y_t - \beta_t x_t)}{\sigma} 
$$
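
To see how the recursion behaves, here is a stand-alone sketch that works on plain vectors rather than a data frame. The function `beta_path`, its keyword arguments, and the input numbers are hypothetical and chosen only for illustration.

```julia
# Hypothetical stand-alone version of the recursion on plain vectors:
# β[t+1] = ω + ϕ β[t] + α sign(x[t]) (y[t] - β[t] x[t]) / σ.
function beta_path(x, y; ω, ϕ, α, σ, β0)
    T = length(x)
    β = Vector{Float64}(undef, T)
    β[1] = β0
    for t in 1:(T - 1)
        β[t + 1] = ω + ϕ * β[t] + α * sign(x[t]) * (y[t] - β[t] * x[t]) / σ
    end
    return β
end

# Made-up inputs for illustration.
x = [0.10, -0.05, 0.20, 0.15]
y = [0.06, -0.02, 0.09, 0.08]

# With α = 0, ω = 0, ϕ = 1 the update is switched off and β_t stays at β0.
flat = beta_path(x, y; ω = 0.0, ϕ = 1.0, α = 0.0, σ = 1.0, β0 = 0.5)

# With α > 0 each step moves β_t in the direction of the scaled residual.
moving = beta_path(x, y; ω = 0.0, ϕ = 1.0, α = 0.1, σ = 1.0, β0 = 0.5)
println(flat)
println(moving)
```

Setting $\alpha = 0$, $\omega = 0$, $\phi = 1$ recovers the constant-$\beta$ regression as a special case, which is a convenient sanity check before estimating the full model.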

In [None]:
# Abandoned code
# Aggregate daily data to weekly data (non-overlap, non-rolling window)
# df = @pipe daily |> select(_, :Ct, :Nt);
# df.week = repeat((1:ceil(Int, n/7)), inner = 7)[1:n];
# weekly =  @_ groupby(df, [:week]) |> combine(__, :Ct => sum => :WeekCt, :Nt => sum => :WeekNt);
# weekly.logCt = log.(weekly.WeekCt);
# weekly.logNt = log.(weekly.WeekNt);
# weekly = @pipe transform(weekly, :logCt => (x -> x - lag(x)) => :diff_logCt, 
#             :logNt => (x -> x - lag(x)) => :diff_logNt) |> 
# dropmissing(_) |> 
# subset(_, :diff_logCt => c -> .!isinf.(c), :diff_logCt => c -> .!isnan.(c));