One benefit of modelling dependent errors is that the waiting time for physical calibration and validation of a new satellite can be reduced considerably when two instead of three collocations are employed (Su et al. 2014).  Here, we demonstrate the retrieval of regression slope using two collocation methods: standard triple collocation (Stoffelen 1998) and a collocation of only two datasets, in which one dataset contributes five samples (the collocated sample plus four others that are "nearby" in time or space).  We show that both methods (triple collocation and the so-called INFERS method) provide good estimates of regression slope.

Triple collocation employs this error model:

\begin{eqnarray}
  \begin{array}{r} \mathrm{in\ situ}\ \\
                   \mathrm{nowcast} \\
                   \mathrm{satellite} \end{array}
  \begin{array}{r}   I   \\   N   \\   S   \end{array}
  \begin{array}{c} \ = \ \\ \ = \ \\ \ = \ \end{array}
  \begin{array}{l} \color{white}{\alpha_N + \beta_N} \color{black} t + \epsilon_I \\
                   \alpha_N + \beta_N t + \epsilon_N \\
                   \alpha_S + \beta_S t + \epsilon_S \end{array} \\
  \nonumber
\end{eqnarray}

where $t$ is truth and $\epsilon_I$, $\epsilon_N$, and $\epsilon_S$ are independent errors.  The regression slope of interest is $\beta_N$ (or $\beta_S$).  Following McColl et al. (2014), we can retrieve $\beta_N$ from the $INS$ collocations as the ratio of the $NS$ covariance to the $IS$ covariance.  In [Julia](https://julialang.org) this is

In [12]:
using Printf, Statistics                                # provides @printf and cov() in Julia 1.x

# rand() is uniform between 0 and 1; randn() is Gaussian with 0 mean and SD of 1
ct = 9.0 ;            ci = 2.0;
an = 1.0 ;            cn = 1.0;
as = 2.0 ; bs = 1.5 ; cs = 1.5;
numb = 10^5;                                            # number of collocations

for bn = 0.4:0.2:3.2
  TT = ct *  rand(numb);                                # truth
  DI = ci * randn(numb);                                # in situ error
  DN = cn * randn(numb);                                # nowcast error
  DS = cs * randn(numb);                                # satellite error
  II =            TT .+ DI;
  NN = an .+ bn * TT .+ DN;                             # scalar offsets are broadcast with .+
  SS = as .+ bs * TT .+ DS;
  @printf("estimated bn %6.3f should be close to %6.3f\n", cov(NN,SS)/cov(II,SS), bn)
end

estimated bn  0.398 should be close to  0.400
estimated bn  0.599 should be close to  0.600
estimated bn  0.801 should be close to  0.800
estimated bn  0.997 should be close to  1.000
estimated bn  1.193 should be close to  1.200
estimated bn  1.396 should be close to  1.400
estimated bn  1.595 should be close to  1.600
estimated bn  1.792 should be close to  1.800
estimated bn  1.998 should be close to  2.000
estimated bn  2.202 should be close to  2.200
estimated bn  2.403 should be close to  2.400
estimated bn  2.594 should be close to  2.600
estimated bn  2.815 should be close to  2.800
estimated bn  2.993 should be close to  3.000
estimated bn  3.192 should be close to  3.200
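
The ratio used above follows from the error model: with independent errors, the $NS$ covariance is $\beta_N \beta_S \, \mathrm{var}(t)$ and the $IS$ covariance is $\beta_S \, \mathrm{var}(t)$, so their ratio is $\beta_N$.  As a minimal sketch of why error independence matters, suppose (hypothetically) that $N$ and $S$ also share an error with standard deviation `cc`; the same ratio is then expected to be $\beta_N + \mathtt{cc}^2 / (\beta_S \, \mathrm{var}(t))$.  The helper names `tc_slope` and `simulate`, the shared error `cc`, and the fixed random seed are assumptions introduced for illustration only.

```julia
using Random, Statistics, Printf

# Triple collocation slope estimate, as in the cell above: cov(N,S) / cov(I,S).
tc_slope(II, NN, SS) = cov(NN, SS) / cov(II, SS)

# Simulate I, N, S where N and S additionally share an error of SD `cc`
# (`cc` and this helper are illustrative assumptions, not part of the method).
function simulate(bn; cc = 0.0, numb = 10^5, rng = MersenneTwister(1))
    ct, ci, cn, cs = 9.0, 2.0, 1.0, 1.5           # scalings taken from the cell above
    an, asat, bs   = 1.0, 2.0, 1.5                # offsets and satellite slope from the cell above
    TT = ct .* rand(rng, numb)                    # truth, uniform on [0, ct)
    DC = cc .* randn(rng, numb)                   # error shared by N and S
    II = TT .+ ci .* randn(rng, numb)
    NN = an   .+ bn .* TT .+ cn .* randn(rng, numb) .+ DC
    SS = asat .+ bs .* TT .+ cs .* randn(rng, numb) .+ DC
    return II, NN, SS
end

bn   = 1.2
vart = 9.0^2 / 12                                 # variance of a uniform variable on [0, 9)
for cc in (0.0, 2.0)
    est       = tc_slope(simulate(bn; cc = cc)...)
    predicted = bn + cc^2 / (1.5 * vart)          # ratio expected when N and S share error
    @printf("shared SD %3.1f: estimated %6.3f, predicted %6.3f\n", cc, est, predicted)
end
```

With `cc = 0.0` the sketch reduces to the independent-error case of the cell above; with a nonzero `cc` the estimate should move away from `bn` by roughly the predicted amount, which is the situation the method below is meant to accommodate.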


The INFERS method employs a more complicated model, but with the simplest (AR-1) type of shared or propagated error.  Note that under AR-1, the minimum number of equations (and samples of the second dataset) needed to match the number of unknowns is four (NFER or NFRS), but the symmetry of five samples (NFERS) may facilitate a retrieval of all parameters, which is done numerically:

\begin{eqnarray} \\
  \begin{array}{r} \mathrm{in\ situ}\ \\
                   \mathrm{nowcast} \\
                   \mathrm{forecast} \\
                   \mathrm{extended\ forecast} \\
                   \mathrm{revcast} \\
                   \mathrm{extended\ revcast} \end{array}
  \begin{array}{r} I \\ N \\ F \\ E \\ R \\ S \end{array}
  \begin{array}{c} = \\  =  \\ = \\ = \\ = \\ = \end{array}
  \begin{array}{l} \color{white}{\alpha_N + \beta_N} \color{black} t + \color{white}{\lambda_E (  \lambda_F ( \lambda_N} \color{black} \epsilon_I \\
                                 \alpha_N + \beta_N  t + \color{white}{\lambda_E (                \lambda_F (} \color{black} \lambda_N \epsilon_I + \epsilon_N \\
                                 \alpha_F + \beta_F  t + \color{white}{\lambda_E (} \color{black} \lambda_F ( \lambda_N \epsilon_I + \epsilon_N ) + \epsilon_F \\
                                 \alpha_E + \beta_E  t +               \lambda_E (                \lambda_F ( \lambda_N \epsilon_I + \epsilon_N ) + \epsilon_F ) + \epsilon_E \\
                                 \alpha_R + \beta_R  t + \color{white}{\lambda_E (} \color{black} \lambda_R ( \lambda_N \epsilon_I + \epsilon_N ) + \epsilon_R \\
                                 \alpha_S + \beta_S  t +               \lambda_S (                \lambda_R ( \lambda_N \epsilon_I + \epsilon_N ) + \epsilon_R ) + \epsilon_S \end{array} \\
  \nonumber
\end{eqnarray}

We thus allow for correlated errors, possibly because the in situ data are assimilated by the analysis ($\lambda_N \epsilon_I$) or because there is common error in adjacent analysis windows (i.e., taking "persistence" as a reasonable forecast or revcast, shared error is accommodated by nonzero $\lambda_F$ and $\lambda_R$).  Retrieval of $\beta_N$ employs a cost function defined by the covariance of FR and ES.  In particular, the method requires error correlation between E and S (cf. discussion of instrumental variable regression in Su et al. 2014).  This is in stark contrast to the triple collocation method, where error correlation is to be avoided; here, error correlation is not only helpful, it is required!  In practice, analyses admit observations in windows that are typically as large as or larger than the interval between analyses, so shared error of this kind is to be expected (note that fractional ES error covariance is also quantified by the product $\lambda_E \lambda_F \lambda_R \lambda_S$).
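
As a short worked step, the shared error can be written out from the model above.  This sketch assumes that $t$ is uncorrelated with every $\epsilon$ and that the $\epsilon$ terms are mutually independent; the symbols $e_N$, $e_F$, $e_E$, $e_R$, $e_S$ (total error of each dataset) and $\sigma_t^2$, $\sigma_I^2$, $\sigma_N^2$ (variances of $t$, $\epsilon_I$, $\epsilon_N$) are notation introduced here for convenience.

\begin{eqnarray}
  e_N &=& \lambda_N \epsilon_I + \epsilon_N, \qquad \mathrm{var}(e_N) = \lambda_N^2 \sigma_I^2 + \sigma_N^2 \nonumber \\
  e_F &=& \lambda_F e_N + \epsilon_F, \qquad e_E = \lambda_E e_F + \epsilon_E \nonumber \\
  e_R &=& \lambda_R e_N + \epsilon_R, \qquad e_S = \lambda_S e_R + \epsilon_S \nonumber \\
  \mathrm{cov}(I, N) &=& \beta_N \sigma_t^2 + \lambda_N \sigma_I^2 \nonumber \\
  \mathrm{cov}(e_F, e_R) &=& \lambda_F \lambda_R \, \mathrm{var}(e_N) \nonumber \\
  \mathrm{cov}(e_E, e_S) &=& \lambda_E \lambda_F \lambda_R \lambda_S \, \mathrm{var}(e_N) \nonumber
\end{eqnarray}

Under these assumptions, the only error that E and S have in common is the part inherited from $e_N$, which is why the product $\lambda_E \lambda_F \lambda_R \lambda_S$ expresses the ES error covariance as a fraction of the nowcast error variance.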

In [20]:
using Printf, Statistics                                # provides @printf and cov() in Julia 1.x

# rand() is uniform between 0 and 1; randn() is Gaussian with 0 mean and SD of 1
# gi, gf, gr scale the propagated error (they play the roles of lambda_N, lambda_F, lambda_R)
ct = 9.0 ;            ci = 2.0 ; gi = 0.5
an = 1.0 ;            cn = 1.0 ;
af = 2.0 ; bf = 1.5 ; cf = 1.5 ; gf = 0.9
ar = 3.0 ; br = 0.5 ; cr = 3.0 ; gr = 1.1
numb = 10^5;                                            # number of collocations

for bn = 0.4:0.2:3.2
#for bn = 1.0:1.0
  TT = ct *  rand(numb);                                # truth
  DI = ci * randn(numb);                                # in situ error
  DN = cn * randn(numb);                                # nowcast error
  DF = cf * randn(numb);                                # forecast error
  DR = cr * randn(numb);                                # revcast error
  II =            TT .+ DI;
  NN = an .+ bn * TT .+ DN .+      gi * DI;
  FF = af .+ bf * TT .+ DF .+ gf * gi * DI .+ gf * DN;
  RR = ar .+ br * TT .+ DR .+ gr * gi * DI .+ gr * DN;

  vari = cov(II, II)                                    # sample variances and covariances
  varn = cov(NN, NN)
  varf = cov(FF, FF)
  varr = cov(RR, RR)
  cvin = cov(II, NN)
  cvif = cov(II, FF)
  cvir = cov(II, RR)
  cvnf = cov(NN, FF)
  cvnr = cov(NN, RR)
  cvfr = cov(FF, RR)

  function cost(st::Float64, bn::Float64)               # INFR cost function (st is a trial truth variance)
    nbt = varn - bn^2 * st ; inbt = cvin - bn * st
    bf = (cvif * nbt - cvnf * inbt) / (st * nbt - bn * st * inbt)
    br = (cvir * nbt - cvnr * inbt) / (st * nbt - bn * st * inbt)
    fbt = varf - bf^2 * st ; ifbt = cvif - bf * st
    rbt = varr - br^2 * st ; irbt = cvir - br * st
    # penalize any trial values that imply a negative error variance or covariance
    (nbt < 0 || inbt < 0 || fbt < 0 || ifbt < 0 || rbt < 0 || irbt < 0) && return 9999.0
    sf = varf - bf^2 * st - cvnf + bn * bf * st
    sr = varr - br^2 * st - cvnr + bn * br * st
    cvfr * nbt - bf * br * st * nbt - (cvnf - bn * bf * st) * (cvnr - bn * br * st) - varr - varf + (bf^2 + br^2) * st + cvnf + cvnr - (bf + br) * bn * st + 1.5 * (sf + sr)
  end

  minst = minimum([cvin^2 / varn, cvif^2 / varf, cvir^2 / varr])
  maxst = vari
  intst = collect(range(minst, stop=maxst, length=100))
  minbn = cvin / vari                                   # check cost for values between OLS and RLS
  maxbn = varn / cvin
  intbn = collect(range(minbn, stop=maxbn, length=100))
  costs = Array{Float64}(undef, 100, 100)
  for (a, vala) in enumerate(intst), (b, valb) in enumerate(intbn)
    costs[a,b] = abs(cost(vala, valb))
  end
  indst, indbn = Tuple(findmin(costs)[2])               # grid indices of the smallest absolute cost
  @printf("estimated bn %6.3f should be close to %6.3f\n", intbn[indbn], bn)
end

estimated bn  0.446 should be close to  0.400
estimated bn  0.729 should be close to  0.600
estimated bn  0.688 should be close to  0.800
estimated bn  0.994 should be close to  1.000
estimated bn  0.942 should be close to  1.200
estimated bn  1.067 should be close to  1.400
estimated bn  1.192 should be close to  1.600
estimated bn  1.321 should be close to  1.800
estimated bn  1.521 should be close to  2.000
estimated bn  1.701 should be close to  2.200
estimated bn  1.763 should be close to  2.400
estimated bn  1.923 should be close to  2.600
estimated bn  2.064 should be close to  2.800
estimated bn  2.553 should be close to  3.000
estimated bn  2.760 should be close to  3.200
