Latex Macros:
$$
\newcommand{\E}{\text{E}}
\newcommand{\mbf}{\mathbf}
\newcommand{\bs}{\boldsymbol}
\newcommand{\Cov}{\text{Cov}}
\newcommand{\Var}{\text{Var}}
\newcommand{\Cov}{\text{Cov}}
\newcommand{\A}[1]{\mathbf{A}_{#1}}
\newcommand{\Ai}[1]{\mathbf{A}^{#1}}
$$

In [1]:
macro javascript_str(s) display("text/javascript", s); end
javascript"""
    MathJax.Hub.Config({
      TeX: { equationNumbers: { autoNumber: "AMS" } }
    });
    MathJax.Hub.Queue( 
        ["resetEquationNumbers",MathJax.InputJax.TeX], 
        ["PreProcess",MathJax.Hub], 
        ["Reprocess",MathJax.Hub] 
    );
"""

# Condition numbers of the MME corresponding to the marker-effects and breeding-value models

Consider the following marker effects model (MEM):

\begin{equation}
\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\bs{\alpha} + \mathbf{e},
\end{equation}
where $\mathbf{X}$ is a matrix of marker covariates, $\bs{\alpha}$ a vector marker effects that are assumed to be random with null means and covariance matrix $\mathbf{I}\sigma^2_{\alpha}$, and $\mathbf{e}$ is a vector of residuals with null means and covariance matrix $\mathbf{I}\sigma^2_e$. The When genotype data are complete, the  coefficient matrix of the MME corresponding to this model is:


\begin{equation}
\label{MMEMM}
\begin{bmatrix}
\mathbf{1}'\mathbf{1} & \mathbf{1}'\mathbf{X} \\
\mathbf{X}'\mathbf{1} & \mathbf{X}'\mathbf{X} + \mathbf{I}\lambda
\end{bmatrix},
\end{equation}
where $\lambda = \frac{\sigma^2_e}{\sigma^2_{\alpha}}$. The equivalent breeding-value model (BVM) is:

\begin{equation}
\mathbf{y} = \mathbf{1}\mu + \mathbf{g} + \mathbf{e},
\end{equation}

where $\mathbf{g} = \mathbf{X}\bs{\alpha}$, which has null expectation and covariance matrix $\mathbf{X}\mathbf{X}'\sigma^2_{\alpha}$. The coefficient matrix of the MME corresponding this model is:

\begin{equation}
\label{MMEBVMSingular}
\begin{bmatrix}
\mathbf{1}'\mathbf{1} & \mathbf{1}'\\
\mathbf{1} &  \mathbf{I} + (\mathbf{X}\mathbf{X}')^{-1} \lambda
\end{bmatrix}
\end{equation}

When the marker covariates are centered, the $n\times n$ matrix 
$\mathbf{X}\mathbf{X}'$ is singular even when $n<p$. Thus, MME are modified as:

\begin{equation}
\label{MMEBVM}
\begin{bmatrix}
\mathbf{1}'\mathbf{1} & \mathbf{1}'\\
\mathbf{1} &  \mathbf{I} + (\mathbf{X}\mathbf{X}' \frac{1}{\lambda} + \mathbf{I} \times \delta)^{-1} 
\end{bmatrix}.
\end{equation}

A small numerical example is used below to find an appropriate value for $\delta$

In [25]:
using Distributions, LinearAlgebra
Identity(n) = Matrix(I,n,n)

Identity (generic function with 1 method)

### Simulate $\mathbf{X}$ and $\mathbf{y}$ 

In [444]:
n,p = 5, 10
varAlpha = 1
pq       = 0.25
varGen   = 2pq*varAlpha*p
varRes   = varGen
λ        = varRes/varAlpha
X = rand(Binomial(2,0.5), n,p)
α = randn(p)
y = X*α + randn(n)*sqrt(varRes);

### Centering $\mathbf{X}$

In [445]:
meanX = mean(X,dims=1)
X = X .- meanX;

### MME for the marker-effects model 

\begin{equation*}
\begin{bmatrix}
\mathbf{1}'\mathbf{1} & \mathbf{1}'\mathbf{X} \\
\mathbf{X}'\mathbf{1} & \mathbf{X}'\mathbf{X} + \mathbf{I}\lambda
\end{bmatrix}
\end{equation*}

In [446]:
J = ones(n)
lhsMEM = [J'J   J'X
          X'J   X'X + Identity(p)*λ]
rhsMEM = [J'y; X'y]
solMEM = inv(lhsMEM)*rhsMEM
ebvMEM = [J X]*solMEM;

### MME for the breeding-value model

\begin{equation*}
\begin{bmatrix}
\mathbf{1}'\mathbf{1} & \mathbf{1}'\\
\mathbf{1} &  \mathbf{I} + (\mathbf{X}\mathbf{X}' \frac{1}{\lambda} + \mathbf{I} \times \delta)^{-1} 
\end{bmatrix}
\end{equation*}

In [447]:
lhsBVM = [J'J  J'
          J    Identity(n) + inv(X*X'/λ+ Identity(n)*0.05)]   
rhsBVM = [J'y;   y]
solBVM = inv(lhsBVM)*rhsBVM
ebvBVM = [J Identity(n)]*solBVM;

### Predictions from the two models with $\delta=0.05$ 

In [448]:
round.([ebvMEM ebvBVM],digits=2)

5×2 Array{Float64,2}:
 -2.28  -2.27
 -1.6   -1.69
 -0.36  -0.3 
 -0.59  -0.59
  0.22   0.24

### Predictions from the two models with $\delta=0.005$ 

In [449]:
lhsBVM = [J'J  J'
          J    Identity(n) + inv(X*X'/λ+ Identity(n)*0.005)]   
rhsBVM = [J'y;   y]
solBVM = inv(lhsBVM)*rhsBVM
ebvBVM = [J Identity(n)]*solBVM;

In [450]:
round.([ebvMEM ebvBVM],digits=2)

5×2 Array{Float64,2}:
 -2.28  -2.28
 -1.6   -1.61
 -0.36  -0.35
 -0.59  -0.59
  0.22   0.22

### Predictions from the two models with $\delta=0.0005$ 

In [460]:
lhsBVM = [J'J  J'
          J    Identity(n) + inv(X*X'/λ+ Identity(n)*0.0005)]   
rhsBVM = [J'y;   y]
solBVM = inv(lhsBVM)*rhsBVM
ebvBVM = [J Identity(n)]*solBVM;

In [461]:
round.([ebvMEM ebvBVM],digits=2)

5×2 Array{Float64,2}:
 -2.28  -2.28
 -1.6   -1.6 
 -0.36  -0.36
 -0.59  -0.59
  0.22   0.22

When $\delta=0.0005$, predictions from (\ref{MMEBVM}) agree with the exact predictions up to two decimal places.  

### Condition numbers for MEM and BVM (X is centered)

Condition numbers are calculated below for MEM and BVM assuming $\sigma^2_e = 25$ and $\sigma^2_{\alpha}=1.0$. 

In [21]:
function condNMEMBVM(n,p,λ)
    X = rand(Binomial(2,0.5), n,p)
    meanX = mean(X,dims=1)
    X = X .- meanX
    J = ones(n)
    mmeMM = [J'J       J'X
             X'J   X'X + Identity(p)*λ]
    mmeAM = [J'J   J'
             J     Identity(n) + inv(X*X'/λ + Identity(n)*0.0005)]      
    return n, cond(mmeMM), cond(mmeAM)
end    

condNMEMBVM (generic function with 1 method)

In [23]:
p = 50
varAlpha = 1
pq       = 0.25
varGen   = 2pq*varAlpha*p
varRes   = varGen
λ        = varRes/varAlpha

25.0

In [26]:
res = [condNMEMBVM(n,50,λ) for n in [5,10,20,40,50,100,500,1000,2000]]
resMatC = [i[j] for i in res, j=1:3];

In [454]:
println(" ----------------------")
println("    n     MEM    BVM")
println(" ----------------------")
for i in res
    @printf("%5d %8.2f %8.2f\n", i[1], i[2], i[3])
end

 ----------------------
    n     MEM    BVM
 ----------------------
    5    11.29  1115.31
   10     6.83  1269.23
   20     3.96  1369.73
   40     4.88  1591.26
   50     4.60  1566.13
  100     5.21  1686.20
  500     3.56  1888.08
 1000     2.93  1934.13
 2000     2.68  2007.30


The condition number for the MEM tends to improve with $n$, but the opposite was true for the BVM. 
This behavior in the condition of the MME for the MEM is expected because the $p\times p$ matrix 
$\mathbf{X}'\mathbf{X}$ is singular when $n<p$ and, in this case, the MME for the MEM is non-singular
only because $\bs{\alpha}$ is considered random and $\mathbf{I}\lambda$ is added to 
$\mathbf{X}'\mathbf{X}$. As $n$ gets large so does the rank of $\mathbf{X}'\mathbf{X}$ until it 
becomes full rank, and
as a result the condition of the MME improves. On the other hand, the $n\times n$ matrix 
$\mathbf{X}\mathbf{X}'$ has rank $n-1$ when $n<p$, and it would have full rank if the covariates were 
not centered. However, as $n$ gets larger than $p$, the MME for the BVM grows in size, but the
rank cannot be greater than $p$. Thus, the MME has to be modified as in (\ref{MMEBVM}). The condition
of this MME is inversely related to $\delta$. However, if $\delta$ is chosen to be too large, the 
predictions are not a good  approximation of the exact BLUPs. As $n$ grows, more diagonals of
$\mathbf{X}\mathbf{X}'$ have to be approximated and the condition of the MME deteriorates. 

### Condition numbers for MEM and BVM (X is not centered)

In [455]:
function condNMEMBVMNC(n,p)
    X = rand(Binomial(2,0.5), n,p)
    J = ones(n)
    mmeMM = [J'J       J'X
             X'J   X'X + Identity(p)*λ]
    mmeAM = [J'J   J'
             J     Identity(n) + inv(X*X'/λ + Identity(n)*0.0005)]     
    return n, cond(mmeMM), cond(mmeAM)
end    

condNMEMBVMNC (generic function with 1 method)

In [456]:
res = [condNMEMBVMNC(n,50) for n in [5,10,20,40,50,100,500,1000,2000]]
resMatNC = [i[j] for i in res, j=1:3];
println(" -----------------------")
println("    n     MEM    BVM")
println(" -----------------------")
for i in res
    @printf("%5d %8.2f %9.2f\n", i[1], i[2], i[3])
end

 -----------------------
    n     MEM    BVM
 -----------------------
    5  2572.81    344.65
   10  4411.17    946.65
   20  7191.82   3059.78
   40 10091.16   7810.73
   50  9924.63 173882.51
  100  7533.99 284687.59
  500  6164.13 230832.54
 1000  5000.21 192642.28
 2000  5439.89 206850.65


Clearly, centering the marker covariates is important for both models. In SSMM, the covariates should be centered both before and after imputing the missing covariates. Failing to center all the covariates after the missing genotypes have been imputed may result in poorly conditioned MME. 


### Creating LaTex tables

In [16]:
function latexTable(A;
    fileName::AbstractString = "",
    colLabels = "",
    rowLabels = ""
    )
    if fileName == ""
        outFile = stdout
    else
        outFile = open(fileName, "w")
    end
    rows, cols = size(A)
    println(outFile,"\\begin{center}")
    print(outFile,"\\begin{tabular}{",)
    if rowLabels!=""
        print(outFile,"l") 
    end      
    for j=1:cols
        print(outFile,"r")
    end
    println(outFile,"}\\hline")
    if colLabels!=""
        nCol = length(colLabels)
        for j = 1:(nCol-1)
            print(outFile,colLabels[j]," & ")
        end
        print(outFile,colLabels[nCol]," \\\\ \\hline  \n")
    end
    for i =1:rows
        if rowLabels!=""
            print(outFile,rowLabels[i]," & ")
        end
        for j = 1:(cols-1)
            print(outFile,A[i,j])
            print(outFile," & ")
        end
        print(outFile,A[i,cols]," \\\\ \n")
    end
    println(outFile,"\\hline")
    println(outFile,"\\end{tabular}")
    println(outFile,"\\end{center}")
    if fileName != ""
        close(outFile)
    end
end

latexTable (generic function with 3 methods)

In [457]:
latexTable([resMatC[:,1] round.(resMat[:,2:3],digits=2)],colLabels=["\$n\$", "MEM", "BVM"])

\begin{center}
\begin{tabular}{rrr}\hline
$n$ & MEM & BVM \\ \hline  
5 & 13.81 & 1276.43 \\ 
10 & 6.53 & 1235.04 \\ 
20 & 4.22 & 1408.31 \\ 
40 & 3.91 & 1489.56 \\ 
50 & 4.56 & 1561.98 \\ 
100 & 5.48 & 1691.89 \\ 
500 & 3.23 & 1892.73 \\ 
1000 & 2.95 & 1934.65 \\ 
2000 & 2.65 & 2007.26 \\ 
\hline
\end{tabular}
\end{center}


In [458]:
latexTable([resMatNC[:,1] round.(resMatNC[:,2:3],digits=2)],colLabels=["\$n\$", "MEM", "BVM"])

\begin{center}
\begin{tabular}{rrr}\hline
$n$ & MEM & BVM \\ \hline  
5 & 2572.81 & 344.65 \\ 
10 & 4411.17 & 946.65 \\ 
20 & 7191.82 & 3059.78 \\ 
40 & 10091.16 & 7810.73 \\ 
50 & 9924.63 & 173882.51 \\ 
100 & 7533.99 & 284687.59 \\ 
500 & 6164.13 & 230832.54 \\ 
1000 & 5000.21 & 192642.28 \\ 
2000 & 5439.89 & 206850.65 \\ 
\hline
\end{tabular}
\end{center}


In [46]:
B = round.(randn(3,5),digits=2)
latexTable(B)

\begin{center}
\begin{tabular}{rrrrr}\hline
1.07 & -0.96 & 0.28 & 1.49 & -1.02 \\ 
-0.12 & -1.21 & -0.17 & 0.15 & 0.56 \\ 
-0.04 & -0.32 & 1.88 & -0.95 & 1.48 \\ 
\hline
\end{tabular}
\end{center}
