## Computing full-stored or sparse-stored X'X without matrix multiplication

Consider the following data from a hypothetical one-way experiment with four levels of a treatment.

### Data

In [1]:
using DataFrames
data = DataFrame(x=[1,1,2,2,2,2,3,3,4,1],y=[1.1,1.2,1.9,1.2,2.0,1.7,1.0,1.7,1.1,1.7])

| Row | x (Int64) | y (Float64) |
|-----|-----------|-------------|
| 1   | 1         | 1.1         |
| 2   | 1         | 1.2         |
| 3   | 2         | 1.9         |
| 4   | 2         | 1.2         |
| 5   | 2         | 2.0         |
| 6   | 2         | 1.7         |
| 7   | 3         | 1.0         |
| 8   | 3         | 1.7         |
| 9   | 4         | 1.1         |
| 10  | 1         | 1.7         |


The $\mathbf{X}$ matrix for the one-way model

$$
y_{ij} = \mu + \alpha_i + e_{ij}
$$

is

$$
\mathbf{X} = 
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 \\
\end{bmatrix}
$$


Note that every row of $\mathbf{X}$ contains exactly two non-zero elements, corresponding to $\mu$ and $\alpha_i$ in the model. Recall that $\mathbf{x}_i'$ denotes row $i$ of $\mathbf{X}$, so that $\mathbf{X'X} = \sum_i \mathbf{x}_i\mathbf{x}_i'$. The first element of $\mathbf{x}_i'$ corresponds to $\mu$, so every $\mathbf{x}_i'$ contains a "1" in this position. The second element corresponds to $\alpha_1$, so $\mathbf{x}_1'$, $\mathbf{x}_2'$ and $\mathbf{x}_{10}'$ contain a "1" in this position because observations 1, 2 and 10 are from treatment 1. The contribution of the first observation to the $\mathbf{X'X}$ matrix, for example, is

$$
\mathbf{x}_1\mathbf{x}'_1 = 
\begin{bmatrix}
1 \\
1 \\
0 \\
0 \\
0
\end{bmatrix} 
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 
\end{bmatrix} = 
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 
\end{bmatrix}
$$

The contributions from observations 2 and 10 are identical to this because $\mathbf{x}_2$ and $\mathbf{x}_{10}$ are identical to $\mathbf{x}_1$.

In general, set $pos\_\mu = 1$, which is the column position of the "1" in any $\mathbf{x}_i'$ corresponding to $\mu$, and set $pos\_\alpha$ equal to the column position of the "1" in $\mathbf{x}_i'$ corresponding to $\alpha_i$. Then the positions in $\mathbf{X'X}$ that receive a contribution from observation $i$ are ($pos\_\mu$,$pos\_\mu$), ($pos\_\mu$,$pos\_\alpha$), ($pos\_\alpha$,$pos\_\mu$) and ($pos\_\alpha$,$pos\_\alpha$). Further, in the one-way model, the contribution to each of these positions is a "1".

So, $\mathbf{X'X}$ can be constructed efficiently by setting $pos\_\alpha = 1 + ilevel$, where $ilevel$ is the level of factor A for observation $i$, and adding "1" to these four positions in $\mathbf{X'X}$ for each observation in the data file. Similarly, $\mathbf{X'y}$ can be constructed efficiently by adding $y_i$ to positions $pos\_\mu$ and $pos\_\alpha$ in $\mathbf{X'y}$. This strategy is used in the program given below.
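Summing the per-observation contributions gives a closed form for the one-way model. As a sketch, write $n$ for the number of observations, $n_j$ for the number of observations in level $j$, $y_{\cdot\cdot}$ for the sum of all observations and $y_{j\cdot}$ for the sum of the observations in level $j$. These symbols are introduced here for illustration only and are not used elsewhere in this section. With four levels,

$$
\mathbf{X'X} =
\begin{bmatrix}
n   & n_1 & n_2 & n_3 & n_4 \\
n_1 & n_1 & 0   & 0   & 0   \\
n_2 & 0   & n_2 & 0   & 0   \\
n_3 & 0   & 0   & n_3 & 0   \\
n_4 & 0   & 0   & 0   & n_4
\end{bmatrix},
\qquad
\mathbf{X'y} =
\begin{bmatrix}
y_{\cdot\cdot} \\
y_{1\cdot} \\
y_{2\cdot} \\
y_{3\cdot} \\
y_{4\cdot}
\end{bmatrix}
$$

For the data above, $n = 10$, $(n_1, n_2, n_3, n_4) = (3, 4, 2, 1)$, $y_{\cdot\cdot} = 14.6$ and $(y_{1\cdot}, y_{2\cdot}, y_{3\cdot}, y_{4\cdot}) = (4.0, 6.8, 2.7, 1.1)$.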

#### Number of levels for $\alpha$

In [3]:
nlevels = length(unique(data.x));

#### Make X'X and X'y

In [3]:
p = nlevels + 1
lhs = fill(0.0,p,p)
rhs = fill(0.0,p);
for i in 1:size(data,1)
    pos_μ              = 1
    pos_α              = 1 + data[i,:x]
    lhs[pos_μ,pos_μ]  += 1.0
    lhs[pos_μ,pos_α]  += 1.0
    lhs[pos_α,pos_μ]  += 1.0
    lhs[pos_α,pos_α]  += 1.0
    
    y            = data[i,:y]
    rhs[pos_μ]  += y
    rhs[pos_α]  += y   
end

In [4]:
lhs

5×5 Array{Float64,2}:
 10.0  3.0  4.0  2.0  1.0
  3.0  3.0  0.0  0.0  0.0
  4.0  0.0  4.0  0.0  0.0
  2.0  0.0  0.0  2.0  0.0
  1.0  0.0  0.0  0.0  1.0

In [5]:
rhs

5-element Array{Float64,1}:
 14.599999999999998
  4.0              
  6.8              
  2.7              
  1.1              
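As a sanity check on the accumulation loop, the result can be compared with the product of an explicitly built $\mathbf{X}$. The sketch below is not part of the original notebook: the function name `accumulate_normal_equations` is hypothetical, and the data are retyped as plain vectors so that the block runs without DataFrames.

```julia
# Data from the small example, retyped as plain vectors (no DataFrames needed).
x = [1, 1, 2, 2, 2, 2, 3, 3, 4, 1]
y = [1.1, 1.2, 1.9, 1.2, 2.0, 1.7, 1.0, 1.7, 1.1, 1.7]
p = length(unique(x)) + 1          # one column for μ plus one per level

# Hypothetical helper: accumulate X'X and X'y one observation at a time.
function accumulate_normal_equations(x, y, p)
    lhs = zeros(p, p)
    rhs = zeros(p)
    for i in eachindex(x)
        pos_μ = 1
        pos_α = 1 + x[i]
        lhs[pos_μ, pos_μ] += 1.0
        lhs[pos_μ, pos_α] += 1.0
        lhs[pos_α, pos_μ] += 1.0
        lhs[pos_α, pos_α] += 1.0
        rhs[pos_μ] += y[i]
        rhs[pos_α] += y[i]
    end
    return lhs, rhs
end

# Explicit incidence matrix, built only for the comparison.
X = zeros(length(x), p)
for i in eachindex(x)
    X[i, 1] = 1.0
    X[i, 1 + x[i]] = 1.0
end

lhs_check, rhs_check = accumulate_normal_equations(x, y, p)
@assert lhs_check == X' * X        # entries are small integer counts, so equality is exact
@assert rhs_check ≈ X' * y         # sums may differ in the last bits
```

Wrapping the loop in a function also keeps `lhs` and `rhs` out of global scope, which is the usual recommendation for performance-sensitive Julia code.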

#### Solution
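Note that $\mathbf{X'X}$ is singular, because the column for $\mu$ is the sum of the columns for the $\alpha_i$, so the solution is not unique. One solution can be obtained as follows.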

In [6]:
sol = lhs\rhs

5-element Array{Float64,1}:
  1.7458333333333331 
 -0.4125000000000002 
 -0.04583333333333354
 -0.3958333333333334 
 -0.6458333333333334 

#### Verify solution 

In [7]:
[lhs*sol rhs]

5×2 Array{Float64,2}:
 14.6  14.6
  4.0   4.0
  6.8   6.8
  2.7   2.7
  1.1   1.1
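Because the system is singular, other vectors also satisfy it. The sketch below obtains a second solution by imposing the constraint $\alpha_4 = 0$. This constraint is an assumption chosen for illustration, not the method used above. The matrix, right-hand side and `sol_notebook` are retyped from the outputs shown earlier. Both solutions give the same values of $\mu + \alpha_i$, which equal the treatment means.

```julia
# X'X and X'y for the small example, retyped from the outputs above.
lhs = [10.0 3 4 2 1;
        3.0 3 0 0 0;
        4.0 0 4 0 0;
        2.0 0 0 2 0;
        1.0 0 0 0 1]
rhs = [14.6, 4.0, 6.8, 2.7, 1.1]

# Solution printed by the notebook for lhs\rhs.
sol_notebook = [1.7458333333333331, -0.4125000000000002, -0.04583333333333354,
                -0.3958333333333334, -0.6458333333333334]

# Illustrative constraint α_4 = 0: drop the last row and column,
# which leaves a nonsingular 4×4 system.
sol_c = zeros(5)
sol_c[1:4] = lhs[1:4, 1:4] \ rhs[1:4]

# Treatment means: level totals divided by level counts (the diagonal of X'X).
level_means = rhs[2:5] ./ [lhs[j, j] for j in 2:5]

fitted_c        = sol_c[1] .+ sol_c[2:5]
fitted_notebook = sol_notebook[1] .+ sol_notebook[2:5]

@assert lhs * sol_c ≈ rhs                  # the constrained vector also solves the system
@assert fitted_c ≈ level_means             # μ + α_i reproduces the treatment means
@assert fitted_notebook ≈ level_means      # so does the notebook's solution
```

The individual elements of the two solutions differ, but the sums $\mu + \alpha_i$ agree.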

### Comparison of efficiencies of different ways to calculate X'X

### Big Example

In [1]:
using Distributions

#### Generate data

In [2]:
n       = 1_000_000
p       = 1000
levels  = sample(1:p,n)
α       = randn(p)
y       = [α[i] + randn() for i in levels];

### Computing X'X as product of full-stored (X' and X)

In [ ]:
X = fill(0.0,(n,p+1));

In [ ]:
@time for i = 1:n
    j      = levels[i] + 1
    X[i,1] = 1.0
    X[i,j] = 1.0
end

In [9]:
@time lhs = X'X;

  0.723439 seconds (293.60 k allocations: 48.528 MiB, 0.74% gc time)


### Computing full-stored X'X without matrix multiplication

In [13]:
lhs = fill(0.0,p,p);

In [14]:
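@time for i in 1:length(levels)
    # Only the α-by-α diagonal block (p×p) is accumulated here;
    # the row and column for μ are not included in this lhs.
    pos_α               = levels[i]
    lhs[pos_α,pos_α]   += 1.0
end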

  0.175713 seconds (5.49 M allocations: 98.989 MiB, 4.61% gc time)
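The timed loop fills a `p`×`p` matrix and runs at global scope. A version that produces the full `(p+1)`×`(p+1)` matrix, including the row and column for $\mu$, might look like the sketch below. The function name `xpx_dense` and the small demonstration vector are hypothetical, and no timing is claimed for it. The millions of allocations reported above are typical of loops over untyped global variables, so a function-wrapped loop would likely behave differently.

```julia
# Hypothetical helper: full-stored X'X for the one-way model with p levels,
# accumulated without forming X.
function xpx_dense(levels::AbstractVector{<:Integer}, p::Integer)
    lhs = zeros(p + 1, p + 1)
    pos_μ = 1
    for lev in levels
        pos_α = lev + 1            # column of α for this observation
        lhs[pos_μ, pos_μ] += 1.0
        lhs[pos_μ, pos_α] += 1.0
        lhs[pos_α, pos_μ] += 1.0
        lhs[pos_α, pos_α] += 1.0
    end
    return lhs
end

# Small demonstration using the levels from the first example.
levels_demo = [1, 1, 2, 2, 2, 2, 3, 3, 4, 1]
lhs_demo = xpx_dense(levels_demo, 4)
```

For the big example, the call would be `xpx_dense(levels, p)` with `levels` and `p` as generated above.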


### Computing sparse-stored X'X as product of sparse-stored (X' and X) 

In [3]:
using SparseArrays, LinearAlgebra
ii = 1:n
@time X = sparse(ii,levels,1.0);

  0.235881 seconds (351.43 k allocations: 62.986 MiB, 4.11% gc time)


In [4]:
X = [ones(n) X]

1000000×1001 SparseMatrixCSC{Float64,Int64} with 2000000 stored entries:
  [1      ,       1]  =  1.0
  [2      ,       1]  =  1.0
  [3      ,       1]  =  1.0
  [4      ,       1]  =  1.0
  [5      ,       1]  =  1.0
  [6      ,       1]  =  1.0
  [7      ,       1]  =  1.0
  [8      ,       1]  =  1.0
  [9      ,       1]  =  1.0
  [10     ,       1]  =  1.0
  [11     ,       1]  =  1.0
  [12     ,       1]  =  1.0
  ⋮
  [987487 ,    1001]  =  1.0
  [988526 ,    1001]  =  1.0
  [988977 ,    1001]  =  1.0
  [991821 ,    1001]  =  1.0
  [992766 ,    1001]  =  1.0
  [993510 ,    1001]  =  1.0
  [994744 ,    1001]  =  1.0
  [994876 ,    1001]  =  1.0
  [995594 ,    1001]  =  1.0
  [997826 ,    1001]  =  1.0
  [997863 ,    1001]  =  1.0
  [998542 ,    1001]  =  1.0

In [5]:
@time lhs = X'X;

  0.792606 seconds (283.54 k allocations: 69.151 MiB, 1.76% gc time)


In [6]:
lhs

1001×1001 SparseMatrixCSC{Float64,Int64} with 3001 stored entries:
  [1   ,    1]  =  1.0e6
  [2   ,    1]  =  996.0
  [3   ,    1]  =  962.0
  [4   ,    1]  =  999.0
  [5   ,    1]  =  1032.0
  [6   ,    1]  =  996.0
  [7   ,    1]  =  1009.0
  [8   ,    1]  =  1000.0
  [9   ,    1]  =  917.0
  [10  ,    1]  =  967.0
  [11  ,    1]  =  978.0
  [12  ,    1]  =  999.0
  ⋮
  [1   ,  996]  =  1004.0
  [996 ,  996]  =  1004.0
  [1   ,  997]  =  1019.0
  [997 ,  997]  =  1019.0
  [1   ,  998]  =  1005.0
  [998 ,  998]  =  1005.0
  [1   ,  999]  =  1026.0
  [999 ,  999]  =  1026.0
  [1   , 1000]  =  1028.0
  [1000, 1000]  =  1028.0
  [1   , 1001]  =  946.0
  [1001, 1001]  =  946.0

In [7]:
rhs = X'y;

In [8]:
QRLhs = qr(lhs)
sol = QRLhs\rhs
[lhs*sol rhs]

1001×2 Array{Float64,2}:
 -10944.7     -10944.7   
   -198.47      -198.47  
   -155.47      -155.47  
   -955.502     -955.502 
    137.293      137.293 
    561.892      561.892 
   -154.057     -154.057 
    939.809      939.809 
   -377.541     -377.541 
  -1099.08     -1099.08  
   -145.812     -145.812 
   -968.476     -968.476 
    409.695      409.695 
      ⋮                  
  -2054.3      -2054.3   
    499.959      499.959 
  -1228.5      -1228.5   
    137.28       137.28  
    -59.8778     -59.8778
   1271.02      1271.02  
   -341.985     -341.985 
    393.79       393.79  
    988.528      988.528 
   -851.492     -851.492 
  -1611.1      -1611.1   
   1304.83      1304.83  

### Computing sparse-stored X'X without matrix multiplication

In [17]:
lhs = spzeros(p,p)

1000×1000 SparseMatrixCSC{Float64,Int64} with 0 stored entries

In [18]:
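@time for i in 1:length(levels)
    # Only the α-by-α diagonal block (p×p) is accumulated here;
    # the row and column for μ are not included in this lhs.
    pos_α  = levels[i]
    lhs[pos_α,pos_α]   += 1.0
end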

  0.176475 seconds (5.56 M allocations: 102.706 MiB, 6.28% gc time)
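An alternative to updating a sparse matrix element by element is to count the observations per level first and then build the matrix once from (row, column, value) triplets with `sparse(I, J, V, m, n)`. The sketch below includes the row and column for $\mu$. The function name `xpx_sparse` and the demonstration vector are hypothetical, and no timing is claimed.

```julia
using SparseArrays

# Hypothetical helper: sparse-stored X'X for the one-way model with p levels,
# built from per-level counts rather than from X.
function xpx_sparse(levels::AbstractVector{<:Integer}, p::Integer)
    counts = zeros(p)
    for lev in levels
        counts[lev] += 1.0
    end
    n    = float(length(levels))
    cols = collect(2:p+1)                   # positions of α_1, ..., α_p
    I = vcat(1, fill(1, p), cols, cols)     # row indices
    J = vcat(1, cols, fill(1, p), cols)     # column indices
    V = vcat(n, counts, counts, counts)     # (μ,μ), (μ,α), (α,μ), (α,α) entries
    return sparse(I, J, V, p + 1, p + 1)
end

# Small demonstration using the levels from the first example.
levels_demo = [1, 1, 2, 2, 2, 2, 3, 3, 4, 1]
lhs_sparse_demo = xpx_sparse(levels_demo, 4)
```

When every level occurs at least once, the result has $1 + 3p$ stored entries, which matches the 3001 stored entries reported above for $p = 1000$.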

