This R package fits repeated linear regressions that share some of the same terms.
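Suppose we want to fit a set of regressions that differ in only one variable. Specifically, denote the response variable as $y$; the regressions are as follows.

$$
\begin{aligned}
y &\sim x_1 + cov_1 + cov_2 + \cdots + cov_m \\
y &\sim x_2 + cov_1 + cov_2 + \cdots + cov_m \\
&\;\;\vdots \\
y &\sim x_n + cov_1 + cov_2 + \cdots + cov_m
\end{aligned}
$$

where $cov_1, \ldots, cov_m$ are the variables shared by all of these regressions.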
Intuitively, we can accomplish this with a simple loop in R.
model = vector(mode='list', length=n)
for (i in 1:n)
{
...
model[[i]] = lm(y~x)
}
However, this approach is inefficient here. Because the regressions share the same variables, many of the calculations are repeated across them. To reduce the computational cost, I wrote the fRLR package, which avoids repeating those calculations. See my blog for more details, where I present the full mathematical derivation.
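To illustrate why shared terms allow work to be reused, here is a minimal sketch in base R. It is not necessarily the algorithm fRLR implements (see the blog post for that); it swaps in the Frisch-Waugh-Lovell idea instead: factor the shared design (intercept plus covariates) once, residualize the response and every candidate column against it, and then get each regression's slope and p-value from simple sums. The names `Z`, `qrZ`, `pval_shared`, and `pval_loop` are made up for this illustration.

```r
# Illustrative sketch only: reuse the shared-covariate work across regressions.
# This is an assumption-laden stand-in, not the fRLR implementation.
set.seed(123)
n_obs <- 10
X   <- matrix(rnorm(n_obs * 5), n_obs, 5)   # one candidate variable per column
Y   <- rnorm(n_obs)
COV <- matrix(rnorm(n_obs * 4), n_obs, 4)   # covariates shared by every regression

# Shared part of the design: intercept plus covariates, factored only once.
Z   <- cbind(1, COV)
qrZ <- qr(Z)

# Residualize the response and all candidate columns against Z in one pass.
ry <- qr.resid(qrZ, Y)
RX <- qr.resid(qrZ, X)

# Per-column slope, standard error, and p-value from the residuals.
df_resid <- n_obs - ncol(Z) - 1             # one extra parameter for x_i
sxx  <- colSums(RX^2)
beta <- colSums(RX * ry) / sxx
rss  <- sum(ry^2) - beta^2 * sxx            # residual sum of squares of each full model
se   <- sqrt(rss / df_resid / sxx)
pval_shared <- 2 * pt(abs(beta / se), df_resid, lower.tail = FALSE)

# Reference: refit every regression from scratch with lm().
pval_loop <- sapply(seq_len(ncol(X)), function(i) {
  summary(lm(Y ~ X[, i] + COV))$coefficients[2, 4]
})

# The two vectors should agree up to floating-point tolerance.
all.equal(pval_shared, pval_loop)
```

The QR factorization of the shared columns is computed once, so the per-regression cost drops to a few vector operations instead of a full least-squares fit.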
Here is a simulated example that shows how to use the package.
## use fRLR package
library(fRLR)
set.seed(123)
X = matrix(rnorm(50), 10, 5)
Y = rnorm(10)
COV = matrix(rnorm(40), 10, 4)
frlr1(X, Y, COV)
## use simple loop
res = matrix(nrow = 0, ncol = 2)
for (i in 1:ncol(X))
{
mat = cbind(X[,i], COV)
df = as.data.frame(mat)
model = lm(Y~., data = df)
tmp = c(i, summary(model)$coefficients[2, 4])
res = rbind(res, tmp)
}
Then we can obtain the following results
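To show the speed of the package, let us increase the number of regressions and compare the running time with the simple-loop method. Here each regression includes a pair of columns of X, with the pairs given by idx1 and idx2, together with the shared covariates.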
library(fRLR)
set.seed(123)
n = 500
X = matrix(rnorm(10*n), 10, n)
Y = rnorm(10)
COV = matrix(rnorm(40), 10, 4)
#idx1 = c(1, 2, 3, 4, 1, 1, 1, 2, 2, 3)
#idx2 = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5)
id = combn(n, 2)
idx1 = id[1, ]
idx2 = id[2, ]
system.time(frlr2(X, idx1, idx2, Y, COV))
simpleLoop <- function()
{
res = matrix(nrow=0, ncol=4)
for (i in 1:length(idx1))
{
mat = cbind(X[, idx1[i]], X[,idx2[i]], COV)
df = as.data.frame(mat)
model = lm(Y~., data = df)
tmp = c(idx1[i], idx2[i], summary(model)$coefficients[2,4], summary(model)$coefficients[3,4])
res = rbind(res, tmp)
}
res  # return the collected p-values instead of the loop's NULL
}
system.time(simpleLoop())
The results are as follows.
As you can see, fRLR speeds up this task significantly.
It is best to install this package on Linux or Mac, because they support the GNU Scientific Library (GSL) well. You can still use this package on Windows if you set up a GSL environment; I opened a repository, GSL, where you can find ways to install GSL on Windows, although it might be cumbersome.
If GSL is already installed on your computer, you can install the package with the following command:
devtools::install_github('szcf-weiya/fRLR')