-
Notifications
You must be signed in to change notification settings - Fork 6
/
ggmix.Rd
192 lines (166 loc) · 8.1 KB
/
ggmix.Rd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggmix.R
\name{ggmix}
\alias{ggmix}
\title{Fit Linear Mixed Model with Lasso or Group Lasso Regularization}
\usage{
ggmix(
x,
y,
U,
D,
kinship,
K,
n_nonzero_eigenvalues,
n_zero_eigenvalues,
estimation = c("full"),
penalty = c("lasso"),
group,
penalty.factor = rep(1, p_design),
lambda = NULL,
lambda_min_ratio = ifelse(n_design < p_design, 0.01, 1e-04),
nlambda = 100,
eta_init = 0.5,
maxit = 100,
fdev = 1e-20,
standardize = FALSE,
alpha = 1,
thresh_glmnet = 1e-08,
epsilon = 1e-04,
dfmax = p_design + 2,
verbose = 0
)
}
\arguments{
\item{x}{input matrix, of dimension n x p; where n is the number of
observations and p are the number of predictors.}
\item{y}{response variable. must be a quantitative variable}
\item{U}{left singular vectors corresponding to the non-zero eigenvalues
provided in the \code{D} argument.}
\item{D}{non-zero eigenvalues. This option is provided to the user should
they decide or need to calculate the eigen decomposition of the kinship
matrix or the singular value decomposition of the matrix of SNPs used to
calculate the kinship outside of this function. This may occur, if for
example, it is easier (e.g. because of memory issues, it's easier to
calculate in plink). This should correspond to the non-zero eigenvalues
only. Note that if you are doing an \code{svd} on the matrix of SNPs used
to calculate the kinship matrix, then you must provide the square of the
singular values so that they correspond to the eigenvalues of the kinship
matrix. If you want to use the low rank estimation algorithm, you must
provide the truncated eigenvalues and eigenvectors to the \code{D} and
\code{U} arguments, respectively. If you want \code{ggmix} to truncate the
eigenvectors and eigenvalues for low rank estimation, then provide either
\code{K} or \code{kinship} instead and specify
\code{n_nonzero_eigenvalues}.}
\item{kinship}{positive definite kinship matrix}
\item{K}{the matrix of SNPs used to determine the kinship matrix}
\item{n_nonzero_eigenvalues}{the number of nonzero eigenvalues. This argument
is only used when \code{estimation="low"} and either \code{kinship} or
\code{K} is provided. This argument will limit the function to finding the
\code{n_nonzero_eigenvalues} largest eigenvalues. If \code{U} and \code{D}
have been provided, then \code{n_nonzero_eigenvalues} defaults to the
length of \code{D}.}
\item{n_zero_eigenvalues}{Currently not being used. Represents the number of
zero eigenvalues. This argument must be specified when \code{U} and
\code{D} are specified and \code{estimation="low"}. This is required for
low rank estimation because the number of zero eigenvalues and their
corresponding eigenvalues appears in the likelihood. In general this would
be the rank of the matrix used to calculate the eigen or singular value
decomposition. When \code{kinship} is provided and \code{estimation="low"}
the default value will be \code{nrow(kinship) - n_nonzero_eigenvalues}.
When \code{K} is provided and \code{estimation="low"}, the default value is
\code{rank(K) - n_nonzero_eigenvalues}}
\item{estimation}{type of estimation. Currently only \code{type="full"} has
been implemented.}
\item{penalty}{type of regularization penalty. Currently, only
penalty="lasso" has been implemented.}
\item{group}{a vector of consecutive integers describing the grouping of the
coefficients. Currently not implemented, but will be used when
penalty="gglasso" is implemented.}
\item{penalty.factor}{Separate penalty factors can be applied to each
coefficient. This is a number that multiplies lambda to allow differential
shrinkage. Can be 0 for some variables, which implies no shrinkage, and
that variable is always included in the model. Default is 1 for all
variables}
\item{lambda}{A user supplied lambda sequence (this is the tuning parameter).
Typical usage is to have the program compute its own lambda sequence based
on nlambda and lambda.min.ratio. Supplying a value of lambda overrides
this. WARNING: use with care. Do not supply a single value for lambda (for
predictions after CV use predict() instead). Supply instead a decreasing
sequence of lambda values. glmnet relies on its warms starts for speed, and
its often faster to fit a whole path than compute a single fit.}
\item{lambda_min_ratio}{Smallest value for lambda, as a fraction of
lambda.max, the (data derived) entry value (i.e. the smallest value for
which all coefficients are zero). The default depends on the sample size
nobs relative to the number of variables nvars. If nobs > nvars, the
default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A
very small value of lambda.min.ratio will lead to a saturated fit in the
nobs < nvars case.}
\item{nlambda}{the number of lambda values - default is 100.}
\item{eta_init}{initial value for the eta parameter, with \eqn{0 < \eta < 1}
used in determining lambda.max and starting value for fitting algorithm.}
\item{maxit}{Maximum number of passes over the data for all lambda values;
default is 10^2.}
\item{fdev}{Fractional deviance change threshold. If change in deviance
between adjacent lambdas is less than fdev, the solution path stops early.
factory default = 1.0e-5}
\item{standardize}{Logical flag for x variable standardization, prior to
fitting the model sequence. The coefficients are always returned on the
original scale. Default is standardize=FALSE. If variables are in the same
units already, you might not wish to standardize.}
\item{alpha}{The elasticnet mixing parameter, with \eqn{0 \leq \alpha \leq
1}. alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.}
\item{thresh_glmnet}{Convergence threshold for coordinate descent for
updating beta parameters. Each inner coordinate-descent loop continues
until the maximum change in the objective after any coefficient update is
less than thresh times the null deviance. Defaults value is 1E-7}
\item{epsilon}{Convergence threshold for block relaxation of the entire
parameter vector \eqn{\Theta = ( \beta, \eta, \sigma^2 )}. The algorithm
converges when \deqn{crossprod(\Theta_{j+1} - \Theta_{j}) < \epsilon}.
Defaults value is 1E-4}
\item{dfmax}{limit the maximum number of variables in the model. Useful for
very large \code{p} (the total number of predictors in the design matrix),
if a partial path is desired. Default is the number of columns in the
design matrix + 2 (for the variance components)}
\item{verbose}{display progress. Can be either 0,1 or 2. 0 will not display
any progress, 2 will display very detailed progress and 1 is somewhere in
between. Default: 0.}
}
\value{
list which includes the fitted object, lambda sequence,
solution path, variance-covariance parameters, degrees of freedom,
and singular values, vectors of kinship matrix
}
\description{
Main function to fit the linear mixed model with lasso or group
lasso penalty for a sequence of tuning parameters. This is a penalized
regression method that accounts for population structure using either the
kinship matrix or the factored realized relationship matrix
}
\examples{
data(admixed)
fitlmm <- ggmix(x = admixed$xtrain, y = admixed$ytrain,
kinship = admixed$kin_train,
estimation = "full")
gicfit <- gic(fitlmm)
coef(gicfit, type = "nonzero")
predict(gicfit, newx = admixed$xtest)[1:5,,drop=FALSE]
plot(gicfit)
plot(fitlmm)
}
\references{
Bhatnagar, Sahir R, Yang, Yi, Lu, Tianyuan, Schurr, Erwin,
Loredo-Osti, JC, Forest, Marie, Oualkacha, Karim, and Greenwood, Celia MT.
(2020) \emph{Simultaneous SNP selection and adjustment for population
structure in high dimensional prediction models}
\doi{10.1101/408484}
Friedman, J., Hastie, T. and Tibshirani, R. (2008) \emph{Regularization
Paths for Generalized Linear Models via Coordinate Descent},
\url{https://web.stanford.edu/~hastie/Papers/glmnet.pdf} \emph{Journal of
Statistical Software, Vol. 33(1), 1-22 Feb 2010}
\url{https://www.jstatsoft.org/v33/i01/}
Yang, Y., & Zou, H. (2015). A fast unified algorithm for solving
group-lasso penalize learning problems. \emph{Statistics and Computing},
25(6), 1129-1141.
\url{https://www.math.mcgill.ca/yyang/resources/papers/STCO_gglasso.pdf}
}