Graphical models query #12
Comments
|
In principle, this is the correct place for your query. Unfortunately, I cannot really advise on the use of stability selection with graphical models, as I am no expert in the latter. However, there is some literature on this combination:
Searching the web will surely provide more examples of various flavors of graphical modelling with stability selection. Regarding your general questions about stability selection: as shown in our article, q should be large enough to capture all anticipated variables but (usually much) smaller than the number of available predictors. Meinshausen and Bühlmann propose in one place to choose q = sqrt(0.8 * p) or sqrt(0.8 * alpha * p), where alpha is, for example, 0.05 (i.e., the significance level). Yet these choices are not applicable in all cases. I'd suggest playing around, looking at the selection frequencies, and keeping an eye on the PFER. Regarding your final question: please note that resampling with subsamples of size n/2 is important for the derivation of the bound for the PFER. Thus, I am not fully aware of the impact of deviating from this on the theoretical properties! |
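To make the rule of thumb for q concrete, here is a minimal base-R sketch. It assumes that, in a graphical model, the "predictors" are the p(p - 1)/2 candidate edges (an assumption for illustration, not something the stabs package prescribes), and it uses the Meinshausen and Bühlmann bound PFER <= q^2 / ((2 * cutoff - 1) * p). The helper name `mb_pfer_bound` is hypothetical.

```r
## Meinshausen-Buhlmann upper bound on the per-family error rate (PFER).
## q: number of variables selected per subsample, p: number of candidates,
## cutoff: selection-frequency threshold (must exceed 0.5).
mb_pfer_bound <- function(q, p, cutoff) {
  stopifnot(cutoff > 0.5, cutoff <= 1)
  q^2 / ((2 * cutoff - 1) * p)
}

p_nodes <- 40
p_edges <- p_nodes * (p_nodes - 1) / 2   # candidate edges (assumed "p")

## Rule-of-thumb choices for q, rounded down to an integer
q_mb    <- floor(sqrt(0.8 * p_edges))          # aims at PFER <= 1 for cutoff 0.9
q_alpha <- floor(sqrt(0.8 * 0.05 * p_edges))   # aims at PFER <= 0.05 for cutoff 0.9

bound_mb    <- mb_pfer_bound(q_mb, p_edges, cutoff = 0.9)
bound_alpha <- mb_pfer_bound(q_alpha, p_edges, cutoff = 0.9)
print(c(q_mb = q_mb, bound_mb = bound_mb,
        q_alpha = q_alpha, bound_alpha = bound_alpha))
```

Both rules are tied to a cutoff of 0.9; with a different cutoff the same q gives a different bound, so it is worth recomputing the bound whenever either value changes.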
|
Thanks for your comments - I realise this is a potentially tricky question. On Tue, Sep 6, 2016 at 9:31 PM, Benjamin Hofner notifications@github.com
|
|
Sorry, I don't know of such a package. (I also don't know of any other package that implements the Shah/Samworth bounds, which are usually preferable.) However, I would love to add the relevant functions to stabs. What I would need is a function that takes arguments If we need to resample individuals rather than cases, we could consider implementing such resampling functionality as well. However, you can always do this by hand if you use Another way would be to do the resampling (with samples of size
If the first way is doable, you could either provide a patch (i.e., the relevant code) or pointers to the relevant packages and functions. I would then assist you in writing the required function(s) and manual(s). |
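Resampling "by hand" within groups could look roughly like the base-R sketch below. It builds a 0/1 weight matrix with one column per subsample, drawing half of the observations from each group separately. The function name `stratified_half_folds` and the toy grouping are hypothetical; passing such a matrix to the `folds` argument of `stabsel` is one possible way to use it, but check the stabs documentation for how `folds` interacts with the chosen sampling type.

```r
## Draw B subsamples of size floor(n_g / 2) within each group g.
## Returns an n x B matrix of 0/1 weights (1 = observation is in the subsample).
stratified_half_folds <- function(group, B = 50) {
  group <- as.factor(group)
  n <- length(group)
  folds <- matrix(0L, nrow = n, ncol = B)
  for (b in seq_len(B)) {
    for (g in levels(group)) {
      idx <- which(group == g)
      ## sample positions, then map back to row indices (safe for length-1 idx)
      keep <- idx[sample.int(length(idx), floor(length(idx) / 2))]
      folds[keep, b] <- 1L
    }
  }
  folds
}

set.seed(1)
group <- rep(c("A", "B"), times = c(60, 40))   # toy example: two groups
folds <- stratified_half_folds(group, B = 10)
print(colSums(folds[group == "A", ]))   # size of each subsample within group A
print(colSums(folds[group == "B", ]))   # size of each subsample within group B
```

Keeping the within-group subsample size at half of each group mirrors the n/2 subsampling that the PFER bound is derived for, although whether the bound carries over to stratified sampling is not established here.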
|
Thanks, On Wed, Sep 7, 2016 at 5:01 PM, Benjamin Hofner notifications@github.com
|
|
Correct. See Meinshausen and Bühlmann |
|
Was just starting to look into coding this and discovered the "pulsar" package, https://cran.r-project.org/package=pulsar, which looks like it might do a lot of the work. Having a read now to see On Wed, Sep 7, 2016 at 10:22 PM, Benjamin Hofner notifications@github.com
|
|
Hi, I've written a couple of stubs for testing graphical methods - see what you

    library(stabs)  # provides stabsel()

    ## Build a decreasing path of `len` penalty values from max to min,
    ## optionally equally spaced on the log scale.
    getLamPath <- function(max, min, len, log = FALSE)
    {
        if (max < min)
            stop("Did you flip min and max?")
        if (log) {
            min <- log(min)
            max <- log(max)
        }
        lams <- seq(max, min, length.out = len)
        if (log)
            exp(lams)
        else lams
    }

    set.seed(10010)
    p <- 40 ; n <- 1000
    dat <- huge::huge.generator(n, p, "hub", verbose = FALSE, v = .1, u = .5)

    ## Fit function for stabsel(): y is required by the interface but unused.
    stabs.quic <- function(x, y, q, ...)
    {
        if (!requireNamespace("QUIC")) {
            stop("Package ", sQuote("QUIC"), " is required but not available")
        }
        ## sort out a lambda path
        empirical.cov <- cov(x)
        ut <- upper.tri(empirical.cov)
        max.cov <- max(abs(empirical.cov[ut]))
        lams <- getLamPath(max.cov, max.cov * 0.05, len = 40)
        est <- QUIC::QUIC(empirical.cov, rho = 1, path = lams, msg = 0)
        ## number of non-zero off-diagonal entries (edges) at each lambda
        qvals <- sapply(seq_along(lams), function(idx) {
            m <- est$X[, , idx]
            sum(m[ut] != 0)
        })
        ## Not sure if it is better to have more or less than q
        ## (first lambda with at least q edges; falls back to 1 if none)
        lamidx <- which.max(qvals >= q)
        ## Need to return the entire upper triangle - think about how to save
        ## ram later
        M <- est$X[, , lamidx][ut]
        selected <- (M != 0)
        ## selection path: one column per lambda up to lamidx
        s <- sapply(seq_len(lamidx), function(idx) {
            est$X[, , idx][ut] != 0
        })
        colnames(s) <- as.character(seq_len(ncol(s)))
        return(list(selected = selected, path = s))
    }

    sq <- stabsel(x = dat$data, y = dat$data, fitfun = stabs.quic, cutoff = 0.75,
                  PFER = 1)
|
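Because the fit function above returns a logical vector over the upper triangle of the precision matrix, any selected index has to be mapped back to a pair of nodes before it can be read as an edge. A minimal base-R sketch of that mapping follows; the helper name `edge_index` is hypothetical, and it relies only on the fact that `m[upper.tri(m)]` and `which(upper.tri(m), arr.ind = TRUE)` both traverse the matrix in column-major order.

```r
## Map positions in m[upper.tri(m)] back to (row, column) node pairs.
edge_index <- function(p) {
  ut <- upper.tri(matrix(0, p, p))
  idx <- which(ut, arr.ind = TRUE)   # column-major order, same as m[ut]
  data.frame(edge = seq_len(nrow(idx)),
             i = unname(idx[, "row"]),
             j = unname(idx[, "col"]))
}

edges <- edge_index(4)
print(edges)
```

With p = 40 this gives a 780-row lookup table, so indices reported as selected (for example in the `selected` component of the `stabsel` result, assuming it holds positions into this vector) can be looked up as `edges[selected_indices, ]`.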
Hi,
Not sure if this is the forum you'd like to use for queries - let me know if it isn't.
I'm exploring approaches using the JGL package, specifically the fused group lasso. I'm likely to be working with two groups. I have the mechanisms in place to compute the two lambda values. The difference in partial correlation coefficient for corresponding graph edges is of interest. I have explored bootstrapping approaches to characterising this, but a stability selection approach looks interesting.
I'm unsure of how to use the q parameter in this setting. Do you have examples for glasso-like cases? I also need to be careful about how the resampling occurs within groups.
Thanks