Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Residual Variance Estimate Failed - Negative Value #162

Open
tianwen0003 opened this issue May 25, 2022 · 21 comments
Open

Residual Variance Estimate Failed - Negative Value #162

tianwen0003 opened this issue May 25, 2022 · 21 comments

Comments

@tianwen0003
Copy link

Hi

I just got the same error(2 out of 3000 runs) as #90 when did fine-mapping by susie_rss function.

Error in susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) * :
Estimating residual variance failed: the estimated value is negative
Calls: susie_rss -> susie_suff_stat
Execution halted

when set estimate_residual_variance = FALSE, the function run well but all pip are zero

Here, I provide the all files for you to test
ENSG00000149084.7.vcf.gz

ENSG00000149084.7_fastqtl.zip
In this file, the second column is the "slope" from fastqtl, the thrid column is the "slope_se" from fastqtl

ENSG00000149084.7_ldMatrix.zip
This is the ld matrix calculated from plink --r

the sample number n = 204

var_y = 0.960491

@stephens999
Copy link
Contributor

we recommend to use estimate_residual_variance = FALSE with susie_rss

If the PIP=0 with this setting then
this suggests there are no strong signals in the region.
Do you have strong signals in the region?

@tianwen0003
Copy link
Author

sorry for my late response.

  1. Actually, the signals in the regions are not weak, instead, the signals are very strong, the nominal p-values are range from 8.134e-05 to 3.09152e-50.
  2. Do you mean to use estimate_residual_variance = FALSE for all runs or the runs reported error?
  3. Could you please provide a input-preparing guide file for fine-mapping from raw genotype and phenotype and covariable file, the link in the https://stephenslab.github.io/susie-paper/ can not be opened

@pcarbo
Copy link
Member

pcarbo commented Jun 1, 2022

@tianwen0003 Which link cannot be opened exactly?

@tianwen0003
Copy link
Author

@pcarbo
Copy link
Member

pcarbo commented Jun 1, 2022

@stephens999
Copy link
Contributor

Actually, the signals in the regions are not weak, instead, the signals are very strong, the nominal p-values are range from 8.134e-05 to 3.09152e-50.

then PIP should not be 0 with estimate_residual_var = FALSE. Something is wrong. Can you try with L=1?

Do you mean to use estimate_residual_variance = FALSE for all runs or the runs reported error?

When using summary data (susie_rss) the recommendation is always FALSE.
When using individual level data (susie) the recommendation is TRUE
(And these are the software defaults).

@tianwen0003
Copy link
Author

then PIP should not be 0 with estimate_residual_var = FALSE. Something is wrong. Can you try with L=1?

Sorry, I checked the results, L=10 and L=1 did get non-zero PIP and 95% credible set.

When using summary data (susie_rss) the recommendation is always FALSE.

I found in the guide from https://stephenslab.github.io/susieR/articles/finemapping_summary_statistics.html, the suggestion is set estimate_residual_var = TRUE when use in-sample LD matrix, which is the data type I have. so, in my pipeline, I should set it to TRUE, right?

@stephens999
Copy link
Contributor

Hi

I just got the same error(2 out of 3000 runs) as #90 when did fine-mapping by susie_rss function.

Error in susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) * : Estimating residual variance failed: the estimated value is negative Calls: susie_rss -> susie_suff_stat Execution halted

when set estimate_residual_variance = FALSE, the function run well but all pip are zero

Here, I provide the all files for you to test ENSG00000149084.7.vcf.gz

ENSG00000149084.7_fastqtl.zip In this file, the second column is the "slope" from fastqtl, the thrid column is the "slope_se" from fastqtl

ENSG00000149084.7_ldMatrix.zip This is the ld matrix calculated from plink --r

the sample number n = 204

var_y = 0.960491

can you provide the code you run to apply susie to these input files so we can reproduce the problem?

@tianwen0003
Copy link
Author

can you provide the code you run to apply susie to these input files so we can reproduce the problem?

Sure, the code and input files are follows:
run_susieR.zip
ENSG00000149084.7.data.zip
ENSG00000149084.7.snpList.zip
ENSG00000149084.7.ld.zip

@gaow
Copy link
Member

gaow commented Jun 10, 2022

I'm in the middle of something else and have to run now ... but here is what I have so far:

library(stringr)
library(susieR)
set.seed(1)
gene <- c("ENSG00000149084.7")

print(gene)
qtlResult <- read.table(file = str_c(gene,".data"),sep = "\t",header = F)
ldMatrix <- as.matrix(read.table(file = str_c(gene,".ld"),sep = " ",header = F))
ldMatrix <- ldMatrix[1:dim(ldMatrix)[1],1:dim(ldMatrix)[1]]
expression_var <- 0.960491
sampleSize <- 204
fitted_rss1 <- susie_rss(bhat = qtlResult$V2,shat = qtlResult$V3,
                         n = sampleSize,R = ldMatrix,
                         var_y = expression_var,
                         L = 10,
                         estimate_residual_variance = TRUE)

I get

WARNING: XtX is not symmetric; forcing XtX to be symmetric by replacing XtX with (XtX + t(XtX))/2

and negative estimate for residual variance. Then I tried:

isSymmetric(ldMatrix) # FALSE
krig_res = kriging_rss(qtlResult$V2/qtlResult$V3, ldMatrix, sampleSize)
print(krig_res$plot)

I get

WARNING: The matrix R is not positive semidefinite. Negative eigenvalues are set to zero.

image

Certainly something is wrong with the LD matrix that is supposed to match the genotype; however there is no large difference between observed and expected z-scores. I dont think we have your genotype data to look further into this discrepency.

@pcarbo
Copy link
Member

pcarbo commented Jun 10, 2022

@gaow @zouyuxin Would you recommend that they try using the methods suggested in the susie-rss bioRxiv paper to adjust their LD matrix?

@stephens999
Copy link
Contributor

The report was that there were problems (PIP=0) even with L=1, for which LD should be irrelevant. So can we check L=1?

@zouyuxin
Copy link
Member

There is one CS using L=1, the estimated residual variance is 0.291. The CS contains 24 variants (perfectly correlated), each with PIP 0.04.
With L=3, it finds 2 CSs, and the estimated residual variance is 0.242. The correlation between the top variant in each CS is 0.72. The variants in CS1 have z scores -21.55285, and the variants in CS2 have z scores around -8.
I get the negative estimated residual variance error as I increase L = 4.

@zouyuxin
Copy link
Member

zouyuxin commented Jun 10, 2022

With estimated residual variance = FALSE, we identify 1 CS with L = 1 - 10.

@pcarbo the estimated regularization parameter lambda is 0.0027, the adjusted one also gives an error at L = 4 (estimate_residual_variance = TRUE). The smallest eigenvalue for the LD matrix is -1.6. The LD matrix is from plink, and it's a known issue that there are rounding and numerical errors.

@pcarbo @gaow @tianwen0003 Do you know whether the covariate effects are removed from the genotypes in fastqtl?

@gaow
Copy link
Member

gaow commented Jun 10, 2022

@zouyuxin good point about the covariates. The genotype data does not remove the covariates from it -- that'd be one reason the kriging plot is not perfect because the z score and LD can be a bit off because of the covarites is not removed from genotypes. In any case, the data used in susie_rss here is not exactly the "in sample LD" by our definition.

@stephens999
Copy link
Contributor

stephens999 commented Jun 10, 2022 via email

@stephens999
Copy link
Contributor

stephens999 commented Jun 10, 2022 via email

@gaow
Copy link
Member

gaow commented Jun 10, 2022

The original report was that the Pips were all 0 in that case.

it seems a false alarm according to #162 (comment)

@zouyuxin
Copy link
Member

So with estimate residual variance =False is everything ok?

yes, there is no error or pip all =0 when estimate residual variance = FALSE.

@stephens999
Copy link
Contributor

stephens999 commented Jun 10, 2022

Ok, good then it seems all is well. Maybe we should warn against using estimate residual variance =True with summary data, even with in sample ld? Or just not suggest it.... It seems difficult to be precise about the circumstances when it is ok to use?

@tianwen0003
Copy link
Author

@pcarbo @gaow @tianwen0003 Do you know whether the covariate effects are removed from the genotypes in fastqtl?

We used the PC1 and PC2 of genotype as covariants for fastqtl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants