mismatch between gene2pheno and WB-DGN models #57
Hi Kevin,
I double-checked the DGN db model for A2LD1 and it does have 7 SNPs in the
model. Were you perhaps looking at the WB model from GTEx?
Haky
On Mon, Aug 14, 2017 at 10:37 AM, Kevin Vervier wrote:
Hi,
I used the 'gene2pheno' database for the A2LD1 gene with the WB-DGN-0.5 models. It
says that 7 SNPs are in the PredictDB model, and that 5 were used. However, when
I downloaded the PredictDB files, I found 54 SNPs for the same gene. On top
of that, the R^2 values do not match between the two databases.
Could you help me figure out what your strategy was for filtering the SNPs?
Thanks,
Kevin Vervier, PhD
|
|
The first file (DGN-WB.txt) contains the covariances between the SNPs in each model.
I think we had a bug where we were using standardized SNP dosages. I don't
remember other changes at the moment.
DGN models built from raw data using the GTEx pipelines are in the works, although not
at the top of our priority list at the moment.
Haky
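To make the covariance explanation concrete, here is a minimal Python sketch that reads rows in the `GENE RSID1 RSID2 VALUE` layout shown in the grep output and recovers the distinct SNPs. The function names are hypothetical and not part of PrediXcan or PredictDB, and the layout is inferred from the excerpt in this thread. Each row is one entry of a symmetric matrix stored once per pair, and the diagonal rows (same rsID twice) are variances. A model with n SNPs therefore has at most n(n+1)/2 rows, which gives 28 rows for 7 SNPs.

```python
from collections import defaultdict


def parse_covariances(lines):
    """Parse 'GENE RSID1 RSID2 VALUE' rows into {gene: {(rsid1, rsid2): value}}."""
    cov = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        gene, rsid1, rsid2, value = fields
        try:
            cov[gene][(rsid1, rsid2)] = float(value)
        except ValueError:
            continue  # skip a header row, if the file has one
    return cov


def snps_in_model(pairs):
    """Return the set of distinct rsIDs appearing in either column."""
    snps = set()
    for rsid1, rsid2 in pairs:
        snps.add(rsid1)
        snps.add(rsid2)
    return snps


def expected_rows(n_snps):
    """Rows needed to store a symmetric n x n matrix once per pair."""
    return n_snps * (n_snps + 1) // 2


# Three rows copied from the grep output in this thread.
sample = [
    "A2LD1 rs9557677 rs9557677 0.284777391428",
    "A2LD1 rs11842969 rs9557677 0.0127323707159",
    "A2LD1 rs11842969 rs11842969 0.498562410398",
]
cov = parse_covariances(sample)
snps = snps_in_model(cov["A2LD1"])
print(len(snps), expected_rows(len(snps)))
```

Running the same functions over all 28 A2LD1 rows should yield 7 distinct rsIDs, consistent with the 7 SNPs found in the SQLite model.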
On Mon, Aug 14, 2017 at 1:50 PM, Kevin Vervier wrote:
In the current version (downloaded yesterday), when I load the models
using SQLite, I correctly find 7 SNPs.
But when I grep 'A2LD1' in 'DGN-WB.txt', I get 28 lines:
A2LD1 rs9557677 rs9557677 0.284777391428
A2LD1 rs11842969 rs9557677 0.0127323707159
A2LD1 rs11842969 rs11842969 0.498562410398
A2LD1 rs11842969 rs7328395 0.49428528431
A2LD1 rs11842969 rs9518128 -0.0616341789898
A2LD1 rs4772303 rs9557677 -0.0118413027809
A2LD1 rs4772303 rs11842969 0.390062018328
A2LD1 rs4772303 rs4772303 0.503172201849
A2LD1 rs4772303 rs7328395 0.390687746034
A2LD1 rs4772303 rs9518128 -0.0107522197492
A2LD1 rs7328395 rs9557677 0.0132590908731
A2LD1 rs7328395 rs7328395 0.493976380759
A2LD1 rs7328395 rs9518128 -0.063404433954
A2LD1 rs837322 rs9557677 -0.0132432496654
A2LD1 rs837322 rs11842969 -0.10503908818
A2LD1 rs837322 rs4772303 -0.232010328467
A2LD1 rs837322 rs7328395 -0.107486554775
A2LD1 rs837322 rs837322 0.416687128227
A2LD1 rs837322 rs3783183 0.287509999762
A2LD1 rs837322 rs9518128 -0.0713131569151
A2LD1 rs3783183 rs9557677 -0.00247518870839
A2LD1 rs3783183 rs11842969 -0.116032886347
A2LD1 rs3783183 rs4772303 -0.219143307486
A2LD1 rs3783183 rs7328395 -0.119612999295
A2LD1 rs3783183 rs3783183 0.521571764631
A2LD1 rs3783183 rs9518128 -0.0594401717187
A2LD1 rs9518128 rs9557677 -0.00339001845501
A2LD1 rs9518128 rs9518128 0.514823410137
I am probably confused by the two rsID columns. Could you please elaborate
on what they contain?
Also, in the model file (DGN) that I downloaded one year ago, the gene model
for A2LD1 had 54 SNPs. What could explain this difference? Did you
re-train the DGN-WB model since the publication?
query('select * from weights where gene = "A2LD1" ')
rsid gene weight ref_allele eff_allele pval N cis
1 rs10508048 A2LD1 2.467627e-03 C T NA NA NA
2 rs1055705 A2LD1 3.480362e-03 G A NA NA NA
3 rs11069419 A2LD1 -4.681995e-04 T C NA NA NA
4 rs11842969 A2LD1 -2.138703e-02 C T NA NA NA
5 rs1283142 A2LD1 -2.100454e-02 C A NA NA NA
6 rs1283211 A2LD1 4.202340e-03 C T NA NA NA
7 rs1298167 A2LD1 -1.615785e-02 A C NA NA NA
8 rs1335592 A2LD1 1.098255e-03 C T NA NA NA
9 rs1338040 A2LD1 9.362564e-04 T G NA NA NA
10 rs1572329 A2LD1 -7.501915e-03 A G NA NA NA
11 rs1572641 A2LD1 3.018157e-03 C T NA NA NA
12 rs1711178 A2LD1 -2.228864e-02 C T NA NA NA
13 rs17491680 A2LD1 9.419314e-03 T C NA NA NA
14 rs17578011 A2LD1 5.794283e-03 G A NA NA NA
15 rs17580625 A2LD1 1.904165e-03 T G NA NA NA
16 rs17676626 A2LD1 1.371448e-02 G A NA NA NA
17 rs1886030 A2LD1 1.687703e-03 G A NA NA NA
18 rs1886031 A2LD1 -4.530280e-03 A G NA NA NA
19 rs2297701 A2LD1 -4.248479e-03 C T NA NA NA
20 rs2490529 A2LD1 -1.340083e-02 T C NA NA NA
21 rs2761168 A2LD1 -7.759965e-03 A C NA NA NA
22 rs2765319 A2LD1 -6.055511e-03 C T NA NA NA
23 rs2783224 A2LD1 1.845512e-02 A G NA NA NA
24 rs2803214 A2LD1 1.536861e-02 A G NA NA NA
25 rs2806302 A2LD1 2.102273e-02 T G NA NA NA
26 rs3783183 A2LD1 3.824820e-02 A G NA NA NA
27 rs4772303 A2LD1 -1.684863e-01 C T NA NA NA
28 rs4772344 A2LD1 3.695509e-03 T G NA NA NA
29 rs4772345 A2LD1 2.360825e-03 A G NA NA NA
30 rs554997 A2LD1 2.232654e-02 A C NA NA NA
31 rs7328395 A2LD1 -9.405746e-02 C T NA NA NA
32 rs7337904 A2LD1 1.122966e-02 C T NA NA NA
33 rs767932 A2LD1 -1.568054e-02 T G NA NA NA
34 rs7982561 A2LD1 -1.180468e-02 T C NA NA NA
35 rs7997419 A2LD1 5.009783e-03 T G NA NA NA
36 rs837290 A2LD1 -3.199241e-02 G A NA NA NA
37 rs885304 A2LD1 -8.648958e-03 C T NA NA NA
38 rs9300647 A2LD1 -7.434882e-03 G T NA NA NA
39 rs9513770 A2LD1 -1.560461e-02 T C NA NA NA
40 rs9513774 A2LD1 -1.143146e-02 G A NA NA NA
41 rs9513781 A2LD1 -4.452036e-03 G T NA NA NA
42 rs9513812 A2LD1 3.676916e-04 G A NA NA NA
43 rs9518103 A2LD1 -6.774604e-03 T C NA NA NA
44 rs9518107 A2LD1 -4.820980e-03 C T NA NA NA
45 rs9518128 A2LD1 -1.499361e-02 C T NA NA NA
46 rs9518361 A2LD1 -1.114544e-02 C T NA NA NA
47 rs9554711 A2LD1 1.187383e-02 A G NA NA NA
48 rs9554712 A2LD1 -8.682582e-05 G A NA NA NA
49 rs9557286 A2LD1 1.092076e-02 A G NA NA NA
50 rs9557474 A2LD1 -3.129581e-03 G T NA NA NA
51 rs9557499 A2LD1 -1.075399e-02 G A NA NA NA
52 rs9557559 A2LD1 1.121800e-02 C T NA NA NA
53 rs9557677 A2LD1 -2.252316e-02 G A NA NA NA
54 rs972366 A2LD1 6.944471e-03 C T NA NA NA
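For anyone who wants to repeat the check above outside R, the following is a rough Python sketch using the standard `sqlite3` module. It assumes a `weights` table with the `rsid`, `gene`, `weight`, `ref_allele` and `eff_allele` columns seen in the output above, and leaves out `pval`, `N` and `cis`. The helper name and the in-memory database are illustrative only. With a real download you would connect to the model file instead, at whatever path it was saved to.

```python
import sqlite3


def snps_for_gene(conn, gene):
    """Return (rsid, weight, ref_allele, eff_allele) rows for one gene."""
    query = (
        "SELECT rsid, weight, ref_allele, eff_allele "
        "FROM weights WHERE gene = ? ORDER BY rsid"
    )
    # A bound parameter avoids quoting problems with gene names.
    return conn.execute(query, (gene,)).fetchall()


# Illustrative in-memory database; the three rows are copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weights "
    "(rsid TEXT, gene TEXT, weight REAL, ref_allele TEXT, eff_allele TEXT)"
)
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?, ?, ?)",
    [
        ("rs11842969", "A2LD1", -2.138703e-02, "C", "T"),
        ("rs4772303", "A2LD1", -1.684863e-01, "C", "T"),
        ("rs7328395", "A2LD1", -9.405746e-02, "C", "T"),
    ],
)

rows = snps_for_gene(conn, "A2LD1")
print(len(rows), "SNPs in the A2LD1 model")
```

Running the same count against the old and the new model files is a quick way to confirm which version reports 54 SNPs and which reports 7.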
|
Also, I don't remember whether we generated prediction models with the 1000G
SNP set for DGN or not. That could be another difference.
|
I would recommend using the latest model we provide on predictdb.org.
We did perform some comparisons back when we found the scaling bug. The
prediction performance in independent RNA-seq data was not very different.
You could predict expression with both models and compare the results.
Haky
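One way to carry out the suggested comparison is sketched below in plain Python. It assumes predicted expression is the weighted sum of SNP dosages (weight times effect-allele dosage, summed over the SNPs in the model), and that SNPs missing from the genotype data contribute zero. The function names are hypothetical. The "old" weights are taken from the 54-SNP table earlier in the thread, while the "new" weights and all dosages are made-up placeholders.

```python
import math


def predict_expression(weights, dosages):
    """Weighted sum of dosages per individual.

    weights: {rsid: weight}
    dosages: list of {rsid: dosage}, one dict per individual.
    SNPs absent from an individual's dict contribute 0.
    """
    return [
        sum(w * person.get(rsid, 0.0) for rsid, w in weights.items())
        for person in dosages
    ]


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Two weights from the old 54-SNP model shown earlier in the thread.
old_model = {"rs4772303": -1.684863e-01, "rs7328395": -9.405746e-02}
# Placeholder weights standing in for the newer model (not real values).
new_model = {"rs4772303": -0.2, "rs837322": 0.1}

# Made-up dosages (0, 1 or 2 copies of the effect allele) for four individuals.
dosages = [
    {"rs4772303": 0, "rs7328395": 1, "rs837322": 2},
    {"rs4772303": 1, "rs7328395": 0, "rs837322": 1},
    {"rs4772303": 2, "rs7328395": 2, "rs837322": 0},
    {"rs4772303": 1, "rs7328395": 1, "rs837322": 1},
]

old_pred = predict_expression(old_model, dosages)
new_pred = predict_expression(new_model, dosages)
print("correlation between model predictions:", pearson(old_pred, new_pred))
```

A high correlation between the two sets of predictions on your own genotypes would suggest that results obtained with the old model are unlikely to change much.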
On Mon, Aug 14, 2017 at 2:34 PM, Kevin Vervier wrote:
According to your paper, it was done using HapMap: "To reduce
computational burden in the application to WTCCC data, we used models
developed on the HapMap Phase 2 subset of SNPs."
And based on the current name of the file, it is still related to HapMap
('DGN-HapMap-2015' db).
So, if I used the old model file that I mentioned, how much confidence can I
put in the predictions?
|