mismatch between gene2pheno and WB-DGN models #57

Closed
kevinVervier opened this issue Aug 14, 2017 · 6 comments

@kevinVervier

Hi,
I queried the 'gene2pheno' database for the A2LD1 gene with the WB-DGN-0.5 models. It reports that 7 SNPs are in the PredictDB model and 5 were used. However, when I downloaded the PredictDB files, I found 54 SNPs for the same gene. In addition, the R^2 values do not match between the two databases.

Could you help me figure out what strategy you used to filter the SNPs?
Thanks,
Kevin Vervier, PhD

@hakyim
Contributor

hakyim commented Aug 14, 2017 via email

@kevinVervier
Author

In the current version (downloaded yesterday), when I load the models using SQLite, I correctly find 7 SNPs.
But when I grep 'A2LD1' in 'DGN-WB.txt', I get 28 lines:

A2LD1 rs9557677 rs9557677 0.284777391428
A2LD1 rs11842969 rs9557677 0.0127323707159
A2LD1 rs11842969 rs11842969 0.498562410398
A2LD1 rs11842969 rs7328395 0.49428528431
A2LD1 rs11842969 rs9518128 -0.0616341789898
A2LD1 rs4772303 rs9557677 -0.0118413027809
A2LD1 rs4772303 rs11842969 0.390062018328
A2LD1 rs4772303 rs4772303 0.503172201849
A2LD1 rs4772303 rs7328395 0.390687746034
A2LD1 rs4772303 rs9518128 -0.0107522197492
A2LD1 rs7328395 rs9557677 0.0132590908731
A2LD1 rs7328395 rs7328395 0.493976380759
A2LD1 rs7328395 rs9518128 -0.063404433954
A2LD1 rs837322 rs9557677 -0.0132432496654
A2LD1 rs837322 rs11842969 -0.10503908818
A2LD1 rs837322 rs4772303 -0.232010328467
A2LD1 rs837322 rs7328395 -0.107486554775
A2LD1 rs837322 rs837322 0.416687128227
A2LD1 rs837322 rs3783183 0.287509999762
A2LD1 rs837322 rs9518128 -0.0713131569151
A2LD1 rs3783183 rs9557677 -0.00247518870839
A2LD1 rs3783183 rs11842969 -0.116032886347
A2LD1 rs3783183 rs4772303 -0.219143307486
A2LD1 rs3783183 rs7328395 -0.119612999295
A2LD1 rs3783183 rs3783183 0.521571764631
A2LD1 rs3783183 rs9518128 -0.0594401717187
A2LD1 rs9518128 rs9557677 -0.00339001845501
A2LD1 rs9518128 rs9518128 0.514823410137

I am probably confused by the two rsID columns. Could you explain what they contain?
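One way to check what the two rsID columns hold is to parse the rows above. The sketch below assumes (this is not confirmed in the thread) that the columns are GENE, RSID1, RSID2, VALUE, with VALUE the covariance between the two SNPs' dosages. Under that reading, the 28 lines involve exactly 7 distinct SNPs and cover every unordered pair of them once, including each SNP paired with itself: 7·8/2 = 28. That matches the 7 SNPs found in the SQLite model.

```python
# Rows copied from the `grep A2LD1 DGN-WB.txt` output above.
# Assumed column layout (not confirmed in this thread): GENE RSID1 RSID2 VALUE,
# where VALUE is the covariance between the two SNPs' dosages.
COV_TEXT = """\
A2LD1 rs9557677 rs9557677 0.284777391428
A2LD1 rs11842969 rs9557677 0.0127323707159
A2LD1 rs11842969 rs11842969 0.498562410398
A2LD1 rs11842969 rs7328395 0.49428528431
A2LD1 rs11842969 rs9518128 -0.0616341789898
A2LD1 rs4772303 rs9557677 -0.0118413027809
A2LD1 rs4772303 rs11842969 0.390062018328
A2LD1 rs4772303 rs4772303 0.503172201849
A2LD1 rs4772303 rs7328395 0.390687746034
A2LD1 rs4772303 rs9518128 -0.0107522197492
A2LD1 rs7328395 rs9557677 0.0132590908731
A2LD1 rs7328395 rs7328395 0.493976380759
A2LD1 rs7328395 rs9518128 -0.063404433954
A2LD1 rs837322 rs9557677 -0.0132432496654
A2LD1 rs837322 rs11842969 -0.10503908818
A2LD1 rs837322 rs4772303 -0.232010328467
A2LD1 rs837322 rs7328395 -0.107486554775
A2LD1 rs837322 rs837322 0.416687128227
A2LD1 rs837322 rs3783183 0.287509999762
A2LD1 rs837322 rs9518128 -0.0713131569151
A2LD1 rs3783183 rs9557677 -0.00247518870839
A2LD1 rs3783183 rs11842969 -0.116032886347
A2LD1 rs3783183 rs4772303 -0.219143307486
A2LD1 rs3783183 rs7328395 -0.119612999295
A2LD1 rs3783183 rs3783183 0.521571764631
A2LD1 rs3783183 rs9518128 -0.0594401717187
A2LD1 rs9518128 rs9557677 -0.00339001845501
A2LD1 rs9518128 rs9518128 0.514823410137
"""


def parse_covariance(text):
    """Return the SNPs (first-seen order), the row count and a symmetric covariance dict."""
    snps = []
    cov = {}
    rows = 0
    for line in text.strip().splitlines():
        _gene, rsid1, rsid2, value = line.split()
        rows += 1
        for rsid in (rsid1, rsid2):
            if rsid not in snps:
                snps.append(rsid)
        # Each unordered pair is listed once, so store both orientations.
        cov[(rsid1, rsid2)] = float(value)
        cov[(rsid2, rsid1)] = float(value)
    return snps, rows, cov


snps, rows, cov = parse_covariance(COV_TEXT)
n = len(snps)
print(n, rows, n * (n + 1) // 2)  # distinct SNPs, listed rows, unordered pairs incl. diagonal
# Every ordered pair is now defined, so this builds a full n x n matrix without KeyError.
matrix = [[cov[(a, b)] for b in snps] for a in snps]
```

If the assumed layout is right, the file is a per-gene SNP covariance matrix stored as its upper (or lower) triangle, not a second list of model SNPs.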

Also, in the DGN model file that I downloaded one year ago, the A2LD1 gene model had 54 SNPs. What could explain this difference? Have you retrained the DGN-WB model since the publication?

query('select * from weights where gene = "A2LD1" ')
rsid gene weight ref_allele eff_allele pval N cis
1 rs10508048 A2LD1 2.467627e-03 C T NA NA NA
2 rs1055705 A2LD1 3.480362e-03 G A NA NA NA
3 rs11069419 A2LD1 -4.681995e-04 T C NA NA NA
4 rs11842969 A2LD1 -2.138703e-02 C T NA NA NA
5 rs1283142 A2LD1 -2.100454e-02 C A NA NA NA
6 rs1283211 A2LD1 4.202340e-03 C T NA NA NA
7 rs1298167 A2LD1 -1.615785e-02 A C NA NA NA
8 rs1335592 A2LD1 1.098255e-03 C T NA NA NA
9 rs1338040 A2LD1 9.362564e-04 T G NA NA NA
10 rs1572329 A2LD1 -7.501915e-03 A G NA NA NA
11 rs1572641 A2LD1 3.018157e-03 C T NA NA NA
12 rs1711178 A2LD1 -2.228864e-02 C T NA NA NA
13 rs17491680 A2LD1 9.419314e-03 T C NA NA NA
14 rs17578011 A2LD1 5.794283e-03 G A NA NA NA
15 rs17580625 A2LD1 1.904165e-03 T G NA NA NA
16 rs17676626 A2LD1 1.371448e-02 G A NA NA NA
17 rs1886030 A2LD1 1.687703e-03 G A NA NA NA
18 rs1886031 A2LD1 -4.530280e-03 A G NA NA NA
19 rs2297701 A2LD1 -4.248479e-03 C T NA NA NA
20 rs2490529 A2LD1 -1.340083e-02 T C NA NA NA
21 rs2761168 A2LD1 -7.759965e-03 A C NA NA NA
22 rs2765319 A2LD1 -6.055511e-03 C T NA NA NA
23 rs2783224 A2LD1 1.845512e-02 A G NA NA NA
24 rs2803214 A2LD1 1.536861e-02 A G NA NA NA
25 rs2806302 A2LD1 2.102273e-02 T G NA NA NA
26 rs3783183 A2LD1 3.824820e-02 A G NA NA NA
27 rs4772303 A2LD1 -1.684863e-01 C T NA NA NA
28 rs4772344 A2LD1 3.695509e-03 T G NA NA NA
29 rs4772345 A2LD1 2.360825e-03 A G NA NA NA
30 rs554997 A2LD1 2.232654e-02 A C NA NA NA
31 rs7328395 A2LD1 -9.405746e-02 C T NA NA NA
32 rs7337904 A2LD1 1.122966e-02 C T NA NA NA
33 rs767932 A2LD1 -1.568054e-02 T G NA NA NA
34 rs7982561 A2LD1 -1.180468e-02 T C NA NA NA
35 rs7997419 A2LD1 5.009783e-03 T G NA NA NA
36 rs837290 A2LD1 -3.199241e-02 G A NA NA NA
37 rs885304 A2LD1 -8.648958e-03 C T NA NA NA
38 rs9300647 A2LD1 -7.434882e-03 G T NA NA NA
39 rs9513770 A2LD1 -1.560461e-02 T C NA NA NA
40 rs9513774 A2LD1 -1.143146e-02 G A NA NA NA
41 rs9513781 A2LD1 -4.452036e-03 G T NA NA NA
42 rs9513812 A2LD1 3.676916e-04 G A NA NA NA
43 rs9518103 A2LD1 -6.774604e-03 T C NA NA NA
44 rs9518107 A2LD1 -4.820980e-03 C T NA NA NA
45 rs9518128 A2LD1 -1.499361e-02 C T NA NA NA
46 rs9518361 A2LD1 -1.114544e-02 C T NA NA NA
47 rs9554711 A2LD1 1.187383e-02 A G NA NA NA
48 rs9554712 A2LD1 -8.682582e-05 G A NA NA NA
49 rs9557286 A2LD1 1.092076e-02 A G NA NA NA
50 rs9557474 A2LD1 -3.129581e-03 G T NA NA NA
51 rs9557499 A2LD1 -1.075399e-02 G A NA NA NA
52 rs9557559 A2LD1 1.121800e-02 C T NA NA NA
53 rs9557677 A2LD1 -2.252316e-02 G A NA NA NA
54 rs972366 A2LD1 6.944471e-03 C T NA NA NA
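The `query(...)` call above is a local R helper. A comparable check with Python's standard `sqlite3` module is sketched below: it lists the SNPs stored for a gene and splits them into those shared by two model files and those unique to each. It assumes a `weights` table with `rsid` and `gene` columns, as in the output above. Because the model files are not attached here, the demo builds two small in-memory stand-ins: the old one uses three rows from the output above, and the new one uses two SNPs from the covariance rows with NULL weights, since the current weights are not shown in this thread.

```python
import sqlite3


def snps_for_gene(conn, gene):
    """Return the set of rsIDs stored for `gene` in a PredictDB-style `weights` table."""
    # Parameterised query instead of string formatting; assumes columns `rsid` and `gene`.
    cur = conn.execute("SELECT rsid FROM weights WHERE gene = ?", (gene,))
    return {row[0] for row in cur}


def compare_models(old_conn, new_conn, gene):
    """Split a gene's SNPs into those shared by both model files and those unique to each."""
    old, new = snps_for_gene(old_conn, gene), snps_for_gene(new_conn, gene)
    return {"shared": old & new, "only_old": old - new, "only_new": new - old}


def toy_db(rows):
    """Build an in-memory stand-in for a model .db file (illustrative rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weights (rsid TEXT, gene TEXT, weight REAL)")
    conn.executemany("INSERT INTO weights VALUES (?, ?, ?)", rows)
    return conn


old_db = toy_db([("rs11842969", "A2LD1", -2.138703e-02),
                 ("rs837290", "A2LD1", -3.199241e-02),
                 ("rs9557677", "A2LD1", -2.252316e-02)])
# Weights left as NULL: the current model's values are not shown in this thread.
new_db = toy_db([("rs11842969", "A2LD1", None),
                 ("rs837322", "A2LD1", None)])
result = compare_models(old_db, new_db, "A2LD1")
print(len(snps_for_gene(old_db, "A2LD1")), sorted(result["shared"]))
```

With real files, `toy_db(...)` would be replaced by `sqlite3.connect(path)` on each downloaded model database (paths not given here).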

@hakyim
Contributor

hakyim commented Aug 14, 2017 via email

@hakyim
Contributor

hakyim commented Aug 14, 2017 via email

@kevinVervier
Author

According to your paper, the models were developed on HapMap SNPs: "To reduce computational burden in the application to WTCCC data, we used models developed on the HapMap Phase 2 subset of SNPs."
Judging by the current file name ('DGN-HapMap-2015' db), the model is still based on HapMap SNPs.
So, if I use the old model file I mentioned, how much confidence can I place in the predictions?
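Both the "7 in the model, 5 used" figures and the question of prediction confidence depend on how many model SNPs are actually available in the genotype data. Below is a minimal sketch of a PrediXcan-style prediction (a weighted sum of effect-allele dosages) restricted to the model SNPs present for one individual, which also reports how many were used. Assumptions: dosages are already coded on each SNP's `eff_allele` (no allele flipping is handled), and missing SNPs are simply dropped. That is one common choice, not necessarily what gene2pheno does. The function name is hypothetical and the dosages are made up.

```python
def predict_expression(weights, dosages):
    """Weighted sum of effect-allele dosages over the model SNPs present in `dosages`.

    weights: {rsid: weight} from a model's `weights` table.
    dosages: {rsid: dosage in [0, 2]} for one individual, assumed to be coded on
             the model's eff_allele.
    Returns (predicted_expression, n_snps_in_model, n_snps_used).
    """
    used = [rsid for rsid in weights if rsid in dosages]
    score = sum(weights[rsid] * dosages[rsid] for rsid in used)
    return score, len(weights), len(used)


# Three weights taken from the old A2LD1 model output above; the dosages are
# invented for illustration, and rs9557677 is deliberately missing.
weights = {"rs4772303": -1.684863e-01, "rs7328395": -9.405746e-02, "rs9557677": -2.252316e-02}
dosages = {"rs4772303": 2.0, "rs7328395": 1.0}
score, n_model, n_used = predict_expression(weights, dosages)
print(n_model, n_used, score)
```

The ratio `n_used / n_model` gives a rough sense of how much of a model survives in a given dataset. A low ratio is one reason to trust a prediction less, though it says nothing about the model's own R^2.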

@hakyim
Contributor

hakyim commented Aug 14, 2017 via email
