Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error running relateness check #3

Closed
blueskypie opened this issue Jun 20, 2019 · 11 comments
Closed

error running relateness check #3

blueskypie opened this issue Jun 20, 2019 · 11 comments

Comments

@blueskypie
Copy link

blueskypie commented Jun 20, 2019

Hi,
Thanks for make the package! I'm getting error whiling running on some data. Here is the command

perIndividualQC(indir = "/DATA/ProcessedData",name = 'd2',dont.check_ancestry = T)

part of the output:

Run check_relatedness via plink --genome
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /DATA/ProcessedData/d2.log.
Options in effect:
  --bfile /DATA/ProcessedData/d2
  --extract /DATA/ProcessedData/d2.prune.in
  --genome
  --maf 0.1
  --min 0.185
  --out /DATA/ProcessedData/d2

515685 MB RAM detected; reserving 257842 MB for main workspace.
13440095 variants loaded from .bim file.
40 people (19 males, 21 females) loaded from .fam.
40 phenotype values loaded from .fam.
--extract: 1214304 variants remaining.
Using up to 95 threads (change this with --threads).
Before main variant filters, 40 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Warning: 9887 het. haploid genotypes present (see
/DATA/ProcessedData/d2.hh ); many
commands treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.266382.
256255 variants removed due to minor allele threshold(s)
(--maf/--max-maf/--mac/--max-mac).
958049 variants and 40 people pass filters and QC.
Among remaining phenotypes, 23 are cases and 17 are controls.
Excluding 41709 variants on non-autosomes from IBD calculation.
IBD calculations complete.  
Finished writing
/DATA/ProcessedData/d2.genome .
Identification of related individuals
Error in if (one) failID1 <- pair[1] : argument is of length zero
@HannahVMeyer
Copy link
Member

could you send a sessionInfo() please? Could you share your dataset with me (confidentially)?

@blueskypie
Copy link
Author

Thanks for the reply! where can I upload the data?

@HannahVMeyer
Copy link
Member

HannahVMeyer commented Jun 21, 2019 via email

@blueskypie
Copy link
Author

I sent the link of the data to your work email. Thanks!

@HannahVMeyer
Copy link
Member

I've received the data. Could you please send your sessionInfo()?

@blueskypie
Copy link
Author

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /site/ne/app/x86_64/R/v3.5.0/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] formattable_0.2.0.1 plinkQC_0.2.2
[3] SNPRelate_1.16.0 gdsfmt_1.18.1
[5] ggpubr_0.2.1 officer_0.3.2
[7] magrittr_1.5 flextable_0.5.4
[9] shiny_1.3.2 gridExtra_2.3
[11] annotate_1.60.0 XML_3.98-1.16
[13] multcomp_1.4-8 TH.data_1.0-10
[15] MASS_7.3-51.1 survival_2.43-3
[17] mvtnorm_1.0-8 lme4_1.1-19
[19] Matrix_1.2-14 statVisual_1.1.8
[21] affy_1.60.0 pvca_1.22.0
[23] GGally_1.4.0 openxlsx_4.1.0
[25] reshape2_1.4.3 dplyr_0.8.1
[27] plyr_1.8.4 plotly_4.8.0
[29] ggplot2_3.2.0 stringr_1.4.0
[31] captioner_2.2.3 kableExtra_1.1.0
[33] xtable_1.8-3 pander_0.6.3
[35] knitr_1.21 bookdown_0.9
[37] hgu133plus2.db_3.2.3 org.Hs.eg.db_3.7.0
[39] AnnotationDbi_1.44.0 IRanges_2.16.0
[41] S4Vectors_0.20.1 Biobase_2.42.0
[43] BiocGenerics_0.28.0 arrayQualityMetrics_3.38.0

loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 RSQLite_2.1.1 htmlwidgets_1.3
[4] beadarray_2.32.0 grid_3.5.0 pROC_1.13.0
[7] munsell_0.5.0 codetools_0.2-16 preprocessCore_1.44.0
[10] withr_2.1.2 colorspace_1.4-1 highr_0.7
[13] uuid_0.1-2 rstudioapi_0.10 setRNG_2015.7-1
[16] ggsignif_0.5.0 labeling_0.3 optparse_1.6.2
[19] GenomeInfoDbData_1.2.0 hwriter_1.3.2 bit64_0.9-7
[22] pheatmap_1.0.12 xfun_0.4 randomForest_4.6-14
[25] R6_2.4.0 GenomeInfoDb_1.18.1 illuminaio_0.24.0
[28] gridSVG_1.6-1 bitops_1.0-6 reshape_0.8.8
[31] assertthat_0.2.1 promises_1.0.1 scales_1.0.0
[34] nnet_7.3-12 gtable_0.2.0 Cairo_1.5-11
[37] sandwich_2.5-0 rlang_0.3.4 genefilter_1.64.0
[40] splines_3.5.0 lazyeval_0.2.2 acepack_1.4.1
[43] checkmate_1.9.1 BiocManager_1.30.4 yaml_2.2.0
[46] backports_1.1.3 httpuv_1.5.1 Hmisc_4.1-1
[49] tools_3.5.0 affyio_1.52.0 gplots_3.0.1
[52] RColorBrewer_1.1-2 ggdendro_0.1-20 Rcpp_1.0.1
[55] base64enc_0.1-3 zlibbioc_1.28.0 purrr_0.3.2
[58] RCurl_1.95-4.12 rpart_4.1-13 openssl_1.4
[61] cowplot_0.9.4 zoo_1.8-4 ggrepel_0.8.0
[64] cluster_2.0.7-1 factoextra_1.0.5 data.table_1.12.0
[67] forestplot_1.9 hms_0.4.2 mime_0.7
[70] evaluate_0.12 gcrma_2.54.0 compiler_3.5.0
[73] tibble_2.1.3 rpart.plot_3.0.7 KernSmooth_2.23-15
[76] crayon_1.3.4 minqa_1.2.4 htmltools_0.3.6
[79] later_0.8.0 Formula_1.2-3 tidyr_0.8.2
[82] DBI_1.0.0 sys_2.1 getopt_1.20.3
[85] readr_1.3.1 vsn_3.50.0 gdata_2.18.0
[88] GenomicRanges_1.34.0 pkgconfig_2.0.2 foreign_0.8-71
[91] xml2_1.2.0 foreach_1.4.4 BeadDataPackR_1.34.0
[94] affyPLM_1.58.0 webshot_0.5.1 XVector_0.22.0
[97] rvest_0.3.2 digest_0.6.19 Biostrings_2.50.2
[100] rmarkdown_1.11 base64_2.0 htmlTable_1.13.1
[103] gdtools_0.1.7 gtools_3.8.1 nloptr_1.2.1
[106] nlme_3.1-137 jsonlite_1.6 viridisLite_0.3.0
[109] askpass_1.1 limma_3.38.3 pillar_1.4.1
[112] lattice_0.20-35 httr_1.4.0 glue_1.3.1
[115] zip_1.0.0 UpSetR_1.4.0 iterators_1.0.10
[118] glmnet_2.0-16 bit_1.1-14 stringi_1.4.3
[121] blob_1.1.1 latticeExtra_0.6-28 caTools_1.17.1.1
[124] memoise_1.1.0

@HannahVMeyer
Copy link
Member

Thank you for pointing me to this issue: in your example data set, there are no related individuals, all IBD estimates are 0. When trying to select non-related individuals plinkQV failed as it was lacking a check for this scenario. Can you try again by installing the github version of the package via:
devtools::install_github("HannahVMeyer/plinkQC")

@blueskypie
Copy link
Author

I used remove.packages("plinkQC"); devtools::install_github("HannahVMeyer/plinkQC") but still got the same error. I see the version is still 0.2.2. Is the version number updated? If not, can you do that so I can be sure that I'm using the new version?

@HannahVMeyer
Copy link
Member

I've updated the version, can you check again. Thanks!

@blueskypie
Copy link
Author

Thanks, working now! However, I'm a bit surprised that "all IBD estimates are 0". If I use plink --bfile d2 --rel-cutoff --out d3, it does say two samples are related. But I'm not sure how the values for rel-cutoff and PI_HAT (I guess the highIBDTh in your code) are matched.

@HannahVMeyer
Copy link
Member

rel-cutoff estimates the relationship based on a genetic relationship matrix that build using all genetic variants that you provide and default threshold of 0.025 as suggested in the original GTCA paper. check_relatedness uses IBD estimation which leads (in my opinion) to more interpretable values for the level of relatedness, with eg 1 being identical, 0.5 first degree relatives etc. Either way, the example data you provided has a very poor genotyping rate of 0.308368, a high degree of missingness per individual and therefore many low minor allele frequencies. As minor allele frequency and missingness are considered respectively in estimating relatedness and selecting the individual of a pair of related individuals to keep, the two individuals marked as related with rel-cutoff were not detected with the highIBD filter. I have updated the documentation making this process clearer and added user input of MAF as an option (8ac935b). I am closing this issue now as the initial bug has been resolved.

@HannahVMeyer HannahVMeyer mentioned this issue Sep 25, 2019
Merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants