diff --git a/docs/404.html b/docs/404.html index d61fa67..3af3fe8 100644 --- a/docs/404.html +++ b/docs/404.html @@ -52,7 +52,7 @@
vignettes/AncestryCheck.Rmd
AncestryCheck.Rmd
vignettes/Genomes1000.Rmd
Genomes1000.Rmd
vignettes/HapMap.Rmd
HapMap.Rmd
vignettes/plinkQC.Rmd
plinkQC.Rmd
For perIndividualQC, one simply specifies the directory where the data is stored (qcdir) and the prefix of the plink files (i.e. prefix.bim, prefix.bed, prefix.fam). In addition, the names of the files containing information about the reference population and the merged dataset used in check_ancestry have to be provided: refSamplesFile, refColorsFile and prefixMergedDataset. By default, all quality control checks will be conducted.
perIndividualQC
check_ancestry
In addition to running each check, perIndividualQC writes a list of the IDs of all individuals that failed a check to the qcdir. These individuals will be excluded in the computation of perMarkerQC. If the list is not present, perMarkerQC will issue a message that quality control is being conducted on the entire dataset.
perMarkerQC
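The fallback behaviour described above can be illustrated with a minimal base R sketch. It does not reproduce the package's internal code: the file name data.fail.IDs, the two-column FID/IID layout and the toy sample table are assumptions made for illustration only.

```r
# Hypothetical location and name of the fail-ID list (assumption, not the package's API)
qcdir <- tempdir()
fail_file <- file.path(qcdir, "data.fail.IDs")

# Write a toy fail list so the example is self-contained
write.table(data.frame(FID = c("F1", "F2"), IID = c("I1", "I2")),
            fail_file, quote = FALSE, row.names = FALSE, col.names = FALSE)

# Toy sample table standing in for the .fam file
samples <- data.frame(FID = c("F1", "F2", "F3"),
                      IID = c("I1", "I2", "I3"),
                      stringsAsFactors = FALSE)

if (file.exists(fail_file)) {
  # Exclude individuals listed in the fail file
  fail_ids <- read.table(fail_file, col.names = c("FID", "IID"),
                         colClasses = "character")
  kept <- samples[!(samples$IID %in% fail_ids$IID), ]
} else {
  # No list available: report it and keep every individual
  message("No fail ID list found; using the entire dataset.")
  kept <- samples
}
```

In this toy setup only the individual that is absent from the fail list remains in kept.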
NB: To reduce the data size of the example data in plinkQC, data.genome has already been reduced to the individuals that are related. Thus the relatedness plots in C show counts for related individuals only.
plinkQC
NB: To demonstrate the results of the ancestry check, the required eigenvector file of the combined study and reference datasets has been precomputed and, for the purpose of this example, will be copied to the qcdir. In practice, the qcdir will often be the same as the indir and this step will not be required.
qcdir
indir
system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'), qcdir))
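The call above relies on a Unix cp command being available. As a hedged alternative, the same copy step could be sketched with base R's file.copy, which does not depend on the shell. The source file here is a placeholder created in a temporary directory, and the qcdir_example directory name is hypothetical; neither is the package's example data.

```r
# Placeholder standing in for the precomputed eigenvector file (assumption)
src <- file.path(tempdir(), "data.HapMapIII.eigenvec")
writeLines("placeholder", src)

# Hypothetical qcdir for this sketch
qcdir <- file.path(tempdir(), "qcdir_example")
dir.create(qcdir, showWarnings = FALSE)

# When 'to' is an existing directory, file.copy places the file inside it
copied <- file.copy(src, qcdir, overwrite = TRUE)
```

file.copy returns a logical value per file, so the result can be checked before continuing with the ancestry check.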
overviewPerIndividualQC depicts overview plots of quality control failures and the intersection of quality control failures with ancestry exclusion.
overviewPerIndividualQC
overview_individuals <- overviewPerIndividualQC(fail_individuals, interactive=TRUE)
Depending on the future use of the genotypes, it might be required to remove any related individuals from the study. Related individuals can be identified by their proportion of shared alleles at the genotyped markers (identity by descent, IBD). Typically, individuals with second-degree relatedness or higher will be excluded. Identifying related individuals is implemented in check_relatedness. It finds pairs of samples whose proportion of IBD is larger than the specified highIBDTh. Subsequently, for pairs of individuals that do not have additional relatives in the dataset, the individual with the greater genotype missingness rate is selected and returned as the individual failing the relatedness check. For more complex family structures, the unrelated individuals per family are selected (e.g. in a parents-offspring trio, the offspring will be marked as fail, while the parents will be kept in the analysis).
check_relatedness
exclude_relatedness <- check_relatedness(indir=indir, qcdir=qcdir, name=name, interactive=TRUE, path2plink=path2plink)
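The pairwise rule described above can be sketched in base R. This is a minimal illustration and not the package's implementation: it covers only the simple case in which every related individual appears in exactly one pair, the IBD and missingness values are invented toy data, and the threshold of 0.1875 is an assumed value for highIBDTh.

```r
# Toy pairwise IBD estimates (invented values, for illustration only)
pairs <- data.frame(IID1 = c("A", "C", "E"),
                    IID2 = c("B", "D", "F"),
                    PI_HAT = c(0.50, 0.05, 0.26),
                    stringsAsFactors = FALSE)

# Toy per-individual genotype missingness rates
missingness <- c(A = 0.010, B = 0.030, C = 0.002,
                 D = 0.004, E = 0.020, F = 0.005)

highIBDTh <- 0.1875  # assumed threshold

# Keep only pairs whose proportion of IBD exceeds the threshold
related <- pairs[pairs$PI_HAT > highIBDTh, ]

# For each related pair, fail the member with the greater missingness rate
fail_ids <- unname(ifelse(
  missingness[related$IID1] > missingness[related$IID2],
  related$IID1,
  related$IID2
))
```

With this toy input, the pairs A-B and E-F exceed the threshold, and the member with the higher missingness rate in each pair is marked as failing. Families with more than two related members need the more involved selection described in the text, which this sketch does not attempt.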
checkPlink()
perIndividualQC()
Check PLINK software access
Quality control for all individuals in plink-dataset
check_ancestry()
perMarkerQC()
Identification of individuals of divergent ancestry
Quality control for all markers in plink-dataset
check_het_and_miss()
overviewPerIndividualQC()
Identification of individuals with outlying missing genotype or -heterozygosity rates
Overview of per sample QC
check_hwe()
overviewPerMarkerQC()
Identification of SNPs showing a significant deviation from Hardy-Weinberg- -equilibrium (HWE)
Overview of per marker QC
check_maf()
cleanData()
Identification of SNPs with low minor allele frequency
Create plink dataset with individuals and markers passing quality control
Functions for step-by-step per-individual quality control
check_sex()
Identification of individuals with discordant sex information
Identification of individuals with outlying missing genotype or +heterozygosity rates
check_snp_missingness()
Identification of SNPs with high missingness rate
Functions for step-by-step per-marker quality control
evaluate_check_ancestry()
Evaluate results from PLINK PCA on combined study and reference data
evaluate_check_het_and_miss()
Evaluate results from PLINK missing genotype and heterozygosity rate check.
Identification of SNPs showing a significant deviation from Hardy-Weinberg- +equilibrium (HWE)
Helper functions for step-by-step per-individual quality control: accessible to the user, but recommended use via per-individual check_* functions.
evaluate_check_relatedness()
run_check_sex()
Evaluate results from PLINK IBD estimation.
Run PLINK sexcheck
run_check_heterozygosity()
Run PLINK heterozygosity rate calculation
run_check_missingness()
Run PLINK missingness rate calculation
run_check_relatedness()
Run PLINK IBD estimation
relatednessFilter()
Remove related individuals while keeping maximum number of individuals
testNumerics()
Test lists for different properties of numerics