Skip to content
Permalink
Browse files

Update links and build package

  • Loading branch information
HannahVMeyer committed Mar 12, 2020
1 parent e8afbb9 commit da987d8f225aa6aca0596b9c4f6a2484b102bdb6
@@ -25,7 +25,7 @@ functions easily accessible from within R and allows for automatic evaluation
of the results.

**plinkQC** generates a per-individual and per-marker quality control report.
A step-by-step guide on how to run these analyses can be found [here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html).
A step-by-step guide on how to run these analyses can be found [here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).

Individuals and markers that fail the quality control can subsequently be
removed with **plinkQC** to generate a new, clean dataset.
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
@@ -20,6 +20,6 @@ knitr::opts_chunk$set(
# sep=""),
# interactive=TRUE)

## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkAncestry.png")

@@ -41,8 +41,8 @@ is provided with $plinkQC$ in `file.path(find.package('plinkQC'),'extdata')`.
## Download reference data
A suitable reference dataset should be downloaded and if necessary, re-formated
into PLINK format. Vignettes
['Processing HapMap III reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/HapMap.html) and
['Processing 1000Genomes reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/1000Genomes.html),
['Processing HapMap III reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/HapMap.html) and
['Processing 1000Genomes reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/1000Genomes.html),
show the download and processing of the HapMap phase III and 1000Genomes phase
III dataset, respectively. In this example, we will use the HammapIII data as
the reference dataset.
BIN -107 KB (70%) doc/AncestryCheck.pdf
Binary file not shown.
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
1000 Genomes study [@a1000Genomes2015],[@b1000Genomes2015] as a reference
dataset. Using the 1000 Genomes reference, population structure down to
large-scale continental ancestry can be detected. A step-by-step instruction on
how to conduct this ancestry analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
how to conduct this ancestry analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).


# Workflow
BIN +6.89 KB (100%) doc/Genomes1000.pdf
Binary file not shown.
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
HapMap study [@HapMap2005][@HapMap2007][@HapMap2010] as a reference dataset.
Using this reference, population structure down to large-scale continental
ancestry can be detected. A step-by-step instruction on how to conduct this
analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).


# Workflow
@@ -106,12 +106,22 @@ https://genome.ucsc.edu/cgi-bin/hgLiftOver and the appropriate liftover chain fr
zero-based [UCSC bed](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
format.

Hapmap chromosome data is encoded numerically, with chrX represented by chr23,
and chrY as chr24. In order to match to data encoded by chrX and chrY, we will
have to rename these hapmap chromosomes. Converting to zero-based UCSC format
and re-coding chromosome codes can be achieved by:

```{bash prepare liftover, eval=FALSE}
awk '{print "chr" $1, $4 -1, $4, $2 }' $refdir/HapMapIII_NCBI36.bim | \
sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > \
$refdir/HapMapIII_NCBI36.tolift
```

[Note: In the official HapMap release, chromosome codes described above, however
in the orignal download files (link above), no chr24 detected. I will keep this
line in for completeness, but note, when inspecting file that no chr24/chrY are
present.]

We use the liftOver tool and the UCSC bed formated annotation file together
with the appropriate chain file to do the lift over.

@@ -133,7 +143,7 @@ awk '{print $4, $3}' $refdir/HapMapIII_CGRCh37 > $refdir/HapMapIII_CGRCh37.pos
## Update the reference data
We can now use PLINK to extract the mappable variants from the old build and
update their position. After these steps, the HapMap III dataset can be used
for infering study ancestry as described in the corresponding [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
for infering study ancestry as described in the corresponding [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
```{bash update annotation, eval=FALSE}
plink --bfile $refdir/HapMapIII_NCBI36 \
--extract $refdir/HapMapIII_CGRCh37.snps \
BIN +8.05 KB (100%) doc/HapMap.pdf
Binary file not shown.
@@ -1,22 +1,22 @@
## ----setup, include = FALSE----------------------------------------------
## ----setup, include = FALSE---------------------------------------------------
library(plinkQC)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)

## ----set parameters------------------------------------------------------
## ----set parameters-----------------------------------------------------------
package.dir <- find.package('plinkQC')
indir <- file.path(package.dir, 'extdata')
qcdir <- tempdir()
name <- 'data'
path2plink <- "/Users/hannah/bin/plink"

## ----copy files----------------------------------------------------------
## ----copy files---------------------------------------------------------------
system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
qcdir))

## ----individual QC, eval=FALSE, fig.height=12, fig.width=9--------------
## ----individual QC, eval=FALSE, fig.height=12, fig.width=9-------------------
# fail_individuals <- perIndividualQC(indir=indir, qcdir=qcdir, name=name,
# refSamplesFile=paste(indir, "/HapMap_ID2Pop.txt",
# sep=""),
@@ -31,41 +31,40 @@ system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
par(mfrow=c(2,1), las=1)
knitr::include_graphics("individualQC.png")

## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE--------
## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE-------------
# overview_individuals <- overviewPerIndividualQC(fail_individuals,
# interactive=TRUE)

## ----load overviewIndividualQC, out.width = "500px", echo=FALSE----------
## ----load overviewIndividualQC, out.width = "500px", echo=FALSE---------------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewQC.png")
knitr::include_graphics("overviewAncestryQC.png")

## ----marker QC, eval=FALSE-----------------------------------------------
## ----marker QC, eval=FALSE----------------------------------------------------
# fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
# path2plink=path2plink,
# verbose=TRUE, interactive=TRUE,
# showPlinkOutput=FALSE)

## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'----
## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'-------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("markerQC.png")

## ----overview marker QC, eval=FALSE--------------------------------------
## ----overview marker QC, eval=FALSE-------------------------------------------
# overview_marker <- overviewPerMarkerQC(fail_markers, interactive=TRUE)

## ----load overviewMarkerQC, out.width = "500px", echo=FALSE--------------
## ----load overviewMarkerQC, out.width = "500px", echo=FALSE-------------------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewMarkerQC.png")

## ----clean data, eval=FALSE----------------------------------------------
## ----clean data, eval=FALSE---------------------------------------------------
# Ids <- cleanData(indir=indir, qcdir=qcdir, name=name, path2plink=path2plink,
# verbose=TRUE, showPlinkOutput=FALSE)

## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'------
## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'-----------
# fail_sex <- check_sex(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# verbose=TRUE, path2plink=path2plink)

## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkSex.png")

## ----check het miss, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'----
@@ -93,10 +92,10 @@ knitr::include_graphics("checkRelatedness.png")
# path2plink=path2plink, run.check_ancestry = FALSE,
# interactive=TRUE)

## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkAncestry.png")

## ----check snp missing, eval=FALSE---------------------------------------
## ----check snp missing, eval=FALSE--------------------------------------------
# fail_snpmissing <- check_snp_missingness(indir=indir, qcdir=qcdir, name=name,
# interactive=TRUE,
# path2plink=path2plink,
@@ -105,17 +104,17 @@ knitr::include_graphics("checkAncestry.png")
## ----load snp missing, out.width = "500px", echo=FALSE, fig.align='center'----
knitr::include_graphics("snpmissingness.png")

## ----check hwe, eval=FALSE-----------------------------------------------
## ----check hwe, eval=FALSE----------------------------------------------------
# fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# path2plink=path2plink, showPlinkOutput=FALSE)

## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'-------
## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'------------
knitr::include_graphics("hwe.png")

## ----check maf, eval=FALSE-----------------------------------------------
## ----check maf, eval=FALSE----------------------------------------------------
# fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# path2plink=path2plink, showPlinkOutput=FALSE)

## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'------
## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'-----------
knitr::include_graphics("maf.png")

@@ -115,6 +115,10 @@ individual IDs to the qcdir. These IDs will be removed in the computation of the
`perMarkerQC`. If the list is not present, `perMarkerQC` will send a message
about conducting the quality control on the entire dataset.

NB: To reduce the data size of the example data in `plinkQC`,
data.genome has already been reduced to the individuals that are related. Thus
the relatedness plots in C only show counts for related individuals only.

NB: To demonstrate the results of the ancestry check, the required eigenvector
file of the combined study and reference datasets have been precomputed and
for the purpose of this example will be copied to the `qcdir`. In practice,
@@ -156,7 +160,6 @@ overview_individuals <- overviewPerIndividualQC(fail_individuals,
```{r load overviewIndividualQC, out.width = "500px", echo=FALSE}
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewQC.png")
knitr::include_graphics("overviewAncestryQC.png")
```


@@ -273,6 +276,10 @@ complex family structures, the unrelated individuals per family are selected
(e.g. in a parents-offspring trio, the offspring will be marked as fail, while
the parents will be kept in the analysis).

NB: To reduce the data size of the example data in `plinkQC`,
data.genome has already been reduced to the individuals that are related. Thus
the relatedness plots in C only show counts for related individuals only.

```{r check related, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'}
exclude_relatedness <- check_relatedness(indir=indir, qcdir=qcdir, name=name,
interactive=TRUE,
BIN +23.8 KB (100%) doc/plinkQC.pdf
Binary file not shown.

Some generated files are not rendered by default. Learn more.

Some generated files are not rendered by default. Learn more.

Some generated files are not rendered by default. Learn more.

0 comments on commit da987d8

Please sign in to comment.
You can’t perform that action at this time.