individual and marker filtering in all individual qc functions; adds additional function examples; update documentation (#33)
HannahVMeyer authored and HannahVMeyer committed Feb 5, 2021
1 parent 16def3f commit bf9d8d3
Showing 17 changed files with 823 additions and 115 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -1,5 +1,6 @@
# Generated by roxygen2: do not edit by hand

export(checkFiltering)
export(checkPlink)
export(check_ancestry)
export(check_het_and_miss)
326 changes: 217 additions & 109 deletions R/individualQC.R
69 changes: 69 additions & 0 deletions R/utils.R
@@ -447,3 +447,72 @@ checkFormat <- function(prefix) {
stop("plink binary file: ", prefix, ".bed does not exist.")
}
}

#' Check and construct PLINK sample and marker filters
#'
#' checkFiltering checks that the files listing the individuals and markers to
#' be filtered exist. If so, it constructs the PLINK arguments for filtering.
#'
#' @param keep_individuals [character] Path to file with individuals to be
#' retained in the analysis. The file has to be a space/tab-delimited text file
#' with family IDs in the first column and within-family IDs in the second
#' column. All samples not listed in this file will be removed from the current
#' analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#indiv}.
#' Default: NULL, i.e. no filtering on individuals.
#' @param remove_individuals [character] Path to file with individuals to be
#' removed from the analysis. The file has to be a space/tab-delimited text file
#' with family IDs in the first column and within-family IDs in the second
#' column. All samples listed in this file will be removed from the current
#' analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#indiv}.
#' Default: NULL, i.e. no filtering on individuals.
#' @param extract_markers [character] Path to file with markers to be
#' included in the analysis. The file has to be a text file with a list of
#' variant IDs (usually one per line, but it's okay for them to just be
#' separated by spaces). All unlisted variants will be removed from the current
#' analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#snp}.
#' Default: NULL, i.e. no filtering on markers.
#' @param exclude_markers [character] Path to file with markers to be
#' removed from the analysis. The file has to be a text file with a list of
#' variant IDs (usually one per line, but it's okay for them to just be
#' separated by spaces). All listed variants will be removed from the current
#' analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#snp}.
#' Default: NULL, i.e. no filtering on markers.
#' @return Character vector of PLINK arguments, in the format expected by the
#' args parameter of sys::exec_wait, to enable filtering on individuals and/or
#' markers. NULL if no filter file is supplied.
#' @export
checkFiltering <- function(keep_individuals=NULL,
                           remove_individuals=NULL,
                           extract_markers=NULL,
                           exclude_markers=NULL) {
    args <- NULL
    if (!is.null(keep_individuals)) {
        if (!file.exists(keep_individuals)) {
            stop("File with individuals to keep in analysis does not exist: ",
                 keep_individuals)
        }
        args <- c(args, "--keep", keep_individuals)
    }
    if (!is.null(remove_individuals)) {
        if (!file.exists(remove_individuals)) {
            stop("File with individuals to remove from analysis does not exist: ",
                 remove_individuals)
        }
        args <- c(args, "--remove", remove_individuals)
    }
    if (!is.null(extract_markers)) {
        if (!file.exists(extract_markers)) {
            stop("File with markers to extract for analysis does not exist: ",
                 extract_markers)
        }
        args <- c(args, "--extract", extract_markers)
    }
    if (!is.null(exclude_markers)) {
        if (!file.exists(exclude_markers)) {
            stop("File with markers to exclude from analysis does not exist: ",
                 exclude_markers)
        }
        args <- c(args, "--exclude", exclude_markers)
    }
    return(args)
}
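The four `if` blocks in `checkFiltering` share one pattern: check that the path exists, then append a PLINK flag followed by the path. A minimal base-R sketch of the same logic written table-driven, keeping the same flags and argument order. `build_filter_args` is a hypothetical name, not part of this package; the `--bfile`/`--missing`/`--out` call and the `"data"` prefix are illustrative placeholders, and the `sys::exec_wait` line is left commented out because it needs a PLINK binary on the path.

```r
# Hypothetical table-driven variant of checkFiltering(): same existence checks,
# same flags, same argument order (--keep, --remove, --extract, --exclude).
build_filter_args <- function(keep_individuals = NULL,
                              remove_individuals = NULL,
                              extract_markers = NULL,
                              exclude_markers = NULL) {
    filters <- list(
        list(path = keep_individuals, flag = "--keep",
             what = "individuals to keep in analysis"),
        list(path = remove_individuals, flag = "--remove",
             what = "individuals to remove from analysis"),
        list(path = extract_markers, flag = "--extract",
             what = "markers to extract for analysis"),
        list(path = exclude_markers, flag = "--exclude",
             what = "markers to exclude from analysis")
    )
    args <- NULL
    for (f in filters) {
        if (is.null(f$path)) next          # filter not requested
        if (!file.exists(f$path)) {
            stop("File with ", f$what, " does not exist: ", f$path)
        }
        args <- c(args, f$flag, f$path)    # flag followed by its file
    }
    args                                   # NULL when no filter is given
}

# Usage: write a small keep file and build the filter arguments.
keep_file <- tempfile("keep_individuals")
writeLines(c("ID_26 ID_26", "ID_125 ID_125"), keep_file)
filter_args <- build_filter_args(keep_individuals = keep_file)

# Splice the filters into a full PLINK argument vector (placeholder prefix).
plink_args <- c("--bfile", "data", filter_args, "--missing", "--out", "data.filtered")
# sys::exec_wait("plink", args = plink_args)
```

Keeping the flags in a list makes the order of the returned arguments explicit and means a further filter only needs one more list entry, at the cost of slightly less specific error messages than the hand-written branches.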
10 changes: 10 additions & 0 deletions inst/extdata/keep_individuals
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
ID_26 ID_26
ID_125 ID_125
ID_162 ID_162
ID_169 ID_169
ID_147 ID_147
ID_152 ID_152
ID_187 ID_187
ID_17 ID_17
ID_153 ID_153
ID_5 ID_5
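The example `keep_individuals` file above follows the layout the `checkFiltering` documentation describes: family ID in the first column, within-family ID in the second, whitespace-separated, with no header. A minimal base-R sketch for writing such a file from a data frame and reading it back; the column names `FID`/`IID` and the use of a temporary directory are assumptions for illustration.

```r
# Sample IDs to retain; FID and IID are identical, as in the example file.
samples <- data.frame(FID = c("ID_26", "ID_125", "ID_162"),
                      IID = c("ID_26", "ID_125", "ID_162"),
                      stringsAsFactors = FALSE)

keep_file <- file.path(tempdir(), "keep_individuals")

# Space-delimited FID/IID pairs: no header, no quotes, no row names.
write.table(samples, keep_file, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back; colClasses keeps the IDs as character strings.
keep <- read.table(keep_file, header = FALSE,
                   col.names = c("FID", "IID"),
                   colClasses = "character")
```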
20 changes: 20 additions & 0 deletions inst/extdata/remove_individuals
@@ -0,0 +1,20 @@
ID_132 ID_132
ID_97 ID_97
ID_8 ID_8
ID_150 ID_150
ID_168 ID_168
ID_16 ID_16
ID_99 ID_99
ID_88 ID_88
ID_129 ID_129
ID_34 ID_34
ID_54 ID_54
ID_106 ID_106
ID_130 ID_130
ID_95 ID_95
ID_127 ID_127
ID_200 ID_200
ID_7 ID_7
ID_98 ID_98
ID_84 ID_84
ID_101 ID_101
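When both `keep_individuals` and `remove_individuals` are supplied, `checkFiltering` passes both `--keep` and `--remove` to PLINK. Viewed as set operations, the samples left are those listed in the keep file minus those listed in the remove file; since intersection and set difference commute, the order in which the two filters are applied does not change the result. A small base-R sketch of that set logic, matching samples on the FID/IID pair and using a few IDs taken from the two example files; `retained_samples` and the small `all_samples` table are hypothetical.

```r
# Hypothetical helper: which samples survive --keep and --remove?
# Each sample is identified by its FID/IID pair, joined into one key.
retained_samples <- function(all, keep = NULL, remove = NULL) {
    key <- function(df) paste(df$FID, df$IID, sep = "\t")
    out <- all
    if (!is.null(keep)) out <- out[key(out) %in% key(keep), , drop = FALSE]
    if (!is.null(remove)) out <- out[!key(out) %in% key(remove), , drop = FALSE]
    rownames(out) <- NULL
    out
}

ids <- function(x) data.frame(FID = x, IID = x, stringsAsFactors = FALSE)
all_samples <- ids(c("ID_5", "ID_7", "ID_8", "ID_17", "ID_26", "ID_200"))
keep   <- ids(c("ID_5", "ID_17", "ID_26"))  # subset of keep_individuals
remove <- ids(c("ID_7", "ID_8"))            # subset of remove_individuals

retained_samples(all_samples, keep = keep)$IID
# "ID_5" "ID_17" "ID_26"
retained_samples(all_samples, remove = remove)$IID
# "ID_5" "ID_17" "ID_26" "ID_200"
retained_samples(all_samples, keep = keep, remove = remove)$IID
# "ID_5" "ID_17" "ID_26"
```

The two example files share no IDs, so combining them gives the same result as using the keep file alone.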
50 changes: 50 additions & 0 deletions man/checkFiltering.Rd
41 changes: 41 additions & 0 deletions man/check_ancestry.Rd
32 changes: 32 additions & 0 deletions man/check_het_and_miss.Rd
32 changes: 32 additions & 0 deletions man/check_relatedness.Rd
32 changes: 32 additions & 0 deletions man/check_sex.Rd