Data management options

Guo-Bo Chen edited this page May 22, 2018 · 4 revisions

Data management

GEAR supports common data management operations through the options below. These options can usually be combined with a given analysis.


Sample selection

  • --keep [filename]
  • --remove [filename]

--keep/--remove accept a file with the family ID in the first column and the within-family ID in the second, separated by whitespace. --keep retains only the listed samples in the current analysis; --remove excludes the listed samples.

  • --keep-fam [filename]
  • --remove-fam [filename]

--keep-fam/--remove-fam accept a file with one family ID per line, and keep/remove the families with the listed family IDs in the current analysis.
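
The selection semantics above can be illustrated with a minimal Python sketch. This is not GEAR's implementation: the function names are hypothetical, and it assumes that "keep" retains only the listed entries while "remove" drops them, as in PLINK.

```python
def read_sample_ids(text):
    """Parse (family ID, within-family ID) pairs from whitespace-separated lines."""
    ids = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:          # skip blank or malformed lines
            ids.add((fields[0], fields[1]))
    return ids


def select_samples(samples, listed, mode):
    """mode='keep' retains the listed samples; mode='remove' drops them."""
    if mode == "keep":
        return [s for s in samples if s in listed]
    if mode == "remove":
        return [s for s in samples if s not in listed]
    raise ValueError(f"unknown mode: {mode}")


def select_families(samples, families, mode):
    """Same idea, but matching on the family ID only."""
    if mode == "keep":
        return [s for s in samples if s[0] in families]
    if mode == "remove":
        return [s for s in samples if s[0] not in families]
    raise ValueError(f"unknown mode: {mode}")


samples = [("F1", "1"), ("F1", "2"), ("F2", "1")]
listed = read_sample_ids("F1 1\nF2\t1\n")
kept = select_samples(samples, listed, "keep")
removed = select_samples(samples, listed, "remove")
fam_kept = select_families(samples, {"F1"}, "keep")
```

Here `kept` holds the two listed samples, `removed` holds the one unlisted sample, and `fam_kept` holds both members of family F1.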


SNP selection

  • --extract [filename]
  • --exclude [filename]

--extract accepts a text file with a list of SNP IDs (usually one SNP per line, but space-separated IDs are also accepted), and removes unlisted variants from the current analysis. --exclude accepts the same kind of file, but removes the listed variants instead.
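
As a rough illustration of the SNP list format, the Python sketch below reads whitespace-separated SNP IDs (one per line or several per line) and applies either filter. The function names are hypothetical, and the sketch assumes that "extract" retains the listed SNPs while "exclude" drops them; it is not taken from GEAR's source.

```python
def read_snp_ids(text):
    """Split on any whitespace, so one-per-line and space-separated both work."""
    return set(text.split())


def filter_snps(snps, listed, mode):
    """mode='extract' retains listed SNPs; mode='exclude' drops them."""
    if mode == "extract":
        return [s for s in snps if s in listed]
    if mode == "exclude":
        return [s for s in snps if s not in listed]
    raise ValueError(f"unknown mode: {mode}")


snps = ["rs1", "rs2", "rs3"]
listed = read_snp_ids("rs1 rs3\n")
extracted = filter_snps(snps, listed, "extract")
excluded = filter_snps(snps, listed, "exclude")
```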

  • --chr [chromosome ids]
  • --not-chr [chromosome ids]

SNPs can be included (--chr) or excluded (--not-chr) by chromosome. Both options accept single chromosomes and ranges. For example,

--chr 1 4-6 includes only SNPs on chromosomes 1, 4, 5, and 6.

--not-chr 1 4-8 removes SNPs on chromosomes 1, 4, 5, 6, 7, and 8.
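
The way a specification such as "1 4-6" expands into individual chromosomes can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes numeric chromosome codes only; how GEAR itself parses these arguments is not shown here.

```python
def expand_chromosomes(tokens):
    """Expand tokens such as ['1', '4-6'] into the set {1, 4, 5, 6}."""
    chroms = set()
    for token in tokens:
        if "-" in token:
            start, end = token.split("-", 1)
            chroms.update(range(int(start), int(end) + 1))  # inclusive range
        else:
            chroms.add(int(token))
    return chroms


def select_by_chromosome(snps, chroms, exclude=False):
    """snps is a list of (snp_id, chromosome) pairs.

    exclude=False mimics --chr; exclude=True mimics --not-chr.
    """
    return [s for s in snps if (s[1] in chroms) != exclude]


included = expand_chromosomes("1 4-6".split())
snps = [("rs1", 1), ("rs2", 2), ("rs3", 5), ("rs4", 9)]
chr_result = select_by_chromosome(snps, included)
not_chr_result = select_by_chromosome(
    snps, expand_chromosomes("1 4-8".split()), exclude=True
)
```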


Quality control

  • --maf [cutoff], "--maf 0.1" removes any SNP whose minor allele frequency (MAF) is lower than 0.1.
  • --max-maf [cutoff], "--max-maf 0.4" removes any SNP whose minor allele frequency is greater than 0.4.
  • --maf-range [range1 range2], "--maf-range 0.05-0.1 0.25-0.3" keeps any SNP whose minor allele frequency falls within 0.05-0.1 or 0.25-0.3.
  • --geno [missing-rate], "--geno 0.1" removes any SNP whose missing rate is greater than 0.1.
  • --zero-var, when switched on, removes any SNP with zero variance.

Note that --zero-var also removes a SNP for which every genotype is "Aa": such a SNP has a MAF of 0.5 but zero genotype variance.
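
The all-heterozygous case can be checked with a small Python sketch. It assumes genotypes are coded as allele counts (0, 1, 2) with None for a missing call, and that each SNP has at least one called genotype. The function names and the variance formula (population variance of the allele counts) are illustrative assumptions, not GEAR's code.

```python
def snp_stats(genotypes):
    """Return (maf, missing_rate, variance) for one SNP."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    freq = sum(called) / (2 * len(called))      # frequency of the counted allele
    maf = min(freq, 1.0 - freq)
    mean = sum(called) / len(called)
    variance = sum((g - mean) ** 2 for g in called) / len(called)
    return maf, missing_rate, variance


def passes_qc(genotypes, maf=None, max_maf=None, geno=None, zero_var=False):
    """Apply the cutoffs in the spirit of --maf, --max-maf, --geno, --zero-var."""
    snp_maf, missing_rate, variance = snp_stats(genotypes)
    if maf is not None and snp_maf < maf:
        return False
    if max_maf is not None and snp_maf > max_maf:
        return False
    if geno is not None and missing_rate > geno:
        return False
    if zero_var and variance == 0:
        return False
    return True


all_het = [1, 1, 1, 1]                           # every genotype is "Aa"
het_stats = snp_stats(all_het)                   # MAF 0.5, no missing, variance 0
kept_by_maf = passes_qc(all_het, maf=0.1)        # a MAF filter alone keeps it
kept_with_zero_var = passes_qc(all_het, maf=0.1, zero_var=True)
```

In this sketch the all-"Aa" SNP passes the MAF filter but fails once the zero-variance check is enabled, which is the behaviour the note above describes.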
