Simulating case control data

gc5k edited this page Jan 2, 2018 · 1 revision

### Simulation for case-control data ###


Options

--simu-cc

Specify the numbers of cases and controls, separated by a comma (for example, 500,500).

--simu-order

The effects are sorted in ascending order and assigned to the QTLs, so the first QTL has the smallest effect and the last QTL has the largest.

--poly-loci

Specify the number of total loci, which is 1000 by default.

--poly-loci-null

Specify the number of null loci, which is zero by default.

--poly-ld

Specify LD as Lewontin's D', a value between -1 and 1. It defaults to 0, i.e. linkage equilibrium between markers.
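
For reference, Lewontin's D' is the raw disequilibrium coefficient D rescaled by its largest possible magnitude given the allele frequencies. The sketch below computes the signed version from the standard textbook definition. The function and argument names are illustrative and not part of GEAR, and the sketch makes no claim about how GEAR generates correlated markers internally.

```python
def lewontin_d_prime(p_a, p_b, p_ab):
    """Signed Lewontin's D' for two biallelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the A-B haplotype
    (Hypothetical helper for illustration; not GEAR code.)
    """
    # Raw disequilibrium: observed haplotype frequency minus its
    # expectation under linkage equilibrium.
    d = p_ab - p_a * p_b
    # The bound on |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return d / d_max
```

A value of 0 corresponds to linkage equilibrium, which matches the default of `--poly-ld`.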

--poly-U

Turn this option on to draw the effects from a uniform distribution; otherwise, the additive effects follow a normal distribution N(0, h2/N), in which h2 is the heritability and N is the number of loci.
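
To make the default N(0, h2/N) concrete, here is a minimal sketch of drawing per-locus effects with that variance. It is an illustration under stated assumptions, not GEAR's implementation: the function name and the use of Python's `random` module are choices made for this example, and the uniform case is omitted because its range is not documented here.

```python
import math
import random

def draw_normal_effects(n_loci, hsq, seed=None):
    """Draw n_loci additive effects from N(0, hsq / n_loci).

    Hypothetical helper; n_loci and hsq mirror --poly-loci and --simu-hsq.
    """
    rng = random.Random(seed)
    sd = math.sqrt(hsq / n_loci)  # standard deviation of one effect
    return [rng.gauss(0.0, sd) for _ in range(n_loci)]

# Defaults from this page: 1000 loci, heritability 0.5.
effects = draw_normal_effects(n_loci=1000, hsq=0.5, seed=2010)
sum_of_squares = sum(e * e for e in effects)
```

Because each effect has variance h2/N, the squared effects sum to roughly h2 across the N loci, so `sum_of_squares` should land near 0.5 here.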

--poly-effect

Specify the file that contains the effect for each locus. This option overrides --poly-U.

--simu-k

The prevalence of the cases in the population. It defaults to 0.05.

--simu-hsq

Specify the heritability of the trait on the liability scale. It defaults to 0.5.
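
The prevalence and the liability-scale heritability are usually tied together through the liability threshold model: an individual is a case when the liability (genetic value plus residual) exceeds the normal quantile that leaves a fraction K in the upper tail. The sketch below illustrates that standard model only. It assumes a liability with unit variance, and the function names are hypothetical rather than taken from GEAR.

```python
import math
import random
from statistics import NormalDist

def liability_threshold(prevalence):
    """Threshold t such that P(liability > t) = prevalence for N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - prevalence)

def simulate_individual(hsq, threshold, rng):
    """Return (is_case, liability) for one individual.

    Assumption: genetic value ~ N(0, hsq) and residual ~ N(0, 1 - hsq),
    so the total liability has variance 1.
    """
    genetic_value = rng.gauss(0.0, math.sqrt(hsq))
    residual = rng.gauss(0.0, math.sqrt(1.0 - hsq))
    liability = genetic_value + residual
    return liability > threshold, liability

# With the default prevalence of 0.05 the threshold is about 1.645.
threshold = liability_threshold(0.05)
```

Under this model, cases are rare draws from the upper tail, so collecting a balanced sample such as 500 cases and 500 controls means drawing many more individuals than are kept.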

--seed

Specify the seed for simulation.

--make-bed

The genotypes will be written in the plink binary format.

Examples

~~~~~~
gear --simu-cc 500,500 --poly-loci 100 --simu-k 0.01 --simu-hsq 0.8 --seed 2010 --out poly
gear --simu-cc 500,500 --poly-loci 100 --poly-loci-null 50 --simu-k 0.01 --simu-hsq 0.8 --seed 2010 --out poly
gear --simu-cc 500,500 --poly-loci 100 --simu-k 0.01 --simu-hsq 0.8 --make-bed --out poly
gear --simu-cc 500,500 --simu-order --poly-loci 100 --simu-k 0.01 --simu-hsq 0.8 --make-bed --out poly
gear --simu-cc 500,500 --poly-effect effect.txt --simu-k 0.01 --simu-hsq 0.8 --make-bed --out poly
~~~~~~
The output files include *.bim, *.fam, and *.bed (the genotype file in plink binary format).

*.phe: there are three columns included. The first two columns are family id and individual id. The 3rd column is the phenotypic value.
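
As a rough illustration of consuming the three-column phenotype file, the sketch below parses such lines into tuples. It assumes whitespace-separated columns, no header row, and a numeric third column; none of these details are confirmed on this page, and `parse_phe` is a hypothetical helper, not part of GEAR.

```python
def parse_phe(lines):
    """Parse lines of 'family_id individual_id phenotype' into tuples.

    Assumptions (not confirmed by the GEAR documentation): columns are
    whitespace-separated, there is no header, and the phenotype is numeric.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError("expected 3 columns, got: %r" % line)
        family_id, individual_id, value = fields[:3]
        records.append((family_id, individual_id, float(value)))
    return records

# Made-up rows, for illustration only.
records = parse_phe(["fam1 ind1 1", "fam2 ind2 0"])
```

With the examples above, which pass `--out poly`, the file would presumably be named `poly.phe`, and an open file handle can be passed directly as `lines`.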

*.breed: the genotypic value (3rd column) and the phenotypic value (4th column) on the liability scale.

*.rnd: there are three columns included. The 1st is the marker name, the 2nd is the reference allele, and the 3rd is its additive effect.

*.add: the genotype in additive model coding scheme.

*.cov: there are four columns included. The first two columns are family id and individual id, the third column is the probability given the liability, and the fourth column is the liability.

[Return to GEAR Home](https://github.com/gc5k/GEAR/wiki)