ngsUtils

In most cases you won't need them but, in case you do, several general-purpose programs are available to manipulate intermediate files. Keep in mind that they haven't been extensively tested!

Installation

To install the entire package just download the source code:

% git clone https://github.com/mfumagalli/ngsUtils.git

and run:

% cd ngsUtils
% make

To run the tests (only if installed through ngsTools):

% make test

Executables are built into the main directory. If you wish to clean all binaries and intermediate files:

% make clean

Usage

Examples:

  • GetMergedGeno - Merge genotype posterior probabilities files. Since GetMergedGeno needs two files as input, the classic -infile option is replaced by -infiles, which takes the paths of two files.

% ./GetMergedGeno -infiles file1.geno file2.geno -nind 5 10 -nsites 10000 500000 -outfile out -verbose 0

  • GetSubGeno - Select a subset of genotype posterior probabilities files.

% ./GetSubGeno -infile file.geno -posfile pos_file.txt -nind 5 -nsites 10000 -len 2474 -outfile out -verbose 0

  • GetSubSim - Select a subset of individuals and sites from simulated data files generated by ngsSim (.glf.gz). It extracts the first -nind_new individuals and the first -nsites_new sites from the original data.

% ./GetSubSim -infile file.geno -nind 5 -nsites 10000 -ncat 11 -outfile out -nind_new 2 -nsites_new 5000 -check 0

  • GetSwitchedGeno - Switch major/minor or ancestral/derived in genotype posterior probabilities files.

% ./GetSwitchedGeno -infile file.geno -posfile pos_file.txt -nind 5 -nsites 10000 -len 2474 -outfile out -verbose 0
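
GetSubGeno and GetSwitchedGeno both take a -posfile together with its line count in -len. Rather than counting by hand, the value can be derived from the file itself. The following is a minimal sketch, assuming the position file holds one 1-based entry per line; the sample file and its contents are hypothetical and only serve to make the snippet runnable on its own.

```shell
# Hypothetical sample position file: one 1-based site index per line (assumed layout).
posfile=$(mktemp)
printf '%s\n' 3 17 42 > "$posfile"

# Count the lines; tr strips the padding that some wc implementations add.
len=$(wc -l < "$posfile" | tr -d ' ')
echo "len=$len"

# The value could then be passed on, for example:
# ./GetSubGeno -infile file.geno -posfile "$posfile" -nind 5 -nsites 10000 -len "$len" -outfile out -verbose 0

rm -f "$posfile"
```

Note that wc -l counts newline characters, so a position file whose last line lacks a trailing newline is reported one line short.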

Parameters:

Each program uses only a subset of these options; please check the examples above.

  • -infile FILE: path to input file
  • -nind INT: number of individuals in input file(s)
  • -nsites INT: number of sites in input file(s)
  • -posfile FILE: text file with list of sites/lines (1-based)
  • -len INT: length (number of lines) of posfile
  • -ncat INT: number of genotype categories [10], leave the default value
  • -nind_new INT: number of individuals to extract (starting from the beginning)
  • -nsites_new INT: number of sites to extract (starting from the beginning)
  • -check INT: additional verbose check of sanity of file dimensions (for debugging)
  • -outfile FILE: prefix for output files
  • -verbose INT: extra information is printed on screen
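
Because -posfile entries are 1-based and -nsites gives the number of sites in the input, it may be worth checking a position file against that range before running a program. The sketch below is one possible pre-check and is not part of ngsUtils; it assumes one integer per line, and the file contents and the nsites value are hypothetical. Whether the programs perform a similar check themselves is not stated here.

```shell
# Hypothetical values for illustration only.
nsites=10000
posfile=$(mktemp)
printf '%s\n' 1 2500 10000 > "$posfile"

# Count entries that are not integers within 1..nsites.
bad=$(awk -v n="$nsites" '$1 !~ /^[0-9]+$/ || $1 < 1 || $1 > n { c++ } END { print c + 0 }' "$posfile")

if [ "$bad" -eq 0 ]; then
  status="ok"
else
  status="out of range"
fi
echo "$status ($bad invalid entries)"

rm -f "$posfile"
```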

Further examples can be found here.