GitHub - ridgelab/SelecT: A genome-wide study of evolution


 ____       _          _____ 
/ ___|  ___| | ___  __|_   _|
\___ \ / _ \ |/ _ \/ __|| |
 ___) |  __/ |  __/ (__ | |
|____/ \___|_|\___|\___||_|

Created by RidgeLab Group, BYU Bioinformatics



Table of Contents
-----------------

  I. Introduction
 II. Installation Instructions
III. Usage Instructions and Examples
 IV. Funding and Acknowledgements
  V. Contact



I. Introduction
---------------
SelecT is a software tool developed to ___.

Please see our paper in __journal__ for further information:
    http://sub-domain.domain.tld/some/path/to/resource



II. Installation Instructions
-----------------------------
To install SelecT, first ensure the Java Runtime Environment (JRE) is installed
on your machine.  Second, download the software from the git repository as
follows:
    git clone https://github.com/ridgelab/SelecT.git



III. Usage Instructions and Examples
-------------------------------------
These instructions are written for a high-performance computing cluster.
Modifications may be necessary for other setups. The pipeline is divided into
three phases.

Please note, a log file will be created in the current working directory when
any part of SelecT is run. Should the same part of SelecT be run again in the
same directory, a number (first `1', then `2', etc.) will be appended to the new
logfile name so as to avoid collisions.
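
The numbering scheme described above can be pictured with a short shell
sketch. It only illustrates the described behavior and is not SelecT's own
code: the helper name `next_log_name' and the file name `run.log' are
hypothetical, since the actual log file names are not stated here.

```shell
# Hypothetical helper: print the first unused name among
# BASE, BASE1, BASE2, ... (mirrors the scheme described above).
next_log_name() {
    base=$1
    if [ ! -e "$base" ]; then
        echo "$base"
        return 0
    fi
    n=1
    while [ -e "$base$n" ]; do
        n=$((n + 1))
    done
    echo "$base$n"
}

# Example: with run.log and run.log1 already present in the current
# directory, `next_log_name run.log` prints run.log2.
```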


--------------------------------
| PHASE 1 -- Environment Setup |
--------------------------------

Required Positional Arguments:
[1]    Data Directory        Directory should contain all phased VCF or
                HAP/LEGEND files required for selection analysis. File names must
                contain proper flags and file extensions. Include optional (but
                highly recommended) ancestral data embedded in VCF files or as
                separate LEGEND/EMF files.

[2]    Map Directory        Directory that contains all required genetic map
                files for SelecT analysis. File names must contain proper
                chromosome flags.

[3]    Start Chromosome    Must be a number from 1 to 22; sex chromosomes not
                yet supported.

[4]    End Chromosome        Must be a number from 1 to 22 and greater than or
                equal to Start Chromosome; sex chromosomes not yet supported.

[5]    Target Population    Population identifier for experimental population.
                TST can be used if no standard identifier exists.

[6]    Cross Population    Population identifier for cross population. TST can
                be used if no standard identifier exists. Cross Population
                cannot be the same as Target Population.

Optional Arguments:
--out_pop    Outgroup Population    Population identifier for outgroup
                    population. TST can be used if no standard identifier
                    exists. Outgroup Population cannot be the same as Target
                    Population.

--working_dir    Working Directory    Defines the directory where SelecT will
                    create a new working directory. Default is current
                    directory.

--win_size    Window Size        For changing SelecT analysis window size (in
                    megabases). Default is 0.5 Mb.

Examples:
java -Xmx[MB]m -jar EnviSetup.jar [1] [2] [3] [4] [5] [6] \
        --working_dir=path/to/directory
java -Xmx3000m -jar EnviSetup.jar example/haplegend_data example/map 21 21 CEU YRI \
        --working_dir=example
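
The rules listed for the positional arguments (chromosomes from 1 to 22, End
Chromosome not below Start Chromosome, distinct Target and Cross populations)
can be checked before the JVM is started. The following is a minimal sketch
under that assumption; `check_phase1_args' is a hypothetical helper and is not
part of SelecT.

```shell
# Hypothetical pre-flight check for the six Phase 1 positional arguments.
# It only mirrors the rules listed above and is not part of SelecT.
check_phase1_args() {
    if [ "$#" -ne 6 ]; then
        echo "expected 6 positional arguments, got $#" >&2
        return 1
    fi
    data_dir=$1; map_dir=$2; start_chr=$3; end_chr=$4
    target_pop=$5; cross_pop=$6

    # Both directories must exist.
    for dir in "$data_dir" "$map_dir"; do
        if [ ! -d "$dir" ]; then
            echo "not a directory: $dir" >&2
            return 1
        fi
    done

    # Chromosomes must be whole numbers from 1 to 22.
    for chr in "$start_chr" "$end_chr"; do
        case $chr in
            ''|*[!0-9]*)
                echo "chromosome must be a whole number: $chr" >&2
                return 1 ;;
        esac
        if [ "$chr" -lt 1 ] || [ "$chr" -gt 22 ]; then
            echo "chromosome out of range (1-22): $chr" >&2
            return 1
        fi
    done

    if [ "$end_chr" -lt "$start_chr" ]; then
        echo "End Chromosome must be >= Start Chromosome" >&2
        return 1
    fi

    if [ "$target_pop" = "$cross_pop" ]; then
        echo "Cross Population must differ from Target Population" >&2
        return 1
    fi
}
```

One possible use is to run the documented command only when the check passes:

    check_phase1_args example/haplegend_data example/map 21 21 CEU YRI && \
        java -Xmx3000m -jar EnviSetup.jar example/haplegend_data example/map \
            21 21 CEU YRI --working_dir=example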



-----------------------------------
| PHASE 2 -- Calculate Statistics |
-----------------------------------

Required Positional Arguments:
[1]    Working Directory    SelecT working directory created in Phase 1.
                Working Directory name can be changed but subdirectory names
                must be unchanged.

[2]    Simulation Directory    Directory where simulations can be found.
                Must contain simulation file neutral_simulation.tsv and
                selection_simulation.tsv. These can be found here:
                https://github.com/ridgelab/SelecT/tree/master/example/sim

[3]    Chromosome        Chromosome number where window can be found

[4]    Window Number        Window index number as defined by SelecT environment
                setup. See SelecT_workspace/envi_files/all_wins for window
                ranges.

Optional Arguments:
-inon        Non-absolute iHS    Runs iHS score probabilities where ONLY large
             negative scores are associated with selection (replicates
             CMS_local). By default, both large positive AND large negative
             iHS scores indicate stronger selection (Voight).

--prior_prob    Prior Probability    Set Prior Probability to a custom value
                    between 0.0 and 1. Defaults to (1 / actual number of
                    variants within window).

-pp        Prior Probability Flag    Sets Prior Probability to 1/10,000
                    (replicate CMS_local).

--daf_cutoff    DAF Cutoff        Defines the derived allele frequency cutoff
                    for the composite score. Defaults to a DAF value of 0.2.
                    Special case: a DAF value of 0.0 indicates incomplete MoP
                    score calculation; PoP is unchanged.
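
As a worked example of the default prior probability described above, the
value is the reciprocal of the number of variants in the window. The sketch
below computes it with awk. `default_prior' is a hypothetical helper name and
the variant counts are made-up illustrations, not values produced by SelecT.

```shell
# Hypothetical helper: default prior = 1 / (number of variants in the window).
default_prior() {
    awk -v n="$1" 'BEGIN {
        if (n + 0 <= 0) exit 1      # reject zero, negative, or non-numeric counts
        printf "%.6g\n", 1 / n
    }'
}

default_prior 2500     # prints 0.0004
default_prior 10000    # prints 0.0001, the same value the -pp flag fixes
```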

Examples:
java -Xmx[MB]m -jar StatsCalc.jar [1] [2] [3] [4]
java -jar StatsCalc.jar example/SelecT_workspace example/sim 21 2

Note, this could be run by hand on a local machine, but we imagine automating
the process a bit. A simple example SLURM script is provided (`selection.slurm')
for running StatsCalc.jar for a single window.  A possible bash script for
submitting this SLURM script for each window to a high-performance computing
cluster running SLURM might look like this:

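    #! /bin/bash

    # Chromosome to analyze.
    chromosome=21

    # Submit one SLURM job per window; adjust the range to match the
    # windows listed in SelecT_workspace/envi_files/all_wins.
    for window in {0..3}
    do
        sbatch selection.slurm "$chromosome" "$window"
    done

    exit 0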
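
The window range in the loop above is hardcoded. A minimal variation, assuming
the same `selection.slurm' script, prints the sbatch commands for any range
first so they can be reviewed before submission. The function name
`print_submissions' is hypothetical, and the first and last window indices
still have to be read from SelecT_workspace/envi_files/all_wins by hand.

```shell
# Hypothetical helper: print one sbatch command per window in FIRST..LAST
# for the given chromosome. Nothing is submitted by this function.
print_submissions() {
    chromosome=$1
    first=$2
    last=$3
    window=$first
    while [ "$window" -le "$last" ]; do
        echo "sbatch selection.slurm $chromosome $window"
        window=$((window + 1))
    done
}

# Review the commands first:
#     print_submissions 21 0 3
# Then submit them by piping the same output to a shell:
#     print_submissions 21 0 3 | sh
```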



-----------------------------------
| PHASE 3 -- Analyze Significance |
-----------------------------------

Required Positional Arguments:
[1]     Working Directory    SelecT working directory created in Phase 1.
                Working Directory name can be changed but subdirectory names
                must be unchanged.

[2]    Chromosome        Chromosome number where window can be found

Optional Arguments:
-co        Combine Only    Only runs first half of analysis where windows are
                combined into one file.

--combine_fltr    Combine Filter    Prints only a chosen combination of stats in
                the output. Can only be used in conjunction with the -co flag.
                Tags: i=iHS, x=XPEHH, h=iHH, dd=dDAF, d=DAF, f=Fst,
                up=unstd_PoP, um=unstd_MoP, p=PoP, m=MoP. Separate tags with
                colons (e.g. i:x:h:dd:d:f:up:um:p:m).

--p_value    p_Value        Sets the p-value cutoff for significance check on
                composite scores. Defaults to 0.01.

-wc        Write Combine    Similar to -co, but also runs significance filtering.

-rn        Normalization    Runs normalization step across the entire
                dataset/chromosome. Normalizes by standardization (mean 0;
                standard deviation 1).

-ui        Use Incomplete    Use incomplete data when analyzing MoP scores

-im        Ignore MoP    Ignores all MoP scores and finds significance based
                upon PoP only.

-ip        Ignore PoP     Ignores all PoP scores and finds significance based
                upon MoP only. If both -im and -ip flags are present,
                significance is found by looking at either PoP or MoP. If
                neither the -im nor the -ip flag is present, significance is
                found by looking at both PoP and MoP.
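
The four -im/-ip combinations described above can be summarized as a small
decision table. The sketch below reads "either" as a logical OR and "both" as
a logical AND, which is an interpretation of the text rather than a
description of SelecT's code. `is_significant' and its 0/1 arguments are
hypothetical.

```shell
# Hypothetical decision table. Arguments are 1 (yes) or 0 (no):
#   $1 PoP passes the cutoff    $2 MoP passes the cutoff
#   $3 -im given (ignore MoP)   $4 -ip given (ignore PoP)
# The exit status is 0 when the variant would count as significant.
is_significant() {
    pop_sig=$1; mop_sig=$2; ignore_mop=$3; ignore_pop=$4
    if [ "$ignore_mop" = 1 ] && [ "$ignore_pop" = 1 ]; then
        # Both flags: either score alone is enough.
        [ "$pop_sig" = 1 ] || [ "$mop_sig" = 1 ]
    elif [ "$ignore_mop" = 1 ]; then
        # -im only: PoP decides.
        [ "$pop_sig" = 1 ]
    elif [ "$ignore_pop" = 1 ]; then
        # -ip only: MoP decides.
        [ "$mop_sig" = 1 ]
    else
        # Neither flag: both scores must pass.
        [ "$pop_sig" = 1 ] && [ "$mop_sig" = 1 ]
    fi
}
```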

Examples:
java -Xmx[MB]m -jar SignificanceAnalyzer.jar [1] [2]
java -jar SignificanceAnalyzer.jar example/SelecT_workspace 21
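
The colon-separated value for --combine_fltr listed under Optional Arguments
can be assembled and checked with a small helper. This is a sketch only:
`build_combine_filter' is a hypothetical name, and it accepts just the tags
documented above.

```shell
# Hypothetical helper: join stat tags with colons after checking each one
# against the tag list documented for --combine_fltr.
build_combine_filter() {
    filter=""
    for tag in "$@"; do
        case $tag in
            i|x|h|dd|d|f|up|um|p|m) ;;
            *)
                echo "unknown tag: $tag" >&2
                return 1 ;;
        esac
        # Add a colon only between tags, never before the first one.
        filter="${filter:+$filter:}$tag"
    done
    if [ -z "$filter" ]; then
        echo "no tags given" >&2
        return 1
    fi
    echo "$filter"
}
```

The result might then be passed along with -co, assuming the option takes its
value in the same `--name=value' form shown for --working_dir in Phase 1:

    java -jar SignificanceAnalyzer.jar example/SelecT_workspace 21 -co \
        --combine_fltr="$(build_combine_filter i x p m)"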



IV. Funding and Acknowledgements
-------------------------------
Funding for the research and production of this software was provided by
startup funds to Perry Ridge, Ph.D.



V. Contact
-----------
For questions, comments, concerns, feature requests, suggestions, etc., please
contact:

Hayden Smith -- smithinformatics@gmail.com
Perry Ridge, Ph.D. -- perry.ridge@byu.edu

Note: For usage questions, please consult section `III. Usage Instructions and
Examples' first.