nayoung9/PAPipe

PAPipe: a comprehensive pipeline for population genetic analysis

Main workflow

  1. Read trimming (by Trim Galore)
  2. Read mapping (by BWA or Bowtie 2)
  3. Genetic variant calling (by GATK3, GATK4 or BCFtools)
  4. Data filtering and format conversion (by PLINK v 1.9)
  5. Population genetic analyses
    • Principal component analysis (by PLINK v 1.9 or PLINK v 2.0)
    • Phylogenetic tree analysis (by SNPhylo)
    • Population tree analysis (by TreeMix)
    • Population structure analysis (by ADMIXTURE)
    • Linkage disequilibrium decay analysis (by PopLDdecay)
    • Selective sweep analysis (by SweepFinder2)
    • Population admixture analysis (by AdmixTools)
    • Pairwise sequentially Markovian coalescent analysis (by psmc)
    • Multiple sequentially Markovian coalescent analysis (by msmc2)
    • Fixation index analysis (by VCFtools)

Install Docker Engine (requires root permission)

Skip this step if Docker Engine is already installed on your machine (see the Docker installation document).

curl -fsSL https://get.docker.com/ | sudo sh

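Add your user account to the docker group (requires root permission)

Skip this step if your account is already in the docker group. After running the command, log out and log back in so that the new group membership takes effect.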

sudo usermod -aG docker $USER 	
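
To decide whether this step can be skipped, you can inspect the groups of your current session. The following is a minimal sketch, not part of PAPipe: the helper name has_group is hypothetical, and it assumes group names contain no whitespace.

```shell
#!/bin/sh
# has_group NAME LIST: succeed if NAME is one of the
# space-separated group names in LIST (hypothetical helper).
has_group() {
    for g in $2; do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}

# "id -nG" prints the group names of the current session.
if has_group docker "$(id -nG)"; then
    echo "already in the docker group"
else
    echo "not in the docker group; run: sudo usermod -aG docker \$USER"
fi
```

A session that was opened before the usermod command will still report the old groups until you log in again.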

Install the PAPipe Docker image

wget http://bioinfo.konkuk.ac.kr/PAPipe/PAPipe.tar.gz    # Download the Docker image file
docker load -i ./PAPipe.tar.gz    # Load the Docker image file
docker image ls    # Check that the image loaded correctly ("REPOSITORY:pap_docker, TAG:latest" must be listed)

Run PAPipe

Setting up local input directories (Caution: do not change the directory names or the directory structure)

mkdir RUN_DOCKER/
cd RUN_DOCKER/

mkdir data/
cd data/

mkdir ref/
mkdir input/
  • Place the following two files of a reference species in RUN_DOCKER/data/ref/

    • Genome assembly file (gzip-compressed FASTA file with an extension .fa.gz)
    • dbSNP VCF file (gzip-compressed VCF file with an extension .vcf.gz)
  • Place all other input data (read sequence files, read mapping files, or variant calling files) in RUN_DOCKER/data/input/

    • First, create a separate directory for each population in the "input" directory
    • Then, place files of each population in its directory (example below)
      • Files for Angus in RUN_DOCKER/data/input/Angus/
      • Files for Jersey in RUN_DOCKER/data/input/Jersey/
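
The whole layout described above can also be created in one pass. The following is a minimal sketch under the assumption that your populations are the two example populations, Angus and Jersey; replace them with your own population names. Run it in the directory where RUN_DOCKER should be created.

```shell
#!/bin/sh
set -eu

# Base directory; the name RUN_DOCKER is fixed by PAPipe.
RUN_DIR="RUN_DOCKER"

# Directory for the reference genome assembly and dbSNP VCF files.
mkdir -p "$RUN_DIR/data/ref"

# One input directory per population (example names from this README).
for pop in Angus Jersey; do
    mkdir -p "$RUN_DIR/data/input/$pop"
done

# List the resulting directory tree for a quick visual check.
find "$RUN_DIR" -type d | sort
```

Because mkdir -p is used, running the script again on an existing layout leaves it unchanged.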

Preparing parameter files

PAPipe requires the following three parameter files:

  • main_sample.txt: settings for populations and samples
  • main_input.txt: settings for input data files
  • main_param.txt: parameters controlling PAPipe and the tools it runs

The above three files must be placed in the above "RUN_DOCKER" directory.
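
Before starting the container, it can help to confirm that everything is in place. The following is a minimal sketch of such a pre-flight check, not a PAPipe feature: the function name check_run_dir is hypothetical, and it only tests that the three parameter files and the two data directories exist, not that their contents are valid.

```shell
#!/bin/sh
# check_run_dir DIR: report missing parameter files or data
# directories under DIR; returns non-zero if anything is missing.
check_run_dir() {
    dir=$1
    missing=0
    for f in main_sample.txt main_input.txt main_param.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing parameter file: $dir/$f" >&2
            missing=1
        fi
    done
    for d in data/ref data/input; do
        if [ ! -d "$dir/$d" ]; then
            echo "missing directory: $dir/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the RUN_DOCKER directory in the current location.
if check_run_dir ./RUN_DOCKER; then
    echo "RUN_DOCKER looks complete"
else
    echo "RUN_DOCKER is incomplete"
fi
```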

You can easily generate the parameter files using our parameter file generator.

Check out more details about the parameter file generator here.

Creating a Docker container that mounts the above "RUN_DOCKER" directory

docker run -v [absolute path of the "RUN_DOCKER" directory]:/RUN_DOCKER/  -it pap_docker:latest
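
Since the -v option needs an absolute host path, the path can be resolved automatically instead of being typed by hand. The following is a minimal sketch: the function name papipe_run_cmd is hypothetical, and it only prints the command so that you can review it before running it yourself.

```shell
#!/bin/sh
# papipe_run_cmd DIR: print the docker run command that mounts DIR
# (resolved to an absolute path) at /RUN_DOCKER/ in the container.
papipe_run_cmd() {
    # "cd ... && pwd" resolves a relative path in a POSIX-portable way.
    abs=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || return 1
    printf 'docker run -v "%s:/RUN_DOCKER/" -it pap_docker:latest\n' "$abs"
}

# Example: print the command for ./RUN_DOCKER in the current location.
papipe_run_cmd ./RUN_DOCKER || echo "RUN_DOCKER directory not found" >&2
```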

Running PAPipe inside the Docker container

# Run in the docker container
cd /RUN_DOCKER/
python3 /PAPipe/bin/main.py  -P ./main_param.txt  -I ./main_input.txt -A ./main_sample.txt &> ./log

Analysis results will be generated in the output directory specified in the "main_param.txt" file.

Check out more details about the analysis results here.

Generating HTML pages for browsing analysis results

All analysis results are available in the output directory specified in the "main_param.txt" file.

PAPipe can also generate HTML pages that make the analysis results easier to browse.

# Run in the docker container
perl /PAPipe/bin/webEnvSet.pl ./out &> webenvset.log    # Assumes "out" is the output directory set in the "main_param.txt" file
cd ./out/web/
perl /PAPipe/bin/html/prep_html.pl ./ &> ./webgen.log

After successfully running the above commands, follow the two steps below to open the HTML pages.

  • Download the entire "web" directory to your personal computer.
  • Open the "index.html" file in the "web" directory using any web browser.

If your machine supports a graphical user interface, you can open the "index.html" file in the "web" directory directly, without downloading the directory to your personal computer.

Check out more details about the generated HTML pages here.

Run PAPipe with test data

You can test PAPipe using a small test dataset. Check out more details here.

