Merge branch 'jody'
aewebb80 committed Jul 6, 2020
2 parents 07daee0 + f00d3f1 commit 829646b
Showing 71 changed files with 340,561 additions and 1,661,803 deletions.
2 changes: 2 additions & 0 deletions .travis.yml
@@ -40,3 +40,5 @@ cache: pip
script:
- sh run_jared_tests.sh
- sh run_andrew_tests.sh
- sh run_jh_tests.sh

Empty file modified build.sh
100644 → 100755
Empty file.
Empty file modified docs/.nojekyll
100755 → 100644
Empty file.
6 changes: 6 additions & 0 deletions docs/source/PPP_pages/Analyses/vcf_to_sfs.rst
@@ -0,0 +1,6 @@
================================================
vcf_to_sfs.py: Site Frequency Spectrum generator
================================================
.. automodule:: vcf_to_sfs


2 changes: 0 additions & 2 deletions docs/source/PPP_pages/Functions/vcf_to_treemix.rst

This file was deleted.

4 changes: 4 additions & 0 deletions docs/source/PPP_pages/Input_File_Generators/vcf_to_dadi.rst
@@ -0,0 +1,4 @@
===============================================
vcf_to_dadi.py: VCF to dadi Conversion Function
===============================================
.. automodule:: vcf_to_dadi
@@ -0,0 +1,5 @@
=============================================================
vcf_to_fastsimcoal.py: VCF to fastsimcoal Conversion Function
=============================================================
.. automodule:: vcf_to_fastsimcoal

4 changes: 4 additions & 0 deletions docs/source/PPP_pages/Input_File_Generators/vcf_to_gphocs.rst
@@ -0,0 +1,4 @@
===================================================
vcf_to_gphocs.py: VCF to GPhocs Conversion Function
===================================================
.. automodule:: vcf_to_gphocs
@@ -0,0 +1,4 @@
=====================================================
vcf_to_treemix.py: VCF to treemix Conversion Function
=====================================================
.. automodule:: vcf_to_treemix
5 changes: 5 additions & 0 deletions docs/source/PPP_pages/Utilities/vcf_bed_to_seq.rst
@@ -0,0 +1,5 @@
========================================================
vcf_bed_to_seq.py: Generate sequences from VCF/BED Files
========================================================
.. automodule:: vcf_bed_to_seq

3 changes: 3 additions & 0 deletions docs/source/PPP_pages/analyses.rst
@@ -11,3 +11,6 @@ These functions were developed to perform the actual population genetic analyses
Analyses/admixture
Analyses/ima3_wrapper
Analyses/plink_linkage_disequilibrium
Analyses/vcf_to_sfs


10 changes: 7 additions & 3 deletions docs/source/PPP_pages/input_file_generators.rst
@@ -2,10 +2,14 @@
PPP Input File Generators
=========================

-Many techniques are incompatible with VCF files and require file conversion that may be computationally intensive and/or time consuming. To avoid potential redundancies, we have not automated conversions within the PPP. Instead we have developed multiple functions to generate the necessary input files.
+For several programs that implement analyses of population genomic variation, PPP provides scripts to generate input files from VCF files.

.. toctree::
:maxdepth: 1

-Functions/vcf_format_conversions
-Functions/vcf_to_ima
+Input_File_Generators/vcf_format_conversions
+Input_File_Generators/vcf_to_ima
+Input_File_Generators/vcf_to_treemix
+Input_File_Generators/vcf_to_gphocs
+Input_File_Generators/vcf_to_fastsimcoal
+Input_File_Generators/vcf_to_dadi
2 changes: 1 addition & 1 deletion docs/source/PPP_pages/utilities.rst
@@ -9,4 +9,4 @@ The utility functions were developed to perform various tasks often needed when

Utilities/vcf_utilities
Utilities/bed_utilities

Utilities/vcf_bed_to_seq
Empty file modified docs/source/conf.py
100755 → 100644
Empty file.
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -3,7 +3,7 @@ Popgen Pipeline Platform
========================

.. only:: not html

------------
Introduction
------------
@@ -21,7 +21,7 @@ The PPP was written using the Python programming language and designed to operat

The core functions of the PPP were designed to operate using VCF-based files primarily due to frequent support for the format among publicly available datasets and population genomics software. Most users will begin their pipelines with these core functions before moving onto an analysis function. Please note that most analysis functions require a preceding file conversion function to operate.

-Please Note: This documentation is currently being devlopement and will be updated freqeuntly in the coming days
+Please Note: This documentation is currently being developed and will be updated frequently in the coming days

.. toctree::
:maxdepth: 2
Empty file modified docs/source/requirements.txt
100755 → 100644
Empty file.
333,335 changes: 333,335 additions & 0 deletions jhtests/chr22_pan_example2_ref.fa

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions jhtests/chr22_pan_example2_ref.fa.fai
@@ -0,0 +1 @@
22 20000000 4 60 61
222 changes: 222 additions & 0 deletions jhtests/jhtests_run.py
@@ -0,0 +1,222 @@
import unittest
import filecmp
import sys
import os
import logging
import shutil
import zipfile

## Import scripts to test

import pgpipe.vcf_bed_to_seq as vcf_bed_to_seq
import pgpipe.vcf_to_sfs as vcf_to_sfs
import pgpipe.vcf_to_dadi as vcf_to_dadi
import pgpipe.vcf_to_treemix as vcf_to_treemix
import pgpipe.vcf_to_gphocs as vcf_to_gphocs
import pgpipe.vcf_to_fastsimcoal as vcf_to_fastsimcoal


## Used to compare two files, returns bool
def file_comp(test_output, expected_output):
    return filecmp.cmp(test_output, expected_output)


class jh_function_tests (unittest.TestCase):

    def test_vcf_bed_to_seq1 (self):
        vcf_bed_to_seq.run(['--vcf','pan_example.vcf.gz',
                            '--fasta-reference',"pan_example_ref.fa",
                            '--model-file',"panmodels.model",
                            '--modelname',"4Pop",
                            '--region',"21:4431001-4499000",
                            '--out','ppp_test_temp.out'])

        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_bed_to_seq_test.out'))

        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')

    def test_vcf_to_sfs1 (self):
        vcf_to_sfs.run(['--vcf',"pan_example2.vcf.gz",
                        '--model-file',"panmodels.model",
                        '--modelname','4Pop',
                        '--downsamplesizes','3','3','3','4',
                        '--folded','--outgroup-fasta',"chr22_pan_example2_ref.fa",
                        '--out',"ppp_test_temp.out"])

        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_to_sfs_test1.txt'))

        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')

    def test_vcf_to_sfs2 (self):
        vcf_to_sfs.run(['--vcf',"pan_example.vcf.gz",
                        '--model-file',"panmodels.model",'--modelname','5Pop',
                        '--downsamplesizes','3','3','3','4','2',
                        '--folded','--outgroup-fasta',"pan_example_ref.fa",
                        '--out',"ppp_test_temp.out"])

        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_to_sfs_test2.txt'))

        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')

    def test_vcf_to_gphocs1 (self):
        vcf_to_gphocs.run(['--vcf','pan_example.vcf.gz','--reference',
                           "pan_example_ref.fa",
                           '--model-file',"panmodels.model",'--modelname',"4Pop",
                           '--bed-file',"pan_example_regions.bed",
                           '--out','ppp_test_temp.out'])

        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_gphocs_test.out'))

        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')

    def test_vcf_to_gphocs2 (self):
        vcf_to_gphocs.run(['--vcf','pan_example2.vcf.gz','--reference',
                           "chr22_pan_example2_ref.fa",
                           '--model-file',"panmodels.model",'--modelname',"4Pop",
                           '--bed-file',"pan_test_regions.bed",
                           '--out','ppp_test_temp.out','--diploid','False'])

        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_gphocs_test2.out'))

        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')

    def test_vcf_to_treemix1(self):
        vcf_to_treemix.run(['--vcf','pan_example.vcf.gz',
                            '--model-file',"panmodels.model",
                            '--modelname',"4Pop",
                            '--out','ppp_test_temp.out',
                            '--bed-file',"pan_example_regions.bed",
                            '--kblock','1000'])
        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out.gz',
                                  'results/vcf_treemixtest1.gz'))

        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out.gz')

    def test_vcf_to_treemix2(self):
        vcf_to_treemix.run(['--vcf','pan_example.vcf.gz',
                            '--model-file',"panmodels.model",
                            '--modelname',"4Pop",
                            '--out','ppp_test_temp.out'])
        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out.gz',
                                  'results/vcf_treemixtest2.gz'))

        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out.gz')

    def test_vcf_to_fastsimcoal1(self):
        vcf_to_fastsimcoal.run(['--vcf',"pan_example.vcf.gz",
                                '--model-file',"panmodels.model",
                                '--modelname','3Pop',
                                '--dim','1','2','m',
                                '--basename','ppp_test_temp.out'])
        # must extract archives to check that files match
        pz = zipfile.ZipFile('ppp_test_temp.out.zip',mode='r')
        vz = zipfile.ZipFile('results/vcf_fsc1.zip',mode='r')
        pznl = pz.namelist()
        vznl = vz.namelist()
        assert len(pznl) == len(vznl)
        for fi in range(len(pznl)):
            pz.extract(pznl[fi])
            vz.extract(vznl[fi])
            # Confirm that the output is what is expected
            self.assertTrue(file_comp(pznl[fi],vznl[fi]))
            # remove extracted files
            self.addCleanup(os.remove, pznl[fi])
            self.addCleanup(os.remove, vznl[fi])

    def test_vcf_to_fastsimcoal2(self):
        vcf_to_fastsimcoal.run(['--vcf',"pan_example2.vcf.gz",
                                '--model-file',"panmodels.model",
                                '--modelname','5Pop',
                                '--downsamplesizes','3','3','3','4','2',
                                '--basename','ppp_test_temp.out',
                                '--folded','--dim','1','2','m',
                                '--outgroup-fasta',"chr22_pan_example2_ref.fa"])
        # must extract archives to check that files match
        pz = zipfile.ZipFile('ppp_test_temp.out.zip',mode='r')
        vz = zipfile.ZipFile('results/vcf_fsc2.zip',mode='r')
        pznl = pz.namelist()
        vznl = vz.namelist()
        assert len(pznl) == len(vznl)
        for fi in range(len(pznl)):
            pz.extract(pznl[fi])
            vz.extract(vznl[fi])
            # Confirm that the output is what is expected
            self.assertTrue(file_comp(pznl[fi],vznl[fi]))
            # remove extracted files
            self.addCleanup(os.remove, pznl[fi])
            self.addCleanup(os.remove, vznl[fi])

    def test_vcf_to_dadi1(self):
        vcf_to_dadi.run(['--vcf','pan_example.vcf.gz',
                         '--model-file',"panmodels.model",
                         '--modelname',"4Pop",'--out','ppp_test_temp.out',
                         '--comment','testing bedfile',
                         '--bed-file',"pan_example_regions.bed"])
        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_dadisnp_bedfile_test.out'))
        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')

    def test_vcf_to_dadi2(self):
        vcf_to_dadi.run(['--vcf',"pan_example2.vcf.gz",
                         '--model-file',"panmodels.model",
                         '--modelname',"4Pop",'--out','ppp_test_temp.out',
                         '--comment','testing comment'])
        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_dadisnp_test.out'))
        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')

    def test_vcf_to_dadi3(self):
        vcf_to_dadi.run(['--vcf',"pan_example2.vcf.gz",'--model-file',
                         "panmodels.model",
                         '--modelname',"4Pop",'--out','ppp_test_temp.out',
                         '--comment','testing outgroup-fasta','--outgroup-fasta',
                         "chr22_pan_example2_ref.fa"])
        # Confirm that the output is what is expected
        self.assertTrue(file_comp('ppp_test_temp.out',
                                  'results/vcf_dadisnp_fasta_test.out'))
        # Remove the output
        self.addCleanup(os.remove, 'ppp_test_temp.out')


if __name__ == "__main__":
    unittest.main(verbosity = 2)
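
The two fastsimcoal tests above extract both archives into the working directory and pair members by position. `ZipFile.extract` writes each member under its own name, so if the two archives ever shared member names, the second extraction would overwrite the first and the comparison would be between a file and itself. A minimal sketch of one alternative compares member bytes in memory instead. It assumes both archives are expected to use the same member names, which the tests above do not require. `zip_members_equal` and `make_zip` are hypothetical helpers, not part of PPP.

```python
import io
import zipfile


def zip_members_equal(archive_a, archive_b):
    """Return True if both archives hold the same member names with identical bytes.

    Hypothetical helper: accepts paths or file-like objects, as ZipFile does.
    """
    with zipfile.ZipFile(archive_a) as za, zipfile.ZipFile(archive_b) as zb:
        names_a = sorted(za.namelist())
        names_b = sorted(zb.namelist())
        if names_a != names_b:
            return False
        # Read each member straight from the archive; nothing is written to disk
        return all(za.read(name) == zb.read(name) for name in names_a)


def make_zip(members):
    """Build an in-memory archive from a {name: text} mapping (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


same = zip_members_equal(make_zip({"a.obs": "1 2 3\n"}),
                         make_zip({"a.obs": "1 2 3\n"}))
different = zip_members_equal(make_zip({"a.obs": "1 2 3\n"}),
                              make_zip({"a.obs": "1 2 4\n"}))
```

Because nothing is extracted, no cleanup of extracted files is needed with this approach.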



File renamed without changes.
Binary file added jhtests/pan_example.vcf.gz
Binary file not shown.
Binary file added jhtests/pan_example.vcf.gz.tbi
Binary file not shown.
2 changes: 2 additions & 0 deletions jhtests/pan_example2.fa.fai
@@ -0,0 +1,2 @@
21 14365450 4 50 51
22 19389255 14652767 50 51
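
For readers unfamiliar with `.fai` files: in the samtools faidx index format, the five columns are sequence name, sequence length in bases, byte offset of the first base, bases per line, and bytes per line (including the newline). The sketch below parses the two lines above and checks that they are mutually consistent. `FaiRecord`, `parse_fai_line`, and `byte_offset` are hypothetical names used only for illustration, and the check assumes the second FASTA header is exactly `>22` followed by a newline, which the index does not state.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str        # sequence name
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    linebases: int   # bases per line
    linewidth: int   # bytes per line, including the newline


def parse_fai_line(line: str) -> FaiRecord:
    """Parse one whitespace-separated line of a .fai index."""
    name, length, offset, linebases, linewidth = line.split()
    return FaiRecord(name, int(length), int(offset), int(linebases), int(linewidth))


def byte_offset(rec: FaiRecord, pos0: int) -> int:
    """Byte offset in the FASTA file of the 0-based position pos0."""
    if not 0 <= pos0 < rec.length:
        raise ValueError("position outside sequence")
    full_lines, within_line = divmod(pos0, rec.linebases)
    return rec.offset + full_lines * rec.linewidth + within_line


chr21 = parse_fai_line("21 14365450 4 50 51")
chr22 = parse_fai_line("22 19389255 14652767 50 51")

# Byte offset of the last base of sequence 21
last_21 = byte_offset(chr21, chr21.length - 1)

# After that base come one newline and the assumed 4-byte header ">22\n",
# so the first base of sequence 22 should sit at the offset the index records.
expected_chr22_offset = last_21 + 1 + len(">22\n") + 1
```

Under that header assumption, `expected_chr22_offset` equals the offset recorded for sequence 22.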
1,114 changes: 1,114 additions & 0 deletions jhtests/pan_example2.vcf

Large diffs are not rendered by default.

Binary file added jhtests/pan_example2.vcf.gz
Binary file not shown.
Binary file added jhtests/pan_example2.vcf.gz.tbi
Binary file not shown.
File renamed without changes.
2 changes: 2 additions & 0 deletions jhtests/pan_example_ref.fa.fai
@@ -0,0 +1,2 @@
21 14365450 4 50 51
22 19389255 14652767 50 51
File renamed without changes.
3 changes: 3 additions & 0 deletions jhtests/pan_test_regions.bed
@@ -0,0 +1,3 @@
22 14430599 14431600
22 15658547 15659548
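
As a reading aid for the BED file above: BED intervals use a 0-based start and an exclusive end, so an interval's length is `end - start`. The sketch below parses the two intervals and converts them to 1-based, inclusive `chrom:start-end` strings of the kind used by samtools and tabix. `parse_bed_line` and `bed_to_region` are hypothetical helpers, and whether the PPP `--region` argument follows the same 1-based convention is an assumption the source does not confirm.

```python
def parse_bed_line(line):
    """Split a BED line into (chrom, start, end); extra columns are ignored."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def bed_to_region(chrom, start, end):
    """Convert a 0-based, half-open BED interval to a 1-based, inclusive region string."""
    return f"{chrom}:{start + 1}-{end}"


bed_text = """22 14430599 14431600
22 15658547 15659548
"""

# Skip blank lines such as the trailing one in the file above
intervals = [parse_bed_line(line) for line in bed_text.splitlines() if line.strip()]
lengths = [end - start for _, start, end in intervals]
regions = [bed_to_region(*interval) for interval in intervals]
```

Both intervals in this file span 1,001 bases.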

File renamed without changes.
27 changes: 27 additions & 0 deletions jhtests/results/vcf_bed_to_seq_test.out

Large diffs are not rendered by default.
