Added additional output formats to vcf_filter and edited vcf_calc to support a list of filtered positions
aewebb80 committed Sep 15, 2017
1 parent 432fc0d commit 1851d97
Showing 10 changed files with 151 additions and 9 deletions.
15 changes: 15 additions & 0 deletions andrew/out.filter.log
@@ -0,0 +1,15 @@

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--gzvcf example/input/merged_chr1_10000.vcf.gz
--max-alleles 2
--min-alleles 2
--removed-sites

Using zlib version: 1.2.8
After filtering, kept 38 out of 38 Individuals
Outputting Removed Sites...
After filtering, kept 19 out of a possible 10000 Sites
Run Time = 0.00 seconds
19 changes: 19 additions & 0 deletions andrew/out.removed.sites
@@ -0,0 +1,19 @@
CHROM POS
chr1 1036917
chr1 1048024
chr1 1051951
chr1 1056730
chr1 1079307
chr1 1084295
chr1 1096361
chr1 1101571
chr1 1122398
chr1 1134614
chr1 1135724
chr1 1148630
chr1 1194151
chr1 1466991
chr1 1472522
chr1 1504742
chr1 1508112
chr1 1508620
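
According to the logs in this commit, the site list above was written by a `--removed-sites` run and later passed to `--exclude-positions`. The following is a minimal sketch of how such a CHROM/POS table could be loaded for inspection. It assumes whitespace-separated columns and a `CHROM POS` header row; `read_site_list` is a hypothetical helper and is not part of vcf_filter, vcf_calc, or vcftools.

```python
def read_site_list(lines):
    """Return a set of (chrom, pos) tuples from a CHROM/POS table.

    Hypothetical helper: assumes two whitespace-separated columns and
    an optional header line whose first field is 'CHROM'.
    """
    sites = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row
        if len(fields) < 2 or fields[0] == 'CHROM':
            continue
        sites.add((fields[0], int(fields[1])))
    return sites


# Two entries copied from the list above, used as an in-memory example
example_lines = ["CHROM\tPOS", "chr1\t1036917", "chr1\t1048024"]
sites = read_site_list(example_lines)
```

Passing an open file handle instead of a list works the same way, since the function only iterates over lines.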
34 changes: 34 additions & 0 deletions andrew/out.windowed.weir.fst
@@ -0,0 +1,34 @@
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST MEAN_FST
chr1 1030001 1040000 11 0.122493 0.0941129
chr1 1040001 1050000 14 0.748157 0.400248
chr1 1050001 1060000 14 0.671993 0.237697
chr1 1060001 1070000 5 0.245747 0.158881
chr1 1070001 1080000 12 0.211005 0.169369
chr1 1080001 1090000 6 0.21397 0.190779
chr1 1090001 1100000 5 -0.0539413 -0.0615674
chr1 1100001 1110000 6 0.00701423 0.0225826
chr1 1110001 1120000 22 0.108047 0.0214824
chr1 1120001 1130000 6 0.305248 0.14991
chr1 1130001 1140000 5 0.708117 0.382197
chr1 1140001 1150000 3 -0.0730342 -0.0704015
chr1 1150001 1160000 5 0.435111 0.26577
chr1 1160001 1170000 1 -0.055794 -0.055794
chr1 1170001 1180000 17 -0.010333 0.0226575
chr1 1180001 1190000 10 0.496138 0.043603
chr1 1190001 1200000 7 0.324824 0.200795
chr1 1200001 1210000 5 0.0104593 0.0381257
chr1 1210001 1220000 1 -0.0464602 -0.0464602
chr1 1220001 1230000 6 0.755604 0.11035
chr1 1230001 1240000 4 0.127154 0.122056
chr1 1240001 1250000 12 0.528519 0.234784
chr1 1270001 1280000 1 -0.060459 -0.060459
chr1 1280001 1290000 1 0.926057 0.926057
chr1 1290001 1300000 1 0.174957 0.174957
chr1 1320001 1330000 1 0.509247 0.509247
chr1 1340001 1350000 3 0.0743918 0.0423312
chr1 1350001 1360000 1 -0.0626921 -0.0626921
chr1 1460001 1470000 5 0.495124 0.117919
chr1 1470001 1480000 29 0.403198 0.244726
chr1 1480001 1490000 11 0.419534 0.12102
chr1 1490001 1500000 4 0.218818 0.0282106
chr1 1500001 1510000 23 0.426864 0.115176
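
Several windows in the table above contain a single variant, in which case WEIGHTED_FST and MEAN_FST are identical and can look extreme (for example the 1280001-1290000 window). A rough sketch of loading the table and picking the highest-weighted window with a minimum variant count follows. It assumes whitespace-separated columns with the header shown above; `read_windowed_fst` and `top_window` are hypothetical names, not part of vcf_calc.

```python
def read_windowed_fst(lines):
    """Parse a windowed Fst table into a list of dicts (hypothetical helper)."""
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # The first non-blank line is treated as the header
        if header is None:
            header = fields
            continue
        record = dict(zip(header, fields))
        rows.append({
            'chrom': record['CHROM'],
            'start': int(record['BIN_START']),
            'end': int(record['BIN_END']),
            'n_variants': int(record['N_VARIANTS']),
            'weighted_fst': float(record['WEIGHTED_FST']),
            'mean_fst': float(record['MEAN_FST']),
        })
    return rows


def top_window(rows, min_variants=1):
    """Return the row with the highest weighted Fst, or None if none qualify."""
    candidates = [r for r in rows if r['n_variants'] >= min_variants]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r['weighted_fst'])


# First three data rows of the table above
example_lines = [
    "CHROM\tBIN_START\tBIN_END\tN_VARIANTS\tWEIGHTED_FST\tMEAN_FST",
    "chr1\t1030001\t1040000\t11\t0.122493\t0.0941129",
    "chr1\t1040001\t1050000\t14\t0.748157\t0.400248",
    "chr1\t1050001\t1060000\t14\t0.671993\t0.237697",
]
windows = read_windowed_fst(example_lines)
best = top_window(windows, min_variants=5)
```

The `min_variants` threshold is an illustrative choice; a suitable value depends on the data.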
23 changes: 23 additions & 0 deletions andrew/out.windowed.weir.fst.log
@@ -0,0 +1,23 @@

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--gzvcf example/input/merged_chr1_10000.vcf.gz
--exclude-positions out.removed.sites
--fst-window-size 10000
--fst-window-step 20000
--weir-fst-pop example/input/Troglodytes.txt
--weir-fst-pop example/input/Paniscus.txt
--keep example/input/Troglodytes.txt
--keep example/input/Paniscus.txt

Using zlib version: 1.2.8
Keeping individuals in 'keep' list
After filtering, kept 17 out of 38 Individuals
Outputting Windowed Weir and Cockerham Fst estimates.
Fst: Only using diploid sites.
Weir and Cockerham mean Fst estimate: 0.14725
Weir and Cockerham weighted Fst estimate: 0.36382
After filtering, kept 9982 out of a possible 10000 Sites
Run Time = 0.00 seconds
14 changes: 14 additions & 0 deletions andrew/pipeline.log
@@ -0,0 +1,14 @@
2017-09-15 14:28:30,690 - logArgs - INFO: Arguments for vcf_calc:
2017-09-15 14:28:30,690 - logArgs - INFO: Arguments pop_file: ['example/input/Troglodytes.txt', 'example/input/Paniscus.txt']
2017-09-15 14:28:30,690 - logArgs - INFO: Arguments vcfname: example/input/merged_chr1_10000.vcf.gz
2017-09-15 14:28:30,690 - logArgs - INFO: Arguments filter_exclude_positions: out.removed.sites
2017-09-15 14:28:30,690 - logArgs - INFO: Arguments calc_statistic: windowed-weir-fst
2017-09-15 14:28:30,691 - logArgs - INFO: Arguments filter_include_positions: None
2017-09-15 14:28:30,691 - logArgs - INFO: Arguments statistic_window_step: 20000
2017-09-15 14:28:30,691 - logArgs - INFO: Arguments statistic_window_size: 10000
2017-09-15 14:28:30,691 - logArgs - INFO: Arguments out: out
2017-09-15 14:28:30,691 - run - INFO: vcftools parameters assigned
2017-09-15 14:28:30,691 - check_for_vcftools_output - INFO: Output file assigned
2017-09-15 14:28:30,691 - check_for_vcftools_output - INFO: Log file assigned
2017-09-15 14:28:30,691 - run - INFO: Input file assigned
2017-09-15 14:28:30,909 - run - INFO: vcftools call complete
19 changes: 19 additions & 0 deletions andrew/vcf_calc.py
@@ -55,6 +55,10 @@ def metavar_list (var_list):
vcf_parser.add_argument('--statistic-window-size', help = 'Specifies the size of window calculations', type = int, default = 10000)
vcf_parser.add_argument('--statistic-window-step', help = 'Specifies step size between windows', type = int, default = 20000)

# Position-based position filters
vcf_parser.add_argument('--filter-include-positions', help = 'Specifies a set of sites to include within a file (tsv chromosome and position)', action = parser_confirm_file())
vcf_parser.add_argument('--filter-exclude-positions', help = 'Specifies a set of sites to exclude within a file (tsv chromosome and position)', action = parser_confirm_file())


if passed_arguments:
return vcf_parser.parse_args(passed_arguments)
@@ -91,6 +95,10 @@ def run (passed_arguments = []):
Specifies the window size for window-based statistics
--statistic-window-step : int
Specifies step size between windows for specific window-based statistics
--filter-include-positions : str
Specifies a set of sites to include within a tsv file (chromosome and position)
--filter-exclude-positions : str
Specifies a set of sites to exclude within a tsv file (chromosome and position)
Returns
-------
@@ -176,6 +184,17 @@ def run (passed_arguments = []):
# Assigns the suffix for the vcftools log file
vcftools_log_suffix = 'het'

# Assigns the filtered positions
if vcf_args.filter_include_positions or vcf_args.filter_exclude_positions:

# Assigns the sites to keep
if vcf_args.filter_include_positions:
vcftools_call_args.extend(['--positions', vcf_args.filter_include_positions])

# Assigns the sites to remove
if vcf_args.filter_exclude_positions:
vcftools_call_args.extend(['--exclude-positions', vcf_args.filter_exclude_positions])

logging.info('vcftools parameters assigned')

# Confirm the vcftools output and log file do not exist
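
The position-filter block in this hunk can be read as a small mapping from two optional arguments to vcftools flags. A sketch of that logic as a standalone function is shown below. `position_filter_args` is a hypothetical name, not something defined in vcf_calc.py. The outer `if ... or ...` guard from the diff is omitted because the two inner checks already cover both cases.

```python
def position_filter_args(include_positions=None, exclude_positions=None):
    """Build the vcftools arguments for position-list filters.

    Hypothetical helper mirroring the logic in the hunk above.
    """
    args = []
    # Sites to keep
    if include_positions:
        args.extend(['--positions', include_positions])
    # Sites to remove
    if exclude_positions:
        args.extend(['--exclude-positions', exclude_positions])
    return args


# Matches the run recorded in pipeline.log: only an exclude list was given
call_args = position_filter_args(exclude_positions='out.removed.sites')
```

Here `call_args` corresponds to the `--exclude-positions out.removed.sites` line in the vcftools log above.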
36 changes: 27 additions & 9 deletions andrew/vcf_filter.py
@@ -42,10 +42,10 @@ def metavar_list (var_list):
vcf_parser.add_argument("vcfname", metavar = 'VCF_Input', help = "Input VCF filename", type = str, action = parser_confirm_file())

# Other file arguments. Expand as needed
-vcf_parser.add_argument('--out', help = 'Specifies the filtered VCF output filename', type = str, default = 'out', action = parser_confirm_no_file())
+vcf_parser.add_argument('--out', help = 'Specifies the output filename', type = str, default = 'out', action = parser_confirm_no_file())

-out_format_list = ['vcf', 'bcf']
-out_format_default = 'bcf'
+out_format_list = ['vcf', 'bcf', 'removed_sites', 'kept_sites']
+out_format_default = 'removed_sites'

vcf_parser.add_argument('--out-format', metavar = metavar_list(out_format_list), help = 'Specifies the output format.', type = str, choices = out_format_list, default = out_format_default)
### Filters
@@ -58,6 +58,10 @@ def metavar_list (var_list):
vcf_parser.add_argument('--filter-from-bp', help = 'Specifies the lower bound of sites to include (May only be used with a single chromosome)', type = int)
vcf_parser.add_argument('--filter-to-bp', help = 'Specifies the upper bound of sites to include (May only be used with a single chromosome)', type = int)

# Position-based position filters
vcf_parser.add_argument('--filter-include-positions', help = 'Specifies a set of sites to include within a file (tsv chromosome and position)', action = parser_confirm_file())
vcf_parser.add_argument('--filter-exclude-positions', help = 'Specifies a set of sites to exclude within a file (tsv chromosome and position)', action = parser_confirm_file())

# BED-based position filters
vcf_parser.add_argument('--filter-include-bed', help = 'Specifies a set of sites to include within a BED file', action = parser_confirm_file())
vcf_parser.add_argument('--filter-exclude-bed', help = 'Specifies a set of sites to exclude within a BED file', action = parser_confirm_file())
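
The new position arguments rely on `parser_confirm_file()`, whose definition is not part of this diff. One way such a factory could work is sketched below, assuming it returns an `argparse.Action` subclass that rejects paths that are not existing files. The class name, the error type, and the `demo_parser` object are assumptions made for illustration and may differ from the repository's actual implementation.

```python
import argparse
import os


def parser_confirm_file():
    """Return an argparse Action that checks the given path is a file.

    Assumed behaviour: the real helper is not shown in this diff.
    """
    class ConfirmFileAction(argparse.Action):
        def __call__(self, parser, namespace, value, option_string=None):
            # Reject values that do not point to an existing file
            if not os.path.isfile(value):
                raise IOError('Input not found: %s' % value)
            setattr(namespace, self.dest, value)
    return ConfirmFileAction


# Hypothetical parser carrying only the two position-filter options
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument('--filter-include-positions', action=parser_confirm_file())
demo_parser.add_argument('--filter-exclude-positions', action=parser_confirm_file())

# With no options given, both destinations default to None
demo_args = demo_parser.parse_args([])
```

Validating at parse time means a missing site list is reported before vcftools is invoked.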
@@ -72,8 +76,8 @@ def metavar_list (var_list):
vcf_parser.add_argument('--filter-exclude-info', help = 'Specifies that all sites with the given info flag should be excluded', nargs = '+', type = str)

# Allele count filters
-vcf_parser.add_argument('--filter-min-alleles', help = 'Specifies that only sites with a number of allele >= to the number given should be included', type = int)
-vcf_parser.add_argument('--filter-max-alleles', help = 'Specifies that only sites with a number of allele <= to the number given should be included', type = int)
+vcf_parser.add_argument('--filter-min-alleles', help = 'Specifies that only sites with a number of allele >= to the number given should be included', type = int, default = 2)
+vcf_parser.add_argument('--filter-max-alleles', help = 'Specifies that only sites with a number of allele <= to the number given should be included', type = int, default = 2)

# Missing data filter
vcf_parser.add_argument('--filter-max-missing', help = 'Specifies that only sites with more than this number of genotypes among individuals should be included', type = int)
@@ -105,7 +109,7 @@ def run (passed_arguments = []):
--out : str
Specifies the output filename
--out-format : str
-Specifies the output format {vcf, bcf} (Default: bcf)
+Specifies the output format {vcf, bcf, removed_sites, kept_sites} (Default: removed_sites)
--filter-include-chr : list or str
Specifies the chromosome(s) to include
--filter-exclude-chr : list or str
@@ -114,6 +118,10 @@ def run (passed_arguments = []):
Specifies the lower bound of sites to include. May only be used with a single chromosome
--filter-to-bp : int
Specifies the upper bound of sites to include. May only be used with a single chromosome
--filter-include-positions : str
Specifies a set of sites to include within a tsv file (chromosome and position)
--filter-exclude-positions : str
Specifies a set of sites to exclude within a tsv file (chromosome and position)
--filter-include-bed : str
Specifies a set of sites to include within a BED file
--filter-exclude-bed : str
@@ -129,9 +137,9 @@ def run (passed_arguments = []):
--filter-exclude-info : list or str
Specifies that all sites with the given info flag should be excluded
--filter-min-alleles : int
-Specifies that only sites with a number of allele >= to the number given should be included
+Specifies that only sites with a number of allele >= to the number given should be included (Default: 2)
--filter-max-alleles : int
-Specifies that only sites with a number of allele <= to the number given should be included
+Specifies that only sites with a number of allele <= to the number given should be included (Default: 2)
Returns
-------
@@ -159,7 +167,11 @@ def run (passed_arguments = []):
vcftools_call_args = ['--out', vcf_args.out]

if vcf_args.out_format:
-if vcf_args.out_format == 'bcf':
+if vcf_args.out_format == 'removed_sites':
+vcftools_call_args.append('--removed-sites')
+elif vcf_args.out_format == 'kept_sites':
+vcftools_call_args.append('--kept-sites')
+elif vcf_args.out_format == 'bcf':
vcftools_call_args.append('--recode-bcf')
elif vcf_args.out_format == 'vcf':
vcftools_call_args.append('--recode')
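
The if/elif chain in this hunk maps each `--out-format` choice to a single vcftools flag. An alternative sketch using a lookup table is shown below. `OUT_FORMAT_FLAGS` and `out_format_args` are hypothetical names, and this is not how the commit itself is written.

```python
# Maps each vcf_filter output format to the vcftools flag used in the hunk above
OUT_FORMAT_FLAGS = {
    'vcf': '--recode',
    'bcf': '--recode-bcf',
    'removed_sites': '--removed-sites',
    'kept_sites': '--kept-sites',
}


def out_format_args(out_format):
    """Return the vcftools argument list for an output format (hypothetical helper)."""
    try:
        return [OUT_FORMAT_FLAGS[out_format]]
    except KeyError:
        raise ValueError('Unknown output format: %s' % out_format)


# The new default in this commit is 'removed_sites'
default_args = out_format_args('removed_sites')
```

With a table like this, the `choices` list passed to argparse could be derived from `OUT_FORMAT_FLAGS` so that the two cannot drift apart.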
@@ -178,6 +190,12 @@ def run (passed_arguments = []):
if vcf_args.filter_to_bp:
vcftools_call_args.extend(['--to-bp', vcf_args.filter_to_bp])

if vcf_args.filter_include_positions or vcf_args.filter_exclude_positions:
if vcf_args.filter_include_positions:
vcftools_call_args.extend(['--positions', vcf_args.filter_include_positions])
if vcf_args.filter_exclude_positions:
vcftools_call_args.extend(['--exclude-positions', vcf_args.filter_exclude_positions])

if vcf_args.filter_include_bed or vcf_args.filter_exclude_bed:
if vcf_args.filter_include_bed:
vcftools_call_args.extend(['--bed', vcf_args.filter_include_bed])
Binary file added andrew/vcftools.pyc
Binary file not shown.
Binary file added jared/logging_module.pyc
Binary file not shown.
Binary file added jared/vcf_reader_func.pyc
Binary file not shown.