Skip to content

Commit

Permalink
Update fasta_generate_regions.py to work with Snakemake example
Browse files Browse the repository at this point in the history
Use ceil() instead of round() to prevent round-off errors from affecting regions/numbering
  • Loading branch information
mglubber committed Feb 16, 2021
1 parent 887fe73 commit c926cc4
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 4 deletions.
4 changes: 2 additions & 2 deletions examples/snakemake-freebayes-parallel.smk
@@ -1,7 +1,7 @@
## A simple example snakemake .smk file for parallelising freebayes
## Uses a fasta_generate_regions to split the genome into regions of equal size based on the .fai index
## As snakemake automatically moves each cpu core to the next genome chunk, this works out faster
## than the freebayes-parallel wrapper and allows pooled sample calling.
## than the freebayes-parallel wrapper.
## This .smk file assumes we have a list of the bam files called bam.list
## This .smk file splits the genome by chromosome, which of course, is not necessary.
## One will want to edit the paths (for example, the path to bam files)
Expand All @@ -14,7 +14,7 @@ chroms = [1,2,3]
nchunks = 9

bamlist = "path/to/bam.list"
chunks = [i for i in range(1,nchunks+1)]
chunks = list(range(1,nchunks+1))

rule GenomeIndex:
input:
Expand Down
5 changes: 3 additions & 2 deletions scripts/fasta_generate_regions.py
@@ -1,6 +1,7 @@
#!/usr/bin/env python3

import argparse
from math import ceil


def generate_regions(fasta_index_file, size, chunks=False, chromosomes=None, bed_files=None):
Expand All @@ -17,7 +18,7 @@ def generate_regions(fasta_index_file, size, chunks=False, chromosomes=None, bed
continue
region_start = 0
if chunks is True:
region_size = round(chrom_length / size) # have to make sure this works
region_size = ceil(chrom_length / size) # have to make sure this works
else:
region_size = size
while region_start < chrom_length:
Expand All @@ -27,7 +28,7 @@ def generate_regions(fasta_index_file, size, chunks=False, chromosomes=None, bed
start = str(region_start)
end = str(region_end)
if bed_files is not None:
region = str(round(region_end / region_size))
region = str(ceil(region_end / region_size))
file_path = f"{bed_files}.{chrom_name}.region.{region}.bed"
with open(file_path, "w") as f:
f.write("\t".join([chrom_name, start, end]))
Expand Down

0 comments on commit c926cc4

Please sign in to comment.