A collection of scripts, mostly in Python, for processing and analyzing bisulfite sequencing (Bismark) data.
BioScripts provides tools for annotating DNA methylation calls with sequence context and merging forward/reverse strand methylation data from Bismark coverage files.
Annotates Bismark coverage files with sequence context information (±1 base around methylated sites).
Usage:
python get_CpG_from_genome/get_CpG_from_genome.py -f <fasta_file> -b <bed_file> -o <output_bed> [-c <chunk_size>] [-m]
Arguments:
- -f, --fasta_file: Path to the input FASTA file (required)
- -b, --bed_file: Path to the input BED/coverage file (required)
- -o, --output_bed: Path to the output BED file (required)
- -c, --chunk_size: Chunk size for reading FASTA file
- -m, --methylkit: Input is methylKit format
Output: Generates an annotated BED file with additional columns:
- REF_-1+1: Sequence context (base -1 and +1 relative to the CpG site)
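As a rough illustration of what the ±1 context annotation involves, the sketch below looks up the bases flanking a site in a reference sequence. It is not the script's actual implementation: the function name `context_around`, the 1-based position convention (as used in Bismark coverage files), the three-base return value, and the `N` padding at sequence ends are all assumptions made for this example.

```python
def context_around(seq: str, pos: int) -> str:
    """Return the bases at pos-1, pos and pos+1 for a 1-based position.

    Hypothetical helper: positions that fall off either end of the
    sequence are padded with "N".
    """
    i = pos - 1  # convert the 1-based position to a 0-based index
    left = seq[i - 1] if i - 1 >= 0 else "N"
    right = seq[i + 1] if i + 1 < len(seq) else "N"
    return (left + seq[i] + right).upper()


# Example: the C at position 2 of "ACGTACG" sits in the context "ACG".
print(context_around("ACGTACG", 2))
```

A site whose context starts with `CG` at the called base lies on the forward strand; a called `G` preceded by `C` corresponds to the reverse-strand cytosine of the same CpG.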
Merges bookended (directly adjacent) forward- and reverse-strand methylation calls from the output of get_CpG_from_genome.
Usage:
python merge_reverse_strand_calls/merge_reverse_strand_calls.py -b <bed_file> -o <output_bed>
Arguments:
- -b, --bed_file: Path to the input BED file (required)
- -o, --output_bed: Path to the output BED file (required)
Output: BED file with merged methylation calls containing:
- seqname: Chromosome/sequence name
- start: Start position
- end: End position
- perc_mCpG: Percentage of methylated cytosines
- numCs: Number of methylated cytosines
- numTs: Number of unmethylated cytosines
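The merging step can be pictured with a minimal sketch, assuming each call is a tuple of sequence name, 1-based position, reference base at that position (`C` for a forward-strand call, `G` for a reverse-strand call), methylated count, and unmethylated count. The function name `merge_bookended` and this record layout are hypothetical; the real script reads its input from the annotated BED file and may differ in detail.

```python
def merge_bookended(calls):
    """Merge a forward-strand C call with the reverse-strand G call at pos+1.

    `calls` must be sorted by (seqname, pos). Each item is the hypothetical
    tuple (seqname, pos, base, num_cs, num_ts). Returns tuples shaped like
    the documented output: (seqname, start, end, perc_mCpG, numCs, numTs).
    """
    merged = []
    i = 0
    while i < len(calls):
        seqname, pos, base, num_cs, num_ts = calls[i]
        end = pos
        if base == "C" and i + 1 < len(calls):
            nxt_seq, nxt_pos, nxt_base, nxt_cs, nxt_ts = calls[i + 1]
            # Bookended pair: same sequence, next position, opposite strand.
            if nxt_seq == seqname and nxt_pos == pos + 1 and nxt_base == "G":
                num_cs += nxt_cs
                num_ts += nxt_ts
                end = nxt_pos
                i += 1  # skip the reverse-strand call that was just merged
        total = num_cs + num_ts
        perc = 100.0 * num_cs / total if total else 0.0
        merged.append((seqname, pos, end, perc, num_cs, num_ts))
        i += 1
    return merged


calls = [
    ("chr1", 10, "C", 8, 2),   # forward-strand call
    ("chr1", 11, "G", 6, 4),   # reverse-strand call of the same CpG
    ("chr1", 20, "C", 1, 3),   # unpaired call, kept as is
]
for row in merge_bookended(calls):
    print(row)
```

Summing the counts before computing the percentage weights each strand by its coverage, rather than averaging the two per-strand percentages.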
Splits a large FASTA file into individual sequence files based on regex pattern matching.
Usage:
python split_rename_fasta/split_rename_fasta.py -i <input_file> -p <pattern> -o <output_dir>
Arguments:
- -i, --input_file: Path to the input FASTA file
- -p, --pattern: Regex pattern to match sequence names
- -o, --output_dir: Directory for output files
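To make the regex-based splitting concrete, here is a minimal standard-library sketch. It assumes that each matching record is written to its own file named after the first capture group (or the whole match if the pattern has no group), and that non-matching records are skipped. The function name `split_fasta` and these naming rules are assumptions for illustration, not a description of what `split_rename_fasta.py` does internally.

```python
import re
from pathlib import Path


def split_fasta(input_path, pattern, output_dir):
    """Write each FASTA record whose header matches `pattern` to its own file.

    Hypothetical behaviour: the output file and the new header are named
    after capture group 1 if the pattern defines one, else the full match.
    Returns the list of written paths in input order.
    """
    regex = re.compile(pattern)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(input_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # A new record starts: close the previous output, if any.
                if handle is not None:
                    handle.close()
                    handle = None
                match = regex.search(line[1:].strip())
                if match:
                    name = match.group(1) if regex.groups else match.group(0)
                    path = out_dir / f"{name}.fasta"
                    handle = open(path, "w")
                    handle.write(f">{name}\n")
                    written.append(path)
            elif handle is not None:
                # Sequence line belonging to a matched record.
                handle.write(line)
    if handle is not None:
        handle.close()
    return written
```

For example, the pattern `chromosome (\d+)` applied to a header such as `>NC_1 chromosome 1` would, under these assumptions, produce a file `1.fasta` whose header is `>1`.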
Generates a Nextflow samplesheet from FASTQ files in a directory.
Usage:
python nf-samplesheet/get_samplesheet.py <sample1> [<sample2> ...] -d <fastq_dir> [--single]
Arguments:
- samples: One or more sample names (positional)
- -d, --dir: Directory with FASTQ files
- --single: Single-end reads flag
Output: CSV samplesheet with columns:
- sample: Sample name
- single: Single-end flag (true/false)
- fastq_1: Path to R1 FASTQ file
- fastq_2: Path to R2 FASTQ file (if paired-end)
- genome: Genome reference (empty by default)
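The column layout above can be reproduced with a short sketch using only the standard library. The file-naming convention `<sample>_R1*.fastq.gz` / `<sample>_R2*.fastq.gz`, the choice of the first match when several files exist, and the helper names `find_fastq`, `build_rows`, and `write_samplesheet` are assumptions made for this example; `get_samplesheet.py` may locate files differently.

```python
import csv
from pathlib import Path

COLUMNS = ["sample", "single", "fastq_1", "fastq_2", "genome"]


def find_fastq(fastq_dir, sample, read):
    """Return the first file matching <sample>_<read>*.fastq.gz, or ""."""
    hits = sorted(Path(fastq_dir).glob(f"{sample}_{read}*.fastq.gz"))
    return str(hits[0]) if hits else ""


def build_rows(samples, fastq_dir, single=False):
    """Build one samplesheet row (a dict keyed by COLUMNS) per sample."""
    rows = []
    for sample in samples:
        rows.append({
            "sample": sample,
            "single": "true" if single else "false",
            "fastq_1": find_fastq(fastq_dir, sample, "R1"),
            # Single-end samples have no second read file.
            "fastq_2": "" if single else find_fastq(fastq_dir, sample, "R2"),
            "genome": "",  # left empty by default, as in the documented output
        })
    return rows


def write_samplesheet(rows, handle):
    """Write the rows as CSV with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

A typical call under these assumptions would be `write_samplesheet(build_rows(["sampleA"], "fastq/"), sys.stdout)` after `import sys`.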
This project uses conda for environment management:
conda env create -f environment.yaml
Or use the Apptainer container:
apptainer run bioscripts.sif <script> [arguments]
Requirements:
- Python 3.x
- pandas
- numpy
- BioPython
MIT License - See LICENSE for details.
Copyright (c) 2025 Fritjof Lammers