
BCsummarizer

masikol edited this page May 26, 2023 · 3 revisions

Description:

BCsummarizer.py (makes a summary of basecalling) generates a brief summary of basecalling: it determines which FASTQ files contain the reads from each FAST5 file.

This script can be useful because basecallers (the popular Guppy in particular) often misassign the names of input FAST5 and output FASTQ files. As a result, a source FAST5 file and a basecalled FASTQ file can contain different reads even though their names match.
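The core idea can be illustrated with a minimal Python sketch. It is not BCsummarizer's actual implementation: the function names `map_fast5_to_fastq`, `stem` and `name_mismatches`, and the assumption that read IDs have already been collected into dictionaries of file name -> set of read IDs, are hypothetical. For every FAST5 file it counts how many of its reads land in each FASTQ file, then flags FAST5 files whose reads are not all in the FASTQ file with the matching name.

```python
from collections import defaultdict
from pathlib import PurePath


def map_fast5_to_fastq(fast5_reads, fastq_reads):
    """Count, for each FAST5 file, how many of its reads occur in each FASTQ file.

    Both arguments are hypothetical inputs: dicts of file name -> set of read IDs.
    Reads found in no FASTQ file are counted under the key None.
    """
    # Invert the FASTQ index: read ID -> name of the FASTQ file containing it
    read_to_fastq = {}
    for fq_name, ids in fastq_reads.items():
        for read_id in ids:
            read_to_fastq[read_id] = fq_name

    summary = {}
    for f5_name, ids in fast5_reads.items():
        counts = defaultdict(int)
        for read_id in ids:
            counts[read_to_fastq.get(read_id)] += 1
        summary[f5_name] = dict(counts)
    return summary


def stem(name):
    """Drop the directory and all extensions: 'dir/batch_0.fastq.gz' -> 'batch_0'."""
    return PurePath(name).name.split(".", 1)[0]


def name_mismatches(summary):
    """Return FAST5 files whose reads are not all in the FASTQ file of the same name."""
    bad = []
    for f5_name, counts in summary.items():
        if not counts:  # FAST5 file without reads: nothing to compare
            continue
        targets = list(counts)
        if len(targets) != 1 or targets[0] is None or stem(targets[0]) != stem(f5_name):
            bad.append(f5_name)
    return sorted(bad)


# Example: the reads of two FAST5 files ended up in swapped FASTQ files
fast5 = {"batch_0.fast5": {"r1", "r2"}, "batch_1.fast5": {"r3", "r4"}}
fastq = {"batch_0.fastq.gz": {"r3", "r4"}, "batch_1.fastq.gz": {"r1", "r2"}}
summary = map_fast5_to_fastq(fast5, fastq)
print(summary["batch_0.fast5"])   # {'batch_1.fastq.gz': 2}
print(name_mismatches(summary))   # ['batch_0.fast5', 'batch_1.fast5']
```

Comparing name stems (rather than full names) is a design choice of this sketch, so that `batch_0.fast5` is considered to match `batch_0.fastq.gz`.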

Pre-requirements: the h5py Python package is required for working with FAST5 files. See the Pre-requirements section for installation details.
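For reference, here is a sketch of how read IDs might be listed from a FAST5 file with h5py. It assumes the multi-read FAST5 layout, in which each read is stored in a top-level group named `read_<read_id>`; single-read FAST5 files (one read per file, different layout) are not handled. The helper names `read_ids_from_keys` and `read_ids_from_fast5` are hypothetical and not part of BCsummarizer.

```python
def read_ids_from_keys(keys):
    """Extract read IDs from top-level group names such as 'read_<read_id>'."""
    prefix = "read_"
    return {key[len(prefix):] for key in keys if key.startswith(prefix)}


def read_ids_from_fast5(path):
    """Return the set of read IDs stored in a multi-read FAST5 file."""
    import h5py  # imported here so read_ids_from_keys works without h5py installed

    with h5py.File(path, "r") as f5:
        # f5.keys() yields the names of the top-level groups
        return read_ids_from_keys(f5.keys())
```

Calling `read_ids_from_fast5()` on each FAST5 file would yield one set of read IDs per file.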

Options:

    -h (--help) --- show help message;

    -v (--version) --- show version;

    -5 (--fast5-dir) --- directory that contains FAST5 files
        meant to be processed. It may also contain files other than FAST5 files;

    -q (--fastq-dir) --- directory that contains FASTQ files
        meant to be processed. It may also contain files other than FASTQ files.
        FASTQ files can be gzipped;

    -o (--outfile) --- output summary file.
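Because the directories passed with `-5` and `-q` may contain other files, and FASTQ files may be gzipped, a tool like this has to filter files by extension and read compressed and plain FASTQ files transparently. A minimal sketch, assuming non-recursive directory scanning, the extensions listed in the hypothetical `FASTQ_SUFFIXES` constant, and standard four-line FASTQ records (the helper names `find_files` and `read_ids_from_fastq` are also hypothetical):

```python
import gzip
from pathlib import Path

# Assumed FASTQ extensions; BCsummarizer's own list may differ
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def find_files(directory, suffixes):
    """Return files in `directory` (non-recursive) whose names end with one of `suffixes`."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )


def read_ids_from_fastq(path):
    """Return read IDs from a plain or gzipped FASTQ file.

    Assumes four-line records; the read ID is the first word after '@' on the header line.
    """
    path = Path(path)
    opener = gzip.open if path.name.endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as fq:
        for line_no, line in enumerate(fq):
            # Only every 4th line is a header; quality lines may also start with '@'
            if line_no % 4 == 0 and line.startswith("@"):
                fields = line[1:].split(maxsplit=1)
                if fields:
                    ids.add(fields[0])
    return ids
```

The same `find_files(directory, (".fast5",))` call would select FAST5 files while ignoring anything else in the directory.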

Examples:

  1. FAST5 files are in directory F5_dir. Basecalled FASTQ files are in directory FQ_dir:

    ./BCsummarizer.py -5 F5_dir -q FQ_dir

  2. FAST5 and basecalled FASTQ files are in the working directory. Write results to the file /tmp/seq_summ.txt:

    ./BCsummarizer.py -5 ./ -q ./ -o /tmp/seq_summ.txt