
Brief description of scripts:

Below we describe the pipeline used for the analysis, along with a brief description of each script. For more details, see the scripts themselves, which document their usage. Note that the paths to various external tools are hardcoded: these tools must be installed and the paths updated before running the analysis (see ../TOOLS.md for details). In most cases, the scripts whose names start with run_exp_ are the ones that were actually run to perform the experiments for all the different compression parameters; the remaining scripts are called from within them.

Generating lossily compressed raw signal and computing compressed sizes

  • generate_lossy_fast5.py: Generate new fast5 files with the raw signal replaced by a lossily compressed version (called in run_exp_generate_fast5.sh). This is also used to perform VBZ lossless compression and compute the compressed sizes.
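To make the idea of "lossily compressed raw signal" and "compressed sizes" concrete, here is a minimal Python sketch. It is not the code in generate_lossy_fast5.py and does not reproduce the compressors used there. It assumes that "lossy" means a maximum absolute error bound on integer raw signal values, and it uses plain uniform quantization, delta coding and zlib as hypothetical stand-ins for a real error-bounded compressor. The function names are illustrative only.

```python
import zlib

import numpy as np


def lossy_quantize(signal: np.ndarray, max_abs_err: int) -> np.ndarray:
    """Map an integer signal to quantization indices with error <= max_abs_err.

    With an odd step of 2*e + 1, every integer lies strictly closer to one
    multiple of the step than to any other, so rounding never hits a tie and
    the reconstruction error is at most e. max_abs_err=0 is lossless.
    """
    step = 2 * max_abs_err + 1
    return np.round(signal.astype(np.float64) / step).astype(np.int64)


def reconstruct(indices: np.ndarray, max_abs_err: int) -> np.ndarray:
    """Invert lossy_quantize up to the error bound."""
    return indices * (2 * max_abs_err + 1)


def compressed_size(indices: np.ndarray) -> int:
    """Size in bytes after delta coding + zlib (a stand-in for a real coder)."""
    deltas = np.diff(indices, prepend=0).astype(np.int32)
    return len(zlib.compress(deltas.tobytes(), 9))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy random-walk "raw signal" standing in for int16 current samples.
    raw = (500 + np.cumsum(rng.integers(-5, 6, size=10_000))).astype(np.int16)
    for e in (0, 2, 5):
        idx = lossy_quantize(raw, e)
        err = np.max(np.abs(reconstruct(idx, e) - raw.astype(np.int64)))
        print(f"max_abs_err={e}: observed error {err}, {compressed_size(idx)} bytes")
```

Larger error bounds collapse more samples onto the same index, which generally makes the delta stream more repetitive and therefore smaller after compression; the real experiments sweep such parameters via the run_exp_ scripts.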

Basecalling and accuracy evaluation

  • basecall_guppy.sh: Perform basecalling with guppy (hac/fast) (called in run_exp_basecall_analysis_guppy.sh).
  • basecall_bonito.sh: Perform basecalling with bonito (called in run_exp_basecall_analysis_bonito.sh).
  • fix_fastq_bonito.py: Used to fix the fastq files generated by seqtk when basecalling with bonito. Bonito produces a fasta file, which we convert to fastq with seqtk; however, basecalling occasionally produces an empty sequence, for which seqtk writes an invalid fastq file (called in basecall_bonito.sh).
  • analysis_basecall_accuracy.sh: Perform basecalling accuracy analysis (called in run_exp_basecall_analysis*.sh).
  • read_length_identity.py: Compute read lengths and identities for the basecalling accuracy analysis (called in analysis_basecall_accuracy.sh).
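For illustration only, the Python sketch below shows two steps in the spirit of fix_fastq_bonito.py and read_length_identity.py; it is not their code. It assumes that each broken seqtk record still spans four lines with an empty sequence, and that the fix is simply to drop such records (the actual script may repair them differently). For identity, it assumes reads have been aligned to a reference in PAF format (e.g. with minimap2) and defines a read's identity as residue matches divided by alignment block length for its longest alignment. This definition and all function names are hypothetical choices.

```python
from __future__ import annotations

from typing import Iterable, Iterator, TextIO


def iter_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality), assuming exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def drop_empty_records(src: TextIO, dst: TextIO) -> int:
    """Copy records with a non-empty sequence to dst; return the number dropped."""
    dropped = 0
    for header, seq, qual in iter_fastq(src):
        if not seq:
            dropped += 1
            continue
        dst.write(f"{header}\n{seq}\n+\n{qual}\n")
    return dropped


def read_identities(paf_lines: Iterable[str]) -> dict[str, tuple[int, float]]:
    """Map read name -> (read length, identity of its longest alignment).

    Uses the 12 mandatory PAF columns: identity = residue matches (col 10)
    / alignment block length (col 11). Unaligned reads are simply absent.
    """
    best: dict[str, tuple[int, int, int]] = {}  # name -> (len, matches, block)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        name, read_len = fields[0], int(fields[1])
        matches, block_len = int(fields[9]), int(fields[10])
        if name not in best or block_len > best[name][2]:
            best[name] = (read_len, matches, block_len)
    return {n: (rl, m / bl) for n, (rl, m, bl) in best.items()}
```

Keeping only the longest alignment per read is the simplest policy; reads split across several alignments (e.g. chimeras) would need a more careful treatment.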

Assembly/Consensus and accuracy evaluation

  • assembly_guppy.sh: Perform assembly with Flye+Rebaler+Medaka for guppy basecalled reads (called in run_exp_assembly_guppy.sh).
  • assembly_bonito.sh: Perform assembly with Flye+Rebaler+Medaka for bonito basecalled reads (called in run_exp_assembly_bonito.sh).
  • analysis_assembly.sh: Perform assembly/consensus accuracy analysis (called in run_exp_assembly_analysis.sh).
  • chop_up_assembly.py, medians.py and read_length_identity.py: Called in analysis_assembly.sh to perform the evaluation. analysis_assembly.sh also relies on fastmer.py, which is installed as described in ../TOOLS.md.
  • subsample_fastqs.sh: Subsample a fastq file.
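Judging from the script names, the assembly evaluation appears to chop each assembly into pieces (chop_up_assembly.py), measure the identity of each piece (read_length_identity.py) and summarize the results with medians (medians.py). The Python sketch below illustrates only the chopping and median steps and is not the code of those scripts. The 10,000-base piece length and the choice to drop the trailing partial piece are hypothetical, and the per-piece identities are assumed to come from an alignment step not shown here.

```python
from __future__ import annotations

from statistics import median


def chop_sequence(seq: str, piece_len: int = 10_000) -> list[str]:
    """Split seq into consecutive, non-overlapping pieces of piece_len bases.

    The trailing partial piece is dropped (a hypothetical choice), so every
    piece has the same length and contributes equally to the median.
    """
    return [seq[i:i + piece_len]
            for i in range(0, len(seq) - piece_len + 1, piece_len)]


def median_identity(identities: list[float]) -> float:
    """Median of per-piece identities, e.g. matches / alignment length."""
    if not identities:
        raise ValueError("no aligned pieces to summarize")
    return median(identities)


if __name__ == "__main__":
    contig = "ACGT" * 6_000  # 24 kbp toy contig
    print(len(chop_sequence(contig)))  # 2: the trailing 4 kbp is dropped
    print(median_identity([0.998, 0.999, 0.997]))  # 0.998
```

Using the median rather than the mean keeps a few badly aligned pieces (for example at repeats or contig ends) from dominating the summary.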

Methylation calling and accuracy evaluation

  • megalodon_modcall.sh: Perform methylation calling (used by run_exp_megalodon_modcall.sh).
  • evaluate_methylation_calls.py: Evaluate methylation calls.