lossy_compression_evaluation/scripts at master · shubhamchandak94/lossy_compression_evaluation


Files:

README.md
analysis_assembly.sh
analysis_basecall_accuracy.sh
assembly_bonito.sh
assembly_guppy.sh
basecall_bonito.sh
basecall_guppy.sh
chop_up_assembly.py
evaluate_methylation_calls.py
fix_fastq_bonito.py
generate_lossy_fast5.py
medians.py
megalodon_modcall.sh
read_length_identity.py
run_exp_assembly_analysis.sh
run_exp_assembly_bonito.sh
run_exp_assembly_guppy.sh
run_exp_basecall_analysis_bonito.sh
run_exp_basecall_analysis_guppy.sh
run_exp_generate_fast5.sh
run_exp_megalodon_modcall.sh
subsample_fastqs.sh

README.md

Brief description of scripts:

We describe below the pipeline used for the analysis, along with a brief description of each script. For more details, see the scripts themselves, which document their usage. Note that the paths to various external tools are hardcoded: these tools need to be installed and the paths updated before running the analysis (see ../TOOLS.md for details). In most cases, the scripts whose names start with run_exp_ are the ones that were actually run to perform the experiments for all the different compression parameters; the remaining scripts are called from within them.
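The run_exp_ pattern described above, a driver that invokes a per-step script once for every compression setting, can be sketched as follows. This is a minimal illustration under assumptions: the parameter labels, the positional-argument layout (parameter, then output directory) and the output paths are hypothetical and do not reflect the actual arguments of the scripts in this directory. The sketch only builds and prints the command lines (a dry run) rather than executing them.

```python
import shlex
from typing import Iterable, List


def build_commands(script: str, params: Iterable[str], out_root: str) -> List[List[str]]:
    """Build one command line per compression parameter.

    Hypothetical argument layout: <script> <parameter> <output directory>.
    """
    commands = []
    for param in params:
        # Give every experiment its own output directory named after the parameter.
        out_dir = f"{out_root}/{param}"
        commands.append(["bash", script, param, out_dir])
    return commands


if __name__ == "__main__":
    # Hypothetical parameter labels; the real run_exp_ scripts define their own settings.
    params = ["lossless", "maxerr_1", "maxerr_2"]
    for cmd in build_commands("basecall_guppy.sh", params, "results"):
        # Dry run: print the command. Replace with subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping the parameter loop in one driver and the per-step logic in a separate script mirrors the split described above: the inner script stays reusable, and rerunning a single setting only requires calling it directly.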

Generation of lossily compressed raw signal and computing compression sizes

generate_lossy_fast5.py: Generate new fast5 files with the raw signal replaced by a lossily compressed version (called in run_exp_generate_fast5.sh). This is also used to perform VBZ lossless compression and compute the compressed sizes.
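As an illustration of what replacing the raw signal by a lossily compressed version can mean, the sketch below applies a simple error-bounded uniform quantizer to integer samples and reports compressed sizes. This is a stand-in under stated assumptions, not the method implemented in generate_lossy_fast5.py: the quantizer, the max_error parameter and the synthetic signal are hypothetical, and zlib is used only as a generic substitute for VBZ, which this sketch does not implement.

```python
import zlib

import numpy as np


def quantize(signal: np.ndarray, max_error: int) -> np.ndarray:
    """Map integer samples to bin indices so that reconstruction error is at most max_error."""
    step = 2 * max_error + 1
    # Floor division of (x + max_error) by step gives the index of the nearest multiple of step.
    return (signal.astype(np.int64) + max_error) // step


def dequantize(indices: np.ndarray, max_error: int) -> np.ndarray:
    """Reconstruct each sample as the multiple of step at the centre of its bin."""
    step = 2 * max_error + 1
    return indices * step


def compressed_size(values: np.ndarray) -> int:
    """Size in bytes after generic lossless compression (zlib as a stand-in for VBZ)."""
    return len(zlib.compress(values.astype(np.int16).tobytes(), 9))


if __name__ == "__main__":
    # Synthetic random-walk signal standing in for raw nanopore current samples.
    rng = np.random.default_rng(0)
    signal = (500 + np.cumsum(rng.integers(-5, 6, size=10_000))).astype(np.int16)

    for max_error in (0, 1, 2, 4):
        indices = quantize(signal, max_error)
        restored = dequantize(indices, max_error)
        worst = int(np.max(np.abs(restored - signal.astype(np.int64))))
        print(f"max_error={max_error}: worst error={worst}, "
              f"compressed bytes={compressed_size(indices)}")
```

With max_error=0 the step is 1 and the round trip is exact, which corresponds to the lossless baseline; larger bounds shrink the alphabet of stored values, which is what lets a downstream lossless compressor produce smaller output.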

Basecalling and accuracy evaluation

basecall_guppy.sh: Perform basecalling with guppy (hac/fast) (called in run_exp_basecall_analysis_guppy.sh).
basecall_bonito.sh: Perform basecalling with bonito (called in run_exp_basecall_analysis_bonito.sh).
fix_fastq_bonito.py: Fix the fastq files generated by seqtk when using bonito (called in basecall_bonito.sh). Bonito produces a fasta file, which we convert to fastq with seqtk; however, basecalling sometimes produces an empty sequence, and in that case seqtk produces an invalid fastq file.
analysis_basecall_accuracy.sh: Perform basecalling accuracy analysis (called in run_exp_basecall_analysis*.sh).
read_length_identity.py: Perform basecalling analysis (called in analysis_basecall_accuracy.sh).
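One plausible way to handle the empty-sequence problem described for fix_fastq_bonito.py is to drop records whose sequence is empty. The sketch below assumes each broken record still spans four lines (header, empty sequence line, "+", empty quality line); the actual fix_fastq_bonito.py may detect and repair these records differently, and the function names here are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, sep, qual


def drop_empty_records(in_handle: TextIO, out_handle: TextIO) -> int:
    """Copy records with a non-empty sequence; return how many records were dropped."""
    dropped = 0
    for header, seq, sep, qual in read_fastq(in_handle):
        if not seq:
            dropped += 1
            continue
        out_handle.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
    return dropped


if __name__ == "__main__":
    broken = "@read1\nACGT\n+\n####\n@read2\n\n+\n\n@read3\nGG\n+\n##\n"
    out = io.StringIO()
    n_dropped = drop_empty_records(io.StringIO(broken), out)
    print(n_dropped)
    print(out.getvalue(), end="")
```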

Assembly/Consensus and accuracy evaluation

assembly_guppy.sh: Perform assembly with Flye+Rebaler+Medaka for guppy basecalled reads (called in run_exp_assembly_guppy.sh).
assembly_bonito.sh: Perform assembly with Flye+Rebaler+Medaka for bonito basecalled reads (called in run_exp_assembly_bonito.sh).
analysis_assembly.sh: Perform assembly/consensus accuracy analysis (called in run_exp_assembly_analysis.sh).
chop_up_assembly.py, medians.py and read_length_identity.py: Called in analysis_assembly.sh to perform the actual evaluation. analysis_assembly.sh also relies on fastmer.py, whose installation is described in ../TOOLS.md.
subsample_fastqs.sh: Subsample fastq files.
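The assembly evaluation steps listed above can be illustrated with a minimal sketch, under assumptions. It supposes that the assembly is chopped into fixed-size pieces (the 10,000 bp piece size is a hypothetical default), that each piece is aligned to the reference with PAF output, and that identity is taken as PAF column 10 (number of residue matches) divided by column 11 (alignment block length). The helper names are hypothetical, and this is not the code of chop_up_assembly.py, read_length_identity.py or medians.py.

```python
import statistics
from typing import List


def chop_up(sequence: str, piece_size: int = 10_000) -> List[str]:
    """Split a contig into consecutive pieces; the last piece may be shorter."""
    return [sequence[i:i + piece_size] for i in range(0, len(sequence), piece_size)]


def paf_identity(paf_line: str) -> float:
    """Identity of one PAF alignment: residue matches / alignment block length."""
    fields = paf_line.rstrip("\n").split("\t")
    matches = int(fields[9])         # column 10: number of residue matches
    block_length = int(fields[10])   # column 11: alignment block length
    return matches / block_length


def median_identity(paf_lines: List[str]) -> float:
    """Median identity across all alignments."""
    return statistics.median([paf_identity(line) for line in paf_lines])


if __name__ == "__main__":
    print(chop_up("ACGTACGTAC", piece_size=4))  # ['ACGT', 'ACGT', 'AC']
    paf = [
        "piece1\t100\t0\t100\t+\tref\t1000\t0\t100\t90\t100\t60",
        "piece2\t100\t0\t100\t+\tref\t1000\t100\t200\t95\t100\t60",
        "piece3\t100\t0\t100\t+\tref\t1000\t200\t300\t99\t100\t60",
    ]
    print(median_identity(paf))  # 0.95
```

Chopping the assembly into read-sized pieces lets the same per-alignment identity measure be reused for both reads and consensus sequences, and the median is robust to a few pieces that align poorly.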

Methylation calling and accuracy evaluation

megalodon_modcall.sh: Perform methylation calling (called in run_exp_megalodon_modcall.sh).
evaluate_methylation_calls.py: Evaluate methylation calls.
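As one example of how methylation calls might be evaluated, the sketch below compares per-site methylation percentages from two runs (for instance, calls made from losslessly and from lossily compressed signal) using the Pearson correlation over sites covered in both. This is an assumption, not necessarily what evaluate_methylation_calls.py computes: it assumes bedMethyl-style input with read coverage in column 10 and percent modified in column 11, and the min_coverage threshold and function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int, str]  # (chromosome, start, strand)


def parse_bedmethyl(lines: Iterable[str], min_coverage: int = 1) -> Dict[Site, float]:
    """Map each site to its percent-modified value, skipping low-coverage sites."""
    sites: Dict[Site, float] = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.split()
        coverage = int(fields[9])     # column 10: number of reads covering the site
        percent = float(fields[10])   # column 11: percent of reads called modified
        if coverage >= min_coverage:
            sites[(fields[0], int(fields[1]), fields[5])] = percent
    return sites


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; needs at least two sites and non-zero variance in both runs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_runs(a: Dict[Site, float], b: Dict[Site, float]) -> Tuple[int, float]:
    """Return the number of shared sites and the correlation of their percentages."""
    shared = sorted(set(a) & set(b))
    xs = [a[s] for s in shared]
    ys = [b[s] for s in shared]
    return len(shared), pearson(xs, ys)
```

Restricting the comparison to sites above a coverage threshold in both runs avoids correlating percentages estimated from only one or two reads.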
