This repository contains a nextflow workflow for the multiplexed analysis of Cas9-enrichment sequencing.

## Introduction

This workflow generates a report summarising the results of Cas9 enrichment sequencing.
Users provide a reference genome, ONT reads in fastq format, and a bed file containing the enrichment regions. The reads are first mapped to the reference genome using minimap2, and the workflow then generates plots and tables summarising the enrichment results.
The workflow uses nextflow to manage compute and software resources; as such, nextflow must be installed before attempting to run the workflow.
It is not necessary to clone or download the git repository in order to run the workflow. For more information on running EPI2ME Labs workflows, visit our website.
To obtain the workflow and see its options, users with nextflow installed can run:

```
nextflow run epi2me-labs/wf-cas9 --help
```
The workflow requires the following inputs:

- a folder of fastq reads.
- a genome reference file.
- a target bed file with 4 columns:
    - chromosome
    - start
    - end
    - target name
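Before launching a run, it can help to sanity-check the targets file. The function below is a minimal sketch and is not part of the workflow: it assumes a tab-separated BED4 file (chromosome, start, end, target name) using the BED convention of 0-based, half-open coordinates, and the function name `check_targets_bed` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a targets file (not part of wf-cas9).
# Assumption: tab-separated BED4 (chrom, start, end, name), 0-based half-open.

# check_targets_bed FILE
# Prints one message per problem line to stderr; returns 1 if any check fails.
check_targets_bed() {
    local bed="$1"
    if [ ! -s "$bed" ]; then
        echo "targets file '$bed' is missing or empty" >&2
        return 1
    fi
    awk -F '\t' '
        # Every line must have exactly four tab-separated fields.
        NF != 4 { printf "line %d: expected 4 columns, found %d\n", NR, NF; bad = 1; next }
        # Start and end must be non-negative integers with start < end.
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d: invalid interval %s-%s\n", NR, $2, $3; bad = 1; next
        }
        # Exit status 1 if any line was flagged, 0 otherwise.
        END { exit bad }
    ' "$bed" >&2
}
```

Running `check_targets_bed my_targets.bed` (a hypothetical file name) is silent and returns exit status 0 when every line passes; otherwise it reports each offending line and returns a non-zero status.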
To test on a small dataset with two targets and two chromosomes, run the following from a local clone of the repository, which includes the test data:

```
cd wf-cas9
nextflow run . --fastq test_data/fastq/ --ref_genome \
    test_data/grch38/grch38_chr19_22.fa.gz --targets test_data/targets.bed \
    -profile conda -resume
```
To evaluate on a larger dataset, use the evaluation script:

```
evaluation/run_evaluation.sh <out_dir> [optional_nextflow_config]
```
The primary outputs of the workflow include:
- a per-sample on-target reads fastq file.
- a per-sample simple text file providing a summary of sequencing reads.
- a combined HTML report document detailing the primary findings of the workflow across all samples.
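A quick way to check how many reads ended up in each per-sample on-target fastq file is to count its records. The sketch below assumes plain four-line fastq records and a file that ends with a newline; gzip-compressed (`.gz`) files are decompressed on the fly. The function name `count_fastq_reads` is hypothetical, and no particular output file naming is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of wf-cas9): count reads in a fastq file.
# Assumptions: each record spans exactly four lines and the file ends with a newline.

# count_fastq_reads FILE
count_fastq_reads() {
    local fq="$1" lines
    case "$fq" in
        *.gz) lines=$(gzip -dc -- "$fq" | wc -l) ;;  # decompress to stdout, count lines
        *)    lines=$(wc -l < "$fq") ;;
    esac
    # wc -l counts newline characters; each four-line record contributes four.
    echo $(( lines / 4 ))
}
```

For example, `count_fastq_reads sample01_ontarget.fastq.gz` (a hypothetical file name) prints the number of records in that file.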
By default, the report contains sequencing quality plots and two tables that summarize targeted sequencing results:
- on/off-target reads per sample.
- summaries of each sample/target pair.
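The on/off-target split can be illustrated with a simple interval-overlap rule: a read counts as on-target if any of its alignments overlaps a target region. This is a sketch of the general idea, not the workflow's actual implementation, which may apply different criteria. It assumes the alignments have already been converted to BED (for example with `bedtools bamtobed`, whose default BED6 output carries the read name in column 4), and the function name `classify_reads` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the wf-cas9 implementation): classify reads as
# on- or off-target by overlap between alignment intervals and target intervals.
# Assumptions: TARGETS_BED is non-empty BED4; ALIGNMENTS_BED has the read name
# in column 4 (e.g. BED6 from `bedtools bamtobed`); both are tab-separated.

# classify_reads TARGETS_BED ALIGNMENTS_BED
# Prints "on_target<TAB>N" and "off_target<TAB>N", counting each read name once.
classify_reads() {
    awk -F '\t' '
        # First file (targets): store each interval.
        FNR == NR {
            nt++
            tchrom[nt] = $1; tstart[nt] = $2 + 0; tend[nt] = $3 + 0
            next
        }
        # Second file (alignments): a read is on-target if any of its
        # alignments overlaps any target on the same chromosome.
        {
            read = $4
            if (!(read in status)) status[read] = "off"
            for (i = 1; i <= nt; i++) {
                # Half-open intervals overlap when each starts before the other ends.
                if ($1 == tchrom[i] && $2 + 0 < tend[i] && tstart[i] < $3 + 0) {
                    status[read] = "on"
                    break
                }
            }
        }
        END {
            for (r in status) count[status[r]]++
            printf "on_target\t%d\noff_target\t%d\n", count["on"], count["off"]
        }
    ' "$1" "$2"
}
```

The nested loop compares every alignment with every target, which keeps the logic easy to follow and is adequate for a handful of targets; for larger inputs, a dedicated tool such as `bedtools intersect` would be the more usual choice.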
When the workflow is run with `--full_report`, the report will also contain the following elements, which may be useful for diagnosing issues with the experiment. They are turned off by default because they can slow down loading of the report:
- plots of stranded coverage at each target.
- histograms of on and off-target coverage for each sample.
- off-target hotspot region tables.
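As a rough, simplified stand-in for the stranded coverage plots, the sketch below counts, for each target, how many forward-strand and reverse-strand alignments overlap it. It reports counts of overlapping alignments rather than per-base depth, so it is not what the workflow plots; it assumes tab-separated BED4 targets and BED6 alignments with the strand in column 6 (as produced by, e.g., `bedtools bamtobed`), and the function name `stranded_target_counts` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not part of wf-cas9): per-target counts of overlapping
# alignments split by strand. Assumptions: TARGETS_BED is non-empty BED4 and
# ALIGNMENTS_BED6 has the strand ("+" or "-") in column 6; both tab-separated.
# A read with several alignments over the same target is counted once per alignment.

# stranded_target_counts TARGETS_BED ALIGNMENTS_BED6
# Prints: target_name<TAB>forward_count<TAB>reverse_count, in targets-file order.
stranded_target_counts() {
    awk -F '\t' '
        # First file (targets): store intervals and zero the counters.
        FNR == NR {
            nt++
            tchrom[nt] = $1; tstart[nt] = $2 + 0; tend[nt] = $3 + 0; tname[nt] = $4
            fwd[nt] = 0; rev[nt] = 0
            next
        }
        # Second file (alignments): increment the strand counter of every
        # target the alignment overlaps (half-open interval overlap).
        {
            for (i = 1; i <= nt; i++) {
                if ($1 == tchrom[i] && $2 + 0 < tend[i] && tstart[i] < $3 + 0) {
                    if ($6 == "+") fwd[i]++
                    else if ($6 == "-") rev[i]++
                }
            }
        }
        END {
            for (i = 1; i <= nt; i++) printf "%s\t%d\t%d\n", tname[i], fwd[i], rev[i]
        }
    ' "$1" "$2"
}
```

A strong imbalance between the two columns for a target is the kind of pattern a stranded view makes easy to spot, although the interpretation of any imbalance depends on the experimental design.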