Workshop "Principles on the analysis of Oxford Nanopore Sequencing datasets" - 9-16 October 2023

Organizers: Liam Trethowan (Royal Botanic Gardens, Kew), Himmah Rustiami (Herbarium Bogoriense-BRIN), Oscar A. Pérez Escobar, Sidonie Bellot.

1. Introduction

This repository contains a tutorial on analysing raw data produced by Oxford Nanopore Technologies (ONT) sequencing and on the initial steps of a genome assembly. It also includes a real example of how ONT data can be used: searching sequences against a predetermined database with NCBI BLAST to determine the identity of an organism sequenced using ONT.

This tutorial is intended for users with basic programming knowledge and is designed to run in UNIX environments. Participants should ideally have experience using the shell and manipulating text files (e.g., with awk, sed, grep, among others). The workshop will be run on pre-configured laptops (Ubuntu 22.04). A basic introduction to the UNIX environment with some useful commands is available here.

This tutorial requires the following programs/dependencies (it is highly recommended to have these installed before starting the tutorial). Please make sure that the dependencies on which these programs run are also available:

  1. NCBI blast: This program builds blast databases and searches DNA/AA sequences against them.
  2. NCBI magicblast: This program searches reads from Illumina/Nanopore sequencing experiments (in fasta or fastq format) against blast databases. IMPORTANT: please create a free NCBI account to obtain an NCBI API key here. This is needed to perform remote (online) sequence queries.
  3. CANU: this program corrects and filters ONT/PacBio sequences.
  4. SMARTdenovo: this program assembles corrected and trimmed ONT/PacBio sequences de novo.
  5. minimap2: this program conducts pairwise genome alignments and maps short or long reads to a reference. This is required for genome polishing.
  6. racon: this program corrects long reads or scaffolds using short reads mapped against said scaffolds/genome. This is required for genome polishing.
  7. NanoPlot: this program produces plots summarising sequencing experiments conducted on ONT platforms. An online executable version is available here.
  8. Guppy: This program (now legacy) calls bases from FAST5 files generated by ONT. It is only available to ONT users (this part of the tutorial, although explained, will not be executed).
  9. dorado: This program (now the official ONT basecaller) calls bases from POD5 files generated by ONT. It is freely available and can perform a wide range of functions.
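Before starting, it can help to confirm which of the programs listed above are already reachable from the terminal. The following is a minimal sketch, not part of the original workshop material: `missing_tools` is a hypothetical helper, and the executable names passed to it (for example `smartdenovo.pl` or `NanoPlot`) are assumptions about how each program is installed, so adjust them to match your system.

```shell
# Print every name that cannot be found on PATH, one per line.
# missing_tools is a hypothetical helper written for this example.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The executable names below are assumptions; edit them to match your installation.
missing=$(missing_tools makeblastdb blastn magicblast canu smartdenovo.pl \
    minimap2 racon NanoPlot dorado)

if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All listed tools were found on PATH."
fi
```

Any name reported as missing either needs to be installed or needs its directory added to PATH.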

2. Workshop structure

This tutorial is divided into four main steps:

A. Base Calling

B. Data quality analysis

C. Data trimming, correction and genome assembly

D. Genome search and/or annotation operations

Figure 1: Simplified view of the tutorial/pipeline

Important

The base data needed to run this tutorial are available in the different subfolders of this repo (e.g., NGS and NanoPlot). Some files need to be downloaded from a Google Drive folder. The link to these files is provided in the README.md file of each subfolder.

2.1. Pipeline configuration

In any bioinformatics pipeline, it is essential to record which programs the pipeline depends on and to know where the input files and other resources are located. To run this tutorial, you must copy this repository to a directory of your choice (ideally /home/ontasia*/Documents). To do this, please execute:

git clone https://github.com/siriusb-nox/ONT-workshop-Oct-2023.git

Users who have the programs installed in a UNIX environment on a personal computer can make them available in the current session (terminal) by adding their folder to PATH, for example:

PATH=$PATH:/directory/of/the/folder/programx
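Repeating the command above in the same session appends the same folder again, and a mistyped folder is added silently. As a hedged alternative, the sketch below wraps the assignment in a small function; `add_to_path` is a hypothetical helper written for this example, not something provided by the workshop.

```shell
# Append a directory to PATH only if it exists and is not already listed.
# add_to_path is a hypothetical helper written for this example.
add_to_path() {
    if [ ! -d "$1" ]; then
        echo "warning: $1 is not a directory, skipped" >&2
        return 1
    fi
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # append to the end of PATH
    esac
}

# /directory/of/the/folder/programx is the placeholder path used in the text above.
add_to_path /directory/of/the/folder/programx || true
export PATH
```

Appending rather than prepending keeps system programs first in the search order, which matches the `PATH=$PATH:...` form used in this tutorial.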

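For this particular workshop, users with Dell laptops should run the following lines to add the dependencies to the PATH environment variable. Replace ontasia* with the user name of your laptop, because the shell does not expand wildcards in variable assignments: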

# Canu
PATH=$PATH:/home/ontasia*/Softwares/canu/canu-1.9/Linux-amd64/bin/
# Racon 
PATH=$PATH:/home/ontasia*/softwares/genomics/racon/build/bin
# Minimap2
PATH=$PATH:/home/ontasia*/softwares/genomics/minimap2-2.17_x64-linux/
# samtools
PATH=$PATH:/home/ontasia*/softwares/genomics/samtools-1.10
# magicblast
PATH=$PATH:/home/ontasia*/softwares/genomics/ncbi-magicblast-1.5.0/bin/
# ncbi blast
PATH=$PATH:/home/ontasia*/softwares/genomics/ncbi-blast-2.10.0+/bin/
# SMARTdenovo
PATH=$PATH:/home/ontasia*/softwares/genomics/
export PATH
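After running the lines above, a quick check can reveal PATH entries that do not point to a real folder, for instance a path that still contains a literal `*` or a folder name with the wrong capitalisation. This is a minimal sketch under that assumption; `bad_path_entries` is a hypothetical helper written for this example.

```shell
# Print every entry of a colon-separated list that is not an existing directory.
# bad_path_entries is a hypothetical helper; its body runs in a subshell so the
# IFS and globbing changes do not leak into the calling session.
bad_path_entries() (
    IFS=:
    set -f                              # keep a literal '*' from being expanded
    for entry in $1; do
        [ -z "$entry" ] && continue     # skip empty entries
        [ -d "$entry" ] || printf '%s\n' "$entry"
    done
)

# Report the problematic entries of the current PATH, if any.
bad_path_entries "$PATH"
```

If nothing is printed, every PATH entry resolves to an existing folder.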
