<a href="https://colab.research.google.com/github/ellinium/nanopore_basecaller/blob/main/Nanopore_guppy_book.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Nanopore guppy basecaller
Converts fast5 files to fastq with the guppy basecaller.
The supported guppy version is 4.0.15.

# **Instructions**
1. Upload ont-guppy_4.0.15_linux64.tar.gz to your Google Drive (download it from https://drive.google.com/file/d/1iGm9l6tX2KExjL1UOAbgeVG9sLzCAtpo/view?usp=sharing). This is needed only once, not for every run.

2. Upload your fast5 files to Google Drive.

3. Select "Runtime/Change runtime type" from the main menu and set "Hardware accelerator" to "GPU".

4. Specify the input parameters:

```
    fast5_drive_directory - the Google Drive folder that contains your fast5 files
    guppy_folder - the Google Drive folder that contains the ont-guppy_4.0.15_linux64.tar.gz archive
    flowcell - Nanopore flow cell name
    kit - Nanopore kit name
    gpu_runners_per_device - number of neural network runners to create per CUDA device (20 works well on Google Colab)
    cpu_threads_per_caller - number of CPU worker threads per basecaller (2 works well on Google Colab)
    min_qscore - minimum qscore a read must reach to pass (the maximum is 10)
    output_fastq_filename - name of the final fastq file, built by concatenating all fastq files that passed min_qscore
```

5. Run the code from the main menu: Runtime/Run all.
While connecting, allow this notebook to access your Google Drive files (Google opens a separate window for this).
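
The input parameters listed above are all typed into the form as strings, so a typo in a numeric field only shows up once guppy starts. A minimal sketch of a check that could run first is shown below. The function name `validate_parameters` and the specific rules are assumptions made for illustration; they are not part of the notebook and are not rules enforced by guppy.

```python
def validate_parameters(params):
    """Return a list of problems found in the notebook's form parameters.

    `params` maps parameter names to the string values typed into the form.
    The checks are illustrative assumptions, not rules enforced by guppy.
    """
    problems = []

    # These fields end up in paths or command-line flags and must not be empty.
    for name in ("fast5_drive_directory", "guppy_folder", "flowcell", "kit",
                 "output_fastq_filename"):
        if not params.get(name, "").strip():
            problems.append(f"{name} is empty")

    # These fields are entered as strings but must parse as positive integers.
    for name in ("gpu_runners_per_device", "cpu_threads_per_caller"):
        value = params.get(name, "")
        try:
            is_positive = int(value) >= 1
        except ValueError:
            is_positive = False
        if not is_positive:
            problems.append(f"{name} must be a positive integer, got {value!r}")

    # min_qscore must parse as a non-negative number (assumption: decimals allowed).
    raw_qscore = params.get("min_qscore", "")
    try:
        if float(raw_qscore) < 0:
            problems.append("min_qscore must not be negative")
    except ValueError:
        problems.append(f"min_qscore must be a number, got {raw_qscore!r}")

    return problems
```

Called with a dictionary of the form values, the function returns an empty list when nothing looks wrong, so the notebook could stop early with a readable message instead of a guppy error.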


In [None]:
#@title Specify a folder with fast5 files and parameters for guppy basecaller

from google.colab import drive

fast5_drive_directory = "Copy of UQ phage genomes/SH9454 nanopore reads" #@param {type:"string"}
guppy_folder = "Nanopore_Sequencing" #@param {type:"string"}
flowcell = "FLO-MIN106" #@param {type:"string"}
kit = "SQK-RBK004" #@param {type:"string"}
gpu_runners_per_device = "20" #@param {type:"string"}
cpu_threads_per_caller = "2" #@param {type:"string"}
min_qscore = "7" #@param {type:"string"}
output_fastq_filename = "FASTQ_pass_barcode01.fastq" #@param {type:"string"}

# Mount Google Drive; Colab asks for permission on the first run
google_drive_path = '/content/gdrive'
drive.mount(google_drive_path)
# No trailing slash here: the paths below add their own separator
google_drive_path = google_drive_path + '/MyDrive'

# The fast5 folder name may contain spaces, so wrap it in single quotes for the shell
google_drive_fast5_path = google_drive_path + "'/" + fast5_drive_directory + "'"
guppy_tar_folder = google_drive_path + '/' + guppy_folder
guppy_full_path = guppy_tar_folder + '/ont-guppy-gpu/ont-guppy'

# Unpack the guppy archive and make the basecaller executable
!mkdir -p $guppy_full_path
!tar -xf $guppy_tar_folder/ont-guppy_4.0.15_linux64.tar.gz -C $guppy_full_path
!chmod +x $guppy_full_path/bin/guppy_basecaller



Mounted at /content/gdrive
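
The cell above wraps the fast5 folder in literal single quotes so that the shell accepts the spaces in its name. A minimal sketch of the same idea using only the Python standard library is shown below: `posixpath.join` keeps the separators consistent and `shlex.quote` returns a shell-safe string. The helper name `shell_safe_drive_path` is hypothetical and does not appear in the notebook.

```python
import posixpath
import shlex


def shell_safe_drive_path(drive_root, *parts):
    """Join path components and quote the result for use in a `!` shell line."""
    # Strip stray slashes so that joining never produces '//'.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    full_path = posixpath.join(drive_root.rstrip("/"), *cleaned)
    # shlex.quote adds single quotes only when the path needs them.
    return shlex.quote(full_path)


fast5_path = shell_safe_drive_path(
    "/content/gdrive/MyDrive/", "Copy of UQ phage genomes/SH9454 nanopore reads"
)
print(fast5_path)
```

Because the whole path is quoted as one unit, a suffix such as `/pass/*.fastq` can still be appended after the variable in a shell line and the glob is expanded as usual.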


In [None]:
#@title Install prerequisites
!apt-get -qq install -y libidn11




Selecting previously unselected package libidn11:amd64.
(Reading database ... 129504 files and directories currently installed.)
Preparing to unpack .../libidn11_1.33-2.2ubuntu2_amd64.deb ...
Unpacking libidn11:amd64 (1.33-2.2ubuntu2) ...
Setting up libidn11:amd64 (1.33-2.2ubuntu2) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...


In [None]:
#@title Launch basecaller
#execute basecaller
!$guppy_full_path/bin/guppy_basecaller -i $google_drive_fast5_path -s $google_drive_fast5_path --flowcell $flowcell --kit $kit --gpu_runners_per_device $gpu_runners_per_device --cpu_threads_per_caller $cpu_threads_per_caller --qscore_filtering --min_qscore=$min_qscore -q 0 --device cuda:0
#!$guppy_full_path/bin/guppy_basecaller -i $google_drive_fast5_path -s $google_drive_fast5_path --flowcell FLO-MIN106 --kit SQK-RBK004 --gpu_runners_per_device 20 --cpu_threads_per_caller 2 --qscore_filtering --min_qscore=7 -q 0 --device cuda:0

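# Concatenate all fastq files that passed min_qscore.
# '>' overwrites the target, so re-running the cell does not duplicate reads.
! cat $google_drive_fast5_path/pass/*.fastq > $google_drive_fast5_path/$output_fastq_filename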

#execute guppy aligner
#! /content/gdrive/MyDrive/Nanopore_Sequencing/ont-guppy-gpu/ont-guppy/bin/guppy_aligner -i /content/gdrive/MyDrive/Nanopore_Sequencing/output/barcode12/pass/ -s /content/gdrive/MyDrive/Nanopore_Sequencing/output/barcode12/SAM --align_ref /content/gdrive/MyDrive/Nanopore_Sequencing/174WT_genome.fasta
#! cat /content/gdrive/MyDrive/Nanopore_Sequencing/output/barcode12/SAM/*.sam >> /content/gdrive/MyDrive/Nanopore_Sequencing/output/barcode12/barcode12_basecall.sam

ONT Guppy basecalling software version 4.0.15+5694074, client-server API version 2.1.0
config file:        /content/gdrive/MyDrive/Nanopore_Sequencing/ont-guppy-gpu/ont-guppy/data/dna_r9.4.1_450bps_hac.cfg
model file:         /content/gdrive/MyDrive/Nanopore_Sequencing/ont-guppy-gpu/ont-guppy/data/template_r9.4.1_450bps_hac.jsn
input path:         /content/gdrive/MyDrive//Copy of UQ phage genomes/SH9454 nanopore reads
save path:          /content/gdrive/MyDrive//Copy of UQ phage genomes/SH9454 nanopore reads
chunk size:         2000
chunks per runner:  512
minimum qscore:     7
records per file:   0
num basecallers:    4
gpu device:         cuda:0
kernel path:        
runners per device: 20

Found 6 fast5 files to process.
Init time: 10243 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 150518 ms, Samples called: 421509799, samples/s: 2.80039e+06
Finishing up 
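
The launch cell relies on shell interpolation for both the basecaller call and the final `cat`. As a hedged alternative, the sketch below builds the same command as an argument list and merges the passed reads in Python. It uses only the flags that already appear in the notebook's command; the function names `build_basecaller_command` and `concatenate_fastq` are hypothetical.

```python
import glob
import os
import shutil


def build_basecaller_command(guppy_bin, fast5_dir, flowcell, kit,
                             gpu_runners, cpu_threads, min_qscore):
    """Return the guppy_basecaller invocation as an argument list.

    Only flags that already appear in the notebook's command are used.
    """
    return [
        guppy_bin,
        "-i", fast5_dir,
        "-s", fast5_dir,
        "--flowcell", flowcell,
        "--kit", kit,
        "--gpu_runners_per_device", str(gpu_runners),
        "--cpu_threads_per_caller", str(cpu_threads),
        "--qscore_filtering",
        f"--min_qscore={min_qscore}",
        "-q", "0",
        "--device", "cuda:0",
    ]


def concatenate_fastq(pass_dir, output_path):
    """Merge every .fastq file in pass_dir into output_path; return the file count."""
    pattern = os.path.join(glob.escape(pass_dir), "*.fastq")
    fastq_files = sorted(glob.glob(pattern))
    # Opening with "wb" truncates the target, so a re-run does not duplicate reads.
    with open(output_path, "wb") as merged:
        for path in fastq_files:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, merged)
    return len(fastq_files)
```

The list returned by `build_basecaller_command` can be passed to `subprocess.run(command, check=True)`. Since no shell is involved, folder names with spaces need no extra quoting.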

In [None]:
#!nvidia-smi