# Every Variant Sequencing with Oxford Nanopore Technologies

This script is run after sequencing. It accepts either raw pod5 files, which are basecalled first, or already basecalled reads (fastq.gz), which are used directly.

## Workflow

### 1. Basecalling (Optional)

- The raw reads are stored in the main ONT data folder (e.g., /var/lib/minknow/data). Enter the experiment name as input.
- Reads are basecalled with the model of choice. If enough computational power is available, we recommend the "sup" (super accuracy) model.

### 2. Demultiplexing 
- Each read is assigned to a well/plate combination.
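
To illustrate the idea of assigning a read to a plate/well label, here is a minimal sketch. It is not the method used by the demultiplexing tool in this workflow: it swaps in a simple Hamming-distance comparison against the start of each read, and the barcode sequences, the labels, the `max_mismatches` cutoff, and the helper names are all hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_well(read: str, barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Optional[str]:
    """Return the label of the best-matching barcode, or None.

    None is returned when no barcode is close enough or when two
    barcodes match equally well (ambiguous read).
    """
    scored = []
    for label, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        scored.append((hamming(prefix, barcode), label))
    if not scored:
        return None
    scored.sort()
    best_dist, best_label = scored[0]
    if best_dist > max_mismatches:
        return None  # nothing close enough
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # tie between two barcodes
    return best_label


# Hypothetical barcodes keyed by plate/well label
barcodes = {"P1_A1": "ACGTACGT", "P1_A2": "TTGGCCAA"}
print(assign_well("ACGTACGAGGGG", barcodes))  # read with one barcode mismatch
```

Rejecting ties and distant matches keeps ambiguous reads out of every well, which matters more for variant calling than recovering every read.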

### 3. Variant Calling
- Minimap2 aligns the reads to the reference; the stacked alignments serve as the multiple sequence alignment (MSA)
- The Base Frequency Caller is used for variant calling



### Packages 

In [1]:
# Import all packages

import sys
sys.path.append("/home/emre/github_repo/MinION")

from minION.util import IO_processor
from minION.basecaller import Basecaller

from minION.variantcaller import *

from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import subprocess

### Meta Data 

Provide the following arguments:

- Result Path: Path where the minION results folder is created. All experiment results are stored in this folder.
- Experiment Name: The name assigned to the experiment when running the sequencer. Use the same name here so the run can be identified.
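
As a rough picture of how these two arguments combine, the sketch below builds a results folder from a base path, an experiment name, and a basecalling model. The layout `<base>/minION_results/<experiment>_<model>` is an assumption inferred from the paths printed later in this notebook, and `make_result_folder` is a hypothetical helper, not the implementation of `IO_processor.create_folder`.

```python
from pathlib import Path
import tempfile


def make_result_folder(base: Path, experiment_name: str, model: str) -> Path:
    """Create <base>/minION_results/<experiment_name>_<model> and return it."""
    folder = base / "minION_results" / f"{experiment_name}_{model}"
    folder.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return folder


# Use a temporary directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    folder = make_result_folder(Path(tmp), "Masked_3RBC-Minion", "sup")
    print(folder.name)
```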


In [2]:
result_path = Path("/home/emre/")
experiment_name = "Masked_3RBC-Minion"
basecall_model_type = "sup"
result_folder = IO_processor.create_folder(experiment_name,
                                           basecall_model_type,
                                           target_path=result_path)

file_to_experiment = f"/var/lib/minknow/data/{experiment_name}"
barcode_file = "/home/emre/github_repo/MinION/data/refseq/hetcpiii_padded.fasta"

# Basecalling
basecall_folder = result_folder / "basecalled"
basecall_folder.mkdir(parents=True, exist_ok=True)
experiment_folder = IO_processor.find_experiment_folder(experiment_name) # Folder where pod5 files are located

# Demultiplexing
experiment_name = experiment_name + "_" + basecall_model_type
result_folder_path = IO_processor.find_folder(result_path, experiment_name)


In [3]:
# Add conditions to avoid running the script accidentally
skip_basecalling = True
skip_demultiplex = False
skip_variant_calling = False

### Step 1 (Optional): Basecall reads

- Basecalling can usually run during sequencing if a GPU is available
- Otherwise, basecall the reads afterwards with the cell below

In [3]:
if not skip_basecalling:
    pod5_files = IO_processor.find_folder(experiment_folder, "pod5")
    # Positional arguments must precede keyword arguments (assumes the model is the first parameter)
    bc = Basecaller(basecall_model_type, pod5_files, basecall_folder, fastq=True)
    bc.run_basecaller()


In [4]:
# Find fastq files
file_to_fastq = IO_processor.find_folder(experiment_folder, "fastq_pass")
print(file_to_fastq)

/var/lib/minknow/data/20240112-RL-8Plates-FLongle-2/no_sample/20240112_1646_MN45017_flg114_9ac1102b/fastq_pass
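
The lookup above searches an experiment folder for a subfolder with a given name. A minimal sketch of that kind of search with `pathlib` is shown here; `find_subfolder` and the run layout are hypothetical and are not the implementation of `IO_processor.find_folder`.

```python
from pathlib import Path
from typing import Optional
import tempfile


def find_subfolder(root: Path, name: str) -> Optional[Path]:
    """Return the first directory called `name` anywhere below `root`, or None."""
    for candidate in sorted(root.rglob(name)):
        if candidate.is_dir():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <experiment>/<sample>/<run>/fastq_pass
    run_dir = Path(tmp) / "no_sample" / "run_0001"
    (run_dir / "fastq_pass").mkdir(parents=True)
    found = find_subfolder(Path(tmp), "fastq_pass")
    print(found.relative_to(tmp))
```

Returning `None` instead of raising lets the caller decide whether a missing `fastq_pass` folder means basecalling still has to run.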


### Step 2: Demultiplex with SW
- Reads are assigned to plates and wells by the `demultiplex` executable called below

In [5]:
if not skip_demultiplex:
    path_to_code = "/home/emre/github_repo/MinION/source/source/demultiplex"
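    # Pass arguments as a list so no shell is needed; check=True raises if the tool fails
    command = [path_to_code, "-f", str(file_to_fastq), "-d", str(result_folder),
               "-b", str(barcode_file), "-w", "100", "-r", "100"]
    subprocess.run(command, check=True)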

Processed argument: -f with value: /var/lib/minknow/data/20240112-RL-8Plates-FLongle-2/no_sample/20240112_1646_MN45017_flg114_9ac1102b/fastq_pass
Processed argument: -d with value: /home/emre/minION_results/20240112-RL-8Plates-FLongle-2_sup
Processed argument: -b with value: /home/emre/github_repo/MinION/minION/barcoding/minion_barcodes_pga9.fasta
Processed argument: -w with value: 100
Processed argument: -r with value: 100
Number of files: 305
Processing files: [##################################################] 100%


In [None]:
demultiplex_folder = result_folder 
print(demultiplex_folder)

### Step 3: Call Variants with Pileup Analysis

- A variant is called when it passes a minimum frequency and a minimum read depth. The cell below uses `threshold=0.2` and `min_depth=5`.
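
To make the frequency and depth filters concrete, here is a minimal sketch of base-frequency calling on a toy pileup. It is not the `VariantCaller` implementation: it assumes reads are already aligned to the reference without insertions, uses `-` for uncovered positions, and only reports substitutions. The function name `call_variants` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List


def call_variants(reference: str, aligned_reads: List[str],
                  threshold: float = 0.2, min_depth: int = 5) -> List[str]:
    """Call substitutions from reads aligned column-by-column to `reference`.

    Each read must have the same length as the reference; '-' marks an
    uncovered position. Variants are named like 'G3A' (1-based position).
    """
    variants = []
    for pos, ref_base in enumerate(reference):
        column = [read[pos] for read in aligned_reads if read[pos] != "-"]
        depth = len(column)
        if depth < min_depth:
            continue  # too few reads cover this position
        counts = Counter(column)
        counts.pop(ref_base, None)  # keep only non-reference bases
        if not counts:
            continue
        alt_base, alt_count = counts.most_common(1)[0]
        if alt_count / depth >= threshold:
            variants.append(f"{ref_base}{pos + 1}{alt_base}")
    return variants


reference = "ACGTAC"
reads = ["ACGTAC", "ACATAC", "ACATAC", "ACATAC", "ACATAC", "ACGTA-"]
# Join multiple substitutions with "_"; fall back to the parent label
print("_".join(call_variants(reference, reads)) or "#PARENT#")
```

Raising `threshold` makes the caller ignore low-frequency bases such as sequencing errors, while `min_depth` prevents calls from wells with too few reads.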

Read Summary file (Optional):


In [5]:
template_fasta = "/home/emre/github_repo/MinION/data/refseq/hetcpiii_padded.fasta"
experiment_folder = Path("/home/emre/minION_results/MinION_RBC_0902723_sup")
demultiplex_folder_name = "Demultiplex_cpp_70_200k_reads"

In [6]:
if not skip_variant_calling:
    vc = VariantCaller(experiment_folder, 
                   template_fasta, 
                   demultiplex_folder_name=demultiplex_folder_name, 
                   padding_start=50, 
                   padding_end=50)
    
    variant_df = vc.get_variant_df(qualities=True, 
                                threshold=0.2,
                                min_depth=5)


19it [00:18,  1.03s/it]

unsupported operand type(s) for /: 'float' and 'str'


57it [00:48,  1.05it/s]

unsupported operand type(s) for /: 'float' and 'str'


72it [00:56,  2.34it/s]

Too many positions: 30, Skipping...


84it [01:05,  1.14it/s]

unsupported operand type(s) for /: 'float' and 'str'


96it [01:12,  1.59it/s]

unsupported operand type(s) for /: 'float' and 'str'


111it [01:21,  1.58it/s]

unsupported operand type(s) for /: 'float' and 'str'
unsupported operand type(s) for /: 'float' and 'str'


136it [01:39,  1.35it/s]

unsupported operand type(s) for /: 'float' and 'str'


138it [01:40,  1.55it/s]

unsupported operand type(s) for /: 'float' and 'str'
unsupported operand type(s) for /: 'float' and 'str'


158it [02:00,  1.18s/it]

unsupported operand type(s) for /: 'float' and 'str'


175it [02:10,  1.59it/s]

unsupported operand type(s) for /: 'float' and 'str'


288it [03:40,  1.31it/s]


In [9]:
variant_df.tail(20)

| | Plate | Well | Path | Alignment_count | Variant | Probability |
|---|---|---|---|---|---|---|
| 268 | 3 | G5 | /home/emre/minION_results/MinION_RBC_0902723_s... | 589 | #PARENT# | 0.006631 |
| 269 | 3 | G6 | /home/emre/minION_results/MinION_RBC_0902723_s... | 444 | T148C | 0.957951 |
| 270 | 3 | G7 | /home/emre/minION_results/MinION_RBC_0902723_s... | 267 | A105G_A176G | 0.968139 |
| 271 | 3 | G8 | /home/emre/minION_results/MinION_RBC_0902723_s... | 27 | #PARENT# | 0.808468 |
| 272 | 3 | G9 | /home/emre/minION_results/MinION_RBC_0902723_s... | 264 | T171C | 0.188013 |
| 273 | 3 | G10 | /home/emre/minION_results/MinION_RBC_0902723_s... | 306 | #PARENT# | 0.955548 |
| 274 | 3 | G11 | /home/emre/minION_results/MinION_RBC_0902723_s... | 526 | #PARENT# | 0.927979 |
| 275 | 3 | G12 | /home/emre/minION_results/MinION_RBC_0902723_s... | 197 | #PARENT# | 0.952706 |
| 276 | 3 | H1 | /home/emre/minION_results/MinION_RBC_0902723_s... | 2 | | |
| 277 | 3 | H2 | /home/emre/minION_results/MinION_RBC_0902723_s... | 173 | T89C_G110T | 0.888325 |
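
The `Variant` labels in this table can be split into individual substitutions for downstream analysis. The sketch below assumes the naming convention seen in the output: reference base, position, and new base, with multiple substitutions joined by `_` and `#PARENT#` meaning no substitution. `parse_variant` is a hypothetical helper and is not part of the `minION` package.

```python
import re
from typing import List, Tuple

# One substitution: reference base, position, alternative base (e.g. "T148C")
VARIANT_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_variant(label: str) -> List[Tuple[str, int, str]]:
    """Split a label such as 'A105G_A176G' into (ref, position, alt) tuples.

    '#PARENT#' (no substitution) yields an empty list.
    """
    if label == "#PARENT#":
        return []
    mutations = []
    for part in label.split("_"):
        match = VARIANT_RE.match(part)
        if match is None:
            raise ValueError(f"Unrecognised substitution: {part!r}")
        ref, pos, alt = match.groups()
        mutations.append((ref, int(pos), alt))
    return mutations


print(parse_variant("A105G_A176G"))
```

Wells without a call, such as the row with an empty `Variant` entry, should be filtered out before parsing because they carry no label.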
