# Every Variant Sequencing with Oxford Nanopore Technologies

This script is run after sequencing. It can either basecall the raw pod5 files or work directly with reads that have already been basecalled (fastq.gz).

## Workflow

### 1. Basecalling (Optional)

- The raw reads are stored in the main ONT data folder (e.g., /var/lib/minknow/data). Enter the experiment name as input.
- Sequences are basecalled with the model of choice. If enough computational power is available, we recommend the "sup" model.

### 2. Demultiplexing 
- Each read is assigned to a well/plate combination.
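
As an illustration of the plate/well assignment, the sketch below maps a pair of barcode indices to a plate and a 96-well position. It assumes, which this notebook does not confirm, that the front barcode index (1 to 96) selects the well in row-major order (A1, A2, ..., H12) and that the back barcode index is used directly as the plate number. `barcode_to_well` and `assign_read` are hypothetical names and not part of `minION`.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A-H of a 96-well plate
N_COLS = 12                        # columns 1-12 of a 96-well plate


def barcode_to_well(front_index: int) -> str:
    """Map a 1-based front barcode index (1-96) to a well label (row-major order is assumed)."""
    if not 1 <= front_index <= 96:
        raise ValueError(f"front barcode index out of range: {front_index}")
    row, col = divmod(front_index - 1, N_COLS)
    return f"{ROWS[row]}{col + 1}"


def assign_read(front_index: int, back_index: int) -> tuple[int, str]:
    """Return (plate, well), assuming the back barcode index is the plate number."""
    return back_index, barcode_to_well(front_index)
```

Under these assumptions, `assign_read(92, 12)` would place a read in well H8 of plate 12.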

### 3. Variant Calling
- Minimap2 is used to create the multiple sequence alignment (MSA).
- The Base Frequency Caller is used for variant calling.
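
To make the idea of frequency-based calling concrete, here is a minimal sketch. It is not the actual `VariantCaller` implementation. It assumes the reads are already aligned column by column to the reference (in the pipeline that alignment comes from Minimap2), with `-` marking a gap. The function name `call_variants` is hypothetical; the default values mirror the `threshold` and `min_depth` arguments used later in this notebook, and the output format mirrors names such as `G175A` in the `Variant` column.

```python
from collections import Counter


def call_variants(reference, aligned_reads, threshold=0.2, min_depth=5):
    """Call substitutions from reads aligned column by column to `reference`.

    Returns None if there are fewer than `min_depth` reads, otherwise a string of
    substitutions joined by '_' (empty if every position matches the reference).
    """
    if len(aligned_reads) < min_depth:
        return None  # too few reads to make a call
    variants = []
    for pos, ref_base in enumerate(reference):
        # Count the bases observed at this column, ignoring gaps
        counts = Counter(read[pos] for read in aligned_reads if read[pos] != "-")
        covered = sum(counts.values())
        if covered == 0:
            continue
        base, count = counts.most_common(1)[0]
        # Report the most frequent base if it differs from the reference
        # and is frequent enough
        if base != ref_base and count / covered >= threshold:
            variants.append(f"{ref_base}{pos + 1}{base}")
    return "_".join(variants)
```

A real caller would also have to handle insertions, deletions, and base qualities, which this sketch leaves out.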



### Packages 

In [1]:
# Import all packages

import sys
sys.path.append("/home/emre/github_repo/MinION")

from minION.util import IO_processor
from minION.basecaller import Basecaller

from minION.variantcaller import *

from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import subprocess
import importlib
importlib.reload(IO_processor)

<module 'minION.util.IO_processor' from '/home/emre/github_repo/MinION/minION/util/IO_processor.py'>

### Meta Data 

Provide the following arguments:

- Result Path: Path where the minION result folder will be created. All experiment results are stored in this folder.
- Experiment Name: The name assigned to the experiment when running the sequencer. Use the same name here so the run can be identified.
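
Judging from the result path printed later in this notebook (`.../minION_results/20240126-RL-8-63_sup`), the result folder combines the experiment name and the basecall model. The sketch below shows one way such a folder could be created with `pathlib`. `make_result_folder` is a hypothetical stand-in; the actual `IO_processor.create_folder` may behave differently.

```python
from pathlib import Path


def make_result_folder(experiment_name, basecall_model_type, target_path):
    """Create <target_path>/minION_results/<experiment_name>_<model> and return it."""
    folder = (Path(target_path) / "minION_results"
              / f"{experiment_name}_{basecall_model_type}")
    # parents=True creates minION_results if needed; exist_ok=True makes reruns safe
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```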


In [6]:
result_path = Path("/home/emre/")
experiment_name = "20240126-RL-8-63"
basecall_model_type = "sup"
result_folder = IO_processor.create_folder( experiment_name,
                                            basecall_model_type, 
                                            target_path=result_path)




# Create Barcode fasta file 
barcode_path = "../minION/barcoding/minion_barcodes.fasta" # Path to standard barcode file
front_prefix = "NB"
back_prefix = "RB"
bp = IO_processor.BarcodeProcessor(barcode_path, front_prefix, back_prefix)
barcode_path = result_folder / "minion_barcodes_filtered.fasta"

# Barcode indexes
front_min = 1
front_max = 96
back_min = 9
back_max = 12

bp.filter_barcodes(barcode_path, (front_min,front_max), (back_min,back_max))


file_to_experiment= f"/var/lib/minknow/data/{experiment_name}"
template_fasta = "/home/emre/PgA9.fasta"

# Basecalling
basecall_folder = result_folder / "basecalled"
basecall_folder.mkdir(parents=True, exist_ok=True)
experiment_folder = IO_processor.find_experiment_folder(experiment_name) # Folder where pod5 files are located

# Demultiplexing
experiment_name = experiment_name + "_" + basecall_model_type
result_folder_path = IO_processor.find_folder(result_path, experiment_name)
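
The barcode filtering step in the cell above keeps only the barcodes used in this run (front 1 to 96, back 9 to 12). The sketch below illustrates that kind of filter on FASTA text. It assumes record headers of the form `<prefix><number>` (for example `NB01` or `RB09`), which the notebook does not show, and `filter_barcode_fasta` is a hypothetical helper rather than the `BarcodeProcessor.filter_barcodes` method itself.

```python
import re


def filter_barcode_fasta(fasta_text, front_prefix, back_prefix, front_range, back_range):
    """Keep FASTA records named <prefix><number> whose number lies in the inclusive range."""
    ranges = {front_prefix: front_range, back_prefix: back_range}
    kept = []
    # Splitting on '>' yields one chunk per record: header line, then sequence lines
    for record in fasta_text.split(">")[1:]:
        header, _, sequence = record.partition("\n")
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", header.strip())
        if match is None or match.group(1) not in ranges:
            continue  # header does not look like a known barcode name
        low, high = ranges[match.group(1)]
        if low <= int(match.group(2)) <= high:
            kept.append(f">{header.strip()}\n{sequence.strip()}\n")
    return "".join(kept)
```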


In [4]:
# Add conditions to avoid running the script accidentally
skip_basecalling = True
skip_demultiplex = False
skip_variant_calling = False

In [21]:
result_folder

PosixPath('/home/emre/minION_results/20240126-RL-8-63_sup')

### Step 1 (Optional): Basecall reads

- Basecalling can usually be done while sequencing, if a GPU is available.
- Otherwise, basecall afterwards.

In [3]:
if not skip_basecalling:
    pod5_files = IO_processor.find_folder(experiment_folder, "pod5")
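    # A positional argument cannot follow a keyword argument, so the model is passed
    # positionally (this assumes `model` is the first parameter of Basecaller).
    bc = Basecaller(basecall_model_type, pod5_files, basecall_folder, fastq=True)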
    bc.run_basecaller()


In [7]:
# Find fastq files
file_to_fastq = IO_processor.find_folder(experiment_folder, "fastq_pass")
print(file_to_fastq)

/var/lib/minknow/data/20240126-RL-8-63/no_sample/20240126_1724_MN45017_flg114_bf69f73a/fastq_pass


### Step 2: Demultiplex with SW
- Reads are demultiplexed with SW, using the compiled `demultiplex` binary and the filtered barcode file.
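
The notebook does not spell out what "SW" stands for. If it refers to Smith-Waterman local alignment, the core idea can be sketched as follows: each barcode is locally aligned against the read, and the read is assigned to the best-scoring barcode. The scoring values and the names `smith_waterman_score` and `best_barcode` are illustrative assumptions, not the parameters or functions of the `demultiplex` binary.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of `query` in `target` (linear gap penalty)."""
    previous = [0] * (len(target) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, t in enumerate(target, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def best_barcode(read, barcodes):
    """Return the name of the barcode with the highest local alignment score in `read`."""
    return max(barcodes, key=lambda name: smith_waterman_score(barcodes[name], read))
```

A production demultiplexer would typically also apply a minimum score and restrict the search to the read ends, both of which are omitted here.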

In [8]:
if not skip_demultiplex:
    path_to_code = "/home/emre/github_repo/MinION/source/source/demultiplex"
    # Pass the arguments as a list so that no shell has to parse the paths
    command = [path_to_code, "-f", str(file_to_fastq), "-d", str(result_folder),
               "-b", str(barcode_path), "-w", "100", "-r", "100"]
    subprocess.run(command)

Processed argument: -f with value: /var/lib/minknow/data/20240126-RL-8-63/no_sample/20240126_1724_MN45017_flg114_bf69f73a/fastq_pass
Processed argument: -d with value: /home/emre/minION_results/20240126-RL-8-63_sup
Processed argument: -b with value: /home/emre/minION_results/20240126-RL-8-63_sup/minion_barcodes_filtered.fasta
Processed argument: -w with value: 100
Processed argument: -r with value: 100
Number of files: 1
Processing files: [##################################################] 100%


In [None]:
demultiplex_folder = result_folder 
print(demultiplex_folder)

### Step 3: Call Variants with Pileup Analysis

- Variants are called with a minimum frequency and a minimum depth; the cell below uses `threshold=0.2` and `min_depth=5`.

Read Summary file (Optional):


In [7]:

demultiplex_folder_name = result_folder

In [11]:
if not skip_variant_calling:
    vc = VariantCaller(experiment_folder, 
                   template_fasta, 
                   demultiplex_folder_name=demultiplex_folder_name, 
                   padding_start=0, 
                   padding_end=0)
    
    variant_df = vc.get_variant_df(qualities=True, 
                                threshold=0.2,
                                min_depth=5)
    seq_gen = IO_processor.SequenceGenerator(variant_df, template_fasta)
    variant_df = seq_gen.get_sequences()
    #TODO: Save the variant_df to a file after running. Currently it is not saved.

0it [00:00, ?it/s]

unsupported operand type(s) for /: 'float' and 'str'


...

370it [03:02,  3.37it/s]

Too many positions: 70, Skipping...
unsupported operand type(s) for /: 'float' and 'str'


384it [03:46,  1.69it/s]

Too many positions: 23, Skipping...
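
The message `unsupported operand type(s) for /: 'float' and 'str'` in the output above is the `TypeError` Python raises when a float is divided by a string. The notebook does not show where this happens inside `VariantCaller`, so the sketch below only reproduces the message and shows one defensive pattern: converting both operands to `float` before dividing. `safe_ratio` is a hypothetical helper.

```python
def safe_ratio(numerator, denominator):
    """Divide two values that may arrive as strings; return None if that is not possible."""
    try:
        return float(numerator) / float(denominator)
    except (TypeError, ValueError, ZeroDivisionError):
        return None


# Reproduce the message seen in the output
try:
    0.5 / "4"  # float divided by str
except TypeError as error:
    message = str(error)
```

Returning `None` keeps a long run going, but the affected wells should still be reviewed, since a silent fallback can hide a parsing problem upstream.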





In [19]:
# Save the variant table as a CSV file in the result folder
variant_df.to_csv(result_folder / "variant_df.csv", index=False)  

In [20]:
variant_df

| | Plate | Well | Path | Alignment_count | Variant | Probability | Sequence |
|---|---|---|---|---|---|---|---|
| 0 | 9 | A1 | | 0 | | | |
| 1 | 9 | A2 | /home/emre/minION_results/20240126-RL-8-63_sup... | 126 | G175A | 0.971026 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
| 2 | 9 | A3 | /home/emre/minION_results/20240126-RL-8-63_sup... | 147 | G175A | 0.991372 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
| 3 | 9 | A4 | /home/emre/minION_results/20240126-RL-8-63_sup... | 105 | G175A_G178C_T565C | 0.140141 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
| 4 | 9 | A5 | /home/emre/minION_results/20240126-RL-8-63_sup... | 128 | G175A_T587C | 0.946687 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 379 | 12 | H8 | /home/emre/minION_results/20240126-RL-8-63_sup... | 1562 | G175A | 0.979999 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
| 380 | 12 | H9 | /home/emre/minION_results/20240126-RL-8-63_sup... | 3033 | G175A | 0.972652 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
| 381 | 12 | H10 | /home/emre/minION_results/20240126-RL-8-63_sup... | 1575 | G175A | 0.979884 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
| 382 | 12 | H11 | /home/emre/minION_results/20240126-RL-8-63_sup... | 1711 | G175A | 0.985583 | ACCCATCACGGACCTTGAGTTTGACCTTCTGAAGAAGACTGTCATG... |
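
Some wells in the table have no call (A1) or a low probability (A4, 0.140141), so a quick filter can help decide which wells to inspect. The sketch below uses a few rows copied from the table, with the long `Path` and `Sequence` columns left out. The cut-offs `min_probability=0.9` and `min_alignments=15` and the function name `flag_low_confidence` are illustrative assumptions, not recommendations from the pipeline.

```python
import io

import pandas as pd

# A few rows copied from the table above (Path and Sequence columns omitted)
csv_text = """Plate,Well,Alignment_count,Variant,Probability
9,A1,0,,
9,A2,126,G175A,0.971026
9,A4,105,G175A_G178C_T565C,0.140141
12,H8,1562,G175A,0.979999
"""
example_df = pd.read_csv(io.StringIO(csv_text))


def flag_low_confidence(df, min_probability=0.9, min_alignments=15):
    """Return the rows with no call, too few alignments, or a low probability."""
    no_call = df["Variant"].isna()
    low_depth = df["Alignment_count"] < min_alignments
    low_probability = df["Probability"] < min_probability
    return df[no_call | low_depth | low_probability]


flagged = flag_low_confidence(example_df)
print(flagged[["Plate", "Well", "Alignment_count", "Probability"]])
```

With these cut-offs, the rows for wells A1 and A4 are flagged.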
