## 1. Load Required Libraries

## Ensure the required dependencies are installed

1. You need to have samtools installed
2. You also need minimap2 installed
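
If you prefer to check from Python rather than the shell, the following is a minimal sketch that looks for both tools on your `PATH`. The helper name `find_missing_tools` is made up for this example and is not part of LevSeq.

```python
import shutil

# Command-line tools this notebook relies on.
REQUIRED_TOOLS = ["samtools", "minimap2"]


def find_missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

This only confirms that an executable with that name is reachable; it does not check the installed version.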


## Download the test data or use your own

We recommend first trying this with the data we make available on Zenodo (https://zenodo.org/records/13694463). Download the fastq.zip file along with its reference file, unzip it, and put both in a data folder. The cells below read from `../zenodo_download/`, so either use that folder or adjust the paths in section 2. The folder contents should be:

```
20240421-YL-ParLQ-ep1.csv
20240422-YL-ParLQ-ep1.fastq
```
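
A quick way to confirm the folder is set up as listed is sketched below. The file names come from the listing above; the helper `missing_data_files` is hypothetical, and the folder path is an assumption that you should change to wherever you saved the data.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_FILES = ["20240421-YL-ParLQ-ep1.csv", "20240422-YL-ParLQ-ep1.fastq"]


def missing_data_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


# Assumed location; replace with your own data folder.
data_dir = "../zenodo_download"
print(missing_data_files(data_dir))
```

An empty list means both files were found.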


In [1]:
! samtools  # Prints the samtools usage message if samtools is installed

In [2]:
! minimap2  # Prints the minimap2 usage message if minimap2 is installed

In [1]:
# Load necessary libraries
import os
import sys
import matplotlib.pyplot as plt
from pathlib import Path
import numpy as np
import pandas as pd
from importlib import resources
import subprocess
from Bio import SeqIO
import tqdm
import re
import gzip
import shutil

# Add the path to the levseq directory to the system path
sys.path.append('../levseq')

# Import custom functions from the provided script
from run_levseq import *
result_folder = os.getcwd()

## 2. Define Run Location
Next, specify where the sequencing run data lives. The paths below are relative to the notebook's working directory, which `os.getcwd()` prints.


In [2]:
os.getcwd()

In [3]:
# Define the path to the run data
# This is the folder you downloaded the data to (i.e. where you put the concatenated fastq files)
run_location = '../zenodo_download/'

# Load the reference file; it must contain the columns barcode_plate, name and refseq
ref_df = pd.read_csv('../zenodo_download/20240421-YL-ParLQ-ep1.csv')
variant_csv_path = 'OutputExample.csv'  # Where the combined variant table is written
name = 'Test-ep1'  # Run name, used as the results subfolder
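
Before running the pipeline it can help to sanity-check the reference table. The sketch below assumes only what the next cell reads from it (the columns `barcode_plate`, `name` and `refseq`); the helper `check_reference_table` and the toy table are illustrative and not part of LevSeq.

```python
import pandas as pd

# Columns that the demultiplexing cell reads from each row.
REQUIRED_COLUMNS = ["barcode_plate", "name", "refseq"]


def check_reference_table(ref_df):
    """Return a list of problems found; an empty list means the table looks usable."""
    missing = [col for col in REQUIRED_COLUMNS if col not in ref_df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if ref_df["name"].duplicated().any():
        # Each name becomes an output subfolder, so duplicates would share one.
        problems.append("duplicate values in 'name'")
    if ref_df[REQUIRED_COLUMNS].isna().any().any():
        problems.append("empty cells in required columns")
    return problems


# Toy table for illustration only; use your own ref_df in practice.
example = pd.DataFrame(
    {"barcode_plate": [1], "name": ["toy_plate"], "refseq": ["atgc"]}
)
print(check_reference_table(example))
```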

## 3. Demultiplexing and variant calling
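Demultiplexing is the process of separating out individual samples from a multiplexed sequencing run. For each row of the reference file, we use `filter_barcodes` to restrict the barcode set, `demux_fastq` to demultiplex the reads, and `call_variant` to call variants, all imported from the custom script.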
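
As a conceptual illustration only, demultiplexing can be pictured as grouping reads by the barcode they carry. The toy function below uses exact prefix matching on made-up barcodes and reads; it is not how `demux_fastq` is implemented, and real demultiplexers tolerate sequencing errors in the barcode.

```python
from collections import defaultdict


def toy_demultiplex(reads, barcodes):
    """Group reads by the first barcode whose sequence they start with."""
    bins = defaultdict(list)
    for read in reads:
        for label, barcode in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the remaining sequence is kept.
                bins[label].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and reads, chosen only to show the grouping.
toy_barcodes = {"NB01": "AAGG", "NB02": "CCTT"}
toy_reads = ["AAGGATGCAT", "CCTTATGGAT", "GGGGATGCAT"]
print(toy_demultiplex(toy_reads, toy_barcodes))
```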


In [4]:
# Collect all results for this run in a subfolder named after the run.
# Built from os.getcwd() so that re-running this cell does not nest folders.
result_folder = os.path.join(os.getcwd(), name)
# Create empty variant df
variant_df = pd.DataFrame(columns=["barcode_plate", "name", "refseq", "variant"])

for i, row in ref_df.iterrows():
    barcode_plate = row["barcode_plate"]
    # Use a separate variable so the run name stored in `name` is not overwritten
    plate_name = row["name"]
    refseq = row["refseq"].upper()

    # Create a subfolder for the current iteration using the name value
    name_folder = os.path.join(result_folder, plate_name)
    os.makedirs(name_folder, exist_ok=True)

    # Write the refseq to a temporary fasta file
    temp_fasta_path = os.path.join(name_folder, f"temp_{plate_name}.fasta")
    with open(temp_fasta_path, "w") as f:
        f.write(f">{plate_name}\n{refseq}\n")

    # Create filtered barcode path
    f_min = 1
    f_max = 96
    rbc = i + 1  # 1-based row position; assumes the default 0-based index from read_csv
    front_prefix = "NB"
    back_prefix = "RB"
    barcode_path = "../levseq/barcoding/minion_barcodes.fasta"
    barcode_path_filter = os.path.join(name_folder, "minion_barcodes_filtered.fasta")
    filter_barcodes(
        barcode_path,
        barcode_path_filter,
        (f_min, f_max),
        rbc,
        front_prefix,
        back_prefix,
    )

    # Perform demultiplexing
    demux_fastq(run_location, name_folder, barcode_path_filter)

    # Call variants and annotate the result with this row's metadata
    variant_result = call_variant(plate_name, name_folder, temp_fasta_path, barcode_path_filter)
    variant_result["barcode_plate"] = barcode_plate
    variant_result["name"] = plate_name
    variant_result["refseq"] = refseq
    variant_df = pd.concat([variant_df, variant_result])

# Save the combined variant table for all rows
variant_df.to_csv(variant_csv_path, index=False)

## 4. Create variant and visualization csv files

In [13]:
variant_df
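
For a quick overview of a combined table like this, you can count rows per plate. The sketch below runs on a toy DataFrame that has only the three columns the loop above adds to every result (`barcode_plate`, `name`, `refseq`); the values are invented, and your real `variant_df` will contain additional columns.

```python
import pandas as pd

# Toy stand-in for variant_df; values are made up for illustration.
toy_variants = pd.DataFrame(
    {
        "barcode_plate": [1, 1, 2],
        "name": ["plate_a", "plate_a", "plate_b"],
        "refseq": ["ATGC", "ATGC", "ATGG"],
    }
)

# Count how many rows each plate contributed to the combined table.
rows_per_plate = (
    toy_variants.groupby(["barcode_plate", "name"]).size().reset_index(name="n_rows")
)
print(rows_per_plate)
```

Replacing `toy_variants` with `variant_df` gives the same summary for your own run.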

## 5. Visualization
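Finally, we'll visualize the results. `create_df_v` builds the variant and visualization tables from `variant_df`, and `generate_platemaps` turns the visualization table into plate maps. This step helps in understanding the distribution and impact of the identified variants.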

In [16]:
df_variants, df_vis = create_df_v(variant_df)
layout = generate_platemaps(
    max_combo_data=df_vis,
    result_folder=result_folder,
)
layout

In [19]:
layout
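
When reading a plate map it can help to translate between a position number and a well label. The sketch below assumes a standard 96-well plate (8 rows by 12 columns) filled row by row; that ordering is an assumption made for this example and is not taken from the LevSeq code, and `well_label` is a hypothetical helper.

```python
import string


def well_label(index, n_rows=8, n_cols=12):
    """Map a 1-based position to a well label such as 'A1' (row-by-row order assumed)."""
    if not 1 <= index <= n_rows * n_cols:
        raise ValueError(f"index must be between 1 and {n_rows * n_cols}")
    row, col = divmod(index - 1, n_cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"


# Print the first and last wells of the plate.
print(well_label(1), well_label(96))
```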

## 6. Upload to LevSeq website

Hold tight, this will be deployed in 1 day :D 
