# W5-CRISPR-Cas9-inframe_deletion

In Workflow 5, CRISPR-Cas9 combines ssDNA bridging and Gibson cloning to create in-frame deletions (Figure 6A). This ensures that the downstream genes are not disrupted, which is crucial for functional studies of gene knockouts. This workflow can be used to study gene function by knocking out specific genes in streptomycetes and observing the resulting phenotypic changes. It can also be used to create markerless deletions, which are important for constructing clean genetic backgrounds for further genetic manipulations. To get started, download the pCRISPR-Cas9 file and a genome file (e.g., S. coelicolor A3), then select the genes to delete. StreptoAIM will generate the necessary primers and plasmid constructs (S5).


In [28]:
import sys
import os
from Bio.Restriction import StuI
from pydna.dseqrecord import Dseqrecord
import pandas as pd
from datetime import datetime


# Ensure the src directory is in the Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '../../'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

from streptocad.sequence_loading.sequence_loading import (
    load_and_process_gene_sequences,
    load_and_process_genome_sequences,
    load_and_process_plasmid, 
    check_and_convert_input,
    annotate_dseqrecord,
    process_specified_gene_sequences_from_record)

from streptocad.utils import polymerase_dict, create_primer_df_from_dict,ProjectDirectory,extract_metadata_to_dataframe
from streptocad.primers.primer_generation import create_idt_order_dataframe
from streptocad.cloning.ssDNA_bridging import assemble_plasmids_by_ssDNA_bridging,make_ssDNA_oligos
from streptocad.crispr.guideRNAcas3_9_12 import extract_sgRNAs, SgRNAargs
from streptocad.cloning.gibson_cloning import (
    find_up_dw_repair_templates,
    assemble_multiple_plasmids_with_repair_templates_for_deletion,
    update_primer_names
)

from streptocad.cloning.plasmid_processing import check_plasmid_restriction_sites, determine_workflow_order_for_plasmids
from streptocad.primers.primer_generation import checking_primers, primers_to_IDT, find_best_check_primers_from_genome

## INPUT

In [29]:
# Inputs
# 1 Add genome of choice (genbank, fasta)
path_to_genome = '../../data/genomes/Streptomyces_coelicolor_A3_chromosome.gb'
genome = load_and_process_genome_sequences(path_to_genome)[0]

# 2 Add plasmid 
path_to_plasmid = '../../data/plasmids/pCRISPR–Cas9_plasmid_addgene.gbk'
clean_plasmid = load_and_process_plasmid(path_to_plasmid)

# 3 Choose genes to knock out (list)
genes_to_KO = ['SCO5087', 'SCO5089', 'SCO5090','SCO5091', 'SCO5092' ]
#genes_to_KO = ['SCO5087']#, 'SCO5089', 'SCO5090','SCO5091', 'SCO5092' ]

#genes_to_KO = ['80000-100000', '4000-7000', '9000-14000','15000-20000']


#### Advanced settings ####
# 4 Filtering metrics for sgRNAs
gc_upper = 0.72
gc_lower = 0.2
off_target_seed = 13
off_target_upper = 10
cas_type='cas9'
number_of_sgRNAs_per_group = 2

# 5 Choose polymerase and target melting temperature
chosen_polymerase = polymerase_dict['Phusion High-Fidelity DNA Polymerase (GC Buffer)']
melting_temperature = 65
primer_concentration = 0.4 
primer_number_increment = 23
flanking_region = 500

# 6 Choose overlapping sequences for our plasmid we can use the following
#As per the article **"CRISPR–Cas9, CRISPRi and CRISPR-BEST-mediated genetic manipulation in streptomycetes"** we need the following oligoes: 
#CGGTTGGTAGGATCGACGGC **-N20-** GTTTTAGAGCTAGAAATAGC
up_homology = Dseqrecord('CGGTTGGTAGGATCGACGGC')
dw_homology = Dseqrecord('GTTTTAGAGCTAGAAATAGC')

# 7 extra settings
#### Advanced settings ####
repair_length = 1000
overlap_length = 40 

In [30]:
print(clean_plasmid.id)

pCRISPR-Cas9


# Computation

In [31]:
target_dict, genes_to_KO, annotation_input = check_and_convert_input(genes_to_KO)

print(annotation_input)
if annotation_input == True:
    genome = annotate_dseqrecord(genome, target_dict)


len(genome.features)

False


25824

In [32]:
# Initialize SgRNAargs with desired parameters
args = SgRNAargs(genome, 
                genes_to_KO,
                step=['find', 'filter'],
                gc_upper = gc_upper,
                gc_lower = gc_lower,
                off_target_seed = off_target_seed,
                off_target_upper = off_target_upper,
                cas_type='cas9'
                )

sgrna_df = extract_sgRNAs(args)
sgrna_df

Pam was found outside designated locus_tag: SCO5087. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5087. To incorporate this extent borders. Skipping to next locus tag.
Pam was found outside designated locus_tag: SCO5087. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5087. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5087. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5089. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5089. To incorporate this extent borders. Skipping to next locus tag.
Pam was found outside designated locus_tag: SCO5090. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generate

Unnamed: 0,strain_name,locus_tag,gene_loc,gene_strand,sgrna_strand,sgrna_loc,gc,pam,sgrna,sgrna_seed_sequence,off_target_count
483,NC_003888.3,SCO5090,5532706,1,1,546,0.65,CGG,GCAGCGACCTGTGGCAGGAA,CCTGTGGCAGGAA,0
485,NC_003888.3,SCO5090,5532706,1,1,537,0.65,TGG,TCATCGAGCGCAGCGACCTG,GCGCAGCGACCTG,0
489,NC_003888.3,SCO5090,5532706,1,1,487,0.65,GGG,CACGTTCGAGGACACGCTCC,GAGGACACGCTCC,0
490,NC_003888.3,SCO5090,5532706,1,1,486,0.60,CGG,TCACGTTCGAGGACACGCTC,CGAGGACACGCTC,0
491,NC_003888.3,SCO5090,5532706,1,1,475,0.60,AGG,GGAGCTGGTCTTCACGTTCG,GTCTTCACGTTCG,0
...,...,...,...,...,...,...,...,...,...,...,...
449,NC_003888.3,SCO5090,5532706,1,1,890,0.65,CGG,ACCCATCTGCGCGAGGTGCT,TGCGCGAGGTGCT,3
200,NC_003888.3,SCO5087,5529801,1,1,706,0.65,CGG,CAGCGCCGACGTGATGTTCG,GACGTGATGTTCG,3
650,NC_003888.3,SCO5091,5533653,1,1,649,0.65,GGG,CGAGGTGATCGACGCCAACC,ATCGACGCCAACC,3
191,NC_003888.3,SCO5087,5529801,1,1,838,0.70,TGG,TACGCGGGACGGCTTCGTGC,GACGGCTTCGTGC,3


In [33]:
# Filter the DataFrame to retain only up to 5 sgRNA sequences per locus_tag
filtered_df = sgrna_df.groupby('locus_tag').head(number_of_sgRNAs_per_group)
filtered_df

Unnamed: 0,strain_name,locus_tag,gene_loc,gene_strand,sgrna_strand,sgrna_loc,gc,pam,sgrna,sgrna_seed_sequence,off_target_count
483,NC_003888.3,SCO5090,5532706,1,1,546,0.65,CGG,GCAGCGACCTGTGGCAGGAA,CCTGTGGCAGGAA,0
485,NC_003888.3,SCO5090,5532706,1,1,537,0.65,TGG,TCATCGAGCGCAGCGACCTG,GCGCAGCGACCTG,0
549,NC_003888.3,SCO5091,5533653,1,-1,48,0.6,CGG,CTCGTAGGCGTAGACACCCT,GCGTAGACACCCT,0
554,NC_003888.3,SCO5091,5533653,1,-1,78,0.7,CGG,GTTGCTGACGCACCAGCCGC,ACGCACCAGCCGC,0
843,NC_003888.3,SCO5092,5534558,1,1,25,0.65,GGG,AGCCGACCAGGGAATGCTCC,CAGGGAATGCTCC,0
760,NC_003888.3,SCO5092,5534558,1,-1,267,0.7,TGG,GGACTTGCGCGCGAAGCGCA,CGCGCGAAGCGCA,0
330,NC_003888.3,SCO5089,5532449,1,1,61,0.7,CGG,CGTGGAGTGCGCCGGTGAGA,TGCGCCGGTGAGA,0
222,NC_003888.3,SCO5087,5529801,1,1,556,0.6,TGG,CCGGCACATGTTCGACTACC,ATGTTCGACTACC,0
207,NC_003888.3,SCO5087,5529801,1,1,656,0.7,CGG,ACCTCGGGCCTGGACTCCGT,GCCTGGACTCCGT,0
329,NC_003888.3,SCO5089,5532449,1,1,65,0.7,CGG,GAGTGCGCCGGTGAGACGGA,CCGGTGAGACGGA,0


## Output

In [34]:
# MAke oligoes
list_of_ssDNAs = make_ssDNA_oligos(filtered_df, upstream_ovh = up_homology,
                      downstream_ovh=dw_homology)
print(list_of_ssDNAs[0].name)

# cut plasmid
from Bio.Restriction import NcoI
linearized_plasmid = sorted(clean_plasmid.cut(NcoI), key=lambda x: len(x), reverse=True)[0]
#print(linearized_plasmid)

sgRNA_vectors = assemble_plasmids_by_ssDNA_bridging(list_of_ssDNAs,linearized_plasmid)
sgRNA_vectors

SCO5090_loc_546


[Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279)]

In [35]:
# Constructing a meaningful name, ID, and description for the assembled plasmid using user input
targeting_info = []
for index, row in filtered_df.iterrows():
    formatted_str = f"pCas9_{row['locus_tag']}({row['sgrna_loc']})"
    targeting_info.append(formatted_str)

for i in range(len(sgRNA_vectors)):
    sgRNA_vectors[i].name = f'{targeting_info[i]}'
    sgRNA_vectors[i].id = sgRNA_vectors[i].name  
    sgRNA_vectors[i].description = f'CRISPR-Cas9 targeting {", ".join(genes_to_KO)} for single gene knockout, assembled using StreptoCAD.'

In [36]:
print_plasmids = False

if print_plasmids: 
    for vector in sgRNA_vectors: 
        vector.write(f"../../data/plasmids/sgRNA_plasmids_pCRISPR-Cas9/{vector.id}.gb")

### IDT primers

In [37]:
idt_primers=primers_to_IDT(list_of_ssDNAs)
idt_primers

Unnamed: 0,Name,Sequence,Concentration,Purification
0,SCO5090_loc_546,CGGTTGGTAGGATCGACGGCGCAGCGACCTGTGGCAGGAAGTTTTA...,25nm,STD
1,SCO5090_loc_537,CGGTTGGTAGGATCGACGGCTCATCGAGCGCAGCGACCTGGTTTTA...,25nm,STD
2,SCO5091_loc_48,CGGTTGGTAGGATCGACGGCCTCGTAGGCGTAGACACCCTGTTTTA...,25nm,STD
3,SCO5091_loc_78,CGGTTGGTAGGATCGACGGCGTTGCTGACGCACCAGCCGCGTTTTA...,25nm,STD
4,SCO5092_loc_25,CGGTTGGTAGGATCGACGGCAGCCGACCAGGGAATGCTCCGTTTTA...,25nm,STD
5,SCO5092_loc_267,CGGTTGGTAGGATCGACGGCGGACTTGCGCGCGAAGCGCAGTTTTA...,25nm,STD
6,SCO5089_loc_61,CGGTTGGTAGGATCGACGGCCGTGGAGTGCGCCGGTGAGAGTTTTA...,25nm,STD
7,SCO5087_loc_556,CGGTTGGTAGGATCGACGGCCCGGCACATGTTCGACTACCGTTTTA...,25nm,STD
8,SCO5087_loc_656,CGGTTGGTAGGATCGACGGCACCTCGGGCCTGGACTCCGTGTTTTA...,25nm,STD
9,SCO5089_loc_65,CGGTTGGTAGGATCGACGGCGAGTGCGCCGGTGAGACGGAGTTTTA...,25nm,STD


# In frame deletion

In [38]:
in_frame_deletion = True


In [39]:
if in_frame_deletion:
    

    # Make repair templates
    repair_templates_data = find_up_dw_repair_templates(genome, 
                                                        genes_to_KO, 
                                                        target_tm=melting_temperature, 
                                                        primer_tm_kwargs={'conc':primer_concentration, 'prodcode':chosen_polymerase} , 
                                                        repair_length=repair_length, )

    # Digest the plasmids
    processed_records = [Dseqrecord(record, circular=True).cut(StuI)[0] for record in sgRNA_vectors]

    # Rename them appropriately
    for i in range(len(processed_records)):
        processed_records[i].name = sgRNA_vectors[i].name
        print(processed_records[i].name)

    # Assembly 
    assembly_data = assemble_multiple_plasmids_with_repair_templates_for_deletion(genes_to_KO, processed_records, 
                                                                                repair_templates_data, 
                                                                                overlap=overlap_length)
    # updating the primer names to something systematic.
    update_primer_names(assembly_data)
    print(assembly_data)

    # Parse through the primer df
    primer_df = create_primer_df_from_dict(assembly_data)
    unique_df = primer_df.drop_duplicates(keep='first')

    # IDT df
    idt_df = create_idt_order_dataframe(unique_df, concentration="25nm", purification="STD")

    # Contigs
    assembled_contigs = []
    for data in assembly_data: 
        contig_record = data['contig']
        contig_record.id = f"{data['name']}_w_rep"
        contig_record.name = f"{data['name']}_w_rep"
        assembled_contigs.append(contig_record)


pCas9_SCO5090(546)
pCas9_SCO5090(537)
pCas9_SCO5091(48)
pCas9_SCO5091(78)
pCas9_SCO5092(25)
pCas9_SCO5092(267)
pCas9_SCO5089(61)
pCas9_SCO5087(556)
pCas9_SCO5087(656)
pCas9_SCO5089(65)

Processing gene: SCO5087
Checking plasmid: pCas9_SCO5090(546)
Checking plasmid: pCas9_SCO5090(537)
Checking plasmid: pCas9_SCO5091(48)
Checking plasmid: pCas9_SCO5091(78)
Checking plasmid: pCas9_SCO5092(25)
Checking plasmid: pCas9_SCO5092(267)
Checking plasmid: pCas9_SCO5089(61)
Checking plasmid: pCas9_SCO5087(556)
Match found in plasmid: pCas9_SCO5087(556)
Repair template match found for gene: SCO5087
Record added for gene: SCO5087
Checking plasmid: pCas9_SCO5087(656)
Match found in plasmid: pCas9_SCO5087(656)
Repair template match found for gene: SCO5087
Record added for gene: SCO5087
Checking plasmid: pCas9_SCO5089(65)

Processing gene: SCO5089
Checking plasmid: pCas9_SCO5090(546)
Checking plasmid: pCas9_SCO5090(537)
Checking plasmid: pCas9_SCO5091(48)
Checking plasmid: pCas9_SCO5091(78)
Checking pla

In [40]:
if in_frame_deletion:
    print(unique_df)
    workflow_df = determine_workflow_order_for_plasmids(sgRNA_vectors, 
                                                            assembled_contigs,
                                                            ["StuI"], [ "NcoI"])
    print(workflow_df )

   template   direction f_primer_anneal(5-3) r_primer_anneal(5-3)  f_tm  r_tm  \
0   SCO5087    upstream     GACGCCGGAAATCCAG     CCCATCTCCCTTCGAC    56    54   
1   SCO5087  downstream     GCGTCCTCATCACCGG      GAGTAGAGGCGGCCC    58    57   
4   SCO5089    upstream       CCAGGACGCGAAGG   GGCTTTCTCCAGTTCTCG    54    56   
5   SCO5089  downstream        GCCGGCCGGGAGA       GGGGGCCTGCTCGT    59    60   
8   SCO5090    upstream        GCCACGCCCGCCG      GCCGCCTCGGCCAGT    63    64   
9   SCO5090  downstream       CGGTCGAGGTCCGT        ACCGTGCGGGACT    56    55   
12  SCO5091    upstream       GACACGCCGCGAGA        GACGCCGGCCCGG    57    61   
13  SCO5091  downstream  GGAGTTCGAAGATGGCAGC       AGGCCGAGACCCTG    60    55   
16  SCO5092    upstream        ACCGTCGCGGACG  CTTCGAACTCCCTAGGCGA    57    60   
17  SCO5092  downstream        GGCCGGCCGGGAA    ACATCTCCGGGGTCGCG    60    63   

    ta                            f_primer_sequences(5-3)  \
0   58  AAGGCCGCTTTTGCGGGATCTCGTCGAAGGCACTAGAAG

In [41]:
if in_frame_deletion:
    integration_names = filtered_df.apply(lambda row: f"sgRNA_{row['locus_tag']}({row['sgrna_loc']})", axis=1).tolist()
    plasmid_metadata_df = extract_metadata_to_dataframe(assembled_contigs,
                                                        clean_plasmid,
                                                        integration_names)
    workflow_df = determine_workflow_order_for_plasmids(sgRNA_vectors, 
                                                            assembled_contigs,
                                                            ["StuI",], [ "NcoI"])
    
    plasmid_metadata_df = pd.merge(plasmid_metadata_df, workflow_df, on='plasmid_name', how='inner')  # or 'left'/'right' depending on what you need

else: 
    integration_names = filtered_df.apply(lambda row: f"sgRNA_{row['locus_tag']}({row['sgrna_loc']})", axis=1).tolist()
    plasmid_metadata_df = extract_metadata_to_dataframe(sgRNA_vectors,
                                                        clean_plasmid,
                                                        integration_names)

plasmid_metadata_df

Unnamed: 0,plasmid_name,date,original_plasmid,integration,size,sgRNA plasmid,which workflow to proceed with,sgRNA plasmid #StuI sites,repair template plasmid #NcoI sites
0,pCas9_SCO5087(556)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5090(546),13279,pCas9_SCO5087(556),Proceed with sgRNA integration first,1,0
1,pCas9_SCO5087(656)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5090(537),13279,pCas9_SCO5087(656),Proceed with sgRNA integration first,1,0
2,pCas9_SCO5089(61)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5091(48),13279,pCas9_SCO5089(61),Proceed with sgRNA integration first,1,1
3,pCas9_SCO5089(65)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5091(78),13279,pCas9_SCO5089(65),Proceed with sgRNA integration first,1,1
4,pCas9_SCO5090(546)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5092(25),13279,pCas9_SCO5090(546),Proceed with sgRNA integration first,1,4
5,pCas9_SCO5090(537)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5092(267),13279,pCas9_SCO5090(537),Proceed with sgRNA integration first,1,4
6,pCas9_SCO5091(48)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5089(61),13279,pCas9_SCO5091(48),Proceed with sgRNA integration first,1,2
7,pCas9_SCO5091(78)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5087(556),13279,pCas9_SCO5091(78),Proceed with sgRNA integration first,1,2
8,pCas9_SCO5092(25)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5087(656),13279,pCas9_SCO5092(25),Proceed with sgRNA integration first,1,2
9,pCas9_SCO5092(267)_w_rep,2025-08-12,pCRISPR-Cas9,sgRNA_SCO5089(65),13279,pCas9_SCO5092(267),Proceed with sgRNA integration first,1,2


In [42]:
# Getting checking primers
checking_primers_df = find_best_check_primers_from_genome(genome, 
                                       genes_to_KO, 
                                       flanking_region=flanking_region,
                                       target_tm = melting_temperature, 
                                        primer_concentration = primer_concentration, 
                                        polymerase = chosen_polymerase, 
                                        limit=18)
checking_primers_df


Unnamed: 0,locus tag,f_primer_name,r_primer_name,f_primer_sequences(5-3),r_primer_sequences(5-3),f_tm,r_tm,ta,flanking_region,annealing_temperature,...,heterodimer_tm,heterodimer_deltaG (kcal/mol),hairpin_forward_structure_found,hairpin_forward_tm,hairpin_forward_deltaG (kcal/mol),hairpin_reverse_structure_found,hairpin_reverse_tm,hairpin_reverse_deltaG (kcal/mol),f_tm.1,r_tm.1
0,SCO5091,SCO5091_fwd_checking_primer,SCO5091_rev_checking_primer,AGCCGCCCGAGGAGCTGG,CAGGCACCCGCCGCCGAG,69,70,72,502,72,...,-19.574171,-0.275126,False,0.0,0.0,False,0.0,0.0,69,70
1,SCO5092,SCO5092_fwd_checking_primer,SCO5092_rev_checking_primer,ACGTCGGCGAGCGACAGG,CACCCCCCGCACCCACCG,66,70,69,508,69,...,2.724266,1.657074,False,0.0,0.0,False,0.0,0.0,66,70
2,SCO5090,SCO5090_fwd_checking_primer,SCO5090_rev_checking_primer,CTCTACTCCGGCGGCGGC,CGACATGACGACGTCCCCCGC,67,69,70,509,70,...,4.251557,0.508299,False,0.0,0.0,False,0.0,0.0,67,69
3,SCO5087,SCO5087_fwd_checking_primer,SCO5087_rev_checking_primer,CTTTCCGCCGGTCGAGGC,GCGGATCGTACGGCGGGC,65,68,68,524,68,...,32.653761,-1.899417,False,0.0,0.0,False,0.0,0.0,65,68
4,SCO5089,SCO5089_fwd_checking_primer,SCO5089_rev_checking_primer,CTGGAGGACAGCGCCGCG,GTTCCTGCCACAGGTCGCTGC,67,68,70,543,70,...,7.747596,-1.793463,False,0.0,0.0,False,0.0,0.0,67,68


In [43]:
# make column labels unique inside each DF
unique_df = unique_df.loc[:, ~unique_df.columns.duplicated(keep='first')]
checking_primers_df_copy = checking_primers_df.copy()
# rename
checking_primers_df_copy = checking_primers_df_copy.rename(columns={'locus tag': 'template'})
checking_primers_df_copy = checking_primers_df_copy.loc[:, ~checking_primers_df_copy.columns.duplicated(keep='first')]

# now safe to concat rows
pcr_table = pd.concat([unique_df, checking_primers_df_copy], ignore_index=True)


In [44]:
# making the primers into dseqrecords
checking_primers_df_idt = create_idt_order_dataframe(checking_primers_df)
checking_primers_df_idt


Unnamed: 0,Name,Sequence,Concentration,Purification
0,SCO5091_fwd_checking_primer,AGCCGCCCGAGGAGCTGG,25nm,STD
1,SCO5092_fwd_checking_primer,ACGTCGGCGAGCGACAGG,25nm,STD
2,SCO5090_fwd_checking_primer,CTCTACTCCGGCGGCGGC,25nm,STD
3,SCO5087_fwd_checking_primer,CTTTCCGCCGGTCGAGGC,25nm,STD
4,SCO5089_fwd_checking_primer,CTGGAGGACAGCGCCGCG,25nm,STD
5,SCO5091_rev_checking_primer,CAGGCACCCGCCGCCGAG,25nm,STD
6,SCO5092_rev_checking_primer,CACCCCCCGCACCCACCG,25nm,STD
7,SCO5090_rev_checking_primer,CGACATGACGACGTCCCCCGC,25nm,STD
8,SCO5087_rev_checking_primer,GCGGATCGTACGGCGGGC,25nm,STD
9,SCO5089_rev_checking_primer,GTTCCTGCCACAGGTCGCTGC,25nm,STD


In [45]:
idt_df

Unnamed: 0,Name,Sequence,Concentration,Purification
0,SCO5087_repair_up_forwar_p,AAGGCCGCTTTTGCGGGATCTCGTCGAAGGCACTAGAAGGGACGCC...,25nm,STD
1,SCO5087_repair_dw_forwar_p,GGCGGTCGAAGGGAGATGGGGCGTCCTCATCACCGG,25nm,STD
2,SCO5089_repair_up_forwar_p,AAGGCCGCTTTTGCGGGATCTCGTCGAAGGCACTAGAAGGCCAGGA...,25nm,STD
3,SCO5089_repair_dw_forwar_p,CCCGAGAACTGGAGAAAGCCGCCGGCCGGGAGA,25nm,STD
4,SCO5090_repair_up_forwar_p,AAGGCCGCTTTTGCGGGATCTCGTCGAAGGCACTAGAAGGGCCACG...,25nm,STD
5,SCO5090_repair_dw_forwar_p,GGCGCACTGGCCGAGGCGGCCGGTCGAGGTCCGT,25nm,STD
6,SCO5091_repair_up_forwar_p,AAGGCCGCTTTTGCGGGATCTCGTCGAAGGCACTAGAAGGGACACG...,25nm,STD
7,SCO5091_repair_dw_forwar_p,GCCGCCGCCGGGCCGGCGTCGGAGTTCGAAGATGGCAGC,25nm,STD
8,SCO5092_repair_up_forwar_p,AAGGCCGCTTTTGCGGGATCTCGTCGAAGGCACTAGAAGGACCGTC...,25nm,STD
9,SCO5092_repair_dw_forwar_p,CTCGCCTAGGGAGTTCGAAGGGCCGGCCGGGAA,25nm,STD


In [46]:
if in_frame_deletion: 
    full_idt = pd.concat([idt_primers, idt_df, checking_primers_df_idt])
else: 
    full_idt = pd.concat([idt_primers, checking_primers_df_idt])

full_idt

Unnamed: 0,Name,Sequence,Concentration,Purification
0,SCO5090_loc_546,CGGTTGGTAGGATCGACGGCGCAGCGACCTGTGGCAGGAAGTTTTA...,25nm,STD
1,SCO5090_loc_537,CGGTTGGTAGGATCGACGGCTCATCGAGCGCAGCGACCTGGTTTTA...,25nm,STD
2,SCO5091_loc_48,CGGTTGGTAGGATCGACGGCCTCGTAGGCGTAGACACCCTGTTTTA...,25nm,STD
3,SCO5091_loc_78,CGGTTGGTAGGATCGACGGCGTTGCTGACGCACCAGCCGCGTTTTA...,25nm,STD
4,SCO5092_loc_25,CGGTTGGTAGGATCGACGGCAGCCGACCAGGGAATGCTCCGTTTTA...,25nm,STD
5,SCO5092_loc_267,CGGTTGGTAGGATCGACGGCGGACTTGCGCGCGAAGCGCAGTTTTA...,25nm,STD
6,SCO5089_loc_61,CGGTTGGTAGGATCGACGGCCGTGGAGTGCGCCGGTGAGAGTTTTA...,25nm,STD
7,SCO5087_loc_556,CGGTTGGTAGGATCGACGGCCCGGCACATGTTCGACTACCGTTTTA...,25nm,STD
8,SCO5087_loc_656,CGGTTGGTAGGATCGACGGCACCTCGGGCCTGGACTCCGTGTTTTA...,25nm,STD
9,SCO5089_loc_65,CGGTTGGTAGGATCGACGGCGAGTGCGCCGGTGAGACGGAGTTTTA...,25nm,STD


## Folder with all the generated I/O

In [47]:
input_files = [
    {"name": "input_genome.gb", "content": genome},
    {"name": "input_plasmid.gb", "content": clean_plasmid}
]

if in_frame_deletion: 
    output_files = [
        {"name": "cBEST_w_sgRNAs.gb", "content": assembled_contigs}, # LIST OF Dseqrecords
        {"name": "primer_df.csv", "content": primer_df},
        {"name": "full_idt.csv", "content": full_idt},
        {"name": "sgrna_df.csv", "content": sgrna_df},
        {"name": "filtered_df.csv", "content": filtered_df},
        {"name": "plasmid_metadata_df.csv", "content": plasmid_metadata_df},

    ]
else: 
    output_files = [
        {"name": "cBEST_w_sgRNAs.gb", "content": sgRNA_vectors}, # LIST OF Dseqrecords
        {"name": "full_idt.csv", "content": full_idt},
        {"name": "sgrna_df.csv", "content": sgrna_df},
        {"name": "filtered_df.csv", "content": filtered_df},
        {"name": "plasmid_metadata_df.csv", "content": plasmid_metadata_df},
        {"name": "workflow_order_df.csv", "content": workflow_df},

        workflow_df

    ]

input_values = {
    "genes_to_knockout": genes_to_KO,


    "filtering_metrics": {
        "gc_upper": gc_upper,
        "gc_lower": gc_lower,
        "off_target_seed": off_target_seed,
        "off_target_upper": off_target_upper,
        "cas_type": cas_type,
        "number_of_sgRNAs_per_group": number_of_sgRNAs_per_group,

    },  
    "polymerase_settings": {
        "chosen_polymerase": chosen_polymerase,
        "melting_temperature": melting_temperature,
        "primer_concentration": primer_concentration,
        "primer_number_increment": primer_number_increment,
        "flanking_region": flanking_region
    },
    "overlapping_sequences": {
        "up_homology": str(up_homology),
        "dw_homology": str(dw_homology)
    }
}


# Paths to Markdown files
markdown_file_paths = [
    "../../protocols/conjugation_protcol.md",
    "../../protocols/single_target_crispr_plasmid_protcol.md"

]


timestamp = datetime.utcnow().isoformat()

# Create project directory structure
project_directory = ProjectDirectory(
    project_name=f"CRISPR_cas9_plasmid_workflow_{timestamp}",
    input_files=input_files,
    output_files=output_files,
    input_values=input_values,
    markdown_file_paths=markdown_file_paths
)

# DO You want to save the folder? 
save_zip_folder = False 

if save_zip_folder: 
    # Generate the project directory structure and get the zip content
    zip_content = project_directory.create_directory_structure(create_directories=False)

    # Save the zip file to disk (optional)
    with open("project_structure.zip", "wb") as f:
        f.write(zip_content)