# W4-CRISPRi-workflow

Transcriptional interference is another powerful tool for assessing gene function. A guide RNA directs a catalytically dead Cas9 (dCas9)/sgRNA complex to bind the + (Watson) strand of the promoter and 5′-UTR of the targeted gene, where it sterically blocks the binding and progression of RNA polymerase. Workflow 4 uses CRISPR interference (CRISPRi) with ssDNA bridging to reversibly silence genes at the transcriptional level [12, 29] (Figure 5A). The dCas9-sgRNA complex is positioned near the transcriptional start site (TSS), by default within 100 bp upstream, which allows the functional study of genes through controlled knockdown. To get started, users can download the pCRISPR-dCas9 plasmid and the S. coelicolor A3(2) genome. StreptoCAD will then generate all necessary components, including primers and plasmids.
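As a rough illustration of the targeting window described above, the sketch below computes the genomic interval that would be searched for sgRNAs on either strand. It treats the annotated 5′ end of the gene as a stand-in for the TSS, and the function name `crispri_window` and the toy coordinates are hypothetical; this is not StreptoCAD's own implementation.

```python
def crispri_window(gene_start, gene_end, strand, upstream=100, downstream=0):
    """Return the (start, end) search window around a gene's 5' end.

    Coordinates are 0-based, half-open, and given on the + strand.
    For a + strand gene the 5' end is gene_start; for a - strand gene
    it is gene_end, so "upstream" lies at higher coordinates.
    """
    if strand == 1:
        start = max(0, gene_start - upstream)
        end = gene_start + downstream
    elif strand == -1:
        start = max(0, gene_end - downstream)
        end = gene_end + upstream
    else:
        raise ValueError("strand must be 1 or -1")
    return start, end


# Toy coordinates (hypothetical gene spanning 1000..2000)
plus_window = crispri_window(1000, 2000, strand=1, upstream=100, downstream=80)
minus_window = crispri_window(1000, 2000, strand=-1, upstream=100, downstream=80)
print(plus_window, minus_window)
```

The `upstream=100` and `downstream=80` values mirror the `upstream_tss` and `dwstream_tss` arguments used later in this notebook.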

In [1]:
import sys
import os
from pydna.dseqrecord import Dseqrecord
from datetime import datetime


# Ensure the src directory is in the Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '../../'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

from streptocad.sequence_loading.sequence_loading import (
    load_and_process_gene_sequences,
    load_and_process_genome_sequences,
    load_and_process_plasmid, 
    check_and_convert_input,
    annotate_dseqrecord,
    process_specified_gene_sequences_from_record)


from streptocad.cloning.ssDNA_bridging import assemble_plasmids_by_ssDNA_bridging, make_ssDNA_oligos
from streptocad.crispr.guideRNA_crispri import extract_sgRNAs_for_crispri, SgRNAargs
from streptocad.primers.primer_generation import primers_to_IDT
from streptocad.utils import ProjectDirectory, extract_metadata_to_dataframe

## INPUT

In [2]:
# Inputs
# 1 Add genome of choice (genbank, fasta)
path_to_genome = '../../data/genomes/Streptomyces_coelicolor_A3_chromosome.gb'
genome = load_and_process_genome_sequences(path_to_genome)[0]

# 2 Add plasmid 
path_to_plasmid = '../../data/plasmids/pCRISPR–Cas9_plasmid_addgene.gbk'
clean_plasmid = load_and_process_plasmid(path_to_plasmid)

# 3 Choose genes to knock down (list)
genes_to_KO = ['SCO5087']  # , 'SCO5089', 'SCO5090']

# Negative-strand example
# genes_to_KO = ['SCO0007']


#### Advanced settings ####
# 4 Filtering metrics for sgRNAs
gc_upper = 0.9999
gc_lower = 0.0001
off_target_seed = 13
off_target_upper = 10
cas_type='cas9'
number_of_sgRNAs_per_group = 5
extension_to_promoter_region=200

# 5 Choose the overlap sequences that flank the protospacer in the ssDNA bridging oligo.
# As per the article "CRISPR–Cas9, CRISPRi and CRISPR-BEST-mediated genetic manipulation in streptomycetes",
# the oligos have the following form:
# CGGTTGGTAGGATCGACGGC -N20- GTTTTAGAGCTAGAAATAGC
up_homology = Dseqrecord('CGGTTGGTAGGATCGACGGC')
dw_homology = Dseqrecord('GTTTTAGAGCTAGAAATAGC')
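To make the oligo layout concrete, here is a minimal plain-string sketch of how one bridging oligo is composed from the two overlaps and a 20-nt protospacer. The helper name `build_bridging_oligo` is hypothetical and is not part of StreptoCAD, which uses `make_ssDNA_oligos` for this step.

```python
UP_OVERLAP = "CGGTTGGTAGGATCGACGGC"
DW_OVERLAP = "GTTTTAGAGCTAGAAATAGC"


def build_bridging_oligo(protospacer, up=UP_OVERLAP, dw=DW_OVERLAP):
    """Return up + N20 + dw after checking that the protospacer is 20 nt of A/C/G/T."""
    protospacer = protospacer.upper()
    if len(protospacer) != 20 or set(protospacer) - set("ACGT"):
        raise ValueError("protospacer must be exactly 20 nt of A, C, G or T")
    return up + protospacer + dw


# Protospacer taken from the first sgRNA reported for SCO5087 in this notebook
oligo = build_bridging_oligo("AGGGGAACACATGGCCACGC")
print(len(oligo), oligo)
```

Each oligo is therefore 60 nt long: 20 nt of upstream overlap, the 20-nt protospacer, and 20 nt of downstream overlap.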

In [3]:
print(clean_plasmid.id)


pCRISPR-Cas9


## Computation

In [4]:
target_dict, genes_to_KO, annotation_input = check_and_convert_input(genes_to_KO)

print(annotation_input)
if annotation_input:
    genome = annotate_dseqrecord(genome, target_dict)


len(genome.features)

False


25824

In [16]:
# Initialize SgRNAargs with the parameters defined in the input cell
args = SgRNAargs(
    genome,
    genes_to_KO,
    step=['find', 'filter'],
    gc_upper=gc_upper,
    gc_lower=gc_lower,
    off_target_seed=off_target_seed,
    off_target_upper=off_target_upper,
    cas_type=cas_type,
    target_non_template_strand=False,
    extension_to_promoter_region=extension_to_promoter_region,
    upstream_tss=100,   # bp searched upstream of the TSS
    dwstream_tss=80,    # bp searched downstream of the TSS
)

sgrna_df = extract_sgRNAs_for_crispri(args)
sgrna_df

Pam was found outside designated locus_tag: SCO5087. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5087. To incorporate this extent borders. Skipping to next locus tag.
Pam was found outside designated locus_tag: SCO5087. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5087. To incorporate this extent borders. Skipping to next locus tag.
sgRNA generated were outside the designated border in SCO5087. To incorporate this extent borders. Skipping to next locus tag.




Unnamed: 0,strain_name,locus_tag,gene_loc,gene_strand,sgrna_strand,sgrna_loc,gc,pam,sgrna,sgrna_seed_sequence,off_target_count,region
304,NC_003888.3,SCO5087,5529601,1,-1,-90,0.65,AGG,AGGGGAACACATGGCCACGC,CACATGGCCACGC,0,upstream
305,NC_003888.3,SCO5087,5529601,1,-1,-81,0.65,TGG,GAGGCAGGGAGGGGAACACA,GGAGGGGAACACA,0,upstream
325,NC_003888.3,SCO5087,5529601,1,1,-80,0.7,TGG,GTGTTCCCCTCCCTGCCTCG,CCTCCCTGCCTCG,0,upstream
306,NC_003888.3,SCO5087,5529601,1,-1,-72,0.7,GGG,AGGGACCACGAGGCAGGGAG,ACGAGGCAGGGAG,0,upstream
307,NC_003888.3,SCO5087,5529601,1,-1,-71,0.7,GGG,GAGGGACCACGAGGCAGGGA,CACGAGGCAGGGA,0,upstream
308,NC_003888.3,SCO5087,5529601,1,-1,-70,0.7,AGG,TGAGGGACCACGAGGCAGGG,CCACGAGGCAGGG,0,upstream
309,NC_003888.3,SCO5087,5529601,1,-1,-67,0.7,GGG,GCGTGAGGGACCACGAGGCA,GGACCACGAGGCA,0,upstream
310,NC_003888.3,SCO5087,5529601,1,-1,-66,0.75,AGG,CGCGTGAGGGACCACGAGGC,GGGACCACGAGGC,0,upstream
311,NC_003888.3,SCO5087,5529601,1,-1,-62,0.7,AGG,TGAGCGCGTGAGGGACCACG,GTGAGGGACCACG,0,upstream
324,NC_003888.3,SCO5087,5529601,1,1,-58,0.65,TGG,GTCCCTCACGCGCTCAGCTT,ACGCGCTCAGCTT,0,upstream
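The `gc` and `sgrna_seed_sequence` columns above can be reproduced by hand. The sketch below assumes that GC content is the fraction of G and C bases in the 20-nt protospacer and that the seed is its last `off_target_seed` (here 13) bases, which matches the values in the table; the helper names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def seed_sequence(seq, seed_len=13):
    """PAM-proximal seed: the last seed_len bases of the protospacer."""
    return seq[-seed_len:]


def passes_gc_filter(seq, lower, upper):
    """True if the GC fraction lies within [lower, upper]."""
    return lower <= gc_fraction(seq) <= upper


sgrna = "AGGGGAACACATGGCCACGC"  # first row of the table above
print(round(gc_fraction(sgrna), 2), seed_sequence(sgrna))
print(passes_gc_filter(sgrna, lower=0.0001, upper=0.9999))
```

With `gc_lower = 0.0001` and `gc_upper = 0.9999`, as set in the input cell, the GC filter is effectively switched off, so only the off-target count restricts the candidates.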


In [17]:
len(sgrna_df)

41

In [7]:
# CRISPyweb output
import pandas as pd
crispy_web_df = pd.read_csv('../../tests/test_files/CRISPy-web_CRISPRi_region_SCO5087_s_coelicolorA3_0mismatch.csv').sort_values(by=['Start'])
crispy_web_df

Unnamed: 0,ID,Start,End,Strand,ORF,Sequence,PAM,C to T mutations,A to G mutations,0bp mismatches,1bp mismatches,2bp mismatches
4,CY00000011,6,29,-1,-,AGGGGAACACATGGCCACGC,AGG,,,0,0,50
0,CY00000003,15,38,-1,-,GAGGCAGGGAGGGGAACACA,TGG,,,0,0,11
10,CY00000020,24,47,-1,-,AGGGACCACGAGGCAGGGAG,GGG,,,0,2,30
8,CY00000018,25,48,-1,-,GAGGGACCACGAGGCAGGGA,GGG,,,0,1,34
2,CY00000008,26,49,-1,-,TGAGGGACCACGAGGCAGGG,AGG,,,0,0,26
11,CY00000021,29,52,-1,-,GCGTGAGGGACCACGAGGCA,GGG,,,0,2,36
14,CY00000027,30,53,-1,-,CGCGTGAGGGACCACGAGGC,AGG,,,0,5,117
13,CY00000026,34,57,-1,-,TGAGCGCGTGAGGGACCACG,AGG,,,0,4,36
12,CY00000025,43,66,-1,-,GCCCAAAGCTGAGCGCGTGA,GGG,,,0,3,59
3,CY00000009,44,67,-1,-,CGCCCAAAGCTGAGCGCGTG,AGG,,,0,0,29


In [8]:
len(crispy_web_df)

17
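One simple way to compare the two tools is to intersect their protospacer sequences. The sketch below does this with plain Python sets, using only the ten rows displayed for each table above and not the full 41-row and 17-row tables, so the counts describe the displayed rows only.

```python
# Protospacers from the displayed rows of sgrna_df (StreptoCAD)
streptocad_rows = {
    "AGGGGAACACATGGCCACGC", "GAGGCAGGGAGGGGAACACA", "GTGTTCCCCTCCCTGCCTCG",
    "AGGGACCACGAGGCAGGGAG", "GAGGGACCACGAGGCAGGGA", "TGAGGGACCACGAGGCAGGG",
    "GCGTGAGGGACCACGAGGCA", "CGCGTGAGGGACCACGAGGC", "TGAGCGCGTGAGGGACCACG",
    "GTCCCTCACGCGCTCAGCTT",
}

# Protospacers from the displayed rows of crispy_web_df (CRISPy-web)
crispy_web_rows = {
    "AGGGGAACACATGGCCACGC", "GAGGCAGGGAGGGGAACACA", "AGGGACCACGAGGCAGGGAG",
    "GAGGGACCACGAGGCAGGGA", "TGAGGGACCACGAGGCAGGG", "GCGTGAGGGACCACGAGGCA",
    "CGCGTGAGGGACCACGAGGC", "TGAGCGCGTGAGGGACCACG", "GCCCAAAGCTGAGCGCGTGA",
    "CGCCCAAAGCTGAGCGCGTG",
}

shared = streptocad_rows & crispy_web_rows
only_streptocad = streptocad_rows - crispy_web_rows
only_crispy_web = crispy_web_rows - streptocad_rows

print(len(shared), sorted(only_streptocad), sorted(only_crispy_web))
```

On the full DataFrames, the same comparison could be run with `set(sgrna_df['sgrna']) & set(crispy_web_df['Sequence'])`.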

In [9]:
# Filter the DataFrame to retain only up to 5 sgRNA sequences per locus_tag
filtered_df = sgrna_df.groupby('locus_tag').head(number_of_sgRNAs_per_group)
filtered_df

Unnamed: 0,strain_name,locus_tag,gene_loc,gene_strand,sgrna_strand,sgrna_loc,gc,pam,sgrna,sgrna_seed_sequence,off_target_count,region
304,NC_003888.3,SCO5087,5529601,1,-1,-90,0.65,AGG,AGGGGAACACATGGCCACGC,CACATGGCCACGC,0,upstream
305,NC_003888.3,SCO5087,5529601,1,-1,-81,0.65,TGG,GAGGCAGGGAGGGGAACACA,GGAGGGGAACACA,0,upstream
325,NC_003888.3,SCO5087,5529601,1,1,-80,0.7,TGG,GTGTTCCCCTCCCTGCCTCG,CCTCCCTGCCTCG,0,upstream
306,NC_003888.3,SCO5087,5529601,1,-1,-72,0.7,GGG,AGGGACCACGAGGCAGGGAG,ACGAGGCAGGGAG,0,upstream
307,NC_003888.3,SCO5087,5529601,1,-1,-71,0.7,GGG,GAGGGACCACGAGGCAGGGA,CACGAGGCAGGGA,0,upstream
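`groupby('locus_tag').head(n)` keeps the first `n` rows of each group in the DataFrame's existing row order, so the sort order of `sgrna_df` decides which sgRNAs survive. The plain-Python sketch below shows the same selection logic; the function name and the second locus tag's row are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def first_n_per_group(rows, key, n):
    """Keep the first n rows for each value of rows[i][key], preserving order."""
    seen = defaultdict(int)
    kept = []
    for row in rows:
        if seen[row[key]] < n:
            kept.append(row)
            seen[row[key]] += 1
    return kept


rows = [
    {"locus_tag": "SCO5087", "sgrna_loc": -90},
    {"locus_tag": "SCO5087", "sgrna_loc": -81},
    {"locus_tag": "SCO5087", "sgrna_loc": -80},
    {"locus_tag": "SCO0007", "sgrna_loc": -60},  # hypothetical row for a second target
]

kept = first_n_per_group(rows, key="locus_tag", n=2)
print([(r["locus_tag"], r["sgrna_loc"]) for r in kept])
```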


## Output

In [10]:
# Make the ssDNA bridging oligos
list_of_ssDNAs = make_ssDNA_oligos(filtered_df, upstream_ovh=up_homology,
                                   downstream_ovh=dw_homology)
print(list_of_ssDNAs[0].name)

# Cut the plasmid with NcoI and keep the longest fragment as the linearized backbone
from Bio.Restriction import NcoI
linearized_plasmid = sorted(clean_plasmid.cut(NcoI), key=len, reverse=True)[0]
# print(linearized_plasmid)

# Assemble one sgRNA vector per oligo by ssDNA bridging
sgRNA_vectors = assemble_plasmids_by_ssDNA_bridging(list_of_ssDNAs, linearized_plasmid)
sgRNA_vectors

SCO5087_loc_-90


[Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279),
 Contig(o11279)]
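Conceptually, ssDNA bridging places the protospacer between the two 20-nt overlaps of the cut backbone. The sketch below models this on plain strings, assuming the two overlaps flank the cut site; the toy backbone is made up, `bridge_insert` is a hypothetical helper, and the real assembly is handled by `assemble_plasmids_by_ssDNA_bridging` on circular sequence records.

```python
UP_OVERLAP = "CGGTTGGTAGGATCGACGGC"
DW_OVERLAP = "GTTTTAGAGCTAGAAATAGC"


def bridge_insert(backbone, up, dw, protospacer):
    """Replace whatever lies between `up` and `dw` in backbone with the protospacer."""
    i = backbone.find(up)
    if i == -1:
        raise ValueError("upstream overlap not found in backbone")
    j = backbone.find(dw, i + len(up))
    if j == -1:
        raise ValueError("downstream overlap not found after upstream overlap")
    return backbone[: i + len(up)] + protospacer + backbone[j:]


# Toy backbone (made up): flank + overlap + NcoI site (CCATGG) + overlap + flank
toy_backbone = "AAAA" + UP_OVERLAP + "CCATGG" + DW_OVERLAP + "TTTT"
product = bridge_insert(toy_backbone, UP_OVERLAP, DW_OVERLAP, "AGGGGAACACATGGCCACGC")
print(len(toy_backbone), len(product))
```

Because every protospacer is 20 nt long, all assembled vectors have the same size, which is consistent with the identical contig lengths reported above.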

In [11]:
# Construct a meaningful name, ID, and description for each assembled plasmid
targeting_info = [
    f"CRISPRi_{row['locus_tag']}_p{row['sgrna_loc']}"
    for _, row in filtered_df.iterrows()
]

for i, (vector, info) in enumerate(zip(sgRNA_vectors, targeting_info), start=1):
    vector.name = f'p{info}_#{i}'
    vector.id = vector.name  # Use the same value for ID as for name, for simplicity
    vector.description = f'Assembled plasmid targeting {", ".join(genes_to_KO)} for single gene KNOCK-DOWN, assembled using StreptoCAD.'


In [12]:
print_plasmids = False

if print_plasmids: 
    for vector in sgRNA_vectors: 
        vector.write(f"../../data/plasmids/sgRNA_plasmids_pCRISPR–dCas9/{vector.id}.gb")

In [13]:
integration_names = filtered_df.apply(lambda row: f"sgRNA_{row['locus_tag']}({row['sgrna_loc']})", axis=1).tolist()
plasmid_metadata_df = extract_metadata_to_dataframe(sgRNA_vectors,
                                                    clean_plasmid,
                                                    integration_names)

plasmid_metadata_df

Unnamed: 0,plasmid_name,date,original_plasmid,integration,size
0,pCRISPRi_SCO5087_p-90_#1,2025-05-28,pCRISPR-Cas9,sgRNA_SCO5087(-90),11279
1,pCRISPRi_SCO5087_p-81_#2,2025-05-28,pCRISPR-Cas9,sgRNA_SCO5087(-81),11279
2,pCRISPRi_SCO5087_p-80_#3,2025-05-28,pCRISPR-Cas9,sgRNA_SCO5087(-80),11279
3,pCRISPRi_SCO5087_p-72_#4,2025-05-28,pCRISPR-Cas9,sgRNA_SCO5087(-72),11279
4,pCRISPRi_SCO5087_p-71_#5,2025-05-28,pCRISPR-Cas9,sgRNA_SCO5087(-71),11279


### IDT primers

In [14]:
idt_primers=primers_to_IDT(list_of_ssDNAs)
idt_primers

Unnamed: 0,Name,Sequence,Concentration,Purification
0,SCO5087_loc_-90,CGGTTGGTAGGATCGACGGCAGGGGAACACATGGCCACGCGTTTTA...,25nm,STD
1,SCO5087_loc_-81,CGGTTGGTAGGATCGACGGCGAGGCAGGGAGGGGAACACAGTTTTA...,25nm,STD
2,SCO5087_loc_-80,CGGTTGGTAGGATCGACGGCGTGTTCCCCTCCCTGCCTCGGTTTTA...,25nm,STD
3,SCO5087_loc_-72,CGGTTGGTAGGATCGACGGCAGGGACCACGAGGCAGGGAGGTTTTA...,25nm,STD
4,SCO5087_loc_-71,CGGTTGGTAGGATCGACGGCGAGGGACCACGAGGCAGGGAGTTTTA...,25nm,STD
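For readers who want to produce a similar order sheet without StreptoCAD, the sketch below writes the same four columns with Python's standard `csv` module. The column names and the `25nm`/`STD` defaults are copied from the table above; the helper names are hypothetical, and the vendor's current upload format should be checked before ordering.

```python
import csv
import io

COLUMNS = ["Name", "Sequence", "Concentration", "Purification"]


def idt_rows(oligos, concentration="25nm", purification="STD"):
    """Turn (name, sequence) pairs into order-sheet rows."""
    return [
        {"Name": name, "Sequence": seq,
         "Concentration": concentration, "Purification": purification}
        for name, seq in oligos
    ]


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


oligos = [
    ("SCO5087_loc_-90",
     "CGGTTGGTAGGATCGACGGC" + "AGGGGAACACATGGCCACGC" + "GTTTTAGAGCTAGAAATAGC"),
]
csv_text = rows_to_csv(idt_rows(oligos))
print(csv_text)
```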


## Folder with all the generated I/O

In [15]:
generate_data_folder = True 

if generate_data_folder:
    input_files = [
        {"name": "input_genome.gb", "content": genome},
        {"name": "input_plasmid.gb", "content": clean_plasmid}
    ]

    output_files = [
        {"name": "CRISPRi_w_sgRNAs.gb", "content": sgRNA_vectors},  # list of Dseqrecords
        {"name": "full_idt.csv", "content": idt_primers},
        {"name": "sgrna_df.csv", "content": sgrna_df},
        {"name": "filtered_df.csv", "content": filtered_df},
        {"name": "plasmid_metadata_df.csv", "content": plasmid_metadata_df},
    ]

    input_values = {
        "genes_to_knockout": genes_to_KO,


        "filtering_metrics": {
            "gc_upper": gc_upper,
            "gc_lower": gc_lower,
            "off_target_seed": off_target_seed,
            "off_target_upper": off_target_upper,
            "cas_type": cas_type,
            "number_of_sgRNAs_per_group": number_of_sgRNAs_per_group,
            'extension_to_promoter_region':extension_to_promoter_region,

        },
        "overlapping_sequences": {
            "up_homology": str(up_homology),
            "dw_homology": str(dw_homology)
        }
    }


    # Paths to Markdown files
    markdown_file_paths = [
        "../../protocols/conjugation_protcol.md",
        "../../protocols/single_target_crispr_plasmid_protcol.md"

    ]



    # Date and time (UTC)
    timestamp = datetime.utcnow().isoformat()

    # Create project directory structure
    project_directory = ProjectDirectory(
        project_name=f"CRISPRi_workflow_{timestamp}",
        input_files=input_files,
        output_files=output_files,
        input_values=input_values,
        markdown_file_paths=markdown_file_paths
    )


    # Do you want to save the folder as a zip file?
    save_zip_folder = False 

    if save_zip_folder: 
        # Generate the project directory structure and get the zip content
        zip_content = project_directory.create_directory_structure(create_directories=False)

        # Save the zip file to disk (optional)
        with open("project_structure.zip", "wb") as f:
            f.write(zip_content)
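The final step writes zip content held in memory to disk. As a minimal sketch of that pattern, the code below builds a zip archive as bytes with the standard library. The folder layout and file contents are hypothetical and do not reproduce how `ProjectDirectory` organizes its output.

```python
import io
import zipfile


def build_zip_bytes(files):
    """Pack a {archive_path: text} mapping into an in-memory zip and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, text in files.items():
            archive.writestr(path, text)
    return buffer.getvalue()


# Hypothetical layout with one input and one output file
zip_content = build_zip_bytes({
    "inputs/genes.txt": "SCO5087\n",
    "outputs/full_idt.csv": "Name,Sequence,Concentration,Purification\n",
})

# The bytes can be written to disk, as in the cell above, or inspected in memory
with zipfile.ZipFile(io.BytesIO(zip_content)) as archive:
    names = archive.namelist()
print(names)
```

Building the archive in memory first keeps the decision to write to disk separate from the packaging step.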