# This code creates HCR probe pairs from the targets listed in the corresponding .csv file

If the Python environment is not available yet, create it with this command and select it as the notebook kernel:
```
conda create -n hcr_probe_generator -c bioconda biopython numpy=1.23.5 pandas=1.3.5 blast openpyxl
```
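
Once the kernel is selected, a quick check like the one below can confirm that the environment contains what the conda command installs. This is only a sketch: the module names are the usual import names of those packages (biopython is imported as `Bio`), and looking for the `blastn` executable assumes that this is the BLAST+ program the generator relies on.

```python
import importlib.util
import shutil

# Import names of the Python packages from the conda command
# (biopython is imported as "Bio").
REQUIRED_MODULES = ["Bio", "numpy", "pandas", "openpyxl"]

def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {missing}")
else:
    print("All required modules are importable.")

# BLAST is a command-line tool, so look for it on the PATH instead.
# Assumption: the generator needs the BLAST+ executable "blastn".
if shutil.which("blastn") is None:
    print("blastn was not found on the PATH.")
```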

In [2]:
# import modules
from maker37cb_mod import maker
import os
import pandas as pd
from contextlib import redirect_stdout

Define the project name (the name of your input .csv file, without the extension) and make sure the file paths are correct:

In [None]:
project_name = "20241121_HCR_probes_Sunny"

in_path = "/home/mstemmer/repos/HCR_probe_generator/targets/"
out_path = "/home/mstemmer/repos/HCR_probe_generator/generated_probes/"
reference_path = "/home/mstemmer/repos/HCR_probe_generator/references/"

Place your input .csv file (named after the project) into the targets folder. \
The reference file belongs in the references folder and must be an unpacked .fa file; its file name is also specified in the input .csv file. \
--> See the example_input.csv file for the required headers/columns!
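
Whether a reference file is really unpacked can be checked before running the generator. The helper below is a hypothetical sketch (`check_reference` is not part of this repository): it only tests that the file exists, is not gzip-compressed, and starts with a FASTA header line.

```python
import os

def check_reference(path):
    """Return a short problem description, or None if the file looks usable."""
    if not os.path.isfile(path):
        return "file not found"
    with open(path, "rb") as handle:
        first_bytes = handle.read(2)
    # gzip files start with the magic bytes 0x1f 0x8b
    if first_bytes == b"\x1f\x8b":
        return "file looks gzip-compressed"
    # FASTA records start with a '>' header line
    if not first_bytes.startswith(b">"):
        return "file does not start with a FASTA header"
    return None
```

In the notebook this could be called as `check_reference(os.path.join(reference_path, name))` for every file name in the `reference` column.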

Required headers (all other columns will be ignored):
```
'short','gene_name','amplifier','reference','sequence'
```
'short': abbreviated species name (e.g. 'dr' for Danio rerio)
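
A missing header otherwise only surfaces as a `KeyError` when the columns are selected, so it can help to test for it explicitly. The function below is a hypothetical helper, shown as a minimal sketch on a made-up header row.

```python
REQUIRED_COLUMNS = ['short', 'gene_name', 'amplifier', 'reference', 'sequence']

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]

# Made-up header for illustration: 'reference' is missing, 'notes' is extra.
example_header = ['short', 'gene_name', 'amplifier', 'sequence', 'notes']
print(missing_columns(example_header))  # → ['reference']
```

With the input file loaded into a DataFrame, `missing_columns(input_df.columns)` performs the same check on the real header.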

Set up the file structure. \
Check that the input file is correct!

In [None]:
input_csv = f"{project_name}.csv"
out_project = os.path.join(out_path, project_name)
os.makedirs(out_project, exist_ok=True)

# output folder for your generated HCR probes
print(f'You will find your HCR probes here: {out_project}')

# show input csv file with relevant columns
in_file = os.path.join(in_path, input_csv)
input_df = pd.read_csv(in_file)
input_df = input_df[['short','gene_name','amplifier','reference','sequence']]
print()
print('All correct in the input csv file?')
input_df

You will find your HCR probes here: /home/mstemmer/repos/HCR_probe_generator/generated_probes/20241121_HCR_probes_Sunny

All correct in the input csv file?


| | short | gene_name | amplifier | reference | sequence |
|---|---|---|---|---|---|
| 0 | dr | calb2b | B1 | Danio_rerio.GRCz11.cdna.all.fa | ATGGCGAATAAAGCACCAGAGCCCATTTCTCTGCATTTGGCGGAAC... |
| 1 | dr | trpm5 | B2 | Danio_rerio.GRCz11.cdna.all.fa | ATGGTCGAGAAGTCCAGTGAGAGATTTGATAAACAGATGGCCGGGC... |
| 2 | dr | plcb2 | B3 | Danio_rerio.GRCz11.cdna.all.fa | ATGAGCAGAAACAGACACTCGCTGCAGGAGCCCGACATCAAAGACT... |
| 3 | dr | tas1r3 | B5 | Danio_rerio.GRCz11.cdna.all.fa | ATGCTTCTACTGAGGATGAAGAACAAGTGGACTTTTCTGGTGCTCT... |


Run the HCR probe generator over all rows of the input .csv file. \
The code first tries to generate 33 pairs for each target. If that number cannot be reached, the generator re-runs without trying to reach that maximum.
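
The try-then-fall-back behaviour described above can be sketched in isolation. In the sketch below, `fake_generator` is a hypothetical stand-in for the real probe generator and `available_pairs` is an invented parameter; the only thing taken from this notebook is the pattern of retrying with `maxprobe="n"` after an `IndexError`.

```python
def run_with_fallback(generate, **settings):
    """Call `generate` with maxprobe='y'; on IndexError retry once with maxprobe='n'."""
    try:
        return generate(maxprobe="y", **settings)
    except IndexError:
        return generate(maxprobe="n", **settings)

# Hypothetical stand-in for the probe generator, for illustration only.
def fake_generator(maxprobe, available_pairs):
    if maxprobe == "y" and available_pairs < 33:
        raise IndexError("fewer than 33 pairs available")
    return min(available_pairs, 33)

print(run_with_fallback(fake_generator, available_pairs=40))  # → 33
print(run_with_fallback(fake_generator, available_pairs=20))  # → 20
```

Catching only `IndexError` keeps the fallback narrow: any other exception still stops the run and is not silently retried.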

In [None]:
# generator settings, shared by all targets
pause = 12
polyAT = 5
polyCG = 5
choose = "n"
BlastProbes = "y"
dropout = "y"
show = "y"
report = "y"
numbr = 0

for index, row in input_df.iterrows():
    target = f"{row['short']}_{row['amplifier']}_{row['gene_name']}"
    print(f"--> Working on {target}")
    outfile = os.path.join(out_project, f"{target}_probes.csv")
    db = os.path.join(reference_path, row['reference'])

    # everything the generator prints goes into a per-target log file
    with open(os.path.join(out_project, f"{target}_log.txt"), 'w') as f:
        with redirect_stdout(f):
            try:
                # first attempt: try to reach the maximum number of probe pairs
                maxprobe = "y"
                maker(row['gene_name'],row['sequence'],row['amplifier'],pause,choose,polyAT,polyCG,BlastProbes,db,dropout,show,report,maxprobe,numbr,outfile)
            except IndexError:
                # fallback: re-run without trying to reach that maximum
                maxprobe = "n"
                maker(row['gene_name'],row['sequence'],row['amplifier'],pause,choose,polyAT,polyCG,BlastProbes,db,dropout,show,report,maxprobe,numbr,outfile)


# Fuse all probes into single .csv and .xlsx file for IDT order

print('Fusing probes...')
all_probes = os.path.join(out_project, f"{project_name}_all_probes")
probe_frames = []

for index, row in input_df.iterrows():
    print(f"--> Fusing {row['gene_name']}")

    probes = os.path.join(out_project, f"{row['short']}_{row['amplifier']}_{row['gene_name']}_probes.csv")

    probes_df = pd.read_csv(probes)
    print(probes_df.shape)
    probe_frames.append(probes_df)

# pd.concat instead of DataFrame.append, which was removed in pandas 2.0
all_probes_df = pd.concat(probe_frames)
all_probes_df.to_csv(f'{all_probes}.csv', index=False)
all_probes_df.to_excel(f'{all_probes}.xlsx', index=False)

--> Working on dr_B1_calb2b
--> Working on dr_B2_trpm5
--> Working on dr_B3_plcb2
--> Working on dr_B5_tas1r3
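
After fusing, it can be useful to confirm how many probes ended up in each pool before placing the order. The sketch below uses toy data: the column names `Pool name` and `Sequence` come from the fusing code, while the pool names and sequences are invented for illustration and assume that each pool name identifies one target.

```python
import pandas as pd

# Toy per-target tables standing in for the *_probes.csv files.
per_target = [
    pd.DataFrame({'Pool name': ['dr_B1_calb2b'] * 2, 'Sequence': ['ACGT', 'TTGA']}),
    pd.DataFrame({'Pool name': ['dr_B2_trpm5'], 'Sequence': ['GGCA']}),
]

# Stack the tables into one and count the rows per pool.
fused = pd.concat(per_target, ignore_index=True)
counts = fused['Pool name'].value_counts()
print(counts)
```

The same `value_counts()` call on the fused table from the notebook gives the number of rows per pool in the order file.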
