evalzoo requires a config `yml` file for each batch of profiles. Use `match_rep_df.csv` to create these config files.

Additionally, each batch needs its own pair of config files (one for replicates, one for matches) that lists every plate in that batch.

1. Load `match_rep_df` to get the batch names and plate names (`Assay_Plate_Barcode`)
2. Iterate through each batch and update both the replicate and match config files
   1. Set `experiment.data_path` to the batch's path and `experiment.plates` to **all** plates within that batch
3. Save the two config files in an identical folder structure
   1. `batch_plate.yml`
4. Run evalzoo


After evalzoo runs, the results are stored in a folder named with a random hash (unsure why). However, we want these results to be


experiment:
  data_path: "/input/Scope1_Nikon_10X"
  input_structure: "{data_path}/{{plate}}/{{plate}}_normalized_feature_select_negcon_batch.{extension}"
  extension: csv.gz
  plates:
    - BR00117060a10x
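
To sanity-check a config before running evalzoo, it can help to expand `input_structure` by hand and compare the resulting paths with the files on disk. The sketch below assumes the template is filled in two passes: single-brace fields from the `experiment` block first, then the double-brace `plate` field once per plate. That reading is inferred from the brace pattern in the config above, not from evalzoo's documentation, and `expand_input_paths` is a hypothetical helper rather than part of evalzoo.

```python
def expand_input_paths(experiment):
    """Expand input_structure into one path per plate (assumed two-pass fill)."""
    # Pass 1: fill {data_path} and {extension}; "{{plate}}" collapses to "{plate}".
    per_plate_template = experiment["input_structure"].format(
        data_path=experiment["data_path"],
        extension=experiment["extension"],
    )
    # Pass 2: fill {plate} once for each plate in the list.
    return [per_plate_template.format(plate=plate) for plate in experiment["plates"]]


# Values copied from the example config above.
experiment = {
    "data_path": "/input/Scope1_Nikon_10X",
    "input_structure": "{data_path}/{{plate}}/{{plate}}_normalized_feature_select_negcon_batch.{extension}",
    "extension": "csv.gz",
    "plates": ["BR00117060a10x"],
}

for path in expand_input_paths(experiment):
    print(path)
```

Passing each expanded path to `os.path.exists` is then a quick way to catch a misspelled batch or plate name before a run.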

In [63]:
import yaml
import pandas as pd
import os

In [5]:
match_rep_df = pd.read_csv("../checkpoints/match_rep_df.csv")

match_rep_df = match_rep_df[match_rep_df["sphering"] == True]

match_rep_df

Unnamed: 0,Vendor,Batch,Plate_Map_Name,Assay_Plate_Barcode,Modality,Images_per_well,Sites-SubSampled,Binning,Magnification,Number_of_channels,...,Size_MB_std,sphering,value_95_replicating,Percent_Replicating,channel_names,brightfield_z_plane_used,feature_channels_found,Percent_Matching,value_95_matching,cell_count
0,MolDev,Scope1_MolDev_10X,JUMP-MOA_compound_platemap,Plate2_PCO_6ch_4site_10XPA,Widefield,4,,1,10,6,...,0.000144,True,0.191908,60.000000,"Actin, DNA, ER, Golgi, Mito, RNA",,"Actin, DNA, ER, Golgi, Mito, RNA",23.255814,0.288099,2014937
2,MolDev,Scope1_MolDev_10X,JUMP-MOA_compound_platemap,Plate3_PCO_6ch_4site_10XPA_Crest,Confocal,4,,1,10,6,...,0.000183,True,0.269617,62.222222,"Actin, DNA, ER, Golgi, Mito, RNA",,"Actin, DNA, ER, Golgi, Mito, RNA",18.604651,0.398249,2413350
4,MolDev,Scope1_MolDev_10X_4siteZ,JUMP-MOA_compound_platemap,Plate3_PCO_6ch_4site_10XPA_Crestz,Confocal,4,,1,10,6,...,0.000142,True,0.205121,66.666667,"Actin, DNA, ER, Golgi, Mito, RNA",,"Actin, DNA, ER, Golgi, Mito, RNA",23.255814,0.363114,2381443
6,MolDev,Scope1_MolDev_20X_4site,JUMP-MOA_compound_platemap,Plate3_PCO_6ch_4site_20XPA_Crestz,Confocal,4,,1,20,6,...,0.000114,True,0.182630,57.777778,"Actin, DNA, ER, Golgi, Mito, RNA",,"Actin, DNA, ER, Golgi, Mito, RNA",18.604651,0.279178,527841
8,MolDev,Scope1_MolDev_20X_9site,JUMP-MOA_compound_platemap,Plate2_PCO_6ch_9site_20XPA,Widefield,9,,1,20,6,...,0.000153,True,0.184205,67.777778,"Actin, DNA, ER, Golgi, Mito, RNA",,"Actin, DNA, ER, Golgi, Mito, RNA",23.255814,0.291127,1101611
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
352,Yokogawa_US,4siteSubSample_Scope1_Yokogawa_US_20X_5Ch,JUMP-MOA_compound_platemap,BRO0117056_20x,Confocal,9,4.0,1,20,5,...,0.000044,True,0.174914,57.777778,"AGP, DNA, ER, Mito, RNA",,"AGP, DNA, ER, Mito, RNA",23.255814,0.244983,544244
354,Yokogawa_US,4siteSubSample_Scope1_Yokogawa_US_20X_5Ch_12Z,JUMP-MOA_compound_platemap,BRO0117056_20xb,Confocal,9,4.0,1,20,5,...,0.000044,True,0.157136,60.000000,"AGP, DNA, ER, Mito, RNA",,"AGP, DNA, ER, Mito, RNA",20.930233,0.227059,543826
356,Yokogawa_US,4siteSubSample_Scope1_Yokogawa_US_20X_6Ch_BRO0...,JUMP-MOA_compound_platemap,BRO0117059_20X,Confocal,9,4.0,1,20,6,...,0.000583,True,0.179268,58.888889,"AGP, BrightField, DNA, ER, Mito, RNA",Z08,"AGP, BrightField, DNA, ER, Mito, RNA",20.930233,0.253483,489099
358,Yokogawa_US,4siteSubSample_Scope1_Yokogawa_US_20X_6Ch_BRO0...,JUMP-MOA_compound_platemap,BRO01177034_20x,Confocal,9,4.0,1,20,6,...,0.000014,True,0.139090,56.666667,"AGP, BrightField, DNA, ER, Mito, RNA",Z17,"AGP, BrightField, DNA, ER, Mito, RNA",18.604651,0.193171,452567


In [105]:
import copy

# Load the template configs once; each batch gets its own modified copy.
with open("params/within_matches.yml") as f:
    match_yaml = yaml.safe_load(f)

with open("params/within_replicates.yml") as f:
    rep_yaml = yaml.safe_load(f)

for batch, grouped_df in match_rep_df.groupby("Batch"):
    # All plates that belong to this batch
    plates = grouped_df["Assay_Plate_Barcode"].tolist()

    # Deep-copy so edits for one batch never leak into the shared template
    out_rep_yaml = copy.deepcopy(rep_yaml)
    out_rep_yaml["experiment"]["data_path"] = f"/input/profiles/{batch}"
    out_rep_yaml["experiment"]["plates"] = plates

    out_match_yaml = copy.deepcopy(match_yaml)
    out_match_yaml["experiment"]["data_path"] = f"/input/profiles/{batch}"
    out_match_yaml["experiment"]["plates"] = plates

    # One folder per batch: params/<batch>/
    save_dir = os.path.join("params", batch)
    os.makedirs(save_dir, exist_ok=True)

    # sort_keys=False keeps the key order of the template files
    with open(os.path.join(save_dir, f'{batch}_replicates_config.yaml'), 'w') as f:
        yaml.dump(out_rep_yaml, f, sort_keys=False)

    with open(os.path.join(save_dir, f'{batch}_matches_config.yaml'), 'w') as f:
        yaml.dump(out_match_yaml, f, sort_keys=False)
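
A note on reusing a loaded template inside a loop like the one above: in Python, assigning a dict to a new name binds a second name to the same object rather than copying it, so edits made through the new name also change the template. When every iteration overwrites the same keys and dumps immediately, the written files still come out right, but `copy.deepcopy` removes the dependency on that ordering. A minimal illustration, with made-up keys and plate names:

```python
import copy

# Plain assignment: both names refer to the same nested dict.
base = {"experiment": {"data_path": None, "plates": []}}
aliased = base
aliased["experiment"]["plates"] = ["plate_a"]  # hypothetical plate name
print(base["experiment"]["plates"])  # the template changed too

# Deep copy: nested dicts are duplicated, so the template stays untouched.
fresh_base = {"experiment": {"data_path": None, "plates": []}}
copied = copy.deepcopy(fresh_base)
copied["experiment"]["plates"] = ["plate_a"]
print(fresh_base["experiment"]["plates"])  # still empty
```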
