# Replicate Figures 1, 2 and 3

To replicate the SATURN results for frog and zebrafish embryogenesis, you need to run SATURN 30 times, each with a different seed.

To simplify this analysis, we provide a Python script, `saturn_multiple_seeds.py`, that runs SATURN once per seed for a given number of seeds.


**NOTE: run the Train SATURN vignette first, `Vignettes/frog_zebrafish_embryogenesis/Train SATURN.ipynb`**

In [1]:
# Make a copy of the vignette's run file with its paths fixed,
# so they resolve from the repository root where the script is run
import pandas as pd

run_df = pd.read_csv("data/frog_zebrafish_run.csv")
run_df["path"] = ["Vignettes/frog_zebrafish_embryogenesis/" + path for path in run_df["path"]]
run_df.to_csv("data/frog_zebrafish_run_multi.csv", index=False)
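Before launching a long multi-seed job, it can help to confirm that the re-rooted paths actually exist. The sketch below is not part of SATURN: `prefix_paths` and `missing_paths` are hypothetical helpers, and the default `root="../.."` assumes the notebook is run from the vignette directory, two levels below the repository root.

```python
import os
import posixpath

# Assumed prefix: the vignette directory used in the cell above.
VIGNETTE_DIR = "Vignettes/frog_zebrafish_embryogenesis"


def prefix_paths(paths, prefix=VIGNETTE_DIR):
    """Return each relative path re-rooted under `prefix`.

    Note: an absolute path would discard the prefix (posixpath.join semantics).
    """
    return [posixpath.join(prefix, p) for p in paths]


def missing_paths(paths, root="../.."):
    """Return the paths that do not exist when resolved against `root`."""
    return [p for p in paths if not os.path.exists(os.path.join(root, p))]
```

For example, `missing_paths(run_df["path"])` should return an empty list if every input file listed in the fixed run file is in place.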

# Run the 30 seeds

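*This will take a while.* The command below runs only 3 seeds (`--seeds=3`), one on each GPU listed in `--gpus`. To run the full set, set `--seeds=30` and adjust `--gpus` to the device IDs available on your machine.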

In [None]:
!cd ../../ ; python3 saturn_multiple_seeds.py \
                --run=Vignettes/frog_zebrafish_embryogenesis/data/frog_zebrafish_run_multi.csv \
                --gpus 1 3 4 \
                --seeds=3

['1', '3', '4']
  0%|                                                     | 0/3 [00:00<?, ?it/s]RUNNING SEED: 0 ON GPU:1
RUNNING SEED: 1 ON GPU:3
RUNNING SEED: 2 ON GPU:4
Global seed set to 0
Global seed set to 0
Global seed set to 0
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Epoch 200: L1 Loss 0.0 Rank Loss 12.251686096191406, Avg Loss frog: 1862, Avg Lo
100%|█████████████████████████████████████████| 157/157 [00:17<00:00,  9.18it/s]
Epoch 200: L1 Loss 0.0 Rank Loss 12.439830780029297, Avg Loss frog: 1862, Avg Lo
100%|█████████████████████████████████████████| 157/157 [00:17<00:00,  8.90it/s]
Epoch 200: L1 Loss 0.0 Rank Loss 12.111682891845703, Avg Loss frog: 1862, Avg Lo
100%|█████████████████████████████████████████| 157/157 [00:17<00:00,  9.05it/s]
1

# Score the 30 seeds

We now need to score each SATURN run. First, we create a CSV file that maps each seed to the path of its output file.

In [None]:
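from glob import glob

import pandas as pd

# Collect the SATURN outputs from the multi-seed run
fz_adatas = glob("../multiple_seeds_results/saturn_results/*.h5ad")
# Keep frog outputs, skip any path containing "pretrain", and rewrite the
# paths so they resolve from the repository root, where score_adata.py is run
fz_adatas = [path.replace("..", "Vignettes") for path in fz_adatas if "pretrain" not in path and "frog" in path]
# The seed is the last underscore-separated token of each file name
seeds = [path.split("_")[-1].replace(".h5ad", "") for path in fz_adatas]

# One row per run: its seed and the path to its output file
score_df = pd.DataFrame({"seed": seeds, "path": fz_adatas})
display(score_df.head())
score_df.to_csv("./data/fz_multi_seeds.csv", index=False)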
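The seed extraction in the cell above relies on the seed being the last underscore-separated token of each file name. A minimal sketch of that rule in isolation is shown below; `seed_from_path` and `is_scored_output` are hypothetical helper names, and the file name in the usage note is made up for illustration rather than taken from real SATURN output.

```python
from pathlib import PurePosixPath


def seed_from_path(path):
    """Return the text after the last underscore in the file name, without the extension."""
    stem = PurePosixPath(path).stem  # file name without ".h5ad"
    return stem.rsplit("_", 1)[-1]


def is_scored_output(path):
    """Mirror the filter used above: frog outputs only, no pretraining files."""
    return "pretrain" not in path and "frog" in path
```

With an illustrative name such as `saturn_results/frog_example_seed_7.h5ad`, `seed_from_path` returns the string `"7"`. Seeds are kept as strings here, as in the cell above; convert with `int(...)` if you need to sort them numerically.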

In [None]:
!cd ../../ ; python3 score_adata.py --adata=Vignettes/frog_zebrafish_embryogenesis/data/fz_multi_seeds.csv --scores=1 \
                                 --multiple_files --species1=zebrafish --species2=frog --label=labels2 \
                                 --ct_map=Vignettes/frog_zebrafish_embryogenesis/data/frog_zebrafish_cell_type_map.csv

The script saves a copy of this table, with the scores added, to `./data/fz_multi_seeds_scores.csv`.

In [None]:
pd.read_csv("./data/fz_multi_seeds_scores.csv")
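Once all seeds are scored, a natural next step is to summarize each score across runs. The sketch below is one way to do that, under assumptions: the score column names are not shown in this notebook, so the hypothetical helper `summarize_scores` treats every column that parses as a number (other than `seed` and `path`) as a score. It uses only the standard library and accepts a list of row dictionaries.

```python
from statistics import mean, stdev


def summarize_scores(rows, skip=("seed", "path")):
    """Return {column: (mean, sample standard deviation)} across runs.

    `rows` is a list of dicts, one per scored run. Columns named in `skip`
    and columns with non-numeric values are ignored.
    """
    summary = {}
    if not rows:
        return summary
    for col in rows[0]:
        if col in skip:
            continue
        try:
            values = [float(row[col]) for row in rows]
        except (TypeError, ValueError):
            continue  # not a numeric column
        # stdev needs at least two values; report 0.0 for a single run
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[col] = (mean(values), spread)
    return summary
```

In the notebook, the scored table can be passed in as `summarize_scores(pd.read_csv("./data/fz_multi_seeds_scores.csv").to_dict("records"))`.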