# Run many seeds of SATURN

To replicate the SATURN results for frog and zebrafish embryogenesis, you need to run SATURN 30 times with different seeds.

To make this easier, we provide a Python script, `saturn_multiple_seeds.py`, that runs SATURN once per seed for a given number of seeds.


**NOTE: run the Train SATURN vignette first, `Vignettes/frog_zebrafish_embryogenesis/Train SATURN.ipynb`**

In [1]:
# Make a copy of the vignette's run file with its paths prefixed by the vignette
# directory, so they resolve from the repository root where the script is run
import pandas as pd
run_df = pd.read_csv("data/frog_zebrafish_run.csv")
run_df["path"] = ["Vignettes/frog_zebrafish_embryogenesis/" + path for path in run_df["path"]]
run_df.to_csv("data/frog_zebrafish_run_multi.csv", index=False)

# Run the 30 seeds

*This will take a while*

In [2]:
!cd ../../ ; python3 saturn_multiple_seeds.py \
                --run=Vignettes/frog_zebrafish_embryogenesis/data/frog_zebrafish_run_multi.csv --embedding_model=ESM2 \
                --gpus 5 6 7 8 \
                --seeds=30 --pe_sim_penalty=0.2

['5', '6', '7', '8']
  0%|                                                    | 0/30 [00:00<?, ?it/s]RUNNING SEED: 0 ON GPU:5
RUNNING SEED: 1 ON GPU:6
RUNNING SEED: 2 ON GPU:7
RUNNING SEED: 3 ON GPU:8
Global seed set to 0
Global seed set to 0
Global seed set to 0
Global seed set to 0
  new_rank_zero_deprecation(
  return new_rank_zero_deprecation(*args, **kwargs)
  new_rank_zero_deprecation(
  return new_rank_zero_deprecation(*args, **kwargs)
  new_rank_zero_deprecation(
  return new_rank_zero_deprecation(*args, **kwargs)
  new_rank_zero_deprecation(
  return new_rank_zero_deprecation(*args, **kwargs)
Epoch 200: L1 Loss 0.0 Rank Loss 2.628314733505249, Avg Loss frog: 1855, Avg Los
100%|█████████████████████████████████████████| 157/157 [00:22<00:00,  6.85it/s]
Epoch 200: L1 Loss 0.0 Rank Loss 2.6406631469726562, Avg Loss frog: 1858, Avg Lo
Epoch 200: L1 Loss 0.0 Rank Loss 2.688323736190796, Avg Loss frog: 1856, Avg Los
Epoch 200: L1 Loss 0.0 Rank Loss 2.667391538619995, Avg Loss frog: 

Epoch 200: L1 Loss 0.0 Rank Loss 2.6968400478363037, Avg Loss frog: 1856, Avg Lo
100%|█████████████████████████████████████████| 157/157 [00:22<00:00,  6.84it/s]
100%|█████████████████████████████████████████| 157/157 [00:10<00:00, 14.52it/s]
 53%|███████████████████▋                 | 16/30 [9:46:13<9:28:14, 2435.29s/it]RUNNING SEED: 16 ON GPU:5
Global seed set to 0
  new_rank_zero_deprecation(
  return new_rank_zero_deprecation(*args, **kwargs)
Epoch 200: L1 Loss 0.0 Rank Loss 2.648280620574951, Avg Loss frog: 1855, Avg Los
100%|█████████████████████████████████████████| 157/157 [00:23<00:00,  6.75it/s]
100%|█████████████████████████████████████████| 157/157 [00:10<00:00, 14.92it/s]
 57%|████████████████████▍               | 17/30 [10:38:02<9:31:20, 2636.95s/it]RUNNING SEED: 17 ON GPU:7
Epoch 72: L1 Loss 0.0 Rank Loss 3.037513494491577, Avg Loss frog: 1941, Avg LossGlobal seed set to 0
  new_rank_zero_deprecation(
  return new_rank_zero_deprecation(*args, **kwargs)
Epoch 200: L1 Loss

  new_rank_zero_deprecation(
  return new_rank_zero_deprecation(*args, **kwargs)
Epoch 200: L1 Loss 0.0 Rank Loss 2.623051404953003, Avg Loss frog: 1855, Avg Los
100%|█████████████████████████████████████████| 157/157 [00:23<00:00,  6.66it/s]
Epoch 200: L1 Loss 0.0 Rank Loss 2.655632734298706, Avg Loss frog: 1856, Avg Los
100%|█████████████████████████████████████████| 157/157 [00:25<00:00,  6.18it/s]
Epoch 200: L1 Loss 0.0 Rank Loss 2.7219409942626953, Avg Loss frog: 1855, Avg Lo
100%|█████████████████████████████████████████| 157/157 [00:23<00:00,  6.81it/s]
Epoch 200: L1 Loss 0.0 Rank Loss 2.6708030700683594, Avg Loss frog: 1858, Avg Lo
100%|█████████████████████████████████████████| 157/157 [00:23<00:00,  6.61it/s]
100%|█████████████████████████████████████████| 157/157 [00:10<00:00, 14.90it/s]
 80%|████████████████████████████▊       | 24/30 [16:11:13<5:42:56, 3429.36s/it]RUNNING SEED: 24 ON GPU:7
Global seed set to 0
  new_rank_zero_deprecation(
  return new_rank_zero_deprecation
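
The log above shows four runs starting at once, one per GPU, and later seeds starting on whichever GPU has just finished (seed 17 starts on GPU 7, not GPU 6). One way to get that behaviour is to keep a pool of free GPU IDs. The sketch below illustrates that idea only; it is not the code of `saturn_multiple_seeds.py`, and `run_seeds`, `worker` and `fake_run` are hypothetical names introduced here.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_seeds(n_seeds, gpus, run_one):
    """Call run_one(seed, gpu) for every seed, with at most one run per GPU."""
    free_gpus = queue.Queue()
    for gpu in gpus:
        free_gpus.put(gpu)

    def worker(seed):
        gpu = free_gpus.get()  # take a free GPU (one is always available here)
        try:
            return seed, gpu, run_one(seed, gpu)
        finally:
            free_gpus.put(gpu)  # hand the GPU to the next waiting seed

    # One worker thread per GPU, so at most len(gpus) runs are active at once.
    with ThreadPoolExecutor(max_workers=len(gpus)) as pool:
        return list(pool.map(worker, range(n_seeds)))


def fake_run(seed, gpu):
    # Hypothetical stand-in for launching one SATURN training run.
    return f"seed {seed} trained on GPU {gpu}"


results = run_seeds(n_seeds=30, gpus=["5", "6", "7", "8"], run_one=fake_run)
print(results[:4])
```

Results come back in seed order, but which GPU each seed lands on depends on which run finishes first, so the assignment can differ between executions.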

# Score the 30 seeds

We now need to score each SATURN run. First, we create a CSV file mapping each seed to the path of its output file.

In [3]:
from glob import glob

import pandas as pd

# Collect the SATURN outputs, one .h5ad file per seed
fz_adatas = glob("../multiple_seeds_results/saturn_results/*ESM2*2000*8000*default*.h5ad")
# Drop pretraining outputs and rewrite the paths relative to the repository root,
# which is where score_adata.py is run from
fz_adatas = [path.replace("..", "Vignettes") for path in fz_adatas if "pretrain" not in path and "frog" in path]
# The seed is the last "_"-separated field of the file name
seeds = [path.split("_")[-1].replace(".h5ad", "") for path in fz_adatas]

score_df = pd.DataFrame()
score_df["seed"] = seeds
score_df["path"] = fz_adatas
display(score_df.head())
print(len(score_df))
score_df.to_csv("./data/fz_multi_seeds.csv", index=False)

|   | seed | path |
|---|------|------|
| 0 | 27 | Vignettes/multiple_seeds_results/saturn_result... |
| 1 | 0 | Vignettes/multiple_seeds_results/saturn_result... |
| 2 | 16 | Vignettes/multiple_seeds_results/saturn_result... |
| 3 | 4 | Vignettes/multiple_seeds_results/saturn_result... |
| 4 | 12 | Vignettes/multiple_seeds_results/saturn_result... |


30
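
Before scoring, it can be worth checking that every seed actually produced a file, since a count of 30 alone would not catch a duplicated or missing seed. The sketch below reuses the seed parsing from the cell above; `seed_from_path`, `missing_seeds` and the example file names are hypothetical and only for illustration.

```python
def seed_from_path(path):
    # Same parsing as the cell above: the seed is the last "_"-separated field.
    return path.split("_")[-1].replace(".h5ad", "")


def missing_seeds(paths, n_seeds):
    """Return the seeds in range(n_seeds) that have no result file."""
    found = {int(seed_from_path(path)) for path in paths}
    return sorted(set(range(n_seeds)) - found)


# Made-up file names, for illustration only.
example_paths = [f"results/example_run_{seed}.h5ad" for seed in (0, 1, 3)]
print(missing_seeds(example_paths, n_seeds=4))  # prints [2]
```

In the notebook, `missing_seeds(fz_adatas, 30)` should return an empty list if all 30 runs completed.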


In [4]:
!cd ../../ ; python3 score_adata.py --adata=Vignettes/frog_zebrafish_embryogenesis/data/fz_multi_seeds.csv --scores=1 \
                                 --multiple_files --species1=zebrafish --species2=frog --label=labels2 \
                                 --ct_map=Vignettes/frog_zebrafish_embryogenesis/data/frog_zebrafish_cell_type_map.csv

  if species_1 or species_2 is "human":
  elif species_1 or species_2 is "zebrafish":
0
100%|███████████████████████████████████████████| 30/30 [10:46<00:00, 21.55s/it]
100%|███████████████████████████████████████████| 30/30 [07:48<00:00, 15.61s/it]
Vignettes/frog_zebrafish_embryogenesis/data/fz_multi_seeds_scores.csv
    seed  ...              Label
0     27  ...  zebrafish to frog
1      0  ...  zebrafish to frog
2     16  ...  zebrafish to frog
3      4  ...  zebrafish to frog
4     12  ...  zebrafish to frog
5     23  ...  zebrafish to frog
6      8  ...  zebrafish to frog
7      9  ...  zebrafish to frog
8     26  ...  zebrafish to frog
9      1  ...  zebrafish to frog
10    17  ...  zebrafish to frog
11     5  ...  zebrafish to frog
12    13  ...  zebrafish to frog
13    22  ...  zebrafish to frog
14    19  ...  zebrafish to frog
15    28  ...  zebrafish to frog
16    24  ...  zebrafish to frog
17     3  ...  zebrafish to frog
18    15  ...  zebrafish to frog
19     7  ...  zebra

The script saves a copy with the scores to `./data/fz_multi_seeds_scores.csv`.

In [2]:
df = pd.read_csv("./data/fz_multi_seeds_scores.csv")
df

|   | seed | path | Logistic Regression | Balanced Regression | Reannotation | Label |
|---|------|------|---------------------|---------------------|--------------|-------|
| 0 | 27 | Vignettes/multiple_seeds_results/saturn_result... | 0.841399 | 0.516604 | | zebrafish to frog |
| 1 | 0 | Vignettes/multiple_seeds_results/saturn_result... | 0.854026 | 0.542394 | | zebrafish to frog |
| 2 | 16 | Vignettes/multiple_seeds_results/saturn_result... | 0.865828 | 0.529514 | | zebrafish to frog |
| 3 | 4 | Vignettes/multiple_seeds_results/saturn_result... | 0.849311 | 0.508423 | | zebrafish to frog |
| 4 | 12 | Vignettes/multiple_seeds_results/saturn_result... | 0.865219 | 0.534301 | | zebrafish to frog |
| 5 | 23 | Vignettes/multiple_seeds_results/saturn_result... | 0.85906 | 0.543006 | | zebrafish to frog |
| 6 | 8 | Vignettes/multiple_seeds_results/saturn_result... | 0.855965 | 0.52698 | | zebrafish to frog |
| 7 | 9 | Vignettes/multiple_seeds_results/saturn_result... | 0.733708 | 0.438924 | | zebrafish to frog |
| 8 | 26 | Vignettes/multiple_seeds_results/saturn_result... | 0.868066 | 0.537993 | | zebrafish to frog |
| 9 | 1 | Vignettes/multiple_seeds_results/saturn_result... | 0.860288 | 0.528257 | | zebrafish to frog |
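
Because scores vary from seed to seed (seed 9 is noticeably lower than the others shown), it helps to summarize each score column across seeds instead of reading off single runs. The sketch below does this for the ten rows displayed above only, using the standard library so it runs outside the notebook; `summarize` is a hypothetical helper, and the numbers it prints describe these ten rows, not all 30 seeds.

```python
import csv
import io
import statistics

# The ten rows displayed above, with the truncated path column left out.
shown_rows = """seed,Logistic Regression,Balanced Regression
27,0.841399,0.516604
0,0.854026,0.542394
16,0.865828,0.529514
4,0.849311,0.508423
12,0.865219,0.534301
23,0.85906,0.543006
8,0.855965,0.52698
9,0.733708,0.438924
26,0.868066,0.537993
1,0.860288,0.528257
"""


def summarize(rows, column):
    """Summary statistics of one score column across seeds."""
    values = [float(row[column]) for row in rows]
    return {
        "n": len(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
    }


rows = list(csv.DictReader(io.StringIO(shown_rows)))
for column in ("Logistic Regression", "Balanced Regression"):
    print(column, summarize(rows, column))

# Identify the seed with the lowest Logistic Regression score.
worst = min(rows, key=lambda row: float(row["Logistic Regression"]))
print("lowest-scoring seed:", worst["seed"])  # prints 9 for the rows shown
```

In the notebook itself, `df[["Logistic Regression", "Balanced Regression"]].describe()` gives a comparable summary over all 30 seeds.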
