This notebook shows how to use the `STATE` model through the `helical` package.

# Download Example Data

We start by using the helical downloader to fetch an example dataset from Hugging Face.

In [1]:
from helical.utils.downloader import Downloader
from pathlib import Path

downloader = Downloader()
downloader.download_via_link(
    Path("yolksac_human.h5ad"),
    "https://huggingface.co/datasets/helical-ai/yolksac_human/resolve/main/data/17_04_24_YolkSacRaw_F158_WE_annots.h5ad?download=true",)

INFO:datasets:PyTorch version 2.6.0 available.
INFO:datasets:Polars version 1.33.0 available.
INFO:helical.utils.downloader:Starting to download: 'https://huggingface.co/datasets/helical-ai/yolksac_human/resolve/main/data/17_04_24_YolkSacRaw_F158_WE_annots.h5ad?download=true'
yolksac_human.h5ad: 100%|██████████| 553M/553M [00:04<00:00, 116MB/s]  


# STATE Embeddings

Using the STATE model, we can obtain single-cell transcriptome embeddings. We first subset the dataset for demonstration purposes.

In [2]:
# load the data 
import scanpy as sc

adata = sc.read_h5ad("yolksac_human.h5ad")
# for demonstration we subset to 10 cells and 2000 genes
adata = adata[:10, :2000].copy()

print(adata.shape)
n_cells = adata.n_obs
print(n_cells)

(10, 2000)
10


Initialise the model. On the first run this downloads the required files to `~/.cache/helical/models/state/`, so initialisation takes slightly longer than on later runs.


In [3]:
from helical.models.state import StateConfig    
from helical.models.state import StateEmbed

state_config = StateConfig(batch_size=16)
state_embed = StateEmbed(configurer=state_config)

INFO:helical.models.state.state_embeddings:Using model checkpoint: /home/rasched/.cache/helical/models/state/state_embed/se600m_model_weights.pt
INFO:helical.models.state.state_embeddings:Successfully loaded model
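To see in advance whether the first-run download will happen, you can check for the checkpoint file named in the log above. This is a minimal sketch using only the standard library. The cache root `~/.cache/helical` is an assumption taken from that log and may differ on your machine, and `checkpoint_cached` is a hypothetical helper, not part of `helical`.

```python
from pathlib import Path

# Relative location taken from the log output above; the cache root is assumed
# to be ~/.cache/helical and may differ on your machine.
STATE_EMBED_WEIGHTS = Path("models/state/state_embed/se600m_model_weights.pt")


def checkpoint_cached(cache_root: Path, relative_path: Path = STATE_EMBED_WEIGHTS) -> bool:
    """Return True if the checkpoint file already exists under cache_root."""
    return (cache_root / relative_path).is_file()


cache_root = Path.home() / ".cache" / "helical"
if checkpoint_cached(cache_root):
    print("Weights found in cache; initialisation should skip the download.")
else:
    print("Weights not cached yet; the first initialisation will download them.")
```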


We process the data by calling `state_embed.process_data` and pass this into `state_embed.get_embeddings` to get the final embeddings.

In [4]:
processed_data = state_embed.process_data(adata=adata)
embeddings = state_embed.get_embeddings(processed_data)

# note that the STATE model returns a numpy array of shape (n_cells, embedding_dim); here (10, 2058)
print(embeddings.shape)
print(type(embeddings))

# store the embeddings in adata.obsm['state_emb']
adata.obsm['state_emb'] = embeddings

INFO:helical.models.state.state_embeddings:Auto-detected gene column: var.index (overlap: 113/19790 protein embeddings, 5.7% of genes)
INFO:/home/rasched/final_helical_with_state/helical/helical/models/state/model_dir/embed_utils/loader.py:113 genes mapped to embedding file (out of 2000)
INFO:/home/rasched/final_helical_with_state/helical/helical/models/state/model_dir/embed_utils/loader.py:113 genes mapped to embedding file (out of 2000)
Encoding: 100%|██████████| 1/1 [00:00<00:00,  1.26it/s]

(10, 2058)
<class 'numpy.ndarray'>





# STATE Perturbations

To use the perturbation model, you can either pass in embeddings by specifying the `embed_key` argument in `StateConfig`, or keep the default value `None`, in which case the expression values (`adata.X`) are used.

To use previously computed embeddings, `embed_key` must be a key in `adata.obsm`; otherwise an error is raised. When it is set to `None`, the model uses `adata.X`.
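The lookup rule described here can be sketched in plain Python. This is an illustration only, not how `helical` implements it: `select_input_features` is a hypothetical helper, and a plain list and dictionary stand in for `adata.X` and `adata.obsm`.

```python
from typing import Mapping, Optional, Sequence


def select_input_features(
    x: Sequence,
    obsm: Mapping[str, Sequence],
    embed_key: Optional[str] = None,
) -> Sequence:
    """Pick the model input for a given embed_key.

    Hypothetical helper: `x` stands in for adata.X and `obsm` for adata.obsm.
    """
    if embed_key is None:
        # No key given: fall back to the raw expression values.
        return x
    if embed_key not in obsm:
        raise KeyError(
            f"embed_key '{embed_key}' not found in obsm; available keys: {sorted(obsm)}"
        )
    return obsm[embed_key]


expression = [[1.0, 0.0], [0.0, 2.0]]
obsm = {"state_emb": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}

print(select_input_features(expression, obsm))               # expression values
print(select_input_features(expression, obsm, "state_emb"))  # stored embeddings
```

Checking `embed_key in adata.obsm` yourself before building the config gives an earlier and clearer failure than waiting for the model to raise.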

Let's add some dummy perturbation labels to the data from the previous example.

In [5]:
import numpy as np
# some default control and non-control perturbations
perturbations = [
    "[('DMSO_TF', 0.0, 'uM')]",  # Control
    "[('Aspirin', 0.5, 'uM')]",
    "[('Dexamethasone', 1.0, 'uM')]",
]

n_cells = adata.n_obs
# we assign perturbations to cells randomly
adata.obs['target_gene'] = np.random.choice(perturbations, size=n_cells)
adata.obs['cell_type'] = adata.obs['LVL1']  # Use your cell type column
# we can also add a batch variable to take into account batch effects
batch_labels = np.random.choice(['batch_1', 'batch_2', 'batch_3', 'batch_4'], size=n_cells)
adata.obs['batch_var'] = batch_labels

config = StateConfig(
    embed_key=None,
    pert_col="target_gene",
    celltype_col="cell_type",
    control_pert="[('DMSO_TF', 0.0, 'uM')]",
    output_path="yolksac_perturbed.h5ad",
)
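Two details of the cell above are worth spelling out. The perturbation labels are strings that look like Python list literals, and the random assignment is unseeded, so the control/non-control split changes from run to run. The sketch below uses the standard library's `random` module as a stand-in for `np.random.choice` to show a repeatable assignment and how a label can be parsed. `parse_label` is a hypothetical helper, not part of `helical`.

```python
import ast
import random
from collections import Counter

perturbations = [
    "[('DMSO_TF', 0.0, 'uM')]",  # control
    "[('Aspirin', 0.5, 'uM')]",
    "[('Dexamethasone', 1.0, 'uM')]",
]
control_pert = perturbations[0]


def parse_label(label: str) -> tuple:
    """Turn "[('Aspirin', 0.5, 'uM')]" into ('Aspirin', 0.5, 'uM')."""
    (entry,) = ast.literal_eval(label)  # the label encodes a one-element list
    return entry


rng = random.Random(0)  # fixed seed: the same assignment on every run
assigned = [rng.choice(perturbations) for _ in range(10)]

counts = Counter(assigned)
n_control = counts[control_pert]
print(f"control={n_control}, non-control={len(assigned) - n_control}")
for label in perturbations:
    name, dose, unit = parse_label(label)
    print(f"{name}: {dose} {unit}, {counts[label]} cells")
```

With NumPy, calling `np.random.seed(...)` before `np.random.choice` makes the notebook cell repeatable in the same way.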


Now we can run the perturbation model.

In [6]:
from helical.models.state import StateTransitionModel

state_transition = StateTransitionModel(configurer=config)

# again we process the data and get the perturbed embeddings
processed_data = state_transition.process_data(adata)
perturbed_embeds = state_transition.get_embeddings(processed_data)

print(perturbed_embeds.shape)

INFO:helical.models.state.state_transition:Using checkpoint: /home/rasched/.cache/helical/models/state/state_transition/final.ckpt
INFO:helical.models.state.model_dir.perturb_utils.base:Loaded decoder from checkpoint decoder_cfg: {'latent_dim': 2000, 'gene_dim': 2000, 'hidden_dims': [1024, 1024, 512], 'dropout': 0.1, 'residual_decoder': False}
INFO:helical.models.state.state_transition:Model device: cuda:0
INFO:helical.models.state.state_transition:Model cell_set_len (max sequence length): 256
INFO:helical.models.state.state_transition:Model uses batch encoder: True
INFO:helical.models.state.state_transition:Model output space: gene
INFO:helical.models.state.state_transition:Using adata.X as input features
INFO:helical.models.state.state_transition:Cells: total=10, control=5, non-control=5
INFO:helical.models.state.state_transition:Running virtual experiment (homogeneous per-perturbation forward passes; controls included)...
Group ERYTHROID:   0%|          | 0/2 [00:00<?, ?it/s, Pert: 

(10, 2000)


# Finetuning STATE

We can fine-tune on top of the STATE perturbation embeddings by adding a head for downstream classification or regression. Below is a dummy example, using the data from above, to get you started.

In [7]:
from helical.models.state import StateFineTuningModel

# Dummy cell types and labels for demonstration
cell_types = list(adata.obs['LVL1'])
label_set = set(cell_types)
print(f"Found {len(label_set)} unique cell types:")

config = StateConfig(
    embed_key=None,
    pert_col="target_gene",
    celltype_col="cell_type",
    control_pert="[('DMSO_TF', 0.0, 'uM')]",
    batch_size=8,
)

# Create the fine-tuning model - we use a classification head for demonstration
model = StateFineTuningModel(
    configurer=config, 
    fine_tuning_head="classification", 
    output_size=len(label_set),
)

# Process the data for training - returns a dataset object
dataset = model.process_data(adata)

# Create a dictionary mapping the classes to unique integers for training
class_id_dict = {label: i for i, label in enumerate(label_set)}

# Convert cell type labels to integers
cell_type_labels = [class_id_dict[ct] for ct in cell_types]

print(f"Class mapping: {class_id_dict}")

# Fine-tune
model.train(train_input_data=dataset, train_labels=cell_type_labels)

Found 3 unique cell types:


INFO:helical.models.state.state_transition:Using checkpoint: /home/rasched/.cache/helical/models/state/state_transition/final.ckpt
INFO:helical.models.state.model_dir.perturb_utils.base:Loaded decoder from checkpoint decoder_cfg: {'latent_dim': 2000, 'gene_dim': 2000, 'hidden_dims': [1024, 1024, 512], 'dropout': 0.1, 'residual_decoder': False}
INFO:helical.models.state.state_transition:Model device: cuda:0
INFO:helical.models.state.state_transition:Model cell_set_len (max sequence length): 256
INFO:helical.models.state.state_transition:Model uses batch encoder: True
INFO:helical.models.state.state_transition:Model output space: gene
INFO:helical.models.state.fine_tuning_model:Backbone frozen: True
INFO:helical.models.state.fine_tuning_model:Processing data for state model fine-tuning.
INFO:helical.models.state.state_transition:Using adata.X as input features
INFO:helical.models.state.state_transition:Cells: total=10, control=5, non-control=5
INFO:helical.models.state.state_transition:R

Class mapping: {'MYELOID': 0, 'STROMA': 1, 'ERYTHROID': 2}


Fine-Tuning: epoch 1/1: 100%|██████████| 2/2 [00:00<00:00, 56.81it/s, loss=1.15]
INFO:helical.models.state.fine_tuning_model:Fine-Tuning Complete. Epochs: 1
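The class mapping in the cell above is built by iterating over a `set`. Python randomises string hashing per process by default, so the integer assigned to each cell type can differ between sessions. A minimal sketch of a stable alternative, which sorts the unique labels first; the short `cell_types` list here is made-up example data:

```python
# Made-up labels standing in for list(adata.obs['LVL1']).
cell_types = ["ERYTHROID", "MYELOID", "STROMA", "ERYTHROID", "MYELOID"]

# Sorting the unique labels fixes the label -> integer mapping across runs.
class_id_dict = {label: idx for idx, label in enumerate(sorted(set(cell_types)))}
cell_type_labels = [class_id_dict[ct] for ct in cell_types]

# Inverse mapping, handy for turning predicted ids back into names.
id_class_dict = {idx: label for label, idx in class_id_dict.items()}

print(class_id_dict)
print(cell_type_labels)
```

A stable mapping matters as soon as you save a fine-tuned model and reload it in another session, because the predicted class ids then need to be decoded with the same dictionary.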


# Training STATE for the Virtual Cell Challenge

We use data from the Virtual Cell Challenge (VCC) for model training and downstream inference. This requires the VCC dataset, as in the authors' Colab notebook linked below, which contains the code snippet for obtaining the entire dataset:

[STATE Colab Notebook](https://colab.research.google.com/drive/1QKOtYP7bMpdgDJEipDxaJqOchv7oQ-_l)

For demonstration, we have created a subset of the data. The dataset path at the top of `starter.toml` must also point to the correct location; the code below takes care of this. Start by downloading the data:

In [8]:
from helical.utils.downloader import Downloader
from helical.constants.paths import CACHE_DIR_HELICAL
import toml
from pathlib import Path

downloader = Downloader()
downloader.download_via_name("state/sample_vcc_data/config.yaml")
downloader.download_via_name("state/sample_vcc_data/starter.toml")
downloader.download_via_name("state/sample_vcc_data/gene_names.csv")
downloader.download_via_name("state/sample_vcc_data/ESM2_pert_features.pt")
downloader.download_via_name("state/sample_vcc_data/hepg2_mini.h5")
downloader.download_via_name("state/sample_vcc_data/rpe1_mini.h5")
downloader.download_via_name("state/sample_vcc_data/test.h5ad")

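starter_path = Path(CACHE_DIR_HELICAL, "state/sample_vcc_data/starter.toml")
data_dir = Path(CACHE_DIR_HELICAL, "state/sample_vcc_data").absolute()

# load the starter config and point the dataset entry at the downloaded files
starter = toml.load(starter_path)
starter["datasets"] = {"replogle_h1": str(data_dir / "{rpe1_mini,hepg2_mini}.h5")}

# write the updated config back and display the resulting TOML
with open(starter_path, "w") as f:
    toml_str = toml.dump(starter, f)
toml_str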

INFO:helical.utils.downloader:Downloading 'state/sample_vcc_data/starter.toml'
INFO:helical.utils.downloader:Starting to download: 'https://helicalpackage.s3.eu-west-2.amazonaws.com/state/sample_vcc_data/starter.toml'
starter.toml: 100%|██████████| 465/465 [00:00<00:00, 4.19MB/s]
INFO:helical.utils.downloader:File saved to: '/home/rasched/.cache/helical/models/state/sample_vcc_data/starter.toml'


'[datasets]\nreplogle_h1 = "/home/rasched/.cache/helical/models/state/sample_vcc_data/{rpe1_mini,hepg2_mini}.h5"\n\n[training]\nreplogle_h1 = "train"\n\n[zeroshot]\n"replogle_h1.hepg2" = "test"\n\n[fewshot]\n'
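The rewritten `starter.toml` refers to both files through a single brace pattern, `{rpe1_mini,hepg2_mini}.h5`. The training log further down reports `rpe1_mini` and `hepg2_mini` separately, which suggests the loader expands this pattern shell-style. A minimal sketch of such an expansion, assuming no nested braces; `expand_braces` is a hypothetical helper and not part of `helical` or `cell_load`, and `/data/sample_vcc_data` is a placeholder directory:

```python
import re


def expand_braces(pattern: str) -> list:
    """Expand every {a,b,...} group in a pattern, shell-style (no nesting)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that later brace groups in the tail are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


for path in expand_braces("/data/sample_vcc_data/{rpe1_mini,hepg2_mini}.h5"):
    print(path)
```

Running the expansion on your own path and checking that each result exists is a quick way to catch a wrong dataset location before training starts.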

We use the `StateTransitionTrainModel` class and initialise the training configuration from the `config.yaml` file in the sample directory. You can edit it to suit your training preferences. It is currently set to one epoch for demonstration.

In [9]:
# we can then train the model and perform inference on a held out test set
from helical.models.state import StateTransitionTrainModel
from omegaconf import OmegaConf

train_configs = OmegaConf.load(Path(CACHE_DIR_HELICAL, "state/sample_vcc_data/config.yaml"))
# set the correct paths for the data
train_configs.data.kwargs.toml_config_path = str(CACHE_DIR_HELICAL / "state/sample_vcc_data/starter.toml")
train_configs.data.kwargs.perturbation_features_file = str(CACHE_DIR_HELICAL / "state/sample_vcc_data/ESM2_pert_features.pt")

state_train = StateTransitionTrainModel(configurer=train_configs)
state_train.train() 
state_train.predict() 

INFO: Seed set to 42
INFO:lightning.fabric.utilities.seed:Seed set to 42
INFO:cell_load.config:Configuration validation passed
INFO:cell_load.data_modules.perturbation_dataloader:Initializing DataModule: batch_size=16, workers=4, random_seed=42


/home/rasched/.cache/helical/models/state/sample_vcc_data/{rpe1_mini,hepg2_mini}.h5


INFO:cell_load.data_modules.perturbation_dataloader:Set 2 missing perturbations to zero vectors.
INFO:cell_load.data_modules.perturbation_dataloader:Loaded custom perturbation featurizations for 19792 perturbations.
INFO:cell_load.data_modules.perturbation_dataloader:Processing dataset replogle_h1:
INFO:cell_load.data_modules.perturbation_dataloader:  - Training dataset: True
INFO:cell_load.data_modules.perturbation_dataloader:  - Zeroshot cell types: ['hepg2']
INFO:cell_load.data_modules.perturbation_dataloader:  - Fewshot cell types: []
Processing replogle_h1: 100%|██████████| 2/2 [00:00<00:00, 386.55it/s]
INFO:cell_load.data_modules.perturbation_dataloader:

INFO:cell_load.data_modules.perturbation_dataloader:Done! Train / Val / Test splits: 1 / 0 / 1


Processed rpe1_mini: 100 train, 0 val, 0 test
Processed hepg2_mini: 0 train, 0 val, 100 test
Model created. Estimated params size: 0.61 GB and 650505936 parameters


INFO:helical.models.state.state_train:Loggers and callbacks set up.
INFO: GPU available: True (cuda), used: True
INFO:lightning.pytorch.utilities.rank_zero:GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO:lightning.pytorch.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO:lightning.pytorch.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:helical.models.state.state_train:Starting trainer fit.
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO: 
  | Name                 | Type                    | Params | Mode 
-------------------------------------------------------------------------
0 | loss_fn              | SamplesLoss             | 0      | train
1 | pert_encoder         | Sequential              | 4.8 M  | train
2 | basal_encoder        | Linear                  | 12.2 M | train
3 | t

Trainer built successfully
Sanity Checking: |          | 0/? [00:00<?, ?it/s]

INFO:cell_load.data_modules.samplers:Creating perturbation batch sampler with metadata caching (using codes)...
INFO:cell_load.data_modules.samplers:Total # cells 100. Cell set size mean / std before resampling: 4.76 / 11.85.
INFO:cell_load.data_modules.samplers:Creating meta-batches with cell_sentence_len=128...
INFO:cell_load.data_modules.samplers:Of all batches, 0 were full and 21 were partial.
INFO:cell_load.data_modules.samplers:Sampler created with 2 batches in 0.00 seconds.
INFO:cell_load.data_modules.samplers:Of all batches, 0 were full and 21 were partial.



Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]




Sanity Checking DataLoader 0:  50%|█████     | 1/2 [00:00<00:00, 15.72it/s]




                                                                           

INFO:cell_load.data_modules.samplers:Creating perturbation batch sampler with metadata caching (using codes)...
INFO:cell_load.data_modules.samplers:Total # cells 100. Cell set size mean / std before resampling: 4.55 / 12.04.
INFO:cell_load.data_modules.samplers:Creating meta-batches with cell_sentence_len=128...
INFO:cell_load.data_modules.samplers:Of all batches, 0 were full and 22 were partial.
INFO:cell_load.data_modules.samplers:Sampler created with 2 batches in 0.00 seconds.




INFO:cell_load.data_modules.samplers:Of all batches, 0 were full and 22 were partial.



Epoch 0: 100%|██████████| 2/2 [00:01<00:00,  1.28it/s, v_num=0]

INFO: `Trainer.fit` stopped: `max_epochs=1` reached.
INFO:lightning.pytorch.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=1` reached.


Epoch 0: 100%|██████████| 2/2 [00:01<00:00,  1.28it/s, v_num=0]
Training completed, saving final checkpoint...


INFO:cell_load.data_modules.samplers:Creating perturbation batch sampler with metadata caching (using codes)...
INFO:cell_load.data_modules.samplers:Total # cells 100. Cell set size mean / std before resampling: 4.76 / 11.85.
INFO:cell_load.data_modules.samplers:Creating meta-batches with cell_sentence_len=128...
INFO:cell_load.data_modules.samplers:Of all batches, 21 were full and 0 were partial.
INFO:cell_load.data_modules.samplers:Sampler created with 21 batches in 0.00 seconds.
INFO:helical.models.state.state_train:Loading model from sample_run/first_run/final.ckpt
INFO:helical.models.state.state_train:Model loaded successfully.
INFO:helical.models.state.state_train:Generating predictions on test set using manual loop...
Predicting:   0%|          | 0/21 [00:00<?, ?batch/s]INFO:cell_load.data_modules.samplers:Of all batches, 21 were full and 0 were partial.
Predicting: 100%|██████████| 21/21 [00:01<00:00, 17.99batch/s]
INFO:helical.models.state.state_train:Creating anndatas from pr

The trained model is saved to the `sample_run/first_run` directory, alongside the files and checkpoints needed to initialise a new model. We can initialise `StateTransitionModel` as before and run inference.

In [10]:
from helical.models.state import StateTransitionModel
from helical.models.state import StateConfig
import scanpy as sc

adata = sc.read_h5ad(Path(CACHE_DIR_HELICAL, "state/sample_vcc_data/test.h5ad"))

state_config = StateConfig(
    output_path = "sample_run/prediction.h5ad",
    perturb_dir = "sample_run/first_run",
    pert_col = "target_gene",
)

state_transition = StateTransitionModel(configurer=state_config)
processed_data = state_transition.process_data(adata)
embeds = state_transition.get_embeddings(processed_data)

INFO:helical.models.state.state_transition:Using checkpoint: sample_run/first_run/final.ckpt
INFO:helical.models.state.state_transition:Model device: cpu
INFO:helical.models.state.state_transition:Model cell_set_len (max sequence length): 128
INFO:helical.models.state.state_transition:Model uses batch encoder: False
INFO:helical.models.state.state_transition:Model output space: all
INFO:helical.models.state.state_transition:Grouping by cell type column: cell_type
INFO:helical.models.state.state_transition:Using adata.X as input features
INFO:helical.models.state.state_transition:Cells: total=100, control=50, non-control=50
INFO:helical.models.state.state_transition:Running virtual experiment (homogeneous per-perturbation forward passes; controls included)...
Group H1: 100%|██████████| 26/26 [00:00<00:00, 87.77it/s, Pert: non-targeting           ]
INFO:helical.models.state.state_transition:--Complete--
Input cells: 100, Control simulated: 50, Treated simulated: 50
INFO:helical.models.st

Now you can use the `cell-eval` package to create a submission to the Virtual Cell Challenge (generates a `.vcc` file).

In [11]:
gene_file = CACHE_DIR_HELICAL / "state/sample_vcc_data/gene_names.csv"
input_file = "sample_run/prediction.h5ad"

! pip install cell-eval
! cell-eval prep -i {input_file} -g {gene_file}

INFO:cell_eval._cli._prep:Reading input anndata
INFO:cell_eval._cli._prep:Reading gene list
INFO:cell_eval._cli._prep:Preparing anndata
INFO:cell_eval._cli._prep:Using 32-bit float encoding
INFO:cell_eval._cli._prep:Setting data to sparse if not already
INFO:cell_eval._cli._prep:Simplifying obs dataframe
INFO:cell_eval._cli._prep:Simplifying var dataframe
INFO:cell_eval._cli._prep:Creating final minimal AnnData object
INFO:cell_eval._cli._prep:Applying normlog transformation if required
INFO:cell_eval._evaluator:Input is found to be log-normalized already - skipping transformation.
INFO:cell_eval._cli._prep:Writing h5ad output to /tmp/tmpvilc9i_j/pred.h5ad
INFO:cell_eval._cli._prep:Zstd compressing /tmp/tmpvilc9i_j/pred.h5ad
/tmp/tmpvilc9i_j/pred.h5ad : 19.31%   (  7.50 MiB =>   1.45 MiB, /tmp/tmpvilc9i_j/pred.h5ad.zst) 
INFO:cell_eval._cli._prep:Packing files into sample_run/prediction.prep.vcc
INFO:cell_eval._cli._prep:Done


In [None]:
# import scanpy as sc
# import numpy as np

# def create_balanced_mini_dataset(input_path, output_path, n_cells=100):
#     """
#     Create a mini dataset that preserves both control and perturbation cells
#     """
#     adata = sc.read_h5ad(input_path)
    
#     # Find control and perturbation cells
#     control_mask = adata.obs['target_gene'] == 'non-targeting'
#     pert_mask = ~control_mask
    
#     control_indices = np.where(control_mask)[0]
#     pert_indices = np.where(pert_mask)[0]
    
#     print(f"Original: {len(control_indices)} control, {len(pert_indices)} perturbation cells")
    
#     # Sample proportionally
#     n_control = min(n_cells // 2, len(control_indices))
#     n_pert = min(n_cells - n_control, len(pert_indices))
    
#     # If we need more cells, fill with the remaining type
#     if n_control + n_pert < n_cells:
#         if len(control_indices) > n_control:
#             n_control = min(n_cells, len(control_indices))
#             n_pert = 0
#         elif len(pert_indices) > n_pert:
#             n_pert = min(n_cells, len(pert_indices))
#             n_control = 0
    
#     # Sample indices
#     np.random.seed(42)
#     sampled_control = np.random.choice(control_indices, size=n_control, replace=False) if n_control > 0 else np.array([], dtype=int)
#     sampled_pert = np.random.choice(pert_indices, size=n_pert, replace=False) if n_pert > 0 else np.array([], dtype=int)
    
#     # Combine and create new dataset
#     all_sampled = np.concatenate([sampled_control, sampled_pert])
#     adata_mini = adata[all_sampled, :].copy()
    
#     print(f"Mini dataset: {len(sampled_control)} control, {len(sampled_pert)} perturbation cells")
#     print(f"Total: {adata_mini.shape}")
    
#     adata_mini.write_h5ad(output_path)
#     return adata_mini

# # Create a balanced mini dataset
# mini_val = create_balanced_mini_dataset("competition_support_set/competition_val_template.h5ad", "competition_support_set/mini_val_balanced.h5ad", n_cells=100)

# import scanpy as sc
# import anndata as ad

# def truncate_adata_file_complete(input_path, output_path, max_cells=100, max_genes=None):
#     """
#     Truncate an AnnData file and handle ALL fields properly
#     """
#     print(f"Loading {input_path}...")
#     adata = sc.read_h5ad(input_path)
    
#     print(f"Original shape: {adata.shape}")
#     print(f"Original obsm keys: {list(adata.obsm.keys())}")
    
#     # Truncate cells
#     if max_cells and adata.n_obs > max_cells:
#         print(f"Truncating to {max_cells} cells...")
        
#         # Truncate main data
#         adata = adata[:max_cells, :].copy()
        
#         # Manually truncate obsm fields that might not be handled properly
#         for key in adata.obsm.keys():
#             matrix = adata.obsm[key]
#             if hasattr(matrix, 'shape') and len(matrix.shape) > 0:
#                 if matrix.shape[0] > max_cells:
#                     print(f"Truncating obsm['{key}'] from {matrix.shape} to ({max_cells}, {matrix.shape[1] if len(matrix.shape) > 1 else 'N/A'})")
#                     adata.obsm[key] = matrix[:max_cells]
    
#     # Truncate genes (optional)
#     if max_genes and adata.n_vars > max_genes:
#         print(f"Truncating to {max_genes} genes...")
        
#         # Truncate main data
#         adata = adata[:, :max_genes].copy()
        
#         # Manually truncate varm fields
#         for key in adata.varm.keys():
#             matrix = adata.varm[key]
#             if hasattr(matrix, 'shape') and len(matrix.shape) > 0:
#                 if matrix.shape[0] > max_genes:
#                     print(f"Truncating varm['{key}'] from {matrix.shape} to ({max_genes}, {matrix.shape[1] if len(matrix.shape) > 1 else 'N/A'})")
#                     adata.varm[key] = matrix[:max_genes]
    
#     print(f"New shape: {adata.shape}")
#     print(f"New obsm keys: {list(adata.obsm.keys())}")
    
#     # Save truncated file
#     print(f"Saving to {output_path}...")
#     adata.write_h5ad(output_path)
    
#     return adata

# # Create mini versions of rpe1.h5 and hepg2.h5
# rpe1_mini = truncate_adata_file_complete('sample_vcc_data/rpe1.h5', 'sample_vcc_data/rpe1_mini.h5', max_cells=100)
# hepg2_mini = truncate_adata_file_complete('sample_vcc_data/hepg2.h5', 'sample_vcc_data/hepg2_mini.h5', max_cells=100)
