# Step-by-step Guide: Run Cell Maps VNN programmatically in your project

This tutorial shows how to use the cellmaps_vnn package programmatically, so that you can integrate it into your own Python projects or pipelines. Instead of using the command line, you will run each key step of a Visible Neural Network (VNN) workflow (training, prediction with interpretation, and annotation) through direct Python calls.

#### This is especially useful when:
- Automating workflows across many datasets
- Integrating VNN functionality into larger analysis pipelines
- Running in cloud environments or Jupyter notebooks
- Customizing and extending the VNN pipeline

#### Tutorial
This guide walks you through integrating cellmaps_vnn into a Python project using its core programmatic interface via CellmapsvnnRunner. All examples use the demo data provided in the examples/ directory.

### Installation

It is highly recommended to create a conda environment and run Jupyter from it.

`conda create -n vnn_env python=3.11`

`conda activate vnn_env`

To install cellmaps_vnn, run:

`pip install cellmaps_vnn`

Exit the notebook and reopen it in the `vnn_env` environment.

### Setup and Input Data

In [5]:
from cellmaps_vnn.train import VNNTrain
from cellmaps_vnn.predict import VNNPredict
from cellmaps_vnn.annotate import VNNAnnotate
from cellmaps_vnn.runner import CellmapsvnnRunner

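# Input directory for training (directory with hierarchy.cx2)
inputdir = '../examples/'

# Hierarchy file used by the annotation step
# (assumed here to be the hierarchy.cx2 inside inputdir)
hierarchy_file = '../examples/hierarchy.cx2'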

# Training and test data
training_data_path = '../examples/training_data.txt'
test_data = '../examples/test_data.txt'

# Cell feature data
gene2id = '../examples/gene2ind.txt'
cell2id = '../examples/cell2ind.txt'
mutations = '../examples/cell2mutation.txt'
cn_deletions = '../examples/cell2cndeletion.txt'
cn_amplifications = '../examples/cell2cnamplification.txt'

# Output directories
train_outdir = './out_train'
predict_outdir = './out_predict'
annotate_outdir = './out_annotate'

# Optional: specify desired parameters.
# This is only an example; the full list of parameters is available at
# https://cellmaps-vnn.readthedocs.io/en/latest/usage_command_line.html
epoch = 30
lr = 0.0005
zscore_method = 'auc'
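
Before training, it can help to confirm that every input path resolves, since a wrong relative path otherwise surfaces only once a run has started. The following is a minimal sketch using only the standard library. The helper name `find_missing_inputs` is hypothetical and is not part of cellmaps_vnn.

```python
from pathlib import Path


def find_missing_inputs(paths):
    """Return the entries of `paths` (name -> path) that do not exist on disk."""
    return {name: path for name, path in paths.items()
            if not Path(path).exists()}


# Same paths as in the setup cell; adjust them to your own layout.
inputs = {
    'training_data': '../examples/training_data.txt',
    'test_data': '../examples/test_data.txt',
    'gene2id': '../examples/gene2ind.txt',
    'cell2id': '../examples/cell2ind.txt',
    'mutations': '../examples/cell2mutation.txt',
    'cn_deletions': '../examples/cell2cndeletion.txt',
    'cn_amplifications': '../examples/cell2cnamplification.txt',
}

missing = find_missing_inputs(inputs)
if missing:
    print('Missing input files:', missing)
```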

### Training of VNN Model

In this step, we train a Visible Neural Network (VNN) using cell features and drug response data. The output includes the trained model and logs.

What this does:
- Loads cell and gene data (IDs and features).
- Uses training response data (e.g., drug sensitivity).
- Builds and trains a biologically structured neural network guided by a functional hierarchy.
- Outputs a trained model, ready for prediction.

In [6]:
train_cmd = VNNTrain(
    outdir=train_outdir,
    inputdir=inputdir,
    training_data=training_data_path,
    gene2id=gene2id,
    cell2id=cell2id,
    mutations=mutations,
    cn_deletions=cn_deletions,
    cn_amplifications=cn_amplifications,
    epoch=epoch,
    lr=lr,
    zscore_method=zscore_method
)

runner = CellmapsvnnRunner(
    outdir=train_outdir,
    command=train_cmd,
    inputdir=inputdir
)

runner.run()

No hierarchy parent in the input directory. Cannot copy.


0
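
The `0` shown above is the value returned by `runner.run()`. In a script or pipeline it may be safer to check that value explicitly than to rely on notebook output. The following is a minimal sketch, assuming that `run()` returns an integer status with 0 meaning success, as the output above suggests. `run_step` and `DummyRunner` are hypothetical names; the latter only stands in for a real `CellmapsvnnRunner` so that the example runs on its own.

```python
def run_step(runner, step_name):
    """Run one pipeline step and raise if it reports a non-zero status."""
    status = runner.run()
    if status != 0:
        raise RuntimeError(f'{step_name} failed with status {status}')
    return status


class DummyRunner:
    """Stand-in for CellmapsvnnRunner; returns a fixed status from run()."""

    def __init__(self, status):
        self._status = status

    def run(self):
        return self._status


run_step(DummyRunner(0), 'train')  # a zero status passes silently
```

With the real objects from this tutorial, the call would be `run_step(runner, 'train')`.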

### Predict with Trained Model

Now that we have a trained model, we can apply it to new test data (e.g., unseen cell lines) to generate predicted outcomes. This step uses the trained VNN model to make predictions on test samples and automatically performs interpretation using biological knowledge.

What this does:
- Uses the trained model to make predictions on test samples.
- Inputs the same molecular features (genes, mutations, CNVs).
- Outputs a file with predicted values for each test sample and feature of interest (e.g., drug response).
- Computes interpretation scores:
    - RLIPP (Relative Local Improvement in Predictive Power) scores quantify how much each biological term (e.g., pathway, process) contributes to the model's predictions.
    - Gene importance scores estimate which genes most influence the prediction. (**NOT IMPLEMENTED**: for now, random scores are generated.)

In [7]:
predict_cmd = VNNPredict(
    outdir=predict_outdir,
    inputdir=train_outdir,  # use model from training output
    predict_data=test_data,
    gene2id=gene2id,
    cell2id=cell2id,
    mutations=mutations,
    cn_deletions=cn_deletions,
    cn_amplifications=cn_amplifications,
    zscore_method=zscore_method
)

runner = CellmapsvnnRunner(
    outdir=predict_outdir,
    command=predict_cmd,
    inputdir=train_outdir
)

runner.run()

Starting prediction process
Starting score calculation

Prediction and interpretation executed successfully

FAIRSCAPE hidden files registration:   0%| | 5/1698 [00:00<04:50,  5.82it/s]
FAIRSCAPE cannot handle too many files, skipping rest


0
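
To see what the prediction step wrote, the output directory can be listed directly. The following is a minimal sketch using only the standard library. `list_outputs` is a hypothetical helper, no particular output file names are assumed, and hidden files are skipped because the log above indicates there can be many of them.

```python
from pathlib import Path


def list_outputs(outdir):
    """Return (relative path, size in bytes) for each non-hidden file under `outdir`."""
    root = Path(outdir)
    if not root.is_dir():
        return []
    found = []
    for path in root.rglob('*'):
        relative = path.relative_to(root)
        # Skip hidden files and anything inside a hidden directory.
        if any(part.startswith('.') for part in relative.parts):
            continue
        if path.is_file():
            found.append((relative.as_posix(), path.stat().st_size))
    return sorted(found)


for name, size in list_outputs('./out_predict'):
    print(f'{size:>12}  {name}')
```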

### Annotate Hierarchy with Importance Scores

In the final step, we enrich the biological hierarchy used during model training with importance scores derived from the predictions.

What this does:
- Reads model interpretation results (e.g., RLIPP scores) from the prediction output.
- Annotates a biological hierarchy (e.g., pathway network) by assigning term-level importance scores such as P_rho, C_rho, and RLIPP to each system or pathway node.
- Optionally filters annotations by disease; if no disease is specified, scores are aggregated across diseases.
- Adds edge-level scores to visualize importance propagation along the hierarchy.
- Outputs a styled, annotated CX2 network suitable for NDEx visualization and biological interpretation.
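
To make the relationship between these three scores concrete: in the DCell and DrugCell literature, P_rho is the predictive power of a term's own neurons, C_rho is that of its children's neurons, and RLIPP expresses the parent's gain relative to its children. The sketch below assumes the form (P_rho - C_rho) / C_rho. That formula is an assumption for illustration, the computation inside cellmaps_vnn may differ, and `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(p_rho, c_rho):
    """Relative gain of the parent score over the children score.

    ASSUMPTION: RLIPP = (P_rho - C_rho) / C_rho. This is an illustrative
    formula and is not taken from the cellmaps_vnn source code.
    """
    if c_rho == 0:
        return float('nan')  # undefined when the children score is zero
    return (p_rho - c_rho) / c_rho


# Made-up numbers, for illustration only.
print(relative_improvement(0.6, 0.4))  # positive: parent more predictive than children
print(relative_improvement(0.3, 0.4))  # negative: parent less predictive than children
```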

In [8]:
annotate_cmd = VNNAnnotate(
    outdir=annotate_outdir,
    model_predictions=[predict_outdir],
    hierarchy=hierarchy_file
)

runner = CellmapsvnnRunner(
    outdir=annotate_outdir,
    command=annotate_cmd,
    inputdir=predict_outdir
)

runner.run()

0
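
The annotated hierarchy is written in CX2, which is a JSON format, so it can be inspected without extra dependencies. The following is a minimal sketch that ranks nodes by a numeric attribute. It assumes the file is a JSON list of aspect objects, that the `nodes` aspect stores attributes under `v`, that no attribute aliases are in use, and that the score is stored under the key `RLIPP`. The function name and the attribute key are assumptions to adjust after checking your own output.

```python
import json


def top_nodes(cx2_path, attribute='RLIPP', n=5):
    """Return up to `n` (node name, value) pairs with the highest `attribute`."""
    with open(cx2_path) as handle:
        aspects = json.load(handle)

    scored = []
    for aspect in aspects:
        for node in aspect.get('nodes', []):
            values = node.get('v', {})
            try:
                score = float(values[attribute])
            except (KeyError, TypeError, ValueError):
                continue  # attribute missing or not numeric
            # Fall back to the node id when the node has no name attribute.
            scored.append((values.get('name', node.get('id')), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]
```

Call it with the path of the `.cx2` file found in `annotate_outdir`.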

#### Optional: Upload to NDEx
If NDEx credentials are provided, the annotated hierarchy and associated interactomes (gene subnetworks for key systems) can be uploaded for public sharing and visualization in Cytoscape Web.

In [13]:
annotate_vis_outdir = 'out_annotate_vis'

annotate_cmd = VNNAnnotate(
    outdir=annotate_vis_outdir,
    model_predictions=[predict_outdir],
    hierarchy=hierarchy_file,
    parent_network='0b7b8aee-332f-11ef-9621-005056ae23aa',  # UUID of interactome
    ndexserver='ndexbio.org',
    ndexuser='USER',               # replace with your NDEx username
    ndexpassword='PASSWORD',       # replace with your NDEx password
    visibility=True                # Make uploaded networks public (optional)
)

runner = CellmapsvnnRunner(
    outdir=annotate_vis_outdir,
    command=annotate_cmd,
    inputdir=predict_outdir
)

runner.run()
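
Hardcoding a password in a notebook makes it easy to leak through version control or shared output. The following is a minimal sketch that reads the credentials from environment variables instead. The variable names `NDEX_USER` and `NDEX_PASSWORD` and the helper `ndex_credentials` are hypothetical choices, not something cellmaps_vnn defines.

```python
import os


def ndex_credentials(environ=os.environ):
    """Return (user, password) from the environment, or raise if either is unset."""
    user = environ.get('NDEX_USER')
    password = environ.get('NDEX_PASSWORD')
    if not user or not password:
        raise RuntimeError('Set NDEX_USER and NDEX_PASSWORD before uploading')
    return user, password
```

The returned values can then be passed as `ndexuser` and `ndexpassword` to `VNNAnnotate` in place of the literal strings.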