# **scNET: Learning Context-Specific Gene and Cell Embeddings by Integrating Single-Cell Gene Expression Data with Protein-Protein Interaction Information**

## **Creating the workspace**

**Mount your drive**

In [4]:
from google.colab import drive
drive.mount("/content/gdrive")

PATH = "/content/gdrive/MyDrive"
%cd $PATH

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
/content/gdrive/MyDrive


**Clone GitHub repository**


In [8]:
#!git clone https://github.com/madilabcode/scNET
%cd "./scNET"

/content/gdrive/MyDrive/scNET


**Download example data - as an h5ad object**

In [None]:
import gdown
download_url = 'https://drive.google.com/uc?id=1YxQL1_BcbeAqVS7KhqzFzIJIyZjtLCcG'
output_path = './Data/example.h5ad'
gdown.download(download_url, output_path, quiet=False)

Downloading...
From (original): https://drive.google.com/uc?id=1YxQL1_BcbeAqVS7KhqzFzIJIyZjtLCcG
From (redirected): https://drive.google.com/uc?id=1YxQL1_BcbeAqVS7KhqzFzIJIyZjtLCcG&confirm=t&uuid=19763705-368a-437c-b0fe-f41dc2af6796
To: /content/gdrive/MyDrive/scNET/Data/example.h5ad
100%|██████████| 967M/967M [00:09<00:00, 98.0MB/s]


'./Data/example.h5ad'

**Install packages**

In [2]:
# Run this in a Google Colab cell to install PyTorch Geometric and the other dependencies.
import torch

def format_pytorch_version(version):
  return version.split('+')[0]

TORCH_version = torch.__version__
TORCH = format_pytorch_version(TORCH_version)

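def format_cuda_version(version):
  # torch.version.cuda is None on CPU-only builds, which use the 'cpu' wheel tag.
  return 'cpu' if version is None else 'cu' + version.replace('.', '')

CUDA_version = torch.version.cuda
CUDA = format_cuda_version(CUDA_version)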

#!pip install torch-scatter     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
#!pip install torch-sparse      -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
#!pip install torch-cluster     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
#!pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-geometric
!pip install scanpy
!pip install mygene
!pip install gseapy
!pip install matplotlib datashader bokeh holoviews scikit-image colorcet igraph leidenalg

Collecting torch-geometric
  Downloading torch_geometric-2.5.3-py3-none-any.whl (1.1 MB)
Installing collected packages: torch-geometric
Successfully installed torch-geometric-2.5.3
Collecting scanpy
  Downloading scanpy-1.10.1-py3-none-any.whl (2.1 MB)
Collecting anndata>=0.8 (from scanpy)
  Downloading anndata-0.10.7-py3-none-any.whl (122 kB)
Collecting legacy-api-wrap>=1.4 (from

**Imports**

In [None]:
import os
import torch
from main import main
import warnings
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.manual_seed(42)

<torch._C.Generator at 0x7ea69872e870>

## **Train the model - the model, embeddings, and KNN will be saved**


### **The following parameters need to be provided to the main function in order to start training the model:**

**path:** The path to the h5ad file.

**human_flag:** Whether the data is from human (True) or mouse (False).

**pre_processing_flag:** Whether a basic pre-processing pipeline should be applied to the object.

**number_of_batches:** The number of batches used for training.

**split_cells:** Whether to split the cells into subsets. Set this to True when the number of cells is too large to fit in GPU memory.

**model_name:** A unique name for the project, which will be used in the outputs of the model.
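
As a rough illustration, the arguments above can be collected into one dictionary and sanity-checked before training starts. `build_training_kwargs` is a hypothetical helper, not part of scNET, and its default values are assumptions; only the keyword names come from the parameter list above.

```python
from pathlib import Path


def build_training_kwargs(path, model_name, human_flag=True,
                          pre_processing_flag=False,
                          number_of_batches=10, split_cells=False):
    """Hypothetical helper: validate and bundle the arguments for main()."""
    if Path(path).suffix != ".h5ad":
        raise ValueError(f"expected an .h5ad file, got {path!r}")
    if number_of_batches < 1:
        raise ValueError("number_of_batches must be a positive integer")
    if not model_name:
        raise ValueError("model_name must be a non-empty string")
    return {
        "path": str(path),
        "human_flag": human_flag,
        "pre_processing_flag": pre_processing_flag,
        "number_of_batches": number_of_batches,
        "split_cells": split_cells,
        "model_name": model_name,
    }


kwargs = build_training_kwargs("./Data/example.h5ad",
                               model_name="_example",
                               split_cells=True)
print(kwargs)
# main(**kwargs)  # uncomment once scNET's main has been imported
```

Checking the arguments up front means a mistyped file name fails immediately rather than after the data has been loaded.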

### **The function will save the following:**

**cell_embedding:** A new dense representation of the cells under the Embedding folder.

**gene_embedding:** A new dense representation of the genes under the Embedding folder.

**trained_model:** The complete trained model under the Models folder.

**new_knn:** The new, pruned KNN network. It is saved only if the split_cells flag is set to False.

In [None]:
main(
    path=r"./Data/example.h5ad",
    pre_processing_flag=False,
    human_flag=True,
    number_of_batches=10,
    split_cells=True,
    max_epoch=50,
    model_name="_example",
)

**Example: using the gene embeddings to build a co-embedded network and predict KEGG pathway membership**

**Load all the relevant embeddings**

In [None]:
import coEmbeddedNetwork as cen
gene_embedding, cell_embedding, node_feature = cen.load_embeddings("_example")

**Plot a UMAP of the gene embeddings**

In [None]:
cen.plot_gene_umap_clustring(gene_embedding)

**Create the co-embedded network and check whether it is modular**
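
Modularity measures how much more densely nodes connect within communities than would be expected by chance. The sketch below computes Newman's modularity for a toy graph in plain Python. It only illustrates the quantity; whether `build_co_embeded_network` computes `mod` in exactly this way is an assumption, and the toy edges and community labels are made up.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs.
    community: dict mapping each node to its community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ],
    where L_c is the number of edges inside c, d_c the total degree
    of nodes in c, and m the number of edges in the graph.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(toy_edges, toy_communities)
print(round(q, 3))  # 0.357
```

Values well above zero indicate community structure; putting every node in a single community gives a modularity of zero.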

In [None]:
graph, mod = cen.build_co_embeded_network(gene_embedding, node_feature)
mod

**Now we can use the network to predict KEGG pathway membership with cross-validation**
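
In k-fold cross-validation, the labeled genes are split into k folds, and each fold is held out once while a classifier is fit on the remaining folds. The sketch below shows that loop with a nearest-centroid classifier on made-up 2-D embeddings. The classifier, the fold count, and the toy data are assumptions for illustration and do not describe what `predict_kegg` does internally.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Shuffle indices 0..n_items-1 and deal them into k disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in set(train_y):
        c = centroid([p for p, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        hits = sum(nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
                   for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / len(accuracies)


# Toy data: two well-separated groups standing in for
# pathway members (1) and non-members (0).
rng = random.Random(1)
xs = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
xs += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(20)]
ys = [0] * 20 + [1] * 20
accuracy = cross_validate(xs, ys, k=5)
print(accuracy)
```

Because every gene is scored only by a model that never saw its label, the averaged accuracy estimates how well the embeddings generalize to unseen genes.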

In [None]:
cen.predict_kegg(gene_embedding,node_feature)