# Cell-cell communication with scSeqComm
This notebook outlines the steps of inference, analysis and visualization of cell-cell communication for single-cell RNA sequencing data using **scSeqComm**.

For comprehensive instructions and detailed descriptions of the methods, please refer to the official [DOCUMENTATION](https://sysbiobig.gitlab.io/scSeqComm/articles/scseqcomm.html). 
The source code is available on GitLab: [sysbiobig/scseqcomm](https://gitlab.com/sysbiobig/scseqcomm).

## About scSeqComm
scSeqComm is a computational method developed to identify and quantify both intercellular and intracellular signaling events from single-cell RNA-sequencing (scRNA-seq) data. In addition to detecting communication between cell types, it provides a functional characterization of the inferred signaling, linking molecular interactions to known biological processes. A key contribution of scSeqComm is its ability to assess the downstream impact of intercellular communication within the receiving cells.
The central idea behind scSeqComm is to:
- Strengthen the evidence supporting inferred intercellular signaling events, and
- Enhance biological interpretability by functionally characterizing the intracellular response in target cells.

## Load the required libraries

In [1]:
library(scSeqComm)
library(Seurat)



Loading required package: SeuratObject

Loading required package: sp


Attaching package: 'SeuratObject'


The following objects are masked from 'package:base':

    intersect, t




## Data input 
scSeqComm requires five mandatory input files:
- **counts file**: a normalized gene expression matrix.
- **metadata file**: a table linking each cell barcode to its corresponding cell type or cluster label.
- **ligand-receptor pairs**: a set of putative ligand–receptor pairs used for inference.
- **transcriptional regulatory network**: a set of transcription factors (TFs) and their target genes (TGs), to be used in intracellular signaling computation.
- **receptor-transcription factor a-priori association**: a set of receptor-transcription factor (R-TF) a-priori associations to be used in intracellular signaling computation.

### Gene expression data matrix
The gene expression data used for cell–cell communication analysis must be provided as a matrix with genes in rows and cells in columns. Row names must correspond to gene names and column names to unique cell identifiers. Before analysis, the data must be normalized (e.g., library-size normalization followed by log-transformation with a pseudocount of 1).
The normalized gene expression matrix can be provided as a matrix object (not recommended for large datasets), a dgCMatrix (recommended), or a big.matrix object.
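
As a minimal sketch of the normalization described above, the following uses a made-up toy count matrix and base R only. The scale factor of 10,000 and all object names (`counts`, `toy_gene_expr`) are illustrative assumptions, not scSeqComm requirements; in this notebook the normalized matrix is instead taken from the Seurat object.

```r
# Toy raw count matrix: genes in rows, cells in columns (values are made up)
counts <- matrix(c(0, 3, 1,
                   2, 0, 4,
                   5, 1, 0,
                   1, 0, 5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                                 c("cell_1", "cell_2", "cell_3")))

# Library-size normalization: scale each cell (column) to a common total.
# The total of 10,000 is an arbitrary, commonly used choice (assumption).
scale_factor <- 1e4
lib_size <- colSums(counts)
norm_counts <- sweep(counts, 2, lib_size, "/") * scale_factor

# Log-transform with a pseudocount of 1: log(x + 1)
toy_gene_expr <- log1p(norm_counts)

# Optionally store the result as a sparse matrix when the Matrix package is installed
if (requireNamespace("Matrix", quietly = TRUE)) {
  toy_gene_expr_sparse <- Matrix::Matrix(toy_gene_expr, sparse = TRUE)
}
```

Row and column names are preserved by these steps, so the result keeps gene names as row names and cell identifiers as column names.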



In [None]:
seurat_object <- readRDS("/Tutorial_ISMBECCB2025/data_vt3/first/2306scRNAseq_HMI_Ischemic.rds")

gene_expr <- seurat_object[["RNA"]]@data # normalized data matrix

### Cell metadata
The cell group information must be provided as a data.frame object. It must contain at least two columns, namely “Cell_ID” and “Cluster_ID”, holding the cell identifiers and the clustering/cell type assignment respectively.

In [None]:
cell_metadata <- data.frame(Cell_ID = row.names(seurat_object@meta.data), 
                            Cluster_ID = seurat_object@meta.data$cell_type)
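
Before running the analysis, it can help to check that the metadata and the expression matrix agree. The sketch below does this on made-up toy objects (`toy_expr`, `toy_metadata`) in base R; the specific checks are a suggestion, not something scSeqComm is documented here to require beyond the two column names.

```r
# Toy inputs standing in for gene_expr and cell_metadata (values are made up)
toy_expr <- matrix(0, nrow = 2, ncol = 3,
                   dimnames = list(c("GeneA", "GeneB"),
                                   c("cell_1", "cell_2", "cell_3")))
toy_metadata <- data.frame(Cell_ID = c("cell_1", "cell_2", "cell_3"),
                           Cluster_ID = c("CM", "CM", "Fib"),
                           stringsAsFactors = FALSE)

# The two required columns are present
stopifnot(all(c("Cell_ID", "Cluster_ID") %in% colnames(toy_metadata)))

# Cell identifiers are unique and match the matrix column names
stopifnot(!anyDuplicated(toy_metadata$Cell_ID))
stopifnot(setequal(toy_metadata$Cell_ID, colnames(toy_expr)))

# Number of cells per cluster, useful when choosing a minimum cluster size
cluster_sizes <- table(toy_metadata$Cluster_ID)
```

The same checks can be run on the real `gene_expr` and `cell_metadata` objects by substituting them for the toy ones.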

### Ligand-receptor pairs
Users must provide as input a set of ligand–receptor pairs to be used in intercellular signaling computation. For user convenience, scSeqComm includes 27 ligand–receptor pair databases derived from literature data of human (16 databases) and mouse (11 databases) species. Users can also specify their own ligand–receptor pairs in the form of an R data.frame. The data frame must contain two columns named “ligand” and “receptor”, with one ligand–receptor pair per row.

In [None]:
# print available resources for human
available_LR_pairs(species = "human")

# let's use ConnectomeDB 2020 database
LRdb <- LR_pairs_ConnectomeDB_2020
head(LRdb)
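
A custom ligand–receptor table, as described above, could look like the following sketch. The three pairs are simply those mentioned elsewhere in this notebook and serve only as an illustration; `custom_LR` is a hypothetical object name.

```r
# Custom ligand-receptor pairs: one pair per row, columns named "ligand" and "receptor"
custom_LR <- data.frame(ligand   = c("TGFB1", "TGFB1", "SPP1"),
                        receptor = c("TGFBR1", "TGFBR2", "ITGA4"),
                        stringsAsFactors = FALSE)

# Drop accidental duplicate pairs, if any
custom_LR <- unique(custom_LR)
```

Such a data frame would then take the place of `LRdb` in the analysis.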

### Transcriptional regulatory network
To enable downstream analysis of intracellular signaling responses, users must provide a transcriptional regulatory network as input. This network consists of transcription factors (TFs) and their corresponding target genes (TGs).
For convenience, scSeqComm includes several curated TF–TG databases derived from literature sources. 
Alternatively, users can provide their own custom TF–TG network as an R named list, where each list element represents a transcription factor and contains a vector of its associated target genes.

In [None]:
# print available resources for human
available_TF_TG(species = "human")

# let's use Dorothea database
TFTGdb <- TF_TG_Dorothea

# and print the first few entries
print(head(TFTGdb,2))
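
The named-list format for a custom TF–TG network can be sketched as follows. All gene and TF names are placeholders (`TF_A`, `Gene1`, ...), and the two-column table `tf_tg_table` is a hypothetical starting point shown only to illustrate one way of building the list.

```r
# Custom TF-TG network written directly as a named list:
# each element is named after a TF and holds its target genes
custom_TF_TG <- list(
  TF_A = c("Gene1", "Gene2", "Gene3"),
  TF_B = c("Gene2", "Gene4")
)

# The same structure built from a two-column table (hypothetical layout)
tf_tg_table <- data.frame(tf     = c("TF_A", "TF_A", "TF_A", "TF_B", "TF_B"),
                          target = c("Gene1", "Gene2", "Gene3", "Gene2", "Gene4"),
                          stringsAsFactors = FALSE)
custom_TF_TG_from_table <- split(tf_tg_table$target, tf_tg_table$tf)
```

Either object would then take the place of `TFTGdb`.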

### Receptor-Transcription factor a-priori association from gene signaling networks
For intracellular signaling inference, users must provide a data frame describing receptor–transcription factor (R–TF) a priori associations.
scSeqComm includes four precomputed R–TF association datasets for human and mouse, derived from KEGG and Reactome signaling pathways. These associations are computed using the Personalized PageRank (PPR) algorithm, where receptors serve as seed nodes to rank transcription factors based on pathway connectivity.

Users can also supply a custom R–TF association as an R data.frame with four required columns:
- receptor: receptor gene name
- pathway: pathway name
- tf: transcription factor name
- tf_PPR: association score (e.g., PPR score)

In [None]:
# print available resources for human
available_TF_PPR(species = "human")

#let's use the pre-computed R-TF association from KEGG database
RTF_association <- TF_PPR_KEGG_human
head(RTF_association)
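
A custom R–TF association table with the four required columns could be laid out as in this sketch. Receptor, pathway and TF names are placeholders and the scores are made up; `custom_RTF` is a hypothetical object name.

```r
# Custom receptor-TF association: one row per receptor, pathway and TF
custom_RTF <- data.frame(receptor = c("Receptor_1", "Receptor_1", "Receptor_2"),
                         pathway  = c("Pathway_X", "Pathway_X", "Pathway_Y"),
                         tf       = c("TF_A", "TF_B", "TF_A"),
                         tf_PPR   = c(0.12, 0.03, 0.08),   # association scores (made up)
                         stringsAsFactors = FALSE)

# Basic format checks: required columns present, score column numeric
required_cols <- c("receptor", "pathway", "tf", "tf_PPR")
stopifnot(all(required_cols %in% colnames(custom_RTF)))
stopifnot(is.numeric(custom_RTF$tf_PPR))
```

Such a data frame would then take the place of `RTF_association`.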

# Inference of cell-cell communication
scSeqComm infers and quantifies both intercellular and intracellular signaling using a combination of statistical methods and network science techniques. 

Intercellular signaling key concepts:
- **Ligand score and receptor score**: measure how high the observed average expression level of the ligand (receptor) is compared with the average expression levels observable by chance for random genes in the same cluster.
- **Intercellular score**: defined as the minimum between the ligand score in the source cluster and the receptor score in the receiver cluster.

Intracellular analysis key concepts:
- **Graph topology**: personalized PageRank scores are computed for each TF node using the receptor node as the seed. For each receptor and pathway, scSeqComm uses these scores as a measure of how strongly the given receptor R is associated with a given TF in pathway p.
- **Transcriptomic evidence of intracellular signaling**: statistical tests score the activity of each transcription factor based on its differentially expressed (DE) target genes.
- **Intracellular score**: for each receptor-pathway combination within a receiving cell cluster, the final score is calculated as a weighted mean of the receptor–TF network score (PageRank) and the transcriptomic evidence score.
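
The intercellular score defined above (the minimum of the ligand score in the source cluster and the receptor score in the receiver cluster) can be illustrated with a small base-R sketch. The ligand and receptor scores below are made-up numbers; how scSeqComm actually derives them from the expression data is not reproduced here.

```r
# Made-up ligand and receptor scores for one ligand-receptor pair, per cluster
ligand_score   <- c(CM = 0.9, Fib = 0.4)   # ligand score in each candidate source cluster
receptor_score <- c(CM = 0.2, Fib = 0.7)   # receptor score in each candidate receiver cluster

# All source-receiver cluster combinations
cluster_pairs <- expand.grid(cluster_L = names(ligand_score),
                             cluster_R = names(receptor_score),
                             stringsAsFactors = FALSE)

# Intercellular score: element-wise minimum of the two scores
cluster_pairs$S_inter <- unname(pmin(ligand_score[cluster_pairs$cluster_L],
                                     receptor_score[cluster_pairs$cluster_R]))
cluster_pairs
```

Taking the minimum means a pair scores highly only when both the ligand and the receptor show strong evidence of expression in their respective clusters.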


The analysis takes a few minutes to run.

In [None]:
scSeqComm_res <- scSeqComm_analyze(gene_expr = gene_expr,                   # normalized gene expression matrix
                                  cell_metadata = cell_metadata,            # data.frame containing metadata information for each cell

                                  inter_signaling = TRUE,                   # whether to compute intercellular signaling
                                  LR_pairs_DB = LRdb,                       # ligand-receptor pairs database in the form of a data.frame
                                  inter_scores = "scSeqComm",               # intercellular signaling scoring schemes to be computed (default "scSeqComm")
                                  min_cells = 30,                           # minimum number of cells a cluster (i.e. cell type) must contain for the scSeqComm score to be computed

                                  intra_signaling = TRUE,                   # whether to compute intracellular signaling
                                  TF_reg_DB = TFTGdb,                       # transcriptional regulatory network database in the form of a named list
                                  R_TF_association = RTF_association,       # receptor-transcription factor a-priori association from gene signaling networks, in the form of a data.frame
                                  count_thr = 1,                            # threshold used to limit the genes tested for DE: genes with an expression level > threshold in at least 25% of cells in one of the two groups of cells are tested

                                  N_cores = 2,                              # number of cores to be used during parallel computation
                                  backend = "doParallel")                   # parallel backend package to be used during parallel computation, either "doParallel" or "doMC" (default: "doParallel")

## Description of the output
The output of scSeqComm_analyze() is a list of three elements:
- **comm_results**: a dataframe containing detailed information on both intercellular and intracellular signaling for each ligand–receptor pair across all pairs of cell clusters.
- **LR_pairs_DB_scrnaseq**: a subset of input ligand-receptor pairs database LR_pairs_DB that includes only ligands and receptors that are present in the input scRNA-seq data
- **TF_reg_DB_scrnaseq**: a subset of input transcriptional regulatory networks database TF_reg_DB that includes only target genes that are present in the input scRNA-seq data


In [None]:
names(scSeqComm_res)

The main output is the dataframe (comm_results) quantifying the ongoing cellular communication in terms of:
- Evidence of intercellular signaling between source and target clusters.
- Evidence of intracellular signaling within the receiving cell cluster.

**Key columns related to intercellular communication**
The comm_results dataframe includes several key columns that describe intercellular signaling between cell clusters. These include: 
- *ligand*: Name of the ligand gene.
- *cluster_L*: Name of the source (ligand-expressing) cell cluster.
- *receptor*: Name of the receptor gene.
- *cluster_R*: Name of the receiver (receptor-expressing) cell cluster.
- *L_score_S_lr*: Ligand score — measures the expression level of the ligand in the source cluster, relative to random gene expression.
- *R_score_S_lr*: Receptor score — measures the expression level of the receptor in the receiver cluster, relative to random gene expression.
- *S_inter*: Intercellular signaling score — a value between 0 and 1 representing the strength of inferred intercellular communication between the source and receiver cell clusters


**Key columns related to intracellular communication**
The comm_results dataframe also includes columns describing the downstream intracellular signaling events triggered in the receiver cell cluster. These columns include:
- *receptor*: Name of the receptor gene involved in the signaling.
- *cluster_R*: Name of the receiver (receptor-expressing) cell cluster.
- *pathway*: Name of the signaling pathway (e.g., from KEGG or Reactome) linking the receptor to downstream transcriptional responses.
- *S_intra*: Intracellular signaling score — a value between 0 and 1 indicating the strength of inferred intracellular signaling.
- *genes*: DE target genes downstream of the receptor through the specified pathway.
- *up_genes* / *down_genes*: Lists of up-regulated and down-regulated target genes (from differential expression analysis) connected to the receptor via the pathway.

> A cellular communication can trigger different cellular responses in the receiving cell through different pathways; thus, a given ligand-receptor pair and cell cluster pair can be associated with multiple S_intra scores. Receptors with no known downstream signaling genes will have NA values in the columns related to intracellular evidence.

In [None]:
head(scSeqComm_res$comm_results)

## Interpreting and visualizing the outputs
The results of scSeqComm_analyze() are multidimensional, involving multiple ligand–receptor pairs, numerous cell clusters, and several signaling pathways. Therefore, scSeqComm provides users with a set of functions to select and summarize results of interest.

### Prioritizing Key Interactions
Users often want to prioritize their analyses, focusing on the strongest signals and, where relevant, on specific cellular communications of interest. The function scSeqComm_select() selects a subset of the inferred intercellular and intracellular signaling based on selection criteria that are applied to the corresponding columns of the input data.frame (e.g. comm_results).
This function enables flexible subsetting of results based on custom selection criteria, such as: 
- Minimum S_inter or S_intra score thresholds
- Specific ligand or receptor genes
- Specific sender and receiver cell types
- Specific pathway names

In [None]:
## select cellular communication 
head(scSeqComm_select(scSeqComm_res$comm_results,                   # Data.frame comm_results obtained as output of scSeqComm_analyze() function.
                                  S_inter = 0.5,                    # A numeric value of S_inter score used as threshold
                                  S_intra = 0.5,                    # A numeric value of S_intra score used as threshold
                                  ligand = c('TGFB1'),              # A string or an array of strings representing ligand names to select.
                                  receptor = c("TGFBR2", "TGFBR3"), # A string or an array of strings representing receptor names to select.
                                  cluster_L = "CM",                 # A string or an array of strings representing "sender" (i.e. ligand expressing) cell cluster names to select.
                                  cluster_R = "Fib"))              # A string or an array of strings representing "receiver" (i.e. receptor expressing) cell cluster names to select.


In addition to filtering cellular communication results, scSeqComm allows users to summarize intracellular signaling evidence to facilitate interpretation and visualization. A given ligand–receptor pair between two cell clusters may be associated with multiple instances of intracellular signaling, each corresponding to a different KEGG or Reactome pathway that includes the receptor. As a result, several S_intra scores can be linked to the same receptor within a specific receiving cluster.

The function scSeqComm_summaryze_S_intra() helps reduce this complexity by summarizing the intracellular signaling results. It does so by selecting the highest S_intra score among all the values associated with each receptor in a receiving cluster. This summarization provides a streamlined view of the results, where each ligand–receptor pair and cluster pair is represented by a single row corresponding to the strongest intracellular signaling evidence.

This approach enables users to interpret results at the ligand–receptor level and simplifies downstream analysis and visualization.

In [None]:
#summarise S_intra score for each ligand-receptor pair and cell cluster couple as max S_intra values
inter_max_intra_scores <- scSeqComm_summaryze_S_intra(scSeqComm_res$comm_results)
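
To make the max-based summarization concrete, the sketch below applies the same idea to a made-up toy table in base R. It is an illustration of the concept only, not the package's implementation; pathway names and scores are placeholders, and the toy table contains no NA values (in this sketch a group whose S_intra values are all NA would be dropped).

```r
# Toy results: one ligand-receptor pair with three pathways, another with one
toy_results <- data.frame(
  ligand    = c("TGFB1", "TGFB1", "TGFB1", "SPP1"),
  receptor  = c("TGFBR2", "TGFBR2", "TGFBR2", "ITGA4"),
  cluster_L = c("CM", "CM", "CM", "CM"),
  cluster_R = c("Fib", "Fib", "Fib", "Fib"),
  pathway   = c("Pathway_X", "Pathway_Y", "Pathway_Z", "Pathway_X"),
  S_inter   = c(0.9, 0.9, 0.9, 0.6),
  S_intra   = c(0.3, 0.8, 0.5, 0.7),
  stringsAsFactors = FALSE
)

# Group rows by ligand-receptor pair and cluster pair
key <- with(toy_results, paste(ligand, receptor, cluster_L, cluster_R, sep = "|"))

# Within each group, keep the index of the row with the highest S_intra
best_rows <- unlist(lapply(split(seq_len(nrow(toy_results)), key),
                           function(idx) idx[which.max(toy_results$S_intra[idx])]))

# One row per ligand-receptor pair and cluster pair, in the original row order
toy_summary <- toy_results[sort(best_rows), ]
toy_summary
```

Here the three TGFB1 - TGFBR2 rows collapse into the single row with S_intra = 0.8, while the SPP1 - ITGA4 row is kept as is.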


### Heatmap and Chord Diagram: Visualizing Ligand–Receptor Interaction Counts

To help users explore the overall intensity of cell–cell communication, scSeqComm provides two visualization functions: scSeqComm_chorddiag_cardinality() and scSeqComm_heatmap_cardinality(). Both functions display the cardinality (i.e., count) of ligand–receptor (LR) interactions between pairs of cell clusters.

While they represent the same underlying data, each function uses a distinct visualization technique:
- scSeqComm_chorddiag_cardinality() generates a bipartite interactive chord diagram. This diagram illustrates signaling as arcs connecting sender and receiver clusters, where the arc width reflects the number of LR interactions. The outer arcs represent the total number of incoming and outgoing signals per cluster. This visualization is displayed in the RStudio Viewer panel.
- scSeqComm_heatmap_cardinality() produces a heatmap, where each cell represents the number of LR interactions from a source cluster (row) to a target cluster (column), providing a compact and quantitative overview.

These visualizations provide intuitive insights into the structure of intercellular communication networks, highlighting the most active sending and receiving cell types within the system.

Since the visualization operates at the ligand–receptor (LR) pair level, it is recommended to use the summarized version of intracellular signaling. 

In [None]:
## select the strongest communication
selected_comm <- scSeqComm_select(inter_max_intra_scores,           # Summarized results obtained as output of scSeqComm_summaryze_S_intra() function.
                                  S_inter = 0.8,                    # A numeric value of S_inter score used as threshold
                                  S_intra = 0.8)                    # A numeric value of S_intra score used as threshold


scSeqComm_heatmap_cardinality(data = selected_comm,                                                                 # A dataframe containing results of signaling analysis.
                              title = "Ongoing cellular communication (inter- and intra-cellular evidence)")        # A character string to be used as title of plot

In [None]:
scSeqComm_chorddiag_cardinality(data = selected_comm)
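
The cardinality shown by both plots is a count of LR pairs per source and target cluster. The sketch below computes such counts on a made-up toy selection with base R; it illustrates the quantity being plotted and is not the package's plotting code. The table `toy_selected` and its rows are assumptions for illustration.

```r
# Toy selection of communications (one row per LR pair and cluster pair)
toy_selected <- data.frame(
  ligand    = c("TGFB1", "SPP1", "COL1A1", "TGFB1"),
  receptor  = c("TGFBR2", "ITGA4", "ITGA2", "TGFBR1"),
  cluster_L = c("CM", "CM", "Fib", "Fib"),
  cluster_R = c("Fib", "Fib", "CM", "Fib"),
  stringsAsFactors = FALSE
)

# Make sure each LR pair is counted once per cluster pair
lr_unique <- unique(toy_selected[, c("ligand", "receptor", "cluster_L", "cluster_R")])

# Number of LR pairs from each source cluster (rows) to each target cluster (columns)
cardinality <- table(source = lr_unique$cluster_L, target = lr_unique$cluster_R)
cardinality
```

In this toy example two LR pairs go from CM to Fib, one from Fib to CM, and one from Fib to itself.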

### Dot Plot: Visualizing Relevant Ligand–Receptor Pairs
To support the interpretation of both intercellular and intracellular signaling, scSeqComm provides several plotting functions that combine two visualization techniques: proportional area charts (dot size) and heatmaps (dot color) into a single, intuitive dot plot.
The functions scSeqComm_plot_scores(), scSeqComm_plot_LR_pair(), and scSeqComm_plot_cluster_pair() allow users to visualize the combined signaling evidence inferred by the analysis.

- Intercellular signaling evidence (S_inter score) using dot size
- Intracellular signaling evidence (S_intra score) using dot color

A grey dot indicates active intercellular communication (S_inter > 0), but no available intracellular evidence (i.e., S_intra is NA), typically due to the absence of known downstream pathways for the receptor.

#### Multiple LR pairs across multiple cell-type pairs
The scSeqComm_plot_scores() function provides a global view of inferred signaling between all cell clusters by showing the S_inter and (maximum) S_intra scores for each ligand–receptor pair. Users can customize the plot layout by grouping the axes:
- Rows can represent receptors (default) or ligands
- Columns can represent receiver clusters (default) or sender clusters

These flexible options allow users to explore signaling patterns from different biological perspectives—focusing on signal origin, target response, or specific molecular components.

In [None]:
# subset communication
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                ligand = c("TGFB1", "SPP1", "COL1A1"),
                                cluster_L = c("CM", "Fib"))


scSeqComm_plot_scores(data = selected_comm,                             # A data frame containing results of signaling analysis.
                      title = "Intercellular and intracellular",        # A character string to be used as title of plot.
                      facet_grid_x = "cluster_L",                       # A variable defining faceting group on columns dimension.
                      facet_grid_y = "ligand")                          # A variable defining faceting group on row dimension.

#### One Ligand-receptor pair or One cell-type pair
Users can also visualize S_inter and (maximal) S_intra scores in more focused ways using the following functions:
- scSeqComm_plot_LR_pair() allows the visualization of signaling activity for a specific ligand–receptor pair across all sender–receiver cell cluster combinations.
- scSeqComm_plot_cluster_pair() focuses on a specific sender–receiver cell cluster pair, displaying the signaling activity across all ligand–receptor pairs involved in that interaction.

These targeted visualizations are especially useful for investigating specific communication pathways or validating hypotheses about particular cell–cell interactions.

In [None]:
scSeqComm_plot_LR_pair(data = inter_max_intra_scores,               # A data frame containing results of signaling analysis.
                       title = "COL1A1 - ITGA2",                      # A character string to be used as title of plot.
                       selected_LR_pair = "COL1A1 - ITGA2")           # String or array of strings containing ligand-receptor pairs to plot.

In [None]:
# plot scores contained in "selected_comm" for a selected cluster couple 
scSeqComm_plot_cluster_pair(data = selected_comm,                     # A data frame containing results of signaling analysis.
                       title = "Cardiomyocyte --> Fibroblast",        # A character string to be used as title of plot.
                       selected_cluster_pair = "CM --> Fib")          # String or array of strings containing cell cluster pairs to plot.

### Dot Plot: Pathway-Level Analysis
Like scSeqComm_plot_scores(), the scSeqComm_plot_scores_pathway() function plots S_inter and S_intra scores for each ligand–receptor pair and cell cluster pair, and additionally shows the pathways to which the intracellular evidence of the ongoing cellular communication is related.


In [None]:
# set of ligand-receptor pairs of interest
pairs <- c("TGFB1 - TGFBR1","SPP1 - ITGA4")

# select communication involving above LR pairs
selected_comm <- scSeqComm_select(scSeqComm_res$comm_results, 
                                  LR_pair = pairs,
                                  cluster_L = "CM")

# plots scores and pathway info
options(repr.plot.width = 10, repr.plot.height = 6)
scSeqComm_plot_scores_pathway(data = selected_comm,             # A data frame containing results of signaling analysis.
                              title = "Pathway analysis")       # A character string to be used as title of plot.

### Functionally characterize cellular response


scSeqComm enables users to functionally characterize the cellular response in receiving cells using the function scSeqComm_GO_analysis().

This function performs a Gene Ontology (GO) enrichment analysis on the set of differentially expressed genes downstream of receptors, as listed in the genes column of comm_results. The enrichment identifies biological processes potentially activated as a result of inferred cell–cell communication.

To ensure statistical robustness, users must provide a background gene list through the geneUniverse parameter. We recommend using the complete set of target genes included in the transcriptional regulatory network derived from the input scRNA-seq dataset (i.e., all genes appearing as targets in TF_reg_DB_scrnaseq).

Before running GO analysis, users may optionally apply scSeqComm_select() to focus on a subset of cellular communications of interest—for example, by filtering for interactions with high S_inter and S_intra scores or by selecting specific ligand–receptor pairs. This targeted selection allows a more focused functional interpretation of signaling events with strong transcriptomic support.

Additionally, to investigate how a specific receiver cell type responds to incoming signals, you can filter the communication results accordingly using scSeqComm_select(). This enables GO analysis and other downstream interpretations to be tailored to the cellular context of interest.

In [None]:
# gene universe is all the target genes in dataset
geneUniverse <- unique(unlist(scSeqComm_res$TF_reg_DB_scrnaseq))

# fibroblast (Fib) communication with strong intercellular and intracellular evidence
Fib_comm <- scSeqComm_select(data = scSeqComm_res$comm_results,
                             cluster_R = "Fib",
                             S_inter = 0.8, S_intra = 0.8,
                             ligand = "TGFB1",
                             NA_included = FALSE)

# GO analysis of fibroblast (Fib) communication
Fib_cell_functional_response <- scSeqComm_GO_analysis(results_signaling = Fib_comm,            # Data.frame containing results of signaling analysis
                                                     geneUniverse = geneUniverse,            # background genes used to compare genes of interest (DE genes)
                                                     col = "genes",                          # Character string specifying the column name of results_signaling where the gene set to be used for GO enrichment analysis is located.
                                                     method = "general",                     # A character string specifying if the analysis should be computed once considering all genes associated to a list of LRpairs ("general") or associated to each receptor in the LR pairs separately ("specific").
                                                     package = "clusterProfiler",            # Character vector specifying the package to use for GO analysis, either "topGO" (default) or "clusterProfiler".
                                                     OrgDb = "org.Hs.eg.db")                 # OrgDb object, i.e. Organism databases for mapping gene identifiers.

head(Fib_cell_functional_response)

Similarly, GO enrichment analysis can be performed independently for each receptor by setting method = "specific" in the scSeqComm_GO_analysis() function. This approach provides a functional characterization of the cellular response in each receiver cluster triggered by individual receptors.

The output is a list of data frames, each containing enrichment analysis results corresponding to a specific receptor. This allows users to explore receptor-specific biological processes activated downstream of cell–cell communication events.

In [None]:
Fib_cell_functional_response_rec <- scSeqComm_GO_analysis(results_signaling = Fib_comm,         # Data.frame containing results of signaling analysis
                                                        geneUniverse = geneUniverse,            # background genes used to compare genes of interest (DE genes)
                                                        col = "genes",                          # Character string specifying the column name of results_signaling where the gene set to be used for GO enrichment analysis is located.
                                                        method = "specific",                    # A character string specifying if the analysis should be computed once considering all genes associated to a list of LRpairs ("general") or associated to each receptor in the LR pairs separately ("specific").
                                                        package = "clusterProfiler",            # Character vector specifying the package to use for GO analysis, either "topGO" (default) or "clusterProfiler".
                                                        OrgDb = "org.Hs.eg.db")                 # OrgDb object, i.e. Organism databases for mapping gene identifiers.

names(Fib_cell_functional_response_rec)                                            

Users can visualize the GO enrichment results in conjunction with the related ligand–receptor interactions by combining them with the corresponding dot plot. This integrated view helps contextualize the functional impact of specific signaling events, linking upstream ligand–receptor communication to downstream biological processes.

In [None]:
options(repr.plot.width = 12, repr.plot.height = 6)
scSeqComm_plot_scores(data = Fib_comm,                                                               # A data frame containing results of signaling analysis.
                       title = "intercellular, intracellular and functional analysis",              # A character string to be used as title of plot.
                       annotation_GO = Fib_cell_functional_response,                                 # A data.frame or a list of data.frame with results of enrichment analysis.
                       cutoff = 0.05,                                                               # A numeric value defining the significance value for GO terms.
                       topGO = 5)                                                                   # A numeric value defining the maximum number of terms to be visualized.

# Unexplored features
- **CClens**:
an R/Shiny app that assists scientists in the interpretation and exploration of multi-dimensional cell-cell communication data from large-scale scRNA-seq datasets through an interactive and user-friendly interface, with custom and flexible filtering options and multiple visualization tools. The platform accepts the results of scSeqCommDiff or other cell-cell communication inference methods, as well as custom analyses performed by the user.

- **Comparative analysis with scSeqCommDiff**
scSeqComm now enables the inference of differential cell-cell communication across two experimental conditions via the scSeqComm_differential() function. See the [tutorial](https://sysbiobig.gitlab.io/scSeqComm/articles/scseqcomm.html).


# Citation

- **scSeqComm**: *Identify, quantify and characterize cellular communication from single-cell RNA sequencing data with scSeqComm. Baruzzo G. et al., Bioinformatics. 2022*
