# Cell-cell communication with CellChat
This notebook outlines the steps for inferring, analyzing, and visualizing cell-cell communication from single-cell RNA sequencing (scRNA-seq) data using **CellChat**.

For comprehensive instructions and detailed descriptions of the methods, please refer to the official [vignette](https://htmlpreview.github.io/?https://github.com/jinworks/CellChat/blob/master/tutorial/CellChat-vignette.html). A step-by-step protocol for cell-cell communication analysis using CellChat is now available at [Jin et al., Nature Protocols 2024](https://www.nature.com/articles/s41596-024-01045-4). 

The source code is available on GitHub: [jinworks/CellChat](https://github.com/jinworks/CellChat).

## About CellChat and CellChatDB
CellChat is an R package for the inference, analysis, and visualization of cell-cell communication from single-cell data. It aims to help users identify and interpret cell-cell communication within an easily interpretable framework, with an emphasis on clear and attractive visualizations.

CellChatDB is a manually curated database of literature-supported ligand-receptor interactions in multiple species. It provides a comprehensive recapitulation of known molecular interaction mechanisms, including the multi-subunit structure of ligand-receptor complexes and co-factors.

# Load the required libraries

In [1]:
library(CellChat)
library(patchwork)
library(Seurat)
options(stringsAsFactors = FALSE)

Loading required package: dplyr


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: igraph


Attaching package: ‘igraph’


The following objects are masked from ‘package:dplyr’:

    as_data_frame, groups, union


The following objects are masked from ‘package:stats’:

    decompose, spectrum


The following object is masked from ‘package:base’:

    union


Loading required package: ggplot2

Attaching SeuratObject

‘SeuratObject’ was built under R 4.3.0 but the current version is
4.3.1; it is recomended that you reinstall ‘SeuratObject’ as the ABI
for R may have changed

Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and
validation routines, and ensuring assays work in strict v3/v4
compatibility mode



## Data input 
Initializing a CellChat object requires three inputs:
- **expression matrix**: normalized gene expression data.
- **metadata**: a table linking each cell to its assigned cell label.
- **ligand-receptor database**: all putative ligand–receptor pairs used for inference.

### Gene expression data matrix
The gene expression data must be provided as a matrix with genes in rows (gene names as rownames) and cells in columns (cell barcodes as colnames). CellChat requires normalized data as input (e.g., library-size normalization followed by log-transformation with a pseudocount of 1). If raw counts are provided, CellChat's `normalizeData` function can be used to normalize for library size and then log-transform the data.
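For readers starting from raw counts, the normalization described above can be illustrated in base R. This is a minimal sketch on a made-up 3 × 2 count matrix; `normalize_counts` is a hypothetical helper (not part of CellChat), and the scale factor of 10,000 is an assumption borrowed from the common Seurat default. CellChat's own `normalizeData` is expected to follow the same idea, but its exact defaults are not reproduced here.

```r
# Hypothetical helper: library-size normalization followed by log(x + 1).
# Assumption: scale factor of 10,000; CellChat's normalizeData() may differ in details.
normalize_counts <- function(counts, scale.factor = 1e4) {
  lib_size <- colSums(counts)                          # total counts per cell (column)
  scaled <- sweep(counts, 2, lib_size, "/") * scale.factor
  log1p(scaled)                                        # log-transform with a pseudocount of 1
}

# Made-up counts, filled column by column: cell1 = (10, 0, 90), cell2 = (5, 5, 0)
counts <- matrix(c(10, 0, 90,
                   5, 5, 0),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2")))

norm_data <- normalize_counts(counts)
round(norm_data, 3)
```

Each column is divided by its own library size before `log1p`, so cells with very different sequencing depths (100 vs 10 total counts here) end up on a comparable scale.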


In [6]:
seurat_object <- readRDS("../../../../data_vt3/first/2306scRNAseq_HMI_Ischemic.rds")

gene_expr <- seurat_object[["RNA"]]@data # normalized data matrix

### Metadata
The cell group information must be provided as a data frame with one column holding the cell labels and rownames identifying each cell; these rownames should match the column names of the expression matrix.
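A minimal sketch of the expected shapes, using a made-up 2-gene × 3-cell matrix and matching metadata. The gene values and cell names are placeholders; the check simply confirms that cells carry the same names, in the same order, in both inputs, which is worth verifying before calling `createCellChat`.

```r
# Made-up normalized expression: 2 genes (rows) x 3 cells (columns)
expr <- matrix(c(0.0, 1.2, 2.3,
                 0.5, 0.0, 1.1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("TGFB1", "TGFBR1"), c("cell_A", "cell_B", "cell_C")))

# Metadata: a single `labels` column, with cell names as rownames
meta <- data.frame(labels = c("Fib", "CM", "Fib"),
                   row.names = c("cell_A", "cell_B", "cell_C"))

# Cells should carry the same names, in the same order, in both inputs
stopifnot(identical(rownames(meta), colnames(expr)))

table(meta$labels)   # number of cells per group
```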

In [7]:
cell_metadata <- data.frame(labels = seurat_object@meta.data$cell_type, row.names = row.names(seurat_object@meta.data)) # cell metadata
head(cell_metadata)

| | labels `<chr>` |
|---|---|
| AAACCCAAGTATGCAA-1_2_1_1_1_1_1_1_1 | Fib |
| AAACCCACAGAGAAAG-1_2_1_1_1_1_1_1_1 | Fib |
| AAACCCAGTCTTCCGT-1_2_1_1_1_1_1_1_1 | Fib |
| AAACCCATCAAGGCTT-1_2_1_1_1_1_1_1_1 | Fib |
| AAACGAAAGCTGTGCC-1_2_1_1_1_1_1_1_1 | PC |
| AAACGAACAGGTCCCA-1_2_1_1_1_1_1_1_1 | Fib |


In [8]:
# free memory: the Seurat object is no longer needed
rm(seurat_object)
gc()

| | used | (Mb) | gc trigger | (Mb) | max used | (Mb) |
|---|---|---|---|---|---|---|
| Ncells | 5668991 | 302.8 | 8833537 | 471.8 | 8450899 | 451.4 |
| Vcells | 38713435 | 295.4 | 118328399 | 902.8 | 98491020 | 751.5 |


### Create a CellChat object
To begin the analysis, we need to create a CellChat object, which serves as the central structure to store input data and hold the results of the cell–cell communication inference.

In [9]:
cellchat <- createCellChat(object = gene_expr,            # a normalized data matrix
                           meta = cell_metadata,          # a data frame (rows are cells with rownames) consisting of cell information,
                           group.by = "labels")           # a char name of the variable in meta data, defining cell groups
cellchat

[1] "Create a CellChat object from a data matrix"


“The 'meta' data does not have a column named `samples`. We now add this column and all cells are assumed to belong to `sample1`!”


Set cell identities for the new CellChat object 
The cell groups used for CellChat analysis are  Adipo, CM, Endo, Fib, Lymphoid, Mast, Myeloid, PC, vSMCs 


An object of class CellChat created from a single dataset 
 27416 genes.
 14961 cells. 
CellChat analysis of single cell RNA-seq data! 

### Ligand-receptor interaction database: CellChatDB
CellChatDB is a manually curated database of literature-supported ligand-receptor interactions in both human and mouse. CellChatDB v2 contains ~3,300 validated molecular interactions: ~40% secreted autocrine/paracrine signaling interactions, ~17% extracellular matrix (ECM)-receptor interactions, ~13% cell-cell contact interactions, and ~30% non-protein signaling.
Key features of CellChatDB include:
- **Multimeric ligand-receptor complexes**: accurately represents the multisubunit structure of ligand–receptor complexes.
- **Co-factors**: includes both soluble and membrane-bound stimulatory or inhibitory co-factors, such as agonists, antagonists, and co-receptors.
- **Metabolic and synaptic signaling**: incorporates over 1,000 protein and non-protein interactions. For molecules not directly encoded by genes, CellChat v2 uses their key mediators or enzymes as proxies for potential communication mediated by non-proteins.


Users can extend CellChatDB by adding their own curated ligand–receptor pairs. For detailed instructions, refer to the official [tutorial](https://htmlpreview.github.io/?https://github.com/jinworks/CellChat/blob/master/tutorial/Update-CellChatDB.html) on updating the CellChatDB ligand–receptor interaction database.

In [10]:
CellChatDB <- CellChatDB.human   # use CellChatDB.mouse for mouse data

# Show the structure of the database
dplyr::glimpse(CellChatDB$interaction)

Rows: 3,233
Columns: 28
$ interaction_name         <chr> "TGFB1_TGFBR1_TGFBR2", "TGFB2_TGFBR1_TGFBR2",…
$ pathway_name             <chr> "TGFb", "TGFb", "TGFb", "TGFb", "TGFb", "TGFb…
$ ligand                   <chr> "TGFB1", "TGFB2", "TGFB3", "TGFB1", "TGFB1", …
$ receptor                 <chr> "TGFbR1_R2", "TGFbR1_R2", "TGFbR1_R2", "ACVR1…
$ agonist                  <chr> "TGFb agonist", "TGFb agonist", "TGFb agonist…
$ antagonist               <chr> "TGFb antagonist", "TGFb antagonist", "TGFb a…
$ co_A_receptor            <chr> "", "", "", "", "", "", "", "", "", "", "", "…
$ co_I_receptor            <chr> "TGFb inhibition receptor", "TGFb inhibition …
$ annotation               <chr> "Secreted Signaling", "Secreted Signaling", "…
$ interaction_name_2       <chr> "TGFB1 - (TGFBR1+TGFBR2)", "TGFB2

In [11]:
# use all CellChatDB except for "Non-protein Signaling" for cell-cell communication analysis
CellChatDB.use <- subsetDB(CellChatDB,              # database
                           key = "annotation",      # A character vector indicating columns names of database
                           non_protein = FALSE)     # whether to use the non-protein signaling for CellChat analysis (e.g. synaptic signaling interactions)

# set the used database in the object
cellchat@DB <- CellChatDB.use
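Conceptually, `non_protein = FALSE` keeps every interaction except those annotated as "Non-protein Signaling". The sketch below mimics that filter on a made-up miniature interaction table with an `annotation` column. It is an assumed simplification of what `subsetDB` does, not its source code; apart from `TGFB1_TGFBR1_TGFBR2`, the interaction names are placeholders, and the annotation categories are assumed to match those used in CellChatDB.

```r
# Made-up miniature interaction table; only TGFB1_TGFBR1_TGFBR2 is a real CellChatDB name
interaction <- data.frame(
  interaction_name = c("TGFB1_TGFBR1_TGFBR2", "ecm_pair", "contact_pair", "metabolite_pair"),
  annotation       = c("Secreted Signaling", "ECM-Receptor", "Cell-Cell Contact",
                       "Non-protein Signaling")
)

# Drop non-protein signaling, analogous in spirit to subsetDB(..., non_protein = FALSE)
protein_only <- interaction[interaction$annotation != "Non-protein Signaling", ]

# Share of each remaining annotation category
prop.table(table(protein_only$annotation))
```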

# Inference of cell-cell communication

CellChat infers biologically significant cell–cell communication events by assigning a probability value to each interaction and performing permutation testing to identify cell-type pair–specific signaling interactions. It models the probability of communication by integrating gene expression data with prior knowledge of interactions between signaling ligands, receptors, and their cofactors, based on the law of mass action.

To infer cell state–specific communications, CellChat follows a multi-step procedure:

- **Identification of Overexpressed Ligands and Receptors**:
To detect cell group–specific signaling, CellChat first identifies differentially expressed signaling genes across all cell groups. Only ligand–receptor pairs for which either the ligand or the receptor is overexpressed are considered for downstream analysis.

- **Average Gene Expression Estimation**: By default, CellChat uses a robust mean method called the trimean, which incorporates the first, second (median), and third quartiles of expression levels. Alternatively, users can choose a truncated mean approach for computing average expression.

- **Intercellular Communication Probability**:
Signaling probability is modeled using the law of mass action, based on the average expression of ligands and receptors in the respective cell groups. The model also incorporates the effects of:
    - Co-stimulatory and co-inhibitory receptors that modulate receptor activation
    - Extracellular agonists and antagonists that influence ligand–receptor binding
    - Cell group proportions, which account for differences in cell abundance between groups

- **Multi-Subunit Complexes**:
    For ligand or receptor complexes composed of multiple subunits, CellChat calculates the geometric mean of subunit expression to estimate complex-level activity.

- **Statistical Testing**:
Cell group labels are randomly permuted (default: 100 permutations), and communication probabilities are recomputed to generate a null distribution. A p-value is then calculated as the proportion of null values equal to or exceeding the observed communication probability. This p-value reflects the specificity of a given interaction for a particular cell-type pair.
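The steps above can be illustrated end to end on toy data. This is a simplified sketch, not CellChat's implementation: it uses made-up expression for one ligand and a hypothetical two-subunit receptor, averages each group with the trimean ((Q1 + 2·median + Q3) / 4), combines receptor subunits with a geometric mean, scores the pair with a saturating Hill function of ligand × receptor (assuming `Kh = 0.5` and `n = 1`, believed to be the `computeCommunProb` defaults), scales by the group proportions n_i·n_j/n², and derives a p-value from 100 label permutations. Agonist, antagonist and co-receptor terms are left out.

```r
set.seed(42)

# Trimean: mean of the 25%, 50%, 50% and 75% quantiles = (Q1 + 2*median + Q3) / 4
trimean <- function(x) mean(quantile(x, probs = c(0.25, 0.5, 0.5, 0.75), names = FALSE))

# Geometric mean, used here to combine the subunits of a receptor complex
geo_mean <- function(x) exp(mean(log(x)))

# Simplified probability (assumption): Hill function of L * R, scaled by n_i * n_j / n^2.
# Agonist, antagonist and co-receptor terms are deliberately omitted.
comm_prob <- function(L, R, n_i, n_j, n_total, Kh = 0.5, n = 1) {
  lr <- (L * R)^n
  lr / (Kh^n + lr) * (n_i * n_j) / n_total^2
}

# Made-up normalized expression for 60 cells: 30 "Fib" senders and 30 "CM" receivers
labels <- rep(c("Fib", "CM"), each = 30)
ligand <- c(runif(30, 1.5, 2.5), runif(30, 0, 0.3))   # ligand high in Fib
rec1   <- c(runif(30, 0, 0.3), runif(30, 1.0, 2.0))   # receptor subunit 1 high in CM
rec2   <- c(runif(30, 0, 0.3), runif(30, 0.8, 1.6))   # receptor subunit 2 high in CM

# Fib -> CM probability for a given labelling of the cells
prob_fib_to_cm <- function(lab) {
  L <- trimean(ligand[lab == "Fib"])
  R <- geo_mean(c(trimean(rec1[lab == "CM"]), trimean(rec2[lab == "CM"])))
  comm_prob(L, R, n_i = sum(lab == "Fib"), n_j = sum(lab == "CM"), n_total = length(lab))
}

observed <- prob_fib_to_cm(labels)

# Null distribution: shuffle the group labels and recompute (100 permutations)
null_probs <- replicate(100, prob_fib_to_cm(sample(labels)))
p_value <- mean(null_probs >= observed)

c(probability = observed, p_value = p_value)
```

Because the toy groups are cleanly separated, every shuffled labelling scores lower than the true one, so the p-value comes out as 0; on real data the null distribution overlaps the observed value to varying degrees.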

This takes quite some time to run (**around 2 hours**).

In [None]:
# subset the expression data of signaling genes for saving computation cost
cellchat <- subsetData(cellchat) 

# set parallelism
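future::plan("multisession", workers = 10)          # run in 10 parallel background R sessions
options(future.globals.maxSize = 800 * 1024^2)      # allow up to 800 MB of globals exported to each worker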

# identify overexpressed genes and ligand-receptor pairs
cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat)

# infer cellular communication network
cellchat <- computeCommunProb(
    object = cellchat,                  # CellChat object
    type = "triMean",                   # method for computing the average gene expression per cell group
    population.size = TRUE,             # whether to account for the proportion of cells in each group across all sequenced cells
    nboot = 100)                        # number of permutations used to compute p-values

The number of highly variable ligand-receptor pairs used for signaling inference is 1326 
triMean is used for calculating the average gene expression per cell group. 
[1] ">>> Run CellChat on sc/snRNA-seq data <<< [2025-07-15 10:04:00.465005]"


Communication inferred for cell groups containing only a few cells can be unreliable and may be filtered out. By default, `filterCommunication()` requires at least 10 cells in each cell group for its communication to be kept.

In [None]:
cellchat <- filterCommunication(cellchat, min.cells = 10)
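The effect of `min.cells` can be pictured on a made-up probability array: communication is zeroed out wherever the sender or the receiver group is too small. This is an assumed simplification of `filterCommunication`; details such as whether the cutoff is strict (`<` vs `<=`) are not checked here, and the group names and sizes are hypothetical.

```r
set.seed(1)
groups <- c("Fib", "CM", "Mast")
# Made-up probabilities: senders x receivers x 2 placeholder ligand-receptor pairs
prob <- array(runif(18), dim = c(3, 3, 2),
              dimnames = list(groups, groups, c("LR_1", "LR_2")))
group_size <- c(Fib = 500, CM = 300, Mast = 6)   # hypothetical cell counts per group

min_cells <- 10
small <- names(group_size)[group_size < min_cells]
prob[small, , ] <- 0   # groups too small to act as senders
prob[, small, ] <- 0   # ... or as receivers
prob[, , "LR_1"]
```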

# Description of output

The inference results are stored in the net slot of the CellChat object. This slot contains a list of two three-dimensional arrays, structured as follows:
- Dimensions: Source cell group (sender) x Target cell group (receiver) x Ligand–receptor interaction pair
- Contents:
    - 'prob': Contains interaction probabilities, representing the strength of communication between cell type pairs.
    - 'pval': Contains p-values indicating the statistical significance of each predicted interaction.

These arrays allow users to extract and explore both the magnitude and confidence of inferred cell–cell communication for any specific source–target–interaction combination.
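To make the layout concrete, here is a made-up `prob`/`pval` pair with the same three-dimensional shape (senders × receivers × LR pairs), built in base R. Only `TGFB1_TGFBR1_TGFBR2` is a real interaction name; the other pair names and all values are placeholders. If the real arrays carry dimnames, as assumed here, entries can be looked up by name rather than by position.

```r
groups <- c("Fib", "CM")
pairs  <- c("TGFB1_TGFBR1_TGFBR2", "LR_2", "LR_3")   # LR_2 and LR_3 are placeholders
prob <- array(0, dim = c(2, 2, 3), dimnames = list(groups, groups, pairs))
pval <- array(1, dim = c(2, 2, 3), dimnames = list(groups, groups, pairs))

# One made-up inferred interaction: Fib -> CM through TGFB1_TGFBR1_TGFBR2
prob["Fib", "CM", "TGFB1_TGFBR1_TGFBR2"] <- 0.12
pval["Fib", "CM", "TGFB1_TGFBR1_TGFBR2"] <- 0.01

dim(prob)                          # senders, receivers, LR pairs
prob[, , "TGFB1_TGFBR1_TGFBR2"]    # sender x receiver matrix for one LR pair
prob["Fib", "CM", ]                # one sender-receiver pair across all LR pairs
```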

In [None]:
# View the structure of net
head(cellchat@net)

# Extract the communication probability matrix for the first ligand–receptor (LR) pair
head(cellchat@net$prob[,,856])

## Interpreting and visualizing the outputs

### Extract the inferred cellular communication network
CellChat’s subsetCommunication() function allows you to extract and filter the inferred ligand–receptor interactions between cell groups. The output is a data.frame containing details such as source, target, ligand, receptor, interaction probability, p-value, and pathway.

You can filter interactions by sender/receiver cell groups, signaling pathways, or statistical thresholds such as p-values or differential expression metrics for ligands and receptors.



In [None]:
# all inferred cell-cell communications at level of ligands/receptors
head(subsetCommunication(cellchat))

# subset the cell-cell communication
head(subsetCommunication(cellchat,
                        sources.use = c("CM", "Fib"),               # a vector giving the index or the name of source cell groups
                        targets.use = c("Adipo", "Endo"),           # a vector giving the index or the name of target cell groups
                        signaling = c("EGF")))                      # a character vector giving the name of signaling pathways of interest
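The table returned by `subsetCommunication()` can be thought of as the `prob`/`pval` arrays flattened into one row per significant (source, target, LR pair) entry. The base R sketch below does that flattening on made-up arrays, keeping entries with p < 0.05 and non-zero probability; the 0.05 cutoff is assumed to match the function's default threshold, and the real output carries more columns (ligand, receptor, pathway).

```r
groups <- c("Fib", "CM", "Endo")
pairs  <- c("LR_1", "LR_2")                       # placeholder LR pair names
prob <- array(0.05, dim = c(3, 3, 2), dimnames = list(groups, groups, pairs))
pval <- array(1,    dim = c(3, 3, 2), dimnames = list(groups, groups, pairs))
pval["Fib", "CM",   "LR_1"] <- 0.01               # three made-up significant hits
pval["Fib", "Endo", "LR_2"] <- 0.001
pval["CM",  "Endo", "LR_1"] <- 0.03

# Positions of significant, non-zero entries as (source, target, pair) indices
idx <- which(pval < 0.05 & prob > 0, arr.ind = TRUE)
comm_df <- data.frame(source           = groups[idx[, 1]],
                      target           = groups[idx[, 2]],
                      interaction_name = pairs[idx[, 3]],
                      prob             = prob[idx],
                      pval             = pval[idx])

# Restrict to chosen senders/receivers, similar in spirit to sources.use / targets.use
subset(comm_df, source %in% "Fib" & target %in% c("CM", "Endo"))
```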


### Basic analysis and plotting
CellChat computes the aggregated cell–cell communication network either by (i) counting the number of interactions or (ii) summing the communication probabilities across ligand–receptor pairs.

`aggregateNet()` returns an updated CellChat object whose `net` slot contains two additional matrices:
- `obj@net$count`: rows and columns are sources and targets respectively, and each element is the number of interactions between the two cell groups.
- `obj@net$weight`: the interaction weights (total communication probability) between any two cell groups.

> To evaluate communication at the signaling pathway level, CellChat provides the computeCommunProbPathway() function. This function summarizes the communication probabilities of all ligand–receptor interactions associated with a given signaling pathway, based on annotations in the ligand–receptor database. The output is stored in the netP slot of the CellChat object. **It is important to note that this analysis only captures intercellular communication and does not model intracellular signaling pathways.**



In [None]:
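cellchat <- computeCommunProbPathway(cellchat)   # pathway-level probabilities, stored in the netP slot
cellchat <- aggregateNet(cellchat)               # adds net$count and net$weight
cellchat@net$count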
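The two summaries can be reproduced by hand on a made-up `prob`/`pval` array: non-significant entries are zeroed first, then `count` tallies the remaining LR pairs for each sender–receiver pair and `weight` sums their probabilities. This mirrors the description above; treating p ≥ 0.05 as non-significant is an assumption about `aggregateNet`'s default threshold, and the pair names are placeholders.

```r
groups <- c("Fib", "CM")
pairs  <- c("LR_1", "LR_2", "LR_3")               # placeholder LR pair names
# Values fill column-major: [Fib,Fib], [CM,Fib], [Fib,CM], [CM,CM] for each LR pair
prob <- array(c(0.10, 0.00, 0.20, 0.05,            # LR_1
                0.30, 0.00, 0.00, 0.01,            # LR_2
                0.02, 0.04, 0.00, 0.00),           # LR_3
              dim = c(2, 2, 3), dimnames = list(groups, groups, pairs))
pval <- array(0.01, dim = c(2, 2, 3), dimnames = list(groups, groups, pairs))
pval["CM", "CM", "LR_1"] <- 0.5                    # one non-significant entry

thresh <- 0.05
prob[pval >= thresh] <- 0                          # drop non-significant interactions
count  <- apply(prob > 0, c(1, 2), sum)            # number of significant LR pairs
weight <- apply(prob, c(1, 2), sum)                # summed communication probability
count
weight
```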

### Circle plot: Visualizing Interaction Counts or Strength
These aggregated interactions can be visualized using a circle plot, where each node represents a cell group and the edges indicate communication between them. The width of the edges reflects the strength of the interaction ("counts" or "weight"), while the size of each node can be scaled according to the number of cells in the corresponding group.

In [None]:
# Calculate the size (number of cells) of each cell group
groupSize <- as.numeric(table(cellchat@idents))

# Generate a circle plot to visualize the number of interactions between cell groups
netVisual_circle(cellchat@net$count,               # weighted matrix representing the connections
                 vertex.weight = groupSize,        # node sizes
                 weight.scale = TRUE,              # scale edge weights for visual clarity
                 label.edge = TRUE,                # whether to show edge labels
                 title.name = "Number of interactions")         # plot title

# Generate a circle plot to visualize the interaction strength (weights) between cell groups
netVisual_circle(cellchat@net$weight,              # weighted matrix representing the connections
                 vertex.weight = groupSize,        # node sizes
                 weight.scale = TRUE,              # scale edge weights for visual clarity
                 label.edge = TRUE,                # whether to show edge labels
                 title.name = "Interaction weights/strength")   # plot title


### Heatmap: Visualizing Interaction Counts or Strength
The heatmap provides a detailed view of cell–cell communication by displaying either the number of interactions or the interaction strength (communication probability) between each pair of cell groups. 
To aid interpretation, two summary bar plots are included:
- The top bar plot shows the total incoming interactions for each target group by summing the values in each column.
- The right-side bar plot displays the total outgoing interactions from each source group by summing the values in each row.

The heatmap can be particularly useful for identifying dominant communication hubs and asymmetric interactions.
Depending on the analysis focus, users can display either the number of interactions (`measure = "count"`) or the interaction strength (`measure = "weight"`).
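The two marginal bar plots are row and column sums of the plotted matrix. A minimal sketch on a made-up 2 × 2 count matrix (rows are sources, columns are targets; the numbers are illustrative only) shows how an asymmetric hub shows up in those sums:

```r
# Made-up interaction counts: rows = sources (senders), columns = targets (receivers)
count <- matrix(c(3, 5,
                  1, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(source = c("Fib", "CM"), target = c("Fib", "CM")))

outgoing <- rowSums(count)   # right-side bars: total sent by each source group
incoming <- colSums(count)   # top bars: total received by each target group

names(which.max(outgoing))   # strongest sender
names(which.max(incoming))   # strongest receiver
```

Here Fib dominates outgoing signaling while CM receives the most, an asymmetry that a symmetric view of the network would hide.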

In [None]:
netVisual_heatmap(
  object = cellchat,                          # CellChat object
  measure = "count",                          # "count" (number of interactions) or "weight" (total interaction strength)
  signaling = NULL,                           # a character vector giving the names of signaling networks
  slot.name = "net",                          # "net": links at the level of ligands/receptors;
                                              # "netP": links at the level of signaling pathways
  title.name = "Number of interactions",      # plot title
  cluster.rows = TRUE,                        # whether to cluster rows
  cluster.cols = TRUE                         # whether to cluster columns
)

### Bubble plot: Significant Ligand-Receptor interactions
To visualize significant ligand–receptor (L–R) interactions between cell groups, CellChat provides the netVisual_bubble function. This plot effectively summarizes how different cell types communicate through specific signaling molecules.
Each bubble in the plot represents a ligand–receptor pair involved in communication from a source (sending) group to a target (receiving) group.
- The x-axis denotes the interacting cell group pair (source -> target).
- The y-axis lists the L–R pairs.
- Bubble color indicates the interaction strength (i.e., the communication probability).
- Bubble size reflects the statistical significance (p-value).

You can customize the plot to filter for specific signaling pathways, senders, or receivers.

In [None]:
netVisual_bubble(cellchat,                                  # CellChat object
                sources.use = "Fib",                        # a vector giving the index or the name of source cell groups
                targets.use = c("CM", "Endo", "Myeloid"),   # a vector giving the index or the name of target cell groups.
                thresh = 0.05,                              # threshold of the p-value for determining significant interaction
                signaling = "COLLAGEN",                     # a character vector giving the name of signaling pathways of interest
                title.name = ""                             # title name of the plot
                ) 

### Chord diagram
Similar to the bubble plot, the `netVisual_chord_gene` function draws a chord diagram showing all interactions (L-R pairs or signaling pathways) from some cell groups to other cell groups.
- Each sector on the outer ring represents a ligand, a receptor, or a signaling pathway, depending on the chosen level of analysis.
- The color of each edge is based on the source (sender) group.
- The thickness of each edge reflects the interaction strength (a thicker edge indicates a stronger signal).
- Each inner bar is colored according to the target group that receives the signal.

In [None]:
netVisual_chord_gene(cellchat,                                  # CellChat object
                    signaling = NULL,                           # a character vector giving the name of signaling networks
                    slot.name = "net",                          # the slot name of object: 
                                                                # slot.name = "net" when visualizing links at the level of ligands/receptors; 
                                                                # slot.name = "netP" when visualizing links at the level of signaling pathways
                    sources.use = "Fib",                        # a vector giving the index or the name of source cell groups
                    targets.use = c("CM", "Endo"),              # a vector giving the index or the name of target cell groups.
                    thresh = 0.05,                              # threshold of the p-value for determining significant interaction
                    title.name = ""                             # title name of the plot
                    )

# Unexplored features
- **Multiple visualizations and network interpretation**
Beyond basic interaction plots, CellChat offers advanced tools for exploring high-order features of inferred communication networks. It integrates methods from social network analysis, pattern recognition, and manifold learning to provide a deeper understanding of the structure and organization of cell–cell communication.

- **Comparative analysis**
CellChat supports comparative analysis to assess how cell–cell communication networks change across different biological conditions (e.g., disease vs. healthy, treated vs. untreated). Using quantitative contrasts and joint manifold learning, it can identify significant differences in signaling activity, communication strength, and network topology.

- **Spatially resolved omics data**
CellChat allows the inference of spatially proximal cell-cell communication between interacting cell groups from spatially resolved transcriptomics data.

# Citation

- **CellChat v1**: *Inference and analysis of cell-cell communication using CellChat.* Suoqin Jin et al., Nat. Commun. 2021
- **CellChat v2**: *CellChat for systematic analysis of cell–cell communication from single-cell transcriptomics.* Suoqin Jin et al., Nat. Protoc. 2024
