
Characterizing dynamics of cells and gene expression during injury‐induced skin:

Ranojoy edited this page May 3, 2024 · 1 revision

This vignette walks through a detailed example of the cellSight workflow for quality control and differential analysis of single-cell case-versus-control data. cellSight builds on several existing packages, notably Seurat and Tweedieverse, and this vignette also highlights the differences that make cellSight useful in its own right.

Install and load packages

Before installing cellSight, install the required Bioconductor packages from the R console:

# Install Bioconductor Manager and required packages
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("DelayedMatrixStats", "glmGamPoi", "metap", "multtest"))

To install cellSight using devtools:

# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install cellSight from GitHub
devtools::install_github("omicsEye/cellSight")
# Load cellSight
library(cellSight)

Input

cellSight requires two inputs: the location of the single-cell data and the location where the output should be written. The single-cell data consist of three components:

Barcodes: Barcodes are unique identifiers assigned to individual cells during the scRNA-seq process. Each cell in the experiment is tagged with a distinct barcode, which lets researchers distinguish and track individual cells throughout sequencing. Barcodes associate the sequenced reads with specific cells, which is what makes it possible to quantify gene expression per cell and build single-cell expression profiles.

Features/Genes: The features are the genes or other genomic elements whose expression is measured in the scRNA-seq experiment, and they define the biological entities under investigation. Each feature typically corresponds to a specific gene, and its expression level is quantified in every individual cell. The feature information includes gene names or identifiers, and the expression levels across all cells are recorded in the expression matrix.

Matrix: Finally, the matrix is a two-dimensional table in which rows represent features (genes) and columns represent individual cells. Each entry holds the expression level of a specific gene in a particular cell, typically as a count of UMIs or reads. The matrix is the core representation of the scRNA-seq data: it captures the gene expression pattern of every cell and provides a comprehensive view of the transcriptional landscape at single-cell resolution. The matrix is usually sparse, meaning most entries are zero, because not all genes are expressed (or detected) in every cell.
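To make the sparsity point concrete, here is a minimal sketch using the `Matrix` package (shipped with R) on a made-up 5-gene by 4-cell count matrix. Only the six nonzero counts are stored, and the code computes the fraction of entries that are zero. The gene and cell names are illustrative only.

```r
library(Matrix)

# Made-up counts: 5 genes (rows) x 4 cells (columns); only nonzero entries are given
counts <- sparseMatrix(
  i = c(1, 3, 2, 5, 1, 4),      # gene (row) index of each nonzero count
  j = c(1, 1, 2, 3, 4, 4),      # cell (column) index of each nonzero count
  x = c(3, 1, 7, 2, 5, 1),      # the counts themselves
  dims = c(5, 4),
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:4))
)

print(counts)                        # zeros are displayed as "."
n_nonzero <- nnzero(counts)          # number of stored (nonzero) entries
sparsity  <- 1 - n_nonzero / prod(dim(counts))
sparsity                             # fraction of entries that are zero
```

Real scRNA-seq matrices have tens of thousands of genes and are far sparser than this toy, which is why sparse storage matters in practice.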

In 10x Genomics output, these three components belong together: the matrix holds the counts, the features label its rows, and the barcodes label its columns. Joining the three elements therefore gives the complete, labeled dataset.
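As a sketch of how the three components fit together, the code below writes a tiny, made-up 10x-style folder with uncompressed `barcodes.tsv`, `features.tsv`, and `matrix.mtx` files, then reads it back and attaches the feature and barcode labels to the matrix. The helper `read_10x_dir()` is hypothetical and is not part of cellSight. Real Cell Ranger output is usually gzip-compressed, and `Seurat::Read10X()` is the usual reader for such folders.

```r
library(Matrix)

# Hypothetical helper (not part of cellSight): read an uncompressed 10x-style folder
read_10x_dir <- function(path) {
  counts   <- as(readMM(file.path(path, "matrix.mtx")), "CsparseMatrix")
  barcodes <- readLines(file.path(path, "barcodes.tsv"))
  features <- read.delim(file.path(path, "features.tsv"), header = FALSE,
                         stringsAsFactors = FALSE)
  # Rows of the matrix are features, columns are barcodes
  stopifnot(nrow(counts) == nrow(features), ncol(counts) == length(barcodes))
  dimnames(counts) <- list(features$V2, barcodes)  # column 2: gene symbol
  counts
}

# Write a tiny made-up example folder to a temporary directory
dir10x <- file.path(tempdir(), "toy10x")
dir.create(dir10x, showWarnings = FALSE)
toy <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2), x = c(4, 1, 9), dims = c(3, 2))
writeMM(toy, file.path(dir10x, "matrix.mtx"))
writeLines(c("CELL-1", "CELL-2"), file.path(dir10x, "barcodes.tsv"))
writeLines(c("id1\tGeneA\tGene Expression",
             "id2\tGeneB\tGene Expression",
             "id3\tmt-Co1\tGene Expression"),
           file.path(dir10x, "features.tsv"))

m <- read_10x_dir(dir10x)
m["mt-Co1", "CELL-2"]   # count of gene mt-Co1 in barcode CELL-2
```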

The data used in this example were collected from skin tissue of four mice: two sampled 1 day after wounding and two unwounded controls. In this case-versus-control setup, the goal was to uncover the genes and other factors that play an important role in wound healing.

# Supply the location of the input data and the desired output location
cellSight("C:/Users/ranoj/Box/snRNA_CellRanger_Wound_nonWound/data/","C:/Users/ranoj/Desktop/wound")
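Because cellSight takes the input as a directory path, a quick check that the expected files exist can save a failed run. The sketch below is a hypothetical helper, not part of cellSight. The accepted file names (plain or `.gz`, plus the older `genes.tsv`) are assumptions based on common Cell Ranger layouts, and this vignette does not show how the input directory is organized; if it holds one subfolder per sample, the check would be applied to each subfolder.

```r
# Hypothetical pre-flight check (not part of cellSight): which 10x files are present?
check_10x_dir <- function(path) {
  expected <- list(
    barcodes = c("barcodes.tsv.gz", "barcodes.tsv"),
    features = c("features.tsv.gz", "features.tsv", "genes.tsv"),  # genes.tsv: older layout
    matrix   = c("matrix.mtx.gz", "matrix.mtx")
  )
  # TRUE for each component if at least one accepted file name exists
  vapply(expected,
         function(files) any(file.exists(file.path(path, files))),
         logical(1))
}

# Demo on a temporary folder that is missing its features file
demo_dir <- file.path(tempdir(), "check_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("barcodes.tsv.gz", "matrix.mtx.gz"))))
status <- check_10x_dir(demo_dir)
status
if (!all(status)) message("Missing: ", paste(names(status)[!status], collapse = ", "))
```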


cellSight produces four QC plots for assessing the quality of the single-cell RNA sequencing (scRNA-seq) data. These plots help identify potential issues or biases in the dataset before proceeding with downstream analyses.

  1. Gene Detection Rate: This metric indicates how many unique genes are detected in each cell. Low gene detection rates could indicate poor quality cells or technical issues during library preparation or sequencing.

  2. Total Counts per Cell: This reflects the total number of reads or UMIs (Unique Molecular Identifiers) detected in each cell. Unusually low totals can point to empty droplets or lysed cells, unusually high totals can point to doublets (two cells captured together), and broad variation can reflect differences in library preparation efficiency.

  3. Percentage of Mitochondrial Genes: The fraction of counts coming from mitochondrial genes is a common proxy for cell stress or damage. When a cell's membrane is compromised, cytoplasmic transcripts leak out while mitochondrial transcripts are retained, so their share rises. A high percentage of mitochondrial reads in a cell may therefore indicate poor quality.

  4. Relationship Between QC Metrics: QC plots often visualize the relationships between different QC metrics. For example, a scatterplot of total counts per cell versus the percentage of mitochondrial genes can reveal patterns such as an increase in mitochondrial gene percentage with decreasing total counts, indicating cell stress or damage.
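The four metrics above can be computed directly from a count matrix. This is a minimal sketch on a made-up 4-gene by 3-cell matrix; it assumes mouse mitochondrial genes are identified by the `mt-` prefix, and the filtering thresholds are arbitrary illustrations, not cellSight's defaults.

```r
library(Matrix)

# Made-up counts: 4 genes (rows) x 3 cells (columns), filled column by column
counts <- Matrix(matrix(c(10, 0, 5, 1,
                           2, 0, 0, 8,
                           0, 3, 4, 0),
                        nrow = 4,
                        dimnames = list(c("Actb", "Krt14", "mt-Co1", "mt-Nd1"),
                                        c("cell1", "cell2", "cell3"))),
                 sparse = TRUE)

n_genes  <- colSums(counts > 0)               # 1. genes detected per cell
n_counts <- colSums(counts)                   # 2. total counts (UMIs) per cell
is_mt    <- grepl("^mt-", rownames(counts))   # assumption: mouse "mt-" prefix
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / n_counts  # 3.

# 4. Relationship between metrics: a table ready for scatterplots,
#    e.g. plot(qc$n_counts, qc$percent_mt)
qc <- data.frame(n_genes, n_counts, percent_mt, row.names = colnames(counts))
print(qc)

# Illustrative filter with arbitrary thresholds
keep <- qc$n_genes >= 2 & qc$percent_mt < 60
rownames(qc)[keep]   # "cell1" "cell3"
```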


The output shows the integrated UMAP plot of the four datasets. The datasets were integrated using an anchor-based method, and cellSight then generated UMAP plots for the integrated dataset. The UMAP plot is a two-dimensional visualization of the single-cell RNA sequencing data in which cells are arranged by transcriptional similarity, which makes it easier to identify distinct cell populations and their relationships within the dataset. Alongside the UMAP plots, cellSight also produces plots showing how well the different datasets were integrated, as a check on the robustness of the integration.
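To give a feel for what an integration-quality check measures, the sketch below computes a simple per-cell batch-mixing entropy on a 2-D embedding such as UMAP coordinates: for each cell it takes the `k` nearest neighbors and computes the Shannon entropy of their dataset labels. This is a generic diagnostic chosen for illustration, not necessarily what cellSight plots; `batch_entropy()` and `k = 10` are assumptions. With two datasets, values near log(2) (about 0.69) indicate good mixing and values near 0 indicate datasets that remain separated.

```r
# Hypothetical helper: per-cell batch-mixing entropy on a low-dimensional embedding
batch_entropy <- function(embedding, batch, k = 10) {
  d <- as.matrix(dist(embedding))   # pairwise Euclidean distances between cells
  diag(d) <- Inf                    # a cell is not its own neighbor
  batch <- factor(batch)
  apply(d, 1, function(row) {
    nn <- order(row)[seq_len(k)]           # indices of the k nearest neighbors
    p <- prop.table(table(batch[nn]))      # dataset proportions among neighbors
    p <- p[p > 0]                          # drop absent datasets (0 * log 0 = 0)
    -sum(p * log(p))                       # Shannon entropy, natural log
  })
}

# Toy example with two made-up datasets of 50 cells each
set.seed(1)
batch <- rep(c("A", "B"), each = 50)
mixed <- matrix(rnorm(200), ncol = 2)                        # datasets overlap
separated <- rbind(matrix(rnorm(100), ncol = 2),             # dataset A near (0, 0)
                   matrix(rnorm(100, mean = 10), ncol = 2))  # dataset B near (10, 10)

e_mix <- batch_entropy(mixed, batch)
e_sep <- batch_entropy(separated, batch)
mean(e_mix)   # well above 0: neighbors come from both datasets
mean(e_sep)   # 0: every neighbor shares the cell's dataset
```

A neighborhood-based score like this is local by design: it can flag regions of the embedding where one dataset sits apart even when the global picture looks well mixed.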