Detailed outline for using the individual functions of cellSight on available data:

Ranojoy edited this page Mar 25, 2024 · 13 revisions

In this vignette, we walk through the individual functions available in cellSight. The example data come from the NCBI Gene Expression Omnibus (GEO), accession GSE130973. Five samples were examined: two from young subjects (y1 = 25 years, y2 = 27 years) and three from older subjects (o1 = 53 years, o2 = 70 years, o3 = 69 years), with the samples in each age group serving as replicates. The underlying study starts from the premise that fibroblasts play a crucial role in maintaining the structure and function of human skin, with distinct variations across dermal layers, and that a comprehensive analysis of these variations has been lacking.

Install and load packages

Install Bioconductor packages before installing cellSight on the R console:

# Install Bioconductor Manager and required packages
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("DelayedMatrixStats", "glmGamPoi", "metap", "multtest"))

To install cellSight using devtools:

# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install cellSight from GitHub
devtools::install_github("omicsEye/cellSight")
#Load cellSight
library(cellSight)

Loading the data from NCBI GEO:

library(R.utils)  # provides gunzip()
library(httr)     # provides GET() and write_disk()
library(Seurat)
url <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE130973&format=file&file=GSE130973%5Fseurat%5Fanalysis%5Flyko%2Erds%2Egz"
filename <- "GSE130973_seurat_analysis_lyko.rds.gz"

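# Define the destination directory (adjust to your machine) and create it if needed
destination_dir <- "C:/Users/ranoj/Desktop/test"
dir.create(destination_dir, recursive = TRUE, showWarnings = FALSE)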

# Create the destination path
destination_path <- file.path(destination_dir, filename)

# Download the file
GET(url, write_disk(destination_path,overwrite = TRUE))

# Unzip the file. gunzip() removes the compressed file by default (remove = TRUE),
# so no separate file.remove() call is needed
gunzip(destination_path, overwrite = TRUE)

# Define the path to the unzipped RDS file
unzipped_file <- file.path(destination_dir, "GSE130973_seurat_analysis_lyko.rds")

# Read the RDS file
seurat_data <- readRDS(unzipped_file)

# Now, 'seurat_data' contains the contents of the unzipped RDS file
# You can work with the Seurat object or data as needed

# Update the object to the structure used by the installed Seurat version
integrated_seurat_object <- UpdateSeuratObject(seurat_data)

# Get unique sample IDs
subjects <- unique(integrated_seurat_object$subj)

# Split the integrated object once into a named list,
# with one Seurat object per value of 'subj'
individual_seurat_objects <- SplitObject(integrated_seurat_object, split.by = "subj")

# List the sample IDs and access individual Seurat objects by name or position
# For example, names(individual_seurat_objects) or individual_seurat_objects[[1]]

Running the individual functions in cellSight:

library(cellSight)
# Store the result under a different name so it does not mask the qc_plots() function
qc_results <- qc_plots(individual_seurat_objects, "C:/Users/ranoj/Desktop/second_example/")

Quality Control (QC) Plots

In single-cell analysis, QC plots are essential to assess the quality of your scRNA-seq experiment and identify potential issues that may impact downstream analyses.

Overview of QC Plots

cellSight offers several QC plots to help you evaluate different aspects of your single-cell dataset:

  1. Gene Expression Metrics
     - Violin Plots: Display the distribution of gene expression across cells, aiding in the identification of highly variable genes.
     - Box Plots: Show the spread and central tendency of gene expression, highlighting potential outliers.
  2. Cell-Level Metrics
     - Scatter Plots: Visualize relationships between important metrics like the number of detected genes and total counts per cell.
     - Feature Plots: Display expression levels of specific genes across all cells, allowing identification of potential outliers.
  3. Mitochondrial Content
     - Mitochondrial Content Plots: Evaluate the percentage of mitochondrial genes in individual cells, identifying potential stress or low-quality cells.

Output

[Images: QC plots produced by qc_plots() for the samples]
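The plot types listed above can also be produced directly with Seurat. The following is a minimal sketch, not cellSight's internal implementation. The helper name `qc_sketch` is hypothetical, the `^MT-` pattern assumes human gene symbols, and the metadata columns `nFeature_RNA` and `nCount_RNA` assume the raw counts live in an assay named `RNA`.

```r
library(Seurat)

# Hypothetical helper (not part of cellSight): adds the mitochondrial
# percentage to one Seurat object and builds standard QC plots.
qc_sketch <- function(obj, mt_pattern = "^MT-") {
  # Percentage of counts per cell that come from mitochondrial genes
  # (assumption: human gene symbols, which start with "MT-")
  obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = mt_pattern, assay = "RNA")

  # Violin plots of detected genes, total counts, and mitochondrial content
  violin <- VlnPlot(
    obj,
    features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
    ncol = 3
  )

  # Scatter plot relating total counts to the number of detected genes
  scatter <- FeatureScatter(obj, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

  list(object = obj, violin = violin, scatter = scatter)
}
```

Under these assumptions, `lapply(individual_seurat_objects, qc_sketch)` would apply the helper to every sample in the list.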

filtered_data <- filtering(individual_seurat_objects, "C:/Users/ranoj/Desktop/second_example/")

The filtering step typically involves the following criteria:

Cell Quality Metrics: Cells may be filtered based on metrics such as total number of genes detected per cell, total counts per cell, and the percentage of mitochondrial genes. Cells with unusually high or low values for these metrics may be indicative of poor quality or technical issues.

Gene Expression Thresholds: Cells expressing an insufficient number of genes may be excluded. This helps remove potential ambient RNA contamination or low-quality cells with minimal transcriptional activity.

Mitochondrial Gene Content: Cells with high percentages of mitochondrial genes may indicate stress or damage. Filtering out such cells helps improve the overall quality of the dataset.
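The criteria above can be expressed as a few lines of Seurat code. This is a minimal sketch, not the code that `filtering()` runs internally. The helper name `filter_sketch` is hypothetical, the default cutoffs are illustrative assumptions that should be chosen from the QC plots of your own data, and the `^MT-` pattern and the `RNA` assay name are assumptions as well.

```r
library(Seurat)

# Hypothetical helper (not part of cellSight): keeps cells whose QC metrics
# fall inside the given bounds. All default cutoffs are illustrative.
filter_sketch <- function(obj, min_features = 200, max_features = 2500,
                          max_percent_mt = 5, mt_pattern = "^MT-") {
  # Mitochondrial percentage per cell (assumption: human "MT-" gene symbols)
  obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = mt_pattern, assay = "RNA")

  md <- obj@meta.data
  keep <- md$nFeature_RNA > min_features &
    md$nFeature_RNA < max_features &
    md$percent.mt < max_percent_mt

  # Subset by cell name so the cutoffs can be passed as ordinary arguments;
  # which() drops cells whose metrics are NA
  subset(obj, cells = rownames(md)[which(keep)])
}
```

Applied to the list from the previous step, this would look like `lapply(individual_seurat_objects, filter_sketch)`. Comparing `ncol()` before and after shows how many cells each cutoff removes.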

sctransformed_data <- sctransform_integration(filtered_data,"C:/Users/ranoj/Desktop/second_example/")

The above function performs two tasks: normalization with sctransform, and integration. SCTransform addresses the high dropout rates and low counts typical of scRNA-seq datasets, while integration harmonizes datasets from distinct experimental conditions. Together, they improve the precision and interpretability of downstream scRNA-seq analyses.

  1. SCTransform: sctransform is a method used for normalizing and transforming single-cell RNA-seq data. It is particularly beneficial for addressing challenges such as high dropout rates and low counts inherent in single-cell datasets. The method aims to stabilize variance across expression levels, making the data more amenable to downstream analyses, such as clustering and differential expression. Applying sctransform to a single-cell RNA-seq dataset allows you to obtain transformed expression values that are more suitable for statistical analyses and visualization.

  2. Integration: Integration, in the context of single-cell RNA-seq data, refers to the process of combining or aligning multiple datasets to enable joint analysis. This is often necessary when dealing with data from different batches, experiments, or conditions. Integration methods aim to reduce batch effects and allow for a more accurate comparison of cells across datasets. The goal is to harmonize datasets, making them comparable and facilitating the identification of shared biological signals.
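For orientation, the two steps correspond to the anchor-based SCT workflow in Seurat, sketched below. This is an assumption about the general approach, not a copy of what `sctransform_integration()` does internally. The helper name `sct_integrate_sketch` and the `nfeatures` default are hypothetical choices.

```r
library(Seurat)

# Hypothetical helper (not part of cellSight): normalizes each sample with
# SCTransform and integrates the samples using Seurat's anchor-based workflow.
sct_integrate_sketch <- function(objects, nfeatures = 3000) {
  # Step 1: variance-stabilizing normalization, one sample at a time
  objects <- lapply(objects, SCTransform, verbose = FALSE)

  # Step 2: choose features shared across samples and prepare the SCT residuals
  features <- SelectIntegrationFeatures(object.list = objects, nfeatures = nfeatures)
  objects <- PrepSCTIntegration(object.list = objects, anchor.features = features)

  # Step 3: find anchors between samples and integrate them into one object
  anchors <- FindIntegrationAnchors(
    object.list = objects,
    normalization.method = "SCT",
    anchor.features = features
  )
  IntegrateData(anchorset = anchors, normalization.method = "SCT")
}
```

The function expects a list of Seurat objects, one per sample, and returns a single integrated object. Anchor finding can be slow and memory-intensive on large datasets.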

# 'sctransformed_data' is the Seurat object returned by sctransform_integration()
# Copy the metadata columns into 'sample' and 'type', the names used to create the plots
sctransformed_data$sample <- sctransformed_data$age
sctransformed_data$type <- sctransformed_data$subj

#Running the function to find the clusters present in the data
pca_clusters <- pca_clustering(sctransformed_data,"C:/Users/ranoj/Desktop/second_example/")

Output

[Images: PCA and clustering plots produced by pca_clustering()]
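A clustering step of this kind is commonly built from the following Seurat calls. This is a minimal sketch under stated assumptions, not the implementation of `pca_clustering()`. The helper name `cluster_sketch`, the number of principal components, and the resolution are hypothetical choices, and the input is assumed to already contain variable features and scaled data, as it would after SCT-based integration.

```r
library(Seurat)

# Hypothetical helper (not part of cellSight): PCA, graph-based clustering,
# and a UMAP embedding for visualization.
cluster_sketch <- function(obj, dims = 1:30, resolution = 0.5) {
  # Reduce the data to principal components
  obj <- RunPCA(obj, verbose = FALSE)

  # Build a nearest-neighbor graph on the chosen components and cluster it
  obj <- FindNeighbors(obj, reduction = "pca", dims = dims)
  obj <- FindClusters(obj, resolution = resolution)

  # Two-dimensional embedding used for plotting the clusters
  RunUMAP(obj, reduction = "pca", dims = dims)
}
```

With the result stored as `clustered`, `ElbowPlot(clustered)` helps choose `dims`, and `DimPlot(clustered, reduction = "umap", group.by = "type")` colors cells by the `type` column set earlier.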