## **8 Single cell RNA-seq analysis using Seurat** ## 

- > This vignette introduces some typical tasks using the Seurat (version 3) ecosystem. 
- > Seurat vignettes are available here; however, they default to the current latest Seurat version (version 4). 
- > Previous vignettes are available from here.

- > Let’s now load all the libraries that will be needed for the tutorial.

In [2]:
library(Seurat)
library(ggplot2)
library(SingleR)
library(dplyr)
library(celldex)
library(RColorBrewer)
library(SingleCellExperiment)

### **8.1 Basic quality control and filtering** ###

- > We start the analysis after two preliminary steps have been completed: 
    - > 1) ambient RNA correction using SoupX; 
    - > 2) doublet detection using Scrublet. Both vignettes can be found in this repository.

- > To start the analysis, let’s read in the SoupX-corrected matrices (see QC Chapter). 
- > SoupX output only has gene symbols available, so no additional options are needed. 
- > If starting from typical Cell Ranger output, you can choose whether the count matrix uses Ensembl IDs or gene symbols as row names. 
- > This is done with the gene.column option; the default is 2, which selects gene symbols.

In [5]:
adj.matrix <- Read10X("/home/asus/Desktop/CHRF_Project/Single_cell/15.Single_Cell_RNA_Seq_Using_R/scRNA.seq.course/data/update/soupX_pbmc10k_filt")
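- > The effect of gene.column can be sketched on a tiny, made-up 10x-style directory. Everything below (the temporary folder, the barcodes, the gene symbols, and the placeholder Ensembl-like IDs) is invented for illustration and assumes the legacy uncompressed layout (matrix.mtx, barcodes.tsv, genes.tsv).

```r
library(Seurat)
library(Matrix)

# Build a toy 10x-style directory in a temporary folder (illustrative data only).
tenx_dir <- file.path(tempdir(), "toy_10x")
dir.create(tenx_dir, showWarnings = FALSE)

# 2 genes x 3 cells, stored as a sparse matrix and written in MatrixMarket format.
toy_counts <- Matrix(c(1, 0, 2,
                       0, 3, 0), nrow = 2, byrow = TRUE, sparse = TRUE)
invisible(writeMM(toy_counts, file.path(tenx_dir, "matrix.mtx")))
writeLines(c("CELL1-1", "CELL2-1", "CELL3-1"),
           file.path(tenx_dir, "barcodes.tsv"))
# Column 1: placeholder IDs; column 2: placeholder gene symbols.
writeLines(c("ENSG00000000001\tGENEA", "ENSG00000000002\tGENEB"),
           file.path(tenx_dir, "genes.tsv"))

by_symbol  <- Read10X(tenx_dir)                   # default gene.column = 2
by_ensembl <- Read10X(tenx_dir, gene.column = 1)  # use the first column instead

rownames(by_symbol)
rownames(by_ensembl)
```

- > Only the row names differ between the two matrices; the counts themselves are identical.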

- > After this, we will make a Seurat object. 
- > The Seurat object summary shows that 1) the number of cells (“samples”) approximately matches the description of the dataset (10194); 2) there are 36601 genes (features) in the reference.

In [6]:
srat <- CreateSeuratObject(adj.matrix,project = "pbmc10k") 
srat

An object of class Seurat 
36601 features across 10194 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)

- > Let’s remove adj.matrix from memory to save RAM and look at the Seurat object more closely. The str command shows all fields of the class:

In [8]:
adj.matrix <- NULL
str(srat)

Formal class 'Seurat' [package "SeuratObject"] with 13 slots
  ..@ assays      :List of 1
  .. ..$ RNA:Formal class 'Assay' [package "SeuratObject"] with 8 slots
  .. .. .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. .. ..@ i       : int [1:24330253] 25 30 32 42 43 44 51 59 60 62 ...
  .. .. .. .. .. ..@ p       : int [1:10195] 0 4803 7036 11360 11703 15846 18178 20413 22584 27802 ...
  .. .. .. .. .. ..@ Dim     : int [1:2] 36601 10194
  .. .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. .. ..$ : chr [1:36601] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" ...
  .. .. .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. .. .. ..@ x       : num [1:24330253] 1 2 1 1 1 3 1 1 1 1 ...
  .. .. .. .. .. ..@ factors : list()
  .. .. .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. .. ..@ i       : int [1:24330253] 25 30 32 42 4
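- > The i, p, and x slots in the output above belong to the compressed sparse column format used by dgCMatrix. A minimal sketch on a made-up 3 x 3 matrix (not the PBMC data) shows how they fit together:

```r
library(Matrix)

# Toy matrix: rows play the role of genes, columns the role of cells.
m <- Matrix(c(0, 2, 0,
              1, 0, 0,
              0, 3, 4), nrow = 3, byrow = TRUE, sparse = TRUE)

class(m)    # "dgCMatrix"
m@x         # non-zero values, stored column by column: 1 2 3 4
m@i         # 0-based row index of each non-zero value: 1 0 2 2
m@p         # 0-based offsets where each column starts, plus the total: 0 1 3 4

# Differences between consecutive offsets give the non-zero entries per column,
# i.e. the number of detected genes per cell in a count matrix.
diff(m@p)   # 1 2 1
```

- > This is why, in the str output, p has one more entry than the number of cells (10195 versus 10194), while i and x both have one entry per non-zero count.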

- > The meta.data slot is the most important field for the next steps. 
- > It can be accessed using both the @ and [[]] operators. 
- > Right now it has 3 fields per cell: dataset ID (orig.ident), number of UMIs detected per cell (nCount_RNA), and the number of expressed (detected) genes in the same cell (nFeature_RNA).

In [9]:
meta <- srat@meta.data
dim(meta)

In [10]:
head(meta)

| | orig.ident | nCount_RNA | nFeature_RNA |
|---|---|---|---|
| | `<fct>` | `<dbl>` | `<int>` |
| AAACCCACATAACTCG-1 | pbmc10k | 22196 | 4734 |
| AAACCCACATGTAACC-1 | pbmc10k | 7630 | 2191 |
| AAACCCAGTGAGTCAG-1 | pbmc10k | 21358 | 4246 |
| AAACCCAGTGCTTATG-1 | pbmc10k | 857 | 342 |
| AAACGAACAGTCAGTT-1 | pbmc10k | 15007 | 4075 |
| AAACGAACATTCGGGC-1 | pbmc10k | 9855 | 2285 |
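- > The different ways of reaching the metadata, and what nCount_RNA and nFeature_RNA contain, can be checked on a small made-up object. The matrix, gene names, and cell names below are illustrative assumptions, not part of the PBMC dataset.

```r
library(Seurat)
library(Matrix)

# Toy count matrix: 20 hypothetical genes x 5 hypothetical cells.
set.seed(1)
toy <- Matrix(as.numeric(rpois(20 * 5, lambda = 1)), nrow = 20, sparse = TRUE,
              dimnames = list(paste0("GENE", 1:20), paste0("CELL", 1:5)))
toy_srat <- CreateSeuratObject(toy, project = "toy")

via_slot    <- toy_srat@meta.data   # direct slot access
via_bracket <- toy_srat[[]]         # accessor; returns the metadata data frame
colnames(via_slot)
colnames(via_bracket)

# A single column can be pulled out with $ (vector) or [[ ]] (data frame).
toy_srat$nCount_RNA
toy_srat[["nFeature_RNA"]]

# nCount_RNA is the column sum of counts; nFeature_RNA counts non-zero genes.
manual_counts   <- Matrix::colSums(toy)
manual_features <- Matrix::colSums(toy > 0)
```

- > Comparing manual_counts and manual_features with the metadata columns is a quick sanity check that the object was built from the matrix you expected.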


In [11]:
summary(meta$nCount_RNA)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    499    5549    7574    8902   10730   90732 

In [12]:
summary(meta$nFeature_RNA)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     47    1725    2113    2348    2948    7265 

Let’s add several more values useful for diagnosing cell quality. Mitochondrial genes are useful indicators of cell state. For mouse datasets, change the pattern to “^mt-” (mouse mitochondrial gene symbols use a lowercase prefix), or explicitly list gene IDs with the features = … option.
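As a minimal sketch of the two ways to select mitochondrial genes, the example below uses Seurat's PercentageFeatureSet on a made-up 4-gene, 3-cell matrix. The pattern "^MT-" is the one commonly used for human gene symbols; the toy counts and the column names percent.mt and percent.mt.listed are illustrative assumptions.

```r
library(Seurat)
library(Matrix)

# Toy counts: two mitochondrial genes and two nuclear genes across three cells.
toy <- Matrix(c( 5,  0, 4,
                 5,  2, 6,
                40,  8, 5,
                50, 10, 5), nrow = 4, byrow = TRUE, sparse = TRUE,
              dimnames = list(c("MT-ND1", "MT-CO1", "CD3E", "ACTB"),
                              c("CELL1", "CELL2", "CELL3")))
toy_srat <- CreateSeuratObject(toy, project = "toy")

# Option 1: select genes by a regular expression on the gene symbol.
toy_srat[["percent.mt"]] <- PercentageFeatureSet(toy_srat, pattern = "^MT-")

# Option 2: list the genes explicitly with the features argument.
toy_srat[["percent.mt.listed"]] <- PercentageFeatureSet(
  toy_srat, features = c("MT-ND1", "MT-CO1"))

toy_srat[[c("nCount_RNA", "percent.mt", "percent.mt.listed")]]
```

Both columns hold the percentage of each cell's counts that come from the selected genes, so they should agree whenever the pattern and the explicit list pick out the same features.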