# Gene expression

The present study is based on the 10X scRNA-seq dataset published by the Allen Institute for Brain Science and publicly available at: https://portal.brain-map.org/atlases-and-data/RNA-seq/mouse-whole-cortex-and-hippocampus-10x. The data were then clustered, and cluster names were assigned according to the Allen Institute proposal for cell type nomenclature (https://portal.brain-map.org/explore/classes/nomenclature). The topology of the taxonomy made it possible to determine the sex of the mouse from which the cells were isolated, the regions of interest, the cell classes (glutamatergic, GABAergic or non-neuronal) and the subclasses. This information was stored in the metadata table. The metadata was used to subset the cells of the hippocampus region from the gene expression matrix, and we selected 13 subclasses of hippocampal cells. The hippocampus gene count matrix was pre-processed in R v3.6.1 following the Seurat v3.1.5 standard pre-processing workflow for quality control, normalization, and analysis of scRNA-seq data (cf. 10XHip2021_Pre.Processing).

# Description
Here we describe how we obtained the figures for gene expression.


# Data availability

See the README to download the processed and clustered Seurat object '10XHip2021_seurat.object.rds'.

### Load data and required packages

In [None]:
# Required libraries
library(dplyr)
library(Seurat)
library(tidyverse)
library(ggplot2)

In [None]:
# Seurat object
hip <- readRDS("10XHip2021_seurat.object.rds")

In [None]:
# Use 'subclass' as idents
Idents(object = hip) <- "subclass"

### Map: reference for t-SNE reduction with all cells clustered by cell type/subclass

In [None]:
DimPlot(hip, 
        reduction = "tsne", 
        label = TRUE, 
        pt.size=0.5, 
        label.size = 6, 
        cols=c('CA1-ProS'='skyblue', 'CA2'='lightseagreen', 'CA3'='steelblue','DG'='slategray2',
        'Lamp5'='violetred4','Pvalb'='mediumvioletred','Sncg'='palevioletred1','Sst'='pink1',
        'Vip'='palevioletred3', 'Endo'='forestgreen', 'Micro-PVM'='yellowgreen', 'Oligo'='orange2',
        'Astro'='sienna3')) + 
        xlim(-40,40) + ylim(-40,40) + 
        theme(axis.title.x=element_text(size=40), 
              axis.title.y=element_text(size=40), 
              axis.text.x = element_text(size = 40), 
              axis.text.y = element_text(size = 40)) + 
        NoLegend()

### Gene expression: t-SNE reduction (FeaturePlot)

In [None]:
gene = "any_gene_of_interest" # e.g "Nr3c1"

# tSNE (scaled relative expression - from 1 to 3 = low/medium/high)

FeaturePlot(hip, 
            features = gene, 
            cols = c("lightgrey","mediumpurple1","blue"), 
            coord.fixed = TRUE) + 
            xlim(-40,40) + ylim(-40,40) + 
            theme(axis.title.x=element_text(size=40),
                  title = element_text(size=48), 
                  axis.title.y=element_text(size=40), 
                  axis.text.x = element_text(size = 40), 
                  axis.text.y = element_text(size = 40), 
                  legend.title = element_text(size=40),
                  legend.text = element_text(size=20))

### Gene expression: violin plot (VlnPlot)

In [None]:
gene = "any_gene_of_interest" # e.g "Nr3c1"

# Violin plot (log-normalized expression level)

VlnPlot(hip, 
        features = gene, 
        cols = c('CA1-ProS'='skyblue', 'CA2'='lightseagreen', 'CA3'='steelblue', 
          'DG'='slategray2','Lamp5'='violetred4','Pvalb'='mediumvioletred','Sncg'='palevioletred1','Sst'='pink1',
          'Vip'='palevioletred3','Endo'='forestgreen', 'Micro-PVM'='yellowgreen', 'Oligo'='orange2', 
          'Astro'='sienna3'), pt.size = 0.1) + 
        theme(axis.title.x=element_blank(),
              title = element_text(size=48), 
              axis.title.y=element_text(size=40), 
              axis.text.x = element_text(size = 40), 
              axis.text.y = element_text(size = 40), 
              legend.title = element_text(size=40),
              legend.text = element_text(size=20)) + 
        NoLegend()

### Statistics on Nr3c1 and Nr3c2 expression

##### For all hippocampus

In [None]:
# Average expression data for genes of interest such as Nr3c1 and Nr3c2
genes = c("Nr3c1","Nr3c2")
dotplot = DotPlot(hip, features = genes, assay="RNA")

data = dotplot$data # 'data' will contain the average expression for all genes of interest in all cell types

In [None]:
# Create one vector per gene, with the average expression for each cell type/subclass
Nr3c1 <- data$avg.exp[data$features.plot == "Nr3c1"]
Nr3c2 <- data$avg.exp[data$features.plot == "Nr3c2"]

In [None]:
# Paired Wilcoxon test to determine the significance of the differential expression 
# between Nr3c1 and Nr3c2, using the average expression per cell type/subclass

wilcox.test(Nr3c1, Nr3c2, paired = TRUE, alternative = "two.sided")
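
The test above pairs the two genes by cell type/subclass, so it compares one pair of average values per subclass rather than individual cells. The following base-R sketch uses made-up averages (the group labels, gene names and numbers are hypothetical, not taken from the dataset) to show the shape of the input and the fields of the returned object:

```r
# Hypothetical per-subclass average expression values (illustrative only)
avg <- data.frame(
  subclass = c("A", "B", "C", "D", "E", "F"),
  gene1    = c(1.2, 0.8, 2.5, 1.9, 0.4, 3.1),
  gene2    = c(2.0, 1.5, 2.1, 3.3, 1.0, 4.7)
)

# Paired Wilcoxon signed-rank test: each subclass contributes one pair
res <- wilcox.test(avg$gene1, avg$gene2, paired = TRUE, alternative = "two.sided")

res$p.value    # two-sided p-value
res$statistic  # V: sum of the ranks of the positive differences (gene1 - gene2)
```

With few pairs the attainable p-values are coarse, which is worth keeping in mind when interpreting a test run on subclass averages.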

##### Per cell type/subclass

In [None]:
# Select the data for a specific cell type/subclass
data.region <- data[data$id == "subclass_of_interest",]
Nr3c1 <- data.region$avg.exp[data.region$features.plot == "Nr3c1"]
Nr3c2 <- data.region$avg.exp[data.region$features.plot == "Nr3c2"]

# Calculate the log2 ratio between Nr3c2 and Nr3c1 average expression

log2(Nr3c2/Nr3c1) # Use this form when Nr3c2 > Nr3c1; use log2(Nr3c1/Nr3c2) when Nr3c2 < Nr3c1
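
As a quick illustration of how the sign of a log2 ratio depends on which gene is placed in the numerator, here is a small base-R sketch with hypothetical average expression values (the variable names and numbers are assumptions for the example only):

```r
# Hypothetical average expression values for one subclass (illustrative only)
gene1_avg <- 1.5
gene2_avg <- 6.0

# log2 ratio with gene2 in the numerator: positive because gene2 is higher
lfc <- log2(gene2_avg / gene1_avg)
lfc

# Swapping numerator and denominator only flips the sign
lfc_swapped <- log2(gene1_avg / gene2_avg)
lfc_swapped

# Back-transform to a fold difference
2^lfc
```

Putting the more highly expressed gene in the numerator keeps the ratio positive, so the value can be read directly as "how many doublings higher".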

In [None]:
# Test the significance of the differential expression within one cell type/subclass
# Per-cell expression of the genes of interest across all cell types/subclasses
data.Nr3c1 = VlnPlot(hip, features = "Nr3c1", assay="RNA")$data
data.Nr3c2 = VlnPlot(hip, features = "Nr3c2", assay="RNA")$data 

# Extract cell type/subclass specific data
data.Nr3c1 <- data.Nr3c1[data.Nr3c1$ident == "subclass_of_interest",]
data.Nr3c2 <- data.Nr3c2[data.Nr3c2$ident == "subclass_of_interest",]

# Select data columns containing feature expression per cell (the column uses the gene name)
Nr3c1 <- data.Nr3c1$Nr3c1
Nr3c2 <- data.Nr3c2$Nr3c2

# Wilcox test
wilcox.test(Nr3c1, Nr3c2, paired = TRUE, alternative = "two.sided")

### Differential expression analysis

In [None]:
wilcox.DEA <- FindAllMarkers(object = hip, test.use = 'wilcox',
                             logfc.threshold = 0.25, min.pct = 0.1, only.pos = TRUE)

In [None]:
write.table(wilcox.DEA, file='/path/file_name.tsv', quote=FALSE, sep='\t')
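
The exported table can be read back and filtered without Seurat. The sketch below uses a toy data frame in place of the real result. The column names (`p_val`, `avg_logFC`, `pct.1`, `pct.2`, `p_val_adj`, `cluster`, `gene`) are assumed to match what Seurat v3 returns, and the values, the temporary file and the 0.05 cut-off are made up for illustration:

```r
# Toy table mimicking the columns returned by FindAllMarkers (values are made up)
markers <- data.frame(
  p_val     = c(1e-10, 0.03, 1e-6),
  avg_logFC = c(1.2, 0.3, 0.8),
  pct.1     = c(0.9, 0.2, 0.6),
  pct.2     = c(0.1, 0.15, 0.2),
  p_val_adj = c(1e-6, 1, 0.01),
  cluster   = c("DG", "DG", "CA3"),
  gene      = c("geneA", "geneB", "geneA"),
  row.names = c("geneA", "geneB", "geneA.1")
)

# Write and read back with the same settings as above (temporary file)
path <- tempfile(fileext = ".tsv")
write.table(markers, file = path, quote = FALSE, sep = "\t")
markers_in <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Keep markers passing an adjusted p-value cut-off (0.05 is an arbitrary choice)
significant <- markers_in[markers_in$p_val_adj < 0.05, ]
significant[, c("cluster", "gene", "avg_logFC", "p_val_adj")]
```

Because the file is written with row names, its header has one field fewer than the data rows, and `read.table` then uses the first column as row names.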

### Comparing the expression of several genes of interest: DotPlot

In [None]:
genes = c("all_genes_of_interest")

# Dot plot displaying percentage of positive cells (pct.exp) and the z-score (avg.exp.scaled)

DotPlot(hip, features = genes, assay="RNA") + 
        geom_point(mapping = aes_string(size = 'pct.exp', color = 'avg.exp.scaled')) +
        guides(color = guide_colorbar(title = 'z-score'), 
        size = guide_legend(title = 'Percentage of positive cells')) + 
        scale_colour_gradient2(low = "white", mid = "#a6bddb", high = "#253494") +
        scale_size(range = c(1,8), breaks = c(0,25,50,75,100)) + 
        theme(axis.title.x=element_blank(),
              title = element_text(size=48), 
              axis.title.y=element_blank(),
              axis.text.x = element_text(size = 20, angle = 90, vjust = 0.7), 
              axis.text.y = element_text(size = 20), 
              legend.title = element_text(size=20),
              legend.text = element_text(size=20))
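
For readers unfamiliar with the z-score shown on the colour scale, the base-R sketch below illustrates the general idea on hypothetical average expression values for one gene across five groups. It is not the exact Seurat implementation: `DotPlot` may transform the averages before scaling and also bounds the scaled values through its `col.min` and `col.max` arguments.

```r
# Hypothetical average expression of one gene across five groups (illustrative only)
avg_exp <- c(groupA = 0.2, groupB = 1.0, groupC = 3.5, groupD = 0.8, groupE = 0.5)

# z-score: centre on the mean across groups and divide by the standard deviation
z <- (avg_exp - mean(avg_exp)) / sd(avg_exp)

# scale() performs the same computation
z_scale <- as.numeric(scale(avg_exp))

round(z, 2)
```

A z-score is therefore relative to the other groups in the plot: the same gene can receive a different colour if the set of groups changes.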

### Percentage of positive cells for a gene throughout all cell types/subclasses 

In [None]:
genes = c("all_genes_of_interest")
data = DotPlot(hip, features = genes)$data

# Compare all cell types/subclasses for one gene
ggplot(data, aes(fill=id, y=pct.exp, x=features.plot)) + 
       geom_bar(position="dodge", stat="identity") + 
       scale_fill_manual("Cell types", values = c('CA1-ProS'='skyblue', 'CA2'='lightseagreen','CA3'='steelblue',
                         'DG'='slategray2','Lamp5'='violetred4','Pvalb'='mediumvioletred','Sncg'='palevioletred1',
                         'Sst'='pink1','Vip'='palevioletred3','Endo'='forestgreen','Micro-PVM'='yellowgreen', 
                         'Oligo'='orange2','Astro'='sienna3')) + 
       ylab("Percentage of positive cells") + 
       ylim(0,100) + 
       theme_minimal() + 
       theme(axis.title.x=element_blank(), 
             axis.title.y=element_text(size=24), 
             axis.text.x = element_text(size = 20), 
             axis.text.y = element_text(size = 20))

# Compare two genes within one cell type/subclass
ggplot(data, aes(fill=features.plot, y=pct.exp, x=id)) + 
       geom_bar(position="dodge", stat="identity") + 
       scale_fill_manual("Features", values = c("gene1" = "gray47","gene2" = "gray87")) + 
       ylab("Percentage of positive cells") + 
       ylim(0,100) + 
       theme_minimal() + 
       theme(axis.title.x=element_blank(),
             title = element_text(size=40), 
             axis.title.y= element_blank(),
             axis.text.x = element_text(size = 20, angle = 45, hjust = 1), 
             axis.text.y = element_text(size = 20), 
             legend.title = element_text(size=20),
             legend.text = element_text(size=20))
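
The `pct.exp` column used above is the percentage of cells in each group in which the gene is detected. Assuming detection means an expression value above zero (our reading of the `DotPlot` output, stated here as an assumption), the quantity can be sketched in base R on toy data:

```r
# Toy expression values for one gene in eight cells from two groups (made up)
expr  <- c(0, 1.2, 0, 0.4, 2.1, 0.3, 0, 0.9)
group <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Percentage of cells per group with expression above zero
pct_exp <- tapply(expr > 0, group, mean) * 100
pct_exp
```

Here two of the four cells in group A and three of the four cells in group B have non-zero values.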

### Number of cells per cell type

In [None]:
# The table gives the number of cells per cell type/subclass
table(Idents(hip))

# Create one vector for cell types/subclasses (cells) and one for the number of cells (counts)
cells <- c("DG", "CA1-ProS","CA3","Lamp5","Astro","Oligo","Sncg","Vip","CA2","Sst","Micro-PVM","Endo","Pvalb")
counts <- c(58566,13221,1899,1372,488,465,279,261,143,111,74,73,49)

# Create dataframe with counts per cell types/subclasses
data <- data.frame("cells" = cells,"counts" = counts)

# Bar plot number of cells
ggplot(data, mapping = aes(x=reorder(cells,counts), counts)) + 
       geom_bar(position="dodge", stat="identity")+  
       ylim(0,60000) + 
       theme_minimal() + 
       theme(axis.title.x=element_blank(), 
             axis.title.y=element_blank(), 
             axis.text.x = element_text(size = 40), 
             axis.text.y = element_text(size = 40)) + 
       coord_flip()
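
The counts above are typed in by hand from the printed table. As an alternative sketch, the same data frame can be derived programmatically, which avoids transcription errors. The example uses a toy factor standing in for `Idents(hip)` (labels and group sizes are made up):

```r
# Toy identity vector standing in for Idents(hip) (labels and sizes are made up)
idents <- factor(c(rep("DG", 5), rep("CA3", 2), rep("Astro", 3)))

# Convert the table to a data frame with the same column names as above
counts_df <- as.data.frame(table(idents), stringsAsFactors = FALSE)
colnames(counts_df) <- c("cells", "counts")

# Order from the largest to the smallest group
counts_df <- counts_df[order(counts_df$counts, decreasing = TRUE), ]
counts_df
```

With the real object, replacing `idents` by `Idents(hip)` should give a data frame that can be passed to the same `ggplot` call.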

### Split violin plots by features - Allows direct comparison between two genes such as Nr3c1 and Nr3c2

In [None]:
# Add one column for cell name in metadata based on metadata rownames
hip@meta.data$cell_name <- rownames(hip@meta.data)

In [None]:
# Create dataset for Nr3c1
gene = "Nr3c1"
vlnplot = VlnPlot(hip, features = gene) # VlnPlot needs the Seurat object, not the 'data' data frame
data.Nr3c1 <- vlnplot$data$Nr3c1 # Nr3c1 expression for all cells
Nr3c1 <- hip
Nr3c1@meta.data$exp <- data.Nr3c1
Nr3c1@meta.data$feature <- "Nr3c1"

In [None]:
# Create dataset for Nr3c2
gene = "Nr3c2"
vlnplot = VlnPlot(hip, features = gene) # VlnPlot needs the Seurat object, not the 'data' data frame
data.Nr3c2 <- vlnplot$data$Nr3c2
Nr3c2 <- hip
Nr3c2@meta.data$exp <- data.Nr3c2
Nr3c2@meta.data$feature <- "Nr3c2"

In [None]:
# Combine the 2 datasets
hip.combined <- merge(Nr3c1, y = Nr3c2, add.cell.ids = c("GR", "MR"), project = "hip.combined")
Idents(object = hip.combined) <- "subclass"
table(hip.combined$feature) # There should be the same number of cells for both features

In [None]:
# Violin plot

VlnPlot(hip.combined, 
        features = "exp", 
        pt.size = 0.1, 
        split.by = "feature",
        cols = c("Nr3c1" = "gray47","Nr3c2" = "gray87")) + 
        theme(axis.title.x=element_blank(),
              title = element_text(size=40), 
              axis.title.y= element_blank(), 
              axis.text.x = element_text(size = 20), 
              axis.text.y = element_text(size = 20), 
              legend.title = element_text(size=20),
              legend.text = element_text(size=20)) + 
        ggtitle("Expression level")

### Split violin plots using metadata such as the sex of the mouse from which the cells were isolated

In [None]:
# Group by subclass
Idents(object = hip) <- "subclass"

# Split violin plot
VlnPlot(hip, features = "Ar", pt.size = 0.1, split.by = "sex", 
        cols = c("M" = "mistyrose2", "F" = "azure2")) + 
        theme(axis.title.x=element_blank(),
              title = element_text(size=48), 
              axis.title.y=element_text(size=40), 
              axis.text.x = element_text(size = 40), 
              axis.text.y = element_text(size = 40), 
              legend.title = element_text(size=40),
              legend.text = element_text(size=20)) + 
        NoLegend()

### Gene expression within one cell type/subclass - Deeper clustering

In [None]:
# Group by subclass
Idents(object = hip) <- "subclass"

# Subset a subclass of interest ('SOI')
SOI_cells <- hip@meta.data$cell_name[hip@meta.data$subclass =='any_SOI']
hip.SOI <- subset(hip, cells = SOI_cells)

# Look into the subclusters
Idents(object = hip.SOI) <- "cluster"
table(Idents(hip.SOI))

# Map of the cell type/subclass with all subclusters. 
# As an example, in our paper the selected SOI was the Dentate Gyrus (DG):
DimPlot(hip.SOI, 
        reduction = "tsne", 
        label = FALSE, 
        pt.size=0.5, 
        cols=c('120_DG'='royalblue4','121_DG'='turquoise4','122_DG'='seashell4',
               '123_DG'='paleturquoise2','124_DG'='slategray','125_DG'='slategray2')) + 
        xlim(-40,40) + ylim(-40,40) + 
        theme(axis.title.x=element_text(size=40), 
              axis.title.y=element_text(size=40), 
              axis.text.x = element_text(size = 40), 
              axis.text.y = element_text(size = 40))

# Looking at one gene of interest with a t-SNE reduction

gene = "any_gene_of_interest"

FeaturePlot(hip.SOI, 
            features = gene, 
            cols = c("lightgrey","mediumpurple1","blue"), 
            coord.fixed = TRUE) + 
            xlim(-40,40) + ylim(-40,40) + 
            theme(axis.title.x=element_text(size=40),
                  title = element_text(size=48), 
                  axis.title.y=element_text(size=40), 
                  axis.text.x = element_text(size = 40), 
                  axis.text.y = element_text(size = 40), 
                  legend.title = element_text(size=40),
                  legend.text = element_text(size=20))

# Looking at one gene of interest with a violin plot

gene = "any_gene_of_interest"

VlnPlot(hip.SOI, 
        features = gene, 
        cols = c('120_DG'='royalblue4','121_DG'='turquoise4','122_DG'='seashell4',
                 '123_DG'='paleturquoise2','124_DG'='slategray','125_DG'='slategray2'), 
        pt.size = 0.1) + 
        theme(axis.title.x=element_blank(),
              title = element_text(size=48), 
              axis.title.y=element_text(size=40), 
              axis.text.x = element_text(size = 40), 
              axis.text.y = element_text(size = 40), 
              legend.title = element_text(size=40),
              legend.text = element_text(size=20)) + 
        NoLegend()