# Mouse Hippocampus snRNA Integrative Analysis Overview
## Data
The data used in this notebook come from two ENCODE carts, one for the [Parse](https://www.encodeproject.org/carts/enc4_mouse_snrna_parse/) experiments and one for the [10x](https://www.encodeproject.org/carts/enc4_mouse_snrna_10x/) experiments. For a simple table of the specific data used in this notebook, see the [hippocampus metadata](https://github.com/erebboah/ENC4_Mouse_SingleCell/blob/master/snrna/ref/hippocampus_minimal_metadata.tsv).

## Aims
This notebook reads in the pre-processed hippocampus Parse and 10x data and merges by technology. Then we combine technologies by CCA integration. Next we use an [external 10x brain atlas](https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x) to predict celltype labels. Finally, we perform manual celltype annotation by assigning each cluster to the celltype predicted for the majority of cells in the cluster, then adjusting the labels as we see fit.

## Results
Seurat CCA works well for integrating the 3 types of experiments: Parse standard, Parse deep, and 10x multiome. We decided on 3 levels of annotation: `gen_celltype` for general celltypes (e.g. "Neuron"), `celltypes` for higher resolution (e.g. "Inhibitory"), and `subtypes` for the highest resolution of celltype annotations (e.g. "Pvalb"). The external atlas does not separate its oligodendrocytes into OPCs, MFOLs, and MOLs, so we check marker genes and assign those cell type labels manually.

In [16]:
# TODO
# script that converts outputs to formats that austin has organized
# explain space requirements
# "shallow" --> "standard"
# store outputs in synapse

In [1]:
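library(Matrix)
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(viridis))
library(RColorBrewer) # provides brewer.pal(), used for the cluster palettes in the plotting cells
library(glmGamPoi)

# allow up to 8000 MiB of global objects to be exported to future workers
options(future.globals.maxSize = 8000 * 1024^2)
# NOTE: this only creates a variable named future.seed; it does not change how the future package handles random numbers
future.seed=TRUE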

setwd("/share/crsp/lab/seyedam/share/enc4_mouse/snrna/")
meta = read.delim("ref/enc4_mouse_snrna_metadata.tsv")

"running command 'timedatectl' had status 1"


# Functions

In [19]:
# read in sparse matrix and assign row and column names
get_counts = function(batch){
    counts = readMM(paste0("scrublet/",batch,"_matrix.mtx"))
    barcodes = read.delim(paste0("scrublet/",batch,"_barcodes_scrublet.tsv"),header = F, 
                          col.names=c("cellID","doublet_scores","doublets"))
    
    features = read.delim(paste0("scrublet/",batch,"_genes.tsv"),header = F) 
    rownames(counts) = features$V1 
    colnames(counts) = barcodes$cellID
    out = counts
}
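The reading step can be tried on a toy example without the real `scrublet/` files. The sketch below writes a small matrix to a temporary Matrix Market file, reads it back with `readMM`, and attaches gene and cell names the same way `get_counts` does. The gene names, cell IDs, and temporary file are made up for illustration.

```r
library(Matrix)

# Toy input: 3 genes x 2 cells (all names and values are hypothetical)
toy <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 2, 7), dims = c(3, 2))
mtx_file <- tempfile(fileext = ".mtx")
writeMM(toy, mtx_file)

# readMM() returns a matrix without dimnames, so they are assigned afterwards
counts <- readMM(mtx_file)
rownames(counts) <- c("GeneA", "GeneB", "GeneC")
colnames(counts) <- c("cell1.LIB1", "cell2.LIB1")

dim(counts)
counts["GeneB", "cell2.LIB1"]
```

Because the names are attached by position, the gene and barcode files must list entries in the same order as the rows and columns of the `.mtx` file.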



In [20]:
# read in associated metadata
get_metadata = function(batch){
    barcodes = read.delim(paste0("scrublet/",batch,"_barcodes_scrublet.tsv"),header = F, 
                          col.names=c("cellID","doublet_scores","doublets"))
    barcodes$library_accession = do.call("rbind", strsplit(barcodes$cellID, "[.]"))[,2]
    barcodes = left_join(barcodes,meta,by = "library_accession")
    out = barcodes
}
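As a small illustration of the metadata step, the sketch below pulls the library accession out of each cell ID and attaches library-level metadata with a base-R lookup. It assumes, as `get_metadata` does, that the accession is the second dot-separated field of the cell ID; the IDs and metadata values here are invented.

```r
# Hypothetical cell IDs: the second dot-separated field is the library accession
cell_ids <- c("AAAC.LIB1", "AAAG.LIB1", "AAAT.LIB2")
library_accession <- vapply(strsplit(cell_ids, "[.]"), `[`, character(1), 2)

# Toy library-level metadata (made-up values)
lib_meta <- data.frame(library_accession = c("LIB1", "LIB2"),
                       technology = c("Parse", "10x"),
                       stringsAsFactors = FALSE)

# Left join that keeps one row per cell, in the original cell order
barcodes <- data.frame(cellID = cell_ids,
                       library_accession = library_accession,
                       stringsAsFactors = FALSE)
barcodes$technology <- lib_meta$technology[match(barcodes$library_accession,
                                                 lib_meta$library_accession)]
barcodes
```

Row order matters here because `seurat_obj` attaches the metadata with `cbind`, which matches rows by position rather than by cell ID.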



In [21]:
# merge the counts across experimental "batches"
# for example we sequenced 2 Parse "deep" libraries that should be combined into 1 counts matrix
# the technical batch effects between the standard and deep Parse libraries (and between the Parse and 10x libraries) require CCA integration

merge_counts = function(batches_list){
    matrix_list = list()
    for (i in 1:length(batches_list)){
        batch = batches_list[i]
        matrix_list[[i]] = get_counts(batch)
    }
    
    if (length(batches_list) < 2){
       matrix = matrix_list[[1]] 
       out = matrix
    } else {
        matrix = matrix_list[[1]]
        for (j in 2:length(batches_list)){
            matrix = RowMergeSparseMatrices(matrix,matrix_list[[j]])
        }
        out = matrix
    }
}
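To make the merging step concrete, here is a sketch of what a row-wise merge of two count matrices has to do when their gene sets differ: take the union of genes, pad the missing genes with zeros, and bind the cells column-wise. It uses only the `Matrix` package and illustrates the idea; it is not the implementation of Seurat's `RowMergeSparseMatrices`, and `row_merge_sketch` is a hypothetical helper.

```r
library(Matrix)

row_merge_sketch <- function(a, b) {
  all_genes <- union(rownames(a), rownames(b))
  pad <- function(m) {
    absent <- setdiff(all_genes, rownames(m))
    if (length(absent) > 0) {
      # all-zero sparse block for genes this matrix does not contain
      zeros <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                            dims = c(length(absent), ncol(m)),
                            dimnames = list(absent, colnames(m)))
      m <- rbind(m, zeros)
    }
    m[all_genes, , drop = FALSE]  # same gene order in every matrix
  }
  cbind(pad(a), pad(b))
}

# Toy matrices with partially overlapping genes (made-up values)
a <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(3, 4), dims = c(2, 2),
                  dimnames = list(c("g1", "g2"), c("c1", "c2")))
b <- sparseMatrix(i = c(1, 2), j = c(1, 1), x = c(5, 6), dims = c(2, 1),
                  dimnames = list(c("g2", "g3"), "c3"))

merged <- Reduce(row_merge_sketch, list(a, b))
as.matrix(merged)
```

`Reduce` returns the single element unchanged when the list has length one, which covers the same case as the `length(batches_list) < 2` branch.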

In [22]:
# merge the metadata across experimental "batches"
merge_metadata = function(batches_list){
    meta_list = list()
    for (i in 1:length(batches_list)){
        batch = batches_list[i]
        meta_list[[i]] = get_metadata(batch)
    }
    
    if (length(batches_list) < 2){
       meta = meta_list[[1]] 
       out = meta
    } else {
        meta = meta_list[[1]]
        for (j in 2:length(batches_list)){
            meta = rbind(meta,meta_list[[j]])
        }
        out = meta
    }
}


In [23]:
# make seurat object
seurat_obj = function(counts,metadata){
    obj = CreateSeuratObject(counts = counts, min.cells = 0, min.features = 0)
    obj@meta.data = cbind(obj@meta.data,metadata)
    obj[["percent.mt"]] = PercentageFeatureSet(obj, pattern = "^mt-")
    obj[["percent.ribo"]] <- PercentageFeatureSet(obj, pattern = "^Rp[sl][[:digit:]]|^Rplp[[:digit:]]|^Rpsa")
    out = obj
}
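The two patterns passed to `PercentageFeatureSet` can be checked on a handful of gene names. The sketch below shows which names each regular expression selects and how the percentage is computed for one nucleus; the gene list and counts are toy values chosen for illustration.

```r
genes <- c("mt-Nd1", "mt-Co1", "Rps6", "Rpl13a", "Rplp0", "Rpsa", "Mtor", "Rpgr", "Gapdh")

# Same patterns as in seurat_obj(); both are case sensitive
is_mito <- grepl("^mt-", genes)
is_ribo <- grepl("^Rp[sl][[:digit:]]|^Rplp[[:digit:]]|^Rpsa", genes)

# Toy counts for one nucleus (made-up values), in the same gene order
counts <- c(2, 3, 10, 5, 5, 5, 20, 10, 40)
percent_mt   <- 100 * sum(counts[is_mito]) / sum(counts)
percent_ribo <- 100 * sum(counts[is_ribo]) / sum(counts)

genes[is_mito]
genes[is_ribo]
c(percent_mt = percent_mt, percent_ribo = percent_ribo)
```

Note that `Mtor` and `Rpgr` are not selected: the mitochondrial pattern requires the lowercase `mt-` prefix, and the ribosomal pattern requires a digit after `Rps`/`Rpl` or one of the two named exceptions.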


# Read in data
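Use the functions defined above to create the Seurat objects: Parse standard, Parse deep, and 10x. In the cells below the 10x lines are commented out, so only the two Parse objects are built. Also make sure to get the associated metadata, which includes the QC filter information.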

In [24]:
meta = meta[meta$tissue == "Hippocampus",]

In [25]:
# get the experimental batches for 10x, Parse standard, and Parse deep
tenx_batches = unique(meta$experiment_batch[meta$technology == "10x"])

parse_standard_batches = unique(meta$experiment_batch[meta$technology == "Parse" & 
                                              meta$depth1 == "shallow"])

parse_deep_batches = unique(meta$experiment_batch[meta$technology == "Parse" & 
                                                  meta$depth1 == "deep"])

In [26]:
#tenx_counts = merge_counts(tenx_batches)
#tenx_meta = merge_metadata(tenx_batches)

parse_standard_counts = merge_counts(parse_standard_batches)
parse_standard_meta = merge_metadata(parse_standard_batches)

parse_deep_counts = merge_counts(parse_deep_batches)
parse_deep_meta = merge_metadata(parse_deep_batches)


# Make Seurat objects

In [27]:
#obj_10x = seurat_obj(tenx_counts, 
#                     tenx_meta)

In [28]:
obj_parse_standard = seurat_obj(parse_standard_counts, 
                                parse_standard_meta)

"Non-unique features (rownames) present in the input matrix, making unique"


In [29]:
obj_parse_deep = seurat_obj(parse_deep_counts, 
                            parse_deep_meta)

"Non-unique features (rownames) present in the input matrix, making unique"


# Filter
Use the QC thresholds in the metadata to filter by the number of UMIs and the number of genes detected per nucleus, as well as by doublet score and percent mitochondrial gene expression. See [detailed metadata](https://github.com/erebboah/ENC4_Mouse_SingleCell/blob/master/snrna/ref/enc4_mouse_snrna_metadata.tsv) for more information.

In [30]:
#obj_10x <- subset(obj_10x, 
#                  subset = nCount_RNA > unique(obj_10x$lower_nCount_RNA) & 
#                  nCount_RNA < unique(obj_10x$upper_nCount_RNA)  & 
#                  nFeature_RNA > unique(obj_10x$lower_nFeature_RNA) & 
#                  doublet_scores < unique(obj_10x$upper_doublet_scores) & 
#                  percent.mt < unique(obj_10x$upper_percent.mt)) 

obj_parse_standard <- subset(obj_parse_standard, 
                            subset = nCount_RNA > unique(obj_parse_standard$lower_nCount_RNA) & 
                            nCount_RNA < unique(obj_parse_standard$upper_nCount_RNA)  & 
                            nFeature_RNA > unique(obj_parse_standard$lower_nFeature_RNA) & 
                            doublet_scores < unique(obj_parse_standard$upper_doublet_scores) & 
                            percent.mt < unique(obj_parse_standard$upper_percent.mt))

obj_parse_deep <- subset(obj_parse_deep, 
                         subset = nCount_RNA > unique(obj_parse_deep$lower_nCount_RNA) & 
                         nCount_RNA < unique(obj_parse_deep$upper_nCount_RNA)  & 
                         nFeature_RNA > unique(obj_parse_deep$lower_nFeature_RNA) & 
                         doublet_scores < unique(obj_parse_deep$upper_doublet_scores) & 
                         percent.mt < unique(obj_parse_deep$upper_percent.mt))
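The filter logic can be sketched on a plain data frame. The column names follow the metadata fields used above, but every value and threshold below is made up for illustration.

```r
# Toy per-nucleus QC table (hypothetical values)
qc <- data.frame(
  cellID         = c("n1", "n2", "n3", "n4"),
  nCount_RNA     = c(300, 2500, 90000, 4000),
  nFeature_RNA   = c(200, 1200, 6000, 1500),
  doublet_scores = c(0.05, 0.10, 0.12, 0.60),
  percent.mt     = c(0.2, 0.5, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds, one scalar per criterion
thr <- list(lower_nCount_RNA = 500, upper_nCount_RNA = 50000,
            lower_nFeature_RNA = 500, upper_doublet_scores = 0.25,
            upper_percent.mt = 1)

# A nucleus is kept only if it passes every criterion
keep <- with(qc,
  nCount_RNA > thr$lower_nCount_RNA &
  nCount_RNA < thr$upper_nCount_RNA &
  nFeature_RNA > thr$lower_nFeature_RNA &
  doublet_scores < thr$upper_doublet_scores &
  percent.mt < thr$upper_percent.mt)

qc$cellID[keep]
```

In the notebook the thresholds come from `unique()` on a metadata column, which yields a single value only when every nucleus in the object shares the same threshold. If an object mixed thresholds, `unique()` would return several values and the comparison would recycle them, so it may be worth checking that each call returns length 1.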

             

# SCT + CCA normalization and integration 
Use the standard Seurat pipeline to perform SCT normalization and integration. Create a list of the Seurat objects, use the `glmGamPoi` package to speed up SCT (`method = "glmGamPoi"`), and save the pre-integration data in the `seurat` folder. Use the Parse standard Seurat object as the reference dataset because it contains all timepoints, while the 10x data only contains 2 timepoints. The integration takes a while, so make sure to save the object after it finishes.

In [32]:
#obj.list = list(obj_10x,obj_parse_standard,obj_parse_deep)
obj.list = list(obj_parse_standard,obj_parse_deep)

obj.list <- lapply(X = obj.list, FUN = SCTransform, method = "glmGamPoi", 
                   vars.to.regress = c("percent.mt","nFeature_RNA"), verbose = F)

saveRDS(obj.list,file="seurat/hippocampus_Parse_10x_to_integrate.rds")

In [40]:
obj.list = readRDS("seurat/hippocampus_Parse_10x_to_integrate.rds")

features <- SelectIntegrationFeatures(object.list = obj.list, nfeatures = 3000, verbose = F)
obj.list <- PrepSCTIntegration(object.list = obj.list, anchor.features = features, verbose = F)

#names(obj.list) = c("10x","standard","deep")
names(obj.list) = c("standard","deep")

reference_dataset <- which(names(obj.list) == "standard")

anchors <- FindIntegrationAnchors(object.list = obj.list, normalization.method = "SCT", 
    anchor.features = features, reference = reference_dataset, verbose = F)
combined.sct <- IntegrateData(anchorset = anchors, normalization.method = "SCT", verbose = F)

saveRDS(combined.sct,file="seurat/hippocampus_Parse_10x_integrated.rds")


"UNRELIABLE VALUE: One of the 'future.apply' iterations ('future_lapply-1') unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore"."


# Dimensionality reduction and clustering
Standard Seurat processing with PCA, UMAP, SNN graph construction, and clustering. Use high clustering resolution to separate smaller subtypes.

In [41]:
combined.sct = readRDS("seurat/hippocampus_Parse_10x_integrated.rds")
DefaultAssay(combined.sct) = "integrated" # Make sure to cluster on the integrated assay


## PCA

In [42]:
DefaultAssay(combined.sct) = "integrated"
combined.sct <- RunPCA(combined.sct, verbose = T, npcs = 50)


PC_ 1 
Positive:  Kcnip4, Rbfox1, Celf2, Lrrtm4, Grin2a, Tenm2, Lingo2, Dlg2, Dab1, Gria1 
	   Nrg3, Slit3, Epha6, Erc2, Ryr2, Kcnd2, Opcml, Kalrn, Fam155a, Rfx3 
	   Dlgap2, Meg3, Dlgap1, Ptprd, Ppfia2, Hs6st3, Lrp1b, Fam19a1, Grm7, Galnt17 
Negative:  Slc1a2, Gpc5, Slc1a3, Npas3, Atp1a2, Ntm, Msi2, Ptprz1, Mertk, Rora 
	   Nrxn1, Rgs20, Rorb, Plpp3, Gm3764, Ptprt, Prex2, Qk, Grm3, Sox6 
	   Htra1, Farp1, Gpc6, Nwd1, Gli3, Slc39a12, Daam2, Rmst, Prdm16, Phka1 
PC_ 2 
Positive:  Erbb4, Nxph1, Sox2ot, Grip1, Nrxn3, Grik1, Meg3, Pde4b, Lhfpl3, Rbms3 
	   Sgcz, Pcdh15, Dpp10, Galntl6, Kcnmb2, Plp1, Lrrc4c, Astn2, Ptprt, Etl4 
	   Alk, Elmo1, Snhg11, Pcdh9, Gabrg3, Ptprm, Mbp, Bcas1, Cntnap4, Zfp536 
Negative:  Trpm3, Slc1a2, Slc4a4, Rfx3, Kcnip4, Zbtb20, Glis3, Slc1a3, Celf2, Dab1 
	   Maml2, Lrp1b, Atp1a2, Lingo2, Ahcyl2, Prex2, Lrrtm4, Gpc5, Gabrb1, Slit3 
	   Kirrel3, Dgkh, Grin2a, Fam19a2, Ryr2, Cadm2, Ccdc85a, Btbd9, Kalrn, Sparcl1 
PC_ 3 
Positive:  Dock10, Plcl1, St18, Plp1, Mbp, L

## UMAP and clustering

In [44]:
combined.sct <- RunUMAP(combined.sct, reduction = "pca", dims = 1:30,verbose = F)
combined.sct <- FindNeighbors(combined.sct, reduction = "pca", dims = 1:30,verbose = F)

"The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session"


In [45]:
combined.sct <- FindClusters(combined.sct,resolution=1.6,verbose = F)

# Plotting: check integration and clustering

In [55]:
nclusters = length(unique(combined.sct$seurat_clusters))
cluster_cols = colorRampPalette(brewer.pal(9,"Set1"))(nclusters)

In [168]:
pdf(file="plots/hippocampus_umap_technologies.pdf",
    width = 20, height = 8)
p1 <- DimPlot(combined.sct, reduction = "umap", group.by = "technology")
p2 <- DimPlot(combined.sct, reduction = "umap", label = TRUE, repel = TRUE, cols = cluster_cols)
p1 + p2

dev.off()

In [169]:
pdf(file="plots/hippocampus_experiment_distribution.pdf",
    width = 20, height = 6)
DimPlot(combined.sct, reduction = "umap", group.by = "seurat_clusters",split.by = "depth2", label = TRUE, label.size = 6, repel = TRUE, shuffle = T,cols = cluster_cols)

ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=depth2)) + geom_bar(position = "fill") + 
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20))

dev.off()


In [170]:
combined.sct$sample = factor(combined.sct$sample, levels=paste0("HC_",rep(c("10","14","25","36","2m","18m"),each=4),rep(c("_M","_F"),each=2),c("_1","_2")))

pdf(file="plots/hippocampus_umaps_sample_barplot.pdf",
    width = 20, height = 10)
p1=DimPlot(combined.sct, reduction = "umap", group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE, 
          cols = cluster_cols)
p2=ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=sample)) + geom_bar(position = "fill") +
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20)) + coord_flip()
gridExtra::grid.arrange(
  p1, p2,
  widths = c(2,1.6),
  layout_matrix = rbind(c(1, 2)))

dev.off()

In [171]:
pdf(file="plots/hippocampus_umaps_age_sex_distribution.pdf",
    width = 18, height = 19)
p1=DimPlot(combined.sct, reduction = "umap", group.by = "timepoint", label = TRUE, label.size = 5, repel = TRUE)
p2 = ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=timepoint)) + geom_bar(position = "fill") + 
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20)) + coord_flip()

p3=DimPlot(combined.sct, reduction = "umap", group.by = "sex", label = TRUE, label.size = 5, repel = TRUE, shuffle = T)
p4 = ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=sex)) + geom_bar(position = "fill") + 
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20)) + coord_flip()
gridExtra::grid.arrange(
  p1, p2, p3, p4,
  widths = c(2,1),
  layout_matrix = rbind(c(1, 2),
                        c(3, 4)))

dev.off()

In [172]:
# Check that the Vip+ and Sncg+ clusters are separate from each other, and likewise the Sst+ and Pvalb+ clusters.
pdf(file="plots/hippocampus_inhib_neuron_featureplots.pdf",
    width = 35, height = 20)
DefaultAssay(combined.sct) = "SCT" # do NOT use integrated assay to visualize gene expression
FeaturePlot(combined.sct, pt.size = 0.1, order = T,
            features =c("Sst","Pvalb",
                        "Vip","Sncg"), ncol =2)  & scale_colour_gradientn(colours = viridis(11)) & 
                        NoAxes()& 
                        theme(text = element_text(size = 20))

dev.off()

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.




In [48]:
saveRDS(combined.sct,file="seurat/hippocampus_Parse_10x_integrated.rds")


# Predict cell types from references
[10x brain atlas](https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x) and [associated paper](https://www.sciencedirect.com/science/article/pii/S0092867421005018?via%3Dihub).
We subsampled 1000 nuclei per `cell_type_alias_label`. The subsampled reference is stored at `/share/crsp/lab/seyedam/share/enc4_mouse/snrna/ref/external_data/brain_atlas_subsampled_sct_pca.rds`.
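
The subsampling script itself is not part of this notebook. As a rough sketch of the idea, drawing at most `n` cells per label can be done in base R as follows; `subsample_per_label` is a hypothetical helper and the labels are toy data.

```r
subsample_per_label <- function(cell_ids, labels, n, seed = 1) {
  set.seed(seed)
  by_label <- split(cell_ids, labels)
  picked <- lapply(by_label, function(ids) {
    # keep small groups whole, draw n without replacement from larger ones
    if (length(ids) <= n) ids else sample(ids, n)
  })
  unlist(picked, use.names = FALSE)
}

# Toy example: 6 Sst, 3 Pvalb, and 1 Vip cell, capped at 4 per label
cell_ids <- paste0("cell", 1:10)
labels   <- c(rep("Sst", 6), rep("Pvalb", 3), "Vip")
kept <- subsample_per_label(cell_ids, labels, n = 4)
table(labels[match(kept, cell_ids)])
```

Capping each label keeps rare types fully represented while preventing abundant types from dominating the reference.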

In [49]:
referencehippocampus = readRDS("ref/external_data/brain_atlas_subsampled_sct_pca.rds")

DefaultAssay(combined.sct) <- "SCT"
DefaultAssay(referencehippocampus) <- "SCT"

transfer_anchors <- FindTransferAnchors(
    reference = referencehippocampus,
    query = combined.sct,
    reference.assay = "SCT",
    normalization.method = "SCT",
    reference.reduction = "pca",
    recompute.residuals = FALSE,
    dims = 1:50,
    verbose = F)


In [50]:
predictions <- TransferData(
    anchorset = transfer_anchors, 
    refdata = referencehippocampus$subclass_label, 
    weight.reduction = combined.sct[['pca']],
    dims = 1:50,
    verbose = F)

combined.sct <- AddMetaData(
    object = combined.sct,
    metadata = predictions)
    
combined.sct$atlas_predictions = combined.sct$predicted.id

In [51]:
saveRDS(combined.sct,file="seurat/hippocampus_Parse_10x_integrated.rds")


# Add cell cycle scores

In [53]:
load("ref/mouse_cellcycle_genes.rda")
DefaultAssay(combined.sct) = "SCT"
combined.sct<- CellCycleScoring(combined.sct, s.features = m.s.genes, g2m.features = m.g2m.genes)


# Plotting: check predicted celltypes

In [173]:
pdf(file="plots/hippocampus_umaps_predictions.pdf",
    width = 15, height = 12)
nclusters = length(unique(combined.sct$atlas_predictions))
DimPlot(combined.sct, reduction = "umap", group.by = "atlas_predictions",
        label = TRUE, label.size = 6, repel = TRUE,cols = colorRampPalette(brewer.pal(9,"Set1"))(nclusters)) + NoLegend()
dev.off()

In [174]:
pdf(file="plots/hippocampus_qc_featureplots.pdf",
    width = 25, height = 10)
FeaturePlot(combined.sct, pt.size = 0.1, order = T,
            features =c("nFeature_RNA",
                        "percent.mt",
                        "percent.ribo",
                        "doublet_scores",
                        "G2M.Score"), ncol =3)  & scale_colour_gradientn(colours = viridis(11)) & 
                        NoAxes()& 
                        theme(text = element_text(size = 20))

dev.off()


Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.




# Rename clusters based on maximum predicted celltype

In [175]:
Idents(combined.sct) = "seurat_clusters"
mat = as.matrix(table(Idents(combined.sct), combined.sct$atlas_predictions))
ct = colnames(mat)[max.col(mat, ties.method = "first")] # break ties deterministically; max.col defaults to random tie-breaking
names(ct) = 0:(length(ct)-1)

# basically add the cluster info to the maximum predicted celltype
for (i in 1:length(unique(Idents(combined.sct))))
{
    search = paste0("\\<",names(ct)[i],"\\>")
    replace = paste0(ct[i],".",names(ct)[i])
    Idents(combined.sct) = gsub(search,replace,Idents(combined.sct))
}

combined.sct[["atlas_celltypes"]] <- Idents(combined.sct)
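The majority-vote renaming can also be written as a direct lookup, shown here on toy data as a sketch rather than a replacement for the cell above. Looking up each nucleus's cluster avoids pattern replacement altogether, which matters because a word-boundary pattern such as `\\<3\\>` would also match the standalone `3` inside an already renamed label like `L2/3 IT PPP.2`. The cluster assignments and predictions below are made up.

```r
# Toy data: cluster assignment and predicted label per nucleus (hypothetical)
clusters    <- factor(c("0", "0", "0", "1", "1", "2", "2", "2"))
predictions <- c("DG", "DG", "CA3", "Oligo", "Oligo",
                 "L2/3 IT PPP", "L2/3 IT PPP", "DG")

# Rows are clusters, columns are predicted labels
mat <- as.matrix(table(clusters, predictions))

# Most frequent label per cluster; ties resolved by taking the first column
majority <- colnames(mat)[max.col(mat, ties.method = "first")]
names(majority) <- rownames(mat)

# Look up each nucleus's cluster directly instead of pattern-replacing labels
cl <- as.character(clusters)
atlas_celltypes <- paste0(majority[cl], ".", cl)
unique(atlas_celltypes)
```

Naming `majority` by `rownames(mat)` ties each label to its cluster ID explicitly, rather than assuming the clusters are numbered `0` to `n-1` in table order.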

In [176]:
pdf(file="plots/hippocampus_umaps_maximum_predictions.pdf",
    width = 15, height = 12)
nclusters = length(unique(combined.sct$atlas_celltypes))
DimPlot(combined.sct, reduction = "umap", group.by = "atlas_celltypes",
        label = TRUE, label.size = 6, repel = TRUE,cols = colorRampPalette(brewer.pal(9,"Set1"))(nclusters))  & NoLegend()

dev.off()

# Manual celltype annotation

In [178]:
combined.sct$subtypes = combined.sct$atlas_celltypes

In [179]:
# dot plot of some marker genes
pdf(file="plots/hippocampus_dotplot.pdf",
    width = 12, height = 12)
Idents(combined.sct) = "subtypes"
DotPlot(combined.sct, features = c("Dnah6","Dnah12", # ependymal
                                   "Prox1", # early DG
                                   "Pcdh15","Sox6", # OPC
                                   "Plp1","Mbp", # MFOL
                                  "Mag","Mog",# MOL
                                  "Tmem119","Csf1r","Cx3cr1")) # microglia
dev.off()

## Fix oligodendrocyte clusters

In [180]:
combined.sct$subtypes = gsub("\\<Oligo.20\\>","OPC.20",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Oligo.11\\>","OPC.11",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Oligo.23\\>","MFOL.23",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Oligo.9\\>","MOL.9",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Oligo.44\\>","MOL.44",combined.sct$subtypes) 


## Add early DG and ependymal clusters

In [181]:
combined.sct$subtypes = gsub("\\<DG.8\\>","DG_early.8",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Vip.7\\>","DG_early.7",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Micro-PVM.41\\>","DG_early.41",combined.sct$subtypes)

combined.sct$subtypes = gsub("\\<CA3.39\\>","Ependymal.39",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Sncg.45\\>","Ependymal.45",combined.sct$subtypes) 


In [182]:
# get rid of cluster #
combined.sct$subtypes = do.call("rbind", strsplit(as.character(combined.sct$subtypes), "[.]"))[,1]
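
The `strsplit` plus `rbind` approach relies on every label containing exactly one dot. A sketch of an alternative that removes only the trailing cluster number, using example labels of the same shape as those above:

```r
labels <- c("OPC.20", "DG_early.8", "Micro-PVM.41", "L2/3 IT PPP.5")

# Remove the final ".<digits>" only; the rest of the label is left untouched
stripped <- sub("[.][0-9]+$", "", labels)
stripped
```

Anchoring the pattern at the end of the string means a label that happened to contain an earlier dot would still keep everything before the cluster suffix.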

In [183]:
combined.sct$subtypes = gsub("\\<Astro\\>","Astrocyte",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Endo\\>","Endothelial",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Sst Chodl\\>","Sst",combined.sct$subtypes) 


In [184]:
pdf(file="plots/hippocampus_prelim_subtypes.pdf",
    width = 15, height = 10)
# clusters and celltypes
nclusters = length(unique(combined.sct$subtypes))
DimPlot(combined.sct, reduction = "umap", group.by = "subtypes", label = TRUE, label.size = 8, repel = TRUE, 
          cols = colorRampPalette(brewer.pal(9,"Set1"))(nclusters))
dev.off()

# Add celltypes and gen_celltype metadata
Based on the subtypes annotation, we can group the cells into broader categories.

In [185]:
combined.sct$celltypes = combined.sct$subtypes

combined.sct$celltypes = gsub("\\<CA1-ProS\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<CA3\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<CR\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<CT SUB\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<Car3\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<DG\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<DG_early\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<L2 IT ENTm\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<L2/3 IT PPP\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<L3 IT ENT\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<L6 CT CTX\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<NP SUB\\>","Excitatory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<SUB-ProS\\>","Excitatory",combined.sct$celltypes)

combined.sct$celltypes = gsub("\\<Lamp5\\>","Inhibitory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<Pvalb\\>","Inhibitory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<Sncg\\>","Inhibitory",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<Sst\\>","Inhibitory",combined.sct$celltypes)

combined.sct$celltypes = gsub("\\<MFOL\\>","Oligodendrocyte",combined.sct$celltypes)
combined.sct$celltypes = gsub("\\<MOL\\>","Oligodendrocyte",combined.sct$celltypes)

combined.sct$celltypes = gsub("\\<SMC-Peri\\>","Smooth_muscle",combined.sct$celltypes)

combined.sct$celltypes = gsub("\\<Micro-PVM\\>","Microglia",combined.sct$celltypes)
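
The same grouping can be expressed as a lookup table, which keeps the mapping in one place and avoids any interaction between successive regular-expression replacements. The sketch below uses a partial table copied from the `gsub` calls above; `map_labels` is a hypothetical helper, and labels that are not in the table pass through unchanged.

```r
# Partial lookup table taken from the gsub calls above
subtype_to_celltype <- c(
  "CA1-ProS" = "Excitatory", "CA3" = "Excitatory", "DG" = "Excitatory",
  "DG_early" = "Excitatory",
  "Lamp5" = "Inhibitory", "Pvalb" = "Inhibitory", "Sst" = "Inhibitory",
  "MFOL" = "Oligodendrocyte", "MOL" = "Oligodendrocyte",
  "SMC-Peri" = "Smooth_muscle", "Micro-PVM" = "Microglia"
)

map_labels <- function(x, lookup) {
  mapped <- unname(lookup[x])      # NA for labels missing from the table
  ifelse(is.na(mapped), x, mapped) # keep unmapped labels as they are
}

subtypes <- c("DG", "DG_early", "Pvalb", "MOL", "OPC", "Astrocyte", "Micro-PVM")
celltypes <- map_labels(subtypes, subtype_to_celltype)
celltypes
```

The same helper could be reused for the `gen_celltype` level with a second table.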


In [186]:
table(combined.sct$celltypes)


      Astrocyte     Endothelial       Ependymal      Excitatory      Inhibitory 
           4561             521             244           29383            3157 
      Microglia             OPC Oligodendrocyte   Smooth_muscle            VLMC 
            575            1849            1935             118             270 

In [187]:
combined.sct$gen_celltype = combined.sct$celltypes

combined.sct$gen_celltype = gsub("\\<Excitatory\\>","Neuron",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Inhibitory\\>","Neuron",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Astrocyte\\>","Glial",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<OPC\\>","Glial",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Oligodendrocyte\\>","Glial",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Microglia\\>","Myeloid",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Ependymal\\>","Stromal",combined.sct$gen_celltype)



In [188]:
table(combined.sct$gen_celltype)


  Endothelial         Glial       Myeloid        Neuron Smooth_muscle 
          521          8345           575         32540           118 
      Stromal          VLMC 
          244           270 

# Plotting the 3 levels of annotations

In [189]:
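color_ref = read.csv("ref/enc4_mouse_snrna_celltypes_c2c12.csv")
# the tissue column is read in as "X...tissue", probably because the CSV starts with a byte-order mark
gen_celltype_colors = unique(color_ref[color_ref$X...tissue == "Hippocampus",c("gen_celltype","gen_celltype_color")])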
rownames(gen_celltype_colors) = gen_celltype_colors$gen_celltype
gen_celltype_colors = gen_celltype_colors[sort(unique(combined.sct$gen_celltype)),]

pdf(file="plots/hippocampus_gen_celltype.pdf",
    width = 15, height = 10)

DimPlot(combined.sct, reduction = "umap", 
        group.by = "gen_celltype", 
        label = TRUE, label.size = 8, repel = TRUE,
       cols = gen_celltype_colors$gen_celltype_color)

dev.off()

In [190]:
celltype_colors = unique(color_ref[color_ref$X...tissue == "Hippocampus",c("celltypes","celltype_color")])
rownames(celltype_colors) = celltype_colors$celltypes
celltype_colors = celltype_colors[sort(unique(combined.sct$celltypes)),]

pdf(file="plots/hippocampus_celltypes.pdf",
    width = 15, height = 10)

DimPlot(combined.sct, reduction = "umap", 
        group.by = "celltypes", 
        label = TRUE, label.size = 8, repel = TRUE,
       cols = celltype_colors$celltype_color)

dev.off()

In [191]:
subtype_colors = unique(color_ref[color_ref$X...tissue == "Hippocampus",c("subtypes","subtype_color")])
rownames(subtype_colors) = subtype_colors$subtypes
subtype_colors = subtype_colors[sort(unique(combined.sct$subtypes)),]

pdf(file="plots/hippocampus_subtypes.pdf",
    width = 15, height = 10)

DimPlot(combined.sct, reduction = "umap", 
        group.by = "subtypes", 
        label = TRUE, label.size = 8, repel = TRUE,
       cols = subtype_colors$subtype_color)

dev.off()

## Proportion plot of celltypes over timepoint

In [192]:
combined.sct_parse = subset(combined.sct,subset= technology == "Parse")

samples = sort(unique(combined.sct_parse$timepoint))
dflist = list()
for (i in 1:length(unique(combined.sct_parse$timepoint))){
  tp=combined.sct_parse@meta.data[combined.sct_parse@meta.data$timepoint == samples[i],]
  #tp=tp[complete.cases(tp),]
  tp_df=as.data.frame(table(tp$celltypes))
  tp_df$percentage=tp_df$Freq/nrow(tp)
  tp_df$timepoint=rep(i,nrow(tp_df))
  dflist[[i]]=tp_df
}
df = do.call(rbind, dflist)
df <- df[order(df$timepoint),]
colnames(df)= c("celltypes","Freq","percentage","timepoint")
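
The per-timepoint proportions computed by the loop can be cross-checked with `prop.table`, sketched here on toy metadata with made-up values. Unlike the loop, this version keeps the timepoint labels instead of converting them to an integer index.

```r
# Toy metadata (hypothetical values): one row per nucleus
md <- data.frame(
  timepoint = c("PND_10", "PND_10", "PND_10", "PND_10", "PNM_02", "PNM_02"),
  celltypes = c("Excitatory", "Excitatory", "Excitatory", "Astrocyte",
                "Excitatory", "Astrocyte"),
  stringsAsFactors = FALSE
)

# Row-wise proportions: each timepoint sums to 1
props <- prop.table(table(md$timepoint, md$celltypes), margin = 1)

# Long format, one row per timepoint and celltype combination
df_check <- as.data.frame(props, stringsAsFactors = FALSE)
colnames(df_check) <- c("timepoint", "celltypes", "percentage")
df_check
```

The loop indexes timepoints by their position in `sort(unique(...))`, so the axis labels in the plot are correct only if alphabetical order matches chronological order. That appears to hold for labels of the form `PND_10` through `PNM_18-20`, assuming those are the values stored in `timepoint`.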



In [193]:
pdf(file="plots/hippocampus_timepoint_celltypes_proportions.pdf",
    width = 15, height = 10)

ggplot(df, aes(x=timepoint, y=percentage, fill=celltypes)) + 
  geom_area()  +
  scale_fill_manual(values= celltype_colors$celltype_color) + 
  scale_x_continuous(breaks = c(1,2,3,4,5,6),labels= c("PND_10","PND_14",
                                                         "PND_25","PND_36","PNM_02","PNM_18-20"))+
  scale_y_continuous(breaks = c(0,0.1,0.2,0.3,0.4,0.5,
                                0.6,0.7,0.8,0.9,1.0),labels= c("0%","10%","20%","30%","40%","50%","60%","70%","80%","90%","100%")) + 
theme_minimal()+theme(text = element_text(size = 30)) + 
theme(axis.text.x = element_text(size = 30))  + 
theme(axis.text.y = element_text(size = 30))   + 
theme(axis.text.x = element_text(angle = 45, hjust = 1))
  
dev.off()

In [167]:
saveRDS(combined.sct,file="seurat/hippocampus_Parse_10x_integrated.rds")


In [2]:
sessionInfo()


R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /opt/apps/OpenBLAS/0.3.6/lib/libopenblas_zenp-r0.3.6.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] glmGamPoi_1.6.0    viridis_0.6.2      viridisLite_0.4.0  forcats_0.5.1     
 [5] stringr_1.4.0      dplyr_1.0.9        purrr_0.3.4        readr_2.1.2       
 [9] tidyr_1.2.0        tibble_3.1.7       ggplot2_3.3.6      tidyverse_1.3.1   
[13] SeuratObject_4.0.4 Seurat_4.1.0       Matrix_1.3-4      

loaded via a namespace (and not attached):
  [1] readxl_1.3.1                uuid_1.0-3                 
  [3] backports_1.4.1             plyr_1.8.7                 
  [5] igraph_1.2.11               repr_1.1.4                 
  [7] lazyeval_0.2.2              splines_4.1.2              
  [9] listenv_0.8.0               scattermore_0.8            
 