# Mouse snRNA Integrative Analysis
## Gastrocnemius
### Data
- [Gastroc data table](https://github.com/erebboah/enc4_mouse/blob/master/snrna/ref/gastrocnemius_minimal_metadata.tsv)

### Aims
[integrate_parse_10x.R](https://github.com/erebboah/enc4_mouse/blob/master/snrna/scripts/integrate_parse_10x.R):
1. Read in pre-processed Parse and 10x data and merge counts matrices across experiments (within the same technology) for each tissue.
2. Filter nuclei by # genes, # UMIs, percent mitochondrial gene expression, and doublet score. See [detailed metadata](https://github.com/erebboah/enc4_mouse/blob/master/snrna/ref/enc4_mouse_snrna_metadata.tsv) for filter cutoffs. **Also filter 10x nuclei for those passing snATAC filters.**
3. Run SCTransform (SCT) on the 3 objects (Parse standard, Parse deep, and 10x) to regress out `percent.mt` and `nFeature_RNA`. Use `method = "glmGamPoi"` to speed up this step, and save the pre-integrated data in the `seurat` folder.
4. Combine Parse standard, Parse deep, and 10x data by CCA integration. Use Parse standard as the reference dataset because it contains all timepoints, while the 10x data contains only 2 timepoints.
5. Score nuclei by cell cycle using these [mouse cell cycle genes](https://github.com/erebboah/enc4_mouse/blob/master/snrna/ref/mouse_cellcycle_genes.rda) to aid in manual celltype annotation.
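
The filtering in step 2 can be sketched on a toy metadata table. This is only an illustration: the cutoff values and the `passes_qc` helper are hypothetical (the real per-experiment cutoffs are in the detailed metadata file), and the column names are assumed to match the Seurat metadata columns plotted later in this notebook.

```r
# Toy per-nucleus metadata; column names mirror those plotted in this notebook.
meta <- data.frame(
  nFeature_RNA   = c(150, 900, 2500, 1200),
  nCount_RNA     = c(300, 1500, 6000, 2500),
  percent.mt     = c(0.5, 0.2, 3.0, 0.1),
  doublet_scores = c(0.05, 0.10, 0.08, 0.60),
  row.names      = c("nuc1", "nuc2", "nuc3", "nuc4")
)

# Hypothetical cutoffs, for illustration only.
passes_qc <- function(meta, min_genes = 500, min_umis = 500,
                      max_mt = 1, max_doublet = 0.25) {
  meta$nFeature_RNA >= min_genes &
    meta$nCount_RNA >= min_umis &
    meta$percent.mt <= max_mt &
    meta$doublet_scores <= max_doublet
}

# Keep only nuclei that pass every filter.
kept <- meta[passes_qc(meta), ]
print(rownames(kept))
```

In this toy table only `nuc2` passes all four filters; the others each fail one (too few genes, high mitochondrial percentage, high doublet score).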

[predict_gastroc_celltypes.R](https://github.com/erebboah/enc4_mouse/blob/master/snrna/scripts/predict_gastroc_celltypes.R): Use an [external TA dataset](https://www.synapse.org/#!Synapse:syn21676145/files/) from [this paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7733460/pdf/41467_2020_Article_20063.pdf) to predict celltype labels.

**In this notebook**:
Manually annotate cell types by assigning each cluster the cell type predicted for the majority of its nuclei, then adjust labels by hand where needed. Find marker genes for `gen_celltype`, `celltypes`, and `subtypes` and save them in `seurat/markers`.
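
The majority-vote assignment described above can be sketched on toy data. The cluster and prediction vectors are made up, and `ties.method = "first"` is an assumption added here for reproducibility (base R's `max.col` breaks ties at random by default).

```r
# Toy cluster assignments and per-nucleus predicted labels.
clusters    <- factor(c(0, 0, 0, 1, 1, 1, 1, 2, 2))
predictions <- c("FAPs", "FAPs", "Satellite Cells",
                 "Type IIx Myonuclei", "Type IIb Myonuclei",
                 "Type IIx Myonuclei", "Type IIx Myonuclei",
                 "Endothelial Cells", "Endothelial Cells")

# Rows are clusters, columns are predicted labels, entries are nuclei counts.
mat <- as.matrix(table(clusters, predictions))

# For each cluster, pick the label with the most nuclei.
# ties.method = "first" makes the choice reproducible; the default is "random".
top <- colnames(mat)[max.col(mat, ties.method = "first")]

# Append the cluster number so each cluster keeps a distinct label.
ct <- data.frame(seurat_clusters = rownames(mat),
                 predicted_celltypes = paste0(top, ".", rownames(mat)))
print(ct)
```

Keeping the cluster number in the label lets individual clusters be relabeled by hand before the suffix is removed.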

### Results
- Seurat CCA integration works acceptably, but of all tissues this one shows the largest difference between Parse and 10x, most likely because differences in nuclei prep are most apparent in muscle.
- Cluster definition is low, especially among myonuclei.

In [1]:
library(Matrix)
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(viridis))
library(glmGamPoi)
library(RColorBrewer)
options(future.globals.maxSize = 10000 * 1024^2)
future.seed=TRUE


"running command 'timedatectl' had status 1"


In [None]:
setwd("../../enc4_mouse/snrna/")

In [2]:
setwd("/share/crsp/lab/seyedam/share/enc4_mouse/snrna/")

In [3]:
system("mkdir plots/gastrocnemius")
system("mkdir plots/gastrocnemius/qc")
system("mkdir plots/gastrocnemius/clustering")
system("mkdir plots/gastrocnemius/annotation")
system("mkdir seurat/markers/gastrocnemius")
system("mkdir ref/gastrocnemius")

# Functions

In [4]:
# Read the original 10x counts matrix (genes x barcodes) for one file accession
get_orig_counts = function(file){
    counts = readMM(paste0("counts_10x/",file,"/matrix.mtx"))
    
    barcodes = read.delim(paste0("counts_10x/",file,"/barcodes.tsv"),header = F, col.names="barcode")
    features = read.delim(paste0("counts_10x/",file,"/genes.tsv"),header = F, col.names="gene_name") 
    # Label columns with barcodes and rows with gene names
    colnames(counts) = barcodes$barcode
    rownames(counts) = features$gene_name
    return(counts)
}

In [5]:
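# Build a knee-plot table: total UMIs per barcode and each barcode's rank
knee_df = function(mtx,expt_name){
    # mtx is genes x barcodes, so per-barcode totals are column sums
    df = as.data.frame(colSums(mtx))
    colnames(df) = c("nUMI")
    df <- tibble(total = df$nUMI,
               rank = row_number(dplyr::desc(total))) %>%
    distinct() %>%
    arrange(rank)
    df$experiment = expt_name
    return(df)
}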

# QC plots

## Knee plot
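
A knee plot ranks barcodes by their total UMI count. The sketch below shows that calculation on a small made-up genes x barcodes matrix using only the `Matrix` package; the matrix values and barcode names are hypothetical.

```r
library(Matrix)

# Toy genes x barcodes count matrix (3 genes, 4 barcodes), filled column by column.
mtx <- Matrix(c(5, 0, 1,
                0, 0, 0,
                9, 3, 2,
                1, 1, 0),
              nrow = 3, sparse = TRUE,
              dimnames = list(c("g1", "g2", "g3"),
                              c("bcA", "bcB", "bcC", "bcD")))

# Total UMIs per barcode: barcodes are columns, so sum down each column.
total <- colSums(mtx)

# Rank 1 is the barcode with the most UMIs.
knee <- data.frame(barcode = names(total),
                   total   = as.numeric(total),
                   rank    = rank(-total, ties.method = "first"))
knee <- knee[order(knee$rank), ]
print(knee)
```

A barcode with zero UMIs (`bcB` here) maps to `-Inf` on a log10 axis, which is one way ggplot2 ends up warning about infinite values on a log-scaled knee plot.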

In [3]:
combined.sct = readRDS("seurat/gastrocnemius_Parse_10x_integrated.rds")
cellbend_10x = subset(combined.sct,subset=technology =="10x")
orig_parse = subset(combined.sct,subset=technology =="Parse")
parse_standard = subset(orig_parse,subset=depth1 =="shallow")
parse_deep = subset(orig_parse,subset=depth1 =="deep")

In [23]:
metadata = read.delim("ref/enc4_mouse_snrna_metadata.tsv")
metadata = metadata[metadata$technology == "10x",]
metadata = metadata[metadata$tissue == "Gastrocnemius",]

files = metadata$file_accession

orig_10x = get_orig_counts(files[1])

for (j in 2:length(files)){
    counts_adding = get_orig_counts(files[j])
    orig_10x = cbind(orig_10x, counts_adding)
}


In [24]:
cellbend_knee_plot = knee_df(cellbend_10x@assays$RNA@counts, "10x + Cellbender")
orig_knee_plot = knee_df(orig_10x, "10x")

parse_standard_knee_plot = knee_df(parse_standard@assays$RNA@counts, "Parse standard")
parse_deep_knee_plot = knee_df(parse_deep@assays$RNA@counts, "Parse deep")

pdf(file="plots/gastrocnemius/qc/experiment_kneeplots.pdf",
    width = 10, height = 8)
ggplot(rbind(cellbend_knee_plot,orig_knee_plot,parse_standard_knee_plot,parse_deep_knee_plot), 
       aes(rank, total, group = experiment, color = experiment)) +
geom_path() + 
scale_y_log10() + scale_x_log10() + annotation_logticks() +
labs(y = "Total UMI count", x = "Barcode rank", title = "Mouse gastrocnemius knee plot") + 
geom_hline(yintercept=500, linetype="dashed", color = "red", size=1)
dev.off()


"Transformation introduced infinite values in continuous y-axis"


In [25]:
pdf(file="plots/gastrocnemius/qc/experiment_violinplots.pdf",
    width = 20, height = 5)
VlnPlot(combined.sct, features = c("nFeature_RNA"), ncol = 1, split.by = "depth2",
        pt.size = 0, group.by = "sample", cols = c("#811b74","#C08DBA","#00a1e0")) + ggtitle("# genes per nucleus") +
stat_summary(fun = median, geom='point', size = 15, colour = "black", shape = 95) & theme(text = element_text(size = 20), 
                                                  axis.text.x = element_text(size = 20), 
                                                  axis.text.y = element_text(size = 20))
VlnPlot(combined.sct, features = c("nCount_RNA"), ncol = 1, split.by = "depth2",
        pt.size = 0, group.by = "sample", cols = c("#811b74","#C08DBA","#00a1e0")) + ggtitle("# UMIs per nucleus") +
stat_summary(fun = median, geom='point', size = 15, colour = "black", shape = 95) & theme(text = element_text(size = 20), 
                                                  axis.text.x = element_text(size = 20), 
                                                  axis.text.y = element_text(size = 20))
VlnPlot(combined.sct, features = c("percent.mt"), ncol = 1, split.by = "depth2",
        pt.size = 0, group.by = "sample", cols = c("#811b74","#C08DBA","#00a1e0")) & theme(text = element_text(size = 20), 
                                                  axis.text.x = element_text(size = 20), 
                                                  axis.text.y = element_text(size = 20)) 
dev.off()

"Groups with fewer than two data points have been dropped."


## UMAP "Feature Plots" of QC metadata

In [26]:
pdf(file="plots/gastrocnemius/qc/qc_featureplot.pdf",
    width = 25, height = 10)
FeaturePlot(combined.sct, pt.size = 0.1, order = T,
            features =c("nFeature_RNA",
                        "nCount_RNA",
                        "percent.mt",
                        "percent.ribo",
                        "doublet_scores",
                        "G2M.Score"), ncol =3)  & scale_colour_gradientn(colours = viridis(11)) & 
                        NoAxes()& 
                        theme(text = element_text(size = 20))

dev.off()


Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.




# Check integration and clustering
Adjusting UMAP function to cluster nuclei more tightly, and increasing cluster resolution.

In [27]:
nclusters = length(unique(combined.sct$seurat_clusters))
cluster_cols = colorRampPalette(brewer.pal(9,"Set1"))(nclusters)

In [28]:
pdf(file="plots/gastrocnemius/clustering/UMAP_Parse_10x.pdf",
    width = 20, height = 8)
p1 <- DimPlot(combined.sct, reduction = "umap", group.by = "technology")
p2 <- DimPlot(combined.sct, reduction = "umap", label = TRUE, repel = TRUE, cols = cluster_cols)
p1 + p2

dev.off()

In [29]:
pdf(file="plots/gastrocnemius/clustering/Parse_10x_experiment_distribution.pdf",
    width = 20, height = 6)
DimPlot(combined.sct, reduction = "umap", group.by = "seurat_clusters",split.by = "depth2", label = TRUE, label.size = 6, repel = TRUE, shuffle = T,cols = cluster_cols)

ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=depth2)) + geom_bar(position = "fill") & 
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20))

dev.off()
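
The stacked bar chart with `position = "fill"` shows each cluster's composition by experiment. The same fractions can be computed as numbers, which makes it easier to flag clusters dominated by one technology. This is a sketch on made-up metadata; the `depth2` labels below are hypothetical placeholders, not the values stored in the object.

```r
# Toy metadata: one row per nucleus with its cluster and experiment label.
meta <- data.frame(
  seurat_clusters = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  depth2 = c("Parse_standard", "Parse_deep", "10x", "10x",
             "Parse_standard", "Parse_standard", "Parse_standard", "Parse_deep")
)

# Count nuclei per cluster (rows) and experiment (columns).
counts <- table(meta$seurat_clusters, meta$depth2)

# Convert to within-cluster fractions; each row sums to 1.
frac <- prop.table(counts, margin = 1)
print(round(frac, 2))
```

In this toy table, cluster 1 contains no 10x nuclei, which is the kind of imbalance the bar chart is meant to reveal.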


In [30]:
combined.sct$sample = factor(combined.sct$sample, levels=paste0("G_",rep(c("4","10","14","25","36","2m","18m"),each=4),rep(c("_M","_F"),each=2),c("_1","_2")))

pdf(file="plots/gastrocnemius/clustering/UMAP_cluster_sample_barplot.pdf",
    width = 20, height = 10)
p1=DimPlot(combined.sct, reduction = "umap", group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE, 
          cols = cluster_cols)
p2=ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=sample)) + geom_bar(position = "fill") +
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20)) & coord_flip()
gridExtra::grid.arrange(
  p1, p2,
  widths = c(2,1.6),
  layout_matrix = rbind(c(1, 2)))

dev.off()

In [31]:
pdf(file="plots/gastrocnemius/clustering/age_sex_barplot.pdf",
    width = 18, height = 19)
p1=DimPlot(combined.sct, reduction = "umap", group.by = "timepoint", label = TRUE, label.size = 5, repel = TRUE)
p2 = ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=timepoint)) + geom_bar(position = "fill") & 
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20)) & coord_flip()

p3=DimPlot(combined.sct, reduction = "umap", group.by = "sex", label = TRUE, label.size = 5, repel = TRUE, shuffle = T)
p4 = ggplot(combined.sct@meta.data, aes(x=seurat_clusters, fill=sex)) + geom_bar(position = "fill") & 
theme(text = element_text(size = 20), axis.text.x = element_text(size = 20), axis.text.y = element_text(size = 20)) & coord_flip()
gridExtra::grid.arrange(
  p1, p2, p3, p4,
  widths = c(2,1),
  layout_matrix = rbind(c(1, 2),
                        c(3, 4)))

dev.off()

# Plotting: check predicted celltypes

In [32]:
pdf(file="plots/gastrocnemius/annotation/UMAP_predictions.pdf",
    width = 15, height = 12)
nclusters = length(unique(combined.sct$predictions))
DimPlot(combined.sct, reduction = "umap", group.by = "predictions",
        label = TRUE, label.size = 6, repel = TRUE,cols = colorRampPalette(brewer.pal(9,"Set1"))(nclusters)) + NoLegend()

dev.off()

# Rename clusters based on maximum predicted celltype

In [7]:
Idents(combined.sct) = "seurat_clusters"
mat = as.matrix(table(Idents(combined.sct), combined.sct$predictions))
ct = data.frame(predicted_celltypes = colnames(mat)[max.col(mat)])
ct$seurat_clusters = 0:(nrow(ct)-1)
ct$seurat_clusters = factor(ct$seurat_clusters, levels = 0:(nrow(ct)-1))
ct$predicted_celltypes = paste0(ct$predicted_celltypes, ".", ct$seurat_clusters)

metadata = as.data.frame(combined.sct@meta.data)
metadata = left_join(metadata, ct)

combined.sct$predicted_celltypes = metadata$predicted_celltypes

[1m[22mJoining, by = "seurat_clusters"


In [34]:
pdf(file="plots/gastrocnemius/annotation/UMAP_maximum_predictions.pdf",
    width = 15, height = 12)

#options(repr.plot.width=20,repr.plot.height=20)

nclusters = length(unique(combined.sct$predicted_celltypes))
DimPlot(combined.sct, reduction = "umap", group.by = "predicted_celltypes",
        label = TRUE, label.size = 4, repel = TRUE,cols = colorRampPalette(brewer.pal(9,"Set1"))(nclusters))  & NoLegend()
dev.off()


## Fix clusters
Marker genes for myonuclei subtypes:
- Type 2X: Myh1
- Type 2A: Myh2
- Type 2B: Myh4

In [16]:
combined.sct$subtypes = combined.sct$predicted_celltypes
combined.sct$subtypes = gsub("\\<FAPs.40\\>","Adipocytes.40",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<FAPs.34\\>","Tenocytes.34",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.14\\>","Perinatal.14",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.16\\>","Perinatal.16",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.31\\>","Perinatal.31",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.27\\>","Perinatal.27",combined.sct$subtypes)

combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.7\\>","Type IIa Myonuclei.7",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.8\\>","Type IIb Myonuclei.8",combined.sct$subtypes)

combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.4\\>","Type IIb Myonuclei.4",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<Type IIx Myonuclei.39\\>","Type IIb Myonuclei.39",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<FAPs.35\\>","Type IIb Myonuclei.35",combined.sct$subtypes)
combined.sct$subtypes = gsub("\\<FAPs.36\\>","Type IIb Myonuclei.36",combined.sct$subtypes)



## Get rid of cluster number

In [17]:
# get rid of cluster #
combined.sct$subtypes = do.call("rbind", strsplit(as.character(combined.sct$subtypes), "[.]"))[,1]
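
As an alternative sketch (not the approach used in the cell above), the cluster suffix can be removed with a single regular expression that strips only a trailing period followed by digits. The labels here are toy examples.

```r
# Toy labels in the "<celltype>.<cluster number>" form used above.
labels <- c("Type IIb Myonuclei.35", "Adipocytes.40", "FAPs.2")

# Remove a trailing ".<digits>" and leave the rest of the label untouched.
stripped <- sub("[.][0-9]+$", "", labels)
print(stripped)
```

Because the pattern is anchored to the end of the string, a label that happened to contain another period would keep everything before the final numeric suffix.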


## Change some names

In [18]:
combined.sct$subtypes = gsub("\\<Endothelial Cells\\>","Endothelial",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Smooth Muscle\\>","Smooth_muscle",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Schwann Cells\\>","Schwann_cells",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Satellite Cells\\>","Satellite",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<FAPs\\>","FAP",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Immune Cells\\>","Macrophages",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Neuromuscular Junction\\>","NMJ",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Myotendinous Junction\\>","MTJ",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Type I Myonuclei\\>","Type1",combined.sct$subtypes) 


combined.sct$subtypes = gsub("\\<Type IIx Myonuclei\\>","Type2X",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Type IIb Myonuclei\\>","Type2B",combined.sct$subtypes) 
combined.sct$subtypes = gsub("\\<Type IIa Myonuclei\\>","Type2A",combined.sct$subtypes) 



# Add celltypes and gen_celltypes metadata
Based on the subtypes annotation, we can group the cells into broader categories.

In [34]:
combined.sct$celltypes = combined.sct$subtypes
combined.sct$gen_celltype = combined.sct$celltypes

combined.sct$gen_celltype = gsub("\\<Type1\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Type2A\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Type2B\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Type2X\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<NMJ\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<MTJ\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Satellite\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Perinatal\\>","Myonuclei",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<FAP\\>","Stromal",combined.sct$gen_celltype)
combined.sct$gen_celltype = gsub("\\<Macrophages\\>","Myeloid",combined.sct$gen_celltype)
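
The same grouping can be written as a lookup table, which keeps the whole mapping visible in one place. This is a sketch on a toy vector: the mapping is copied from the `gsub()` calls above, the `to_general` name is hypothetical, and it assumes labels match exactly. Any subtype not in the table keeps its own name.

```r
# Subtype -> general cell type, copied from the gsub() calls in this notebook.
to_general <- c(Type1 = "Myonuclei", Type2A = "Myonuclei", Type2B = "Myonuclei",
                Type2X = "Myonuclei", NMJ = "Myonuclei", MTJ = "Myonuclei",
                Satellite = "Myonuclei", Perinatal = "Myonuclei",
                FAP = "Stromal", Macrophages = "Myeloid")

# Toy subtype labels; "Endothelial" and "Tenocytes" are not in the table.
subtypes <- c("Type2B", "FAP", "Endothelial", "Macrophages", "Tenocytes")

# Look up each label; unmatched labels come back as NA and keep their own name.
mapped <- unname(to_general[subtypes])
gen_celltype <- ifelse(is.na(mapped), subtypes, mapped)
print(gen_celltype)
```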


### Dot plot of marker genes

In [35]:
genes = c("Col22a1","Myh1","Myh2","Myh3","Myh4","Myh6","Myh7","Myh8","Myh9","Myh11")

In [36]:
pdf(file="plots/gastrocnemius/annotation/subtype_marker_dotplot.pdf",
    width = 20, height = 12)

options(repr.plot.width=15, repr.plot.height=10)

DefaultAssay(combined.sct) = "SCT"
Idents(combined.sct) = "subtypes"
Idents(combined.sct) = factor(Idents(combined.sct), levels = sort(as.character(unique(Idents(combined.sct)))))
DotPlot(combined.sct, features = genes)+ 
theme(axis.text.x = element_text(angle = 45, hjust = 1)) 
dev.off()

## Plotting the 3 levels of annotation

In [38]:
color_ref = read.delim("ref/enc4_mouse_snrna_celltypes_c2c12.csv",sep=",",col.names = c("tissue","gen_celltype","celltypes",
                                                                              "subtypes","gen_celltype_color",
                                                                              "celltype_color","subtype_color"))
gen_celltype_colors = unique(color_ref[color_ref$tissue == "Gastrocnemius",c("gen_celltype","gen_celltype_color")])
rownames(gen_celltype_colors) = gen_celltype_colors$gen_celltype
gen_celltype_colors = gen_celltype_colors[sort(unique(combined.sct$gen_celltype)),]

pdf(file="plots/gastrocnemius/annotation/UMAP_final_gen_celltype.pdf",
   width = 15, height = 10)

#options(repr.plot.width=15,repr.plot.height=10)
DimPlot(combined.sct, reduction = "umap", 
        group.by = "gen_celltype", 
        label = TRUE, label.size = 8, repel = TRUE,
       cols = gen_celltype_colors$gen_celltype_color)

dev.off()

In [40]:
celltype_colors = unique(color_ref[color_ref$tissue == "Gastrocnemius",c("celltypes","celltype_color")])
rownames(celltype_colors) = celltype_colors$celltypes
celltype_colors = celltype_colors[sort(unique(combined.sct$celltypes)),]

pdf(file="plots/gastrocnemius/annotation/UMAP_final_celltypes.pdf",
    width = 15, height = 10)

DimPlot(combined.sct, reduction = "umap", 
        group.by = "celltypes", 
        label = TRUE, label.size = 8, repel = TRUE,
       cols = celltype_colors$celltype_color)

dev.off()

In [42]:
subtype_colors = unique(color_ref[color_ref$tissue == "Gastrocnemius",c("subtypes","subtype_color")])
rownames(subtype_colors) = subtype_colors$subtypes
subtype_colors = subtype_colors[sort(unique(combined.sct$subtypes)),]

pdf(file="plots/gastrocnemius/annotation/UMAP_final_subtypes.pdf",
    width = 15, height = 10)

DimPlot(combined.sct, reduction = "umap", 
        group.by = "subtypes", 
        label = TRUE, label.size = 8, repel = TRUE,
       cols = subtype_colors$subtype_color)

dev.off()

In [20]:
saveRDS(combined.sct,file="seurat/gastrocnemius_Parse_10x_integrated.rds")