---
# Dataset Formatting: Richards-GBM-LGG
*L.Richards*  
*2020-05-22*  
*/cluster/projects/pughlab/projects/cancer_scrna_integration/data/*  

---

Format the Richards-GBM-LGG dataset for use as input to data integration tools.

In [None]:
library(Seurat)     # v4.0.1
library(earlycross) # v0.1

---
## 1.0 Generate & cluster data cohort (H4H)
---

Let's merge 3 pairs together so that the cohort has a good mix of biological replicates, multiple samples from the same patient, and samples from different patients. This should help us judge whether the integration algorithms are over- or under-correcting the data (most likely over-correcting).

In [None]:
setwd("/cluster/projects/pughlab/projects/OICR_Brain_NucSeq/GBM/analysis/BatchCorrection")
source("~/github/oicr-brain-tri-gbm/src/scRNA_helper_functions.r")

data.path <- "/cluster/projects/pughlab/projects/OICR_Brain_NucSeq/GBM/analysis/remove-doublets/seurat_objs/patients"

objects <- c("B_P.GBM593.1_P.GBM593.2_R.GBM898_Seurat.rds", # LGG oligoastrocytoma
             "C_P.GBM577.1_P.GBM577.2_R.GBM625_Seurat.rds", # GBM
             "F_P.GBM620_R.GBM691_Seurat.rds" # GBM
            )

# read the Seurat objects into a list
seurats <- list()
for (i in seq_along(objects)){

    # file.path() joins the directory and file name with "/"
    seurats[[i]] <- readRDS(file.path(data.path, objects[i]))

}

# merge seurat objects together
# new cohort size is 35,549 nuclei
seurats <- merge(seurats[[1]], y = c(seurats[[2]], seurats[[3]]))

# cluster merged data
seurats <- quickCluster(seurats,
                        normalize = TRUE,
                        vars.to.regress = NULL,
                        #k.param = 20,
                        dims = 20, # max dims 1:dims
                        n.vargenes = 2000,
                        min.resolution = 2.11,
                        max.resolution = 2.11,
                        n.resolution = 1, # how many resolutions to cluster over
                        verbose = FALSE,
                        pc.calc = 75, # how many PCs to calculate
                        pca.genes = "var" # accepts "all" or "var"
                       )

# plot data
# pdf("Pairs_B.C.F_NoBatchCorrection.pdf", width = 18, height = 5)
# DimPlot(seurats, 
#         group.by = c("SampleID", "PairID", "SingleR_CollapsedLabels"),
#         ncol = 3
#        )
# dev.off()

# save data
saveRDS(seurats, 
        file = "/cluster/projects/pughlab/projects/cancer_scrna_integration/data/Richards-GBM-LGG/original-data/Richards-GBM-LGG_seurat.rds"
       )
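
To confirm that the merged cohort really has the intended mix of patients and samples, it can help to cross-tabulate nuclei by pair and by sample. The sketch below uses a small made-up table in place of `seurats@meta.data`. The `SampleID` and `PairID` values are guessed from the file names above and the rows are illustrative only, so treat them as assumptions rather than the real labels or counts.

```r
# Toy stand-in for seurats@meta.data; labels are assumed from the file names,
# and the rows are illustrative, not real nuclei.
toy_meta <- data.frame(
  SampleID = c("P.GBM593.1", "P.GBM593.1", "P.GBM593.2", "R.GBM898",
               "P.GBM620", "R.GBM691", "R.GBM691"),
  PairID   = c("B", "B", "B", "B", "F", "F", "F"),
  stringsAsFactors = FALSE
)

# Nuclei per sample within each pair
composition <- table(toy_meta$PairID, toy_meta$SampleID)
print(composition)

# Number of distinct samples contributed by each pair
samples_per_pair <- tapply(toy_meta$SampleID, toy_meta$PairID,
                           function(x) length(unique(x)))
print(samples_per_pair)
```

On the real object, the same two calls on `seurats@meta.data` would show whether any pair or sample dominates the cohort before integration.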

---
## 2.0 Output files in 10x common format
---

Write the counts matrix in 10x/CellRanger format and the cell metadata as a CSV file.

In [None]:
# save metadata as csv file
meta <- data.frame(seurats@meta.data)

keep <- c("AnalysisID", "TissueID", "SampleID", 
          "PairID", "TumourStage", "Age", "Sex",
          "TumourGrade", "Pathology_Detailed", "Pathology", 
          "TumourType", "nCount_RNA", "nFeature_RNA", "SingleR_CollapsedLabels"
         )

meta <- meta[ ,keep]
meta$CellBarcode <- rownames(meta)

write.csv(meta, file = "Richards-GBM-LGG_meta.csv")
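
Because `write.csv` is called with its default `row.names = TRUE`, the barcodes end up in the file twice: once as an unnamed first column and once as `CellBarcode`. A minimal sketch of reading such a file back is shown below. It uses a toy data frame and a temporary file, both of which are assumptions for illustration and not part of the dataset.

```r
# Toy metadata standing in for the real table; barcodes and values are made up.
toy_meta <- data.frame(
  SampleID   = c("S1", "S1", "S2"),
  nCount_RNA = c(1200, 950, 2100),
  row.names  = c("S1_AAAC", "S1_AAAG", "S2_AAAT"),
  stringsAsFactors = FALSE
)
toy_meta$CellBarcode <- rownames(toy_meta)

# Write with the same defaults as the cell above, but to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_meta, file = csv_path)

# row.names = 1 restores the barcodes from the unnamed first column
meta_in <- read.csv(csv_path, row.names = 1, stringsAsFactors = FALSE)

# The row names and the CellBarcode column should agree
barcodes_match <- identical(rownames(meta_in), meta_in$CellBarcode)
print(barcodes_match)
```

Tools that do not expect a row-name column can simply ignore the first column and use `CellBarcode` instead.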

In [None]:
# export count matrix as default 10x CellRanger output
Write10X(seurats)
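
`Write10X` is taken here to produce the usual CellRanger trio of files: a Matrix Market counts file plus barcode and feature tables. That is an assumption about `earlycross`, not something checked against its documentation. As a rough sketch of what that layout contains, the following writes and re-reads a toy sparse matrix with the `Matrix` package. The output directory, gene names, and barcodes are hypothetical.

```r
library(Matrix)

# Toy genes x cells count matrix (values are made up)
counts <- Matrix(c(0, 3, 0,
                   1, 0, 0,
                   0, 2, 5),
                 nrow = 3, byrow = TRUE, sparse = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("Cell1", "Cell2", "Cell3")))

out_dir <- file.path(tempdir(), "toy_10x")
dir.create(out_dir, showWarnings = FALSE)

# Counts go in Matrix Market format; row and column names go in side files
writeMM(counts, file = file.path(out_dir, "matrix.mtx"))
writeLines(colnames(counts), file.path(out_dir, "barcodes.tsv"))

# Two columns (ID, name) as in CellRanger v2 genes.tsv; v3 and later use
# features.tsv.gz with an additional feature-type column
write.table(data.frame(rownames(counts), rownames(counts)),
            file = file.path(out_dir, "features.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the matrix back and restore the names from the side files
counts_in <- readMM(file.path(out_dir, "matrix.mtx"))
features_in <- read.delim(file.path(out_dir, "features.tsv"),
                          header = FALSE, stringsAsFactors = FALSE)
dimnames(counts_in) <- list(features_in[[2]],
                            readLines(file.path(out_dir, "barcodes.tsv")))
```

Listing the directory that `Write10X` actually writes to, for example with `list.files()`, is a quick way to check whether its output matches this layout.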

---