## Merging Disease and Control Samples

2021-09-03: 

Our goal here is to see whether samples extracted from patients with GBM cluster together with healthy samples or separately from them once the two sets are merged.

Hypothesis: If disease samples cluster separately from the control samples, then the transcriptome can help differentiate between the disease state and the normal state.
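
One way to make "cluster separately" concrete is to cross-tabulate cluster labels against condition labels and check how pure each cluster is. The sketch below uses base R only; `cluster_composition` and the toy labels are hypothetical and are not part of this notebook's pipeline.

```r
# Hypothetical helper: summarise how conditions are distributed across clusters.
cluster_composition <- function(clusters, condition) {
  # Rows are clusters, columns are conditions
  counts <- table(cluster = clusters, condition = condition)
  # Fraction of each cluster that belongs to each condition
  fraction <- prop.table(counts, margin = 1)
  # Purity: share of the most common condition within each cluster
  purity <- apply(fraction, 1, max)
  list(counts = counts, fraction = fraction, purity = purity)
}

# Toy labels (made up for illustration): clusters 0 and 1 are pure, cluster 2 is mixed
clusters  <- c("0", "0", "0", "1", "1", "1", "2", "2")
condition <- c("GBM", "GBM", "GBM", "Healthy", "Healthy", "Healthy", "GBM", "Healthy")

res <- cluster_composition(clusters, condition)
print(res$counts)
print(res$purity)
```

Purity close to 1 in most clusters would be consistent with the hypothesis, while values near 0.5 would suggest the two conditions mix. On a Seurat object, the inputs could be taken from the metadata, for example `obj$seurat_clusters` and a condition column (the column name would be an assumption).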

Table: Healthy Brain Samples

| #   | Run        | Sample Name  | Instrument | Age | Sex    | Tissue                   |
| --- | ---------- | ------------ | ---------- | --- | ------ | ------------------------ |
| 1   | SRR9264382 | 5981_BA9_10x | Novaseq    | 44  | Female | Brain, prefrontal cortex |
| 2   | SRR9264383 | 5546_BA9_10x | Novaseq    | 34  | Female | Brain, prefrontal cortex |
| 3   | SRR9264388 | 5609_BA9_10x | Novaseq    | 54  | Female | Brain, prefrontal cortex |
| 4   | SRR9264389 | 5787_BA9_10x | Novaseq    | 39  | Male   | Brain, prefrontal cortex |
| 5   | SRR9262938 | 5958_BA9_10x | Novaseq    | 22  | Male   | Brain, prefrontal cortex |
| 6   | SRR9262956 | 5577_BA9_10x | Novaseq    | 21  | Male   | Brain, prefrontal cortex |

Table: GBM Samples

| #   | Sample Name | Instrument          | Age | Sex    | Disease |
| --- | ----------- | ------------------- | --- | ------ | ------- |
| 1   | SF...159    | Illumina HiSeq 2000 | 60  | Male   | GBM IV  |
| 2   | SF...209    | Illumina HiSeq 2000 | 62  | Female | GBM IV  |
| 3   | SF...215    | Illumina HiSeq 2000 | 46  | Male   | GBM IV  |
| 4   | SF...232    | Illumina HiSeq 2000 | 65  | Male   | GBM IV  |
| 5   | SF...247    | Illumina HiSeq 2000 | 45  | Male   | GBM IV  |
| 6   | SF...285    | Illumina HiSeq 2000 | 54  | Male   | GBM IV  |

Table: NCS Samples

| #   | Run         | Sample Name | Instrument          | Disease         |
| --- | ----------- | ----------- | ------------------- | --------------- |
| 1   | SRR10353960 | GBM27       | Illumina HiSeq 2000 | GBM IV CD133+/+ |
| 2   | SRR10353961 | GBM28       | Illumina HiSeq 2000 | GBM IV CD133+/+ |
| 3   | SRR10353962 | GBM29       | Illumina HiSeq 2000 | GBM IV CD133+/+ |

In [1]:
.libPaths(c("/scratch/samkyy/gete-gbm/renv/library/R-4.0/x86_64-pc-linux-gnu","/tmp/RtmpJsRC8Z/renv-system-library", .libPaths()))
.libPaths()

In [2]:
resultsPath <- "~/scratch/gete-gbm/results"
getwd()

In [3]:
library(Seurat)
library(Matrix)
library(ggplot2)
library(cowplot)
library(genefilter)
library(tidyverse)
library(RCurl)
library(scales)
# library(SingleCellExperiment)
library(AnnotationHub)
library(ensembldb)
# Packages for figure layout formatting
library(grid)
library(gridExtra)

Attaching SeuratObject

“running command 'timedatectl' had status 1”
Registered S3 method overwritten by 'cli':
  method     from         
  print.boxx spatstat.geom

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ tibble  3.1.3     ✔ dplyr   1.0.7
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1
✔ purrr   0.3.4     

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ tidyr::pack()   masks Matrix::pack()
✖ readr::spec()   masks genefi

In [4]:
# Packages for Gene Ontology
library(GOstats)
library(org.Hs.eg.db)

Loading required package: Category

Loading required package: graph

Attaching package: 'graph'

The following object is masked from 'package:stringr':

    boundary

Attaching package: 'GOstats'

The following object is masked from 'package:AnnotationDbi':

    makeGOGraph

In [5]:
source("~/scratch/gete-gbm/bin/util.R")
source("~/scratch/gete-gbm/bin/util_go.R")
source("~/scratch/gete-gbm/bin/util_seurat.R")
source("~/scratch/gete-gbm/bin/util_viz.R")

## Load Datasets

In [None]:
### load filtered GBM (gbmsc) datasets
load("/home/samkyy/scratch/gete-gbm/results/2021-06-11/filt_ge_gbmsc.RData")
load("/home/samkyy/scratch/gete-gbm/results/2021-06-11/filt_gte_gbmsc.RData")

### load filtered and integrated GBM datasets
load("/home/samkyy/scratch/gete-gbm/data/2021-04-01_integrate/filt_gte_brain.intergrated.RData")
load("/home/samkyy/scratch/gete-gbm/data/2021-04-01_integrate/filt_GE_select.intergrated.RData")

In [None]:
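### Rename metadata columns (dplyr::rename takes new_name = old_name);
### the GBM_ prefix keeps these cluster labels distinct from any computed later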
filt_GE_select.intergrated@meta.data <- filt_GE_select.intergrated@meta.data %>%
        dplyr::rename(orig.ident = seq_folder,
                     GBM_integrated_snn_res.0.8 = integrated_snn_res.0.8,
                     GBM_seurat_clusters = seurat_clusters)
filt_gte_brain.intergrated@meta.data <- filt_gte_brain.intergrated@meta.data %>%
        dplyr::rename(orig.ident = seq_folder,
                     GBM_integrated_snn_res.0.8 = integrated_snn_res.0.8,
                     GBM_seurat_clusters = seurat_clusters)

colnames(filt_GE_select.intergrated@meta.data)
colnames(filt_gte_brain.intergrated@meta.data)
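
Because `dplyr::rename()` takes `new_name = old_name`, it is easy to read the call above backwards. The sketch below replays the same renames on a toy data frame and first checks that none of the target names already exist. The toy `meta` data frame and the `targets` vector are made up for illustration; the real metadata may contain other columns.

```r
library(dplyr)

# Toy stand-in for a Seurat meta.data slot (values are made up)
meta <- data.frame(
  seq_folder             = c("s1", "s1", "s2"),
  integrated_snn_res.0.8 = c("0", "1", "0"),
  seurat_clusters        = c("0", "1", "0"),
  stringsAsFactors       = FALSE
)

# Target names we intend to create; fail early if any already exist
targets <- c("orig.ident", "GBM_integrated_snn_res.0.8", "GBM_seurat_clusters")
clash <- intersect(targets, colnames(meta))
if (length(clash) > 0) {
  stop("Column(s) already present: ", paste(clash, collapse = ", "))
}

# new_name = old_name
meta_renamed <- dplyr::rename(
  meta,
  orig.ident                 = seq_folder,
  GBM_integrated_snn_res.0.8 = integrated_snn_res.0.8,
  GBM_seurat_clusters        = seurat_clusters
)

colnames(meta_renamed)
```

The explicit check is a design choice: if the metadata already holds an `orig.ident` column, it may be preferable to stop and decide what to do with it rather than rely on how the rename handles duplicates.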

In [None]:
#### Split the dataset into a list of seurat objects by their sample of origin
filt_gte_gbmsc.list <- SplitObject(object=filt_gte_gbmsc, split.by="orig.ident")
filt_ge_gbmsc.list <- SplitObject(object=filt_ge_gbmsc, split.by="orig.ident")
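
As a sanity check on what `SplitObject()` returns, the sketch below builds a small random Seurat object, splits it by a metadata column, and merges the pieces back together. The counts, the `sample` column, and the object names are all made up for illustration; this is not the merge of the GBM and healthy datasets itself, only a minimal round trip under those assumptions.

```r
library(Seurat)
library(Matrix)

set.seed(1)

# Toy count matrix: 200 genes x 60 cells (random values, for illustration only)
counts <- matrix(
  rpois(200 * 60, lambda = 1),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("cell", 1:60))
)
counts <- Matrix(counts, sparse = TRUE)

toy <- CreateSeuratObject(counts = counts)

# Hypothetical sample-of-origin label: three samples with 20 cells each
toy$sample <- rep(c("A", "B", "C"), each = 20)

# One Seurat object per sample, returned as a named list
toy.list <- SplitObject(object = toy, split.by = "sample")
print(names(toy.list))
print(sapply(toy.list, ncol))

# Merge the per-sample objects back into a single object
toy.merged <- merge(x = toy.list[[1]], y = toy.list[-1])
print(ncol(toy.merged))
```

When merging objects that come from different datasets, cell barcodes can collide; the `add.cell.ids` argument of `merge()` prefixes cell names per object and may be worth setting in that case.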

## End of Notebook

In [1]:
sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] fansi_0.4.2          digest_0.6.27        utf8_1.2.1          
 [4] crayon_1.4.1         IRdisplay_1.0.0.9000 repr_1.1.3.9000     
 [7] lifecycle_1.0.0      jsonlite_1.7.2       evaluate_0.14 