## Motivation {.unnumbered}
Single-cell omics analysis can be performed in either R or Python. Several packages exist to format, process and analyse scRNA-seq data, notably Seurat (V5) [[Hao et al.; 2023]](https://www.nature.com/articles/s41587-023-01767-y), Scanpy [[Wolf et al.; 2018]](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1382-0), SingleCellExperiment [[Amezquita et al.; 2019]](https://www.nature.com/articles/s41592-019-0654-x) and scran [[Lun et al.; 2016]](https://f1000research.com/articles/5-2122/v2). This tutorial focuses on Seurat (V5) because of its comprehensive support for multimodal datasets and its interoperability with other data formats. For larger datasets (>100k cells), we recommend Scanpy to reduce processing time.

In [1]:
## set up environment
suppressMessages(library(scUnify))
setwd("/nemo/lab/caladod/working/Matthew/project/matthew/MH_GSE247917")

“replacing previous import ‘cowplot::get_legend’ by ‘ggpubr::get_legend’ when loading ‘scUnify’”
“replacing previous import ‘cowplot::align_plots’ by ‘patchwork::align_plots’ when loading ‘scUnify’”
“replacing previous import ‘biomaRt::select’ by ‘rstatix::select’ when loading ‘scUnify’”
“replacing previous import ‘scales::viridis_pal’ by ‘viridis::viridis_pal’ when loading ‘scUnify’”


## Import CellRanger Outputs
Next, we import the outputs from cellranger-multi as Seurat objects. We first need to specify the cellranger-multi output directory and a sample name for each sequencing run.

In [7]:
## store the 10x output directories as a vector & define sample names
dir <- "/nemo/lab/caladod/scratch/hungm/matthew/MH_GSE247917/cellranger/"
files <- list.files(dir)
dir.list <- paste0(dir, files, "/outs/per_sample_outs/", files, "/count/sample_filtered_feature_bc_matrix/")
## assumption: sample names follow the cellranger output folder names
samples <- files
dir.list
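
A mistyped path only surfaces later as a `Read10X()` error, so it can help to confirm that every matrix directory exists before building the objects. The sketch below is a minimal base-R illustration under stated assumptions: `build_matrix_dirs()` and `check_dirs_exist()` are hypothetical helpers (not part of scUnify), and the demo runs on a temporary folder rather than the real cellranger output.

```r
## Hypothetical helpers (not part of scUnify): build the per-sample matrix
## paths in the cellranger-multi layout and report any that are absent.
build_matrix_dirs <- function(root, samples) {
    file.path(root, samples, "outs", "per_sample_outs", samples,
              "count", "sample_filtered_feature_bc_matrix")
}

check_dirs_exist <- function(dirs) {
    absent <- dirs[!dir.exists(dirs)]
    if (length(absent) > 0) {
        stop("Missing 10x directories:\n", paste(absent, collapse = "\n"))
    }
    invisible(TRUE)
}

## Demo on a temporary folder: create one sample directory, leave one absent
root <- file.path(tempdir(), "cellranger_demo")
demo_dirs <- build_matrix_dirs(root, c("sampleA", "sampleB"))
dir.create(demo_dirs[1], recursive = TRUE, showWarnings = FALSE)

## Logical vector, one entry per path
exists_flags <- dir.exists(demo_dirs)

## Capture the error message instead of halting the session
res <- tryCatch(check_dirs_exist(demo_dirs),
                error = function(e) conditionMessage(e))
```

In the tutorial itself, the equivalent check would be `check_dirs_exist(dir.list)` immediately before the objects are created.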

Below is a wrapper function that builds a list of Seurat objects from the specified cellranger-multi output directories, with sample names stored in the "samples" column of each object's metadata. Gene expression counts are stored in the "RNA" assay of each object. Cells with fewer than 200 detected genes (nFeature_RNA < 200) and genes expressed in fewer than 3 cells are filtered out.
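
To make the two thresholds concrete, here is a minimal base-R sketch of the pre-filtering on a genes x cells count matrix. `prefilter_counts()` is a hypothetical helper, not the scUnify implementation, and the order of the two steps (cells first, then genes) is an assumption. Seurat's `CreateSeuratObject()` exposes `min.features` and `min.cells` arguments for the same purpose.

```r
## Hypothetical helper illustrating the pre-filtering thresholds
## (assumed order: filter cells first, then genes).
prefilter_counts <- function(counts, min_features = 200, min_cells = 3) {
    ## keep cells with at least `min_features` detected genes
    n_features <- colSums(counts > 0)
    counts <- counts[, n_features >= min_features, drop = FALSE]
    ## keep genes detected in at least `min_cells` of the remaining cells
    n_cells <- rowSums(counts > 0)
    counts[n_cells >= min_cells, , drop = FALSE]
}

## Toy example: 4 genes x 5 cells (filled column by column),
## with small thresholds for readability
toy <- matrix(c(1, 0, 2, 0,
                3, 1, 0, 0,
                0, 2, 1, 0,
                5, 1, 1, 1,
                0, 0, 0, 1),
              nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))

filtered <- prefilter_counts(toy, min_features = 2, min_cells = 3)
dim(filtered)
```

With these toy thresholds, `cell5` (one detected gene) is dropped first, after which `gene4` is detected in only one remaining cell and is dropped as well.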

If ADT and HTO libraries are present (as they are here), we can specify the arguments "adt = TRUE" and "hto = TRUE" to 1) separate the ADT/HTO libraries from GEX and 2) separate the HTO library from ADT. This should result in 2 extra assays ("ADT" and "HTO") in each Seurat object.
  
:::{.callout-warning}
If an HTO library is present, please make sure the HTO feature names share the <u>same prefix</u> when running cellranger-multi, so that the prefix can be passed to the function below. Otherwise, users will have to separate the HTO library from ADT manually.
:::
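
The reason a shared prefix matters can be illustrated with a short base-R sketch: if every HTO feature name starts with the same string, a single literal prefix match is enough to split the antibody-capture matrix. `split_hto_adt()` and the toy feature names are hypothetical and only illustrate the idea; they are not the scUnify internals.

```r
## Hypothetical helper: split an antibody-capture matrix (features x cells)
## into HTO and ADT parts using a shared HTO name prefix.
split_hto_adt <- function(ab_counts, hto_str) {
    ## startsWith() does a literal match, so no regex escaping is needed
    is_hto <- startsWith(rownames(ab_counts), hto_str)
    if (!any(is_hto)) {
        stop("No feature names start with '", hto_str, "'")
    }
    list(HTO = ab_counts[is_hto, , drop = FALSE],
         ADT = ab_counts[!is_hto, , drop = FALSE])
}

## Toy matrix; the feature names are illustrative only
features <- c("anti-human_Hashtag_1", "anti-human_Hashtag_2",
              "anti-human_CD80", "anti-human_CD86", "anti-human_CD274")
ab <- matrix(seq_len(15), nrow = 5,
             dimnames = list(features, paste0("cell", 1:3)))

parts <- split_hto_adt(ab, hto_str = "anti-human_Hashtag_")
rownames(parts$HTO)
rownames(parts$ADT)
```

If the hashtag names did not share a prefix, no single string would select them all, which is why the manual separation mentioned above would be needed.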

In [8]:
## build Seurat objects with HTO & ADT, specifying the prefix that separates HTO tag names from ADT tag names
obj_list <- create_seurat_object(dir = dir.list, samples = samples, hto_str = "anti-human_Hashtag_")

filtered_matrix_1 --- Loading Sample 1

Step 1 : Adding RNA counts



ERROR: Error in Read10X(dir[i]): Directory provided does not exist


Finally, we quickly check that the Seurat objects are set up properly.

In [14]:
## View seurat object list
obj_list

$filtered_matrix_1
An object of class Seurat 
17028 features across 7502 samples within 3 assays 
Active assay: RNA (16828 features, 0 variable features)
 1 layer present: counts
 2 other assays present: HTO, ADT

$filtered_matrix_2
An object of class Seurat 
17301 features across 8227 samples within 3 assays 
Active assay: RNA (17101 features, 0 variable features)
 1 layer present: counts
 2 other assays present: HTO, ADT


In [15]:
## View metadata of the first Seurat object
head(obj_list[[1]])

| | orig.ident | nCount_RNA | nFeature_RNA | nCount_HTO | nFeature_HTO | nCount_ADT | nFeature_ADT | samples |
|---|---|---|---|---|---|---|---|---|
| | `<fct>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` | `<chr>` |
| filtered_matrix_1_AAACCTGAGATGCCAG-1 | SeuratProject | 3172 | 1155 | 1018 | 6 | 1148 | 153 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGAGCAATATG-1 | SeuratProject | 2904 | 1206 | 127 | 6 | 1285 | 158 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGAGCCACCTG-1 | SeuratProject | 3384 | 1150 | 566 | 6 | 1983 | 161 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGAGGATCGCA-1 | SeuratProject | 2535 | 928 | 235 | 6 | 2667 | 168 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGAGGCTAGCA-1 | SeuratProject | 6264 | 1646 | 419 | 6 | 1915 | 166 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGCAAGTCTAC-1 | SeuratProject | 5994 | 1772 | 558 | 6 | 1904 | 164 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGCAGAGTGTG-1 | SeuratProject | 3223 | 907 | 90 | 6 | 735 | 152 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGCATCAGTCA-1 | SeuratProject | 1726 | 798 | 227 | 6 | 1007 | 158 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGCATGAACCT-1 | SeuratProject | 692 | 418 | 577 | 6 | 1412 | 156 | filtered_matrix_1 |
| filtered_matrix_1_AAACCTGCATTTGCCC-1 | SeuratProject | 5568 | 1594 | 152 | 6 | 3386 | 169 | filtered_matrix_1 |


In [16]:
## View assays in the first Seurat object
for (a in names(obj_list[[1]]@assays)) {
    print(obj_list[[1]][[a]])
}

Assay (v5) data with 16828 features for 7502 cells
First 10 features:
 AL627309.1, AL669831.5, LINC00115, FAM41C, NOC2L, KLHL17, PLEKHN1,
AL645608.8, HES4, ISG15 
Layers:
 counts 
Assay (v5) data with 8 features for 7502 cells
First 8 features:
 anti-human-Hashtag-1-totalC, anti-human-Hashtag-2-totalC,
anti-human-Hashtag-3-totalC, anti-human-Hashtag-4-totalC,
anti-human-Hashtag-5-totalC, anti-human-Hashtag-6-totalC,
anti-human-Hashtag-7-totalC, anti-human-Hashtag-8-totalC 
Layers:
 counts 
Assay (v5) data with 192 features for 7502 cells
First 10 features:
 anti-human-CD80-totalC, anti-human-CD86-totalC,
anti-human-CD274-totalC, anti-human-CD273-totalC,
anti-human-CD275-totalC, anti-mouse-human-CD11b-totalC,
anti-human-CD252-totalC, anti-human-CD137L-totalC,
anti-human-CD155-totalC, anti-human-CD112-totalC 
Layers:
 counts 


## Session Info {.unnumbered}

In [19]:
## save the output of this session
qsave(obj_list, file = "seurat/1_process/GSE247917_raw.qs")

In [20]:
sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Rocky Linux 8.7 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /nemo/lab/caladod/working/Matthew/.conda/envs/seurat5/lib/libopenblasp-r0.3.23.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] qs_0.26.3                   viridis_0.6.4              
 [3] viridisLite_0.4.2           ggalluvial_0.12.5          
 [5] ggnewscale_0.4.9            ggrepel_0.9.4