---
# Dataset Formatting: Ma-LIHC
*L.Richards*  
*2020-06-07*  
*/cluster/projects/pughlab/projects/cancer_scrna_integration/data/Ma-LIHC/*  

---

Format the Ma-LIHC dataset as input for data integration tools.

```r
library(Seurat) # v4.0.1
library(earlycross) # v0.1
```

---
## 1.0 Format downloaded public data
---

Data were downloaded from GEO accession GSE125449: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125449. The GEO record describes the samples as follows:

> "A total of 19 tumors were profiled. Set 1 contains scRNA-seq data of twelve samples, i.e., S16_P10_LCP18, S02_P01_LCP21, S10_P05_LCP23, S09_P04_LCP25, S08_P03_LCP26, S07_P02_LCP28, S11_P06_LCP29, S12_P07_LCP30, S20_P12_LCP35, S21_P13_LCP37, S15_P09_LCP38, and S19_P11_LCP39. Set 2 includes scRNA-seq data of seven samples, i.e., S351_P10_LCP34, S355_P13_LCP42, S358_P16_LCP46, S305_P06_LCP56, S300_P02_LCP60, 364_P21_LCP65, and S365_P22_LCP66."
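Because sample identity matters downstream, one way to sanity-check the listing is to split each ID on `_`. The minimal sketch below does this and confirms the 12 + 7 = 19 count. The column names `field1` to `field3` are hypothetical labels, because the GEO text does not define what the `S`, `P` and `LCP` parts mean. The ID `364_P21_LCP65` is kept exactly as quoted, without a leading `S`.

```r
# Sample IDs copied from the GEO description quoted above
set1_ids <- c("S16_P10_LCP18", "S02_P01_LCP21", "S10_P05_LCP23", "S09_P04_LCP25",
              "S08_P03_LCP26", "S07_P02_LCP28", "S11_P06_LCP29", "S12_P07_LCP30",
              "S20_P12_LCP35", "S21_P13_LCP37", "S15_P09_LCP38", "S19_P11_LCP39")
set2_ids <- c("S351_P10_LCP34", "S355_P13_LCP42", "S358_P16_LCP46", "S305_P06_LCP56",
              "S300_P02_LCP60", "364_P21_LCP65", "S365_P22_LCP66")

# split each ID on "_" into three fields; field names are hypothetical labels
ids <- c(set1_ids, set2_ids)
parts <- do.call(rbind, strsplit(ids, "_", fixed = TRUE))
samples <- data.frame(
  id     = ids,
  set    = rep(c("Set1", "Set2"), times = c(length(set1_ids), length(set2_ids))),
  field1 = parts[, 1],  # e.g. "S16"
  field2 = parts[, 2],  # e.g. "P10"
  field3 = parts[, 3],  # e.g. "LCP18"
  stringsAsFactors = FALSE
)

table(samples$set)  # 12 in Set1, 7 in Set2: 19 in total

# second-field codes that occur in both sets
shared <- intersect(samples$field2[samples$set == "Set1"],
                    samples$field2[samples$set == "Set2"])
shared  # "P10" "P02" "P06" "P13"
```

Some second-field codes recur across the two sets. The GEO text does not say whether these refer to the same individuals, so they should not be treated as patient IDs without checking the original publication.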

```r
### Set 1

# read the cell metadata and the count matrix from the GEO accession files
meta <- read.table("./original-data/Set1/GSE125449_Set1_samples.txt.gz",
                   sep = "\t",
                   header = TRUE
                  )
# use cell barcodes as row names so metadata rows match count matrix columns
rownames(meta) <- meta$Cell.Barcode

counts <- Read10X("./original-data/Set1/")

# combine into a Seurat object
set1 <- CreateSeuratObject(counts = counts,
                           meta.data = meta
                          )
```

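The cell above depends on two behaviours of `read.table()`. First, it reads the `.gz` file directly. Second, its default `check.names = TRUE` turns a header such as `Cell Barcode` into the syntactic name `Cell.Barcode`. The minimal sketch below demonstrates both on a hypothetical toy file. It assumes the GEO sample table has a `Cell Barcode` column; the barcodes and cell types are invented.

```r
# Hypothetical toy file mimicking a tab-separated, gzipped GEO sample table;
# the header "Cell Barcode" is an assumption, values are illustrative only
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, "w")
writeLines(c("Sample\tCell Barcode\tType",
             "S16_P10_LCP18\tAAACCTGAGACTAGGC\tB cell",
             "S16_P10_LCP18\tAAACCTGCAAGCGCTC\tT cell"), con)
close(con)

# read.table() decompresses .gz input transparently; check.names = TRUE
# (the default) turns "Cell Barcode" into the syntactic name "Cell.Barcode"
meta <- read.table(tmp, sep = "\t", header = TRUE)
names(meta)  # "Sample" "Cell.Barcode" "Type"

# barcodes as row names; this errors if a barcode is duplicated
rownames(meta) <- meta$Cell.Barcode
```

Setting the row names this way also serves as an early check, because a data frame refuses duplicated row names.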

```r
### Set 2

# read the cell metadata and the count matrix from the GEO accession files
meta <- read.table("./original-data/Set2/GSE125449_Set2_samples.txt.gz",
                   sep = "\t",
                   header = TRUE
                  )
# use cell barcodes as row names so metadata rows match count matrix columns
rownames(meta) <- meta$Cell.Barcode

counts <- Read10X("./original-data/Set2/")

# combine into a Seurat object
set2 <- CreateSeuratObject(counts = counts,
                           meta.data = meta
                          )
```

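`CreateSeuratObject()` expects `meta.data` to have cell names as its rows. It is therefore worth confirming that the metadata row names and the count matrix column names describe the same barcodes before building each set. The defensive check below is a sketch on hypothetical toy data and does not depend on how Seurat itself matches the rows.

```r
# Hypothetical toy data: a count matrix with barcodes as columns, and
# metadata whose rows are the same barcodes in a different order
counts <- matrix(c(1, 0, 3, 2, 5, 0), nrow = 2,
                 dimnames = list(c("GeneA", "GeneB"), c("bc1", "bc2", "bc3")))
meta <- data.frame(Type = c("T cell", "B cell", "Malignant cell"),
                   row.names = c("bc3", "bc1", "bc2"),
                   stringsAsFactors = FALSE)

# every count-matrix barcode should have exactly one metadata row
stopifnot(!anyDuplicated(rownames(meta)),
          setequal(colnames(counts), rownames(meta)))

# reorder metadata rows to follow the count-matrix column order
meta <- meta[colnames(counts), , drop = FALSE]
```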

```r
### Combine Set 1 & Set 2
combo <- merge(set1, y = set2)

# save the merged Seurat object
saveRDS(combo, file = "Ma-LIHC_seurat.rds")
```
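The two sets are sequenced separately, so the same barcode string can occur in both, and those cells would collide after merging. The base R sketch below, on hypothetical toy barcodes, finds shared barcodes and removes the clash by prefixing each set's barcodes with a set label. Seurat's `merge()` offers a similar prefixing option through its `add.cell.ids` argument, for example `merge(set1, y = set2, add.cell.ids = c("Set1", "Set2"))`. The `Set1_`/`Set2_` prefixes here are an illustrative choice.

```r
# Hypothetical toy barcodes for two sets that reuse one barcode
set1_cells <- c("AAACCTGAGACTAGGC", "AAACCTGCAAGCGCTC")
set2_cells <- c("AAACCTGAGACTAGGC", "TTTGTCATCTTGTTTG")

# barcodes present in both sets would collide after merging
shared <- intersect(set1_cells, set2_cells)

# prefix each set's barcodes with a set label so every cell name is unique
merged_cells <- c(paste0("Set1_", set1_cells), paste0("Set2_", set2_cells))
stopifnot(!anyDuplicated(merged_cells))
```

Prefixed barcodes also record which set each cell came from. Downstream files, such as the metadata CSV, would then carry the prefixed names.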

---
## 2.0 Output files in 10x common format
---

Write the count matrix in the 10x Genomics Cell Ranger output format, and the cell metadata as a CSV file.

```r
# save cell metadata (one row per cell barcode) as a CSV file
meta <- data.frame(combo@meta.data)
write.csv(meta, file = "Ma-LIHC_meta.csv")
```
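`write.csv()` writes the row names (here, cell barcodes) as an unnamed first column. Integration tools reading the CSV need to treat that column as the cell index. The round-trip sketch below uses hypothetical toy metadata standing in for `combo@meta.data`, and reads the file back with `row.names = 1` to restore the barcodes.

```r
# Hypothetical toy metadata with cell barcodes as row names,
# standing in for combo@meta.data
meta <- data.frame(nCount_RNA = c(1520, 980),
                   Type = c("T cell", "B cell"),
                   row.names = c("AAACCTGAGACTAGGC", "AAACCTGCAAGCGCTC"),
                   stringsAsFactors = FALSE)

out <- tempfile(fileext = ".csv")
write.csv(meta, file = out)  # row names become an unnamed first column

# row.names = 1 turns that first column back into row names
meta_in <- read.csv(out, row.names = 1, stringsAsFactors = FALSE)
```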

```r
# export the count matrix in the default 10x Cell Ranger output layout
# (Write10X() is not a Seurat function; it is assumed here to come from earlycross)
Write10X(combo, dir = "./")
```
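For reference, the Cell Ranger v2 directory layout consists of a Matrix Market count file (`matrix.mtx`), a gene table (`genes.tsv`, with gene ID and gene name columns) and a list of cell barcodes (`barcodes.tsv`). Cell Ranger v3 and later use gzipped files and `features.tsv.gz` instead. Seurat's `Read10X()` can read either layout. The sketch below writes the v2 layout by hand with the `Matrix` package, as an illustration of the format only. It is not the implementation of `Write10X()`, whose output is not documented here. Because the toy matrix has only gene symbols, the sketch writes the symbol into both `genes.tsv` columns, which is an assumption.

```r
library(Matrix)  # recommended package shipped with R: sparse matrices, writeMM()

# Hypothetical toy sparse count matrix: genes x cells
counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(4, 1, 7),
                       dims = c(3, 2),
                       dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                       c("AAACCTGAGACTAGGC", "AAACCTGCAAGCGCTC")))

out_dir <- file.path(tempdir(), "toy_10x")
dir.create(out_dir, showWarnings = FALSE)

# Cell Ranger v2-style layout: matrix.mtx, genes.tsv (id, name), barcodes.tsv
writeMM(counts, file.path(out_dir, "matrix.mtx"))
write.table(data.frame(rownames(counts), rownames(counts)),
            file.path(out_dir, "genes.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
writeLines(colnames(counts), file.path(out_dir, "barcodes.tsv"))
```

Only the non-zero entries are stored in `matrix.mtx`. Row and column order is carried by `genes.tsv` and `barcodes.tsv`, so the three files must be written from the same matrix.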

---