# Make Arrow Files

Convert the raw fragment files into ArchR Arrow files, then compute doublet scores for each sample.

In [6]:
library(ArchR)

In [7]:
set.seed(1)
addArchRThreads(threads = 48) 

Setting default number of Parallel threads to 48.



In [8]:
addArchRGenome("hg38")

Setting default genome to Hg38.



In [9]:
FRAG_BASE = "/oak/stanford/groups/akundaje/surag/projects/scATAC-reprog/singlecell/chromap/outputs/"
frag.files = list.files(FRAG_BASE, pattern="\\.gz$")  # fragment files end in .gz
sample.names = sapply(strsplit(frag.files, "\\."), "[[", 1)  # text before the first "."
frag.files = paste0(FRAG_BASE, frag.files)
names(frag.files) = sample.names

frag.files
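
The sample names above are taken as everything before the first `.` in each file name. A minimal sketch of that step on made-up file names (the names and the base directory below are hypothetical, not the real contents of `FRAG_BASE`), using `vapply` so the result is a character vector rather than a list:

```r
# Hypothetical fragment file names; the real ones live under FRAG_BASE.
example.files <- c("D0.fragments.tsv.gz", "D2.fragments.tsv.gz", "iPSC.fragments.tsv.gz")

# Keep the text before the first "." as the sample name.
example.names <- vapply(strsplit(example.files, ".", fixed = TRUE), `[[`, character(1), 1)

# Prefix a (hypothetical) base directory and attach the sample names.
example.paths <- file.path("/path/to/outputs", example.files)
names(example.paths) <- example.names

# Two files sharing a prefix would get the same sample name, so check early.
stopifnot(!anyDuplicated(example.names))
example.paths
```

The duplicate check is an extra precaution that the original cell does not perform.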

In [11]:
ArrowFiles <- createArrowFiles(
  inputFiles = frag.files,
  sampleNames = names(frag.files),
  filterTSS = 4, # Don't set this too high; the cutoff can always be raised later. Deprecated name, now minTSS.
  filterFrags = 1000, # Deprecated name, now minFrags.
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

filterFrags is no longer a valid input. Please use minFrags! Setting filterFrags value to minFrags!

filterTSS is no longer a valid input. Please use minTSS! Setting filterTSS value to minTSS!

Using GeneAnnotation set by addArchRGenome(Hg38)!

ArchR logging to : ArchRLogs/ArchR-createArrows-1470b4757495de-Date-2021-07-13_Time-22-21-52.log
If there is an issue, please report to github with logFile!

Cleaning Temporary Files

2021-07-13 22:21:53 : Batch Execution w/ safelapply!, 0 mins elapsed.

ArchR logging successful to : ArchRLogs/ArchR-createArrows-1470b4757495de-Date-2021-07-13_Time-22-21-52.log
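
The two thresholds act as per-barcode cutoffs: a barcode is kept only if both its TSS enrichment and its fragment count reach the minimum. The toy illustration below uses base R only. The data frame, its column names (`TSSEnrichment`, `nFrags`), and its values are made up, the cutoffs are assumed to be inclusive, and this is not ArchR's implementation.

```r
# Made-up per-barcode QC values (hypothetical; not from this dataset).
qc <- data.frame(
  barcode       = c("cellA", "cellB", "cellC", "cellD"),
  TSSEnrichment = c(2.1, 6.3, 9.8, 4.0),
  nFrags        = c(5000, 800, 12000, 1000),
  stringsAsFactors = FALSE
)

min.tss   <- 4     # same value as filterTSS / minTSS above
min.frags <- 1000  # same value as filterFrags / minFrags above

# Assumption: a barcode sitting exactly on a cutoff is kept.
keep <- qc$TSSEnrichment >= min.tss & qc$nFrags >= min.frags
qc[keep, "barcode"]
```

Because the filter is a plain comparison, starting with a permissive cutoff loses nothing: a stricter one can be applied to the retained cells later.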



In [13]:
ArrowFiles

In [14]:
# Convert each sample's pre-filter QC metadata from RDS to TSV
for (x in c("D0", "D2", "D4", "D6", "D8", "D10", "D12", "D14", "iPSC")) {
    r = readRDS(sprintf("./QualityControl/%s/%s-Pre-Filter-Metadata.rds", x, x))
    write.table(r, sprintf("./QualityControl/%s/%s-Pre-Filter-Metadata.tsv", x, x), sep='\t', row.names=FALSE, quote=FALSE)
}
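
The same conversion can be wrapped in a small helper so that the sample list does not have to be typed out. This is only a sketch: `rds_to_tsv` is a hypothetical helper name, and the demonstration runs on a throwaway file with made-up contents rather than on the real QC metadata.

```r
# Convert one RDS file holding a table into a tab-separated file.
# Returns the path of the TSV that was written.
rds_to_tsv <- function(rds.path) {
  tsv.path <- sub("\\.rds$", ".tsv", rds.path)
  write.table(as.data.frame(readRDS(rds.path)), tsv.path,
              sep = "\t", row.names = FALSE, quote = FALSE)
  tsv.path
}

# Demonstration on a temporary file (made-up data).
demo.dir <- tempfile("qc_demo_")
dir.create(demo.dir)
demo.rds <- file.path(demo.dir, "S1-Pre-Filter-Metadata.rds")
saveRDS(data.frame(cellNames = c("cellA", "cellB"), nFrags = c(1500, 300)), demo.rds)

demo.tsv <- rds_to_tsv(demo.rds)
read.delim(demo.tsv)
```

On the real output, something like `list.files("./QualityControl", pattern = "-Pre-Filter-Metadata\\.rds$", recursive = TRUE, full.names = TRUE)` could supply the paths to loop over, assuming the directory layout used in the cell above.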

In [15]:
doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, #Refers to how many cells near a "pseudo-doublet" to count.
  knnMethod = "UMAP", #Refers to the embedding to use for nearest neighbor search.
  LSIMethod = 1
)

ArchR logging to : ArchRLogs/ArchR-addDoubletScores-1470b444919248-Date-2021-07-13_Time-22-55-48.log
If there is an issue, please report to github with logFile!

2021-07-13 22:55:49 : Batch Execution w/ safelapply!, 0 mins elapsed.

2021-07-13 22:55:49 : iPSC (1 of 9) :  Computing Doublet Statistics, 0.001 mins elapsed.

Filtering 1 dims correlated > 0.75 to log10(depth + 1)

iPSC (1 of 9) : UMAP Projection R^2 = 0.85733

2021-07-13 23:01:40 : D8 (2 of 9) :  Computing Doublet Statistics, 5.848 mins elapsed.

D8 (2 of 9) : UMAP Projection R^2 = 0.98805

2021-07-13 23:06:39 : D6 (3 of 9) :  Computing Doublet Statistics, 10.836 mins elapsed.

D6 (3 of 9) : UMAP Projection R^2 = 0.9915

2021-07-13 23:11:47 : D4 (4 of 9) :  Computing Doublet Statistics, 15.977 mins elapsed.

D4 (4 of 9) : UMAP Projection R^2 = 0.99617
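
To make the role of `k` concrete, here is a toy sketch of the idea behind the doublet score: simulate pseudo-doublets, find the `k` real cells nearest to each one, and count how often each real cell is picked. It is a simplification written in base R on made-up 2-D data, not ArchR's implementation. In particular, it averages two cells directly in the embedding, which is an assumption made only to keep the example short.

```r
set.seed(1)

# Toy 2-D embedding (made-up data): two clusters of 20 cells each,
# plus 3 cells placed between them to play the role of true doublets.
cells <- rbind(
  matrix(rnorm(40, mean = 0,   sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,   sd = 0.3), ncol = 2),
  matrix(rnorm(6,  mean = 2.5, sd = 0.3), ncol = 2)
)

# Simplification: build each pseudo-doublet by averaging two random cells
# directly in the embedding.
n.sim  <- 200
first  <- sample(nrow(cells), n.sim, replace = TRUE)
second <- sample(nrow(cells), n.sim, replace = TRUE)
pseudo <- (cells[first, ] + cells[second, ]) / 2

# For every pseudo-doublet, credit its k nearest real cells with one hit.
k <- 10
hits <- integer(nrow(cells))
for (p in seq_len(n.sim)) {
  dist.to.cells <- sqrt(rowSums(sweep(cells, 2, pseudo[p, ])^2))
  nearest <- order(dist.to.cells)[seq_len(k)]
  hits[nearest] <- hits[nearest] + 1L
}

# Compare the in-between cells (rows 41 to 43) with the cluster cells.
c(between = mean(hits[41:43]), clusters = mean(hits[1:40]))
```

In this toy setup the in-between cells collect more hits on average than the cluster cells, which is the signal a doublet score is meant to capture. A larger `k` spreads each pseudo-doublet's credit over more neighbors.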

In [16]:
# Arrow files were moved manually (krishna); these are their new paths
paste0("/srv/scratch/surag/scATAC-reprog/arrow/", ArrowFiles)
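
Since the files were moved by hand, it may be worth confirming that every rebuilt path exists before using it. The sketch below runs on a temporary directory with hypothetical file names. On the real data, the scratch directory and `ArrowFiles` from the cell above would take the place of `new.dir` and `arrow.names`.

```r
# Hypothetical Arrow file names and a temporary stand-in for the new directory.
arrow.names <- c("D0.arrow", "iPSC.arrow")
new.dir <- tempfile("arrow_demo_")
dir.create(new.dir)
file.create(file.path(new.dir, "D0.arrow"))  # pretend only one file was moved

# Rebuild the paths and list any file that is not where it is expected.
new.paths <- file.path(new.dir, basename(arrow.names))
missing <- new.paths[!file.exists(new.paths)]
basename(missing)
```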

---

In [17]:
sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS/LAPACK: /users/surag/anaconda3/envs/r36_cran/lib/libopenblasp-r0.3.9.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] gridExtra_2.3                     nabor_0.5.0                      
 [3] Seurat_3.1.5                      BSgenome.Hsapiens.UCSC.hg38_1.4.1
 [5] BSgenome_1.54.0                   rtracklayer_1.46.0               
 [7] Biostrings_2.54.0                 XVect