In [9]:
library(doParallel)
library(FigR)
library(BSgenome.Hsapiens.UCSC.hg38)
library(SingleCellExperiment)

# Lift optmatch's problem-size limit so large matching problems are allowed
options("optmatch_max_problem_size" = Inf)

In [2]:
# Load Data
RNAmat <- readRDS("/home/lorna/Desktop/pituunpaired/exprMat.rds")
ATAC.se <- readRDS("/home/lorna/Desktop/pituunpaired/atac.se.rds")
CCA_PCs <- readRDS("/home/lorna/Desktop/pituunpaired/cca.rds")


In [6]:
# Identify ATAC cells by the "_2" suffix on their barcodes (RNA cells end in "_1")
isATAC <- grepl("_2$", rownames(CCA_PCs))
table(isATAC) # FALSE = RNA, TRUE = ATAC

isATAC
FALSE  TRUE 
23542 10128 
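
The split above depends on the barcode suffix alone. As a minimal sketch, assuming `_2` marks ATAC cells and `_1` marks RNA cells (as the barcodes printed later in this notebook suggest), the toy example below shows an end-anchored pattern and a check that every name carries one of the two suffixes. `toy_names` and `toy_isATAC` are hypothetical names used only for illustration.

```r
# Toy row names in the same style as the barcodes shown in this notebook;
# the "_1"/"_2" suffixes are assumed to mark RNA and ATAC cells respectively.
toy_names <- c("GAGGTCCTCACTGATG-1_2",
               "TCTCTGGGTCCACGCA-2_1",
               "CATGTTTCAAGGACCA-1_2",
               "ACATCCCCAGGCATTT-1_1")

# Anchor the pattern at the end of the string so only the suffix is tested
toy_isATAC <- grepl("_2$", toy_names)
table(toy_isATAC)

# Every name should carry exactly one of the two expected suffixes
stopifnot(all(grepl("_[12]$", toy_names)))
```

Running the same `stopifnot()` check on `rownames(CCA_PCs)` would flag any barcode that does not follow the assumed naming scheme before the matrix is split.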

In [7]:
# Split the CCA components by modality before pairing cells with FigR
ATAC_PCs <- CCA_PCs[isATAC,]
RNA_PCs <- CCA_PCs[!isATAC,]

In [10]:
pairing <- pairCells(ATAC = ATAC_PCs,
                     RNA = RNA_PCs,
                     keepUnique = TRUE)

Running geodesic pairing in after chunking data ..
Number of cells in bigger dataset:  23542 
Number of cells in smaller dataset:  10128 
Difference in cell count between 2 datasets:  13414 
Chunking larger dataset to match smaller datset ..
Chunk size n =  10128  cells
Total number of chunks:  3 

Chunk #  1 
No. cells in chunk:  10128 



Constructing KNN graph for computing geodesic distance ..
Computing graph-based geodesic distance ..
# KNN subgraphs detected:
 6 
Skipping subgraphs with either ATAC/RNA cells fewer than:  50  ..
Pairing cells for subgraph No. 1 
Total ATAC cells in subgraph:  9593 
Total RNA cells in subgraph:  9412 
Subgraph size:  9412 
Search threshold being used:  3765 
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 9412 ATAC and 9412  RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 2 
Total ATAC cells 

Insufficient number of cells in subgraph. Skipping current subgraph



Pairing cells for subgraph No. 3 
Total ATAC cells in subgraph:  132 
Total RNA cells in subgraph:  190 
Subgraph size:  132 
Search threshold being used:  53 
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 132 ATAC and 132  RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 4 
Total ATAC cells in subgraph:  174 
Total RNA cells in subgraph:  216 
Subgraph size:  174 
Search threshold being used:  70 
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 174 ATAC and 174  RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 5 
Total ATAC cells in subgraph:  74 
Total RNA cells in subgraph:  160 
Subgraph size:  74 
Search threshold being used:  30 
[1] "Constructing KNN based on geodesi

Insufficient number of cells in subgraph. Skipping current subgraph



Pairing cells for subgraph No. 3 
Total ATAC cells in subgraph:  30 
Total RNA cells in subgraph:  46 
Subgraph size:  30 
Search threshold being used:  12 


Insufficient number of cells in subgraph. Skipping current subgraph



Pairing cells for subgraph No. 4 
Total ATAC cells in subgraph:  49 
Total RNA cells in subgraph:  69 
Subgraph size:  49 
Search threshold being used:  20 


Insufficient number of cells in subgraph. Skipping current subgraph



Pairing cells for subgraph No. 5 
Total ATAC cells in subgraph:  32 
Total RNA cells in subgraph:  36 
Subgraph size:  32 
Search threshold being used:  13 


Insufficient number of cells in subgraph. Skipping current subgraph



Pairing cells for subgraph No. 6 
Total ATAC cells in subgraph:  32 
Total RNA cells in subgraph:  78 
Subgraph size:  32 
Search threshold being used:  13 


Insufficient number of cells in subgraph. Skipping current subgraph



Further filtering out ATAC-RNA cell pairs based on the mininum euclidean distance ..


Barcodes in the larger dataset will now be unique ..
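
The numbers in the log above can be cross-checked with a little arithmetic. The sketch below only re-derives values that appear in the log. The `ceiling(2 * size / 5)` rule is an observation that happens to fit the logged search thresholds; it is not taken from the FigR documentation. All variable names are hypothetical.

```r
# Cell counts reported in the log
n_rna  <- 23542   # larger dataset (RNA)
n_atac <- 10128   # smaller dataset (ATAC)

cell_diff <- n_rna - n_atac           # logged as 13414
n_chunks  <- ceiling(n_rna / n_atac)  # logged as 3

# Subgraph sizes and search thresholds copied from the log
log_sizes      <- c(9412, 132, 174, 74, 30, 49, 32)
log_thresholds <- c(3765,  53,  70, 30, 12, 20, 13)

# Observation only: the thresholds match 40% of the subgraph size, rounded up
guess <- ceiling(2 * log_sizes / 5)
all(guess == log_thresholds)

# Subgraphs below the logged minimum of 50 cells are the ones reported as skipped
log_sizes[log_sizes < 50]
```

The subgraph size itself is the smaller of the two per-subgraph cell counts in every logged case (for example, 9593 ATAC and 9412 RNA cells give a size of 9412).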




In [11]:
pairing

ATAC,RNA,dist
<chr>,<chr>,<dbl>
GAGGTCCTCACTGATG-1_2,TCTCTGGGTCCACGCA-2_1,0.06732072
CAAGCTATCTGCTACC-1_2,GGTGTTAGTTCGAACT-2_1,0.05652161
ATCGAGTGTTCAGTTG-1_2,CGCATAACACGCACCA-2_1,0.05359401
CATGTTTCAAGGACCA-1_2,ACATCCCCAGGCATTT-1_1,0.05994139
TGCTCACTCTGGAAGG-1_2,CAAGGGATCAGCATTG-2_1,0.05932020
CGATGATGTTTGCCAA-1_2,TGAGTCACACGCGCTA-2_1,0.06410813
AGGCGAACAAGCCAGA-1_2,ACTGTCCCATCTCAAG-2_1,0.04633421
CCTTGCATCATAGGTC-1_2,TAGGAGGTCTCTTCAA-2_1,0.05813438
TCGATTTGTAAGGTCG-1_2,CCCGGAAGTGAGTTTC-1_1,0.06290961
CACTGAATCCACGGCA-1_2,CGCATGGCAAGGCTTT-1_1,0.05129992


In [12]:
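# Filter paired object: sort by distance so the closest match comes first,
# then keep only the first (closest) RNA match for each ATAC barcode
pairing <- pairing[order(pairing$dist, decreasing = FALSE), ]
pairingClean <- pairing[!duplicated(pairing$ATAC), ]
pairingClean <- pairingClean[c("ATAC", "RNA")]
# Strip the "_1"/"_2" dataset suffix to recover the original barcodes
pairingClean <- data.frame(lapply(pairingClean, function(x) {gsub("_.*", "", x)}))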
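
To make the filtering step concrete, here is a minimal sketch on a made-up pairing table. The barcodes and distances in `toy_pairing` are invented for illustration and are not from this dataset. It shows that sorting by `dist` before calling `duplicated()` keeps the closest RNA match for each ATAC barcode, and that the dataset suffix is removed afterwards.

```r
# Hypothetical toy pairing table with the same columns as `pairing`
toy_pairing <- data.frame(
  ATAC = c("AAAC-1_2", "AAAC-1_2", "CCCG-1_2"),
  RNA  = c("GGGT-2_1", "TTTA-1_1", "ACGT-1_1"),
  dist = c(0.07, 0.05, 0.06),
  stringsAsFactors = FALSE
)

# Sort so the closest pair for each ATAC barcode comes first
toy_pairing <- toy_pairing[order(toy_pairing$dist), ]

# duplicated() flags later repeats, so the closest pair per ATAC cell is kept
toy_clean <- toy_pairing[!duplicated(toy_pairing$ATAC), c("ATAC", "RNA")]

# Strip the trailing "_1"/"_2" dataset suffix from both columns
toy_clean[] <- lapply(toy_clean, function(x) sub("_[^_]*$", "", x))
toy_clean
```

In this toy case the ATAC barcode `AAAC-1` ends up paired with `TTTA-1` (distance 0.05) and not with `GGGT-2` (distance 0.07).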

In [13]:
pairingClean

ATAC,RNA
<chr>,<chr>
TCTATTGGTATACGCT-1,ATTGGGTCACTTCAGA-1
AACCGATAGTCATACC-1,TGGGCTGTCTTACCAT-1
TTCATTGAGATGTTGA-1,ACAGCCGTCTCGGCTT-2
GCGGAAACATAGCAGG-1,AGCTTCCGTGATTCAC-1
GACTAACGTAACCCAT-1,ATTGTTCAGTAGGGTC-1
GTGTCCTCAAGCCTTA-1,CCACACTCACTGGACC-1
TTGTCTATCCATCATT-1,GAATCACAGCTGACTT-1
GTTGGGCCAATTGGCT-1,TTCGATTTCGGAGCAA-2
GTTATGGAGACTAGCG-1,CTAACCCAGGAGGGTG-1
TAATCGGGTAACCGAG-1,CTGTGAATCGGAACTT-2


In [14]:
write.csv(pairingClean, "/home/lorna/Desktop/pituunpaired/barMap.csv", row.names = FALSE)
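
A quick round-trip check can confirm that the barcode map was written as intended. The sketch below uses a temporary file and a two-row stand-in for `pairingClean`, built from barcodes printed above, so it runs without the original data. `toy_map`, `tmp` and `back` are hypothetical names.

```r
# Hypothetical stand-in for pairingClean, using two pairs printed above
toy_map <- data.frame(
  ATAC = c("TCTATTGGTATACGCT-1", "AACCGATAGTCATACC-1"),
  RNA  = c("ATTGGGTCACTTCAGA-1", "TGGGCTGTCTTACCAT-1"),
  stringsAsFactors = FALSE
)

# Write to a temporary file in the same way as the real barcode map
tmp <- tempfile(fileext = ".csv")
write.csv(toy_map, tmp, row.names = FALSE)

# Read it back and check the header, the row count, and ATAC uniqueness
back <- read.csv(tmp, stringsAsFactors = FALSE)
stopifnot(identical(names(back), c("ATAC", "RNA")),
          nrow(back) == nrow(toy_map),
          anyDuplicated(back$ATAC) == 0)
```

The same three checks could be applied to the real `barMap.csv` after it is written.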