# Seurat integration of PB HSC/MPP cells

In [1]:
library(Seurat)
library(future)
library(anndata)
library(batchelor)
library(scater)

Registered S3 method overwritten by 'spatstat.geom':
  method     from
  print.boxx cli 

Attaching SeuratObject

Loading required package: SingleCellExperiment

Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: 'MatrixGenerics'


The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowD

In [2]:
plan(strategy = "multicore", workers = 4)

options(future.globals.maxSize = +Inf)

In [3]:
data.path = '2021/BloodPaper/h5ad/'

infile <- paste0(data.path, '20211203_COMBO10_PB_HSCMPP_counts.h5ad')

In [5]:
Sys.time()
data = read_h5ad(infile)
Sys.time()

[1] "2021-12-03 14:09:33 GMT"

[1] "2021-12-03 14:09:39 GMT"
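
The cells in this notebook bracket each slow step with two `Sys.time()` calls and leave the subtraction to the reader. A minimal base-R sketch of an alternative follows. `timed` is a hypothetical helper that is not part of the original notebook, and the toy sum only stands in for a slow call such as `read_h5ad(infile)`.

```r
# Hypothetical helper (not in the original notebook): evaluate an expression,
# report the elapsed wall-clock time, and return the expression's value.
timed <- function(expr, label = "step") {
  start <- Sys.time()
  value <- expr  # lazy evaluation: the expression is evaluated here
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  message(sprintf("%s: %.2f s", label, elapsed))
  value
}

# Toy usage; in the notebook this could wrap a call such as read_h5ad(infile).
total <- timed(sum(as.numeric(1:1e6)), label = "toy sum")
```

Because the helper returns the value, it can wrap an assignment without changing the rest of a cell.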

In [6]:
data

AnnData object with n_obs × n_vars = 14978 × 24310
    obs: 'batch', 'n_counts', 'n_genes', 'mitoc_fraction', 'doublet_score', 'library', 'donor', 'organ', 'leiden.1.2', 'silhouette.1.2', 'dpt_pseudotime', 'dpt_pseudotime_rank', 'S_score', 'G2M_score', 'phase', 'annot'
    var: 'gene_ids', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'dex_leiden_1_2', 'diffmap_evals', 'donor_colors', 'draw_graph', 'hvg', 'iroot', 'leiden', 'leiden.1.2_colors', 'log1p', 'neighbors', 'phase_colors'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap'
    layers: 'counts', 'lognorm'
    obsp: 'connectivities', 'distances'

In [7]:
Sys.time()

# AnnData stores cells x genes; SingleCellExperiment expects genes x cells, hence t().
obj <- SingleCellExperiment(assays = List(counts = as(t(data$X), "CsparseMatrix")),
                            colData = data$obs,
                            rowData = data$var)

Sys.time()

[1] "2021-12-03 14:10:07 GMT"

[1] "2021-12-03 14:10:14 GMT"

In [8]:
Sys.time()
# The SCE has no 'logcounts' assay, so the counts are also used to fill the data slot.
obj = as.Seurat(obj, data = 'counts')
Sys.time()

[1] "2021-12-03 14:10:19 GMT"

"Feature names cannot have underscores ('_'), replacing with dashes ('-')"
"Feature names cannot have underscores ('_'), replacing with dashes ('-')"


[1] "2021-12-03 14:10:20 GMT"

In [9]:
obj

An object of class Seurat 
24310 features across 14978 samples within 1 assay 
Active assay: originalexp (24310 features, 0 variable features)
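
The conversion above relies on a transpose: the AnnData summary reports cells × genes (14978 × 24310), while the Seurat object reports 24310 features across 14978 samples. The sketch below illustrates that step on a toy matrix using only the `Matrix` package. The toy values and the names `cells_by_genes` and `genes_by_cells` are assumptions made for illustration and do not come from the dataset.

```r
library(Matrix)

# Toy stand-in for data$X: 3 cells (rows) x 4 genes (columns), as in AnnData.
cells_by_genes <- Matrix(
  c(0, 2, 0, 1,
    3, 0, 0, 0,
    0, 0, 5, 0),
  nrow = 3, ncol = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4))
)

# Transpose to genes x cells and store in compressed sparse column form,
# mirroring the as(t(data$X), "CsparseMatrix") call in the notebook.
genes_by_cells <- as(t(cells_by_genes), "CsparseMatrix")

dim(genes_by_cells)       # genes first, then cells
rownames(genes_by_cells)  # gene names are now the row names
```

Checking `dim()` and the row names right after the transpose is a cheap way to catch a swapped orientation before the object reaches Seurat.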

---


In [13]:
data.rds = '2021/BloodPaper/rds/'

In [14]:
Sys.time()
saveRDS(obj, file = paste0(data.rds, '20211203_COMBO_only_PB_HSCMPP_filtered_counts_Seurat3_input_obj.rds'))
Sys.time()

[1] "2021-12-03 14:11:40 GMT"

[1] "2021-12-03 14:12:06 GMT"

In [15]:
rm(data)
rm(obj)

In [18]:
data.path = '2021/BloodPaper/'

In [19]:
# Reload the Seurat input object saved above
data <- readRDS(file = paste0(data.path, 'rds/20211203_COMBO_only_PB_HSCMPP_filtered_counts_Seurat3_input_obj.rds'))

# One Seurat object per donor
data.list <- SplitObject(object = data, split.by = "donor")

# Normalize each donor separately and select its variable features
for (i in seq_along(data.list)) {
    data.list[[i]] <- NormalizeData(object = data.list[[i]], verbose = FALSE)
    data.list[[i]] <- FindVariableFeatures(object = data.list[[i]],
        selection.method = "vst", nfeatures = 3000, verbose = FALSE)
}

# Donors passed on to anchor finding
reference.list <- data.list[ c("DOD1", "DOD2", "TQ198", "BP62j",
                               "BP37d", "BP74", "BP1c", "BP59h") ]
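
Subsetting a named list with `[` and a name that is not present does not raise an error in R; it returns a `NULL` entry with an `NA` name, which would only surface later as a failure inside the integration call. The sketch below shows one way to guard against that, using a toy list in place of the `SplitObject()` output. `toy.list`, `wanted`, and the donor name `"BP99x"` are made up for illustration.

```r
# Toy stand-in for the output of SplitObject(): a named list, one entry per donor.
toy.list <- list(DOD1 = 1, DOD2 = 2, TQ198 = 3)

# Donors requested for the reference; "BP99x" is a made-up name used to
# trigger the check.
wanted <- c("DOD1", "TQ198", "BP99x")

# Report any requested donor that the split list does not contain.
missing.donors <- setdiff(wanted, names(toy.list))
if (length(missing.donors) > 0) {
  message("Not found in the split list: ", paste(missing.donors, collapse = ", "))
}

# Keep only donors that are actually present, in the requested order.
toy.reference <- toy.list[intersect(wanted, names(toy.list))]
names(toy.reference)
```

In the notebook the same check would compare the eight donor names against `names(data.list)`.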

In [20]:
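# Find anchors between all pairs of reference donors.
# anchor.features is left at its default of 2000, so 2000 integration
# features are selected from the per-donor variable features.
Sys.time()
data.anchors <- FindIntegrationAnchors(object.list = reference.list, dims = 1:15)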
Sys.time()

[1] "2021-12-03 14:17:25 GMT"

Computing 2000 integration features

Scaling features for provided objects

Finding all pairwise anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 4868 anchors

Filtering anchors

	Retained 1505 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 3798 anchors

Filtering anchors

	Retained 1697 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 4719 anchors

Filtering anchors

	Retained 2048 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 4761 anchors

Filtering anchors

	Retained 1346 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 7859 anchors

Filtering anchors

	Retained 2249 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 4570 anchors

Filtering anchors

	Retained 1745 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 4212 anchors


[1] "2021-12-03 14:21:11 GMT"
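
The log reports "Finding all pairwise anchors", so with eight reference donors the number of pairs follows from a binomial coefficient. The sketch below computes that count and the fraction of anchors retained after filtering for the pairs whose "Found" and "Retained" lines are both visible in the log above. The log is cut off, so the remaining pairs are left out rather than guessed. The variable names are assumptions made for illustration.

```r
# Number of dataset pairs compared for the 8 reference donors.
n.datasets <- 8
n.pairs <- choose(n.datasets, 2)

# Anchor counts copied from the complete entries in the log above
# (the log shown is truncated, so later pairs are not included).
found    <- c(4868, 3798, 4719, 4761, 7859, 4570)
retained <- c(1505, 1697, 2048, 1346, 2249, 1745)

# Fraction of candidate anchors that survived filtering, per pair.
retained.fraction <- round(retained / found, 2)
data.frame(found, retained, retained.fraction)
```

A pair with an unusually low retained fraction may be worth inspecting before integration, although a sensible threshold depends on the data.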

In [21]:
saveRDS(data.anchors, file = paste0(data.path, 'rds/20211203_COMBO_only_PB_HSCMPP_Seurat3_VST_classic_anchors.rds'))

In [22]:
# integrate features
Sys.time()
data.integrated <- IntegrateData(anchorset = data.anchors, dims = 1:15)
Sys.time()

[1] "2021-12-03 14:35:10 GMT"

Merging dataset 5 into 4

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Merging dataset 3 into 6

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Merging dataset 6 3 into 2

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Merging dataset 8 into 7

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Merging dataset 1 into 2 6 3

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Merging dataset 4 5 into 2 6 3 1

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Merging dataset 7 8 into 2 6 3 1 4 5

Extracting anchors for merged samples

Finding integration v

[1] "2021-12-03 14:36:34 GMT"

In [23]:
saveRDS(data.integrated, file = paste0(data.path, 'rds/20211203_COMBO_only_PB_HSCMPP_Seurat3_VST_classic_integrated.rds'))

In [24]:
# Make sure cells and genes are in the same order as in the input object
data.integrated = data.integrated[row.names(data), colnames(data)]
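
Reordering by name only works as intended when both objects contain the same names. The sketch below shows the reorder-and-verify pattern on plain matrices. `ref`, `shuffled`, and `aligned` are toy objects made up for illustration; Seurat objects would be compared through their own row and column names.

```r
# Toy reference: 3 genes x 4 cells, standing in for the input object.
ref <- matrix(1:12, nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3", "c4")))

# Toy "integrated" result: same entries, but rows and columns shuffled.
shuffled <- ref[c("g3", "g1", "g2"), c("c2", "c4", "c1", "c3")]

# Both objects must hold the same names before reordering makes sense.
stopifnot(setequal(rownames(shuffled), rownames(ref)),
          setequal(colnames(shuffled), colnames(ref)))

# Reorder by name, then confirm the order matches the reference exactly.
aligned <- shuffled[rownames(ref), colnames(ref)]
identical(dimnames(aligned), dimnames(ref))
```

Note that `IntegrateData()` by default writes only the integration features to the `integrated` assay, so a check of this kind on that assay should compare against the matching subset of genes.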

In [29]:
# Convert to AnnData: transpose the integrated matrix back to cells x genes
x = data.integrated@assays$integrated@data
x = AnnData(X = t(x),
            obs = data.integrated@meta.data
           )

In [30]:
# Export/save anndata

outfile <- paste0(data.path, 'h5ad/20211203_COMBO_only_PB_HSCMPP_Seurat3_VST_classic_integrated.h5ad')

Sys.time()
x$write_h5ad(outfile, compression = 'lzf')
Sys.time()

[1] "2021-12-03 14:48:07 GMT"

None

[1] "2021-12-03 14:48:09 GMT"