# Convert files from R Seurat to Python scanpy

In [1]:
suppressPackageStartupMessages({
    library(Seurat)
    library(SeuratDisk)
    
})
options(warn=-1)

set.seed(23)


In [19]:
# Load the raw RNA counts (unprocessed data is preferred, to avoid mismatches between samples)
data <- Read10X("data/ari_org/aggr/outs/count/filtered_feature_bc_matrix/")
so <- CreateSeuratObject(counts = data, project = "ari_org", min.cells = 3, min.features = 200)
so

An object of class Seurat 
30279 features across 19401 samples within 1 assay 
Active assay: RNA (30279 features, 0 variable features)
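
The `min.cells` and `min.features` arguments above filter the count matrix before the object is built. The block below is a minimal base-R sketch of that kind of filtering on a toy matrix, assuming cells are filtered first (by number of detected features) and features second (by number of cells in which they are detected). The matrix, the thresholds of 2, and the variable names are illustrative assumptions, not Seurat internals.

```r
# Toy count matrix: 4 genes (rows) x 5 cells (columns). Values are made up.
counts <- matrix(
  c(1, 0, 3, 0, 0,
    0, 2, 1, 0, 4,
    5, 1, 0, 0, 0,
    0, 0, 0, 7, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("c", 1:5))
)

min_features <- 2  # illustrative stand-in for min.features
min_cells <- 2     # illustrative stand-in for min.cells

# Step 1: keep cells that express at least `min_features` genes.
features_per_cell <- colSums(counts > 0)
kept_cells <- counts[, features_per_cell >= min_features, drop = FALSE]

# Step 2: keep genes detected in at least `min_cells` of the remaining cells.
cells_per_feature <- rowSums(kept_cells > 0)
filtered <- kept_cells[cells_per_feature >= min_cells, , drop = FALSE]

dim(filtered)
```

Cells `c4` and `c5` each express a single gene and are dropped in the first step. Gene `g4` is then detected in none of the remaining cells and is dropped in the second.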

In [21]:
# Save the object as h5Seurat, then convert it to h5ad
SaveH5Seurat(so, filename = "data/ari_org/ari_org_aggr_raw.H5Seurat", overwrite = TRUE)
Convert("data/ari_org/ari_org_aggr_raw.H5Seurat", dest = "data/ari_org/ari_org_aggr_raw.h5ad", overwrite = TRUE)

Creating h5Seurat file for version 3.1.5.9900

Adding counts for RNA

Adding data for RNA

No variable features found for RNA

No feature-level metadata found for RNA

Validating h5Seurat file

Adding data from RNA as X

Adding counts from RNA as raw

Transfering meta.data to obs
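
The same save-then-convert pair is repeated for each object in this notebook, so it could be wrapped in a small helper. This is only a sketch: `h5ad_path_for` and `seurat_to_h5ad` are hypothetical names introduced here, and the wrapper assumes nothing beyond the `SaveH5Seurat()` and `Convert()` calls already used above.

```r
# Hypothetical helper: derive the .h5ad path that sits next to an .h5Seurat file.
h5ad_path_for <- function(h5seurat_path) {
  sub("\\.h5seurat$", ".h5ad", h5seurat_path, ignore.case = TRUE)
}

# Hypothetical wrapper around the two SeuratDisk calls used in this notebook.
# SeuratDisk is only needed when the function is actually called.
seurat_to_h5ad <- function(object, h5seurat_path, overwrite = TRUE) {
  # Refuse paths without the expected extension, so that the source file
  # and the destination file can never be the same path.
  stopifnot(grepl("\\.h5seurat$", h5seurat_path, ignore.case = TRUE))
  h5ad_path <- h5ad_path_for(h5seurat_path)
  SeuratDisk::SaveH5Seurat(object, filename = h5seurat_path, overwrite = overwrite)
  SeuratDisk::Convert(h5seurat_path, dest = h5ad_path, overwrite = overwrite)
  invisible(h5ad_path)
}

# Path derivation only; no files are touched here.
h5ad_path_for("data/ari_org/ari_org_aggr_raw.H5Seurat")
```

With the helper defined, the cell above would reduce to `seurat_to_h5ad(so, "data/ari_org/ari_org_aggr_raw.H5Seurat")`.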



In [10]:
# Read and convert the integrated object
so.combined.sct <- readRDS(file = "data/ari_org/ari_org_integrated.rds")

In [12]:
# Make RNA the default assay; the conversion log below shows it being written as X and raw
DefaultAssay(object = so.combined.sct) <- "RNA"

In [13]:
# Save as h5Seurat (intermediate step before converting to h5ad for scanpy/scVelo/etc.)

so.combined.sct
SaveH5Seurat(so.combined.sct, filename = "data/ari_org/ari_org.integrated.h5Seurat", overwrite = TRUE)

An object of class Seurat 
81018 features across 17537 samples within 3 assays 
Active assay: RNA (29404 features, 0 variable features)
 2 other assays present: SCT, integrated
 3 dimensional reductions calculated: pca, umap, tsne

Creating h5Seurat file for version 3.1.5.9900

Adding counts for RNA

Adding data for RNA

No variable features found for RNA

No feature-level metadata found for RNA

Adding counts for SCT

Adding data for SCT

Adding scale.data for SCT

No variable features found for SCT

No feature-level metadata found for SCT

Writing out SCTModel.list for SCT

Adding data for integrated

Adding scale.data for integrated

Adding variable features for integrated

No feature-level metadata found for integrated

Writing out SCTModel.list for integrated

Adding cell embeddings for pca

Adding loadings for pca

No projected loadings for pca

Adding standard deviations for pca

No JackStraw data for pca

Adding cell embeddings for umap

No loadings for umap

No projected loadings for umap

No standard deviations for umap

No JackStraw data for umap

Adding cell embeddings for tsne

No loadings for tsne

No projected loadings for tsne

No standard deviations for tsne

No JackStraw data for tsne



In [15]:
# Convert the h5Seurat file to h5ad
Convert("data/ari_org/ari_org.integrated.h5Seurat", dest = "data/ari_org/ari_org.integrated.h5ad", overwrite = TRUE)

Validating h5Seurat file

Adding data from RNA as X

Adding counts from RNA as raw

Transfering meta.data to obs

Adding dimensional reduction information for tsne (global)

Adding dimensional reduction information for umap (global)



In [2]:
# Read and convert the annotated object
ari_annot.sct <- readRDS(file = "data/ari_org/ari_org_annotated.rds")

In [3]:
DefaultAssay(object = ari_annot.sct) <- "RNA"
table(ari_annot.sct@meta.data$celltype)


           NH         LGR5+       Surface       Stromal     Glandular 
         4009          2389          2710          1657          1654 
      PV-like       Luminal Proliferative      Ciliated            10 
         1387          1262           816           682           426 
         Endo            13 
          336           209 

In [4]:
# Keep a copy of the original factor-typed labels in a separate column
ari_annot.sct$cell_id <- ari_annot.sct$celltype

In [5]:
# celltype was a factor and showed up as numbers in scanpy, so convert it to character first
ari_annot.sct$celltype <- as.character(ari_annot.sct$celltype)
class(ari_annot.sct$celltype)
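
The reason a factor can surface as numbers downstream is that R stores it as integer codes plus a table of levels. The base-R sketch below uses a made-up vector (not the real metadata) to show the difference between the codes and the labels that `as.character()` recovers.

```r
# Made-up labels that mimic a mix of named and numeric cluster identities.
celltype <- factor(c("NH", "10", "Stromal", "10", "13"))

# A factor is an integer vector with a "levels" attribute.
codes <- as.integer(celltype)     # positions in levels(celltype), not the labels
labels <- as.character(celltype)  # the original strings

class(celltype)  # "factor"
class(labels)    # "character"

# Indexing the levels by the codes gives back the labels.
identical(levels(celltype)[codes], labels)
```

Note that numeric-looking labels such as "10" and "13" stay strings after `as.character()`, which is why the table in the next cell is sorted alphabetically rather than by cluster size.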

In [6]:
table(ari_annot.sct$celltype)


           10            13      Ciliated          Endo     Glandular 
          426           209           682           336          1654 
        LGR5+       Luminal            NH Proliferative       PV-like 
         2389          1262          4009           816          1387 
      Stromal       Surface 
         1657          2710 

In [7]:
# Save as h5Seurat (intermediate step before converting to h5ad for scanpy/scVelo/etc.)

ari_annot.sct
SaveH5Seurat(ari_annot.sct, filename = "data/ari_org/ari_org.annotated.h5Seurat", overwrite = TRUE)

An object of class Seurat 
81018 features across 17537 samples within 3 assays 
Active assay: RNA (29404 features, 0 variable features)
 2 other assays present: SCT, integrated
 3 dimensional reductions calculated: pca, umap, tsne

Creating h5Seurat file for version 3.1.5.9900

Adding counts for RNA

Adding data for RNA

No variable features found for RNA

No feature-level metadata found for RNA

Adding counts for SCT

Adding data for SCT

Adding scale.data for SCT

No variable features found for SCT

No feature-level metadata found for SCT

Writing out SCTModel.list for SCT

Adding data for integrated

Adding scale.data for integrated

Adding variable features for integrated

No feature-level metadata found for integrated

Writing out SCTModel.list for integrated

Adding cell embeddings for pca

Adding loadings for pca

No projected loadings for pca

Adding standard deviations for pca

No JackStraw data for pca

Adding cell embeddings for umap

No loadings for umap

No projected loadings for umap

No standard deviations for umap

No JackStraw data for umap

Adding cell embeddings for tsne

No loadings for tsne

No projected loadings for tsne

No standard deviations for tsne

No JackStraw data for tsne



In [8]:
# Convert the h5Seurat file to h5ad
Convert("data/ari_org/ari_org.annotated.h5Seurat", dest = "data/ari_org/ari_org.annotated.h5ad", overwrite = TRUE)

Validating h5Seurat file

Adding data from RNA as X

Adding counts from RNA as raw

Transfering meta.data to obs

Adding dimensional reduction information for tsne (global)

Adding dimensional reduction information for umap (global)



In [9]:
sessionInfo()

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SeuratDisk_0.0.0.9020 SeuratObject_4.1.3    Seurat_4.3.0         

loaded via a namespace (and not attached):
  [1] Rtsne_0.16             colorspace_2.1-0       deldir_1.0-6          
  [4] ellipsis_0.3.2         ggridges_0.5.4         IRdisplay_1.1         
  [7] base64enc_0.1-3        spatstat.data_3.