This repository has been archived by the owner on Mar 9, 2023. It is now read-only.
info_h5ad.md
This note describes the on-disk layout of the .h5ad result file, as listed by h5ls.

$ h5ls ./write/pmbc3k.h5ad 
X                        Dataset {2638, 1838}
obs                      Dataset {2638}
obsm                     Dataset {2638}
raw.X                    Group
raw.var                  Dataset {13714}
uns                      Group
var                      Dataset {1838}
varm                     Dataset {1838}

Note that the sparse raw data is stored in a group:

$ h5ls write/pmbc3k.h5ad/raw.X
data                     Dataset {2238732/Inf}
indices                  Dataset {2238732/Inf}
indptr                   Dataset {2639/Inf}
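
To make the role of the three datasets concrete, here is a minimal sketch of how such a triplet describes a matrix. It assumes a row-compressed (CSR) layout, which the listing suggests but does not state: indptr has 2639 entries, one more than the 2638 observations. The function name csr_to_dense and the toy values are illustrative and not part of the file or of anndata.

```python
def csr_to_dense(data, indices, indptr, n_cols):
    """Expand a CSR triplet into a list of dense rows."""
    n_rows = len(indptr) - 1  # indptr has one more entry than there are rows
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # The stored values of this row are data[indptr[row]:indptr[row + 1]],
        # and indices holds the column of each stored value.
        for k in range(indptr[row], indptr[row + 1]):
            dense[row][indices[k]] = data[k]
    return dense


# Toy 3 x 4 matrix with 4 stored values (made up, not read from the file).
data = [5, 8, 3, 6]
indices = [0, 1, 2, 1]
indptr = [0, 1, 3, 4]
dense = csr_to_dense(data, indices, indptr, n_cols=4)
print(dense)  # [[5, 0, 0, 0], [0, 8, 3, 0], [0, 6, 0, 0]]
```

For the real file, densifying raw.X would rarely be desirable; the sketch only shows how data, indices and indptr fit together.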

Note that the annotations of observations and variables are stored as structured arrays:

$ h5ls -v ./write/pmbc3k.h5ad/obs
Opened "./write/pmbc3k.h5ad" with sec2 driver.
obs                      Dataset {2638/2638}
    Location:  1:24557556
    Links:     1
    Chunks:    {330} 10890 bytes
    Storage:   87054 logical bytes, 42051 allocated bytes, 207.02% utilization
    Filter-0:  deflate-1 OPT {4}
    Type:      struct {
                   "index"            +0    16-byte null-padded ASCII string
                   "n_genes"          +16   native long
                   "percent_mito"     +24   native float
                   "n_counts"         +28   native float
                   "louvain"          +32   native signed char
               } 33 bytes
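
The struct above can be mirrored as a NumPy structured dtype, which makes the offsets and the storage figures easy to check. This is a sketch under two assumptions: "native long" is taken to be 64 bits (as on typical 64-bit Linux), and the fields are packed without padding, as the offsets in the listing indicate.

```python
import numpy as np

# Field layout copied from the h5ls output above.
obs_dtype = np.dtype([
    ("index", "S16"),              # 16-byte null-padded ASCII string
    ("n_genes", np.int64),         # "native long", assumed to be 64 bits
    ("percent_mito", np.float32),
    ("n_counts", np.float32),
    ("louvain", np.int8),          # signed char
])

# Byte offset of each field within one record.
offsets = {name: obs_dtype.fields[name][1] for name in obs_dtype.names}

# 2638 records of 33 bytes give the "logical bytes" reported by h5ls;
# h5ls reports utilization as logical bytes over allocated bytes.
logical_bytes = 2638 * obs_dtype.itemsize
utilization = 100 * logical_bytes / 42051

print(obs_dtype.itemsize)     # 33
print(offsets)
print(logical_bytes)          # 87054
print(round(utilization, 2))  # 207.02
```

The utilization above 100% reflects the deflate filter: the compressed chunks occupy fewer bytes on disk than the records they hold.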

For categorical annotation, we store category labels (e.g., louvain_categories) and colors (e.g., louvain_colors) in the unstructured annotation. We also store the parameters of each tool, and any further unstructured annotation it generates, in a group named after the tool (louvain, neighbors, pca, rank_genes_groups):

$ h5ls write/pmbc3k.h5ad/uns
louvain                  Group
louvain_categories       Dataset {8}
louvain_colors           Dataset {8}
neighbors                Group
pca                      Group
rank_genes_groups        Group
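
As a sketch of how the pieces fit together: the louvain column in obs is a signed char, so we assume here that it holds integer codes that index into louvain_categories, with louvain_colors aligned to the same order. The labels, colors and codes below are hypothetical placeholders, not values read from the file.

```python
# Hypothetical stand-ins for uns/louvain_categories and uns/louvain_colors.
categories = ["cluster_a", "cluster_b", "cluster_c"]
colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]

# Hypothetical integer codes, as they might appear in the obs "louvain" field.
codes = [0, 2, 1, 0]

# Decode each code into its label and its plotting color.
labels = [categories[c] for c in codes]
label_colors = [colors[c] for c in codes]

print(labels)        # ['cluster_a', 'cluster_c', 'cluster_b', 'cluster_a']
print(label_colors)  # ['#1f77b4', '#2ca02c', '#ff7f0e', '#1f77b4']
```

Storing one byte per observation plus a short label table is considerably more compact than repeating the label string for every observation.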

The multi-dimensional annotation (obsm, varm) is likewise stored as a structured array, with one array-valued field per entry:

$ h5ls -v ./write/pmbc3k.h5ad/obsm 
Opened "./write/pmbc3k.h5ad" with sec2 driver.
obsm                     Dataset {2638/2638}
    Location:  1:18063662
    Links:     1
    Chunks:    {83} 19256 bytes
    Storage:   612016 logical bytes, 575607 allocated bytes, 106.33% utilization
    Filter-0:  deflate-1 OPT {4}
    Type:      struct {
                   "X_pca"            +0    [50] native float
                   "X_tsne"           +200  [2] native double
                   "X_umap"           +216  [2] native double
               } 232 bytes
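
Array-valued fields like these correspond to subarray fields in a NumPy structured dtype. The following sketch rebuilds the layout from the listing and shows that selecting a field yields an ordinary two-dimensional array; the zero-filled array is a placeholder, not data from the file.

```python
import numpy as np

# Field layout copied from the h5ls output above.
obsm_dtype = np.dtype([
    ("X_pca", np.float32, (50,)),   # [50] native float
    ("X_tsne", np.float64, (2,)),   # [2] native double
    ("X_umap", np.float64, (2,)),   # [2] native double
])

# One record per observation; zeros stand in for the real coordinates.
obsm = np.zeros(2638, dtype=obsm_dtype)

# Selecting a field gives an (n_observations, n_components) array.
pca = obsm["X_pca"]
umap = obsm["X_umap"]

print(obsm_dtype.itemsize)  # 232
print(obsm.nbytes)          # 612016, the logical bytes reported by h5ls
print(pca.shape)            # (2638, 50)
print(umap.shape)           # (2638, 2)
```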

The full contents of the file, listed recursively:

$ h5ls -r ./write/pmbc3k.h5ad 
/                        Group
/X                       Dataset {2638, 1838}
/obs                     Dataset {2638}
/obsm                    Dataset {2638}
/raw.X                   Group
/raw.X/data              Dataset {2238732/Inf}
/raw.X/indices           Dataset {2238732/Inf}
/raw.X/indptr            Dataset {2639/Inf}
/raw.var                 Dataset {13714}
/uns                     Group
/uns/louvain             Group
/uns/louvain/params      Group
/uns/louvain/params/random_state Dataset {1}
/uns/louvain/params/resolution Dataset {1}
/uns/louvain_categories  Dataset {8}
/uns/louvain_colors      Dataset {8}
/uns/neighbors           Group
/uns/neighbors/connectivities Group
/uns/neighbors/connectivities/data Dataset {42406/Inf}
/uns/neighbors/connectivities/indices Dataset {42406/Inf}
/uns/neighbors/connectivities/indptr Dataset {2639/Inf}
/uns/neighbors/distances Group
/uns/neighbors/distances/data Dataset {23742/Inf}
/uns/neighbors/distances/indices Dataset {23742/Inf}
/uns/neighbors/distances/indptr Dataset {2639/Inf}
/uns/neighbors/params    Group
/uns/neighbors/params/method Dataset {1}
/uns/neighbors/params/n_neighbors Dataset {1}
/uns/pca                 Group
/uns/pca/variance        Dataset {50}
/uns/pca/variance_ratio  Dataset {50}
/uns/rank_genes_groups   Group
/uns/rank_genes_groups/names Dataset {100}
/uns/rank_genes_groups/params Group
/uns/rank_genes_groups/params/groupby Dataset {1}
/uns/rank_genes_groups/params/method Dataset {1}
/uns/rank_genes_groups/params/reference Dataset {1}
/uns/rank_genes_groups/params/use_raw Dataset {1}
/uns/rank_genes_groups/scores Dataset {100}
/var                     Dataset {1838}
/varm                    Dataset {1838}

Note that the neighborhood graph is stored in the unstructured annotation. Even so, anndata still recognizes it and slices it along with the observations. In the long run, we might add a dedicated field for sparse matrices of shape n_observations × n_observations.
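
Slicing such a graph differs from slicing obs or obsm, because a subset of observations has to be applied to both rows and columns. The following is a minimal sketch of that operation on a CSR triplet. It is not anndata's implementation; the function name slice_square_csr and the toy graph are hypothetical.

```python
def slice_square_csr(data, indices, indptr, keep):
    """Restrict a square CSR matrix to the rows and columns listed in keep."""
    # Map each kept original index to its position in the sliced matrix.
    new_col = {old: new for new, old in enumerate(keep)}
    out_data, out_indices, out_indptr = [], [], [0]
    for row in keep:
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            if col in new_col:  # drop edges that point to removed observations
                out_data.append(data[k])
                out_indices.append(new_col[col])
        out_indptr.append(len(out_data))
    return out_data, out_indices, out_indptr


# Toy symmetric graph on 4 observations with edges 0-1, 1-2 and 2-3.
data = [0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
indices = [1, 0, 2, 1, 3, 2]
indptr = [0, 1, 3, 5, 6]

# Keep observations 1, 2 and 3; the edge 0-1 disappears.
sub_data, sub_indices, sub_indptr = slice_square_csr(data, indices, indptr, keep=[1, 2, 3])
print(sub_data)     # [0.7, 0.7, 0.9, 0.9]
print(sub_indices)  # [1, 0, 2, 1]
print(sub_indptr)   # [0, 1, 3, 4]
```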