problem reading h5 file #732

Closed
EmilyJBrown opened this issue Aug 23, 2018 · 10 comments

@EmilyJBrown

EmilyJBrown commented Aug 23, 2018

Hi there,

I'm trying to read an h5 file from a published data set (available on GEO accession GSM2561498), using the Read10X_h5 function, but keep getting the following error.

> ley.ctrl.data <- Read10X_h5('GSM2561498.h5')
Error in x$exists(name) : HDF5-API Errors:
    error #000: H5L.c in H5Lexists(): line 879: unable to get link info
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #001: H5L.c in H5L__exists(): line 2962: path doesn't exist
        class: HDF5
        major: Symbol table
        minor: Object already exists

    error #002: H5Gtraverse.c in H5G_traverse(): line 867: internal path traversal failed
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #003: H5Gtraverse.c in H5G_traverse_real(): line 594: can't look up component
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #004: H5Gobj.c in H5G__obj_lookup(): line 1156: can't locate object
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #005: H5Gstab.c in H5G__stab_lookup(): line 886: can't read message
        class: HDF5
        major: Symbol table
        minor: Unrecognized message

Any insight into what might be going on would be hugely helpful!

Thanks so much,
Emily

@andrewwbutler
Collaborator

Hi Emily,

It looks like that file isn't consistent with 10X's documentation on how the H5 output file should be structured and therefore the Read10X_h5 function isn't going to work here. However, you can still read in the file with

library(hdf5r)
infile <- H5File$new("GSM2561498.h5")

Alternatively, you could try cellrangerRkit from 10X as they recommend on that documentation page.
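
Once the file opens, a useful first step is to list what it actually contains, since the group and dataset names decide which reader can handle it. A minimal sketch, assuming the hdf5r package is installed; the temporary file and its `matrix` group are made-up stand-ins so the snippet runs on its own, and for a real file you would only need the read-only part:

```r
library(hdf5r)

# Build a tiny stand-in file (hypothetical contents, not the GEO file).
path <- tempfile(fileext = ".h5")
demo <- H5File$new(path, mode = "w")
grp <- demo$create_group("matrix")
grp[["data"]] <- c(5L, 1L, 2L)
demo$close_all()

# Open read-only and list every group and dataset in the file.
infile <- H5File$new(path, mode = "r")
listing <- infile$ls(recursive = TRUE)
print(listing$name)

# Top-level entries only.
top_level <- names(infile)
infile$close_all()
```

Comparing the printed paths with the layout in 10X's documentation shows quickly whether a file matches what Read10X_h5 expects.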

@EmilyJBrown
Author

EmilyJBrown commented Aug 24, 2018

Hi Andrew,

Thank you so much for the reply. Clearly, I am not very familiar with the h5 format. I was able to read in the file with the command you suggested above, but it is unclear to me where to go from here to create the Seurat object, or if that is even possible? I would very much like to continue using Seurat if possible, since I would like to use the RunMultiCCA as well as additional packages that require a Seurat object, but using infile as the raw.data for CreateSeuratObject yields the following error:

> library(hdf5r)
> infile <- H5File$new("GSM2561498.h5")
> library(Seurat)
> ley.ctrl <- CreateSeuratObject(raw.data = infile, project = "Ley.Ctrl", min.cells = 3, min.genes = 200)
Error in object.raw.data > is.expr : 
  comparison (6) is possible only for atomic and list types

Thank you so much for your help!
Emily

@andrewwbutler
Collaborator

Hi Emily,

You need to convert the data in the H5 file into a matrix before passing that to CreateSeuratObject.

You can read a little more about how to use hdf5 files in R here. For specific details on that particular dataset, I would recommend emailing the contact on the GEO page as that gets a bit beyond the scope of Seurat.
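
To illustrate what "convert to a matrix" can look like: a minimal sketch, assuming the counts are stored in the compressed sparse column layout that 10X describes (`data`, `indices`, `indptr`, `shape`). The GEO file in question may be organised differently, and all values and names below are made up:

```r
library(Matrix)

# Toy components for a 3-gene x 2-cell matrix.
data    <- c(5, 1, 2)     # non-zero counts, stored column by column
indices <- c(0L, 2L, 1L)  # zero-based row (gene) index of each count
indptr  <- c(0L, 2L, 3L)  # offset in `data` where each column (cell) starts
shape   <- c(3L, 2L)      # number of genes, number of cells

# index1 = FALSE tells sparseMatrix that `indices` counts from zero.
counts <- sparseMatrix(
  i = indices, p = indptr, x = data, dims = shape, index1 = FALSE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2"))
)
print(counts)
```

An object like `counts`, rather than the open file handle, is the kind of input CreateSeuratObject expects as raw.data.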

@EmilyJBrown
Author

Thanks so much for all your help, Andrew. I will follow up with the authors if I can't get it to work on my own.

Thanks again,
Emily

@xizhihui

xizhihui commented Dec 22, 2018

Hi, andrewwbutler.
I got an error when using Read10X_h5 to read the h5 file from the output of cellranger-3.0.0 count. The error said that data["matrix/gene_names"] does not exist. I found that the gene names in the cellranger h5 file are stored in data["matrix/features/name"]. I didn't test data["matrix/genes"], but I don't think it will work either.
Below is the cellranger h5 data structure. According to it, neither "genes" nor "gene_names" is contained in the cellranger h5 file. Am I right?

(root)
└── matrix [HDF5 group]
    ├── barcodes
    ├── data
    ├── indices
    ├── indptr
    ├── shape
    └── features [HDF5 group]
        ├─ _all_tag_keys
        ├─ feature_type
        ├─ genome
        ├─ id
        ├─ name
        ├─ pattern [Feature Barcoding only]
        ├─ read [Feature Barcoding only]
        └─ sequence [Feature Barcoding only]
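
The layout difference described here can be checked before choosing a reader. A minimal sketch: `detect_10x_layout` is a hypothetical helper, the labels it returns are made up for illustration, and it takes a character vector of dataset paths (for example from `infile$ls(recursive = TRUE)$name` in hdf5r). It assumes older Cell Ranger files keep `gene_names` inside a per-genome group, while Cell Ranger 3 files use `matrix/features/name` as in the tree above:

```r
# Hypothetical helper: guess the file layout from its dataset paths.
detect_10x_layout <- function(paths) {
  if ("matrix/features/name" %in% paths) {
    return("v3")       # Cell Ranger 3 style: matrix/features/...
  }
  if (any(grepl("^[^/]+/gene_names$", paths))) {
    return("v2")       # older style: <genome>/gene_names
  }
  "unknown"            # anything else, e.g. a custom or molecule-level file
}

v3_paths <- c("matrix", "matrix/barcodes", "matrix/features", "matrix/features/name")
v2_paths <- c("GRCh38", "GRCh38/barcodes", "GRCh38/gene_names", "GRCh38/genes")

print(detect_10x_layout(v3_paths))
print(detect_10x_layout(v2_paths))
```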

@jbalberge

Hi Emily, I got the same error when I tried to read a molecule_info.h5 file instead of a gene-barcode matrix file. You can regenerate gene-barcode matrices with the cellranger aggr command.
JB

@romanhaa

romanhaa commented Sep 25, 2019

I just faced the same issue and came up with this solution. Maybe it will help somebody, even though the same function in Seurat v3 works fine for me.

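# Open the Cell Ranger 3 matrix file read-only
h5_data <- hdf5r::H5File$new('filtered_feature_bc_matrix.h5', mode = 'r')

# Rebuild the sparse matrix from its compressed-column parts
feature_matrix <- Matrix::sparseMatrix(
  i = h5_data[['matrix/indices']][],   # zero-based row (feature) indices
  p = h5_data[['matrix/indptr']][],    # column (barcode) pointers
  x = h5_data[['matrix/data']][],      # non-zero counts
  dimnames = list(
    h5_data[['matrix/features/name']][],
    h5_data[['matrix/barcodes']][]
  ),
  dims = h5_data[['matrix/shape']][],
  index1 = FALSE
)

# Release the file handle once the matrix is in memory
h5_data$close_all()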
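
One thing to watch when taking row names from `matrix/features/name`: gene symbols are not guaranteed to be unique, and duplicated row names tend to cause trouble downstream. A small sketch of one way to handle this with base R's `make.unique`; the gene names here are made up:

```r
# Toy feature names with one duplicated symbol.
feature_names <- c("GeneA", "GeneB", "GeneA")

# make.unique appends a numeric suffix to repeated entries.
unique_names <- make.unique(feature_names)
print(unique_names)
```

Applied to the matrix above, that would be `rownames(feature_matrix) <- make.unique(rownames(feature_matrix))`. Using `matrix/features/id` for the row names is another option if identifiers are preferred over symbols.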

@huwenhuo

huwenhuo commented Feb 25, 2023

I got the same error from some old data. This is a totally different data format, so I don't think romanhaa's comment solves it. Here is what I got from "h5dump -n molecule_info.h5". 10x documents the format on this page: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/molecule_info

HDF5 "molecule_info.h5" {
FILE_CONTENTS {
 group      /
 dataset    /barcode_idx
 group      /barcode_info
 dataset    /barcode_info/genomes
 dataset    /barcode_info/pass_filter
 dataset    /barcodes
 dataset    /count
 dataset    /feature_idx
 group      /features
 dataset    /features/_all_tag_keys
 dataset    /features/feature_type
 dataset    /features/genome
 dataset    /features/id
 dataset    /features/name
 dataset    /gem_group
 dataset    /library_idx
 dataset    /library_info
 group      /metrics
 dataset    /umi
 }
}

I am not sure this is the best solution, but this works

tmp = DropletUtils::read10xMolInfo('molecule_info.h5')
mtx = DropletUtils::makeCountMatrix(tmp$data$gene, tmp$data$cell, value = tmp$data$reads)
dimnames(mtx) = list(tmp$genes[tmp$data$gene], tmp$cell)
mtx[1:3, 1:10]

@suanzaoren

suanzaoren commented Mar 10, 2023

Hi Emily, have you solved your problem? Have you been able to convert h5 files into Seurat objects? I have the same problem as you, and I hope you can give me some advice.
Thanks,
Zerun Song

@annaborchers

Hi there, in huwenhuo's snippet above, the tmp and mtx lines worked for me, but when I run the third line (starting with dimnames(mtx)) I get this error:
Error in fixupDN.if.valid(value, x@Dim) :
length of Dimnames[[1]] (44260822) is not equal to Dim[1] (36613)

Any idea how to fix this? Thanks!!
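
The error message points at the likely cause: `tmp$genes[tmp$data$gene]` has one entry per molecule (44260822), whereas row names need one entry per gene (36613). A minimal sketch of the idea with made-up toy data and the Matrix package only; the variable names are stand-ins for the fields in the snippet above, and the filter assumes that unassigned molecules carry an out-of-range gene index:

```r
library(Matrix)

genes    <- c("GeneA", "GeneB", "GeneC")                        # one per gene (like tmp$genes)
mol_gene <- c(1L, 1L, 3L, 2L, 3L, 4L)                           # one per molecule (like tmp$data$gene)
mol_cell <- c("AAAC", "AAAC", "AAAC", "TTTG", "TTTG", "TTTG")   # one per molecule (like tmp$data$cell)

# Drop molecules whose gene index does not point into `genes`.
keep <- mol_gene >= 1L & mol_gene <= length(genes)
mol_gene <- mol_gene[keep]
mol_cell <- mol_cell[keep]

cells    <- sort(unique(mol_cell))
cell_idx <- match(mol_cell, cells)

# Repeated (gene, cell) pairs are summed, so each molecule adds 1.
mtx <- sparseMatrix(
  i = mol_gene, j = cell_idx, x = rep(1, length(mol_gene)),
  dims = c(length(genes), length(cells)),
  dimnames = list(genes, cells)   # one name per row and per column
)
print(mtx)
```

With DropletUtils itself, passing the gene vector through the `all.genes` argument of `makeCountMatrix` should have the same effect, though that is untested against this file. Note also that `value = tmp$data$reads` sums reads rather than counting molecules, which may or may not be what is wanted.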
