
Read error with R's rhdf5? #171

Closed
fruce-ki opened this issue May 4, 2018 · 8 comments

fruce-ki commented May 4, 2018

Hello,

I am trying to parse the abundance.h5 into an R session.

Using h5dump('abundance.h5') from the rhdf5 package, all I get is multiple rows of this:

Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.

followed by the structure of the Kallisto output but with all elements containing nothing but NULL.

So I thought I'd try kallisto's built-in parser, kallisto h5dump abundance.h5 -o ./test, and sure enough I get a new directory full of bootstrap files in plaintext that all look perfectly fine, so the data is definitely in there.
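For anyone who wants to cross-check the file outside of R entirely, a minimal sketch with h5py can list every group and dataset it contains. Note that h5py and the literal path "abundance.h5" are my assumptions here, not part of the original report:

```python
# Minimal sketch: list the contents of an HDF5 file with h5py.
# Assumes h5py is installed; "abundance.h5" is an example path.
import os
import h5py

def dump_tree(path):
    """Print the name and object of every group/dataset in the file."""
    with h5py.File(path, "r") as f:
        f.visititems(lambda name, obj: print(name, obj))

if os.path.exists("abundance.h5"):
    dump_tree("abundance.h5")
```

If this prints the expected datasets while rhdf5 still fails, the data itself is intact and the problem lies in how the file is being read.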

The R function definitely works with .h5 files from a different source, so I'm working on the assumption that the rhdf5 package is unlikely to be at fault.
Is there any chance that the .h5 created by kallisto is somehow non-standard, and that your own parser simply knows how to work around that?

This is all on Linux with up-to-date R and packages, as well as the latest Kallisto (0.44).

Cheers!


pmelsted commented May 8, 2018

We use the standard hdf5 libraries for writing and reading the h5 files, most likely the same libraries used in R. I'd be happy to test the h5 files to see if there is something weird in reading them in R.


fruce-ki commented May 8, 2018

Thanks for getting back to me!
It seems I managed to break the import of .h5 files from my other source as well, apparently in connection with updating certain R packages, although I have not pinpointed which update or when. Given these new developments, it's probably safe to say the problem does not originate with kallisto.
I need to poke around a bit more first, but I may take you up on that offer a bit later.

EDIT: Turns out this was unrelated to the issue reported here. An update to data.table affected some of my type casts. The original issue persists despite addressing this problem.


fruce-ki commented May 9, 2018

I've been able to confirm that the .h5 parses just fine on R 3.4.1 but fails on R 3.4.3 (linux only, parses fine on OSX R 3.4.3), despite identical rhdf5 package versions (2.22.0). It seems that whatever the problem is, it is deeply rooted in core R.

However, on the same systems, the issue only occurs with kallisto .h5 files, whereas .h5 files created by the wasabi package parse fine on both systems.

So clearly there IS something about the kallisto version of abundance.h5 files that is different from the wasabi version of abundance.h5 and that difference upsets (some?) newer versions of R.


pmelsted commented May 9, 2018

ouch, closing this issue for now.


kevinblighe commented Nov 16, 2019

I am posting this here for anybody else who arrives at this thread as a result of the same problem. I have been getting the exact same error message as per fruce-ki (as elaborated here: grimbough/rhdf5#46 (comment)), namely:

Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, 
  HDF5. Dataset. Read failed.

I have solved the issue by setting up a conda environment for Kallisto, which has the following installed packages:

    hdf5-1.10.5                |       nompi_h3c11f04_1104         3.1 MB   conda-forge
    kallisto-0.46.0            |       h4f7b962_1                  532 KB   bioconda
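A sketch of how such an environment could be reproduced. The channel order, environment name, version pins, and file names below are assumptions on my part, not the poster's exact commands:

```shell
# Sketch only: create an isolated environment with a matched
# hdf5/kallisto pair, then re-run quantification so the .h5 is
# written by this hdf5 build. Index and read files are placeholders.
conda create -n kallisto_env -c conda-forge -c bioconda \
    hdf5=1.10.5 kallisto=0.46.0
conda activate kallisto_env

kallisto quant -i transcripts.idx -o out_dir -b 100 \
    reads_1.fastq.gz reads_2.fastq.gz
```

The key point is re-generating the .h5 files inside the new environment; pointing R at files written by the old kallisto build would not help.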

I had to re-generate the h5 files with Kallisto in this environment, and then read them into R via tximport (R also in conda). R sessionInfo():

R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

Matrix products: default
BLAS/LAPACK: /apps/users/user2004/.conda/envs/R/lib/libopenblasp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.30.0       tximport_1.14.0    readr_1.3.1        data.table_1.12.2 
[5] RColorBrewer_1.1-2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3      zeallot_0.1.0   crayon_1.3.4    R6_2.4.1       
 [5] backports_1.1.5 pillar_1.4.2    rlang_0.4.1     vctrs_0.2.0    
 [9] Rhdf5lib_1.8.0  hms_0.5.2       compiler_3.6.1  pkgconfig_2.0.3
[13] tibble_2.1.3

This does not explain where exactly the problem was, but it provides a workaround for others.



Further update:

The problem appears to be specifically related to Kallisto and how it generates the h5 files under different system configurations / settings.


Aynur31 commented Mar 17, 2021

Hello,
I am having the same issue and would greatly appreciate any advice.
I followed the solution above and reinstalled Kallisto using conda, but I still get the same error.

so <- sleuth_prep(s2c, ~ condition)
reading in kallisto results
dropping unused factor levels
....
normalizing est_counts
51272 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY") :
HDF5. File accessability. Unable to open file.
In addition: Warning message:
In check_num_cores(num_cores) :
It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.

sessionInfo():
R version 3.4.2 (2017-09-28)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] sleuth_0.30.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4
[6] readr_1.4.0 tidyr_1.1.3 tibble_3.1.0 ggplot2_3.3.3 tidyverse_1.3.0
[11] biomaRt_2.34.2

loaded via a namespace (and not attached):
[1] Biobase_2.38.0 httr_1.4.2 bit64_4.0.5
[4] jsonlite_1.7.2 modelr_0.1.8 assertthat_0.2.1
[7] BiocManager_1.30.10 stats4_3.4.2 blob_1.2.1
[10] cellranger_1.1.0 yaml_2.2.1 remotes_2.2.0
[13] progress_1.2.2 pillar_1.5.1 RSQLite_2.2.4
[16] backports_1.2.1 glue_1.4.2 rvest_1.0.0
[19] colorspace_2.0-0 XML_3.99-0.3 pkgconfig_2.0.3
[22] broom_0.7.5 haven_2.3.1 zlibbioc_1.24.0
[25] scales_1.1.1 generics_0.1.0 IRanges_2.12.0
[28] ellipsis_0.3.1 cachem_1.0.4 withr_2.4.1
[31] BiocGenerics_0.24.0 lazyeval_0.2.2 cli_2.3.1
[34] magrittr_2.0.1 crayon_1.4.1 readxl_1.3.1
[37] memoise_2.0.0 fs_1.5.0 fansi_0.4.2
[40] xml2_1.3.2 tools_3.4.2 data.table_1.14.0
[43] prettyunits_1.1.1 hms_1.0.0 lifecycle_1.0.0
[46] S4Vectors_0.16.0 munsell_0.5.0 reprex_1.0.0
[49] AnnotationDbi_1.40.0 compiler_3.4.2 rlang_0.4.10
[52] rhdf5_2.22.0 grid_3.4.2 RCurl_1.98-1.3
[55] rstudioapi_0.13 bitops_1.0-6 gtable_0.3.0
[58] DBI_1.1.1 curl_4.3 R6_2.5.0
[61] lubridate_1.7.10 fastmap_1.1.0 bit_4.0.4
[64] utf8_1.2.1 stringi_1.5.3 parallel_3.4.2
[67] Rcpp_1.0.6 vctrs_0.3.6 dbplyr_2.1.0
[70] tidyselect_1.1.0

@jwqian99

(quotes Aynur31's comment above in full)

See if your h5 files are being used by other programs, your error is different.

@jwqian99

I encountered the same read-failed problem using sleuth 0.30.0.

Opening and then closing the .h5 file with h5py in Python seems to solve it, which suggests that kallisto did not close the file correctly. This corresponds to the file-closing issue kevinblighe referenced in the rhdf5 repo.
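A minimal sketch of that workaround, assuming h5py is installed and using "abundance.h5" as a placeholder path:

```python
# Sketch of the workaround: open the kallisto .h5 read/write and
# close it cleanly with h5py, so the HDF5 library rewrites the
# file's close status. "abundance.h5" is a placeholder path.
import os
import h5py

def reopen_and_close(path):
    # Opening in "r+" and exiting the context manager flushes and
    # closes the file through the HDF5 library, which can clear the
    # inconsistent state left by a writer that never closed it.
    with h5py.File(path, "r+") as f:
        f.flush()

if os.path.exists("abundance.h5"):
    reopen_and_close("abundance.h5")
```

After this, reading the same file with rhdf5/tximport/sleuth can be retried; the operation does not touch the datasets themselves.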
