
Issue reading h5 files #120

Open
pimentel opened this issue Jun 6, 2017 · 74 comments

@pimentel
Collaborator

pimentel commented Jun 6, 2017

Some users have reported having issues reading the H5 files.

Here is the error:

> so <- sleuth_prep(s2c, ~ condition)
reading in kallisto results
..Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.

I would like to track this down, so if you are having this issue, please respond with the following:

  • Your sessionInfo() in R
  • Your operating system
  • Your version of gcc: gcc --version

And any other information you think might be informative.

Thanks,

Harold

@pimentel pimentel added the bug label Jun 6, 2017
@rachelzoeb

Hi Harold,

I am getting this error as well.

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.5 (unknown)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bindrcpp_0.1 sleuth_0.29.0 dplyr_0.7.0 ggplot2_2.2.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 tidyr_0.6.3 assertthat_0.2.0 grid_3.3.0 plyr_1.8.4 R6_2.2.1 gtable_0.2.0 magrittr_1.5 scales_0.4.1 zlibbioc_1.16.0
[11] rlang_0.1.1 lazyeval_0.2.0 data.table_1.10.4 tools_3.3.0 glue_1.0.0 munsell_0.4.3 parallel_3.3.0 rhdf5_2.14.0 pkgconfig_2.0.1 colorspace_1.3-2
[21] bindr_0.1 tibble_1.3.3

Operating system: macOS Sierra version 10.12.5
RStudio Version 1.0.136
gcc version Apple LLVM version 8.1.0 (clang-802.0.42)

Hope this helps!

Rachel

@jmcribeiro

Hi Harold and Rachel,

Same problem here.

so <- sleuth_prep(s2c, full_model = full_design)
reading in kallisto results
dropping unused factor levels
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY") :
HDF5. File accessability. Unable to open file.

#########################################################

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] sleuth_0.29.0 dplyr_0.5.0 ggplot2_2.2.1 BiocInstaller_1.24.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 magrittr_1.5 zlibbioc_1.20.0 devtools_1.13.2
[5] munsell_0.4.3 colorspace_1.3-2 R6_2.2.1 rlang_0.1.1
[9] httr_1.2.1 plyr_1.8.4 tools_3.3.2 parallel_3.3.2
[13] grid_3.3.2 rhdf5_2.18.0 data.table_1.10.4 gtable_0.2.0
[17] DBI_0.6-1 git2r_0.18.0 withr_1.0.2 lazyeval_0.2.0
[21] digest_0.6.12 assertthat_0.2.0 tibble_1.3.3 tidyr_0.6.3
[25] curl_2.6 memoise_1.1.0 scales_0.4.1

#############################################################
I tried another drive using setwd(), without success.
I also tried

@jmcribeiro

...continuing...

options(max.print=10000000)

also without success.

Regards,

Jose

@jmcribeiro

Hi,

I ran the same script on a Linux machine. This time I got errors/warnings, including:

1: In read_kallisto(path, read_bootstrap = TRUE, max_bootstrap = max_bootstrap) :
You specified to read bootstraps, but we won't do so for plaintext

Indeed, I had run kallisto with the --plain-text option.

Now I am re-running kallisto without the option, and we will see what happens.

Perhaps the R versions of sleuth on Mac and Windows are not reporting the errors/warnings above.

Regards,

Jose

@jmcribeiro

Hi,

I reran kallisto without the --plain-text option.

Now the .h5 files were created in the expected subdirectories; they were not there before.

when running the command

so <- sleuth_prep(s2c, full_model = full_design)

on a Windows machine I now get

reading in kallisto results
dropping unused factor levels
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
summarizing bootstraps
Error in parallel::mclapply(x, y, mc.cores = num_cores) :
'mc.cores' > 1 is not supported on Windows
###################################################

Looking at the documentation on sleuth_prep at

https://pachterlab.github.io/sleuth/docs/sleuth_prep.html

SUGGESTION 1:
I cannot find an option to limit the script to a single core, so I suggest either adding a switch to sleuth_prep, or having a new version of the function detect the environment and decide how many CPUs to use.

SUGGESTION 2:

On

https://pachterlab.github.io/kallisto/manual

The text

Optional arguments:
--bias Perform sequence based bias correction
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--fusion Search for fusions for Pizzly

could be changed to (added text in capitals):

Optional arguments:
--bias Perform sequence based bias correction
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5 (NOT COMPATIBLE WITH SLEUTH)
--fusion Search for fusions for Pizzly

On the other hand, running the same script on a Linux machine after rerunning kallisto I got no error messages! Bingo!

so <- sleuth_prep(s2c, full_model = full_design)
reading in kallisto results
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
normalizing bootstrap samples
summarizing bootstraps

Regards,

Jose

@warrenmcg
Collaborator

warrenmcg commented Jun 14, 2017

@jmcribeiro, it appears the documentation on the website is generated separately and is not up to date with the current version, so it doesn't list the new options. If you go into R and run ?sleuth_prep you'll see the most up-to-date documentation.

The option you want for sleuth_prep is num_cores. So so <- sleuth_prep(s2c, full_model = full_design, num_cores = 1).

@jmcribeiro

Hi Warren,

Thanks for your comment; your recommendation worked!

Please also see my recommendation to make sure sleuth on Windows flags plain-text input as well, to avoid other users getting lost.

Regards,

Jose

@warrenmcg
Collaborator

Hello!

Those are two great suggestions.

For the Windows issue, we can set a quick patch to warn users that Windows does not support mclapply and switch num_cores to 1. Moving forward, we can explore switching to using the future package, which would allow Windows users to operate multiple cores too.

For the text files issue, I wonder if this is the reason most people are having issues? I think it would make sense for sleuth_prep to check for abundance.tsv files if the abundance.h5 is absent, and use the appropriate read method.
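A check along those lines can be sketched outside sleuth too. This shell sketch uses invented expression/P1-style paths (not anyone's actual data from this thread) to show which read method would apply to each sample directory:

```shell
# Illustrative layout: P1 has HDF5 output, P2 only plaintext (paths are invented)
mkdir -p expression/P1 expression/P2
touch expression/P1/abundance.h5 expression/P2/abundance.tsv

# Report, per sample directory, which kallisto output could be read
for d in expression/*/; do
  if [ -f "${d}abundance.h5" ]; then
    echo "${d}: abundance.h5 present, bootstraps can be read"
  elif [ -f "${d}abundance.tsv" ]; then
    echo "${d}: plaintext only, no bootstraps available"
  else
    echo "${d}: no kallisto output found"
  fi
done
```

Something like this, run before sleuth_prep, would at least tell a user up front that plaintext-only samples cannot supply bootstraps.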

What do you think @pimentel of these two options?

@rachelzoeb

I have rerun kallisto and removed the --plain-text flag which removed the h5 error. However, now I get this error:

.Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], :
File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".File ../analysis/data/kallisto/Deer_R1_S22/abundance.h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".

Any help is greatly appreciated,

Rachel

@warrenmcg
Collaborator

Hi @rachelzoeb,

What was your full kallisto command, and what version of kallisto did you use?

It seems that you did not use the -b option when running kallisto, which is a requirement to take full advantage of sleuth.

If you did use the -b option and still got this error, maybe there is something wrong with how sleuth is interacting with your particular version of kallisto.

If you are using the latest version of kallisto, then it would be helpful if you gave your OS and version of gcc (use gcc --version) as Harold suggested above, and emailed your abundance.h5 file to him or posted it here for me and other users to look at to help you out.

@pimentel
Collaborator Author

@warrenmcg thanks so much for fielding these questions.

regarding the windows patch: that sounds like a great idea

Unfortunately the bootstraps are not available via plaintext at all. This is because H5 provides nice compression that is a bit of a pain to get otherwise. Initially, plaintext abundance.tsv was only intended for quick sanity checks. However, we have been discussing changing the format to remove the dependency on H5 which has proven to be an issue for some time now...

More on this soon.

@xindiguo

Hi,

I ran kallisto with quant --bootstrap-samples=100 --threads=16, and 4 out of 8 of my h5 files had the cannot-open error. kallisto ran on a Linux server, and I then downloaded the h5 files to my local machine (macOS) to run sleuth in R. Do you think there might have been an error during the file transfer? Also, I checked the file sizes of the failing h5 files: for 3 of the 4 files with the error, the h5 file is smaller than the tsv file. Not sure if that is related. Thanks in advance for any help!

sessionInfo() in R -

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.20.0         BiocInstaller_1.26.0 bindrcpp_0.2         synapseClient_1.15-0 sleuth_0.29.0       
[6] dplyr_0.7.2          ggplot2_2.2.1        biomaRt_2.32.1      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12         compiler_3.4.1       plyr_1.8.4           bindr_0.1            zlibbioc_1.22.0     
 [6] bitops_1.0-6         tools_3.4.1          digest_0.6.12        bit_1.1-12           RSQLite_2.0         
[11] memoise_1.1.0        tibble_1.3.4         gtable_0.2.0         pkgconfig_2.0.1      rlang_0.1.2         
[16] DBI_0.7              parallel_3.4.1       stringr_1.2.0        S4Vectors_0.14.3     IRanges_2.10.2      
[21] stats4_3.4.1         bit64_0.9-7          grid_3.4.1           glue_1.1.1           Biobase_2.36.2      
[26] data.table_1.10.4    R6_2.2.2             AnnotationDbi_1.38.2 XML_3.98-1.9         tidyr_0.7.0         
[31] reshape2_1.4.2       blob_1.1.0           magrittr_1.5         matrixStats_0.52.2   scales_0.5.0        
[36] BiocGenerics_0.22.0  assertthat_0.2.0     colorspace_1.3-2     stringi_1.1.5        RCurl_1.95-4.8      
[41] lazyeval_0.2.0       munsell_0.4.3        rjson_0.2.15        
> 

OS -
kallisto was run on 3.2.0-29-generic GNU/Linux
sleuth was run in R on macOS Sierra version 10.12.5

gcc -

$ gcc --version
Apple LLVM version 8.1.0 (clang-802.0.42)

@miguelroboso

miguelroboso commented Oct 27, 2017

I am currently having this issue.

I have a data frame built as it is in the walkthrough, and it looks like this:

    sample condition           path
 1:     P1        ns  expression/P1
 2:     P2        ns  expression/P2
 3:     P3         s  expression/P3
 4:     P4         s  expression/P4
 5:     P5        ns  expression/P5
 6:     P6        ns  expression/P6
 7:     P7         s  expression/P7
 8:     P8         s  expression/P8
 9:     P9        ns  expression/P9
10:    P10        ns expression/P10
11:    P11         s expression/P11
12:    P12         s expression/P12
>sessionInfo() 
R version 3.3.1 (2016-06-21) 
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8         
[4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8     
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C         

attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base       

other attached packages: 
[1] bindrcpp_0.2        sleuth_0.29.0       dplyr_0.7.4         ggplot2_2.2.1       edgeR_3.16.5       
 [6] biomaRt_2.30.0      limma_3.30.13       data.table_1.10.4-3  

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1       tidyselect_0.2.2     purrr_0.2.4          lattice_0.20-34
 [5] rhdf5_2.18.0         colorspace_1.3-2     htmltools_0.3.6      stats4_3.3.1
 [9] viridisLite_0.2.0    yaml_2.1.14          base64enc_0.1-3      XML_3.98-1.7
[13] plotly_4.7.1         rlang_0.1.2          glue_1.1.1           DBI_0.6-1
[17] BiocGenerics_0.20.0  bindr_0.1            plyr_1.8.4           stringr_1.2.0
[21] zlibbioc_1.20.0      munsell_0.4.3        gtable_0.2.0         htmlwidgets_0.9
[25] memoise_1.1.0        evaluate_0.10        Biobase_2.34.0       knitr_1.15.1
[29] IRanges_2.8.2        parallel_3.3.1       AnnotationDbi_1.36.2 Rcpp_0.12.13
[33] scales_0.5.0         backports_1.1.0      S4Vectors_0.12.2     jsonlite_1.5
[37] digest_0.6.12        stringi_1.1.5        grid_3.3.1           rprojroot_1.2
[41] tools_3.3.1          bitops_1.0-6         magrittr_1.5         lazyeval_0.2.0
[45] RCurl_1.95-4.8       tibble_1.3.4         RSQLite_1.1-2        tidyr_0.7.2
[49] pkgconfig_2.0.1      assertthat_0.1       rmarkdown_1.6        httr_1.2.1
[53] R6_2.2.2

OS is CENTOS, 2.6.32-696.10.2.el6.x86_64

bash-4.1$ gcc --version
gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

However, I don't know why it is trying to read H5 files: in the expression directories I only have tsv files (I ran kallisto with --plain-text output).

Lastly, the error is:

It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
reading in kallisto results
dropping unused factor levels
............
normalizing est_counts
59202 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.

@Sames-Jtudd

Hi, I had the same problem and solved it.

The issue was due to the file structure I was using. Clearly this may not be the issue for everybody.

When setting up the kr_dirs data frame as per the instructions
(https://pachterlab.github.io/sleuth_walkthroughs/trapnell/analysis.html)

the program assumes that each sample is found in its own directory, which contains both the

abundance.tsv
and
abundance.h5

files, with the file names unedited. When I arranged the files like this, the error was not tripped.

hope that helps
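To make the expected structure concrete, here is a sketch of the layout (sample names are illustrative): one directory per sample, listed in the path column of the sample data frame, each holding kallisto's outputs under their original names.

```
kallisto_output/
├── sampleA/
│   ├── abundance.h5
│   ├── abundance.tsv
│   └── run_info.json
└── sampleB/
    ├── abundance.h5
    ├── abundance.tsv
    └── run_info.json
```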

@sarahharvey88

sarahharvey88 commented Nov 22, 2017

Hello

I am also having the same error message with one of my files (I have 46, and it only seems to be kicking up this one, which I have regenerated by re-running kallisto):

Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], : File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".File ../quant/WTCHG_412393_006/abundance.h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".

However, I used 100 bootstraps when I ran kallisto, and when I look at the run info file produced by kallisto, it confirms this for this sample:

{
"n_targets": 60054,
"n_bootstraps": 100,
"n_processed": 15681904,
"kallisto_version": "0.43.1",
"index_version": 10,
"start_time": "Wed Nov 22 12:14:52 2017",
"call": "kallisto quant -i transcripts.idx -o quant/WTCHG_403319_006 -b 100 ../../data/reads/WTCHG_403319_006_1.fastq.gz ../../data/reads/WTCHG_403319_006_2.fastq.gz"
}

My sleuth prep command is this: so <- sleuth_prep(sample_to_condition, target_mapping = ttg,
aggregation_column = 'gene_id', extra_bootstrap_summary = TRUE, num_cores=1)

Any help appreciated! I used kallisto v0.43.1 on our uni Linux server and am running sleuth (latest version) on my MacBook.

Sarah

@warrenmcg
Collaborator

@sarahharvey88, that is odd. Could you send the problematic h5 file so I can reproduce the error on my side? Email me at:

warren-mcgee at fsm.northwestern.edu
(replace at with @ and remove spaces)

@warrenmcg
Collaborator

@miguelroboso, as has been mentioned previously, the plain text files do not have the bootstraps included. You should rerun kallisto without the --plaintext option included. The error you are seeing is because there is a line within sleuth_prep that expects an h5 file to be present.

pinging @pimentel: the offending line causing Miguel's user-unfriendly error is this one. The current version expects an H5 file to be present, so should we be more explicit about that requirement in sleuth_prep?

@brucemoran

brucemoran commented Dec 17, 2017

Also getting this error. NB: samples were run using Nextflow and executed by PBS/Torque. When I rerun the offending samples 'interactively', they all work. Not ideal though...

Kallisto command:

kallisto quant \
-l ${params.fragment_len} \
-s ${params.fragment_sd} \
-b ${params.bootstrap} \
-i ${index} \
-t ${task.cpus} \
-o ./ \
${reads1} ${reads2}

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/software/R/3.4.0/lib64/R/lib/libRblas.so
LAPACK: /apps/software/R/3.4.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_IE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=en_IE.UTF-8
 [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_IE.UTF-8
 [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] bindrcpp_0.2   rhdf5_2.20.0   biomaRt_2.32.1 sleuth_0.29.0  dplyr_0.7.4
[6] ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13         compiler_3.4.0       plyr_1.8.4
 [4] bindr_0.1            zlibbioc_1.22.0      bitops_1.0-6
 [7] digest_0.6.12        bit_1.1-12           RSQLite_2.0
[10] memoise_1.1.0        tibble_1.3.4         gtable_0.2.0
[13] pkgconfig_2.0.1      rlang_0.1.2          DBI_0.7
[16] parallel_3.4.0       IRanges_2.10.5       S4Vectors_0.14.7
[19] stats4_3.4.0         bit64_0.9-7          grid_3.4.0
[22] glue_1.1.1           data.table_1.10.4-2  Biobase_2.36.2
[25] R6_2.2.2             AnnotationDbi_1.38.2 XML_3.98-1.9
[28] blob_1.1.0           magrittr_1.5         scales_0.5.0
[31] BiocGenerics_0.22.1  assertthat_0.2.0     colorspace_1.3-2
[34] RCurl_1.95-4.8       lazyeval_0.2.0       munsell_0.4.3

cat /etc/*-release | head -n1
CentOS Linux release 7.3.1611 (Core)

gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)

@brucemoran

brucemoran commented Dec 17, 2017

NB: to find offending h5 files, you can use h5ls() from the rhdf5 package on each abundance.h5. From this it seems that nrow(h5ls(<path/to/abundance.h5>)) should be 115 (for a run with 100 bootstraps), so something like the snippet below will show the samples that fail.

library(rhdf5)
apply(s2c, 1, function(f) {
  dh5 <- try(dim(h5ls(paste0(f[3], "/abundance.h5")))[1])
  if (inherits(dh5, "try-error") || dh5 != 115) dh5 <- "ERROR"
  paste0(f[3], " -> ", dh5)
})

@TBradley27

TBradley27 commented Jan 29, 2018

Hello,

I am experiencing a similar problem.

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bindrcpp_0.2  sleuth_0.29.0 dplyr_0.7.4   ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15        rstudioapi_0.7      bindr_0.1           magrittr_1.5
 [5] zlibbioc_1.24.0     devtools_1.13.4     munsell_0.4.3       colorspace_1.3-2
 [9] R6_2.2.2            rlang_0.1.6         plyr_1.8.4          tools_3.4.3
[13] parallel_3.4.3      grid_3.4.3          rhdf5_2.22.0        data.table_1.10.4-3
[17] gtable_0.2.0        utf8_1.1.3          cli_1.0.0           withr_2.1.1
[21] lazyeval_0.2.1      assertthat_0.2.0    digest_0.6.14       tibble_1.4.2
[25] crayon_1.3.4        memoise_1.1.0       glue_1.2.0          compiler_3.4.3
[29] pillar_1.1.0        scales_0.5.0        pkgconfig_2.0.1

Operating System:
Linux ubuntu 4.13.0-32-generic x86_64 GNU/Linux

GCC version:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609

The R instance is being run within a virtual machine hosted by a Windows OS, but I am not sure if that tells you anything or not.

@lydiarck

I get a slightly different H5-related error message:

Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
HDF5. Dataset. Read failed.

Like Bruce's experience above, it only happens for some of my files, and if I re-run kallisto interactively for these files (instead of from a shell script), the resulting files can be read using sleuth with no issues.

> sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /usr/analysis/src/R/R-3.4.3/lib/libRblas.so
LAPACK: /usr/analysis/src/R/R-3.4.3/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2         sleuth_0.29.0        dplyr_0.7.4         
[4] ggplot2_2.2.1        BiocInstaller_1.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15        bindr_0.1           magrittr_1.5       
 [4] zlibbioc_1.24.0     tidyselect_0.2.3    munsell_0.4.3      
 [7] colorspace_1.3-2    R6_2.2.2            rlang_0.1.6        
[10] stringr_1.2.0       plyr_1.8.4          tools_3.4.3        
[13] parallel_3.4.3      grid_3.4.3          rhdf5_2.22.0       
[16] data.table_1.10.4-3 gtable_0.2.0        lazyeval_0.2.1     
[19] assertthat_0.2.0    tibble_1.4.2        reshape2_1.4.3     
[22] purrr_0.2.4         tidyr_0.8.0         glue_1.2.0         
[25] stringi_1.1.6       compiler_3.4.3      pillar_1.1.0       
[28] scales_0.5.0        pkgconfig_2.0.1    
Red Hat Enterprise Linux Server release 6.9 (Santiago)

gcc --version
gcc (GCC) 4.7.4

@cajames2

Hello!

I am still receiving this error message:

reading in kallisto results
dropping unused factor levels
....................................................................................................................................................Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.
In addition: Warning message:
In check_num_cores(num_cores) :
  It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.

I am using kallisto 0.44.0. I ran the initial kallisto script using this command:

kallisto quant -i transcripts.idx -o output -b 100 READ1.fastq READ2.fastq

I then tried to run the sleuth_prep command in a couple of ways and got the same error both times.

so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE)
and

> mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
+                          dataset = "hsapiens_gene_ensembl",
+                          host = 'ensembl.org')
> t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",
+                                      "external_gene_name"), mart = mart)
> t2g <- dplyr::rename(t2g, target_id = ensembl_transcript_id,
+                      ens_gene = ensembl_gene_id, ext_gene = external_gene_name)
> so <- sleuth_prep(s2c, target_mapping = t2g)
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.30.0       hexbin_1.27.1        sleuth_0.29.0        ggplot2_2.2.1        data.table_1.11.2    BiocInstaller_1.24.0
[7] bindrcpp_0.2.2       dplyr_0.7.5         

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17         git2r_0.21.0         plyr_1.8.4           bindr_0.1.1          bitops_1.0-6         tools_3.3.0         
 [7] zlibbioc_1.20.0      bit_1.1-13           digest_0.6.15        lattice_0.20-35      RSQLite_2.1.1        memoise_1.1.0       
[13] tibble_1.4.2         gtable_0.2.0         rhdf5_2.18.0         pkgconfig_2.0.1      rlang_0.2.0          DBI_1.0.0           
[19] curl_3.2             yaml_2.1.19          parallel_3.3.0       withr_2.1.2          httr_1.3.1           knitr_1.20          
[25] IRanges_2.8.2        S4Vectors_0.12.2     devtools_1.13.5      bit64_0.9-7          stats4_3.3.0         grid_3.3.0          
[31] tidyselect_0.2.4     Biobase_2.34.0       glue_1.2.0           R6_2.2.2             AnnotationDbi_1.36.2 XML_3.98-1.11       
[37] blob_1.1.1           tidyr_0.8.1          purrr_0.2.4          magrittr_1.5         BiocGenerics_0.20.0  scales_0.5.0        
[43] assertthat_0.2.0     colorspace_1.3-2     RCurl_1.95-4.10      lazyeval_0.2.1       munsell_0.4.3       
> 
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix

I checked each one of my abundance.h5 files (384 total), and none of them seem to be the obvious offender. Is there anything obvious I missed that is preventing my analysis?

Thank you!

@warrenmcg
Collaborator

@cajames2, a few questions:

  1. what version of sleuth are you running? Version 0.29.0 could be the current master version or the devel version, and it will help to know what you're working with.

  2. did you run the suggested code from brucemoran above? Did that identify any samples with an unexpected dimension?

  3. if the answer is 'no', what is the RAM available for your computer? It is possible that 384 samples (which is a lot) is too much for your system to handle at once, and the cryptic error message is indicating that your machine ran out of RAM and swap memory. I know I have worked with a dataset that has 600 samples, and that still uses 60-80 GB of RAM on a machine with 128 GB. If you're working off of a laptop, that is likely the issue.

  4. if RAM is not the problem and none of your kallisto files are corrupted, that's when we'll have to explore exactly what happened. There is probably a way for us to run the "reading in kallisto files" step of sleuth_prep while still keeping track of which file we're reading.

@warrenmcg
Collaborator

@lydiarck: sorry for the delayed response. It seems like in your situation, something is failing with kallisto or with your script. Depending on how exactly you're running the script, you might also be running into a memory issue that is causing certain kallisto runs to fail. Did you see anything suspicious with the log messages, or with the auxiliary files accompanying the corrupted runs?

@cajames2

@warrenmcg: thanks for your quick reply. I am using sleuth version 0.29.0. When I run the code suggested by brucemoran, every one of my .h5 files returns an error. This makes me think there may have been an issue with the initial kallisto run. However, I spot-checked some of the abundance.tsv files and they are populated, so in practice the kallisto run worked as expected.

An example:

> apply(s2c,1,function(f){ dh5 <- try(dim(h5ls(paste0(f[3],"/abundance.h5")))[1]); if(dh5!=115){ dh5<-"ERROR" }; return(paste0(f[3]," -> ",dh5)) })

Error in try(dim(h5ls(paste0(f[3], "/abundance.h5")))[1]) : 
  could not find function "h5ls"
 [1]"../output/Plate1A01 -> ERROR"

But the abundance.tsv file for this sample does show the transcript IDs that aligned faithfully to my data set for that sample.

For what it's worth, when I ran kallisto on the .fastq.gz file of my entire data set, my computer could not handle it. To get around that, I unzipped the file, demultiplexed all my samples, and wrote a loop so that kallisto would run on each sample individually. It took about 8 hours but seemed to work fine. Do you think this could be the issue? If not, I'm inclined to think my computer might not have sufficient RAM to handle this data set.

Thanks for all your help.

@warrenmcg
Collaborator

@cajames2, The problem is not with your files, but with the rhdf5 package and the h5ls function.
I would make sure these lines work:

library(rhdf5)
?h5ls

If they don't work, that's the problem. Once those lines work, try repeating the suggested code above.

In the meantime, my suspicion is that your computer can't handle the dataset on its own with the available RAM. This will be especially true if you're handling 384 samples while also sending data out to multiple cores. Because of how R does forking, a full copy of all data currently in the R workspace will be sent to each worker, so RAM usage can balloon quite a lot if you have a lot of data already present. Unfortunately, there's not much we can do about that...

To confirm that RAM is the issue, I would pull the activity monitor up and watch your RAM usage while the sleuth run is going. You could try processing the bootstraps using just one core -- it will take a while, but it may have a better chance of succeeding.

@SRenan

SRenan commented Nov 9, 2018

Also experiencing the original error:

reading in kallisto results
dropping unused factor levels
.Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) : 
  HDF5. File accessibilty. Unable to open file.

I believe the hdf5 files are corrupted and this has nothing to do with sleuth but here is the requested info.

This happens with gcc 4.4.7

gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and 7.3.1

gcc --version
gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

Using sleuth 0.30.0

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.10 (Santiago)

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] sleuth_0.30.0       DT_0.4              limma_3.36.5       
 [4] Biobase_2.40.0      BiocGenerics_0.26.0 biomaRt_2.36.1     
 [7] ggplot2_3.0.0       XCIR_0.1.25         PSUmisc_0.0.11     
[10] data.table_1.11.8 

Here is the kallisto 0.44.0 command used to generate the hdf5 files

kallisto quant -t 20 -i kal_idx   samp_1.fastq samp_2.fastq -o samp_out -b 100

Now, sorting the runs by the size of their abundance.h5 file and running sleuth_prep file by file:

  1. The error happens only for the smaller files.
  2. There is no error reported in the kallisto run, but most of the failing samples report fewer EM bootstrap iterations than requested, while the files that sleuth can read consistently report 100 iterations.
  3. I don't know anything about hdf5, but opening the h5 files (with less), all successful files have an "<89>HDF" tag at the top, while none of the failing files do.

So my take is that sleuth is fine and the hdf5 files are simply corrupted. This is supported by running kallisto's h5dump

kallisto h5dump samp/abundance.h5 -output-dir="./"
HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) thread 0:
  #000: H5F.c line 604 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file

On my end, I'm thinking this may be the batch system killing jobs which would explain the lack of error reported by kallisto. Looking at some of the scripts in this issue, I suspect some other users may be in the same situation.
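The signature check from point 3 above can be scripted: every valid HDF5 file begins with the fixed 8-byte signature `89 48 44 46 0d 0a 1a 0a` (i.e. `\x89HDF\r\n\x1a\n`). A minimal sketch for flagging corrupt outputs, assuming a hypothetical `samp_*/` directory layout:

```shell
# Flag abundance.h5 files that lack the 8-byte HDF5 signature
# (89 48 44 46 0d 0a 1a 0a); such files are corrupt and the
# corresponding kallisto runs need to be repeated.
# The samp_*/ directory layout is a hypothetical example.
for f in samp_*/abundance.h5; do
  sig=$(head -c 8 "$f" | od -An -tx1 | tr -d ' \n')
  if [ "$sig" != "894844460d0a1a0a" ]; then
    echo "CORRUPT: $f"
  fi
done
```

Any sample listed as CORRUPT can then be re-quantified individually rather than re-running the whole batch.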

@warrenmcg
Copy link
Collaborator

Hi @SRenan,

If h5dump is not working, then I think your diagnosis that this is related to your batch system is correct. You can confirm this if you are able to successfully run kallisto interactively on one of the problematic samples. If kallisto fails interactively as well, please submit an issue to kallisto here with the details of your set-up and the error.

If it turns out to be an issue with your batch system, consult with the IT team at your institution to see what you can do to monitor your batch jobs. It may be as simple as adding &> log_file.txt ("redirect all shell output to 'log_file.txt'") to the end of your kallisto command (see here), or something else depending on your cluster and your script. The most common reason for batch jobs getting killed is miscalculating your RAM and core needs when submitting a job, so the IT team can also troubleshoot with you to see whether those need to be adjusted when submitting these jobs, or whether something else is happening, so that this problem is prevented in the future.
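A concrete sketch of that redirect, with my_kallisto_cmd standing in for the real kallisto invocation (a hypothetical name), and using the POSIX-portable form > log 2>&1, equivalent to bash's &>:

```shell
# Capture everything a batch job prints, so a killed or failing
# kallisto run leaves evidence behind in the log.
# 'my_kallisto_cmd' is a stand-in for the real kallisto command line.
my_kallisto_cmd() {
  echo "quantifying..."             # stands in for kallisto's stdout
  echo "warning: low coverage" >&2  # stands in for kallisto's stderr
}
my_kallisto_cmd > log_file.txt 2>&1   # portable form of '&> log_file.txt'
echo "exit status: $?" >> log_file.txt
```

After an unexplained failure, log_file.txt then records both streams plus the exit status, which is usually enough to tell a kallisto error apart from a job killed by the scheduler.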

@johanneskoester
Copy link

Hi guys. I am happy to tell you that the issue is now fixed when using the latest bioconda packages of rhdf5 and rhdf5lib. It was indeed a combination of missing zlib support and problems when making the included szip library portable. For the future, we have protected ourselves against such problems by adding a test to the bioconductor-rhdf5 package that ensures kallisto compatibility.

@warrenmcg
Copy link
Collaborator

Great work @johanneskoester! I wonder what this means for the rhdf5 and rhdf5lib packages when downloading them directly from bioconductor? Do they have this issue? Was this only an issue if kallisto/rhdf5/rhdf5lib were all built using bioconda?

@johanneskoester
Copy link

So, this issue was more likely to appear when packaging it in a portable way. However, one issue that certainly occurs also when installing directly is that, if zlib headers are not found, rhdf5lib will silently compile without zlib compression support. Then, upon using it, you get these not very descriptive error messages posted here whenever reading a dataset with zlib compressed tables.

@marcora
Copy link

marcora commented Feb 18, 2019

I am having the same problem with rhdf5 and rhdf5lib packages when downloading them directly from bioconductor!

@egenomics
Copy link

Hi,
I am working with previous kallisto runs (that worked) on other machines. Now I get the sleuth error when trying to load them. My sessionInfo() from R 3.4.3 is below. I have tried a new installation with R 3.5 but can't install rhdf5lib for some reason.

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /home/jl/anaconda2/lib/R/lib/libRblas.so
LAPACK: /home/jl/anaconda2/lib/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rmarkdown_1.11 knitr_1.21 rhdf5_2.20.0 tximportData_1.6.0 tidyr_0.8.2
[6] dplyr_0.8.0.1 dbplyr_1.3.0 RSQLite_2.1.1 cowplot_0.9.4 ggplot2_3.1.0
[11] sleuth_0.30.0 RevoUtils_10.0.8 RevoUtilsMath_10.0.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 pillar_1.3.1 compiler_3.4.3 plyr_1.8.4 tools_3.4.3
[6] zlibbioc_1.24.0 digest_0.6.18 bit_1.1-14 evaluate_0.13 memoise_1.1.0
[11] tibble_2.0.1 gtable_0.2.0 pkgconfig_2.0.2 rlang_0.3.1 DBI_1.0.0
[16] rstudioapi_0.9.0 parallel_3.4.3 yaml_2.2.0 xfun_0.4 withr_2.1.2
[21] bit64_0.9-7 grid_3.4.3 tidyselect_0.2.5 glue_1.3.0 data.table_1.12.0
[26] R6_2.4.0 purrr_0.3.0 blob_1.1.1 magrittr_1.5 htmltools_0.3.6
[31] scales_1.0.0 assertthat_0.2.0 colorspace_1.4-0 stringi_1.3.1 lazyeval_0.2.1
[36] munsell_0.5.0 crayon_1.3.4

gcc --version
gcc (crosstool-NG fa8859cb) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.

@johanneskoester
Copy link

@egenomics @marcora consider using the bioconda packages. Besides more reproducible analyses and easier management, they do not suffer from this problem anymore.

@marcora
Copy link

marcora commented Feb 25, 2019

They do actually!

@warrenmcg
Copy link
Collaborator

@marcora: are we talking bioconductor or bioconda? You mentioned installing from bioconductor. However, Johannes seems to have fixed this issue with the bioconda recipe for installing rhdf5, which is different than the standard way of installing rhdf5 through bioconductor. If you are still having an issue after installing the bioconda recipe, can you confirm?

@marcora
Copy link

marcora commented Feb 26, 2019

Within R, I use BiocManager::install() to install R packages... not conda. Is there a way to fix this issue when installing/compiling packages from within R directly? If not, I will try to manage my R environment via conda, but it is not optimal when some packages or package versions are not available in conda repos.

@egenomics
Copy link

Installing from bioconda failed as well. In the end I managed to reinstall everything in R 3.5 and it is working now...

@johanneskoester
Copy link

Only the very latest versions of the packages in bioconda are fixed. In particular for the latest R. So if you use e.g. an older R, you will get the bug again because conda fetches older versions of the packages. You need bioconductor-rhdf5 >=2.26.2 and bioconductor-rhdf5lib >= 1.4.2, together with r-base >=3.5.1. In that combination it should work. If not, please post the output of conda list of the respective environment. Also note that the required channel order defined at https://bioconda.github.io has to be used. Otherwise, conda will e.g. pick stuff from the commercial R or default channels, which might not yet contain our fixes.

@warrenmcg
Copy link
Collaborator

warrenmcg commented Feb 26, 2019

Hi @johanneskoester, thank you again for all of your hard work to troubleshoot this issue!

Do you have an update on whether the bioconductor installations of rhdf5 and rhdf5lib are being fixed to address this issue? Has the package developer been informed of this issue? The typical R user will not be working with bioconda, and it would be great to have this working on all fronts.

edit: I went ahead and opened an issue on the Rhdf5lib repo.

@grimbough
Copy link

I've thought about this some more, and my current suspicion is that this is a file locking issue and some other process is preventing rhdf5 from opening the .h5 file. If you're seeing this error please try running the following in R and then re-run the command that threw the error:

Sys.setenv(HDF5_USE_FILE_LOCKING = "FALSE")

If this works please report back.
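If R is launched from a script rather than interactively, the same toggle can be set in the shell before R starts, since HDF5 reads it from the environment. A sketch, where run_sleuth.R is a hypothetical script name:

```shell
# Disable HDF5 file locking for every process launched from this
# shell session, then start R from the same shell.
# 'run_sleuth.R' is a hypothetical script name.
export HDF5_USE_FILE_LOCKING=FALSE
# Rscript run_sleuth.R
```

This is equivalent to the Sys.setenv() call, but applies to anything started from that shell, which is handy for batch jobs.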


As for why I'm not convinced this is a missing ZLIB issue, it's mostly due to the fact that the error reported here occurs when opening the HDF5 file e.g.

Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.

However, a failure where ZLIB was required but not found would only occur when trying to read a dataset; opening the file should be fine regardless of filter availability, and the error would look something like:

Error in H5Dread(...  : 
  HDF5. Dataset. Read failed.

Here's a little example demonstrating that you get this error if a second process tries to open an HDF5 file that is already open:

## download the example abundance file
h5_file <- tempfile(pattern = "abundance", fileext = ".h5")
download.file('https://raw.githubusercontent.com/pachterlab/sleuth/master/tests/testthat/small_test_data/kallisto/abundance.h5', 
              destfile = h5_file, mode = "wb")

## open a file handle and view
fid1 <- H5Fopen( h5_file )
fid1

#HDF5 FILE 
#        name /
#    filename 
#
#        name       otype dclass dim
#0 aux        H5I_GROUP             
#1 bootstrap  H5I_GROUP             
#2 est_counts H5I_DATASET  FLOAT  15

## launch Rscript to run a new process accessing the same file
## this will fail
system2("Rscript", 
        args = paste0("-e 'fid2 <- rhdf5::H5Fopen(\"", h5_file, "\"); fid2'"))

#Error in rhdf5::H5Fopen("/tmp/RtmpYJppmS/abundance403f6b7e2c5f.h5") : 
#  HDF5. File accessibilty. Unable to open file.
#Execution halted

## close the file handle in this process and try again
H5Fclose(fid1)
system2("Rscript", 
        args = paste0("-e 'fid2 <- rhdf5::H5Fopen(\"", h5_file, "\"); fid2'"))

#HDF5 FILE 
#        name /
#    filename 
#
#        name       otype dclass dim
#0 aux        H5I_GROUP             
#1 bootstrap  H5I_GROUP             
#2 est_counts H5I_DATASET  FLOAT  15

@yifeiliu3959
Copy link

Hi,

I am experiencing the same problem. I tried Sys.setenv(HDF5_USE_FILE_LOCKING = "FALSE"), but it does not work.

sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.32.1 rhdf5_2.26.2 raster_2.8-19 gdalUtils_2.0.1.14
[5] rgdal_1.4-3 sp_1.3-1 ncdf4_1.16.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 lattice_0.20-38 codetools_0.2-16 foreach_1.4.4 R.methodsS3_1.7.1
[6] grid_3.5.3 R.oo_1.22.0 R.utils_2.8.0 Rhdf5lib_1.4.3 iterators_1.0.10
[11] tools_3.5.3 xfun_0.6 yaml_2.2.0 compiler_3.5.3 BiocManager_1.30.4
[16] knitr_1.22

Operating system:
macOS Mojave

gcc version:
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

@yifeiliu3959
Copy link

By the way, ‘kallisto’ is not available (for R version 3.5.3) and ‘sleuth’ is not available (for R version 3.5.3)

@umasstr
Copy link

umasstr commented Feb 10, 2020

I encounter this error regularly when running kallisto/sleuth on a large number of samples. This is persistent across Linux distros and Windows (WSL), containerized and non-containerized packages. In my experience, 100% of unreadable HDF5 messages have been attributed to silent h5 corruption upon generation by kallisto.

Error rate of local/interactive run <<< linux server jobs. Time to pseudoalign on linux server jobs (despite more cores/RAM assigned) is wildly increased as well. I can only speculate that there is some minuscule basal rate of silent h5 corruption by kallisto that is exaggerated in some scenarios (server jobs e.g.). Pseudoaligning many samples may just reach the expected value for at least one corrupt h5.

In agreement with this thread, simply checking the h5 files and regenerating those which have been corrupted works every time. Thanks @brucemoran for the means to conduct the file check.

@oggismetto
Copy link

Dear all, I am using a specific pipeline which removes all .h5 files. I have only the abundance.tsv. Does this mean I cannot run sleuth? Thanks in advance for any help

@mschilli87
Copy link
Contributor

@oggismetto:

Dear all, i am using a specific pipeline which removes all .h5 files. I have only the abundance.tsv. This means i cannot run sleuth?

AFAICT, read_kallisto falls back to reading the TSV if no HDF5 was found. So you should be able to run sleuth, but with the limitation that kallisto's bootstraps (via the -b flag) are only stored in the HDF5, so you'd lose that extra information on the estimated technical noise in your differential expression analyses.

@pragathisneha
Copy link

Hi,
I am having issue with salmon files converted by wasabi in sleuth.
Error : File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b"

@pragathisneha
Copy link

Hi, I am having issue with salmon files converted by wasabi in sleuth. Error : File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b"
@warrenmcg : Please help me with this issue

@mschilli87
Copy link
Contributor

@pragathisneha: What operating system are you on? The wasabi README mentions a Windows issue with bootstrap information from salmon.
Regardless, AFAICT, sleuth has done all it (or its developers) can do for you: it tells you what information is missing in your input data. Since you get these files from a different tool, you are probably better off asking for help at the wasabi and/or salmon support channels.

@martalopes5234
Copy link

Hello,

I am having the same problem:

so <- sleuth_prep(s2c, ~ condition)
reading in kallisto results
..Error in H5Fopen(file, "H5F_ACC_RDONLY") :
HDF5. File accessability. Unable to open file.

SessionInfo()

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] pt_PT.UTF-8/pt_PT.UTF-8/pt_PT.UTF-8/C/pt_PT.UTF-8/pt_PT.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocManager_1.30.18 rhdf5filters_1.9.0 Rhdf5lib_1.18.2 httr_1.4.4 rhdf5_2.40.0
[6] sleuth_0.30.0 Matrix_1.5-0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45 prettyunits_1.1.1 ps_1.7.1 rprojroot_2.0.3
[6] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 mime_0.12 R6_2.5.1
[11] ggplot2_3.3.6 pillar_1.8.1 rlang_1.0.5 curl_4.3.2 lazyeval_0.2.2
[16] rstudioapi_0.14 data.table_1.14.2 miniUI_0.1.1.1 callr_3.7.2 urlchecker_1.0.1
[21] devtools_2.4.4 stringr_1.4.1 htmlwidgets_1.5.4 munsell_0.5.0 shiny_1.7.2
[26] compiler_4.2.1 httpuv_1.6.6 pkgconfig_2.0.3 pkgbuild_1.3.1 htmltools_0.5.3
[31] tidyselect_1.1.2 tibble_3.1.8 fansi_1.0.3 withr_2.5.0 crayon_1.5.1
[36] dplyr_1.0.10 later_1.3.0 grid_4.2.1 xtable_1.8-4 gtable_0.3.1
[41] lifecycle_1.0.2 DBI_1.1.3 magrittr_2.0.3 scales_1.2.1 cli_3.4.0
[46] stringi_1.7.8 cachem_1.0.6 fs_1.5.2 promises_1.2.0.1 remotes_2.4.2
[51] ellipsis_0.3.2 generics_0.1.3 vctrs_0.4.1 tools_4.2.1 glue_1.6.2
[56] purrr_0.3.4 processx_3.7.0 pkgload_1.3.0 parallel_4.2.1 fastmap_1.1.0
[61] colorspace_2.0-3 sessioninfo_1.2.2 memoise_2.0.1 profvis_0.3.7 usethis_2.1.6

macOS Monterey
Versão 12.0.1

Kallisto HDF5 file version: 1.12.2

rhdf5::h5version()
This is Bioconductor rhdf5 2.40.0 linking to C-library HDF5 1.10.7 and rhdf5filters 1.9.0

I already tried to look for a more recent rhdf5 package that supports HDF5 FILES 1.12.2 with no success.

How did you solve this issue?

@marianaago
Copy link

Hi,

I am having the same issue to this day - I was wondering if anyone reached a concensus on the best way to tackle this issue?

I am getting the following message:

so <- sleuth_prep(kal_dirs_fixed, extra_bootstrap_summary = TRUE)

reading in kallisto results
dropping unused factor levels
..........................................
normalizing est_counts
58048 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, flags = flags, fapl = fapl, native = native) :
HDF5. File accessibility. Unable to open file.
In addition: Warning message:
In check_num_cores(num_cores) :
It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.

sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8 LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.utf8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocManager_1.30.21 Rhdf5lib_1.22.0 cowplot_1.1.1 sleuth_0.30.1 rhdf5_2.44.0

loaded via a namespace (and not attached):
[1] tximport_1.28.0 KEGGREST_1.40.0 gtable_0.3.3 ggplot2_3.4.2 Biobase_2.60.0 rhdf5filters_1.12.1
[7] vctrs_0.6.3 tools_4.3.0 bitops_1.0-7 generics_0.1.3 parallel_4.3.0 stats4_4.3.0
[13] curl_5.0.1 tibble_3.2.1 fansi_1.0.4 AnnotationDbi_1.62.1 RSQLite_2.3.1 blob_1.2.4
[19] pkgconfig_2.0.3 data.table_1.14.8 dbplyr_2.3.2 S4Vectors_0.38.1 lifecycle_1.0.3 GenomeInfoDbData_1.2.10
[25] compiler_4.3.0 stringr_1.5.0 Biostrings_2.68.1 progress_1.2.2 munsell_0.5.0 GenomeInfoDb_1.36.0
[31] RCurl_1.98-1.12 lazyeval_0.2.2 tidyr_1.3.0 pillar_1.9.0 crayon_1.5.2 cachem_1.0.8
[37] tidyselect_1.2.0 digest_0.6.31 stringi_1.7.12 purrr_1.0.1 dplyr_1.1.2 biomaRt_2.56.1
[43] fastmap_1.1.1 grid_4.3.0 colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 XML_3.99-0.14
[49] utf8_1.2.3 withr_2.5.0 prettyunits_1.1.1 filelock_1.0.2 scales_1.2.1 rappdirs_0.3.3
[55] bit64_4.0.5 XVector_0.40.0 httr_1.4.6 bit_4.0.5 png_0.1-8 hms_1.1.3
[61] memoise_2.0.1 IRanges_2.34.0 BiocFileCache_2.8.0 rlang_1.1.1 glue_1.6.2 DBI_1.1.3
[67] xml2_1.3.4 BiocGenerics_0.46.0 rstudioapi_0.14 R6_2.5.1 zlibbioc_1.46.0

Please let me know if you are able to help solve this

Best,
Mariana
