Issue reading h5 files #120
Hi Harold, I am getting this error as well.

sessionInfo(): [package list truncated]
Operating system: macOS Sierra version 10.12.5

Hope this helps!
Rachel
Hi Harold and Rachel,

Same problem here.

sessionInfo(): R version 3.3.2 (2016-10-31) [remainder truncated]
...continuing... [text truncated] also without success.

Regards, Jose
Hi, I ran the same script on a Linux machine. This time I got errors/warnings, including:

1: In read_kallisto(path, read_bootstrap = TRUE, max_bootstrap = max_bootstrap) : [warning text truncated]

Indeed, I had run kallisto with the --plain-text option. I am now re-running kallisto without that option, and we will see what happens. Perhaps the R versions of sleuth on Mac and Windows are not reporting the errors/warnings above.

Regards, Jose
Hi, I re-ran kallisto without the --plain-text option. Now the .h5 files were created in the expected subdirectories; they were not there before. When running the command so <- sleuth_prep(s2c, full_model = full_design) on a Windows machine, I now get "reading in kallisto results" [remainder of output truncated].

Looking at the documentation on sleuth_prep at https://pachterlab.github.io/sleuth/docs/sleuth_prep.html:

SUGGESTION 1: [text truncated]

SUGGESTION 2: [text truncated]

On https://pachterlab.github.io/kallisto/manual, the text under "Optional arguments:" could be changed to (added text in bold): [suggested text truncated].

On the other hand, running the same script on a Linux machine after re-running kallisto, I got no error messages! Bingo!

Regards, Jose
@jmcribeiro, it appears the documentation on the website is not up-to-date with the current version, as it is generated separately, so it doesn't have the new options. If you go into R and do ?sleuth_prep, you will see the up-to-date options. The option you want for sleuth_prep is num_cores = 1.
Hi Warren, Thanks for your comment. Your recommendation worked! Please also see my recommendation to make sure sleuth on Windows flags the plain-text issue as well, to avoid other users getting lost. Regards, Jose
Hello! Those are two great suggestions. For the Windows issue, we can set a quick patch to warn users that Windows does not support mclapply and switch num_cores to 1 (see the sketch below). Moving forward, we can explore switching to using the [text truncated]. For the text files issue, I wonder if this is the reason most people are having issues? I think it would make sense for [text truncated]. What do you think @pimentel of these two options?
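A minimal sketch of such a guard (illustrative only, not sleuth's actual code; the function name and message wording are invented for the example):

## Illustrative guard: parallel::mclapply() falls back to serial
## execution on Windows, so warn the user and force a single core.
check_num_cores <- function(num_cores) {
  if (.Platform$OS.type == "windows" && num_cores > 1) {
    warning("Windows does not support mclapply(); setting num_cores to 1.")
    num_cores <- 1
  }
  num_cores
}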
I have rerun kallisto and removed the --plain-text flag, which removed the h5 error. However, now I get this error:

Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], : [error message truncated]

Any help is greatly appreciated, Rachel
Hi @rachelzoeb, What was your full kallisto command? It seems that you did not use the -b/--bootstrap-samples option [text truncated]. If you did use the option, [text truncated]. If you are using the latest version of sleuth, [text truncated].
@warrenmcg thanks so much for fielding these questions. Regarding the Windows patch: that sounds like a great idea. Unfortunately, the bootstraps are not available via plaintext at all. This is because HDF5 provides nice compression that is a bit of a pain to get otherwise. Initially, plaintext [remainder truncated].
Hi, I ran kallisto with quant --bootstrap-samples=100 --threads=16, and 4 out of 8 of my h5 files had the cannot-open error. kallisto ran on a Linux server, and I then downloaded the h5 files to my local machine (macOS) to run sleuth in R. Do you think there might have been an error during the file transfer? Also, I checked the file sizes of the failing h5 files, and for 3 of the 4 the h5 file is smaller than the corresponding tsv file; not sure if that is related. Thanks in advance for any help!

sessionInfo() in R: [output truncated]
OS: [truncated]
gcc: [truncated]
I am currently having this issue. I have a data frame built as it is in the walkthrough, and it looks like this: [output truncated]

OS is CentOS, 2.6.32-696.10.2.el6.x86_64.

However, I don't know why it is trying to read H5 files. Lastly, the error is: [error message truncated]
Hi, I had the same problem and solved it. The issue was due to the file structure I was using; clearly this may not be the issue for everybody. When setting up the kr_dirs data frame as per the instructions, the program assumes that each sample is found in its own directory, which contains both the abundance.h5 and abundance.tsv files with the file names unedited. When I arranged the files like this (see the sketch below), the error was not tripped. Hope that helps
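For illustration, a sketch of that layout and the matching data frame (directory, sample, and condition names here are hypothetical):

## One directory per sample, each holding the unrenamed kallisto outputs
## (abundance.h5, abundance.tsv, run_info.json).
base_dir   <- "kallisto_results"           ## assumed results directory
sample_ids <- c("sample_A", "sample_B")    ## assumed sample names

s2c <- data.frame(
  sample    = sample_ids,
  condition = c("control", "treated"),
  path      = file.path(base_dir, sample_ids),  ## e.g. kallisto_results/sample_A
  stringsAsFactors = FALSE
)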
Hello, I am also having the same error message with one of my files (I have 46 and it only seems to be kicking up this one, which I have re-generated by re-running kallisto):

Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], : File ../quant/WTCHG_412393_006/abundance.h5 has no bootstraps. Please generate bootstraps using "kallisto quant -b".

However, I used 100 bootstraps when I ran kallisto, and the run info file also produced by kallisto confirms this for this sample: { [JSON truncated]

My sleuth_prep command is this: so <- sleuth_prep(sample_to_condition, target_mapping = ttg, [remainder truncated]

Any help appreciated! I used kallisto v0.43.1 on our uni Linux server, and am running sleuth (latest version) on my MacBook. Sarah
@sarahharvey88, that is odd. Could you send the problematic h5 file so I can reproduce the error on my side? Email me at: warren-mcgee at fsm.northwestern.edu
@miguelroboso, as has been mentioned previously, the plain-text files do not have the bootstraps included. You should rerun kallisto without the --plain-text option.

Pinging @pimentel: the offending line causing Miguel's user-unfriendly error is this one [link in original]. The current version expects an H5 file to be present, so should we be more explicit about that requirement in [text truncated]?
Also getting this error. NB: samples were run using Nextflow and executed by PBS/Torque. When I rerun the offending samples interactively, they all work. Not ideal though... kallisto command: [truncated]
NB: to find offending h5 files, you can use a check along the lines of the sketch below [original snippet truncated].
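A minimal sketch of such a check using rhdf5 (the results directory is hypothetical; a corrupted file is assumed to error on open):

## Try to open each abundance.h5 and list its contents; corrupted files
## throw an error ("Unable to open file") at this step.
library(rhdf5)

h5_files <- list.files("kallisto_results", pattern = "abundance\\.h5$",
                       recursive = TRUE, full.names = TRUE)

bad <- h5_files[vapply(h5_files, function(f) {
  inherits(tryCatch(h5ls(f), error = function(e) e), "error")
}, logical(1))]

print(bad)  ## files that should be regenerated with kallisto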
Hello, I am experiencing a similar problem.

sessionInfo(): [output truncated]
Operating system: [truncated]
GCC version: [truncated]

The R instance is being run within a virtual machine hosted by a Windows OS, but I am not sure if that tells you anything or not.
I get a slightly different H5-related error message: [error truncated]

Like Bruce's experience above, it only happens for some of my files, and if I re-run kallisto interactively for these files (instead of from a shell script), the resulting files can be read using sleuth with no issues.
Hello! I am still receiving this error message: [error truncated]

I am using kallisto 0.44.0. I ran the initial kallisto script using this command: [command truncated]

I then tried to run the sleuth_prep command in a couple of ways and got the same error both times. [commands truncated]

I checked each one of my abundance.h5 files (384 total), and none of them seem to be the obvious offender. Is there anything obvious I missed that is preventing my analysis? Thank you!
@cajames2, a few questions: [questions truncated]
@lydiarck: sorry for the delayed response. It seems like in your situation, something is failing with kallisto or with your script. Depending on how exactly you're running the script, you might also be running into a memory issue that is causing certain kallisto runs to fail. Did you see anything suspicious in the log messages, or in the auxiliary files accompanying the corrupted runs?
@warrenmcg: Thanks for your quick reply. I am using package 'sleuth' version 0.29.0. When I run the code suggested by brucemoran, each one of my .h5 files returns an error. This makes me think there may have been an issue with the initial kallisto run. However, I spot-checked some of the abundance.tsv files and they are populated, so in practice the kallisto run worked as expected. An example: [output truncated]

But the abundance.tsv file for this sample shows the transcript IDs that aligned faithfully to my data set for that sample. For what it's worth, when I ran kallisto on the .fastq.gz file of my entire data set, my computer could not handle it. To get around that, I unzipped the file, demultiplexed all my samples, and wrote a loop so that kallisto would run on each sample individually. It took about 8 hours but seemed to work fine. Do you think that maybe this was the issue? If not, I'm inclined to think my computer might not have sufficient RAM to handle this data set. Thanks for all your help.
@cajames2, The problem is not with your files, but with the [text truncated; test lines to run in R were suggested here].

If those lines don't work, that's the problem. Once they work, try repeating the suggested code above. In the meantime, my suspicion is that your computer can't handle the dataset on its own with the available RAM. This will be especially true if you're handling 384 samples while also sending data out to multiple cores. Because of how R does forking, a full copy of all data currently in the R workspace is sent to each worker, so RAM use can balloon quite a lot if you have a lot of data already present. Unfortunately, there's not much we can do about that...

To confirm that RAM is the issue, I would pull up the activity monitor and watch your RAM usage while the sleuth run is going. You could also try processing the bootstraps using just one core (see the sketch below); it will take a while, but it may have a better chance of succeeding.
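For reference, a minimal single-core call (assuming the s2c and full_design objects from earlier in the thread):

## Process bootstraps on one core to rule out fork-related RAM blowup;
## slower, but the workspace is not copied out to multiple workers.
so <- sleuth_prep(s2c, full_model = full_design, num_cores = 1)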
Also experiencing the original error: [error truncated]

I believe the hdf5 files are corrupted and this has nothing to do with sleuth, but here is the requested info. This happens with gcc 4.4.4 [output truncated] and 7.3.1 [output truncated], using sleuth 0.30.0 [sessionInfo truncated].

Here is the kallisto command: [truncated]

Now, sorting the runs by the size of their abundance.h5 file and running the check [output truncated], my take is that sleuth is fine and the hdf5 files are simply corrupted. This is supported by running kallisto's h5dump subcommand on the suspect files [output truncated].

On my end, I'm thinking this may be the batch system killing jobs, which would explain the lack of an error reported by kallisto. Looking at some of the scripts in this issue, I suspect some other users may be in the same situation.
Hi @SRenan, If [text truncated]. If it turns out to be an issue with your batch system, consult with the IT team at your institution to see what you can do to monitor your batch jobs. It may be as simple as adding the [text truncated].
Hi guys. I am happy to tell you that the issue is now fixed when using the latest bioconda packages of rhdf5 and rhdf5lib. It was indeed a combination of missing zlib support and problems when making the included szip library portable. For the future, we have protected ourselves against such problems by adding a test to the bioconductor-rhdf5 package that ensures kallisto compatibility.
Great work @johanneskoester! I wonder what this means for the rhdf5 and rhdf5lib packages when downloading them directly from Bioconductor? Do they have this issue? Was this only an issue if kallisto/rhdf5/rhdf5lib were all built using bioconda?
So, this issue was more likely to appear when packaging things in a portable way. However, one issue that certainly also occurs when installing directly is that, if zlib headers are not found, rhdf5lib will silently compile without zlib compression support. Then, upon using it, you get the not-very-descriptive error messages posted here whenever you read a dataset with zlib-compressed tables.
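One way to test for that situation is to round-trip a compressed dataset; a sketch, assuming a build without zlib errors somewhere in this round trip:

## Write and read back a zlib (deflate) compressed dataset.
library(rhdf5)

f <- tempfile(fileext = ".h5")
h5createFile(f)
h5createDataset(f, "test", dims = 100, storage.mode = "double", level = 6)
h5write(rnorm(100), f, "test")   ## level = 6 requests deflate compression
str(h5read(f, "test"))           ## should print 100 doubles on a good build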
I am having the same problem with the rhdf5 and rhdf5lib packages when downloading them directly from Bioconductor!
Hi,

lsb_release -a: [output truncated]
sessionInfo(): [output truncated]
gcc --version: [output truncated]
@egenomics @marcora consider using the bioconda packages. Besides more reproducible analyses and easier management, they do not suffer from this problem anymore.
They do actually!
@marcora: are we talking Bioconductor or bioconda? You mentioned installing from Bioconductor. However, Johannes seems to have fixed this issue with the bioconda recipe for installing rhdf5, which is different from the standard way of installing rhdf5 through Bioconductor. If you are still having an issue after installing the bioconda recipe, can you confirm?
Within R, I use BiocManager::install() to install R packages... not conda. Is there a way to fix this issue when installing/compiling packages from within R directly? If not, I will try to manage my R environment via conda, but it is not optimal when some packages or package versions are not available in conda repos.
Installing from bioconda failed as well. In the end I managed to reinstall everything under R 3.5, and it is working now...
Only the very latest versions of the packages in bioconda are fixed, in particular for the latest R. So if you use e.g. an older R, you will get the bug again, because conda fetches older versions of the packages. You need bioconductor-rhdf5 >= 2.26.2 and bioconductor-rhdf5lib >= 1.4.2, together with r-base >= 3.5.1; in that combination it should work (you can verify with the sketch below). If not, please post the output of [command truncated].
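To confirm from within R that an installation meets those minimums, a quick sketch:

## Compare installed versions against the minimums listed above;
## all three comparisons should print TRUE.
packageVersion("rhdf5")    >= "2.26.2"
packageVersion("rhdf5lib") >= "1.4.2"
getRversion()              >= "3.5.1"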
Hi @johanneskoester, thank you again for all of your hard work to troubleshoot this issue! Do you have an update on whether the Bioconductor installations of rhdf5 and rhdf5lib are being fixed to address this issue? Has the package developer been informed of this issue? The typical R user will not be working with bioconda, and it would be great to have this working on all fronts.

edit: I went ahead and opened an issue on the Rhdf5lib repo.
I've thought about this some more, and my current suspicion is that this is a file locking issue: some other process is preventing rhdf5 from opening the .h5 file. If you're seeing this error, please try running the following in R and then re-run the command that threw the error:

Sys.setenv(HDF5_USE_FILE_LOCKING = "FALSE")
If this works, please report back. As for why I'm not convinced this is a missing ZLIB issue: it's mostly because the error reported here occurs when opening the HDF5 file, e.g.: [error message truncated]
However, a failure where ZLIB was required but not found would only occur when trying to read a dataset; opening the file should be fine regardless of filter availability, and the error would look something like: [error message truncated]
Here's a little example demonstrating that you get this error if a second process tries to open an HDF5 file that is already open:

library(rhdf5)

## download the example abundance file
h5_file <- tempfile(pattern = "abundance", fileext = ".h5")
download.file('https://raw.githubusercontent.com/pachterlab/sleuth/master/tests/testthat/small_test_data/kallisto/abundance.h5',
destfile = h5_file, mode = "wb")
## open a file handle and view
fid1 <- H5Fopen( h5_file )
fid1
#HDF5 FILE
# name /
# filename
#
# name otype dclass dim
#0 aux H5I_GROUP
#1 bootstrap H5I_GROUP
#2 est_counts H5I_DATASET FLOAT 15
## launch Rscript to run a new process accessing the same file
## this will fail
system2("Rscript",
args = paste0("-e 'fid2 <- rhdf5::H5Fopen(\"", h5_file, "\"); fid2'"))
#Error in rhdf5::H5Fopen("/tmp/RtmpYJppmS/abundance403f6b7e2c5f.h5") :
# HDF5. File accessibilty. Unable to open file.
#Execution halted
## close the file handle in this process and try again
H5Fclose(fid1)
system2("Rscript",
args = paste0("-e 'fid2 <- rhdf5::H5Fopen(\"", h5_file, "\"); fid2'"))
#HDF5 FILE
# name /
# filename
#
# name otype dclass dim
#0 aux H5I_GROUP
#1 bootstrap H5I_GROUP
#2 est_counts H5I_DATASET FLOAT 15
Hi, I am experiencing the same problem. I tried Sys.setenv(HDF5_USE_FILE_LOCKING = "FALSE"), but it does not work.

sessionInfo(): [output truncated]
Operating system: [truncated]
gcc version: [truncated]
By the way, ‘kallisto’ is not available (for R version 3.5.3) and ‘sleuth’ is not available (for R version 3.5.3).
I encounter this error regularly when running kallisto/sleuth on a large number of samples. This is persistent across Linux distros and Windows (WSL), containerized and non-containerized packages. In my experience, 100% of unreadable-HDF5 messages have been attributable to silent h5 corruption upon generation by kallisto. The error rate of local/interactive runs is far lower than that of Linux server jobs. Time to pseudoalign in Linux server jobs (despite more cores/RAM assigned) is wildly increased as well. I can only speculate that there is some minuscule basal rate of silent h5 corruption by kallisto that is exaggerated in some scenarios (e.g. server jobs); pseudoaligning many samples may simply reach the expected value for at least one corrupt h5. In agreement with this thread, simply checking the h5 files and regenerating those which have been corrupted works every time. Thanks @brucemoran for the means to conduct the file check.
Dear all, I am using a specific pipeline which removes all .h5 files; I have only the abundance.tsv. Does this mean I cannot run sleuth? Thanks in advance for any help.
AFAICT, [remainder of comment truncated]
Hi, [comment truncated]
@pragathisneha: What operating system are you on? The wasabi README mentions a Windows issue with bootstrap information from salmon.
Hello, I am having the same problem:

so <- sleuth_prep(s2c, ~ condition) [error truncated]

sessionInfo(): R version 4.2.1 (2022-06-23), macOS Monterey [package list truncated]
kallisto version: [truncated]
HDF5 files: 1.12.2
rhdf5::h5version(): [output truncated]

I already tried to look for a more recent rhdf5 package that supports HDF5 1.12.2, with no success. How did you solve this issue?
Hi, I am having the same issue to this day, and I was wondering if anyone has reached a consensus on the best way to tackle it? I am getting the following message:

so <- sleuth_prep(kal_dirs_fixed, extra_bootstrap_summary = TRUE)
reading in kallisto results [error truncated]

sessionInfo(): [output truncated; time zone: Europe/London]

Please let me know if you are able to help solve this.

Best,
Some users have reported having issues reading the H5 files. Here is the error: [error message truncated; an HDF5 "Unable to open file" error raised when reading abundance.h5]
I would like to track this down, so if you are having this issue please respond with the following:

- the output of gcc --version
- your sessionInfo() output and operating system
- any other information you think might be informative

A helper sketch for gathering the R-side details follows below.
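For convenience, a sketch of gathering those details in one go (assuming sleuth and rhdf5 are installed):

## Collect the diagnostics requested above from within R.
sessionInfo()                  ## R version, OS, attached packages
packageVersion("sleuth")
packageVersion("rhdf5")
rhdf5::h5version()             ## HDF5 library version bundled with rhdf5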
Thanks,
Harold