Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot Read h5 file after starsolo+cellbender #99

Closed
ayyildizd opened this issue Apr 18, 2023 · 13 comments
Closed

Cannot Read h5 file after starsolo+cellbender #99

ayyildizd opened this issue Apr 18, 2023 · 13 comments
Assignees

Comments

@ayyildizd
Copy link

Thanks for developing this nice tool.

I used STAR solo for preprocessing the samples and then CellBender using the STAR solo output folder containing (genes.tsv, matrix.mtx and barcodes.tsv) as an input. I am not able to read h5 files created in this way using scCustomize's funtions (both Read_CellBender_h5_Mat and Read_CellBender_h5_Multi_Directory). On the other hand, seurat function Read10X_h5() works.

The error I get is:

Error in x$exists(name) : HDF5-API Errors:
    error #000: ../../src/hdf5-1.12.2/src/H5L.c in H5Lexists(): line 943: unable to get link info
        class: HDF5
        major: Links
        minor: Can't get value

    error #001: ../../src/hdf5-1.12.2/src/H5VLcallback.c in H5VL_link_specific(): line 5173: unable to execute link specific callback
        class: HDF5
        major: Virtual Object Layer
        minor: Can't operate on object

    error #002: ../../src/hdf5-1.12.2/src/H5VLcallback.c in H5VL__link_specific(): line 5136: unable to execute link specific callback
        class: HDF5
        major: Virtual Object Layer
        minor: Can't operate on object

    error #003: ../../src/hdf5-1.12.2/src/H5VLnative_link.c in H5VL__native_link_specific(): line 329: unable to specific link info
        class: HDF5
        major: Links
        minor: Object not found

    error #004: ../../src/hdf5-1.12.2/src/H5L.c in H5L__exists(): line 3082: path doesn't exist
        class: HDF5
        major: Links
@samuel-marsh
Copy link
Owner

Hi,

Thanks for kind words on package! Can you provide me a bit more information about your environment by pasting output of sessionInfo() here?

Also could you let me know which version of CellBender you used?

Thanks!
Sam

@samuel-marsh samuel-marsh self-assigned this Apr 18, 2023
@ayyildizd
Copy link
Author

Thanks for quick response!

Here is the sessioninfo():

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.0

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] qs_0.25.5               scCustomize_1.1.1       Seurat_4.9.9.9041       SeuratObject_4.9.9.9084 sp_1.6-0               
 [6] viridis_0.6.2           viridisLite_0.4.1       patchwork_1.1.2         lubridate_1.9.2         forcats_1.0.0          
[11] stringr_1.5.0           dplyr_1.1.1             purrr_1.0.1             readr_2.1.4             tidyr_1.3.0            
[16] tibble_3.2.1            ggplot2_3.4.2           tidyverse_2.0.0        

loaded via a namespace (and not attached):
  [1] spam_2.9-1             circlize_0.4.15        plyr_1.8.8             igraph_1.4.2           lazyeval_0.2.2        
  [6] splines_4.2.2          RcppHNSW_0.4.1         RApiSerialize_0.1.2    listenv_0.9.0          scattermore_0.8       
 [11] digest_0.6.31          htmltools_0.5.5        fansi_1.0.4            magrittr_2.0.3         tensor_1.5            
 [16] paletteer_1.5.0        cluster_2.1.4          ROCR_1.0-11            tzdb_0.3.0             globals_0.16.2        
 [21] RcppParallel_5.1.7     matrixStats_0.63.0     timechange_0.2.0       spatstat.sparse_3.0-1  colorspace_2.1-0      
 [26] ggrepel_0.9.3          xfun_0.38              jsonlite_1.8.4         progressr_0.13.0       spatstat.data_3.0-1   
 [31] stringfish_0.15.7      survival_3.5-5         zoo_1.8-12             glue_1.6.2             polyclip_1.10-4       
 [36] gtable_0.3.3           leiden_0.4.3           future.apply_1.10.0    shape_1.4.6            abind_1.4-5           
 [41] scales_1.2.1           spatstat.random_3.1-4  miniUI_0.1.1.1         Rcpp_1.0.10            xtable_1.8-4          
 [46] reticulate_1.28        bit_4.0.5              dotCall64_1.0-2        htmlwidgets_1.6.2      httr_1.4.5            
 [51] RColorBrewer_1.1-3     ellipsis_0.3.2         ica_1.0-3              pkgconfig_2.0.3        uwot_0.1.14           
 [56] deldir_1.0-6           utf8_1.2.3             janitor_2.2.0          tidyselect_1.2.0       rlang_1.1.0           
 [61] reshape2_1.4.4         later_1.3.0            munsell_0.5.0          tools_4.2.2            cli_3.6.1             
 [66] ggprism_1.0.4          generics_0.1.3         ggridges_0.5.4         evaluate_0.20          fastmap_1.1.1         
 [71] yaml_2.3.7             goftest_1.2-3          rematch2_2.1.2         bit64_4.0.5            knitr_1.42            
 [76] fitdistrplus_1.1-8     RANN_2.6.1             pbapply_1.7-0          future_1.32.0          nlme_3.1-162          
 [81] mime_0.12              ggrastr_1.0.1          hdf5r_1.3.8            compiler_4.2.2         rstudioapi_0.14       
 [86] beeswarm_0.4.0         plotly_4.10.1          png_0.1-8              spatstat.utils_3.0-2   stringi_1.7.12        
 [91] RSpectra_0.16-1        lattice_0.20-45        Matrix_1.5-3           vctrs_0.6.1            pillar_1.9.0          
 [96] lifecycle_1.0.3        spatstat.geom_3.1-0    lmtest_0.9-40          GlobalOptions_0.1.2    RcppAnnoy_0.0.20      
[101] data.table_1.14.8      cowplot_1.1.1          irlba_2.3.5.1          httpuv_1.6.9           R6_2.5.1              
[106] promises_1.2.0.1       KernSmooth_2.23-20     gridExtra_2.3          vipor_0.4.5            parallelly_1.35.0     
[111] codetools_0.2-19       fastDummies_1.6.3      MASS_7.3-58.3          withr_2.5.0            sctransform_0.3.5     
[116] parallel_4.2.2         hms_1.1.3              grid_4.2.2             rmarkdown_2.21         snakecase_0.11.0      
[121] Rtsne_0.16             spatstat.explore_3.1-0 shiny_1.7.4            ggbeeswarm_0.7.1      

And my version of CellBender was v0.2.2!

@samuel-marsh
Copy link
Owner

Ok let me look into it. Also not 100% necessary but may help. Is it possible for you to share the H5 file? If so I can send google drive link.

Best,
Sam

@ayyildizd
Copy link
Author

Thanks for help, yes I can share the file :)

@samuel-marsh
Copy link
Owner

Great. Send me email from account I should share link to. samuel.marsh@childrens.harvard.edu.

Thanks!
Sam

@samuel-marsh
Copy link
Owner

Ok got file and got passed initial error and now I'm getting series of H5 errors like yours but different. I;m going to keep looking at it but also wondering if you have tried re-running CellBender and whether issue is the same on new output file?

Best,
Sam

@ayyildizd
Copy link
Author

ayyildizd commented Apr 19, 2023

I did run CellBender not on the same sample, but different samples coming from different datasets. When I tried reading the file with other samples, it gave me the same error. I wonder if it could be related how CellBender writes h5 file coming from STARsolo output folder instead of CellRanger h5 file as an input? Because previously we managed to use your package after Cellranger+Cellbender. Somehow Seurat function Read10X_h5() is able to handle this file but scCustomize function is not.

@samuel-marsh
Copy link
Owner

Ok good to know. I’ll take deeper look into the h5 file and see what I can find. Since Seurat reads it I think it’s probably easy fix (I hope).

Best,
Sam

@samuel-marsh
Copy link
Owner

Ok just an update as I continue to look into this and figure out where the error is happening. Basically my CellBender read function is modified version of Seurat's that really should come into play with CellBender v3+ (currently in alpha) files because they contain slightly different format and my function is meant to enable reading of both whereas Seurat's Read10X_h5 fails with v3+.

I'm working on it.

Basically a CellBender output file when using 10X input looks like this:

              group                           name       otype  dclass         dim
0                 /                         matrix   H5I_GROUP                    
1           /matrix             ambient_expression H5I_DATASET   FLOAT       32285
2           /matrix                       barcodes H5I_DATASET  STRING       10152
3           /matrix  contamination_fraction_params H5I_DATASET   FLOAT           2
4           /matrix                           data H5I_DATASET INTEGER    14122269
5           /matrix                       features   H5I_GROUP                    
6  /matrix/features                   feature_type H5I_DATASET  STRING       32285
7  /matrix/features                         genome H5I_DATASET  STRING       32285
8  /matrix/features                             id H5I_DATASET  STRING       32285
9  /matrix/features                           name H5I_DATASET  STRING       32285
10          /matrix fraction_data_used_for_testing H5I_DATASET   FLOAT       ( 0 )
11          /matrix                        indices H5I_DATASET INTEGER    14122269
12          /matrix                         indptr H5I_DATASET INTEGER       10153
13          /matrix              lambda_multiplier H5I_DATASET   FLOAT       ( 0 )
14          /matrix           latent_RT_efficiency H5I_DATASET   FLOAT       10152
15          /matrix        latent_cell_probability H5I_DATASET   FLOAT       10152
16          /matrix           latent_gene_encoding H5I_DATASET   FLOAT 100 x 10152
17          /matrix                   latent_scale H5I_DATASET   FLOAT       10152
18          /matrix  overdispersion_mean_and_scale H5I_DATASET   FLOAT           2
19          /matrix                          shape H5I_DATASET INTEGER           2
20          /matrix     target_false_positive_rate H5I_DATASET   FLOAT       ( 0 )
21          /matrix                      test_elbo H5I_DATASET   FLOAT          60
22          /matrix                     test_epoch H5I_DATASET INTEGER          60
23          /matrix        training_elbo_per_epoch H5I_DATASET   FLOAT         300

Whereas yours looks like this:

                 group                           name       otype  dclass        dim
0                    /             background_removed   H5I_GROUP                   
1  /background_removed             ambient_expression H5I_DATASET   FLOAT      36601
2  /background_removed                       barcodes H5I_DATASET  STRING       8595
3  /background_removed  contamination_fraction_params H5I_DATASET   FLOAT          2
4  /background_removed                           data H5I_DATASET INTEGER    7918894
5  /background_removed fraction_data_used_for_testing H5I_DATASET   FLOAT      ( 0 )
6  /background_removed                     gene_names H5I_DATASET  STRING      36601
7  /background_removed                          genes H5I_DATASET  STRING      36601
8  /background_removed                        indices H5I_DATASET INTEGER    7918894
9  /background_removed                         indptr H5I_DATASET INTEGER       8596
10 /background_removed              lambda_multiplier H5I_DATASET   FLOAT      ( 0 )
11 /background_removed           latent_RT_efficiency H5I_DATASET   FLOAT       8595
12 /background_removed        latent_cell_probability H5I_DATASET   FLOAT       8595
13 /background_removed           latent_gene_encoding H5I_DATASET   FLOAT 100 x 8595
14 /background_removed                   latent_scale H5I_DATASET   FLOAT       8595
15 /background_removed  overdispersion_mean_and_scale H5I_DATASET   FLOAT          2
16 /background_removed                          shape H5I_DATASET INTEGER          2
17 /background_removed     target_false_positive_rate H5I_DATASET   FLOAT      ( 0 )
18 /background_removed                      test_elbo H5I_DATASET   FLOAT         30
19 /background_removed                     test_epoch H5I_DATASET INTEGER         30
20 /background_removed        training_elbo_per_epoch H5I_DATASET   FLOAT        150

As you can see all of the data is there but it's just arranged/named differently. I'll work on internal test in the function to support this.

I hope to have fix for you by later this week/early next week.

Best,
Sam

@samuel-marsh
Copy link
Owner

Hi @ayyildizd,

So sorry for the delay without any updates! Happy to report though that full functionality is now part of scCustomize in develop branch (v1.1.1.9010).

Basically needed to add optional parameters to support preV3 CellRanger or STARsolo outputs. I tested with H5 file that you sent and it worked. For your files it was just an error that features were named "genes" instead of the newer "features". So just need to use new optional parameter feature_slot_name:

starsolo_mat <- Read_CellBender_h5_Mat(file_name = "~/Downloads/SRR16194335_cellbender_output_filtered.h5", feature_slot_name = "genes")

feature_slot_name has also been added toRead_CellBender_h5_Multi_Directory and Read_CellBender_h5_Multi_File to support multi-file reading.

If you run into any issues using the function let me know here and I'll reopen the issue.
Sorry again for delay!

Best,
Sam

@samuel-marsh
Copy link
Owner

I should also mention I added an error check in the function as well so that it throws interpretable error if something goes wrong instead of the h5 error which is hard to interpret.

Best,
Sam

@ayyildizd
Copy link
Author

Thank you very much!!

@samuel-marsh
Copy link
Owner

Happy to help! Thanks for submitting the issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants