Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch from rhdf5 to hdf5r #169

Merged
merged 49 commits into from
Jul 18, 2024
Merged

switch from rhdf5 to hdf5r #169

merged 49 commits into from
Jul 18, 2024

Conversation

rcannood
Copy link
Collaborator

@rcannood rcannood commented Jun 27, 2024

Continuation of #166 but from an internal branch.

After some minor improvements, this implementation seems to be passing a few more tests in comparison to the rhdf5 implementation in main.

lazappi and others added 30 commits December 4, 2023 17:08
…o into write-h5ad-categoricals

* 'write-h5ad-categoricals' of github.com:scverse/scverseio:
  fix styling
  Update write_h5ad_categorical
Replace repeated code in individual writers
@rcannood rcannood mentioned this pull request Jun 27, 2024
rcannood added a commit that referenced this pull request Jul 4, 2024
rcannood added a commit that referenced this pull request Jul 8, 2024
* port rownames-related changes from #166 and #169

* run styler

* fix test

* style

* style

* fix docs

* fix documentation

* simplify helper functions

* simplify test

* add more documentation to AnnData

* fix docs
@rcannood rcannood changed the title Pr166 hdf5r rownames rework switch from rhdf5 to hdf5r Jul 11, 2024
@rcannood
Copy link
Collaborator Author

Since more tests are succeeding, I'm merging this branch :)

@rcannood rcannood merged commit 1194af2 into main Jul 18, 2024
7 checks passed
@lazappi
Copy link
Collaborator

lazappi commented Jul 22, 2024

Does this mean we are officially moving from {rhdf5} to {hdf5r} (and from Bioconductor to CRAN)? I thought that wasn't necessary any more after I got everything working with {rhdf5}? Or is there something else I'm missing?

@rcannood
Copy link
Collaborator Author

I did merge the PR because the hdf5r implementation is working better atm, though I don't know which of the two will work best in the long run.

I'd been considering simply creating two backends (an HDF5RAnnData and an RHDF5AnnData, if you will) to see with which framework achieving full anndata compatibility is the easiest. Without a direct comparison, deciding between rhdf5 and hdf5r seems like a judgement call.

hdf5r:

  • Supports HDF5 1.8 - 1.12
  • Native support for boolean types
  • CRAN is easier to submit to and easier to install packages from

rhdf5:

  • HDF5 1.10
  • More frequent commits (that is, better maintained?)

both:

  • We could temporarily create both backends (hdf5r and rhdf5) so we can directly compare how well each framework works
  • Criteria
    • Fewest compatibility issues
    • Read / write speed
    • Which versions of hdf5 are supported
    • How well maintained the project is

wdyt?

@lazappi
Copy link
Collaborator

lazappi commented Jul 23, 2024

I guess if they are both implemented already I wouldn't be entirely opposed to that in a testing phase but I definitely wouldn't release both of them. It does seem like it might be a lot of work though, depending on where things are at.

Unless there is a significant issue with one implementation then I think CRAN vs Bioconductor is probably a bigger question than which HDF5 implementation to use. This probably has more to do with the community we want to interact with/reach than anything technical. We did discuss this when we started the project at the hackathon but maybe we need to do it again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants