Add additional features to support SDF to Zarr conversion #77
Changelogs
- … `path/to/data.zarr/col#0:10`).
- `SDFConverter` class to convert an SDF object to a Polaris Dataset.
- `DatasetFactory` as a unified system for creating datasets.
- `create_dataset_from_file` utility method.
- Adapters to convert datapoints during data loading. These can be saved alongside a dataset.

Checklist:
- Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
- … `feature`, `fix` or `test` (or ask a maintainer to do it for you).

This PR started as an effort to add support for the MARCEL benchmark.
There were three critical changes to support MARCEL:
- … `path/to/data.zarr/col#0:10`)
- … `Adapter`, which during data loading adapts the content saved in the Zarr file to a desired format (in this case, from a bytestring to an RDKit Mol object).

In addition, I have long wanted to start on a system that makes it easier to create datasets. I'm quite happy with this as a first version. It's still a little crude, but I think it will be easy to extend as we work to make adding datasets easier.
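To make the pointer syntax concrete, here is a rough sketch of how a string such as `path/to/data.zarr/col#0:10` could be split into an array path and an index range. The `parse_pointer` helper is hypothetical, and reading the part after `#` as either a single index or a `start:stop` range is an assumption. It is not the parsing code from this PR.

```python
from typing import Tuple, Union


def parse_pointer(pointer: str) -> Tuple[str, Union[int, slice]]:
    """Split 'path#index' or 'path#start:stop' into (path, index-or-slice).

    Hypothetical helper for illustration only; the pointer grammar is assumed.
    """
    # Split on the last '#', so a '#' earlier in the path would be preserved.
    path, sep, index = pointer.rpartition("#")
    if not sep:
        raise ValueError(f"No '#' separator in pointer: {pointer!r}")
    if ":" in index:
        # Range form: 'start:stop' becomes a Python slice.
        start, stop = index.split(":")
        return path, slice(int(start), int(stop))
    # Single-index form.
    return path, int(index)


path, index = parse_pointer("path/to/data.zarr/col#0:10")
print(path, index)  # path/to/data.zarr/col slice(0, 10, None)
```

The returned slice can be used directly to index anything array-like, which is where an adapter could then convert the selected values to the desired format.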
I added a tutorial and docstrings, but here is a short snippet: