Conversation

@cwognum
Collaborator

@cwognum cwognum commented Dec 4, 2023

Changelogs

Metadata for benchmarks

  • Fixes Additional attributes for Benchmark #63
    • Adds the task type (i.e. multi task or single task), computed automatically.
    • Adds the target type (i.e. regression or classification). It can be specified manually; if it is not, the library tries to infer it automatically using sklearn's type_of_target().
    • Adds the train set size, computed automatically.
    • Adds the test set(s) size(s), computed automatically.
    • Adds the number of classes for classification tasks, computed automatically.
    • Adds the number of test sets.
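
To make the benchmark fields above concrete, here is a minimal sketch of how they could be derived. It is not the code from this PR: the function names, the input layout (plain dicts and lists) and the output keys are all hypothetical, and mapping sklearn's `binary`/`multiclass` labels to classification and `continuous` to regression is an assumption. Only `type_of_target()` itself is the real sklearn call mentioned above.

```python
from sklearn.utils.multiclass import type_of_target


def infer_target_type(values):
    """Map sklearn's target label to 'classification' or 'regression' (assumed mapping)."""
    kind = type_of_target(values)
    if kind in ("binary", "multiclass"):
        return "classification"
    if kind == "continuous":
        return "regression"
    return None  # could not be inferred; would need to be specified manually


def summarize_benchmark(targets, train_indices, test_sets, target_types=None):
    """Compute the metadata fields listed above (all key names are hypothetical).

    targets: dict mapping target column -> list of values
    train_indices: list of row indices in the train set
    test_sets: dict mapping test set name -> list of row indices
    target_types: optional dict of manually specified target types
    """
    resolved = dict(target_types or {})
    for col, values in targets.items():
        if col not in resolved:  # manual specification takes precedence
            resolved[col] = infer_target_type(values)
    return {
        "task_type": "multi_task" if len(targets) > 1 else "single_task",
        "target_types": resolved,
        "n_train_datapoints": len(train_indices),
        "n_test_sets": len(test_sets),
        "n_test_datapoints": {name: len(idx) for name, idx in test_sets.items()},
        "n_classes": {
            col: len(set(values))
            for col, values in targets.items()
            if resolved[col] == "classification"
        },
    }


meta = summarize_benchmark(
    targets={"active": [0, 1, 0, 1], "logp": [0.12, 1.57, 2.31, 0.48]},
    train_indices=[0, 1, 2, 3],
    test_sets={"random": [4, 5], "scaffold": [6]},
)
```

One caveat with inference: `type_of_target()` treats float values that are all whole numbers as `multiclass`, which is one reason a manual override is useful.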

Metadata for datasets

  • Fixes Feat: Add precomputed fields #18
    • Adds the dtype to the column annotations, computed automatically.
    • Adds the number of datapoints and the number of columns, computed automatically.
    • Adds an optional field with a reference to the curation.
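
As an illustration of the dataset fields above, the sketch below computes them from a table. It assumes the dataset's table is a pandas.DataFrame; the function name, the output keys and the `curation_reference` argument are hypothetical and not taken from this PR.

```python
import pandas as pd


def summarize_table(table: pd.DataFrame, curation_reference=None):
    """Compute dataset-level metadata (hypothetical key names)."""
    return {
        "n_rows": len(table),
        "n_columns": len(table.columns),
        # One dtype string per column, e.g. "float64"
        "dtypes": {col: str(dtype) for col, dtype in table.dtypes.items()},
        # Optional and never computed: supplied by whoever curated the data
        "curation_reference": curation_reference,
    }


table = pd.DataFrame({"smiles": ["CCO", "c1ccccc1"], "logp": [-0.31, 2.13]})
table_meta = summarize_table(table)
```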

Misc

  • Adds indexing syntax to the dataset to make interacting with it easier, similar to pandas.DataFrame, e.g. dataset[row, col] or dataset[:, "smiles"].
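
For readers unfamiliar with this style of indexing, the following is a small sketch of how `dataset[row, col]` can be supported through `__getitem__`. `TinyDataset` is a hypothetical stand-in backed by a dict of lists, not the dataset class changed in this PR.

```python
class TinyDataset:
    """Hypothetical column store that supports dataset[row, col] indexing."""

    def __init__(self, columns):
        self._columns = columns  # dict mapping column name -> list of values

    def __getitem__(self, key):
        # dataset[row, col] reaches __getitem__ as the tuple (row, col)
        if not (isinstance(key, tuple) and len(key) == 2):
            raise TypeError("expected dataset[row, col]")
        row, col = key
        # `row` may be an int or a slice; list indexing handles both
        return self._columns[col][row]


ds = TinyDataset({"smiles": ["CCO", "CCN", "CCC"], "logp": [-0.31, -0.13, 1.09]})
single_value = ds[1, "smiles"]   # one cell
whole_column = ds[:, "smiles"]   # every row of one column
```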

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

@cwognum cwognum added the feature Annotates any PR that adds new features; Used in the release process label Dec 4, 2023
@cwognum cwognum requested a review from hadim as a code owner December 4, 2023 19:13
@cwognum cwognum requested a review from zhu0619 December 4, 2023 19:13
@cwognum
Collaborator Author

cwognum commented Dec 4, 2023

Worth noting that we will also have the size of the dataset (in bytes), but I will save this on the Hub side since the Hub already receives this information anyway.

@cwognum cwognum changed the title Add additional meta-data to the columns, dataset and benchmark Add additional metadata to the columns, dataset and benchmark Dec 4, 2023
@cwognum
Collaborator Author

cwognum commented Dec 4, 2023

Quick! Change from meta-data to metadata before @jstlaurent and @hadim spot it! 👀

Contributor

@zhu0619 zhu0619 left a comment

Thanks Cas.
It looks good to me.
I have only one minor comment.

@cwognum cwognum merged commit e1b33ef into main Dec 7, 2023
@cwognum cwognum deleted the feat/more-meta-data branch December 7, 2023 00:55
