Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR follows up on discussion in #1900 to have an explicit set of basic dtypes for datasets.
This moves away from str(pyarrow.DataType) as the method of choice for creating dtypes, favoring an explicit mapping to a list of supported Value dtypes.
I believe in practice this should be backward compatible, since anyone previously using Value() would only have been able to use dtypes that had an identically named pyarrow factory function, which are all explicitly supported here, with
float32
andfloat64
acting as the official datasets dtypes, which resolves the tension betweendouble
being the pyarrow dtype andfloat64
being the pyarrow type factory function.