Refactor DType definition into dtypes_core.py #468
This pull request was exported from Phabricator. Differential Revision: D38358389
Summary: Pull Request resolved: pytorch#468
dtypes.py consists of two parts:
(1) Standard Arrow-compatible DataType definitions (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType
The DType definitions are quite standalone and stable; refactor them into a separate file to allow reuse in next-gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA.
In theory we could let dtypes.py contain only the DType definitions and move everything else into `dtype_util.py`. Starting with this first step.
Differential Revision: D38358389
fbshipit-source-id: d855fcd00ad9ff2dc018832038bf79a75d698d47
19b8cc3 to 0583471
@dataclass(frozen=True)
class List(DType):
    item_dtype: DType
Here we use item_dtype and fixed_size, while PyArrow calls them value_type and list_size: https://arrow.apache.org/docs/python/generated/pyarrow.list_.html
I don't feel strongly about fixed_size vs. list_size. I guess item can be renamed to value in future diffs :) (since ListColumn also contains values and offsets).
or, element_dtype as per Velox and Trino/Presto:
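To make the naming comparison in this thread concrete, here is a minimal, self-contained sketch. It is not the TorchArrow source: the empty `DType` base, the `Int64` stand-in, the `-1` default for `fixed_size`, and the `value_dtype` alias are all assumptions made for illustration. Only `item_dtype` and `fixed_size` are names taken from the review comment.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class DType:
    """Stand-in base class (assumption: the real DType carries more members)."""


@dataclass(frozen=True)
class Int64(DType):
    """Stand-in leaf type, used only to build the example."""


@dataclass(frozen=True)
class List(DType):
    item_dtype: DType     # PyArrow calls this value_type; the reply suggests element_dtype
    fixed_size: int = -1  # PyArrow calls this list_size; -1 is assumed to mean variable length

    @property
    def value_dtype(self) -> DType:
        # Hypothetical alias sketching the "rename item to value" idea.
        return self.item_dtype


ints = List(Int64())
triple = List(Int64(), fixed_size=3)

# Frozen dataclasses compare by field values and reject attribute assignment.
same = ints == List(Int64())
try:
    ints.fixed_size = 4
    mutated = True
except FrozenInstanceError:
    mutated = False
```

An alias property like this is one way a rename could be staged without breaking callers that still read `item_dtype`; whether that is worth doing is a separate design decision.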
@dataclass(frozen=True)
class Map(DType):
    key_dtype: DType
    item_dtype: DType
I think this key/item naming comes from PyArrow's map type: https://arrow.apache.org/docs/python/generated/pyarrow.map_.html
But many columnar databases (Presto/Velox) call it key/value instead. key/value also seems to align with the Pythonic API (in a Python dict, an item is the (key, value) pair).
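The point about Python's own vocabulary can be checked with a plain dict, as in the short sketch below. The `Map` class here is a simplified stand-in, not the TorchArrow definition: the empty `DType` base, the `String` and `Int64` leaf types, and the `value_dtype` alias are assumptions for illustration.

```python
from dataclasses import dataclass

# In a Python dict, an "item" is the whole (key, value) pair.
prices = {"apple": 3, "pear": 5}
items = list(prices.items())    # pairs of (key, value)
values = list(prices.values())  # values only


@dataclass(frozen=True)
class DType:
    """Stand-in base class for the example."""


@dataclass(frozen=True)
class String(DType):
    """Stand-in leaf type."""


@dataclass(frozen=True)
class Int64(DType):
    """Stand-in leaf type."""


@dataclass(frozen=True)
class Map(DType):
    key_dtype: DType
    item_dtype: DType  # follows PyArrow's key_type/item_type naming

    @property
    def value_dtype(self) -> DType:
        # Hypothetical alias matching the key/value naming used by Presto/Velox.
        return self.item_dtype


m = Map(String(), Int64())
```

Under the dict reading, `item_dtype` could be mistaken for the type of the pair rather than the type of the value, which is the ambiguity the comment raises.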
Summary: Pull Request resolved: pytorch#468
dtypes.py consists of two parts:
(1) Standard Arrow-compatible DataType definitions (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType
The DType definitions are quite standalone and stable; refactor them into a separate file to allow reuse in next-gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA.
In theory we could let dtypes.py contain only the DType definitions and move everything else into `dtype_util.py`. Starting with this first step.
Reviewed By: dracifer
Differential Revision: D38358389
fbshipit-source-id: 243e93bece169c91ccdd6a8ea3d6ac4d087b078f
0583471 to 41f6f3e
Summary: Pull Request resolved: #468
dtypes.py consists of two parts:
(1) Standard Arrow-compatible DataType definitions (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType
The DType definitions are quite standalone and stable; refactor them into a separate file to allow reuse in next-gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA.
In theory we could let dtypes.py contain only the DType definitions and move everything else into `dtype_util.py`. Starting with this first step.
Reviewed By: dracifer
Differential Revision: D38358389
fbshipit-source-id: 632037498290222a60794e26d0db0c98c73f1391
Summary:
dtypes.py consists of two parts:
(1) Standard Arrow-compatible DataType definitions (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType
The DType definitions are now quite standalone and stable; refactor them into a separate file to allow reuse in next-gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA.
In theory we could let dtypes.py contain only the DType definitions and move everything else into dtype_util.py. Starting with this first step.
Differential Revision: D38358389
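As a rough illustration of the split the summary describes, the single-file sketch below separates the two parts with comments. Everything in it is an assumption for illustration: the class bodies are stand-ins, and `is_numerical` is a hypothetical helper name rather than a confirmed TorchArrow function.

```python
from dataclasses import dataclass

# Part 1: type definitions (the part this PR moves into a core module).
# They depend only on the standard library, which is what makes them reusable.


@dataclass(frozen=True)
class DType:
    """Stand-in base class."""


@dataclass(frozen=True)
class Int32(DType):
    """Stand-in integer type."""


@dataclass(frozen=True)
class Float64(DType):
    """Stand-in floating-point type."""


@dataclass(frozen=True)
class String(DType):
    """Stand-in string type."""


# Part 2: utilities built on top of the definitions.
# These can import from the core module; the core never imports from here.


def is_numerical(dtype: DType) -> bool:
    """Hypothetical helper: True for the integer and floating-point stand-ins."""
    return isinstance(dtype, (Int32, Float64))


checks = [is_numerical(Int32()), is_numerical(Float64()), is_numerical(String())]
```

Keeping the dependency one-directional (utilities import definitions, never the reverse) is what would let another backend reuse the definitions without pulling in the helpers.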