Skip to content
This repository was archived by the owner on Nov 1, 2024. It is now read-only.

Conversation

@wenleix
Copy link
Contributor

@wenleix wenleix commented Aug 2, 2022

Summary:
dtypes.py contains of two parts:
(1) Standard Arrow-compatible DataType definition (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType

The DType definition is quite standalone and stable now; Refactor DataType definition into seperate file so allows reuse in next gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA

In theory we could let dtypes.py only contain DType definition, and move other things into dtype_util.py. Starting with this first step.

Differential Revision: D38358389

@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Aug 2, 2022
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D38358389

wenleix added a commit to wenleix/torcharrow that referenced this pull request Aug 2, 2022
Summary:
Pull Request resolved: pytorch#468

dtypes.py contains of two parts:
(1) Standard Arrow-compatible DataType definition (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType

The DType definition is quite standalone and stable; Refactor DataType definition into seperate file so allows reuse in next gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA

In theory we could let dtypes.py only contain DType definition, and move other things into `dtype_util.py`. Starting with this first step.

Differential Revision: D38358389

fbshipit-source-id: d855fcd00ad9ff2dc018832038bf79a75d698d47
@wenleix wenleix force-pushed the export-D38358389-to-fbsync branch from 19b8cc3 to 0583471 Compare August 2, 2022 23:49
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D38358389


@dataclass(frozen=True)
class List(DType):
item_dtype: DType
Copy link
Contributor Author

@wenleix wenleix Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we use item_dtype and fixed_size while in PyArrow it's called value_type and list_size: https://arrow.apache.org/docs/python/generated/pyarrow.list_.html

Doesn't feel strong about fixed_size vs. list_size . I guess item can be renamed to value in future diffs :). (since ListColumn also contains values and offsets.

Copy link
Contributor Author

@wenleix wenleix Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dataclass(frozen=True)
class Map(DType):
key_dtype: DType
item_dtype: DType
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this key/item is coming from PyArrow's Map Type: https://arrow.apache.org/docs/python/generated/pyarrow.map_.html

But in many columnar Database (Presto/Velox) calls it key/value instead. key/value also seems to align with Pythonic API (in Python Dict, item means the (key, value) pair).

Velox: https://github.com/facebookincubator/velox/blob/2148956b208132e4e37684392d28651e5ef9ff07/velox/vector/ComplexVector.h#L401-L402

Summary:
Pull Request resolved: pytorch#468

dtypes.py contains of two parts:
(1) Standard Arrow-compatible DataType definition (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType

The DType definition is quite standalone and stable; Refactor DataType definition into seperate file so allows reuse in next gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA

In theory we could let dtypes.py only contain DType definition, and move other things into `dtype_util.py`. Starting with this first step.

Reviewed By: dracifer

Differential Revision: D38358389

fbshipit-source-id: 243e93bece169c91ccdd6a8ea3d6ac4d087b078f
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D38358389

@wenleix wenleix force-pushed the export-D38358389-to-fbsync branch from 0583471 to 41f6f3e Compare August 6, 2022 00:36
facebook-github-bot pushed a commit that referenced this pull request Aug 6, 2022
Summary:
Pull Request resolved: #468

dtypes.py contains of two parts:
(1) Standard Arrow-compatible DataType definition (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType

The DType definition is quite standalone and stable; Refactor DataType definition into seperate file so allows reuse in next gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA

In theory we could let dtypes.py only contain DType definition, and move other things into `dtype_util.py`. Starting with this first step.

Reviewed By: dracifer

Differential Revision: D38358389

fbshipit-source-id: 632037498290222a60794e26d0db0c98c73f1391
wenleix added a commit that referenced this pull request Aug 6, 2022
Summary:
Pull Request resolved: #468

dtypes.py contains of two parts:
(1) Standard Arrow-compatible DataType definition (Int8/16/32/64, Float32/64, String, List, Map, Struct)
(2) Utility functions around DType

The DType definition is quite standalone and stable; Refactor DataType definition into seperate file so allows reuse in next gen TorchArrow, such as TorchArrow-UPM or Tensor-based TA

In theory we could let dtypes.py only contain DType definition, and move other things into `dtype_util.py`. Starting with this first step.

Reviewed By: dracifer

Differential Revision: D38358389

fbshipit-source-id: 632037498290222a60794e26d0db0c98c73f1391
@wenleix wenleix closed this in a9c7452 Aug 6, 2022
@wenleix wenleix deleted the export-D38358389-to-fbsync branch August 6, 2022 06:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants