
asarray: Add support for NumPy scalars #90914

Closed
wants to merge 7 commits

Conversation

ysiraichi
Collaborator

Follow-up from: Quansight-Labs/numpy_pytorch_interop#3

This PR adds support for NumPy scalars for torch.asarray.

Before: torch.asarray treats the scalar as an object that implements the buffer protocol, and thus interprets its data using the default data type (float32):

>>> torch.asarray(numpy.float64(0.5))
tensor([0.0000, 1.7500])

After: torch.asarray identifies the NumPy scalar and does the "right" thing, i.e. it creates a 0-dimensional tensor from the NumPy array that doesn't share its memory:

>>> torch.asarray(numpy.float64(0.5))
tensor(0.5000, dtype=torch.float64)
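
For illustration, a minimal sketch exercising the semantics described above and later in this thread (a 0-dimensional result, a dtype mapped from the scalar, and no memory sharing):

import numpy
import torch

scalar = numpy.float64(0.5)
t = torch.asarray(scalar)

print(t.dim())    # 0: a 0-dimensional tensor
print(t.dtype)    # torch.float64, mapped from the NumPy scalar's dtype
t.add_(1.0)       # the tensor owns its memory (copy=True), so...
print(scalar)     # ...the original scalar is unchanged: 0.5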

@pytorch-bot

pytorch-bot bot commented Dec 15, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90914

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 91a2bfc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ysiraichi ysiraichi changed the title Numpy scalar asarray asarray: Add support for NumPy scalars Dec 16, 2022
Collaborator

@lezcano lezcano left a comment


You mentioned that this method already works for NumPy scalars if you pass the dtype. Wouldn't it be possible (and simpler) to extract the type from the NumPy scalar and then, if dtype was not set, set it to that of the NumPy scalar, and if dtype was set, assert that the two dtypes are the same?
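
A minimal Python sketch of that suggestion (hedged: the actual change lives in PyTorch's C++ asarray implementation; the helper name and the explicit slice-and-clone step are illustrative assumptions based on the comparison below, not PyTorch API):

import numpy as np
import torch

def asarray_suggestion_sketch(obj, dtype=None):
    # Hypothetical helper illustrating the suggested flow.
    if isinstance(obj, np.generic):  # check for a NumPy scalar
        scalar_dtype = torch.from_numpy(np.asarray(obj)).dtype
        if dtype is None:
            dtype = scalar_dtype  # dtype not set: take the scalar's dtype
        else:
            assert dtype == scalar_dtype, "dtype mismatch"
        # With dtype set, the buffer-protocol path yields a 1-element 1-D
        # tensor, so slice to 0-D and clone so memory is not shared.
        return torch.asarray(obj, dtype=dtype)[0].clone()
    return torch.asarray(obj, dtype=dtype)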

@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Dec 16, 2022
@ysiraichi
Collaborator Author

ysiraichi commented Dec 18, 2022

I would say that that is another solution to this problem. Here's a comparison between the two:

Step | Current Implementation      | Suggestion
-----|-----------------------------|----------------------------------------
1    | NumPy scalar to NumPy array | Check for NumPy scalar + set the dtype
2    | NumPy array to Tensor       | Buffer protocol to Tensor
3    | Clone the tensor            | Slice* + clone the tensor

* Slicing is needed so we return a 0-dimensional tensor.


Observing the steps above, I believe they are not significantly different (in added logic).
However, I would argue that:

  • Implementing NumPy scalar support near NumPy array support makes the code easier to understand
    • Common NumPy checks are coalesced into a single execution
  • It keeps the linearity of the prioritized type list (in the documentation)

Collaborator

@rgommers rgommers left a comment


Thanks @ysiraichi. This LGTM, modulo a small comment on the docs. I don't have a clear preference between the two ways of implementing it; they seem pretty similar and both work. Probably easiest to stay with the current logic.


NumPy scalars also implement the buffer protocol. However, NumPy scalars are
prioritized over buffer protocols. In other words, if :attr:`obj` is a NumPy scalar,
it will not share memory, and its type will be inferred.
Collaborator


This note is probably too prominent, and most users won't really care about the details here. The paragraph above suffices I think.

buffer protocol then the buffer is interpreted as an array of bytes grouped according to
the size of the datatype passed to the :attr:`dtype` keyword argument. (If no datatype is
passed then the default floating point datatype is used, instead.) The returned tensor
will have the specified datatype (or default floating point datatype if none is specified)
and, by default, be on the CPU device and share memory with the buffer.

- When :attr:`obj` is none of the above but a scalar or sequence of scalars then the
+ When :attr:`obj` is none of the above but a NumPy scalar, a scalar or a sequence of scalars then the
Collaborator


Instead of "but a NumPy scalar, a scalar or", how about "but a Python or NumPy scalar, or"?

Collaborator Author


That sounds better, indeed.

Collaborator

@lezcano lezcano left a comment


Fair enough. There are many errors, but they don't seem related? Perhaps rebasing would make them go away?

buffer protocol then the buffer is interpreted as an array of bytes grouped according to
the size of the datatype passed to the :attr:`dtype` keyword argument. (If no datatype is
passed then the default floating point datatype is used, instead.) The returned tensor
will have the specified datatype (or default floating point datatype if none is specified)
and, by default, be on the CPU device and share memory with the buffer.

- When :attr:`obj` is none of the above but a scalar or sequence of scalars then the
+ When :attr:`obj` is none of the above but a Python or NumPy scalar, or a sequence of scalars then the
Collaborator


This is a little confusing. The current documentation defines the following priorities:

  • tensor, NumPy array, DLPack capsule
  • object implementing Python's buffer protocol
  • scalar or sequence of scalars

Doesn't this PR want to add NumPy scalar to the first category, taking precedence over objects that implement the buffer protocol? What happens if a sequence of NumPy scalars is given? Also, this paragraph says that the datatype of the returned tensor is inferred from the scalar values, but for a NumPy scalar isn't the returned tensor's dtype mapped from the NumPy scalar's dtype?

Collaborator Author


Doesn't this PR want to add NumPy scalar to the first category, taking precedence over objects that implement the buffer protocol?

Yes, that's right.

What happens if a sequence of NumPy scalars is given?

They are treated as a Python sequence (i.e. datatype is inferred only if it's not explicitly specified).
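
For instance (a hedged sketch of the sequence path; outputs are omitted since they follow the inference rules described in the documentation):

import numpy
import torch

seq = [numpy.float64(0.5), numpy.float64(1.0)]
t1 = torch.asarray(seq)                        # treated as a Python sequence; dtype inferred
t2 = torch.asarray(seq, dtype=torch.float32)   # dtype explicitly specified, so no inference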

Also, this paragraph says that the datatype of the returned tensor is inferred from the scalar values, but for a NumPy scalar isn't the returned tensor's dtype mapped from the NumPy scalar's dtype?

That's correct.
What if we added a new paragraph before this last one, like:

"When obj is a NumPy scalar, the returned tensor will be a 0-dimensional tensor that lives on the CPU device and doesn't share its memory (i.e. copy=True). Its datatype won't change unless otherwise specified."

Collaborator


That addition LGTM, but let's see what Mike has to say.

Collaborator


Seems pretty reasonable. Small tweak suggestion:

"When obj is a NumPy scalar, the returned tensor will be a 0-dimensional tensor on the CPU and that doesn't share its memory (i.e. copy=True). By default datatype will be the PyTorch datatype corresponding to the NumPy's scalar's datatype."

?

@lezcano
Collaborator

lezcano commented Jan 12, 2023

PTAL @mruberry

@ysiraichi
Collaborator Author

@mruberry This is a friendly reminder. Do you have some time to take a look at this PR?


tensor = torch.asarray(scalar)
self.assertEqual(tensor.dim(), 0)
self.assertEqual(tensor.item(), scalar.item())
Collaborator


Shouldn't this also test that tensor.dtype is float64?

Collaborator Author


Yes! Good catch
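
A sketch of the amended check (assuming, as in the snippet above, that scalar is a numpy.float64):

tensor = torch.asarray(scalar)
self.assertEqual(tensor.dim(), 0)
self.assertEqual(tensor.item(), scalar.item())
# Additionally check the dtype mapped from the NumPy scalar, per the review:
self.assertEqual(tensor.dtype, torch.float64)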

Collaborator

@mruberry mruberry left a comment


Cool! See one testing suggestion

@ysiraichi
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased numpy-scalar-asarray onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout numpy-scalar-asarray && git pull --rebase)

@ysiraichi
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 23, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-x86-64-lite-interpreter / build

Details for Dev Infra team · Raised by workflow job

@lezcano
Collaborator

lezcano commented Jan 24, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

Labels
  • ciflow/trunk: Trigger trunk jobs on your pull request
  • Merged
  • open source
  • triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

7 participants