Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-0010: TensorRef #16

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

RFC-0010: TensorRef #16

wants to merge 3 commits into from

Conversation

ezyang
Copy link
Contributor

@ezyang ezyang commented Mar 3, 2021

This proposal introduces a new class TensorRef, to replace all
places in our codebase where we currently use const Tensor&. The
distinguishing characteristics of this class are:

  1. It is non-owning
  2. It is as safe as other by-value reference types (like c10::ArrayRef or
    std::string_view)
  3. It is implicitly convertible (with some exceptions) to const Tensor&
    (i.e., it can be introduced incrementally)

Rendered

Signed-off-by: Edward Z. Yang ezyang@fb.com

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
we unsafely use reclaim/release to ensure that no refcount bump occurs
on construction/destruction of the object. We verified with Godbolt
that the compiler is able to eliminate the `base_` destructor. This
makes it possible to take out a `const Tensor&`, which makes it easier
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as the Tensor is purely internal to TensorRef, this could be safe since we have control over the invariants and where refcounting occurs. But allowing users to get a const Tensor& out of it means that the "internal" Tensor object could be copied outside of your control and you have to be much more careful about correctness, especially since Tensor doesn't seem to know if it's borrowed or not. I saw one of your earlier proposals had Tensor know about this, maybe we should do that?

Or, if the const Tensor& conversion is only there to avoid having to rewrite all ops at once, maybe there's a regex/codemod that would allow us to rewrite them more easily and then not allow TensorRef -> const Tensor& conversions? Or, if we manage to eliminate most but not all call sites, we might be able to make it a TensorRef -> Tensor (by value) conversion and be ok with the refcount if somebody does that. Seems like a less dangerous design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But allowing users to get a const Tensor& out of it means that the "internal" Tensor object could be copied outside of your control and you have to be much more careful about correctness

A copy of const Tensor& to Tensor is OK, because this always induces a refcount bump! Most of the bad situations @swolchok was able to come with required you to have a mutable reference/pointer to the internal Tensor, which we just forbid here.

Or, if the const Tensor& conversion is only there to avoid having to rewrite all ops at once, maybe there's a regex/codemod that would allow us to rewrite them more easily and then not allow TensorRef -> const Tensor& conversions?

Yeah. I allude to this in the second alternative proposal, where for Intel ABI reasons, it is much better if the long term API doesn't permit conversion to const Tensor& (so we can make the class trivial). So ideally we'd get rid of this BC crutch eventually.

* Instead of trying to force the above implementation of `TensorRef` to
be used everywhere, it could instead be a utility class used in
limited situations to improve interoperability with code that expects
a `const Tensor&` when you don't have a `Tensor` available. The true,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow why there are two objects - a trivial and a nontrivial one. Can you give some more details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nontrivial TensorRef class is given in this proposal. The trivial class is:

class TrivialTensorRef {
  TensorImpl* impl_;
};

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see. I like the trivial one much better from a design point of view ;) But I guess you agree on that point and the non-trivial one only exists for the bc reasons discussed above.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang
Copy link
Contributor Author

ezyang commented Apr 10, 2021

A variation of this has been implemented in pytorch/pytorch#55685

@swolchok
Copy link

A variation of this has been implemented in pytorch/pytorch#55685

As with the TensorRef in the proposal, MaybeOwned is hobbled by the Itanium ABI requirement to pass it by reference always, so it doesn't solve the argument passing problem either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants