Conversation

@jamesr66a (Collaborator) commented Jul 10, 2019

Stack from ghstack:

This significantly reduces the overhead of operator dispatch. Benchmark:

import torch, time

NITER = 10000000

def foo(x):
    for i in range(10000000):
        x = torch.neg(x)
    return x

with torch.no_grad():
    scripted = torch.jit.script(foo)
    x = torch.rand([])

    s = time.time()
    scripted(x)
    print('script time', (time.time() - s) / NITER / 1e-6, 'us')

Before

script time 1.3357892751693727 us

After

script time 1.129352855682373 us
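That works out to roughly 0.21 us saved per dispatched op, or about a 15% reduction in per-call overhead on this scripted torch.neg microbenchmark.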

TODO: I don't know if this is valid if the Storage is shared

Differential Revision: D16192220

@pytorchbot added labels on Jul 10, 2019: module: autograd (Related to torch.autograd, and the autograd engine in general), oncall: distributed (Add this issue/PR to distributed oncall triage queue), module: internals (Related to internal abstractions in c10 and ATen).
@jamesr66a requested review from zdevito and zheng-xq on July 10, 2019 23:53
data_impl->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
data_impl->set_autograd_meta(c10::guts::make_unique<Variable::AutogradMeta>(data_impl.get(), requires_grad));
return Variable(std::move(data_impl));
if (data.getIntrusivePtr().use_count() == 1) {
Contributor

Isn't this unsafe in a multithreaded environment? Isn't it possible to have two threads invoking make_variable(tensor) concurrently on the same tensor instance?

Collaborator Author

Wouldn't two threads have to move() the same instance into the call to make_variable? Or is there another sequencing you're thinking of?

Collaborator Author

Or more directly: I believe this is safe because data is taken by value.

Contributor

Ah yeah, you're right: taking by value makes this safe. The interleaving I was thinking of was two threads invoking make_variable(t), where one thread passes the use_count check and tries to move, and the other thread fails the use_count check and tries to copy from the moved-from value. But that can't happen, because while both calls hold their by-value references neither thread could pass the use_count check in the first place.
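To make the invariant in this thread concrete, here is a minimal, self-contained sketch of the pattern, not the actual PyTorch implementation: std::shared_ptr stands in for c10::intrusive_ptr, and the names TensorImplModel, TensorModel, VariableModel, and make_variable_model are hypothetical. Because the argument is taken by value, a caller that still holds another reference pushes the use count above 1 and forces the copying slow path; only a caller that moves in its last reference hits the consuming fast path.

// Illustrative sketch only (assumed model, not PyTorch's make_variable).
#include <cassert>
#include <memory>
#include <utility>

struct TensorImplModel {
  bool requires_grad = false;
  bool is_variable = false;
};

struct TensorModel  { std::shared_ptr<TensorImplModel> impl; };
struct VariableModel { std::shared_ptr<TensorImplModel> impl; };

// `data` is taken by value, so this call owns its own reference. If that
// reference is the only one (use_count() == 1), no other thread can reach
// the impl, and we can adopt it in place instead of making a detached copy.
VariableModel make_variable_model(TensorModel data, bool requires_grad) {
  if (data.impl.use_count() == 1) {
    data.impl->requires_grad = requires_grad;
    data.impl->is_variable = true;
    return VariableModel{std::move(data.impl)};  // fast path: consume, no copy
  }
  // Slow path: the impl is shared, so mutate a fresh copy and leave the
  // caller's tensor untouched (analogous in spirit to shallow_copy_and_detach).
  auto copy = std::make_shared<TensorImplModel>(*data.impl);
  copy->requires_grad = requires_grad;
  copy->is_variable = true;
  return VariableModel{std::move(copy)};
}

int main() {
  // Unique reference: moving it in keeps use_count at 1, fast path taken.
  TensorModel t{std::make_shared<TensorImplModel>()};
  VariableModel v = make_variable_model(std::move(t), /*requires_grad=*/false);
  assert(v.impl->is_variable);

  // Shared reference: the by-value parameter raises the count to >= 2,
  // so the fast path is skipped and the original impl is left unmodified.
  TensorModel a{std::make_shared<TensorImplModel>()};
  TensorModel b = a;  // second owner
  VariableModel w = make_variable_model(a, /*requires_grad=*/true);
  assert(!a.impl->is_variable);
  assert(w.impl->is_variable);
  (void)b; (void)v; (void)w;
  return 0;
}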

Update on "make_variable consumes the Tensor if it only has one reference"

make_variable consumes the Tensor if it only has one reference

gh-metadata: pytorch pytorch 22705 gh/jamesr66a/23/head
@zdevito (Contributor) left a comment

NGNT

James Reed added 3 commits July 12, 2019 13:26, each an update on "make_variable consumes the Tensor if it only has one reference":

make_variable consumes the Tensor if it only has one reference

gh-metadata: pytorch pytorch 22705 gh/jamesr66a/23/head
@facebook-github-bot (Contributor)

@jamesr66a merged this pull request in 815e73b.

@facebook-github-bot deleted the gh/jamesr66a/23/head branch on October 28, 2019 22:14