
Move addcmul to Aten #22874

Closed · wants to merge 7 commits

Conversation

@VitalyFedyunin (Contributor) commented on Jul 15, 2019

Move the CPU implementation of the `addcmul` operator to ATen (#22797)

### before

```python
In [11]: timeit x.addcmul(a, b)
1.31 ms ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

### after

```python
In [9]: timeit x.addcmul(a, b)
588 µs ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Adding custom code for the case when `value == 1` does not provide a significant performance gain.
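For context, a rough way to reproduce a timing comparison like the one above; the tensor shapes and dtype below are assumptions, since the PR does not state the sizes used:

```python
import torch
from timeit import timeit

# Illustrative setup only: contiguous float32 tensors; the sizes are an
# assumption, not taken from the PR.
x = torch.randn(1000, 1000)
a = torch.randn(1000, 1000)
b = torch.randn(1000, 1000)

# x.addcmul(a, b) computes x + value * a * b elementwise (value defaults to 1).
elapsed = timeit(lambda: x.addcmul(a, b), number=1000)
print(f"{elapsed / 1000 * 1e6:.1f} µs per call")
```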

@pytorchbot added the module: cpu, module: internals, and module: operators labels on Jul 15, 2019
@facebook-github-bot (Contributor) left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ifedan (Contributor) left a comment

Do you have tests?

@VitalyFedyunin (Contributor, Author) commented

@pytorchbot retest this please

@VitalyFedyunin (Contributor, Author) commented

> Do you have tests?

Already tested by common_methods_invocations.py
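For reference, the identity those shared tests exercise boils down to the following; this is a minimal illustrative check, not the actual test code, and the shapes and `value` are arbitrary:

```python
import torch

# addcmul(tensor1, tensor2, value=v) returns self + v * tensor1 * tensor2, elementwise.
x, a, b = torch.randn(3, 4), torch.randn(3, 4), torch.randn(3, 4)
out = x.addcmul(a, b, value=0.5)
assert torch.allclose(out, x + 0.5 * a * b)
```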

@facebook-github-bot (Contributor) left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan (Contributor) left a comment

I'm curious -- wouldn't it only be a few more lines to move the CUDA implementation as well?

@facebook-github-bot (Contributor) left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin (Contributor, Author) commented

> I'm curious -- wouldn't it only be a few more lines to move the CUDA implementation as well?

Only to show how to split the migration into separate steps, and to keep this PR as an example of that.

@facebook-github-bot (Contributor) left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented

@VitalyFedyunin merged this pull request in 3b5daef.

zdevito pushed a commit to zdevito/ATen that referenced this pull request on Aug 1, 2019
Summary:
Move CPU implementation of the `addcmul` operator to Aten ( pytorch/pytorch#22797 )

### before

```python
In [11]: timeit x.addcmul(a, b)
1.31 ms ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

### after

```python
In [9]: timeit x.addcmul(a, b)
588 µs ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Adding custom code for the case when `value == 1` does not provide a significant performance gain.
Pull Request resolved: pytorch/pytorch#22874

Differential Revision: D16359348

Pulled By: VitalyFedyunin

fbshipit-source-id: 941ead835672fca78a1fcc762da052e64308b111
Labels: Merged · module: cpu · module: internals
7 participants