Skip to content

Conversation

@pbelevich
Copy link
Contributor

No description provided.

@fmassa
Copy link
Member

fmassa commented Jul 2, 2019

We can potentially get better performance tha is to vectorization if you move this code to the cpu folder, see binary ops there for an example

@ifedan ifedan requested a review from VitalyFedyunin July 2, 2019 21:54
@ifedan ifedan added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 2, 2019
@pytorchbot pytorchbot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Jul 5, 2019
@VitalyFedyunin
Copy link
Contributor

Please include performance comparison and measurements strategy as part of PR description.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbelevich has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pbelevich
Copy link
Contributor Author

pbelevich commented Jul 6, 2019

data = torch.randn(1000000)
timeit hardshrink(data)

CPU_tensor_apply2: 4.88 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cpu_kernel:         223 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
OMP_NUM_THREADS=1:  3.9 ms ± 192 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cpu_kernel_vec:    40.1 µs ± 7.64 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
OMP_NUM_THREADS=1:  336 µs ± 2.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@pbelevich pbelevich requested a review from VitalyFedyunin July 15, 2019 15:06
Copy link
Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo submodule changes

@pbelevich pbelevich force-pushed the hardshrink_cpu_refactoring branch from b7467c9 to 626a1af Compare July 16, 2019 02:33
@pbelevich
Copy link
Contributor Author

@pytorchbot retest this please

1 similar comment
@pbelevich
Copy link
Contributor Author

@pytorchbot retest this please

@pbelevich pbelevich requested a review from VitalyFedyunin July 16, 2019 16:47
Copy link
Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good, you may or may not apply additional changes request before landing.

@pbelevich
Copy link
Contributor Author

@pytorchbot retest this please

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbelevich is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pbelevich merged this pull request in bcfa023.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 17, 2019
…e::cpu_kernel

Summary: Pull Request resolved: pytorch/pytorch#22459

Differential Revision: D16132625

Pulled By: pbelevich

fbshipit-source-id: d7eb1cd6ed04eba3d0c54feaca1e5ab2836211b5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: cpu CPU specific problem (e.g., perf, algorithm) open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants