Add SLEEF for libm float and double #6725
Conversation
.gitmodules
@@ -77,3 +77,6 @@
[submodule "third_party/onnx"]
	path = third_party/onnx
	url = https://github.com/onnx/onnx.git
[submodule "aten/src/ATen/cpu/sleef"]
Timings for newly vectorized functions.
@cpuhrsch Could you provide me with the test scripts? I think it's not a bad idea to compare the perf of this version with that of VML.
Haven't reviewed CMake. Mostly LGTM
backends:
  - CPU
  - CUDA
name: _tan
Tensor& fill_(Tensor& self, const Tensor& value) {
  return self._fill_(value);
}
test/test_torch.py
@@ -396,7 +399,8 @@ def cosh(x):
    try:
        return math.cosh(x)
    except OverflowError:
        return float('inf') if x > 0 else float('-inf')
        # http://en.cppreference.com/w/cpp/numeric/math/cosh
@@ -260,6 +260,9 @@ def _test_math(self, torchfn, mathfn, input=None):
        input = []
        input.append(list(range(-5, 5)))
        input.append([x + 1e-6 for x in range(-5, 5)])
        # Some vectorized implementations don't support large ranges
        input.append([x + 1e10 for x in range(-5, 5)])
        input.append([x - 1e10 for x in range(-5, 5)])
@MlWoo Thank you for offering to look into this. I'm fully convinced Intel MKL VML is faster than this, but it doesn't provide us with short-vector implementations. Intel SVML does, but it's specific to icc as far as I know. Also, as far as I know, VML doesn't provide implementations of softmax or other ML-specific function approximations, so we'll need to add SLEEF in any case. Also, as far as I know, MKL VML is not open source and comes with its own threading library. The plan is still to integrate VML, since many users do have access to MKL, but I'd prefer to do that in another PR. I'm adding a lot more timings soon to get a wider understanding of the perf here; I'll add a benchmark script alongside that.
@MlWoo I used this script to get these timings. I'll still need to check for regressions in the functions I touched without vectorizing them: cosh, sinh, log, tanh. Except for erf, we see an average speedup of 2x, which is skewed by some of the 10x+ speedups.
This branch is usually a bit slower on average than master (~5%) on these non-vectorized functions, but much faster in some cases, presumably because it knows how to sort strides. Note that all of this is restricted to sinh/cosh/tanh.
Why did you modify CMakeLists.txt in sleef?
@shibatch Thanks for looking at this! SLEEF is a fantastic library, thank you very much! I went through many changes because it wouldn't build with the standard CMake setup on all our different platforms. I'm happy to remove all of that again, and it is straightforward to drop the original build setup back in. I'd want to do this as part of a separate PR. I'll send one out with all of this removed and tag you in it, if you want. Then we can resolve the issues that the CI shows.
CC @orionr on cmake stuff
@cpuhrsch Okay. I will add a VML-style API to the development plan.
@@ -0,0 +1,71 @@
IF(MSVC)
  option(BUILD_SHARED_LIBS "Build shared libs" ON)
ELSE(MSVC)
option(SLEEF_SHOW_ERROR_LOG "Show cmake error log." OFF)

set(SLEEF_VERSION_MAJOR 3)
set(SLEEF_VERSION_MINOR 2)
endif()

# Function used to generate safe command arguments for add_custom_command
function(command_arguments PROPNAME)
I didn't read the rest of the diff, but I'm signing LGTM for the cmake stuff. It would be nice to have some description of what the changes are, so it's easier to upstream the necessary changes (since then you don't have to go through the diff with a fine-toothed comb to find out what was changed). Also, so you don't forget if you have to put this aside for a bit :)
Speeds up vectorized operations on contiguous tensors while preserving the ULP accuracy required by libc's math library.
In particular, the "_u10" suffix in a SLEEF function name means that its result is accurate to within 1.0 ULP.
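For reference, here is a minimal sketch (not code from this PR) of how the ULP suffixes appear when calling SLEEF directly. It assumes AVX support, that sleef.h is on the include path, and uses the 8-lane single-precision sin as an example:

#include <immintrin.h>
#include <sleef.h>

// Assumes compilation with AVX enabled (e.g. -mavx).
void sin8(const float* in, float* out) {
  __m256 x = _mm256_loadu_ps(in);
  __m256 y = Sleef_sinf8_u10(x);    // "_u10": result accurate to within 1.0 ULP
  // __m256 z = Sleef_sinf8_u35(x); // "_u35": faster variant, within 3.5 ULP
  _mm256_storeu_ps(out, y);
}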
The library's functions come with restricted valid input ranges. For example, sinf has a valid range of [-5e+9, 5e+9]. This needs to be accounted for manually (work in progress).
EDIT: It was decided that we'll add a flag to functions whose vectorized implementation does not cover the entire range of floating point values, and disable those implementations by default.
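As a rough illustration only (this is not the PR's actual dispatch code, and vectorized_sinf_kernel is a hypothetical stand-in for the SLEEF-backed loop), the manual range handling described above could look like the following: take the vectorized path only when every input lies inside the documented range, otherwise fall back to scalar libm.

#include <cmath>
#include <cstddef>

// Hypothetical SLEEF-backed kernel, assumed to be defined elsewhere.
void vectorized_sinf_kernel(const float* in, float* out, size_t n);

void dispatch_sinf(const float* in, float* out, size_t n) {
  constexpr float kLo = -5e9f, kHi = 5e9f;  // sinf's documented valid range
  bool in_range = true;
  for (size_t i = 0; i < n; ++i) {
    if (in[i] < kLo || in[i] > kHi) { in_range = false; break; }
  }
  if (in_range) {
    vectorized_sinf_kernel(in, out, n);                        // fast, vectorized path
  } else {
    for (size_t i = 0; i < n; ++i) out[i] = std::sin(in[i]);   // libm fallback
  }
}

The flag mentioned in the EDIT would replace such a runtime check with a build- or configuration-time choice that keeps those functions on the libm path by default.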