Generate inline headers #283
This patch implements the functionality described in issue #282.
The header files for inlining Sleef functions are generated if -DBUILD_INLINE_HEADERS=TRUE is specified as a CMake option.
In order to use one of these headers, the following 3 macros have to be defined.
Each header file corresponds to one vector extension. Only one of the header files can be included from the same source file.
The build also produces libsleefinline.a, which the inlined functions reference.
Aarch32 is not supported at this time.
The main changes to the source code can be summarized in the following two points.
It seems that parallel builds do not work even on Linux machines. According to the following page, a parallel build is always unsafe if multiple COMMANDs are used in add_custom_command. Accordingly, I changed the CI settings to remove parallel builds.
Currently, Jenkins servers are not available. I hope they will be available again before this patch is approved. I manually ran the builds and tests on every platform.
@colesbury GitHub does not allow me to add you as a reviewer, but please tell us your thoughts.
Thanks, this addresses the request in #230
In my limited testing, the functions are now successfully inlined at call sites. In our sigmoid function, I measured a ~14% improvement for 1 million floats by inlining the call to Sleef_expf8_u10. I haven't done measurements of other functions.
Currently, we include Sleef as a submodule and build it as part of the PyTorch build process. The changes to the build make this more difficult. I think we may switch to building Sleef separately and just committing the artifacts (e.g. sleefinline_avx2.h) to the PyTorch repo. I think that should be fine.
Because of a policy change in network security at my institute, I now have to move the CI servers to a different network segment.