Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate inline headers #283

Open
wants to merge 11 commits into
base: master
from
Open

Generate inline headers #283

wants to merge 11 commits into from

Conversation

@shibatch
Copy link
Owner

shibatch commented Feb 21, 2020

This patch implements the functionality described in issue #282.

The header files for inlining sleef functions are generated if -DBUILD_INLINE_HEADERS=TRUE is specified as a cmake option.

In order to use one of these headers, the following 3 macros have to be defined.

  • SLEEF_ALWAYS_INLINE : This is the function attribute specified for helper functions
  • SLEEF_INLINE : This is the function attribute specified for SLEEF functions.
  • SLEEF_CONST : This is usually const attribute.

Each header file corresponds to one vector extension. Only one of the header files can be included from the same source file.

It also builds libsleefinline.a, that is referred from the inlined functions.

Aarch32 is not supported at this time.

--

The main change to the source codes can be summarized into the following two points.

  1. src/libm/CMakeLists.txt : This patch adds instructions for building the header files. It applies C preprocessor and sed command multiple times to generate the header files.
  2. Addition to helper files : This patch adds lines beginning with //@ which is specially treated during generation of the header files.

It seems that parallel build does not work even on linux computers. According to the following page, parallel build is always unsafe if multiple COMMANDs are used in add_custom_command. Accordingly, I changed the CI settings to remove parallel builds.

https://stackoverflow.com/questions/60105230/how-to-run-cmake-commands-in-add-custom-command-in-order

Currently, Jenkins servers are not available. I hope they will be available again before this patch is approved. I manually ran the builds and tests on every platform.

@colesbury Github does not allow me to add you as a reviewer, but please tell us your thoughts.

shibatch added 5 commits Feb 21, 2020
shibatch
shibatch
shibatch
shibatch
shibatch
@colesbury

This comment has been minimized.

Copy link
Contributor

colesbury commented Feb 24, 2020

Thanks, I will test this out today

@colesbury

This comment has been minimized.

Copy link
Contributor

colesbury commented Feb 24, 2020

Thanks, this addresses the request in #230

In my limited testing, the functions are now successfully inlined at call sites. In our sigmoid function, I measured a ~14% improvement for 1 million floats by inlining the call to Sleef_expf8_u10. I haven't done measurements of other functions.

Currently, we include Sleef as a submodule and build it as part of the PyTorch build process. The changes to the build make this more difficult. I think we may switch to building Sleef separately and just committing the artifacts (e.g. sleefinline_avx2.h) to the PyTorch repo. I think that should be fine.

@shibatch

This comment has been minimized.

Copy link
Owner Author

shibatch commented Feb 24, 2020

Good.
Please credit SLEEF if the generated source code will be mixed into the source tree of PyTorch.

@@ -15,10 +15,10 @@ pipeline {
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE ..
make -j 6 all
make -j 1 all

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator

Why are changes like this needed in this PR?

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 25, 2020

Author Owner

Because add_custom_command does not support parallel builds. In this patch, many COMMANDS are executed within one add_custom_command, and parallel builds fail even on linux computers.

https://stackoverflow.com/questions/60105230/how-to-run-cmake-commands-in-add-custom-command-in-order


#define ENABLE_DP
//@#define ENABLE_DP

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator

What is this syntax for? I have seen it in other places, so I suspect there is a reason for using it?

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 25, 2020

Author Owner

Please read my first comment. Generation of the header files requires multiple processing with cpp, and some of the macros are required in later use.

Addition to helper files : This patch adds lines beginning with //@ which is specially treated during generation of the header files.

@@ -649,6 +664,7 @@ static INLINE vargquad vcast_aq_vm2(vmask2 vm2) {
return a;
#endif
}
#endif

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator
Suggested change
#endif
#endif // #if !defined(SLEEF_GENHEADER)

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 26, 2020

Author Owner

Done.


//

#ifdef ENABLE_DP
int check_featureDP() {
if (vavailability_i(1) == 0) return 0;
int check_featureDP(double d) {

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator

Is this needed for the header functionality? Same question for the float version below.

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 25, 2020

Author Owner

With the headers, functions like this have greater tendency of being optimized away.
This change is needed to prevent optimizer from removing this function.

OUTPUT ${INLINE_HEADER_FILE}

COMMAND echo Generating sleefinline_${SIMDLC}.h
COMMAND "${CMAKE_C_COMPILER}" ${FLAG_PREPROCESS} ${FLAG_PRESERVE_COMMENTS} ${FLAG_INCLUDE}${PROJECT_SOURCE_DIR}/src/common ${FLAG_INCLUDE}${PROJECT_SOURCE_DIR}/src/arch ${FLAG_INCLUDE}${CMAKE_CURRENT_BINARY_DIR}/include/ ${FLAG_DEFINE}SLEEF_GENHEADER ${FLAG_DEFINE}ENABLE_${SIMD} ${FLAG_DEFINE}DORENAME ${CMAKE_CURRENT_SOURCE_DIR}/sleefsimddp.c > ${CMAKE_CURRENT_BINARY_DIR}/sleef${SIMD}.h.tmp1

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator

Please break long lines. It is hard to understand what is going on here. Please try not to pass the 80 chars limit.

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 26, 2020

Author Owner

I added comments in this section.

#if !defined(SLEEF_GENHEADER)
#define FUNCATR NOEXPORT ALIGNED(64)
#else
#define FUNCATR EXPORT ALIGNED(64)
#endif
Comment on lines +8 to +12

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator

Why is this needed?

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 25, 2020

Author Owner

This table is referenced from the source code including the generated header.
So, this table has to be compiled into a library.

#if !defined(SLEEF_GENHEADER)
#include "helpersse2.h"
#else
#include "macroonlySSE2.h"
#endif
Comment on lines +27 to +31

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator

Do we need to header files? Couldn't we conditionally compile the same header file, exposing only the macros when SLEEF_GENHEADER is defined? This could avoid the need of generating the macroonly* files.

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 26, 2020

Author Owner

Instead of only exposing some macros, I chose the approach to define some of the macros beginning with //@.

Below is comparison between pros and cons.
Approach 1: Exposing a part of the macros according to whether SLEEF_GENHEADER is defined.
Pros: The mechanism is straightforward.
Cons: We need to protect most of the part in the macro with large #ifdef. This is hard to read.
Generation of macros is not very understandable anyway.

Approach 2: Define some of the macros beginning with //@
Pros: Easier to read and manage the source code.
Cons: This is a dedicated way for SLEEF. and requires understanding how the macros are generated.

@@ -2,4 +2,4 @@
set -ev
mkdir sleef.build
cd sleef.build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE ..
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE -DBUILD_INLINE_HEADERS=TRUE ..

This comment has been minimized.

Copy link
@fpetrogalli

fpetrogalli Feb 25, 2020

Collaborator

Do I understand correctly that you are always generating the header files because you want to test them on each architecture with the iuti* executables? Has this increased the running time in CI?

This comment has been minimized.

Copy link
@shibatch

shibatch Feb 26, 2020

Author Owner

Yes, running time in CI is increased.
If that is a problem, we can just turn off generation.

@fpetrogalli

This comment has been minimized.

Copy link
Collaborator

fpetrogalli commented Feb 25, 2020

How comes that CI hasn't completed yet?

@shibatch

This comment has been minimized.

Copy link
Owner Author

shibatch commented Feb 25, 2020

Because of a policy change in network security at my institute, I now have to move the CI servers to a different network segment.
I had a meeting with the people in charge of network management, and I was told to change the network settings of the servers.
However, they did not know what CI servers are, and I was asked to have a meeting with them again.
In the second meeting, I told them how the servers work, and they are now saying that additional security measures are required.

shibatch and others added 6 commits Feb 26, 2020
shibatch
shibatch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.