Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate inline headers #283

Merged
merged 24 commits into from
Jul 23, 2020
Merged

Generate inline headers #283

merged 24 commits into from
Jul 23, 2020

Conversation

shibatch
Copy link
Owner

This patch implements the functionality described in issue #282.

The header files for inlining sleef functions are generated if -DBUILD_INLINE_HEADERS=TRUE is specified as a cmake option.

In order to use one of these headers, the following 3 macros have to be defined.

  • SLEEF_ALWAYS_INLINE : This is the function attribute specified for helper functions
  • SLEEF_INLINE : This is the function attribute specified for SLEEF functions.
  • SLEEF_CONST : This is usually const attribute.

Each header file corresponds to one vector extension. Only one of the header files can be included from the same source file.

It also builds libsleefinline.a, that is referred from the inlined functions.

Aarch32 is not supported at this time.

--

The main change to the source codes can be summarized into the following two points.

  1. src/libm/CMakeLists.txt : This patch adds instructions for building the header files. It applies C preprocessor and sed command multiple times to generate the header files.
  2. Addition to helper files : This patch adds lines beginning with //@ which is specially treated during generation of the header files.

It seems that parallel build does not work even on linux computers. According to the following page, parallel build is always unsafe if multiple COMMANDs are used in add_custom_command. Accordingly, I changed the CI settings to remove parallel builds.

https://stackoverflow.com/questions/60105230/how-to-run-cmake-commands-in-add-custom-command-in-order

Currently, Jenkins servers are not available. I hope they will be available again before this patch is approved. I manually ran the builds and tests on every platform.

@colesbury Github does not allow me to add you as a reviewer, but please tell us your thoughts.

@colesbury
Copy link
Contributor

Thanks, I will test this out today

@colesbury
Copy link
Contributor

Thanks, this addresses the request in #230

In my limited testing, the functions are now successfully inlined at call sites. In our sigmoid function, I measured a ~14% improvement for 1 million floats by inlining the call to Sleef_expf8_u10. I haven't done measurements of other functions.

Currently, we include Sleef as a submodule and build it as part of the PyTorch build process. The changes to the build make this more difficult. I think we may switch to building Sleef separately and just committing the artifacts (e.g. sleefinline_avx2.h) to the PyTorch repo. I think that should be fine.

@shibatch
Copy link
Owner Author

Good.
Please credit SLEEF if the generated source code will be mixed into the source tree of PyTorch.

Jenkinsfile Outdated
@@ -15,10 +15,10 @@ pipeline {
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE ..
make -j 6 all
make -j 1 all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are changes like this needed in this PR?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because add_custom_command does not support parallel builds. In this patch, many COMMANDS are executed within one add_custom_command, and parallel builds fail even on linux computers.

https://stackoverflow.com/questions/60105230/how-to-run-cmake-commands-in-add-custom-command-in-order


#define ENABLE_DP
//@#define ENABLE_DP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this syntax for? I have seen it in other places, so I suspect there is a reason for using it?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please read my first comment. Generation of the header files requires multiple processing with cpp, and some of the macros are required in later use.

Addition to helper files : This patch adds lines beginning with //@ which is specially treated during generation of the header files.

@@ -649,6 +664,7 @@ static INLINE vargquad vcast_aq_vm2(vmask2 vm2) {
return a;
#endif
}
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#endif
#endif // #if !defined(SLEEF_GENHEADER)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


//

#ifdef ENABLE_DP
int check_featureDP() {
if (vavailability_i(1) == 0) return 0;
int check_featureDP(double d) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed for the header functionality? Same question for the float version below.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the headers, functions like this have greater tendency of being optimized away.
This change is needed to prevent optimizer from removing this function.

OUTPUT ${INLINE_HEADER_FILE}

COMMAND echo Generating sleefinline_${SIMDLC}.h
COMMAND "${CMAKE_C_COMPILER}" ${FLAG_PREPROCESS} ${FLAG_PRESERVE_COMMENTS} ${FLAG_INCLUDE}${PROJECT_SOURCE_DIR}/src/common ${FLAG_INCLUDE}${PROJECT_SOURCE_DIR}/src/arch ${FLAG_INCLUDE}${CMAKE_CURRENT_BINARY_DIR}/include/ ${FLAG_DEFINE}SLEEF_GENHEADER ${FLAG_DEFINE}ENABLE_${SIMD} ${FLAG_DEFINE}DORENAME ${CMAKE_CURRENT_SOURCE_DIR}/sleefsimddp.c > ${CMAKE_CURRENT_BINARY_DIR}/sleef${SIMD}.h.tmp1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break long lines. It is hard to understand what is going on here. Please try not to pass the 80 chars limit.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments in this section.

Comment on lines +8 to +12
#if !defined(SLEEF_GENHEADER)
#define FUNCATR NOEXPORT ALIGNED(64)
#else
#define FUNCATR EXPORT ALIGNED(64)
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table is referenced from the source code including the generated header.
So, this table has to be compiled into a library.

Comment on lines +27 to +31
#if !defined(SLEEF_GENHEADER)
#include "helpersse2.h"
#else
#include "macroonlySSE2.h"
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to header files? Couldn't we conditionally compile the same header file, exposing only the macros when SLEEF_GENHEADER is defined? This could avoid the need of generating the macroonly* files.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of only exposing some macros, I chose the approach to define some of the macros beginning with //@.

Below is comparison between pros and cons.
Approach 1: Exposing a part of the macros according to whether SLEEF_GENHEADER is defined.
Pros: The mechanism is straightforward.
Cons: We need to protect most of the part in the macro with large #ifdef. This is hard to read.
Generation of macros is not very understandable anyway.

Approach 2: Define some of the macros beginning with //@
Pros: Easier to read and manage the source code.
Cons: This is a dedicated way for SLEEF. and requires understanding how the macros are generated.

@@ -2,4 +2,4 @@
set -ev
mkdir sleef.build
cd sleef.build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE ..
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE -DBUILD_INLINE_HEADERS=TRUE ..
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that you are always generating the header files because you want to test them on each architecture with the iuti* executables? Has this increased the running time in CI?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, running time in CI is increased.
If that is a problem, we can just turn off generation.

@fpetrogalli
Copy link
Collaborator

How comes that CI hasn't completed yet?

@shibatch
Copy link
Owner Author

shibatch commented Feb 25, 2020

Because of a policy change in network security at my institute, I now have to move the CI servers to a different network segment.
I had a meeting with the people in charge of network management, and I was told to change the network settings of the servers.
However, they did not know what CI servers are, and I was asked to have a meeting with them again.
In the second meeting, I told them how the servers work, and they are now saying that additional security measures are required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants