Skip to content

UCS/SYS: Disable "always inline" when -Og flag is present#11251

Merged
yosefe merged 6 commits intoopenucx:masterfrom
shasson5:gcc
Mar 21, 2026
Merged

UCS/SYS: Disable "always inline" when -Og flag is present#11251
yosefe merged 6 commits intoopenucx:masterfrom
shasson5:gcc

Conversation

@shasson5
Copy link
Copy Markdown
Contributor

What?

Disable "always inline" when -Og flag is present

Why?

Support -Og optimization level

Comment thread src/uct/sm/mm/base/mm_ep.c Outdated
uct_pack_callback_t pack_cb, void *arg, const uct_iov_t *iov,
size_t iovcnt, unsigned flags)
{
uint8_t elem_flags = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it is not needed?

Comment thread src/uct/sm/mm/base/mm_ep.c
Comment thread config/m4/compiler.m4 Outdated
#
# Check if -Og flag was supplied
#
AS_IF([echo " $BASE_CFLAGS $CFLAGS " | grep -q -- ' -Og '],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe enhance --enable-compiler-opt to support 0,1,2,3,g since as far as I understand they are all mutually exclusive? and maybe AC_DEFINE HAVE_OG_OPTIMIZATION for --enable-compiler-opt=g?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notice that -Og can be passed from CFLAGS which comes from env (see example in CI job).
I think that the existing block should remain as is because it has a single well-defined output (BASE_CFLAGS).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, which functions fail to compile with -Og ?
Maybe we could convert them to non-forced inline w/o performance impact

Copy link
Copy Markdown
Contributor Author

@shasson5 shasson5 Mar 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are the failed functions:

  1. ucp_proto_request_bcopy_complete_success
  2. ucp_proto_put_offload_bcopy_send_func
  3. ucp_proto_rndv_put_common_flush_send
  4. ucp_proto_rndv_put_common_data_sent
  5. ucp_proto_rndv_put_mtype_send_func

how would you suggest converting them?

Copy link
Copy Markdown
Contributor Author

@shasson5 shasson5 Mar 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or did you mean to convert only them? @yosefe

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then if we don't recognize -O3 the impact will not be so bad, worse case some functions may be non inline which may impact the performance a little. The impact of mis detection will not fail compilation.

Copy link
Copy Markdown
Contributor Author

@shasson5 shasson5 Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that built-in gcc flags __ OPTIMIZE __ and __ NO_INLINE __ are set for all opt levels
except O0, so it's unuseful for us.

So basically the plan is:

  1. Add a flag in complier.m4 to identify high opt level (which includes O2 and O3).
  2. if this flag is set, we define UCS_F_INLINE_OPTIMIZED in complier_def.h
  3. apply UCS_F_INLINE_OPTIMIZED to list of functions above

@yosefe is this correct?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that built-in gcc flags __ OPTIMIZE __ and __ NO_INLINE __ are set for all opt levels
except O0, so it's unuseful for us.

Right

Regarding the plan:

  1. It's enough to do it as part of compiler-opt logic.
  2. If we set optimization level to 2 or higher as part of this macro - define UCS_F_INLINE_OPTIMIZED to always_inline. Otherwise, define it as regular inline
  3. Yes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It's enough to do it as part of compiler-opt logic.

without considering CFLAGS=-Og?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just setting CFLAGS=-Og without compiler-opt will keep UCS_F_INLINE_OPTIMIZED as "inline"

@tvegas1
Copy link
Copy Markdown
Contributor

tvegas1 commented Mar 17, 2026

regarding -Og passed as CFLAGS, if it is only about the line below, maybe we could fix this use-case to pass it using --enable-compiler-opt=g and avoid option detection, there could still be new users of always inline?

./buildlib/tools/builds.sh:	build_gcc CFLAGS=-Og CXXFLAGS=-Og

@shasson5
Copy link
Copy Markdown
Contributor Author

shasson5 commented Mar 18, 2026

regarding -Og passed as CFLAGS, if it is only about the line below, maybe we could fix this use-case to pass it using --enable-compiler-opt=g and avoid option detection

so I guess the question is how you want handle a contradiction between CFLAGS and --enable-compiler-opt.
You can either decide to always give precedence to CFLAGS and ignore --enable-compliler-opt or treat it as error usecase (and then throw responsibility to the user or display error message)

there could still be new users of always inline?

can you explain this question?

@tvegas1
Copy link
Copy Markdown
Contributor

tvegas1 commented Mar 18, 2026

so I guess the question is how you want handle a contradiction between CFLAGS and --enable-compiler-opt. You can either decide to always give precedence to CFLAGS and ignore --enable-compliler-opt or treat it as error usecase (and then throw responsibility to the user or display error message)

maybe we can handle --enable-compiler-opt=g only as i think by default add -O3 anyways, which could already conflict with an -O specified from CFLAGS?

there could still be new users of always inline?

can you explain this question?

I understand the -Og can cause build failure for always inline. now we are able to remove the failures by moving to always inline to inline. But we would still need to have a macro set when -Og, to downgrade always inline to inline.

@shasson5
Copy link
Copy Markdown
Contributor Author

shasson5 commented Mar 18, 2026

maybe we can handle --enable-compiler-opt=g only as i think by default add -O3 anyways, which could already conflict with an -O specified from CFLAGS?

won't it cause CFLAGS=-Og to fail compilation?

I understand the -Og can cause build failure for always inline. now we are able to remove the failures by moving to always inline to inline. But we would still need to have a macro set when -Og, to downgrade always inline to inline.

right that's the idea

@shasson5 shasson5 requested review from tvegas1 and yosefe March 18, 2026 18:39
Comment thread config/m4/compiler.m4 Outdated
#
# Define OPTIMIZE_HIGH for optimization levels -O2 or -O3
#
AS_IF([echo " $BASE_CFLAGS $CFLAGS " | grep -qE -- ' -O(2|3) '],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe i am getting confused, but the case where CI sets -Og would not be handled right, as below comes with -O3 and then -Og, and AFAIU, gcc will use -Og, where it should not add the inlines?

$  ./contrib/configure-devel CFLAGS=-Og
configure:               CFLAGS:   -O3 -g -Wall ... -Og

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@shasson5 shasson5 requested a review from tvegas1 March 19, 2026 16:15
Comment thread config/m4/compiler.m4
@shasson5
Copy link
Copy Markdown
Contributor Author

@yosefe please review

@shasson5 shasson5 requested a review from tvegas1 March 19, 2026 21:06
Comment thread config/m4/compiler.m4 Outdated
Comment on lines +87 to +92
opt_level=""
for flag in $BASE_CFLAGS $CFLAGS; do
case $flag in -O*) opt_level=$flag;; esac
done
AS_IF([test "x$opt_level" = "x-O2" || test "x$opt_level" = "x-O3"],
[AC_DEFINE([OPTIMIZE_HIGH], 1, [Compiled with high optimization level])])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the approach is not bad. i would fine tune the implementation:

  • use shell pattern substitution to set "opt_level" to the exact optimization level that follows the -O argument. The -O argument can be a number or a string /(like "z", "fast", "g", etc)
  • define OPTIMIZATION_LEVEL macro to be the numeric optimization level. If the optimization is not a number, the macro is left undefined
  • in compiler_def.h we CANNOT use OPTIMIZATION_LEVEL macro, because it does not include config.h. need to use compiler.h
  • in compiler.h, need to check if OPTIMIZATION_LEVEL is defined and its level is >=2

@yosefe yosefe enabled auto-merge (squash) March 21, 2026 09:26
@yosefe yosefe merged commit 4f24dbe into openucx:master Mar 21, 2026
152 checks passed
shasson5 added a commit to shasson5/ucx that referenced this pull request Mar 22, 2026
)

* UCS/SYS: Disable "always inline" when -Og flag is present

* UCS/SYS: CR fixes

* UCS/SYS: added complier.m4 fix

* UCT/MM: reverted change

* UCS/BUILD: CR fix

* CONFIG: Claude CR fixes

---------

Co-authored-by: Yossi Itigin <yosefe@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants