Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcc build is failing for test targets that depend on //source/exe:envoy_common_lib #16196

Closed
yanavlasov opened this issue Apr 27, 2021 · 24 comments · Fixed by #16711
Closed

gcc build is failing for test targets that depend on //source/exe:envoy_common_lib #16196

yanavlasov opened this issue Apr 27, 2021 · 24 comments · Fixed by #16711
Assignees
Labels
area/build no stalebot Disables stalebot from closing an issue priority/high

Comments

@yanavlasov
Copy link
Contributor

The //source/exe:envoy_common_lib target builds all extensions and as a result creates a huge list of virtual include paths to each extension directory (which by themselves are also long).
In this case gcc fails to spawn its frontend:

gcc: fatal error: cannot execute ‘/usr/lib/gcc/x86_64-linux-gnu/9/cc1plus’: execv: Argument list too long

As a short term solution the three test targets have been modified to depend on the //source/exe;envoy_common_with_core_extensions_lib which only includes core extensions. Two tests in question do not need all extensions, and one is enough to test on clang.

Going forward I'm engaging with the bazel team to see how this can be addressed.
Plan B is to ask RBE team to increase ulimit -s on RBE VMs.

@yanavlasov yanavlasov added bug triage Issue requires triage area/build no stalebot Disables stalebot from closing an issue priority/high and removed bug triage Issue requires triage labels Apr 27, 2021
@yanavlasov
Copy link
Contributor Author

The short term bandaid was applied in PRs #16193 and #16187

@yanavlasov yanavlasov self-assigned this Apr 27, 2021
@antoniovicente
Copy link
Contributor

#15814 (comment)

//test/exe:version_out_test is just a bash test that depends on //source/exe:envoy-static, so it seems that we're really close to not being able to build envoy-static under gcc.

@antoniovicente
Copy link
Contributor

This work around is no longer enough after just 2 weeks, see #16438

bazel build //source/exe:envoy_main_entry_lib fails under gcc.

@antoniovicente
Copy link
Contributor

When building :envoy_main_entry_lib virtual includes account for 1178 out of 1529 arguments, and 112kb out of 122kb in the gcc command line. Does anyone know of options to reduce virtual include use and/or reduce the verbosity of the include expressions?

@tsaarni
Copy link
Member

tsaarni commented May 12, 2021

I have the same "Argument list too long" failure to build //test/exe:pie_test in #15711. It seems very similar to //test/exe:version_out_test. Hoping for solution as well.

@yanavlasov
Copy link
Contributor Author

We are actively working on resolution. I expect a workaround by tomorrow.

@antoniovicente
Copy link
Contributor

Post submits are hitting the build error: https://dev.azure.com/cncf/envoy/_build/results?buildId=75509&view=results

@yanavlasov
Copy link
Contributor Author

Testing new gcc toolchain config.

@yanavlasov
Copy link
Contributor Author

yanavlasov commented May 14, 2021

The issue is quite a bit more complicated than originally thought. Note that the failure occurs when gcc invokes cc1plus. Not then bazel invokes gcc. execve limits the total size of arguments and environment to 1/4 of stack limit (in my tests it was 8Mb, so at least 2Mb of args and env should work). However any single string such as single parameter or environment variable is limited to 128Kb. The error output from gcc shows about 124Kb of arguments.

Further stracing bazel shows that gcc creates env variable COLLECT_GCC_OPTIONS and puts the entire command line there as a single line with each parameter quoted. And this blows over the 128Kb limit and causes the error.

Similar error for a different reason reported here: NixOS/nixpkgs#41340

There is nothing I can find that can easily fix this error. I'm not sure if the using file to pass command line arguments to gcc would work (it may just read the file and still create COLLECT_GCC_OPTIONS.

I will inquire internally. Hopefully workaround with excluded extensions will last for some time.

@phlax
Copy link
Member

phlax commented May 14, 2021

I'm not sure if the using file to pass command line arguments to gcc would work (it may just read the file and still create COLLECT_GCC_OPTIONS.

im reasonably confident it would since from my reading of it - that was why the @argfile syntax was introduced

for discussion (of using this approach in the context of make) see - https://stackoverflow.com/a/64202310

@phlax
Copy link
Member

phlax commented May 14, 2021

hmmm. perhaps you are right - its hard to tell - here is the gcc upstream bug (from the posted nix link) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030

@phlax
Copy link
Member

phlax commented May 14, 2021

what seems to have been a fix in gcc9 here - https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=e7208389c8381feaf8c7c60d975b06c446978006

and related discussion bazelbuild/bazel#12191

@yanavlasov
Copy link
Contributor Author

Not so great news. We have modified bazel to use parameter files with gcc instead of command line arguments, but unfortunately gcc invokes cc1plus the same way anyway. So it pulls all command line arguments from the file and creates the COLLECT_GCC_OPTIONS variable for cc1plus.

Trying to see if any gcc wizards may have any suggestions.

@antoniovicente
Copy link
Contributor

Worth filing a gcc bug requesting use of a better scaling mechanism to pass arguments to cc1plus?

@yanavlasov
Copy link
Contributor Author

We could try. Given that this issue has been flagged 3 years ago and seen no movement, I'm not too optimistic.

@phlax
Copy link
Member

phlax commented May 19, 2021

i asked about this on freenode#gcc - seems like there is not an immediate solution

Worth filing a gcc bug requesting use of a better scaling mechanism to pass arguments to cc1plus?

not sure there is any point raising a new bug - one of the devs has linked this issue here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030#c4

@yanavlasov
Copy link
Contributor Author

It looks like fixing gcc is a dead end. bazel team is currently looking into how either the number of includes or length of include directories can be reduced.

@antoniovicente
Copy link
Contributor

Can we drop support for gcc?

@yanavlasov
Copy link
Contributor Author

We have some hope.
This new bazel option is what you were suggesting Antonio for extensions. https://docs.bazel.build/versions/master/be/c-cpp.html#cc_library.implementation_deps This was not released yet. I will try it with the custom bazel build.

Also it was pointed out that virtual include directories are created when include_prefix or strip_prefix are used. Which we use for every library. This will need a much broader refactor of all #include directives, but it may be worth while as it will likely improve build performance as well.

@antoniovicente
Copy link
Contributor

Deprecation of envoy_include_prefix usage to strip out include/ and source/ seems tractable. Should we make that change?

if path.startswith("source/") or path.startswith("include/"):

@yanavlasov
Copy link
Contributor Author

yanavlasov commented May 26, 2021

Here is I think the way forward with this issue. The https://docs.bazel.build/versions/master/be/c-cpp.html#cc_library.implementation_deps solves the issue on gcc but we need to wait a few weeks for this to be released. Bazel then will need to be updated in our build config and toolchains. I do not think this should be a problem. This should kick the can way down the road (quite possibly for good), when we accumulate enough core libraries to hit this issue again.

If we want to solve this issue once and for all we need to wean ourselves from using include_prefix. This will be jumbo refactor that will requires us using full include paths to headers in include and source. So it will be #include "include/common/callback.h" instead of current #include "common/callback.h" There definitely some loss of aesthetics.

How do people feel about using full paths in #include directives? Please indicate with thumbs up/down emoji.

@antoniovicente
Copy link
Contributor

A refactor to add include and source to #include paths would be very mechanical to implement. It is only my opinion, but knowing from the path that common/callback.h is in include instead of source is mildly useful. Granted, it may require some adjustments to #include paths projects that depend on Envoy like nighthawk so the effort may not be limited only to the envoy core repo.

I do think that we will get significant benefit from https://docs.bazel.build/versions/master/be/c-cpp.html#cc_library.implementation_deps, but I think that will only allow us to omit source/extensions include paths. The gcc command line will only go down by ~30% due to this change.

@mattklein123
Copy link
Member

If the include path trimming is causing issues I would just remove it, though I agree with @antoniovicente that there may be an annoying knock on effect for consumers.

@yanavlasov
Copy link
Contributor Author

Yes, I agree 30% is not quite "way down the road" as I thought. I'll start on fixing the include paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment