-
Notifications
You must be signed in to change notification settings - Fork 1.6k
treewide: add C++ modules support #1605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for testing the build of module, make sure you have clang-16 and cmake 3.26 first, and then: $ cmake -DCMAKE_CXX_STANDARD=20 -DCMAKE_CXX_COMPILER=clang++ -DSeastar_MODULE=ON -GNinja -B build/module
$ ninja -C build/module hello_cxx_module please note, this change has been tested by compiling scylla with an updated submodule of seastar with this change. |
@avikivity @denesb hi Avi and Botond, could you help review this change? |
b3d2a27
to
3f910bd
Compare
👍 brilliant work! (also means a lot of new stuff to learn for me, tired but excited)
Will it be an option of configure.py? |
i think the reason to have an option for a feature in |
@@ -32,6 +33,7 @@ namespace seastar { | |||
|
|||
/// Facility to tie a timeout with an abort source | |||
/// Can be used to make abortanle fibers also support timeouts | |||
SEASTAR_MODULE_EXPORT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just export namespace seastar
? This saves us from having to repeat it for every name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export the whole namespace means we are putting all declarations and definitions in the namespace into public. this has following requirements and consequences:
- we cannot hide the internal symbols from users. this practically defeats the second goal of introducing Seastar. please see the commit message and cover letter.
- since all symbols in the namespace have to be exported, they will have external linkage. i'm afraid that this might hurt the performance of linker.
so, i think it's not worthy to go after the "saving" in this area. what i imaging, is that the macro of SEASTAR_MODULE_EXPORT
is a strong sign that: "a public symbol follows, change it with caution!"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seastar already has the convention that all public names are exported (but names in seastar::internal are reserved). It's true that some internal names predate the convention and are slowly migrated to seastar::internal, but we can live with it.
The goal is to reduce noise. Suppose we have module.cc that exports the seastar namespace, then #includes all public headers. Does this help?
If not, we can keep the current patch as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seastar already has the convention that all public names are exported (but names in seastar::internal are reserved). It's true that some internal names predate the convention and are slowly migrated to seastar::internal, but we can live with it.
The goal is to reduce noise. Suppose we have module.cc that exports the seastar namespace, then #includes all public headers. Does this help?
not quite. seastar.cc
includes not only public headers but also private headers. we use the export
keyword to change the visibility and linkage of a block or a single symbol to external. the ones not marked with export
are visible only from within the module.
If not, we can keep the current patch as is.
#include <exception> | ||
#include <optional> | ||
#include <type_traits> | ||
#include <utility> | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this #ifndef. How does it compile? and why shouldn't we include the headers, as long as we don't export the contents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explained in the commit message. i am quoting it here
when it comes to header files, since all third-party headers are
included in seastar.cc's global fragment, and we don't want to
include them in the module's purview, they should be excluded
when compiling the module. so, if SEASTAR_MODULE is defined,
the third-party headers are not included in the header file.
when implementing modules, we need to keep in mind that each entity, i.e., symbol,
is attached to a module. the module in question could the global one, which does not have a name, or a certain named module. in our case, the "named module" is the "seastar" module. even if we don't export the entities declared / defined by third-party headers, by including them in the purview which is started by the statement of export module seastar;
, we
- attach all those entities into our named module, with the module linkage. this is, well, wrong. as our named module does not "own" them.
- in the purview, the macros "exported" by them, are not visible to other
#include
ed headers . this potentially breaks the expectation of some headers. their behaviors change or even depend on the predefined macros. so this is a show stopper. - and the definition of these symbols have to agreed in the scope of the named module. this is difficult, that's why i want to have them in
seastar.cc
. i also put the explanation in the commit message.
anyhow, i think there are lots of nuts and bolts in the C++20 modules, but how the preprocessor directive works and linkage works are two foremost important things we need to take care of, as they are not quite the same as non-module world. the items listed above are just my understanding to address your question. i don't dare to claim that they are correct. so might want to read the related spec, papers and references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the idea is that these includes are moved before some export statement so it doesn't see them?
That's sad, can't we export only things in namespace seastar somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the idea is that these includes are moved before some export statement so it doesn't see them?
yes. by moving them into the global module.
That's sad, can't we export only things in namespace seastar somehow?
we can. and i think the closest thing is the "separate module partition" solution. but still we need to structure the headers in a way that is
- backward compatible.
- they can consume the internal/external symbols exposed by seastar module
but the "separate module partition" solution has its own drawbacks. i explained the considerations in the commit message.
@@ -32,7 +35,7 @@ static constexpr int ulong_bits = std::numeric_limits<unsigned long>::digits; | |||
* which is returned when value == 1. | |||
*/ | |||
template<typename T> | |||
inline size_t count_leading_zeros(T value) noexcept; | |||
size_t count_leading_zeros(T value) noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This smells like a compiler bug, no? I don't mind the change, but if it's a bug, let's report it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i just was just not 100% sure if it was indeed a bug. as C++ modules is quite a new thing to me. wanted to collect more facts or at least do my home work before filing an issue. anyhow, here it is: llvm/llvm-project#62043
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will extract it into a separate commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i put it back in the latest revision. as this behavior is compliant to the C++23 spec, see http://eel.is/c++draft/dcl.inline#7. i amended the commit message to explain the rationales behind the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so the cause is that is SEASTAR_MODULE_EXPORT_BEGIN doesn't export anything, then count_trailing_zeros us undefined? But then how does the module see it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so the cause is that is SEASTAR_MODULE_EXPORT_BEGIN doesn't export anything,
not quite true. in our case, SEASTAR_MODULE_EXPORT_BEGIN
is defined as export {
. so all stuff after it before ``SEASTAR_MODULE_EXPORT_END` is exported.
then count_trailing_zeros us undefined? But then how does the module see it?
good question. i gave up too soon. let me continue the discussion with Chuanqi at llvm/llvm-project#62043
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. i mistook the specialization of a template as its definition. please refer to the link above for more details.
template <typename T = void> | ||
future<T> current_exception_as_future() noexcept; | ||
|
||
extern template | ||
future<void> current_exception_as_future() noexcept; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why delete this? It will waste some compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i forgot to export it as well. as all visibility properties of the same symbol should agree, if any of them are different with the others, compiler just chokes. added, and put the declaration back. the tree compiles.
|
||
template <typename T> | ||
inline | ||
future<T> internal::make_exception_future(future_state_base&& state) noexcept { | ||
return future<T>(exception_future_marker(), std::move(state)); | ||
} | ||
|
||
SEASTAR_MODULE_EXPORT_BEGIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the logic of these _END/_BEGINs. Is it to avoid exporting something in the internal namespace? I think we can export it even though it's internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is it something to do with inline
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it to avoid exporting something in the internal namespace? I think we can export it even though it's internal.
yes, i really want to get it right.
# define SEASTAR_MODULE_EXPORT_END | ||
# define SEASTAR_BEGIN_INTERNAL_NAMESPACE namespace internal { | ||
# define SEASTAR_END_INTERNAL_NAMESPACE } | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the #else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the #else
branch is in seastar.cc
. i don't want to pretend that these macros are defined multi times in different translation units. i want to make it very clear that: otherwise, it is define in a single place for only once.
the #include
in purview does not leak the preprocessor macros. so this concern does not make sense at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you think a symmetric definition is better, we can have something like
#ifdef SEASTAR_MODULE
# define SEASTAR_MODULE_EXPORT export
# define SEASTAR_MODULE_EXPORT_BEGIN export {
# define SEASTAR_MODULE_EXPORT_END }
# define SEASTAR_BEGIN_INTERNAL_NAMESPACE \
} \
namespace internal {
# define SEASTAR_END_INTERNAL_NAMESPACE \
} \
export {
#else
# define SEASTAR_MODULE_EXPORT
# define SEASTAR_MODULE_EXPORT_BEGIN
# define SEASTAR_MODULE_EXPORT_END
# define SEASTAR_BEGIN_INTERNAL_NAMESPACE namespace internal {
# define SEASTAR_END_INTERNAL_NAMESPACE }
#endif
but this makes the preprocessed header file a little bit bigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applied the change in the latest revision.
include/seastar/util/modules.hh
Outdated
# define SEASTAR_MODULE_EXPORT | ||
# define SEASTAR_MODULE_EXPORT_BEGIN | ||
# define SEASTAR_MODULE_EXPORT_END | ||
# define SEASTAR_BEGIN_INTERNAL_NAMESPACE namespace internal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like not to hide C++ keywords from the source file. So if we need to insert something before namespace internal
, let's keep it in the original source and add only the additions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. instead of using
SEASTAR_BEGIN_INTERNAL_NAMESPACE
// details go here
SEASTAR_MODULE_EXPORT_END
in the next revision, will have something like
SEASTAR_MODULE_EXPORT_END
namespace internal {
// details go here
}
SEASTAR_MODULE_EXPORT_BEGIN
|
||
#ifdef SEASTAR_MODULE | ||
module seastar; | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have something like SEASTAR_CONCEPT() that can wrap a statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, what you are proposing is something like
/// util/modules.hh
#ifdef SEASTAR_CXX_MODULE
# define SEASTAR_MODULE(x...) x
#else
# define SEASTAR_MODULE(x...)
#endif
/// core/memory.cc
#include <seastar/util/modules.hh>
SEASTAR_MODULE(module);
/// third-party headers
SEASTAR_MODULE(module seastar);
#ifndef SEASTAR_CXX_MODULE
/// seastar headers here
#endif
the upside is that we can drop some of #if ... #endif
guards from the .cc files, but the downside is that we are hiding more C++ keywords, and we need to have the seastar/util/modules.hh
header sitting at the top of every .cc file. IMHO, it looks more weird than the #ifdef ... #endif
pairs. but if you insist, i will update all .cc file to reflect this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wouldn't be hiding any keywords, since the macros don't supply keywords themselves. They only replace #ifdef/#endif pairs.
I don't insist though. For concepts it really helps because the concept is part of a definition, here it's just decoration at the top/bottom of a file.
#include <sys/inotify.h> | ||
|
||
#ifdef SEASTAR_MODULE | ||
module seastar; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see some logic in the #includes now. I wonder if we can move it to the header files?
#define SEASTAR_BEGIN_INCLUDES module;
#define SEASTAR_END_INCLUDES module seastar;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess this comment is in the same theme of the previous one: to reduce the #if .. #endif
s. they are probably distracting. but i want to make the macro names reflect what they are doing instead of hiding the internals with names not quite relevant. as, IMHO, module
and module seastar
is not strictly related to header files. they are here to split the module implementation unit into different fragments. and more importantly, module seastar;
does not mark the end of #includes
. on the contrary, it declare a module implementation unit.
so, in comparison to the SEASTAR_MODULE(x...)
suggestion, i am in favor of the latter. as SEASTAR_MODULE(x...)
is closer to C++ source file, and easier to read. but again, could you re-evaluate the ugly-ness level ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep the C++ keywords visible and not hide them. It's tricky enough without the reader having to learn a new wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am confused. if we go with:
#define SEASTAR_BEGIN_INCLUDES module;
the module
keyword is hidden by the macro. so with existing approach, this keyword is more visible, IMHO.
Thanks for the detailed commitlog. I didn't understand everything, undoubtedly some has to be learned fighting hand-to-hand with the tools. I think the biggest challenge is testing so we don't break it tomorrow. Especially the header situation looks fragile. Let's see if we can improve it. Let's work to update the test toolchain to something that can use modules (outside this patch). Then, the patch should include an application that uses modules as a demo, and CI should link it in all the toolchains that support it (barely one so far, but still). |
so, for testing the last two stable releases of GCC and Clang, in addition to gcc-11, gcc-12, clang-14 and clang 15, we need to use clang 16, newer ninja and CMake, i see only two options forward:
but still, i think fedora 38 is simpler regarding the workload of preparing the toolchina. BTW, since Clang 16 has been officially released. we need to test using it and drop Clang-14 from the testing matrix.
there is already a demo named |
changelog
not changed:
|
d5eec75
to
5b53cb7
Compare
@avikivity thank you for your review. could you take another look? |
@avikivity ping? |
Since clang 16 is released, can't we keep Ubuntu (perhaps update the baseline?) the advantage is that Ubuntu packages multiple toolchains in one release. |
applog.info("Hello world!"); | ||
return make_ready_future<>(); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet
Sorry for the delay. I'd really like to see if we can reduce the churn. The idea is that people who add new names (or new headers) will not get confused by what macros to add and where. |
sure. just realized that GCC-13 was also released! guess i've been using it for too long =D there is not reason to stick with ubuntu:kinetic. see #1196. @avikivity could you regenerate the toolchain, and point |
i see. i even had a branch to use
i see two approaches to address your concern:
|
changelog
|
we are trying to bump up the toolchain for building scylladb at scylladb/scylladb#13730, and it is blocked by llvm/llvm-project#62842 |
@avikivity in the light of the scope of this change, this changeset is getting stale very quickly. before rebasing it against master HEAD, may i get more inputs on how we can progress it? is my proposal viable? |
Sorry again. Please rebase and I'll merge. |
this change brings C++20 modules to seastar. the C++20 modules support are added for couple reasons: - allow developer to import Seastar in a more efficient way in the sense of compile-time speed, as the compiler does not need to compile the headers for every source file including the Seastar headers -- the compiler just need to read the precompiled module interface, or in Clang's terminology the BMI (Built Module Interface) files. - better encapsulation. without C++20 modules, we just add guards around the non-public declaration/definitions in the headers using an "internal" namespace. but developer and compiler at Seastar application's side is still able to access them. this is but a coding convention. but with C++20 modules, it enforces this with different linkages. So only the exported symbols have the external linkage, while the internal ones only have the so called, module linkage. in other words, the internal symbols are not visible from the application. this ensures Seastar applications do not access the private symbols even if they are located in the public header files In this change, only a single "seastar" module is exposed. We have some other alternatives: - to have one module for each subsystem, like `seastar.core` and `seastar.utils`. but due to the circular dependencies between these subsystems, without addressing it first, this is leading nowhere. also, to have multiple modules would need some discussions beforehand, as changing the boundary of each module would be a API change, so instead of making decision in the preliminary phase of the modularization change. - to have two modules for experimental and non-experimental APIs. this is a more viable and safer direction. we will need to implement this in a follow-up change. For building a single "seastar" module, there are more than one way to structure the module system. the author of this change experimented all of them, but they either do not work at all, or are more complicated than the solution used in this change. All of the experimented solutions are listed below: - to build each subsystem as a separate module partition by including all headers of individual subsystem in, for instance, "seastar-core.cppm", and use an umbrella "seastar.cppm" to import and export all these partitions. this would face exactly the same problem encountered by one-module-for-each-subsystem due to the circular dependencies. - to build each header file as a separate module partition. and let `seastar.cc` to import and export each of these partition. this solution is techinically viable. the upside of this solution is that * we don't have a monolithic `seastar.cc` source file which includes all third-party headers by each of these partitions. so each partition can manage its own global headers. and we don't need to replicate them in `seastar.cc`. but the downside is that * we need to pay extra attention on the order of `#include`s, so the symbols in every partition's global fragment agree, compiler would emit error if it believes that any symbol has different definition either because it is compiled with different macro definition or different compiling options, as it would violate ODR. * in each header , we would have to export its own partition, and import all used partitions in each header, take `abort_fifo.hh` for an example, we need follow statements before starting its purview: ```c++ export module seastar:core.abortable_fifo; import :core.abort_source; import :core.chunked_fifo; import :util.optimized_optional; ``` quite like how we use the "#include" preprocessor directive in C/C++. - to include all Seastar headers in `seastar.cc` in its global fragment, and then expose all public symbols using "using public_symbol" in its `export {}` block in its purview. the upside is that * it has minimum impact to the existing source code. actually, this solution does not change the existing source files at all. it just *adds* a new `seastar.cc` which consumes the public header files. the downside is that * there are numerous public interfaces in Seastar, it would be a pain to replicate them in a separate file, not to mention it's error-prone. it's quite easy to forget to add a new `using` statement in `seastar.cc` when adding a new public symbol. * if I recall correctly, I ran into an error where the compiler refuse to "friend" a class exposed with `using` in the `export` block. so this is deal breaker. - to include all third-party headers in the global fragment of `seastar.cc`, include all Seasetar headers in the purview `seastar.cc`, and expose individual symbols in the headers where they are originally declared or defined. the upside is that * we don't need to care about the `export`/`import` dependencies between different partitions. as the header and source files have access to the whole module by default. * the visibility of symbols are managed by their own headers, this reduces the burden of maintainers. * we don't need to worry about the ODR violations which might arise when we include third-party headers in different order in different Seastar header. the downside is that * the long list of third-party headers and macro definitions in the single file -- `seastar.cc`. they practically replicate the ones in all included Seastar header files. * in comparison to the `using` based solution, we have to use macros to enable/disable the module related statements in header / source files. One goal of this change is to preserve the backward compatibility in the sense that existing Seastar application can continue using Seastar as a plain library without having to switch over to C++20 modules. Since we choose to include all Seastar headers in the purview of `seastar.cc`, the problem boils down to two sub problems: - header files included by the module interface unit, i.e., in our case, `seastar.cc` - source files compiled separately as module implementation units. first, a new macro `SEASTAR_MODULE` is defined when we are building a C++ module. when it comes to header files, since all third-party headers are included in `seastar.cc`'s global fragment, and we don't want to include them in the module's purview, they should be excluded when compiling the module. so, if `SEASTAR_MODULE` is defined, the third-party headers are not included in the header file. in order to export public symbols in each header file, a set of macros are introduced to hide the differences between two build modes: `SEASTAR_MODULE_EXPORT` and `SEASTAR_MODULE_EXPORT_{BEGIN, END}`. these macros are defined in `util/modules.hh`. when building the C++20 module, they are defined as the `export` keyword; when building the regular library, these macro are no-ops. that's why this new header file is included in every Seastar header now. fortunately, it's a very smaller one. under some circumstances, both `SEASTAR_MODULE_EXPORT` and the `SEASTAR_MODULE_EXPORT_{BEGIN, END}` can be used, in these cases, we prefer simpler solution. for instance, if the whole namespace is public, we use `SEASTAR_MODULE_EXPORT` to export the namespace. but if a certain namespace includes private symbols, we use `SEASTAR_MODULE_EXPORT_{BEGIN, END}` to enclose the public symbols. in the case of source files, because they are not included by `seastar.cc`, and they are compiled individually, we need to keep the used third-party headers in its global fragment. but please note, the required headers might be more than what we generally need when compiling the source file to a regular library. because, unlike the textual inclusion, the source file will not include the headers previously included by the Seastar headers anymore. now, the .cc files only have access to the symbols attached "seastar" module by default. so we need to include them explicity in the global fragment. that's why a bunch of "new" includes are added in this change. in hope to minimize the impact to the existing source code, the author also experimented another way to insert the module related statements by preprocessing the all source files using CMake. so, for instance, it can translate the markers in comments or add `module;` at the very beginning of a source file. and let the compiler compile the preprocessed files. this works fine. the only drawback exhibits itself in the development cycle. when developer tries to fix a compiling warning / error, he/she always has to manually go to the original source file, as the compiler cannot connect the error to the original file. this greatly hurts developers' experience when building the library as a module. we have following requirements on the toolchain for building the C++20 module: - CMake 3.26: CMake 3.26 has builtin supports for building C++ modules with Clang 16. - Ninja 1.11: it has dyndep feature used by CMake. - Clang 17: for better support of C++ modules. as the C++ modules changes are not backported to Clang 15, we need Clang 16 or up. and because Clang 16 refuses to expose symbols marked with `extern "C"` defined in purview to with C linkage. but we have a bunch of C APIs overriding the functions like "free()", "malloc()". and these APIs use in-module symbols, in theory, we could extract these symbols out, but that involves even more code churn. so let's stick with Clang 17. even it has not been released at the time of writing. GCC does have basic C++20 modules support, see https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Modules.html. But for two reasons, it is not supported * it still does not support p1689r5 yet. so it cannot provide necessary support required by CMake for figuring out the inter translation unit dependencies. * it does not support private module fragment. it is used for including the private headers. also, please note, neither ccache nor distcc now supports C++20 modules at the time of writing. in addition to the changes in the building system and those directly related to C++20 module, in this change, we also * remove the `static` specifier applied to the exported symbols, because, otherwise these symbols will have module linkage and will not be exported. * remove the `static` specifier applied to the symbols shared in the module. as otherwise, it will only be visible in the same translation unit. * add `extern "C++"` specifier if a symbol is not supposed to be exported as part of the module. as the symbols exported by module has a different mangling rule. this applies to the `new` and `delete` operator defined in `memory.cc`. * remove `inline` specifier from the symbol attached to the module. as per http://eel.is/c++draft/dcl.inline#7, we should otherwise define it in the same domain. fortunately, both GCC and Clang don't complain after the specifier is dropped when building regular library. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
@avikivity rebased and repushed. changelog
|
treewide: add C++ modules support
this change brings C++20 modules to seastar.
the C++20 modules support are added for couple reasons:
the sense of compile-time speed, as the compiler does not
need to compile the headers for every source file including the
Seastar headers -- the compiler just need to read the precompiled
module interface, or in Clang's terminology the BMI (Built Module
Interface) files.
around the non-public declaration/definitions in the headers using
an "internal" namespace. but developer and compiler at Seastar
application's side is still able to access them. this is but a
coding convention. but with C++20 modules, it enforces this with
different linkages. So only the exported symbols have the external
linkage, while the internal ones only have the so called,
module linkage. in other words, the internal symbols are not
visible from the application. this ensures Seastar applications
do not access the private symbols even if they are located in the
public header files
In this change, only a single "seastar" module is exposed. We have
some other alternatives:
seastar.core
andseastar.utils
. but due to the circular dependencies betweenthese subsystems, without addressing it first, this is leading
nowhere. also, to have multiple modules would need some
discussions beforehand, as changing the boundary of each module
would be a API change, so instead of making decision in the
preliminary phase of the modularization change.
this is a more viable and safer direction. we will need to
implement this in a follow-up change.
For building a single "seastar" module, there are more than one way
to structure the module system. the author of this change
experimented all of them, but they either do not work at all, or
are more complicated than the solution used in this change. All of
the experimented solutions are listed below:
to build each subsystem as a separate module partition by including
all headers of individual subsystem in, for instance,
"seastar-core.cppm", and use an umbrella "seastar.cppm" to import
and export all these partitions. this would face exactly the same
problem encountered by one-module-for-each-subsystem due to the
circular dependencies.
to build each header file as a separate module partition. and
let
seastar.cc
to import and export each of these partition.this solution is techinically viable.
the upside of this solution is that
seastar.cc
source file whichincludes all third-party headers by each of these partitions.
so each partition can manage its own global headers. and
we don't need to replicate them in
seastar.cc
.but the downside is that
#include
s, sothe symbols in every partition's global fragment agree,
compiler would emit error if it believes that any symbol
has different definition either because it is compiled with
different macro definition or different compiling options,
as it would violate ODR.
import all used partitions in each header, take
abort_fifo.hh
for an example, we need follow statements before starting its
purview:
export module seastar:core.abortable_fifo; import :core.abort_source; import :core.chunked_fifo; import :util.optimized_optional;
C/C++.
to include all Seastar headers in
seastar.cc
in its globalfragment, and then expose all public symbols using
"using public_symbol" in its
export {}
block in its purview.the upside is that
this solution does not change the existing source files at
all. it just adds a new
seastar.cc
which consumes thepublic header files.
the downside is that
a pain to replicate them in a separate file, not to mention
it's error-prone. it's quite easy to forget to add a new
using
statement inseastar.cc
when adding a new publicsymbol.
refuse to "friend" a class exposed with
using
in theexport
block. so this is deal breaker.
to include all third-party headers in the global fragment of
seastar.cc
, include all Seasetar headers in the purviewseastar.cc
, and expose individual symbols in the headerswhere they are originally declared or defined.
the upside is that
export
/import
dependenciesbetween different partitions. as the header and source files
have access to the whole module by default.
this reduces the burden of maintainers.
arise when we include third-party headers in different order
in different Seastar header.
the downside is that
in the single file --
seastar.cc
. they practically replicatethe ones in all included Seastar header files.
using
based solution, we have to usemacros to enable/disable the module related statements in
header / source files.
One goal of this change is to preserve the backward compatibility
in the sense that existing Seastar application can continue using
Seastar as a plain library without having to switch over to C++20
modules. Since we choose to include all Seastar headers in the
purview of
seastar.cc
, the problem boils down to two sub problems:case,
seastar.cc
first, a new macro
SEASTAR_MODULE
is defined when we are buildinga C++ module.
when it comes to header files, since all third-party headers are
included in
seastar.cc
's global fragment, and we don't want toinclude them in the module's purview, they should be excluded
when compiling the module. so, if
SEASTAR_MODULE
is defined,the third-party headers are not included in the header file.
in order to export public symbols in each header file, a set of
macros are introduced to hide the differences between two build
modes:
SEASTAR_MODULE_EXPORT
andSEASTAR_MODULE_EXPORT_{BEGIN, END}
.these macros are defined in
util/modules.hh
. when building theC++20 module, they are defined as the
export
keyword; when buildingthe regular library, these macro are no-ops. that's why this new
header file is included in every Seastar header now. fortunately,
it's a very smaller one.
in the case of source files, because they are not included by
seastar.cc
, and they are compiled individually, we need to keepthe used third-party headers in its global fragment. but please
note, the required headers might be more than what we generally
need when compiling the source file to a regular library. because,
unlike the textual inclusion, the source file will not include
the headers previously included by the Seastar headers anymore.
now, the .cc files only have access to the symbols attached "seastar"
module by default. so we need to include them explicity in the
global fragment. that's why a bunch of "new" includes are added
in this change.
in hope to minimize the impact to the existing source code, the
author also experimented another way to insert the module related
statements by preprocessing the all source files using CMake. so,
for instance, it can translate the markers in comments or add
module;
at the very beginning of a source file. and let thecompiler compile the preprocessed files. this works fine. the
only drawback exhibits itself in the development cycle. when
developer tries to fix a compiling warning / error, he/she always
has to manually go to the original source file, as the compiler
cannot connect the error to the original file. this greatly hurts
developers' experience when building the library as a module.
we have following requirements on the toolchain for building
the C++20 module:
C++ modules with Clang 16.
changes are not backported to Clang 15, we need Clang 16 or up.
GCC does have basic C++20 modules support, see
https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Modules.html. But
for two reason it is not supported
necessary support required by CMake for figuring out the
inter translation unit dependencies.
including the private headers.
also, please note, neither ccache nor distcc now supports C++20
modules at the time of writing.
in addition to the changes in the building system and those
directly related to C++20 module, in this change, we also
static
specifier applied to the exported symbols,because, otherwise these symbols will have module linkage and
will not be exported.
static
specifier applied to the symbols shared inthe module. as otherwise, it will only be visible in the same
translation unit.
extern "C++"
specifier if a symbol is not supposed to beexported as part of the module. as the symbols exported by
module has a different mangling rule. this applies to the
new
anddelete
operator defined inmemory.cc
.inline
specifier from the symbol attached to the module.as per http://eel.is/c++draft/dcl.inline#7, we should otherwise
define it in the same domain. fortunately, both GCC and Clang don't
complain after the specifier is dropped when building regular
library.