diff --git a/clang/docs/InternalsManual.rst b/clang/docs/InternalsManual.rst index 5732e9b9af03d2..906b6d153f2b4b 100644 --- a/clang/docs/InternalsManual.rst +++ b/clang/docs/InternalsManual.rst @@ -572,6 +572,422 @@ The Frontend Library The Frontend library contains functionality useful for building tools on top of the Clang libraries, for example several methods for outputting diagnostics. +Compiler Invocation +------------------- + +One of the classes provided by the Frontend library is ``CompilerInvocation``, +which holds information that describe current invocation of the Clang frontend. +The information typically comes from the command line constructed by the Clang +driver or from clients performing custom initialization. The data structure is +split into logical units used by different parts of the compiler, for example +``PreprocessorOptions``, ``LanguageOptions`` or ``CodeGenOptions``. + +Command Line Interface +---------------------- + +The command line interface of the Clang ``-cc1`` frontend is defined alongside +the driver options in ``clang/Driver/Options.td``. The information making up an +option definition includes its prefix and name (for example ``-std=``), form and +position of the option value, help text, aliases and more. Each option may +belong to a certain group and can be marked with zero or more flags. Options +accepted by the ``-cc1`` frontend are marked with the ``CC1Option`` flag. + +Command Line Parsing +-------------------- + +Option definitions are processed by the ``-gen-opt-parser-defs`` tablegen +backend during early stages of the build. Options are then used for querying an +instance ``llvm::opt::ArgList``, a wrapper around the command line arguments. +This is done in the Clang driver to construct individual jobs based on the +driver arguments and also in the ``CompilerInvocation::CreateFromArgs`` function +that parses the ``-cc1`` frontend arguments. + +Command Line Generation +----------------------- + +Any valid ``CompilerInvocation`` created from a ``-cc1`` command line can be +also serialized back into semantically equivalent command line in a +deterministic manner. This enables features such as implicitly discovered, +explicitly built modules. + +.. + TODO: Create and link corresponding section in Modules.rst. + +Adding new Command Line Option +------------------------------ + +When adding a new command line option, the first place of interest is the header +file declaring the corresponding options class (e.g. ``CodeGenOptions.h`` for +command line option that affects the code generation). Create new member +variable for the option value: + +.. code-block:: diff + + class CodeGenOptions : public CodeGenOptionsBase { + + + /// List of dynamic shared object files to be loaded as pass plugins. + + std::vector PassPlugins; + + } + +Next, declare the command line interface of the option in the tablegen file +``clang/include/clang/Driver/Options.td``. This is done by instantiating the +``Option`` class (defined in ``llvm/include/llvm/Option/OptParser.td``). The +instance is typically created through one of the helper classes that encode the +acceptable ways to specify the option value on the command line: + +* ``Flag`` - the option does not accept any value, +* ``Joined`` - the value must immediately follow the option name within the same + argument, +* ``Separate`` - the value must follow the option name in the next command line + argument, +* ``JoinedOrSeparate`` - the value can be specified either as ``Joined`` or + ``Separate``, +* ``CommaJoined`` - the values are comma-separated and must immediately follow + the option name within the same argument (see ``Wl,`` for an example). + +The helper classes take a list of acceptable prefixes of the option (e.g. +``"-"``, ``"--"`` or ``"/"``) and the option name: + +.. code-block:: diff + + // Options.td + + + def fpass_plugin_EQ : Joined<["-"], "fpass-plugin=">; + +Then, specify additional attributes via mix-ins: + +* ``HelpText`` holds the text that will be printed besides the option name when + the user requests help (e.g. via ``clang --help``). +* ``Group`` specifies the "category" of options this option belongs to. This is + used by various tools to filter certain options of interest. +* ``Flags`` may contain a number of "tags" associated with the option. This + enables more granular filtering than the ``Group`` attribute. +* ``Alias`` denotes that the option is an alias of another option. This may be + combined with ``AliasArgs`` that holds the implied value. + +.. code-block:: diff + + // Options.td + + def fpass_plugin_EQ : Joined<["-"], "fpass-plugin=">, + + Group, Flags<[CC1Option]>, + + HelpText<"Load pass plugin from a dynamic shared object file.">; + +New options are recognized by the Clang driver unless marked with the +``NoDriverOption`` flag. On the other hand, options intended for the ``-cc1`` +frontend must be explicitly marked with the ``CC1Option`` flag. + +Next, parse (or manufacture) the command line arguments in the Clang driver and +use them to construct the ``-cc1`` job: + +.. code-block:: diff + + void Clang::ConstructJob(const ArgList &Args /*...*/) const { + ArgStringList CmdArgs; + // ... + + + for (const Arg *A : Args.filtered(OPT_fpass_plugin_EQ)) { + + CmdArgs.push_back(Args.MakeArgString(Twine("-fpass-plugin=") + A->getValue())); + + A->claim(); + + } + } + +The last step is implementing the ``-cc1`` command line argument +parsing/generation that initializes/serializes the option class (in our case +``CodeGenOptions``) stored within ``CompilerInvocation``. This can be done +automatically by using the marshalling annotations on the option definition: + +.. code-block:: diff + + // Options.td + + def fpass_plugin_EQ : Joined<["-"], "fpass-plugin=">, + Group, Flags<[CC1Option]>, + HelpText<"Load pass plugin from a dynamic shared object file.">, + + MarshallingInfoStringVector>; + +Inner workings of the system are introduced in the :ref:`marshalling +infrastructure ` section and the available annotations are +listed :ref:`here `. + +In case the marshalling infrastructure does not support the desired semantics, +consider simplifying it to fit the existing model. This makes the command line +more uniform and reduces the amount of custom, manually written code. Remember +that the ``-cc1`` command line interface is intended only for Clang developers, +meaning it does not need to mirror the driver interface, maintain backward +compatibility or be compatible with GCC. + +If the option semantics cannot be encoded via marshalling annotations, you can +resort to parsing/serializing the command line arguments manually: + +.. code-block:: diff + + // CompilerInvocation.cpp + + static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args /*...*/) { + // ... + + + Opts.PassPlugins = Args.getAllArgValues(OPT_fpass_plugin_EQ); + } + + static void GenerateCodeGenArgs(const CodeGenOptions &Opts, + SmallVectorImpl &Args, + CompilerInvocation::StringAllocator SA /*...*/) { + // ... + + + for (const std::string &PassPlugin : Opts.PassPlugins) + + GenerateArg(Args, OPT_fpass_plugin_EQ, PassPlugin, SA); + } + +Finally, you can specify the argument on the command line: +``clang -fpass-plugin=a -fpass-plugin=b`` and use the new member variable as +desired. + +.. code-block:: diff + + void EmitAssemblyHelper::EmitAssemblyWithNewPassManager(/*...*/) { + // ... + + for (auto &PluginFN : CodeGenOpts.PassPlugins) + + if (auto PassPlugin = PassPlugin::Load(PluginFN)) + + PassPlugin->registerPassBuilderCallbacks(PB); + } + +.. _OptionMarshalling: + +Option Marshalling Infrastructure +--------------------------------- + +The option marshalling infrastructure automates the parsing of command line +arguments into ``CompilerInvocation`` and their generation from +``CompilerInvocation``. The system replaces lots of repetitive C++ code with +simple, declarative tablegen annotations and it's being used for the majority of +the ``-cc1`` command line interface. This section provides an overview of the +system. + +To read and modify contents of ``CompilerInvocation``, the marshalling system +uses key paths, which are declared in two steps. First, a tablegen definition +for the ``CompilerInvocation`` member is created by inheriting from +``KeyPathAndMacro``: + +.. code-block:: + + // Options.td + + class LangOpts : KeyPathAndMacro<"LangOpts->", field, "LANG_"> {} + // CompilerInvocation member ^^^^^^^^^^ + // OPTION_WITH_MARSHALLING prefix ^^^^^ + +The first argument to the parent class is the beginning of the key path that +references the ``CompilerInvocation`` member. This argument ends with ``->`` if +the member is a pointer type or with ``.`` if it's a value type. The child class +takes a single parameter ``field`` that is forwarded as the second argument to +the base class. The child class can then be used like so: +``LangOpts<"IgnoreExceptions">``, constructing a key path to the field +``LangOpts->IgnoreExceptions``. The third argument passed to the parent class is +a string that the tablegen backend uses as a prefix to the +``OPTION_WITH_MARSHALLING`` macro. Using the key path as a mix-in on an +``Option`` instance instructs the backend to generate the following code: + +.. code-block:: c++ + + // Options.inc + + #ifdef LANG_OPTION_WITH_MARSHALLING + LANG_OPTION_WITH_MARSHALLING([...], LangOpts->IgnoreExceptions, [...]) + #endif // LANG_OPTION_WITH_MARSHALLING + +Such definition can be used used in the function for parsing and generating +command line: + +.. code-block:: c++ + + // clang/lib/Frontend/CompilerInvoation.cpp + + bool CompilerInvocation::ParseLangArgs(LangOptions *LangOpts, ArgList &Args, + DiagnosticsEngine &Diags) { + bool Success = true; + + #define LANG_OPTION_WITH_MARSHALLING( \ + PREFIX_TYPE, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM, \ + HELPTEXT, METAVAR, VALUES, SPELLING, SHOULD_PARSE, ALWAYS_EMIT, KEYPATH, \ + DEFAULT_VALUE, IMPLIED_CHECK, IMPLIED_VALUE, NORMALIZER, DENORMALIZER, \ + MERGER, EXTRACTOR, TABLE_INDEX) \ + PARSE_OPTION_WITH_MARSHALLING(Args, Diags, Success, ID, FLAGS, PARAM, \ + SHOULD_PARSE, KEYPATH, DEFAULT_VALUE, \ + IMPLIED_CHECK, IMPLIED_VALUE, NORMALIZER, \ + MERGER, TABLE_INDEX) + #include "clang/Driver/Options.inc" + #undef LANG_OPTION_WITH_MARSHALLING + + // ... + + return Success; + } + + void CompilerInvocation::GenerateLangArgs(LangOptions *LangOpts, + SmallVectorImpl &Args, + StringAllocator SA) { + #define LANG_OPTION_WITH_MARSHALLING( \ + PREFIX_TYPE, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM, \ + HELPTEXT, METAVAR, VALUES, SPELLING, SHOULD_PARSE, ALWAYS_EMIT, KEYPATH, \ + DEFAULT_VALUE, IMPLIED_CHECK, IMPLIED_VALUE, NORMALIZER, DENORMALIZER, \ + MERGER, EXTRACTOR, TABLE_INDEX) \ + GENERATE_OPTION_WITH_MARSHALLING( \ + Args, SA, KIND, FLAGS, SPELLING, ALWAYS_EMIT, KEYPATH, DEFAULT_VALUE, \ + IMPLIED_CHECK, IMPLIED_VALUE, DENORMALIZER, EXTRACTOR, TABLE_INDEX) + #include "clang/Driver/Options.inc" + #undef LANG_OPTION_WITH_MARSHALLING + + // ... + } + +The ``PARSE_OPTION_WITH_MARSHALLING`` and ``GENERATE_OPTION_WITH_MARSHALLING`` +macros are defined in ``CompilerInvocation.cpp`` and they implement the generic +algorithm for parsing and generating command line arguments. + +.. _OptionMarshallingAnnotations: + +Option Marshalling Annotations +------------------------------ + +How does the tablegen backend know what to put in place of ``[...]`` in the +generated ``Options.inc``? This is specified by the ``Marshalling`` utilities +described below. All of them take a key path argument and possibly other +information required for parsing or generating the command line argument. + +**Positive Flag** + +The key path defaults to ``false`` and is set to ``true`` when the flag is +present on command line. + +.. code-block:: + + def fignore_exceptions : Flag<["-"], "fignore-exceptions">, Flags<[CC1Option]>, + MarshallingInfoFlag>; + +**Negative Flag** + +The key path defaults to ``true`` and is set to ``false`` when the flag is +present on command line. + +.. code-block:: + + def fno_verbose_asm : Flag<["-"], "fno-verbose-asm">, Flags<[CC1Option]>, + MarshallingInfoNegativeFlag>; + +**Negative and Positive Flag** + +The key path defaults to the specified value (``false``, ``true`` or some +boolean value that's statically unknown in the tablegen file). Then, the key +path is set to the value associated with the flag that appears last on command +line. + +.. code-block:: + + defm legacy_pass_manager : BoolOption<"f", "legacy-pass-manager", + CodeGenOpts<"LegacyPassManager">, DefaultFalse, + PosFlag, + NegFlag, + BothFlags<[CC1Option]>>; + +With most such pair of flags, the ``-cc1`` frontend accepts only the flag that +changes the default key path value. The Clang driver is responsible for +accepting both and either forwarding the changing flag or discarding the flag +that would just set the key path to its default. + +The first argument to ``BoolOption`` is a prefix that is used to construct the +full names of both flags. The positive flag would then be named +``flegacy-pass-manager`` and the negative ``fno-legacy-pass-manager``. +``BoolOption`` also implies the ``-`` prefix for both flags. It's also possible +to use ``BoolFOption`` that implies the ``"f"`` prefix and ``Group``. +The ``PosFlag`` and ``NegFlag`` classes hold the associated boolean value, an +array of elements passed to the ``Flag`` class and the help text. The optional +``BothFlags`` class holds an array of ``Flag`` elements that are common for both +the positive and negative flag and their common help text suffix. + +**String** + +The key path defaults to the specified string, or an empty one, if omitted. When +the option appears on the command line, the argument value is simply copied. + +.. code-block:: + + def isysroot : JoinedOrSeparate<["-"], "isysroot">, Flags<[CC1Option]>, + MarshallingInfoString, [{"/"}]>; + +**List of Strings** + +The key path defaults to an empty ``std::vector``. Values specified +with each appearance of the option on the command line are appended to the +vector. + +.. code-block:: + + def frewrite_map_file : Separate<["-"], "frewrite-map-file">, Flags<[CC1Option]>, + MarshallingInfoStringVector>; + +**Integer** + +The key path defaults to the specified integer value, or ``0`` if omitted. When +the option appears on the command line, its value gets parsed by ``llvm::APInt`` +and the result is assigned to the key path on success. + +.. code-block:: + + def mstack_probe_size : Joined<["-"], "mstack-probe-size=">, Flags<[CC1Option]>, + MarshallingInfoStringInt, "4096">; + +**Enumeration** + +The key path defaults to the value specified in ``MarshallingInfoString`` +prefixed by the contents of ``NormalizedValuesScope`` and ``::``. This ensures +correct reference to an enum case is formed even if the enum resides in different +namespace or is an enum class. If the value present on command line does not +match any of the comma-separated values from ``Values``, an error diagnostics is +issued. Otherwise, the corresponding element from ``NormalizedValues`` at the +same index is assigned to the key path (also correctly scoped). The number of +comma-separated string values and elements of the array within +``NormalizedValues`` must match. The ``AutoNormalizeEnum`` mix-in denotes the +key path option should be treated as an enum and not as a string. + +.. code-block:: + + def mthread_model : Separate<["-"], "mthread-model">, Flags<[CC1Option]>, + Values<"posix,single">, NormalizedValues<["POSIX", "Single"]>, + NormalizedValuesScope<"LangOptions::ThreadModelKind">, + MarshallingInfoString, "POSIX">, AutoNormalizeEnum; + +.. + Intentionally omitting MarshallingInfoBitfieldFlag. It's adding some + complexity to the marshalling infrastructure and might be removed. + +It is also possible to define relationships between options. + +**Implication** + +The key path defaults to the default value from the primary ``Marshalling`` +annotation. Then, if any of the elements of ``ImpliedByAnyOf`` evaluate to true, +the key path value is changed to the specified value or ``true`` if missing. +Finally, the command line is parsed according to the primary annotation. + +.. code-block:: + + def fms_extensions : Flag<["-"], "fms-extensions">, Flags<[CC1Option]>, + MarshallingInfoFlag>, + ImpliedByAnyOf<[fms_compatibility.KeyPath], "true">; + +**Condition** + +The option is parsed only if the expression in ``ShouldParseIf`` evaluates to +true. + +.. code-block:: + + def fopenmp_enable_irbuilder : Flag<["-"], "fopenmp-enable-irbuilder">, Flags<[CC1Option]>, + MarshallingInfoFlag>, + ShouldParseIf; + The Lexer and Preprocessor Library ==================================