[clang] Proofread InternalsManual.rst #164057
Merged
kazutakahirata
merged 1 commit into
llvm:main
from
kazutakahirata:cleanup_20251017_proofread
Oct 18, 2025
@llvm/pr-subscribers-clang
Author: Kazu Hirata (kazutakahirata)
Changes
Patch is 26.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164057.diff
1 Files Affected:
diff --git a/clang/docs/InternalsManual.rst b/clang/docs/InternalsManual.rst
index c677ddfa5ecc1..eff46ab46e1ca 100644
--- a/clang/docs/InternalsManual.rst
+++ b/clang/docs/InternalsManual.rst
@@ -10,7 +10,7 @@ Introduction
This document describes some of the more important APIs and internal design
decisions made in the Clang C front-end. The purpose of this document is to
-both capture some of this high level information and also describe some of the
+both capture some of this high-level information and also describe some of the
design decisions behind it. This is meant for people interested in hacking on
Clang, not for end-users. The description below is categorized by libraries,
and does not describe any of the clients of the libraries.
@@ -20,7 +20,7 @@ LLVM Support Library
The LLVM ``libSupport`` library provides many underlying libraries and
`data-structures <https://llvm.org/docs/ProgrammersManual.html>`_, including
-command line option processing, various containers and a system abstraction
+command line option processing, various containers, and a system abstraction
layer, which is used for file system access.
The Clang "Basic" Library
@@ -34,7 +34,7 @@ and information about the subset of the language being compiled for.
Part of this infrastructure is specific to C (such as the ``TargetInfo``
class), other parts could be reused for other non-C-based languages
(``SourceLocation``, ``SourceManager``, ``Diagnostics``, ``FileManager``).
-When and if there is future demand we can figure out if it makes sense to
+When and if there is future demand, we can figure out if it makes sense to
introduce a new library, move the general classes somewhere else, or introduce
some other solution.
@@ -96,7 +96,7 @@ The ``EXTENSION`` and ``EXTWARN`` severities are used for extensions to the
language that Clang accepts. This means that Clang fully understands and can
represent them in the AST, but we produce diagnostics to tell the user their
code is non-portable. The difference is that the former are ignored by
-default, and the later warn by default. The ``WARNING`` severity is used for
+default, and the latter warn by default. The ``WARNING`` severity is used for
constructs that are valid in the currently selected source language but that
are dubious in some way. The ``REMARK`` severity provides generic information
about the compilation that is not necessarily related to any dubious code. The
@@ -106,7 +106,7 @@ These *severities* are mapped into a smaller set (the ``Diagnostic::Level``
enum, {``Ignored``, ``Note``, ``Remark``, ``Warning``, ``Error``, ``Fatal``}) of
output
*levels* by the diagnostics subsystem based on various configuration options.
-Clang internally supports a fully fine grained mapping mechanism that allows
+Clang internally supports a fully fine-grained mapping mechanism that allows
you to map almost any diagnostic to the output level that you want. The only
diagnostics that cannot be mapped are ``NOTE``\ s, which always follow the
severity of the previously emitted diagnostic and ``ERROR``\ s, which can only
@@ -116,18 +116,18 @@ example).
Diagnostic mappings are used in many ways. For example, if the user specifies
``-pedantic``, ``EXTENSION`` maps to ``Warning``, if they specify
``-pedantic-errors``, it turns into ``Error``. This is used to implement
-options like ``-Wunused_macros``, ``-Wundef`` etc.
+options like ``-Wunused_macros``, ``-Wundef``, etc.
Mapping to ``Fatal`` should only be used for diagnostics that are considered so
severe that error recovery won't be able to recover sensibly from them (thus
-spewing a ton of bogus errors). One example of this class of error are failure
+spewing a ton of bogus errors). One example of this class of error is failure
to ``#include`` a file.
Diagnostic Wording
^^^^^^^^^^^^^^^^^^
The wording used for a diagnostic is critical because it is the only way for a
user to know how to correct their code. Use the following suggestions when
-wording a diagnostic.
+wording a diagnostic:
* Diagnostics in Clang do not start with a capital letter and do not end with
punctuation.
@@ -162,7 +162,7 @@ wording a diagnostic.
cannot be null in well-defined C++ code``.
* Prefer diagnostic wording without contractions whenever possible. The single
quote in a contraction can be visually distracting due to its use with
- syntactic constructs and contractions can be harder to understand for non-
+ syntactic constructs, and contractions can be harder to understand for non-
native English speakers.
The Format String
@@ -195,14 +195,14 @@ the C++ code that :ref:`produces them <internals-producing-diag>`, and are
referenced by ``%0`` .. ``%9``. If you have more than 10 arguments to your
diagnostic, you are doing something wrong :). Unlike ``printf``, there is no
requirement that arguments to the diagnostic end up in the output in the same
-order as they are specified, you could have a format string with "``%1 %0``"
+order as they are specified; you could have a format string with "``%1 %0``"
that swaps them, for example. The text in between the percent and digit are
formatting instructions. If there are no instructions, the argument is just
turned into a string and substituted in.
Here are some "best practices" for writing the English format string:
-* Keep the string short. It should ideally fit in the 80 column limit of the
+* Keep the string short. It should ideally fit in the 80-column limit of the
``DiagnosticKinds.td`` file. This avoids the diagnostic wrapping when
printed, and forces you to think about the important point you are conveying
with the diagnostic.
@@ -227,7 +227,7 @@ used to achieve this sort of thing in a localizable way, see below.
Formatting a Diagnostic Argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Arguments to diagnostics are fully typed internally, and come from a couple
+Arguments to diagnostics are fully typed internally and come from a couple of
different classes: integers, types, names, and random strings. Depending on
the class of the argument, it can be optionally formatted in different ways.
This gives the ``DiagnosticConsumer`` information about what the argument means
@@ -268,7 +268,7 @@ Description:
This format specifier is used to merge multiple related diagnostics together
into one common one, without requiring the difference to be specified as an
English string argument. Instead of specifying the string, the diagnostic
- gets an integer argument and the format string selects the numbered option.
+ gets an integer argument, and the format string selects the numbered option.
In this case, the "``%0``" value must be an integer in the range [0..2]. If
it is 0, it prints "unary", if it is 1 it prints "binary" if it is 2, it
prints "unary or binary". This allows other language translations to
@@ -287,7 +287,7 @@ Description:
additionally generates a namespace, enumeration, and enumerator list based on
the format string given. In the above case, a namespace is generated named
``FrobbleKind`` that has an unscoped enumeration with the enumerators
- ``VarDecl`` and ``FuncDecl`` which correspond to the values 0 and 1. This
+ ``VarDecl`` and ``FuncDecl``, which correspond to the values 0 and 1. This
permits a clearer use of the ``Diag`` in source code, as the above could be
called as: ``Diag(Loc, diag::frobble) << diag::FrobbleKind::VarDecl``.
@@ -407,7 +407,7 @@ Example:
def note_ovl_candidate : Note<
"candidate %sub{select_ovl_candidate}3,2,1 not viable">;
- and will act as if it was written
+ and will act as if it were written
``"candidate %select{function|constructor}3%select{| template| %1}2 not viable"``.
Description:
This format specifier is used to avoid repeating strings verbatim in multiple
@@ -447,7 +447,7 @@ For example, the binary expression error comes from code like this:
<< lex->getType() << rex->getType()
<< lex->getSourceRange() << rex->getSourceRange();
-This shows that use of the ``Diag`` method: it takes a location (a
+This shows the use of the ``Diag`` method: it takes a location (a
:ref:`SourceLocation <SourceLocation>` object) and a diagnostic enum value
(which matches the name from ``Diagnostic*Kinds.td``). If the diagnostic takes
arguments, they are specified with the ``<<`` operator: the first argument
@@ -586,7 +586,7 @@ Strangely enough, the ``SourceLocation`` class represents a location within the
source code of the program. Important design points include:
#. ``sizeof(SourceLocation)`` must be extremely small, as these are embedded
- into many AST nodes and are passed around often. Currently it is 32 bits.
+ into many AST nodes and are passed around often. Currently, it is 32 bits.
#. ``SourceLocation`` must be a simple value object that can be efficiently
copied.
#. We should be able to represent a source location for any byte of any input
@@ -605,7 +605,7 @@ In practice, the ``SourceLocation`` works together with the ``SourceManager``
class to encode two pieces of information about a location: its spelling
location and its expansion location. For most tokens, these will be the
same. However, for a macro expansion (or tokens that came from a ``_Pragma``
-directive) these will describe the location of the characters corresponding to
+directive), these will describe the location of the characters corresponding to
the token and the location where the token was used (i.e., the macro
expansion point or the location of the ``_Pragma`` itself).
@@ -621,7 +621,7 @@ token. This concept maps directly to the "spelling location" for the token.
.. mostly taken from https://discourse.llvm.org/t/code-ranges-of-tokens-ast-elements/16893/2
Clang represents most source ranges by [first, last], where "first" and "last"
-each point to the beginning of their respective tokens. For example consider
+each point to the beginning of their respective tokens. For example, consider
the ``SourceRange`` of the following statement:
.. code-block:: text
@@ -632,7 +632,7 @@ the ``SourceRange`` of the following statement:
To map from this representation to a character-based representation, the "last"
location needs to be adjusted to point to (or past) the end of that token with
either ``Lexer::MeasureTokenLength()`` or ``Lexer::getLocForEndOfToken()``. For
-the rare cases where character-level source ranges information is needed we use
+the rare cases where character-level source ranges information is needed, we use
the ``CharSourceRange`` class.
The Driver Library
@@ -651,17 +651,17 @@ The Frontend Library
====================
The Frontend library contains functionality useful for building tools on top of
-the Clang libraries, for example several methods for outputting diagnostics.
+the Clang libraries, including several methods for outputting diagnostics.
Compiler Invocation
-------------------
One of the classes provided by the Frontend library is ``CompilerInvocation``,
-which holds information that describe current invocation of the Clang ``-cc1``
+which holds information that describes the current invocation of the Clang ``-cc1``
frontend. The information typically comes from the command line constructed by
the Clang driver or from clients performing custom initialization. The data
structure is split into logical units used by different parts of the compiler,
-for example ``PreprocessorOptions``, ``LanguageOptions`` or ``CodeGenOptions``.
+for example, ``PreprocessorOptions``, ``LanguageOptions``, or ``CodeGenOptions``.
Command Line Interface
----------------------
@@ -698,7 +698,7 @@ Adding new Command Line Option
------------------------------
When adding a new command line option, the first place of interest is the header
-file declaring the corresponding options class (e.g. ``CodeGenOptions.h`` for
+file declaring the corresponding options class (e.g., ``CodeGenOptions.h`` for
command line option that affects the code generation). Create new member
variable for the option value:
@@ -739,7 +739,7 @@ The helper classes take a list of acceptable prefixes of the option (e.g.
Then, specify additional attributes via mix-ins:
* ``HelpText`` holds the text that will be printed besides the option name when
- the user requests help (e.g. via ``clang --help``).
+ the user requests help (e.g., via ``clang --help``).
* ``Group`` specifies the "category" of options this option belongs to. This is
used by various tools to categorize and sometimes filter options.
* ``Flags`` may contain "tags" associated with the option. These may affect how
@@ -779,7 +779,7 @@ use them to construct the ``-cc1`` job:
}
The last step is implementing the ``-cc1`` command line argument
-parsing/generation that initializes/serializes the option class (in our case
+parsing/generation that initializes/serializes the option class (in our case,
``CodeGenOptions``) stored within ``CompilerInvocation``. This can be done
automatically by using the marshalling annotations on the option definition:
@@ -946,13 +946,13 @@ described below. All of them take a key path argument and possibly other
information required for parsing or generating the command line argument.
**Note:** The marshalling infrastructure is not intended for driver-only
-options. Only options of the ``-cc1`` frontend need to be marshalled to/from
+options. Only options of the ``-cc1`` frontend need to be marshalled to/from a
``CompilerInvocation`` instance.
**Positive Flag**
The key path defaults to ``false`` and is set to ``true`` when the flag is
-present on command line.
+present on the command line.
.. code-block:: text
@@ -963,7 +963,7 @@ present on command line.
**Negative Flag**
The key path defaults to ``true`` and is set to ``false`` when the flag is
-present on command line.
+present on the command line.
.. code-block:: text
@@ -1041,7 +1041,7 @@ and the result is assigned to the key path on success.
The key path defaults to the value specified in ``MarshallingInfoEnum`` prefixed
by the contents of ``NormalizedValuesScope`` and ``::``. This ensures correct
-reference to an enum case is formed even if the enum resides in different
+reference to an enum case is formed even if the enum resides in a different
namespace or is an enum class. If the value present on the command line does not
match any of the comma-separated values from ``Values``, an error diagnostic is
issued. Otherwise, the corresponding element from ``NormalizedValues`` at the
@@ -1410,7 +1410,7 @@ or a clear engineering tradeoff -- should desugar minimally and wrap the result
in a construct representing the original source form.
For example, ``CXXForRangeStmt`` directly represents the syntactic form of a
-range-based for statement, but also holds a semantic representation of the
+range-based for statement but also holds a semantic representation of the
range declaration and iterator declarations. It does not contain a
fully-desugared ``ForStmt``, however.
@@ -1425,7 +1425,7 @@ with the same or similar semantics.
The ``Type`` class and its subclasses
-------------------------------------
-The ``Type`` class (and its subclasses) are an important part of the AST.
+The ``Type`` class (and its subclasses) is an important part of the AST.
Types are accessed through the ``ASTContext`` class, which implicitly creates
and uniques them as they are needed. Types have a couple of non-obvious
features: 1) they do not capture type qualifiers like ``const`` or ``volatile``
@@ -1474,7 +1474,7 @@ various operators (for example, the type of ``*Y`` is "``foo``", not
is an instance of the ``TypedefType`` class, which indicates that the type of
these expressions is a typedef for "``foo``".
-Representing types like this is great for diagnostics, because the
+Representing types like this is great for diagnostics because the
user-specified type is always immediately available. There are two problems
with this: first, various semantic checks need to make judgements about the
*actual structure* of a type, ignoring typedefs. Second, we need an efficient
@@ -1521,7 +1521,7 @@ know it exists. To continue the example, the result type of the indirection
operator is the pointee type of the subexpression. In order to determine the
type, we need to get the instance of ``PointerType`` that best captures the
typedef information in the program. If the type of the expression is literally
-a ``PointerType``, we can return that, otherwise we have to dig through the
+a ``PointerType``, we can return that; otherwise, we have to dig through the
typedefs to find the pointer type. For example, if the subexpression had type
"``foo*``", we could return that type as the result. If the subexpression had
type "``bar``", we want to return "``foo*``" (note that we do *not* want
@@ -1552,7 +1552,7 @@ that sets a bit), and remove one or more type qualifiers (just return a
``QualType`` with the bitfield set to empty).
Further, because the bits are stored outside of the type itself, we do not need
-to create duplicates of types with different sets of qualifiers (i.e. there is
+to create duplicates of types with different sets of qualifiers (i.e., there is
only a single heap allocated "``int``" type: "``const int``" and "``volatile
const int``" both point to the same heap allocated "``int``" type). This
reduces the heap size used to represent bits and also means we do not have to
@@ -1972,7 +1972,7 @@ and optimize code for it, but it's used as parsing continues to detect further
errors in the input. Clang-based tools also depend on such ASTs, and IDEs in
particular benefit from a high-quality AST for broken code.
-In presence of errors, clang uses a few error-recovery strategies to present the
+In the presence of errors, clang uses a few error-recovery strategies to present the
broken code in the AST:
- correcting errors: in cases where clang is confident about the fix, it
@@ -1981,7 +1981,7 @@ broken code in the AST:
provide more accurate subsequent diagnostics. Typo correction is a typical
example.
- representing invalid node: the invalid node is preserved in the AST in some
- form, e.g. when the "declaration" part of the declaration contains semantic
+ form, e.g., when the "declaration" part of the declaration contains semantic
errors, the Decl node is marked as invalid.
- dropping invalid node: this often happens for errors that we don’t have
graceful recovery. Prior to Recovery AST, a mismatched-argument function call
@@ -1994,9 +1994,9 @@ for broken code.
Recovery AST
^^^^^^^^^^^^
-The idea of Recovery AST is to use recovery nodes which act as a placeholder to
+The idea of Recovery AST is to use recovery nodes, which act as a placeholder to
maintain the rough structure of the parsing tree, preserve locations and
-children but have no language semantics attached to them.
+children, but have no language semantics attached to them.
For example, consider the following mismatched function call:
@@ -2031,10 +2031,10 @@ With Recovery AST, the AST looks like:
`-DeclRefExpr <col:9> 'int' lvalue ParmVar 'abc' 'int'
-An alternative is to use existing Exprs, e.g. CallExpr for the above example.
-This would capture more call details (e.g. locations of parentheses) and allow
+An alternative is to use existing Exprs, e.g., CallExpr for the above example.
+This would capture more call details (e.g., locations of parentheses) and allow
it to be treated uniformly with valid CallExprs. However, jamming the data we
-have into CallExpr forces us to weaken its invariants, e.g. arg count may be
+have into CallExpr forces us to weaken its invariants, e.g., arg count may be
wrong. This would introduce a huge burden on consumers of the AST to handle such
"impossible" cases. So when we're representing (rather than correcting) errors,
we use a distinct recovery node type with extremely weak invariants instead.
@@ -2048,7 +2048,7 @@ Types and dependence
^^^^^^^^^^^^^^^^^^^^
``RecoveryExpr`` is an ``Expr``, so it must have a type. In many cases the true
-type can't really be known until the code is corrected (e.g. a call to a
+type can't really be known until the code is corrected (e.g., a call to a
function that doesn't exist). And it means that we can't properly perform type
checks ...
[truncated]
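Note for readers of the format-string hunks: the patched text says that arguments are referenced by ``%0`` .. ``%9`` in any order and that ``%select`` picks a numbered option from an integer argument. The following is a minimal, self-contained sketch of that behavior. It is not Clang's implementation; `DiagArg`, `renderPlain`, and `formatDiag` are hypothetical names introduced only for illustration, and only plain `%N` and `%select{...}N` are handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical argument type: an integer or a string (assumption for this sketch).
using DiagArg = std::variant<int, std::string>;

// Render an argument that has no formatting instructions.
static std::string renderPlain(const DiagArg &Arg) {
  if (const int *I = std::get_if<int>(&Arg))
    return std::to_string(*I);
  return std::get<std::string>(Arg);
}

// Hypothetical helper: expands "%N" and "%select{a|b|c}N" in Format.
std::string formatDiag(const std::string &Format,
                       const std::vector<DiagArg> &Args) {
  const std::string Select = "select{";
  std::string Out;
  for (std::size_t I = 0; I < Format.size(); ++I) {
    if (Format[I] != '%') {
      Out += Format[I];
      continue;
    }
    ++I; // Skip the '%'.
    if (Format.compare(I, Select.size(), Select) == 0) {
      I += Select.size();
      std::size_t Close = Format.find('}', I);
      assert(Close != std::string::npos && Close + 1 < Format.size());
      // Split the text between the braces on '|'.
      std::vector<std::string> Options(1);
      for (std::size_t J = I; J < Close; ++J) {
        if (Format[J] == '|')
          Options.emplace_back();
        else
          Options.back() += Format[J];
      }
      // The digit after '}' names the integer argument that selects an option.
      assert(std::isdigit(static_cast<unsigned char>(Format[Close + 1])));
      std::size_t ArgNo = static_cast<std::size_t>(Format[Close + 1] - '0');
      int Choice = std::get<int>(Args.at(ArgNo));
      Out += Options.at(static_cast<std::size_t>(Choice));
      I = Close + 1; // The loop increment moves past the digit.
    } else {
      // Plain "%N": substitute argument N as a string.
      assert(I < Format.size() &&
             std::isdigit(static_cast<unsigned char>(Format[I])));
      Out += renderPlain(Args.at(static_cast<std::size_t>(Format[I] - '0')));
    }
  }
  return Out;
}
```

With this sketch, `formatDiag("%1 %0", ...)` on the arguments `"a"` and `"b"` yields `b a`, and a format string containing `%select{unary|binary|unary or binary}0` with the integer argument 1 selects `binary`.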
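Note for readers of the ``QualType`` hunk: the patched paragraph says qualifier bits are stored outside the type itself, so "``const int``" and "``volatile const int``" share a single heap-allocated "``int``". One way to get that property is to pack the qualifier bits into the unused low bits of an aligned pointer. The sketch below assumes that layout; `TypeNode`, `Qualifier`, and `QualTypeSketch` are hypothetical names, not Clang's classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a uniqued, heap-allocated type node.
// The 8-byte alignment guarantees the low three pointer bits are zero.
struct alignas(8) TypeNode {
  const char *Name;
};

// Hypothetical qualifier bits (assumption for this sketch).
enum Qualifier : std::uintptr_t { Const = 1, Volatile = 2, Restrict = 4 };

class QualTypeSketch {
  std::uintptr_t Bits; // Pointer in the high bits, qualifiers in the low three.
  static constexpr std::uintptr_t Mask = 7;

public:
  explicit QualTypeSketch(const TypeNode *T, std::uintptr_t Quals = 0)
      : Bits(reinterpret_cast<std::uintptr_t>(T) | Quals) {
    assert((reinterpret_cast<std::uintptr_t>(T) & Mask) == 0);
    assert((Quals & ~Mask) == 0);
  }

  // The underlying node is shared by every qualified variant.
  const TypeNode *getTypePtr() const {
    return reinterpret_cast<const TypeNode *>(Bits & ~Mask);
  }

  bool isConstQualified() const { return (Bits & Const) != 0; }
  bool isVolatileQualified() const { return (Bits & Volatile) != 0; }

  // Adding a qualifier only sets a bit; no new node is allocated.
  QualTypeSketch withConst() const {
    return QualTypeSketch(getTypePtr(), (Bits & Mask) | Const);
  }

  // Removing all qualifiers returns the same pointer with empty bits.
  QualTypeSketch getUnqualifiedType() const {
    return QualTypeSketch(getTypePtr());
  }
};
```

Because the wrapper is a single pointer-sized value, it stays cheap to copy, and comparing the underlying node pointers answers "same type ignoring qualifiers" without any lookup.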
zyn0217
approved these changes
Oct 18, 2025
Thanks