From 511bf66dbe58893f2a0a9b1eb436afc7a8c24585 Mon Sep 17 00:00:00 2001 From: Petr Skocik Date: Tue, 4 Apr 2017 15:08:00 +0200 Subject: [PATCH] more manpage fixes --- re2c/doc/manpage.rst.in | 184 ++++++++++++++++++++-------------------- 1 file changed, 90 insertions(+), 94 deletions(-) diff --git a/re2c/doc/manpage.rst.in b/re2c/doc/manpage.rst.in index 1529146c7..9555d1a9c 100644 --- a/re2c/doc/manpage.rst.in +++ b/re2c/doc/manpage.rst.in @@ -41,7 +41,7 @@ OPTIONS ``-d --debug-output`` Creates a parser that dumps information about the current position and the state the parser is in. - This is useful to debug parser issues and states. If you use this + This is useful for debugging parser issues and states. If you use this switch, you need to define a ``YYDEBUG`` macro, which will be called like a function with two parameters: ``void YYDEBUG (int state, char current)``. The first parameter receives the state or ``-1`` and the second parameter @@ -55,8 +55,8 @@ OPTIONS ``-e --ecb`` Generate a parser that supports EBCDIC. The generated code can deal with any character up to 0xFF. In this mode, ``re2c`` assumes - that input character size is 1 byte. This switch is incompatible with - ``-w``, ``-x``, ``-u`` and ``-8``. + an input character size of 1 byte. This switch is incompatible with + ``-w``, ``-x``, ``-u``, and ``-8``. ``-f --storable-state`` Generate a scanner with support for storable state. @@ -64,22 +64,22 @@ OPTIONS ``-F --flex-syntax`` Partial support for flex syntax. When this flag is active, named definitions must be surrounded by curly braces and - can be defined without an equal sign and the terminating semi colon. + can be defined without an equal sign and the terminating semicolon. Instead, names are treated as direct double quoted strings. ``-g --computed-gotos`` Generate a scanner that utilizes GCC's - computed goto feature. 
That is, ``re2c`` generates jump tables whenever a - decision is of a certain complexity (e.g., a lot of if conditions are + computed-goto feature. That is, ``re2c`` generates jump tables whenever a + decision is of certain complexity (e.g., a lot of if conditions would be otherwise necessary). This is only usable with compilers that support this feature. Note that this implies ``-b`` and that the complexity threshold can be configured - using the inplace configuration ``cgoto:threshold``. + using the ``cgoto:threshold`` inplace configuration. ``-i --no-debug-info`` Do not output ``#line`` information. This is - useful when you want use a CMS tool with the ``re2c`` output which you - might want if you do not require your users to have ``re2c`` themselves - when building from your source. + useful when you want to use a CMS tool with ``re2c``'s output. You might + want to do this if you do not want to impose re2c as a build requirement + for your source. ``-o OUTPUT --output=OUTPUT`` Specify the ``OUTPUT`` file. @@ -89,9 +89,9 @@ OPTIONS In this mode, no ``/*!re2c */`` block and exactly one ``/*!rules:re2c */`` must be present. The rules are saved and used by every ``/*!use:re2c */`` block that follows. These blocks can contain inplace configurations, especially ``re2c:flags:e``, - ``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``. + ``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u``, and ``re2c:flags:8``. That way it is possible to create the same scanner multiple times for - different character types, different input mechanisms or different output mechanisms. + different character types, different input mechanisms, or different output mechanisms. The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended to the set of rules in ``/*!rules:re2c */``. @@ -107,41 +107,41 @@ OPTIONS ``-u --unicode`` Generate a parser that supports UTF-32. The generated code can deal with any valid Unicode character up to 0x10FFFF. 
In this - mode ``re2c`` assumes that input character size is 4 bytes. This switch is - incompatible with ``-e``, ``-w``, ``-x`` and ``-8``. This implies ``-s``. + mode, ``re2c`` assumes an input character size of 4 bytes. This switch is + incompatible with ``-e``, ``-w``, ``-x``, and ``-8``. This implies ``-s``. ``-v --version`` Show version information. ``-V --vernum`` - Show the version as a number XXYYZZ. + Show the version as a number in the MMmmpp (Major, minor, patch) format. ``-w --wide-chars`` Generate a parser that supports UCS-2. The generated code can deal with any valid Unicode character up to 0xFFFF. - In this mode ``re2c`` assumes that input character size is 2 bytes. This - switch is incompatible with ``-e``, ``-x``, ``-u`` and ``-8``. This implies + In this mode, ``re2c`` assumes an input character size of 2 bytes. This + switch is incompatible with ``-e``, ``-x``, ``-u``, and ``-8``. This implies ``-s``. ``-x --utf-16`` Generate a parser that supports UTF-16. The generated code can deal with any valid Unicode character up to 0x10FFFF. In this - mode ``re2c`` assumes that input character size is 2 bytes. This switch is - incompatible with ``-e``, ``-w``, ``-u`` and ``-8``. This implies ``-s``. + mode, ``re2c`` assumes an input character size of 2 bytes. This switch is + incompatible with ``-e``, ``-w``, ``-u``, and ``-8``. This implies ``-s``. ``-8 --utf-8`` Generate a parser that supports UTF-8. The generated code can deal with any valid Unicode character up to 0x10FFFF. In this - mode ``re2c`` assumes that input character size is 1 byte. This switch is - incompatible with ``-e``, ``-w``, ``-x`` and ``-u``. + mode, ``re2c`` assumes an input character size of 1 byte. This switch is + incompatible with ``-e``, ``-w``, ``-x``, and ``-u``. ``--case-insensitive`` - All strings are case insensitive, so all - "-expressions are treated in the same way '-expressions are. + Makes all strings case insensitive. 
This makes + "-quoted expressions behave as '-quoted expressions. ``--case-inverted`` Invert the meaning of single and double quoted - strings. With this switch single quotes are case sensitive and double + strings. With this switch, single quotes are case sensitive and double quotes are case insensitive. ``--no-generation-date`` @@ -156,16 +156,15 @@ OPTIONS ``--encoding-policy POLICY`` Specify how ``re2c`` must treat Unicode surrogates. ``POLICY`` can be one of the following: ``fail`` (abort with - error when surrogate encountered), ``substitute`` (silently substitute - surrogate with error code point 0xFFFD), ``ignore`` (treat surrogates as - normal code points). By default ``re2c`` ignores surrogates (for backward - compatibility). Unicode standard says that standalone surrogates are + an error when a surrogate is encountered), ``substitute`` (silently replace + surrogates with the error code point 0xFFFD), ``ignore`` (treat surrogates as + normal code points). By default, ``re2c`` ignores surrogates (for backward + compatibility). The Unicode standard says that standalone surrogates are invalid code points, but different libraries and programs treat them differently. ``--input INPUT`` - Specify re2c input API. ``INPUT`` can be one of the - following: ``default``, ``custom``. + Specify re2c's input API. ``INPUT`` can be either ``default`` or ``custom``. ``-S --skeleton`` Instead of embedding re2c-generated code into C/C++ @@ -173,21 +172,22 @@ OPTIONS for correctness and performance testing. ``--empty-class POLICY`` - What to do if user inputs empty character + What to do if the user uses an empty character class. ``POLICY`` can be one of the following: ``match-empty`` (match empty input: pretty illogical, but this is the default for backwards - compatibility reason), ``match-none`` (fail to match on any input), + compatibility reasons), ``match-none`` (fail to match on any input), ``error`` (compilation error). 
Note that there are various ways to - construct empty class, e.g: [], [^\\x00-\\xFF], + construct an empty class, e.g., [], [^\\x00-\\xFF], [\\x00-\\xFF][\\x00-\\xFF]. ``--dfa-minimization `` - Internal algorithm used by re2c to minimize the DFA (defaults to ``moore``). - Both table filling and Moore's algorithms should produce the same DFA (up to states relabelling). + The internal algorithm used by re2c to minimize the DFA (defaults to ``moore``). + Both the table filling algorithm and the Moore algorithm should produce the same DFA (up to states relabeling). The table filling algorithm is much simpler and slower; it serves as a reference implementation. + ``-1 --single-pass`` - Deprecated and does nothing (single pass is by default now). + Deprecated. Does nothing (single pass is the default now). .. ./gh-pages-gen/src/manual/warnings/warnings_general.rst @@ -196,8 +196,8 @@ OPTIONS ``-Werror`` Turn warnings into errors. Note that this option alone - doesn't turn on any warnings; it only affects those warnings that have - been turned on so far or will be turned on later. + doesn't turn on any warnings at all; it only affects those warnings that have + been turned on so far or those that will be turned on later. ``-W`` Turn on a ``warning``. @@ -216,33 +216,33 @@ OPTIONS ``-Wcondition-order`` Warn if the generated program makes implicit - assumptions about condition numbering. One should use either ``-t, --type-header`` option or - ``/*!types:re2c*/`` directive to generate mapping of condition names to numbers and use - autogenerated condition names. + assumptions about condition numbering. You should use either the ``-t, --type-header`` option or + the ``/*!types:re2c*/`` directive to generate a mapping of condition names to numbers and then use + the autogenerated condition names. ``-Wempty-character-class`` - Warn if regular expression contains empty - character class. 
From the rational point of view trying to match empty + Warn if a regular expression contains an empty + character class. Rationally, trying to match an empty character class makes no sense: it should always fail. However, for - backwards compatibility reasons ``re2c`` allows empty character class and - treats it as empty string. Use ``--empty-class`` option to change default + backwards compatibility reasons, ``re2c`` allows empty character classes and + treats them as empty strings. Use the ``--empty-class`` option to change the default behavior. ``-Wmatch-empty-string`` - Warn if regular expression in a rule is - nullable (matches empty string). If DFA runs in a loop and empty match - is unintentional (input position in not advanced manually), lexer may - get stuck in eternal loop. + Warn if a regular expression in a rule is + nullable (matches an empty string). If the DFA runs in a loop and an empty match + is unintentional (the input position is not advanced manually), the lexer may + get stuck in an infinite loop. ``-Wswapped-range`` - Warn if range lower bound is greater that upper - bound. Default ``re2c`` behavior is to silently swap range bounds. + Warn if the lower bound of a range is greater than its upper + bound. The default behavior is to silently swap the range bounds. ``-Wundefined-control-flow`` Warn if some input strings cause undefined - control flow in lexer (the faulty patterns are reported). This is the - most dangerous and common mistake. It can be easily fixed by adding - default rule ``*`` (this rule has the lowest priority, matches any code unit and consumes + control flow in the lexer (the faulty patterns are reported). This is the + most dangerous and most common mistake. It can be easily fixed by adding + the default rule (``*``) (this rule has the lowest priority, matches any code unit, and consumes exactly one code unit). 
``-Wunreachable-rules`` @@ -250,8 +250,8 @@ OPTIONS ``-Wuseless-escape`` Warn if a symbol is escaped when it shouldn't be. - By default re2c silently ignores escape, but this may as well indicate a - typo or an error in escape sequence. + By default, re2c silently ignores such escapes, but this may as well indicate a + typo or error in the escape sequence. INTERFACE CODE -------------- @@ -289,7 +289,7 @@ depends on the particular use case. ``YYDEBUG (state, current)`` This is only needed if the ``-d`` flag was - specified. It allows to easily debug the generated parser by calling a + specified. It allows easy debugging of the generated parser by calling a user defined function for every state. The function should have the following signature: ``void YYDEBUG (int state, char current)``. The first parameter receives the state or -1 and the second parameter receives the @@ -301,13 +301,13 @@ depends on the particular use case. provided. ``YYFILL (n)`` should adjust ``YYCURSOR``, ``YYLIMIT``, ``YYMARKER``, and ``YYCTXMARKER`` as needed. Note that for typical programming languages ``n`` will be the length of the longest keyword plus one. The user can - place a comment of the form ``/*!max:re2c*/`` to insert ``YYMAXFILL`` definition that is set to the maximum + place a comment of the form ``/*!max:re2c*/`` to insert a ``YYMAXFILL`` define set to the maximum length value. ``YYGETCONDITION ()`` This define is used to get the condition prior to entering the scanner code when using the ``-c`` switch. The value must be - initialized with a value from the enumeration ``YYCONDTYPE`` type. + initialized with a value from the ``YYCONDTYPE`` enumeration type. ``YYGETSTATE ()`` The user only needs to define this macro if the ``-f`` @@ -320,15 +320,15 @@ depends on the particular use case. ``YYFILL (n)`` was called. 
``YYLIMIT`` - Expression of type ``YYCTYPE *`` that marks the end of the buffer ``YYLIMIT[-1]`` + An expression of type ``YYCTYPE *`` that marks the end of the buffer (``YYLIMIT[-1]`` is the last character in the buffer). The generated code repeatedly compares ``YYCURSOR`` to ``YYLIMIT`` to determine when the buffer needs (re)filling. ``YYMARKER`` - l-value of type ``YYCTYPE *``. + An l-value of type ``YYCTYPE *``. The generated code saves backtracking information in ``YYMARKER``. Some - easy scanners might not use this. + simple scanners might not use this. ``YYMAXFILL`` This will be automatically defined by ``/*!max:re2c*/`` blocks as explained above. @@ -354,7 +354,7 @@ depends on the particular use case. SYNTAX ------ -Code for ``re2c`` consists of a set of ``RULES``, ``NAMED DEFINITIONS`` and +Code for ``re2c`` consists of a set of ``RULES``, ``NAMED DEFINITIONS``, and ``INPLACE CONFIGURATIONS``. @@ -525,11 +525,11 @@ INPLACE CONFIGURATIONS ``re2c:yych:conversion = 0;`` When this setting is non zero, ``re2c`` automatically generates - conversion code whenever yych gets read. In this case the type must be + conversion code whenever yych gets read. In this case, the type must be defined using ``re2c:define:YYCTYPE``. ``re2c:yych:emit = 1;`` - The generation of *yych* can be suppressed by setting this to 0. + Set this to zero to suppress the generation of *yych*. ``re2c:yybm:hex = 0;`` If set to zero, a decimal table will be used. Otherwise, a hexadecimal table will be generated. @@ -540,7 +540,7 @@ INPLACE CONFIGURATIONS introduce several security issues to your program. ``re2c:yyfill:check = 1;`` - This can be set to 0 to suppress the generations of + This can be set to 0 to suppress the generation of ``YYCURSOR`` and ``YYLIMIT`` based precondition checks. This option is useful when ``YYLIMIT + YYMAXFILL`` is always accessible. 
@@ -791,12 +791,13 @@ REGULAR EXPRESSIONS Character classes and string literals may contain octal or hexadecimal character definitions and the following set of escape sequences: ``\a``, ``\b``, ``\f``, ``\n``, ``\r``, ``\t``, ``\v``, ``\\``. An octal character is defined by a backslash -followed by its three octal digits (e.g. ``\377``). -Hexadecimal characters from 0 to 0xFF are defined by backslash, a lower -cased ``x`` and two hexadecimal digits (e.g. ``\x12``). Hexadecimal characters from 0x100 to 0xFFFF are defined by backslash, a lower cased -``\u`` or an upper cased ``\X`` and four hexadecimal digits (e.g. ``\u1234``). -Hexadecimal characters from 0x10000 to 0xFFFFffff are defined by backslash, an upper cased ``\U`` -and eight hexadecimal digits (e.g. ``\U12345678``). +followed by its three octal digits (e.g., ``\377``). +Hexadecimal characters from 0 to 0xFF are defined by a backslash, a lower +case ``x`` and two hexadecimal digits (e.g., ``\x12``). Hexadecimal characters from 0x100 to 0xFFFF are defined by a backslash, a lower case +``\u`` or an upper case ``\X``, and four hexadecimal digits (e.g., ``\u1234``). +Hexadecimal characters from 0x10000 to 0xFFFFffff are defined by a backslash, an upper case ``\U``, +and eight hexadecimal digits (e.g., ``\U12345678``). + The only portable "any" rule is the default rule, ``*``. @@ -804,20 +805,21 @@ The only portable "any" rule is the default rule, ``*``. SCANNER WITH STORABLE STATES ---------------------------- +.. ./gh-pages-gen/src/manual/features/state/state.rst + When the ``-f`` flag is specified, ``re2c`` generates a scanner that can -store its current state, return to the caller, and later resume +store its current state, return to its caller, and later resume operations exactly where it left off. -The default operation of ``re2c`` is a -"pull" model, where the scanner asks for extra input whenever it needs it. 
However, this mode of operation assumes that the scanner is the "owner" -the parsing loop, and that may not always be convenient. +The default mode of operation in ``re2c`` is a +"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner" of the parsing loop, and that may not always be convenient. Typically, if there is a preprocessor ahead of the scanner in the -stream, or for that matter any other procedural source of data, the +stream, or for that matter, any other procedural source of data, the scanner cannot "ask" for more data unless both the scanner and the source live in separate threads. -The ``-f`` flag is useful for exactly for situations like that: it lets users design +The ``-f`` flag is useful exactly for situations like that: it lets users design scanners that work in a "push" model, i.e., a model where data is fed to the scanner chunk by chunk. When the scanner runs out of data to consume, it stores its state and returns to the caller. When more input data is @@ -828,13 +830,13 @@ Changes needed compared to the "pull" model: * The user has to supply macros named ``YYSETSTATE ()`` and ``YYGETSTATE (state)``. * The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``, so the - user has to declare them. Also the user has to save and restore them. + user has to declare them and save and restore them where required. In the ``examples/push_model/push.re`` example, these are declared as fields of a (C++) class of which the scanner is a method, so they do not need to be saved/restored explicitly. For C, they could, e.g., be made macros that select fields from a structure passed in as a parameter. Alternatively, they could be declared as local variables, saved with - ``YYFILL (n)`` when it decides to return and restore upon entering the + ``YYFILL (n)`` when it decides to return and restored upon entering the function. 
Also, it could be more efficient to save the state from ``YYFILL (n)`` because ``YYSETSTATE (state)`` is called unconditionally. ``YYFILL (n)`` however does not get ``state`` as a parameter, so we would have @@ -845,7 +847,7 @@ Changes needed compared to the "pull" model: * Modify the caller to recognize if more input is needed and respond appropriately. * The generated code will contain a switch block that is used to - restores the last state by jumping behind the corresponding ``YYFILL (n)`` + restore the last state by jumping behind the corresponding ``YYFILL (n)`` call. This code is automatically generated in the epilogue of the first ``/*!re2c */`` block. It is possible to trigger generation of the ``YYGETSTATE ()`` block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially useful when the scanner code should be @@ -872,7 +874,7 @@ There are two special rule types. First, the rules of the condition ``<*>`` are merged to all conditions (note that they have a lower priority than other rules of that condition). And second, the empty condition list allows to provide a code block that does not have a scanner part, -meaning it does not allow any regular expression. The condition value +meaning it does not allow any regular expressions. The condition value referring to this special block is always the one with the enumeration value 0. This way the code of this special rule can be used to initialize a scanner. It is in no way necessary to have these rules: but @@ -883,14 +885,14 @@ Non empty rules allow to specify the new condition, which makes them transition rules. Besides generating calls for the ``YYSETCONDTITION`` define, no other special code is generated. -There is another kind of special rules that allows to prepend code to any +There is another kind of special rule that allows to prepend code to any code block of all rules of a certain set of conditions or to all code -blocks to all rules. 
This can be helpful when some operation is common +blocks of all rules. This can be helpful when some operation is common among rules. For instance, this can be used to store the length of the scanned string. These special setup rules start with an exclamation mark followed by either a list of conditions ```` or a star ````. When ``re2c`` generates the code for a rule whose state does not have a -setup rule and a starred setup rule is present, that code will be +setup rule and a starred setup rule is present, the starred setup code will be used as setup code. @@ -941,20 +943,14 @@ of code units. * UTF-8 is a variable-length encoding. Its code space includes all Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One - code point is represented with sequence of one, two, three, or four - 1-byte code units. Size of ``YYCTYPE`` must be 1 byte. + code point is represented with a sequence of one, two, three, or four + 1-byte code units. The size of ``YYCTYPE`` must be 1 byte. In Unicode, values from range 0xD800 to 0xDFFF (surrogates) are not valid Unicode code points. Any encoded sequence of code units that would map to Unicode code points in the range 0xD800-0xDFFF, is ill-formed. The user can control how ``re2c`` treats such ill-formed -sequences with the ``--encoding-policy `` flag. - -For some encodings, there are code units that never occur in a valid -encoded stream (e.g., 0xFF byte in UTF-8). If the generated scanner must -check for invalid input, the only correct way to do so is to use the default -rule (``*``). Note that the full range rule (``[^]``) won't catch invalid code units when a variable-length encoding is used -(``[^]`` means "any valid code point", whereas the default rule (``*``) means "any possible code unit"). +sequences with the ``--encoding-policy `` switch. GENERIC INPUT API