CHANGES.txt

0.8.20 - 2023-09-14

 - For std::move generated code, don't rely on #include <type_traits> but
   specifically #include <utility> (fixes compile failure on later
   gcc compilers.)

0.8.18 - 2023-09-13

 - IMPORTANT: UTF-8 is now the default encoding. This is a breaking change.
   To specify the old RAW behavior, use the new --x-raw flag.
   The --x-utf8 flag is now deprecated and will be removed in a future
   release; for now it will issue a deprecation warning message.
   Please see the https://carburetta.com/manual.html#unicode-support section
   of the manual, in particular "Future direction" for the intended
   future of raw mode (the way it handles line endings will change.)

 - Fix for plain %type with no %constructor, %destructor or %move, the
   data would not have been moved correctly internally from the generated 
   scanner to the generated parser. Work-around is easy (use %class if on
   C++ or add any of %constructor, %destructor and %move if on C) but the
   tragedy is you'd hit this on the simplest of examples, and not know
   what's going on.

 - Allow %% .. %% code snippet sections when inside a scanner section's 
   <MODE> { ... } -- this was always the intent, however, the "%%" section
   delimiter was (like all section delimiters) seen as the end of the 
   preceeding section, implying that the grammar of that preceeding section
   must have completed (i.e. no open braces, no open parentheses, etc.)
   Fixed by leaving grammar and scanner sections open until the end of the 
   file.

 - New "Tilly" example, this is a work in progress. Tilly is a tile matching
   engine which is useful if you're implementing a compiler backend and have
   to select machine instructions to match your expression trees, especially
   for CISC type architectures. The Tilly example is a work in progress, it
   implements the algorithm from the 1989 paper:
   "Code Generation Using Tree Matching and Dynamic Programming"
   by Alfred V. Aho, Mahadevan Ganapathi and Steven W.K. Tjiang.
   It will take a while for this example to mature (and clean up) - for now
   it is a good example of how to use Carburetta to implement a more complex
   multi-stage parser.

0.8.16 - 2023-07-14

- Critical fixes as per below, please upgrade.

- Improved regression test t4, it previously was incomplete which did not
  surface until a recent change. The test now checks for the correct
  behavior.

- New `%header%` directive, it allows adding code directly to the generated
  header to simplify resolving dependencies your scanners and parsers may
  have.
  Please see the manual at https://carburetta.com/manual.html for details.

- Changed default behavior for `extern "C" { ... }`, it is no longer 
  generated when Carburetta perceives that the scanner or parser is C++
  code (i.e. detected by the presence of `%class` or `%common_class`.)
  Please see the manual at https://carburetta.com/manual.html for details.

- New `%externc` and `%no_externc` directives, to let you directly control
  whether the generated header is wrapped in `extern "C" { ... }` or not;
  overriding any default behavior.
  Please see the manual at https://carburetta.com/manual.html for details.

- Critical fix: `%type` without a `%destructor` would generate broken parsers. 

- Critical fix: Buffer under-allocation in Carburetta itself, this could cause 
  corruption in the generated scanner and is presumed to be the reason why t4 
  previously passed.

- fix when generating --x-utf8 UTF-8 based parser, and <PREFIX>LEXICAL_ERROR 
  was returned (because the scanner could not match any pattern) and the first 
  codepoint was multi-byte, then only the first byte was returned as the
  matching text for the error. Thus the scanner would restart and immediately
  bork over the next subsequent continuation byte. This is now fixed, the
  new (correct) behavior is for it to return the entire codepoint.


0.8.14 - 2023-06-21

- Added regression tests t12 to t15, all of which are C++ tests that test
  common data in the same manner as we already tested symbol data.

- fix regression on moving data, specifying a move snippet for the common 
  type would invoke it from the wrong source location.

- fix for scanner pattern matches without sym but with common data
  such matches would invoke the common data destructor twice, once following
  the pattern action, and possibly again if the stack was reset or
  cleaned up before the next token.

- fix for generated code skipping over initializer for "need_common_move"
  variable (C++ only issue.)

- fix for common move snippets not being able to use `$0` to refer to the 
  source common data (could still use `${0}` but this is not symmetric with
  `%type` -- while `${0}` typically refers to common data and `$0` to sym
  specific data, the subject here is the data type, rather than how it is
  used; so consistency across `%type` and `%common_type` is preferred over
  the distinction between common data and symbol data.)

- `<prefix>token_common_data()`
  Returns the common data associated with the current token. This is useful
  in the specific scenario of a scanner / parser combination where the parser
  runs into a _<PREFIX>SYNTAX_ERROR on the next token. This next token has
  already been scanned (and therefore its common data constructed), but has
  not yet moved into the parser stack. However, to report the error, the
  common data may be very useful. Because there is no way to access it without
  knowing about the implementation details of the scanner, the new
  `<prefix>token_common_data()` function is provided.
  Note that is can return NULL if there is currently no "next" token, which is
  most of the time. It is probably only useful in the specific scenario above,
  and even then only when the parser is not used stand-alone but in combination
  with a scanner.
  Please see the manual at https://carburetta.com/manual.html for details.

- `%common_class` directive
  The `%common_class` directive is the C++ class variation for %common_type,
  it provides the same convenience of use as `%class` but for the common data
  types. 
  Please see the manual at https://carburetta.com/manual.html for details.

0.8.12 - 2023-06-02

- `%class` directive
  The `%class` directive wraps up a few C++ features into a single directive
  that is much more friendly for newcomers. It is a shorthand for the
  `%type`, `%raii_constructor`, `%destructor` and `%move` directives in the
  common scenario where `%type` is used to specify a C++ class.
  Please see the manual at https://carburetta.com/manual.html for details.

- `%move` directive
  `%move` is to be used alongside %constructor and %destructor directives, and
  describes how to move the datatype from $0 to $$.
  The `%move` directive while common for both C and C++, allows better support 
  for C++ classes in particular. If no `%move` directive is specified for a type, 
  then a `memcpy` is performed.
  Please see the manual at https://carburetta.com/manual.html for details.

- preserve whitespace in types, this fixes a problem with C++ where
  `std::int64_t` would be generated as `std : : int64_t` (note the
  spaces) because Carburetta internally did not recognize `::` as
  one token. This is now fixed; no spaces are generated in types anymore,
  so while `::` is still not a token for Carburetta internally, it
  no longer injects spaces.

- The new `%move` directive requires constructors and destructors to
  be called far more religiously than before. This is a good thing,
  but may require existing C code to have to use `%move` where it 
  could rely 

- add t9 C++ tracing failing constructor cleanup test
  This tests that whenever any constructor fail at whatever moment (via `throw`
  or early `return`,) that all the destructors are still called correctly.

- add t10 C++ tracing failing move cleanup test
  This is like t9, but for the `%move` directive; which is also allowed to
  fail at any moment.
  Note that no corresponding test exists for `%destructor` because it is
  never allowed to fail.

- add t11 C++ tracing failing constructor carrying data
  This is like t9 but the test-case has symbols carrying data.

- odd patch levels (e.g. 0.8.11 or the soon to be 0.8.13, but not 0.8.12) are
  development versions, the commandline version information now reflects this.

- Manual updated to 0.8.12, see https://carburetta.com/manual.html for details.

0.8.10 - 2023-05-04

- bring C++ back up from regressions - please be careful to read
  about the limitations in the manual.

- add t8 C++ simple calculator test case to unit/regression tests

- added GNU style --version and --help flags

- clear warnings across MSVC, GCC and Clang on C and C++.

- various important fixes.

- Flag --x-utf8 no longer generates the raw scanner table, the latter
  may be quite substantial and is not used (by definition) when specifying
  --x-utf8. This can be a substantial space savings.

- Manual (carburetta.com/manual.html) updated for 0.8.10, including the
  innards of utf8 operation which was previously missing.

- In memoriam https://en.wikipedia.org/wiki/Remembrance_of_the_Dead

0.8.8 - 2023-04-21

- critical fixes relative to 0.8.6 and unit/regression testing to help 
  prevent such regressions in the future.

- unicode support inside character sets [\u{0}-\u{0x10FFFF}] for instance.

- support for unicode categories as per unicode annex 44 section 5.7.1.
  Unicode categories can be specified as both \p{Letter} (matches only
  letters) and \P{Letter} (matches anything but letters).

- Note that the default is still raw encoding, pass the --x-utf8 flag to
  carburetta to enable utf-8 mode.

0.8.6 - 2023-02-27

- substantial fixes relative to 0.8.4 ; users are encouraged to upgrade if
  not on 0.8.5 already.

- build issues under linux (missing limits.h)

- regular expression use of "." operator (the any character),
  now spans the full unicode range, excluding only newline \xA,
  carriage return \xD, NEXT LINE (NEL) \x85, and LINE SEPARATOR (LS) 
  \x2028. Otherwise any character from \x00 to \x10FFFF is matched.

- $is_mode(), allows checking the current scanner mode from within a 
  pattern action.

- <prefix>mode(), returns the current scanner mode to calling code.

- -x--utf8 - enables utf8 mode for the scanner. This is an experimental
  function that enables reading UTF-8 as input. Note that, while the
  core lexing engine can handle UTF-8, the regular expression parser
  is still missing substantial features (so, for instance, while a
  unicode literal like \u2028 can be used in a pattern, it cannot be
  used inside a character class, e.g. [\u2028] is not yet valid.).
  Eventually the intent is for utf8 mode to be the default and a raw,
  byte oriented, mode to be the flag selectable mode.


0.8.4 - 2021-04-09

- inireader example - reads .ini files using Carburetta in a scanner-only
  fashion. Demonstrates  how to build a  library-like  encapsulation of a
  carburetta  scanner/parser where  the user-code  retains control of the 
  flow.

- the template_scan example, showing how to use only the scanner section,
  without a grammar section. The  example takes input and substitutes the
  django like escaped fields like {{ this }} for files passed by argument
  like "--this file".

- fix: <prefix>scan() header prototype was broken if no additional params
  were passed in.

- fix: if, for whatever  reason, the input has  no epilogue section, both
  the scanner and  grammar sections should still  complete their parsing.
  (I.e. know  they've reached the  end of their inputs.)  Previously, the
  last line / directive / production could be ignored.

0.8.2 - 2021-04-06

- $chg_term() -- allows  changing the  terminal currently  matched in the
  scanner prior to that matched terminal entering the parser.
  This is a fairly advanced  feature that can  be hazardous, nevertheless
  some languages really need it (C).

- $set_mode() -- allows setting the current  scanner mode using a special
  identifier from inside a pattern  action. This avoids the need of using
  prefixes (for  both the  <prefix>set_mode  function and  its M_<PREFIX>
  constants.

- Better error location management, in particular when the error does not
  have any input associated with it (eg. for instance, "unexpected end of
  input" on any of the sections, has no associated text and yet can occur
  frequently if, e.g. a brace is mismatched.)

- Scannerless generation fix; previously had trouble generating without a 
  scanner but this is a supported method.

- Minor cleanups in code and NOTICE file.


0.8.0 - 2021-03-22 - Initial Release.