0.8.20 - 2023-09-14
- For generated code that uses std::move, don't rely on #include <type_traits>
but specifically #include <utility> (fixes a compile failure on later
gcc compilers.)
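To illustrate the point above: std::move is declared in <utility>, so code
that calls it should include that header directly rather than hope another
header pulls it in. The following is a minimal sketch, not Carburetta's
generated code; the helper name take() is hypothetical.

```cpp
#include <string>
#include <utility>  // declares std::move; <type_traits> alone does not

// Hypothetical helper (not Carburetta output): moves the contents of
// src into the returned string.
std::string take(std::string &src) {
  return std::move(src);
}
```

After the call, src is left in a valid but unspecified state, which is the
usual contract for a moved-from standard library object.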
0.8.18 - 2023-09-13
- IMPORTANT: UTF-8 is now the default encoding. This is a breaking change.
To specify the old RAW behavior, use the new --x-raw flag.
The --x-utf8 flag is now deprecated and will be removed in a future
release; for now it will issue a deprecation warning message.
Please see the https://carburetta.com/manual.html#unicode-support section
of the manual, in particular "Future direction" for the intended
future of raw mode (the way it handles line endings will change.)
- Fix for plain %type with no %constructor, %destructor or %move, the
data would not have been moved correctly internally from the generated
scanner to the generated parser. Work-around is easy (use %class if on
C++ or add any of %constructor, %destructor and %move if on C) but the
tragedy is you'd hit this on the simplest of examples, and not know
what's going on.
- Allow %% .. %% code snippet sections when inside a scanner section's
<MODE> { ... } -- this was always the intent, however, the "%%" section
delimiter was (like all section delimiters) seen as the end of the
preceding section, implying that the grammar of that preceding section
must have completed (i.e. no open braces, no open parentheses, etc.)
Fixed by leaving grammar and scanner sections open until the end of the
file.
- New "Tilly" example. Tilly is a tile matching engine which is useful if
you're implementing a compiler backend and have to select machine
instructions to match your expression trees, especially for CISC type
architectures. The example is a work in progress; it implements the
algorithm from the 1989 paper:
"Code Generation Using Tree Matching and Dynamic Programming"
by Alfred V. Aho, Mahadevan Ganapathi and Steven W.K. Tjiang.
It will take a while for this example to mature (and be cleaned up); for now
it is a good example of how to use Carburetta to implement a more complex
multi-stage parser.
0.8.16 - 2023-07-14
- Critical fixes as per below, please upgrade.
- Improved regression test t4, it previously was incomplete which did not
surface until a recent change. The test now checks for the correct
behavior.
- New `%header%` directive, it allows adding code directly to the generated
header to simplify resolving dependencies your scanners and parsers may
have.
Please see the manual at https://carburetta.com/manual.html for details.
- Changed default behavior for `extern "C" { ... }`, it is no longer
generated when Carburetta perceives that the scanner or parser is C++
code (i.e. detected by the presence of `%class` or `%common_class`.)
Please see the manual at https://carburetta.com/manual.html for details.
- New `%externc` and `%no_externc` directives, to let you directly control
whether the generated header is wrapped in `extern "C" { ... }` or not;
overriding any default behavior.
Please see the manual at https://carburetta.com/manual.html for details.
- Critical fix: `%type` without a `%destructor` would generate broken parsers.
- Critical fix: Buffer under-allocation in Carburetta itself, this could cause
corruption in the generated scanner and is presumed to be the reason why t4
previously passed.
- fix: when generating an --x-utf8 UTF-8 based parser, if <PREFIX>LEXICAL_ERROR
was returned (because the scanner could not match any pattern) and the first
codepoint was multi-byte, then only the first byte was returned as the
matching text for the error. The scanner would then restart and immediately
fail on the continuation byte that followed. This is now fixed; the
new (correct) behavior is to return the entire codepoint.
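As an illustration of the multi-byte error fix above, the following is a
minimal sketch of how the byte length of an offending UTF-8 codepoint can be
determined, so that the whole codepoint rather than only its first byte is
reported. This is not Carburetta's actual code; the function names
utf8_sequence_length() and error_span() are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes a UTF-8 sequence should occupy,
// judged from its lead byte. Invalid lead bytes (including stray
// continuation bytes) count as 1 so a scanner always makes progress.
std::size_t utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;                             // continuation or invalid byte
}

// Hypothetical helper: number of bytes to report as the text of a lexical
// error starting at p. Covers the whole codepoint, but stops early if the
// input ends or a byte is not a continuation byte (10xxxxxx).
std::size_t error_span(const unsigned char *p, std::size_t avail) {
  if (avail == 0) return 0;
  std::size_t want = utf8_sequence_length(p[0]);
  std::size_t n = 1;
  while (n < want && n < avail && (p[n] & 0xC0) == 0x80) ++n;
  return n;
}
```

With a span computed this way, the scanner resumes at the start of the next
codepoint instead of in the middle of the one that caused the error.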
0.8.14 - 2023-06-21
- Added regression tests t12 to t15, all of which are C++ tests that test
common data in the same manner as we already tested symbol data.
- fix regression on moving data, specifying a move snippet for the common
type would invoke it from the wrong source location.
- fix for scanner pattern matches without sym but with common data;
such matches would invoke the common data destructor twice, once following
the pattern action, and possibly again if the stack was reset or
cleaned up before the next token.
- fix for generated code skipping over initializer for "need_common_move"
variable (C++ only issue.)
- fix for common move snippets not being able to use `$0` to refer to the
source common data (could still use `${0}` but this is not symmetric with
`%type` -- while `${0}` typically refers to common data and `$0` to sym
specific data, the subject here is the data type, rather than how it is
used; so consistency across `%type` and `%common_type` is preferred over
the distinction between common data and symbol data.)
- `<prefix>token_common_data()`
Returns the common data associated with the current token. This is useful
in the specific scenario of a scanner / parser combination where the parser
runs into a _<PREFIX>SYNTAX_ERROR on the next token. This next token has
already been scanned (and therefore its common data constructed), but has
not yet moved into the parser stack. However, to report the error, the
common data may be very useful. Because there is no way to access it without
knowing about the implementation details of the scanner, the new
`<prefix>token_common_data()` function is provided.
Note that it can return NULL if there is currently no "next" token, which is
most of the time. It is probably only useful in the specific scenario above,
and even then only when the parser is not used stand-alone but in combination
with a scanner.
Please see the manual at https://carburetta.com/manual.html for details.
- `%common_class` directive
The `%common_class` directive is the C++ class variation for %common_type,
it provides the same convenience of use as `%class` but for the common data
types.
Please see the manual at https://carburetta.com/manual.html for details.
0.8.12 - 2023-06-02
- `%class` directive
The `%class` directive wraps up a few C++ features into a single directive
that is much more friendly for newcomers. It is a shorthand for the
`%type`, `%raii_constructor`, `%destructor` and `%move` directives in the
common scenario where `%type` is used to specify a C++ class.
Please see the manual at https://carburetta.com/manual.html for details.
- `%move` directive
`%move` is to be used alongside %constructor and %destructor directives, and
describes how to move the datatype from $0 to $$.
The `%move` directive, while common to both C and C++, allows better support
for C++ classes in particular. If no `%move` directive is specified for a type,
then a `memcpy` is performed.
Please see the manual at https://carburetta.com/manual.html for details.
- preserve whitespace in types, this fixes a problem with C++ where
`std::int64_t` would be generated as `std : : int64_t` (note the
spaces) because Carburetta internally did not recognize `::` as
one token. This is now fixed; no spaces are generated in types anymore,
so while `::` is still not a token for Carburetta internally, it
no longer injects spaces.
- The new `%move` directive requires constructors and destructors to
be called far more religiously than before. This is a good thing,
but may require existing C code to have to use `%move` where it
could rely
- add t9 C++ tracing failing constructor cleanup test
This tests that whenever any constructor fails at whatever moment (via `throw`
or early `return`), all the destructors are still called correctly.
- add t10 C++ tracing failing move cleanup test
This is like t9, but for the `%move` directive; which is also allowed to
fail at any moment.
Note that no corresponding test exists for `%destructor` because it is
never allowed to fail.
- add t11 C++ tracing failing constructor carrying data
This is like t9 but the test-case has symbols carrying data.
- odd patch levels (e.g. 0.8.11 or the soon to be 0.8.13, but not 0.8.12) are
development versions, the commandline version information now reflects this.
- Manual updated to 0.8.12, see https://carburetta.com/manual.html for details.
0.8.10 - 2023-05-04
- bring C++ back up from regressions - please be careful to read
about the limitations in the manual.
- add t8 C++ simple calculator test case to unit/regression tests
- added GNU style --version and --help flags
- clear warnings across MSVC, GCC and Clang on C and C++.
- various important fixes.
- Flag --x-utf8 no longer generates the raw scanner table, the latter
may be quite substantial and is not used (by definition) when specifying
--x-utf8. This can be a substantial space savings.
- Manual (carburetta.com/manual.html) updated for 0.8.10, including the
innards of utf8 operation which was previously missing.
- In memoriam https://en.wikipedia.org/wiki/Remembrance_of_the_Dead
0.8.8 - 2023-04-21
- critical fixes relative to 0.8.6 and unit/regression testing to help
prevent such regressions in the future.
- unicode support inside character sets, for instance [\u{0}-\u{0x10FFFF}].
- support for unicode categories as per unicode annex 44 section 5.7.1.
Unicode categories can be specified as both \p{Letter} (matches only
letters) and \P{Letter} (matches anything but letters).
- Note that the default is still raw encoding, pass the --x-utf8 flag to
carburetta to enable utf-8 mode.
0.8.6 - 2023-02-27
- substantial fixes relative to 0.8.4 ; users are encouraged to upgrade if
not on 0.8.5 already.
- build issues under linux (missing limits.h)
- regular expression use of "." operator (the any character),
now spans the full unicode range, excluding only newline \xA,
carriage return \xD, NEXT LINE (NEL) \x85, and LINE SEPARATOR (LS)
\x2028. Otherwise any character from \x00 to \x10FFFF is matched.
- $is_mode(), allows checking the current scanner mode from within a
pattern action.
- <prefix>mode(), returns the current scanner mode to calling code.
- --x-utf8 - enables utf8 mode for the scanner. This is an experimental
feature that enables reading UTF-8 as input. Note that, while the
core lexing engine can handle UTF-8, the regular expression parser
is still missing substantial features (so, for instance, while a
unicode literal like \u2028 can be used in a pattern, it cannot be
used inside a character class, e.g. [\u2028] is not yet valid.)
Eventually the intent is for utf8 mode to be the default and a raw,
byte oriented, mode to be the flag selectable mode.
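The rule for the "." operator described in the 0.8.6 notes above can be
written as a small predicate. This is a sketch of the stated rule only, not
Carburetta's implementation; the function name dot_matches() is hypothetical.

```cpp
#include <cstdint>

// Hypothetical predicate: true if codepoint cp is matched by "." under the
// rule stated above, i.e. anything from 0x00 to 0x10FFFF except newline
// (0x0A), carriage return (0x0D), NEXT LINE (0x85) and LINE SEPARATOR
// (0x2028).
constexpr bool dot_matches(std::uint32_t cp) {
  return cp <= 0x10FFFF && cp != 0x0A && cp != 0x0D && cp != 0x85 &&
         cp != 0x2028;
}
```

Note that, per the text, PARAGRAPH SEPARATOR (0x2029) is not in the excluded
set, so this predicate accepts it.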
0.8.4 - 2021-04-09
- inireader example - reads .ini files using Carburetta in a scanner-only
fashion. Demonstrates how to build a library-like encapsulation of a
carburetta scanner/parser where the user-code retains control of the
flow.
- the template_scan example, showing how to use only the scanner section,
without a grammar section. The example takes input and substitutes the
django like escaped fields like {{ this }} for files passed by argument
like "--this file".
- fix: <prefix>scan() header prototype was broken if no additional params
were passed in.
- fix: if, for whatever reason, the input has no epilogue section, both
the scanner and grammar sections should still complete their parsing.
(I.e. know they've reached the end of their inputs.) Previously, the
last line / directive / production could be ignored.
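For readers unfamiliar with the kind of substitution the template_scan
example above performs, here is a hand-written sketch of replacing
{{ name }} fields in a string. It does not use Carburetta and is not the
example's code; the function name substitute_fields() is hypothetical, and it
takes replacement text from a map rather than from files named on the
command line.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: replaces each {{ name }} in input with fields[name].
// Unknown names and unterminated fields are copied through unchanged.
std::string substitute_fields(
    const std::string &input,
    const std::map<std::string, std::string> &fields) {
  std::string out;
  std::size_t pos = 0;
  for (;;) {
    std::size_t open = input.find("{{", pos);
    if (open == std::string::npos) break;
    std::size_t close = input.find("}}", open + 2);
    if (close == std::string::npos) break;
    out.append(input, pos, open - pos);  // literal text before the field

    // Trim spaces and tabs around the field name.
    std::string name;
    std::size_t b = input.find_first_not_of(" \t", open + 2);
    if (b != std::string::npos && b < close) {
      std::size_t e = input.find_last_not_of(" \t", close - 1);
      name = input.substr(b, e - b + 1);
    }

    std::map<std::string, std::string>::const_iterator it = fields.find(name);
    if (it != fields.end()) {
      out += it->second;                          // known field: substitute
    } else {
      out.append(input, open, close + 2 - open);  // unknown: keep as written
    }
    pos = close + 2;
  }
  out.append(input, pos, std::string::npos);  // remaining literal text
  return out;
}
```

For instance, with a map containing name = "World", the input
"Hello {{ name }}!" becomes "Hello World!".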
0.8.2 - 2021-04-06
- $chg_term() -- allows changing the terminal currently matched in the
scanner prior to that matched terminal entering the parser.
This is a fairly advanced feature that can be hazardous, nevertheless
some languages really need it (C).
- $set_mode() -- allows setting the current scanner mode using a special
identifier from inside a pattern action. This avoids the need of using
prefixes (for both the <prefix>set_mode function and its M_<PREFIX>
constants.)
- Better error location management, in particular when the error does not
have any input associated with it (for instance, "unexpected end of
input" on any of the sections has no associated text and yet can occur
frequently if, e.g., a brace is mismatched.)
- Scannerless generation fix; previously had trouble generating without a
scanner but this is a supported method.
- Minor cleanups in code and NOTICE file.
0.8.0 - 2021-03-22 - Initial Release.