Closed
Description
Hi,
I noticed some interesting behavior when I tried to generate code for the following re2c specification:
bool consumeOneCodePoint(InputCursor pos, InputCursor end)
{
/*!re2c
re2c:flags:utf-8 = 1;
. { return pos; }
* { return end; }
*/
}
As far as I can tell, the generated code requires at least 4 bytes of input, not 1:
/* Generated by re2c 1.1.1 on Mon Feb 11 12:20:12 2019 */
#line 1 "example.re"
bool consumeOneCodePoint(InputCursor pos, InputCursor end)
{
#line 7 "<stdout>"
{
YYCTYPE yych;
if ((YYLIMIT - YYCURSOR) < 4) YYFILL(4);
yych = *YYCURSOR;
switch (yych) {
case 0x00:
...
}
This requirement seems a little strange to me, because a UTF-8 code point is not necessarily 4 bytes long: it can take anywhere from 1 to 4 bytes.
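To make the 1-to-4-byte point concrete, here is a small C++ sketch that derives the length of a UTF-8 sequence from its lead byte (ranges as in RFC 3629) and contrasts a fixed 4-byte bounds check, like the `(YYLIMIT - YYCURSOR) < 4` test in the generated code, with a check that asks only for as many bytes as the lead byte announces. The helper names `utf8SequenceLength`, `enoughForGeneratedCheck` and `enoughForSequence` are hypothetical illustrations, not part of re2c's output.

```cpp
#include <cstddef>

// Length of a UTF-8 sequence, determined from its lead byte (RFC 3629 ranges).
// Returns 0 for bytes that cannot start a sequence: continuation bytes
// 0x80-0xBF and the never-valid lead bytes 0xC0, 0xC1 and 0xF5-0xFF.
constexpr std::size_t utf8SequenceLength(unsigned char lead) {
    if (lead <= 0x7F) return 1;                  // ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Mirrors the generated check: demands 4 bytes regardless of the lead byte.
constexpr bool enoughForGeneratedCheck(std::size_t available) {
    return available >= 4;
}

// Demands only as many bytes as the lead byte says the sequence needs.
constexpr bool enoughForSequence(unsigned char lead, std::size_t available) {
    return utf8SequenceLength(lead) != 0 && available >= utf8SequenceLength(lead);
}
```

With a single byte `'a'` left in the input, `enoughForGeneratedCheck(1)` is false while `enoughForSequence('a', 1)` is true, which is the mismatch described above.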
Could you please help me clarify whether this behavior is a bug or an expected feature? :)
P.S.
My initial code, where I first encountered this behavior, was a little more complex and looked as follows:
struct ConsumptionResult {
bool success;
InputCursor pos;
};
inline ConsumptionResult execConsumeCodePoint(InputCursor begin, InputCursor end)
{
auto pos = begin.get();
[[maybe_unused]] decltype(pos) YYMARKER;
/*!re2c
re2c:define:YYCTYPE = 'std::decay_t<decltype(*pos)>';
re2c:define:YYCURSOR = pos;
re2c:define:YYLIMIT = end;
re2c:define:YYFILL:naked = 1;
re2c:define:YYFILL = 'return {false, begin};';
re2c:flags:utf-8 = 1;
. { return {true, pos}; }
* { return {false, end}; }
*/
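}
In other words, when YYFILL is triggered, I intended the function to return a failure because the input is exhausted. With a 4-byte check like the one above, however, it returns {false, begin} even when the remaining input holds a complete 1-, 2- or 3-byte code point.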
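For comparison, the intended semantics could be hand-written roughly as follows. This is a minimal sketch assuming `InputCursor` can be replaced by a raw `const unsigned char*` (the reporter's actual cursor type is not shown), and `consumeCodePointByHand` is a hypothetical name. It returns `{false, begin}` only when the input ends inside a sequence (the role of `YYFILL` above) and `{false, end}` for input the `.` rule would not match (the role of the `*` rule). It checks UTF-8 structure only, so unlike a full validator it does not reject overlong 3- and 4-byte forms, surrogate code points, or values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as the struct in the issue; the cursor type is an assumption.
struct ConsumptionResult {
    bool success;
    const unsigned char* pos;
};

// Hypothetical hand-written analogue of execConsumeCodePoint.
inline ConsumptionResult consumeCodePointByHand(const unsigned char* begin,
                                                const unsigned char* end) {
    assert(begin <= end);                      // precondition: valid range
    if (begin == end) return {false, begin};   // exhausted: like YYFILL
    const unsigned char lead = *begin;
    if (lead == '\n') return {false, end};     // '.' does not match newline

    std::size_t need = 0;                      // sequence length from lead byte
    if (lead <= 0x7F) need = 1;
    else if (lead >= 0xC2 && lead <= 0xDF) need = 2;
    else if (lead >= 0xE0 && lead <= 0xEF) need = 3;
    else if (lead >= 0xF0 && lead <= 0xF4) need = 4;
    else return {false, end};                  // invalid lead: like '*'

    // Ask only for the bytes this sequence needs, not a fixed 4.
    if (static_cast<std::size_t>(end - begin) < need) return {false, begin};

    for (std::size_t i = 1; i < need; ++i)     // continuation bytes are 10xxxxxx
        if ((begin[i] & 0xC0) != 0x80) return {false, end};

    return {true, begin + need};
}
```

With the one-byte input `'a'` this returns `{true, pos + 1}`, whereas a `(YYLIMIT - YYCURSOR) < 4` check would take the `YYFILL` branch and return `{false, begin}`.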