A code generated for period (.) requires 4 bytes. #243

@unixod

Hi,

I noticed some interesting behavior when I tried to generate code for the following re2c specification:

bool consumeOneCodePoint(InputCursor pos, InputCursor end)
{
    /*!re2c
        re2c:flags:utf-8 = 1;

        . { return pos; }
        * { return end; }
    */
}

As far as I can tell, the generated code requires at least 4 bytes of input, not 1:

/* Generated by re2c 1.1.1 on Mon Feb 11 12:20:12 2019 */
#line 1 "example.re"
bool consumeOneCodePoint(InputCursor pos, InputCursor end)
{

#line 7 "<stdout>"
{
	YYCTYPE yych;
	if ((YYLIMIT - YYCURSOR) < 4) YYFILL(4);
	yych = *YYCURSOR;
	switch (yych) {
	case 0x00:
        ...
}

This requirement seems a bit strange to me, because a UTF-8 code point is not necessarily 4 bytes long.
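To make the point about variable length concrete, here is a minimal sketch of how the lead byte of a UTF-8 sequence determines its length (1 to 4 bytes). The helper names `utf8SequenceLength` and `checkExamples` are hypothetical and exist only for illustration; they are not part of re2c or of the code above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of re2c): returns the number of bytes in the
// UTF-8 sequence that starts with `lead`, or 0 if `lead` cannot start a
// well-formed sequence (continuation byte, overlong lead, or out of range).
constexpr std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx (0xC0 and 0xC1 would be overlong)
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx, limited to U+10FFFF
    return 0;                                    // 0x80..0xC1 and 0xF5..0xFF
}

// A few spot checks covering each sequence length.
inline void checkExamples()
{
    assert(utf8SequenceLength(0x41) == 1);  // 'A'
    assert(utf8SequenceLength(0xC3) == 2);  // lead byte of U+00E9
    assert(utf8SequenceLength(0xE2) == 3);  // lead byte of U+20AC
    assert(utf8SequenceLength(0xF0) == 4);  // lead byte of U+1F600
    assert(utf8SequenceLength(0x80) == 0);  // continuation byte, not a lead
}
```

Under this view, a single ASCII byte is already a complete code point, which is why an unconditional 4-byte requirement looks surprising.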

Could you please help me clarify whether this behavior is a bug or an expected feature? :)


P.S.
My original code, where I first encountered this behavior, was slightly more complex and looked as follows:

struct ConsumptionResult {
    bool success;
    InputCursor pos;
};

inline ConsumptionResult execConsumeCodePoint(InputCursor begin, InputCursor end)
{
    auto pos = begin.get();
    [[maybe_unused]] decltype(pos) YYMARKER;

    /*!re2c
        re2c:define:YYCTYPE = 'std::decay_t<decltype(*pos)>';
        re2c:define:YYCURSOR = pos;
        re2c:define:YYLIMIT = end;
        re2c:define:YYFILL:naked = 1;
        re2c:define:YYFILL = 'return {false, begin};';
        re2c:flags:utf-8 = 1;

        . { return {true, pos}; }
        * { return {false, end}; }
    */
}

In other words, when YYFILL is triggered, I intended the function to return a failure because the input is exhausted.
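The practical consequence can be sketched with a hand-written stand-in for the generated scanner. This is not re2c output: the function `consumeWithFixedGuard` is hypothetical, plain `const char*` cursors are assumed in place of `InputCursor`, and only the ASCII case is handled after the guard to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

struct ConsumptionResult {
    bool success;
    const char* pos;
};

// Hypothetical stand-in for the generated code (an assumption, not re2c output).
// The up-front guard plays the role of
//     if ((YYLIMIT - YYCURSOR) < 4) YYFILL(4);
// with a naked YYFILL defined as `return {false, begin};`.
inline ConsumptionResult consumeWithFixedGuard(const char* begin, const char* end)
{
    if ((end - begin) < 4) return {false, begin};  // the naked YYFILL fires here

    // Simplification: only ASCII is matched; `.` in re2c excludes newline.
    const unsigned char lead = static_cast<unsigned char>(*begin);
    if (lead < 0x80 && lead != '\n') return {true, begin + 1};
    return {false, end};                           // corresponds to the `*` rule
}
```

With this guard, the one-byte input `"a"` is reported as exhausted even though it holds a complete code point, while `"abcd"` succeeds and advances the cursor by one byte.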
