Some kind of regression between 0.9.7 and 0.10. #88

ilyvion · 2019-10-28T17:59:27Z

To be specific, I'm getting this issue with 7a31ef3, but not with efc666d. (The ones in-between causes me compilation errors, so it's hard to say which of them introduced the code that leads to this problem.)

This is my configuration:

Token type

#[derive(Debug, PartialEq, Clone, Copy, Logos)]
pub enum Token {
    #[end]
    EndOfProgram,

    #[regex = "\"[^\"]*\""]
    Text,

    #[regex = "'(\\\\x[0-9a-fA-F]+|\\\\[0-7]+|\\\\(n|t|r|\\\\|\"|'))'"]
    Char,

    #[regex = "-?([0-9]*\\.[0-9]+|[0-9]+\\.[0-9]*)"]
    Float,

    #[regex = "-?0(x|X)[0-9a-fA-F]+"]
    Hex,

    #[regex = "-?[0-9]+"]
    Integer,

    #[regex = "(struct|bool|char|int|float|string|color)_t"]
    Type,

    #[regex = "[a-zA-Z_]+"]
    Identifier,

    #[regex = "#[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]"]
    Color,

    #[token = "{"]
    BraceOpen,

    #[token = "}"]
    BraceClose,

    #[token = "="]
    Assign,

    #[token = ","]
    Comma,

    #[token = "["]
    BracketOpen,

    #[token = "]"]
    BracketClose,

    #[regex = "//[^\n]*"]
    #[token = "/*"]
    #[callback = "ignore_comments"]
    #[error]
    UnexpectedToken,
    UnexpectedEndOfProgram,
}

Sufficient amount of document causing lexing issues in 0.10

tileset "Default Tileset" {
	tileWidth=16
	tileHeight=16
	description="Foobar"
	author="BazQuux"
	version="0.15"

	texture "terrain.png"
	{
		terrain_sprite {
			sprites=[0]
		}

		terrain_sprite "grass" {
			sprites=[
			// Height < -1.0
				20,
			// Height < -0.5
				21,
			// Height < 0.0
				22,
			// Height < 0.25
				23,
			// Height < 0.5
				24,
			// Height < 1.0
				25,
			// Height < 2.0
				26,
			// Height < 4.0
				27,
			// Height >= 4.0
				28
			]
			heightSplits=[-1.0, -0.5, 0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
			snowSprites=[152]
			edgeSprites=[16,17,18,19]
			snowEdgeSprites=[148,149,150,151]

			details=[192,193,194,195,208,209,210,211]
			snowedDetails=[196,197,198,199,212,213,214,215]
			detailsChance=0.0625

			burntOverlay=[32,33,34,35,36]
			corruption=[128,129,130,131,
						144,145,146,147,
						160,161,162,163,
						176,177,178,179]
		}

Relevant part of token stream on 0.9.7

    (
        Comma,
        ",",
        810..811,
    ),
    (
        Integer,
        "36",
        811..813,
    ),
    (
        BracketClose,
        "]",
        813..814,
    ),
    (
        Identifier,
        "corruption",
        819..829,
    ),
    (
        Assign,
        "=",
        829..830,
    ),
    (
        BracketOpen,
        "[",
        830..831,
    ),
    (
        Integer,
        "128",
        831..834,
    ),

Relevant part of token stream on 0.10

    (
        Comma,
        ",",
        810..811,
    ),
    (
        Integer,
        "36",
        811..813,
    ),
    (
        BracketClose,
        "]",
        813..814,
    ),
    (
        Identifier,
        "c",
        819..820,
    ),
    (
        Identifier,
        "orruption",
        820..829,
    ),
    (
        Assign,
        "=",
        829..830,
    ),
    (
        BracketOpen,
        "[",
        830..831,
    ),
    (
        Integer,
        "128",
        831..834,
    ),

As you can see, in 0.9.7, I get a single Identifier token with the value "corruption" from 819..830, while in 0.10, I get two Identifier tokens, one with the value "c" from 819..820, and one with the value "orruption" from 820..830.

This is obviously a bug; the lexer should do as in 0.9.7 and consume the whole identifier at once, not divide it like it does in 0.10.

By chance, after experimenting trying to figure out what differentiated identifiers that stayed whole from those that got divided like this before submitting this issue, I discovered that the cause is when an identifier has a two-or-more character overlap with any of the keywords in the Type token. "corruption" gets, well, corrupted, because of the "color," and if you change "burntOverlay" to "borntOverlay," (overlaps with "bool") you get the same behavior there. It seems like a partial match from one token doesn't get correctly dealt with when another token matches fully later. By removing/commenting out the Type token, the behavior ceases.

The text was updated successfully, but these errors were encountered:

delbato · 2019-12-16T15:03:16Z

I am experiencing the same issue. Unfortunately, development on this project seems to have died off a little :/

pascalkuthe · 2020-03-05T21:49:21Z

I had the same issue and found the cause. The characters starting off overlapping keywords are missing from the LUT for the regex. Seems to be a werid codegen bug.
However I found a workaround: Put the regex before the overlapping keywords in your token type (or more generally just put all regexes before all your keywords to avoid bugs until this is fixed).
I am using this in production so I have a high interest in the stability of this crate and might try to fix this problem myself if nobody else can find the time

ilyvion · 2020-03-06T16:19:35Z

Project seems mostly abandoned by original author at this point. Last commit was made nearly a year ago.

pascalkuthe · 2020-03-07T12:21:17Z

Yeah I am probably going to fix this myself the question is whether the author would still be willing to approve pull requests or let someone else take over the project. Otherwise I just have to fork this Project

ebkalderon · 2020-03-14T17:05:44Z

@DSPOM2 It's a shame this library isn't being more actively developed. I'd love it if you opened a PR for this either way just so the community can build on your work in the meantime.

maciejhirsz · 2020-03-19T00:54:58Z

@DSPOM2 I'm happy to accept PRs. Logos is in the back of my head, but it's also quite daunting since I reckon I might have to rewrite large parts of the derive part to make sure the compilation explosion that I've been trying to fix in #71 doesn't happen. I have some idea how to do it, but I'd need to probably sink good two weeks into it.

I tend to jump between projects and sink time into one thing at a time when I can, a year of a hiatus does not make a project dead :).

ilyvion · 2020-03-19T07:46:18Z

@maciejhirsz I'm glad to hear the project isn't abandoned, and I empathize with jumping between projects and viscerally understand how daunting it can be to have to sink several weeks into a rewrite to get something working right.

maciejhirsz · 2020-03-30T21:51:26Z

@alexschrod 0.10.0-rc3 landed on crates, could you please check if the issue persists?

Edit: Given the number of bugs the refactored logos-derive has fixed (including some very similar ones), I'm going to go ahead and close this. Feel free to reopen if anything pops up again.

ilyvion · 2020-04-01T05:30:25Z

I don't appear to have the power to re-open, but I could always open a new issue or comment in this one.

In any case, I no longer have the project I was working on 6 months ago ready at hand, so unfortunately, I can't easily test this any better than anyone else, sorry.

maciejhirsz · 2020-04-01T07:46:15Z

@alexschrod Thanks for checking 🙇‍♂️.

This was referenced Mar 22, 2020

Refactor logos-derive from tree to graph #94

Merged

Incorrect tokenizing #95

Closed

maciejhirsz closed this as completed Mar 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some kind of regression between 0.9.7 and 0.10. #88

Some kind of regression between 0.9.7 and 0.10. #88

ilyvion commented Oct 28, 2019 •

edited

Loading

delbato commented Dec 16, 2019

pascalkuthe commented Mar 5, 2020

ilyvion commented Mar 6, 2020

pascalkuthe commented Mar 7, 2020

ebkalderon commented Mar 14, 2020

maciejhirsz commented Mar 19, 2020

ilyvion commented Mar 19, 2020

maciejhirsz commented Mar 30, 2020 •

edited

Loading

ilyvion commented Apr 1, 2020

maciejhirsz commented Apr 1, 2020

Some kind of regression between 0.9.7 and 0.10. #88

Some kind of regression between 0.9.7 and 0.10. #88

Comments

ilyvion commented Oct 28, 2019 • edited Loading

delbato commented Dec 16, 2019

pascalkuthe commented Mar 5, 2020

ilyvion commented Mar 6, 2020

pascalkuthe commented Mar 7, 2020

ebkalderon commented Mar 14, 2020

maciejhirsz commented Mar 19, 2020

ilyvion commented Mar 19, 2020

maciejhirsz commented Mar 30, 2020 • edited Loading

ilyvion commented Apr 1, 2020

maciejhirsz commented Apr 1, 2020

ilyvion commented Oct 28, 2019 •

edited

Loading

maciejhirsz commented Mar 30, 2020 •

edited

Loading