Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some kind of regression between 0.9.7 and 0.10. #88

Closed
ilyvion opened this issue Oct 28, 2019 · 10 comments
Closed

Some kind of regression between 0.9.7 and 0.10. #88

ilyvion opened this issue Oct 28, 2019 · 10 comments

Comments

@ilyvion
Copy link

ilyvion commented Oct 28, 2019

To be specific, I'm getting this issue with 7a31ef3, but not with efc666d. (The ones in-between causes me compilation errors, so it's hard to say which of them introduced the code that leads to this problem.)

This is my configuration:

Token type
#[derive(Debug, PartialEq, Clone, Copy, Logos)]
pub enum Token {
    #[end]
    EndOfProgram,

    #[regex = "\"[^\"]*\""]
    Text,

    #[regex = "'(\\\\x[0-9a-fA-F]+|\\\\[0-7]+|\\\\(n|t|r|\\\\|\"|'))'"]
    Char,

    #[regex = "-?([0-9]*\\.[0-9]+|[0-9]+\\.[0-9]*)"]
    Float,

    #[regex = "-?0(x|X)[0-9a-fA-F]+"]
    Hex,

    #[regex = "-?[0-9]+"]
    Integer,

    #[regex = "(struct|bool|char|int|float|string|color)_t"]
    Type,

    #[regex = "[a-zA-Z_]+"]
    Identifier,

    #[regex = "#[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]"]
    Color,

    #[token = "{"]
    BraceOpen,

    #[token = "}"]
    BraceClose,

    #[token = "="]
    Assign,

    #[token = ","]
    Comma,

    #[token = "["]
    BracketOpen,

    #[token = "]"]
    BracketClose,

    #[regex = "//[^\n]*"]
    #[token = "/*"]
    #[callback = "ignore_comments"]
    #[error]
    UnexpectedToken,
    UnexpectedEndOfProgram,
}
Sufficient amount of document causing lexing issues in 0.10
tileset "Default Tileset" {
	tileWidth=16
	tileHeight=16
	description="Foobar"
	author="BazQuux"
	version="0.15"

	texture "terrain.png"
	{
		terrain_sprite {
			sprites=[0]
		}

		terrain_sprite "grass" {
			sprites=[
			// Height < -1.0
				20,
			// Height < -0.5
				21,
			// Height < 0.0
				22,
			// Height < 0.25
				23,
			// Height < 0.5
				24,
			// Height < 1.0
				25,
			// Height < 2.0
				26,
			// Height < 4.0
				27,
			// Height >= 4.0
				28
			]
			heightSplits=[-1.0, -0.5, 0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
			snowSprites=[152]
			edgeSprites=[16,17,18,19]
			snowEdgeSprites=[148,149,150,151]

			details=[192,193,194,195,208,209,210,211]
			snowedDetails=[196,197,198,199,212,213,214,215]
			detailsChance=0.0625

			burntOverlay=[32,33,34,35,36]
			corruption=[128,129,130,131,
						144,145,146,147,
						160,161,162,163,
						176,177,178,179]
		}
Relevant part of token stream on 0.9.7
    (
        Comma,
        ",",
        810..811,
    ),
    (
        Integer,
        "36",
        811..813,
    ),
    (
        BracketClose,
        "]",
        813..814,
    ),
    (
        Identifier,
        "corruption",
        819..829,
    ),
    (
        Assign,
        "=",
        829..830,
    ),
    (
        BracketOpen,
        "[",
        830..831,
    ),
    (
        Integer,
        "128",
        831..834,
    ),
Relevant part of token stream on 0.10
    (
        Comma,
        ",",
        810..811,
    ),
    (
        Integer,
        "36",
        811..813,
    ),
    (
        BracketClose,
        "]",
        813..814,
    ),
    (
        Identifier,
        "c",
        819..820,
    ),
    (
        Identifier,
        "orruption",
        820..829,
    ),
    (
        Assign,
        "=",
        829..830,
    ),
    (
        BracketOpen,
        "[",
        830..831,
    ),
    (
        Integer,
        "128",
        831..834,
    ),

As you can see, in 0.9.7, I get a single Identifier token with the value "corruption" from 819..830, while in 0.10, I get two Identifier tokens, one with the value "c" from 819..820, and one with the value "orruption" from 820..830.

This is obviously a bug; the lexer should do as in 0.9.7 and consume the whole identifier at once, not divide it like it does in 0.10.

By chance, after experimenting trying to figure out what differentiated identifiers that stayed whole from those that got divided like this before submitting this issue, I discovered that the cause is when an identifier has a two-or-more character overlap with any of the keywords in the Type token. "corruption" gets, well, corrupted, because of the "color," and if you change "burntOverlay" to "borntOverlay," (overlaps with "bool") you get the same behavior there. It seems like a partial match from one token doesn't get correctly dealt with when another token matches fully later. By removing/commenting out the Type token, the behavior ceases.

@delbato
Copy link

delbato commented Dec 16, 2019

I am experiencing the same issue. Unfortunately, development on this project seems to have died off a little :/

@pascalkuthe
Copy link

I had the same issue and found the cause. The characters starting off overlapping keywords are missing from the LUT for the regex. Seems to be a werid codegen bug.
However I found a workaround: Put the regex before the overlapping keywords in your token type (or more generally just put all regexes before all your keywords to avoid bugs until this is fixed).
I am using this in production so I have a high interest in the stability of this crate and might try to fix this problem myself if nobody else can find the time

@ilyvion
Copy link
Author

ilyvion commented Mar 6, 2020

Project seems mostly abandoned by original author at this point. Last commit was made nearly a year ago.

@pascalkuthe
Copy link

Yeah I am probably going to fix this myself the question is whether the author would still be willing to approve pull requests or let someone else take over the project. Otherwise I just have to fork this Project

@ebkalderon
Copy link

@DSPOM2 It's a shame this library isn't being more actively developed. I'd love it if you opened a PR for this either way just so the community can build on your work in the meantime.

@maciejhirsz
Copy link
Owner

@DSPOM2 I'm happy to accept PRs. Logos is in the back of my head, but it's also quite daunting since I reckon I might have to rewrite large parts of the derive part to make sure the compilation explosion that I've been trying to fix in #71 doesn't happen. I have some idea how to do it, but I'd need to probably sink good two weeks into it.

I tend to jump between projects and sink time into one thing at a time when I can, a year of a hiatus does not make a project dead :).

@ilyvion
Copy link
Author

ilyvion commented Mar 19, 2020

@maciejhirsz I'm glad to hear the project isn't abandoned, and I empathize with jumping between projects and viscerally understand how daunting it can be to have to sink several weeks into a rewrite to get something working right.

@maciejhirsz
Copy link
Owner

maciejhirsz commented Mar 30, 2020

@alexschrod 0.10.0-rc3 landed on crates, could you please check if the issue persists?

Edit: Given the number of bugs the refactored logos-derive has fixed (including some very similar ones), I'm going to go ahead and close this. Feel free to reopen if anything pops up again.

@ilyvion
Copy link
Author

ilyvion commented Apr 1, 2020

I don't appear to have the power to re-open, but I could always open a new issue or comment in this one.

In any case, I no longer have the project I was working on 6 months ago ready at hand, so unfortunately, I can't easily test this any better than anyone else, sorry.

@maciejhirsz
Copy link
Owner

@alexschrod Thanks for checking 🙇‍♂️.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants