Inconsistent internal parser state #53

pithub · 2022-11-14T09:02:50Z

This issue describes a bug in Elm.Kernel.Parser.findSubString.

Note: the following issues describe symptoms of this bug:

In the same way, the following pull request tries to fix the symptoms:

Fix bug in chompUntil and chompUntilEndOr #21

The Elm Parser internally keeps track of the current position in two ways:

as a row and a column (like a code editor)
as an offset into the source string.

Normally, the two kinds of position information (row and column vs. offset) are in sync with each other.
(For a given source string, you can calculate both row and column from the offset and vice versa.)
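To make the relationship between the two representations concrete, here is a minimal sketch of the conversion in both directions. The helper names `offsetToRowCol` and `rowColToOffset` are hypothetical and not part of elm/parser. The sketch assumes rows and columns are 1-based, offsets count UTF-16 code units, and a surrogate pair counts as one column, mirroring the kernel code quoted further down.

```javascript
// Hypothetical helpers (not part of elm/parser).
// Assumption: row/col are 1-based, offset counts UTF-16 code units.

// Walk the source from the start up to `offset`, tracking row and column.
function offsetToRowCol(source, offset) {
  let row = 1;
  let col = 1;
  let i = 0;
  while (i < offset && i < source.length) {
    const code = source.charCodeAt(i++);
    if (code === 0x000A /* \n */) {
      row++;
      col = 1;
    } else {
      col++;
      // A surrogate pair takes two code units but only one column.
      if ((code & 0xF800) === 0xD800) i++;
    }
  }
  return { row, col };
}

// Walk the source until the requested row and column are reached.
function rowColToOffset(source, row, col) {
  let r = 1;
  let c = 1;
  let i = 0;
  while (i < source.length && (r < row || (r === row && c < col))) {
    const code = source.charCodeAt(i++);
    if (code === 0x000A /* \n */) {
      r++;
      c = 1;
    } else {
      c++;
      if ((code & 0xF800) === 0xD800) i++;
    }
  }
  return i;
}

console.log(offsetToRowCol("< token >", 2));    // { row: 1, col: 3 }
console.log(rowColToOffset("< token >", 1, 8)); // 7
```

With a consistent parser state, converting the stored offset with `offsetToRowCol` should give back the stored row and column.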

However, the bug in Elm.Kernel.Parser.findSubString breaks this synchronization.
This affects the following parsers:

lineComment
multiComment
chompUntil
chompUntilEndOr

They set...

row and column after the (closing) token
the offset before the (closing) token

Here's an example with chompUntil:

import Parser exposing ((|.), (|=), Parser)

testParser : Parser { row : Int, col : Int, offset : Int }
testParser =
    Parser.succeed (\row col offset -> { row = row, col = col, offset = offset })
        |. Parser.chompUntil "token"
        |= Parser.getRow
        |= Parser.getCol
        |= Parser.getOffset

Parser.run testParser "< token >"
--> Ok { row = 1, col = 8, offset = 2 }

The state after the test parser is run:

row = 1, col = 8 (corresponding to offset = 7) --> after the token
offset = 2 (corresponding to row = 1, col = 3) --> before the token

The root cause of these bugs lies in the Elm.Kernel.Parser.findSubString function:

parser/src/Elm/Kernel/Parser.js

Lines 120 to 134 in 02839df

    
           var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString) 
        
           { 
        
           	var newOffset = bigString.indexOf(smallString, offset); 
        
           	var target = newOffset < 0 ? bigString.length : newOffset + smallString.length; 
        
           	while (offset < target) 
        
           	{ 
        
           		var code = bigString.charCodeAt(offset++); 
        
           		code === 0x000A /* \n */ 
        
           			? ( col=1, row++ ) 
        
           			: ( col++, (code & 0xF800) === 0xD800 && offset++ ) 
        
           	} 
        
           	return __Utils_Tuple3(newOffset, row, col); 
        
           });

If the smallString is found, the returned newOffset is the position before the smallString (the result of the indexOf function), but the returned row and col are the position after the smallString (the target position).
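The mismatch can be reproduced outside the Elm runtime. The sketch below copies the kernel logic into a plain function, replacing the runtime helpers `F5` and `__Utils_Tuple3` with an ordinary function and an array. It also shows a hypothetical variant, `findSubStringConsistent`, that stops counting at the match so that all three values describe the position before smallString. That variant is only an illustration of one way to restore consistency; it is an assumption and not necessarily what pull request #54 does.

```javascript
// Standalone copy of the kernel logic. F5 and __Utils_Tuple3 are Elm
// runtime helpers, so a plain function and an array stand in for them.
function findSubString(smallString, offset, row, col, bigString) {
  var newOffset = bigString.indexOf(smallString, offset);
  // target lies AFTER the match, while newOffset lies BEFORE it.
  var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;

  while (offset < target) {
    var code = bigString.charCodeAt(offset++);
    code === 0x000A /* \n */
      ? (col = 1, row++)
      : (col++, (code & 0xF800) === 0xD800 && offset++);
  }

  return [newOffset, row, col];
}

// Hypothetical variant (an assumption, not the upstream fix): count rows
// and columns only up to the start of the match, so row, col and offset
// all refer to the position before smallString.
function findSubStringConsistent(smallString, offset, row, col, bigString) {
  var newOffset = bigString.indexOf(smallString, offset);
  var target = newOffset < 0 ? bigString.length : newOffset;

  while (offset < target) {
    var code = bigString.charCodeAt(offset++);
    code === 0x000A /* \n */
      ? (col = 1, row++)
      : (col++, (code & 0xF800) === 0xD800 && offset++);
  }

  return [newOffset, row, col];
}

console.log(findSubString("token", 0, 1, 1, "< token >"));           // [ 2, 1, 8 ]
console.log(findSubStringConsistent("token", 0, 1, 1, "< token >")); // [ 2, 1, 3 ]
```

The first result matches the state reported by the chompUntil example: offset 2, but column 8. Whether the consistent position should be before or after the token is a design decision for each affected parser; the variant picks "before" purely for illustration. When smallString is not found, both functions still return -1 as newOffset, which callers have to treat as a "not found" signal and not as a position.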

Note: the following pull request tries to fix the comment of the Elm.Kernel.Parser.findSubString function
to correctly describe the buggy behavior:

Fix comment in findSubString #37

miniBill · 2024-04-16T14:12:50Z

I've now created an elm-review rule to check for this https://github.com/miniBill/elm-review-no-broken-elm-parser-functions

pithub changed the title ~~Bug in Elm.Kernel.Parser.findSubString~~ Inconsistent internal parser state Nov 14, 2022

pithub linked a pull request Nov 14, 2022 that will close this issue

Fix for the inconsistent internal parser state bug #54

Open

rupertlssmith pushed a commit to elm-janitor/parser that referenced this issue Feb 17, 2023

fixes elm#54

6d655f1

fixes elm#53 Fix bug in Elm.Kernel.Parser.findSubString
