Make `yychar` long #605

regisd · 2019-11-24T11:30:18Z

When the input is larger than 2GB, the scanner's internal count of characters read so far (`yychar`) becomes negative, which is meaningless for a position.

The test added in #603 reproduces the problem and throws an exception when `yychar` is negative.

The problem is caused by `yychar` being an int, hence subject to overflow.

Fix #558 by making yychar long. This is similar to #558 by @sarowe with a lighter approach for testing.
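
To illustrate the overflow, here is a minimal standalone sketch. It is not JFlex code: the class and method names are hypothetical, and it only mimics a position counter that is advanced as characters are read.

```java
public class CharCounterOverflow {
    /** Advances an int position counter, the way an int `yychar` would be advanced. */
    static int advanceInt(int position, int charsRead) {
        return position + charsRead; // wraps around silently on overflow
    }

    /** Same counter, but stored as a long. */
    static long advanceLong(long position, int charsRead) {
        return position + charsRead; // no overflow for any realistic input size
    }

    public static void main(String[] args) {
        // One character past Integer.MAX_VALUE (2^31 - 1)
        System.out.println(advanceInt(Integer.MAX_VALUE, 1));  // negative value
        System.out.println(advanceLong(Integer.MAX_VALUE, 1)); // 2147483648
    }
}
```

With an int counter the position wraps to `Integer.MIN_VALUE`, while the long counter keeps counting.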

Note that some problems remain:

- `yyline` and `yycolumn` are still integers.
- If the match section is larger than 2GB, then
  - you will likely get an OutOfMemoryError
  - even if you have enough heap size, zzRefill() will throw a NegativeArraySizeException like this:

    java.lang.NegativeArraySizeException
      at jflex.testcase.large_input.LargeInputScanner.zzRefill(LargeInputScanner.java:288)
      at jflex.testcase.large_input.LargeInputScanner.yylex(LargeInputScanner.java:601)
      at jflex.testcase.large_input.LargeInputTest.consumeLargeInput(LargeInputTest.java:22)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
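
One plausible way to end up with a NegativeArraySizeException is an int overflow when computing the size of a grown buffer. The sketch below assumes the buffer is grown by doubling its length; that is an assumption for illustration, not something checked against the generated zzRefill(). All names are hypothetical.

```java
public class BufferGrowthOverflow {
    /** Doubles a buffer length with int arithmetic; overflows for lengths >= 2^30. */
    static int doubledLength(int currentLength) {
        return currentLength * 2;
    }

    /** Returns true if allocating the doubled buffer fails because the size is negative. */
    static boolean growthFails(int currentLength) {
        try {
            char[] bigger = new char[doubledLength(currentLength)];
            return bigger.length < 0; // always false: the allocation succeeded
        } catch (NegativeArraySizeException e) {
            // Thrown before any memory is allocated, so heap size does not matter
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(growthFails(16));      // false
        System.out.println(growthFails(1 << 30)); // true: 2^30 * 2 overflows int
    }
}
```

The exception is raised by the negative size itself, which is why adding heap does not help in this scenario.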

lsf37

I'm happy with this; generating the large input is especially nice.

We should announce this change fairly prominently, because it is API-breaking: users will have to make the same change we had to make in the example.

I wonder if we should make the type configurable (in a separate PR). Not everyone needs to handle input files >2GB, and some might prefer not having to change their code.

regisd · 2019-11-26T22:59:35Z

Thanks for the positive feedback, Gerwin.

My opinion is that:

- It's OK to make API-breaking changes, particularly because there were none in the last year (with a slow release cycle, changes can/will be more brutal), and very few in the last decade.
- There is no need to announce much in advance. Users have to read the changelog and update their code when they upgrade the dependency.

That being said, I think 1.8 has accumulated enough changes to be released.

Author: Régis Décamps <regisd@google.com>
Date: Wed Nov 27 00:02:51 2019 +0100

Make `yychar` long (#605)

* Replace `yychar` by long in skeletons (default & nested)
* Update example for long yyChar
* Also, add some positivity assertion in `example/simple/Yytoken` to give some defensive programming as an example.
* Update manual and README
* Update test added in #603.
* Improve doc of RepeatContentReader
* Improve RepeatContentReader
  Don't prepare a large buffer if we read only a few characters.
* Also update testcase/example/simple for long `yychar`.
* Delete testcase/simple which is just redundant with the example.

Updated from target/jflex-parent-1.8.0-SNAPSHOT-sources.jar

lsf37 · 2019-11-26T23:25:00Z

Yes, I meant making it prominent in the release announcement (just because API breaking changes are rare for JFlex), I agree that we don't want to announce it before.

lsf37 · 2019-11-27T08:49:41Z

Forgot to say: I agree that 1.8 is getting there. Still labouring under the illusion that I'll manage to pull in the char class macros (#216) into 1.8. Recently made some progress on that, but there is still some way to go.

Make LargeInputTest run much faster by faking the scanner position: introduce a fakeRead() that changes `yychar`, so that the test can jump to a position just before Integer.MAX_VALUE and run in a snap. Follow-up of #603 and #605.

regisd · 2020-03-23T22:36:44Z

Follow-up comment for those who hit this breaking change. There are two options to fix it:

- either adapt the consuming code to use long as well,
- or, if there is a guarantee that `yychar` always stays below 2^31, wrap `yychar` with Math.toIntExact().
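
A minimal sketch of the second option, assuming a consumer that only accepts int offsets. The class and method names are hypothetical; `Math.toIntExact` is the standard JDK method and throws ArithmeticException when the value does not fit in an int.

```java
public class YycharToInt {
    /** Converts a long scanner position to an int offset, failing loudly on overflow. */
    static int toTokenOffset(long yychar) {
        return Math.toIntExact(yychar); // throws ArithmeticException if out of int range
    }

    /** Returns true if the conversion of the given position would overflow. */
    static boolean overflows(long yychar) {
        try {
            toTokenOffset(yychar);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(toTokenOffset(42L));  // 42
        System.out.println(overflows(1L << 31)); // true: 2^31 does not fit in an int
    }
}
```

Compared with a plain `(int)` cast, which would silently produce a negative position, this turns an out-of-range position into an explicit error.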

Starting from JFlex 1.8.0, `yychar` is a long. However, the antlr `CommonToken` only takes an int, hence an explicit conversion is needed. I propose to use `java.lang.Math.toIntExact()`, which throws an ArithmeticException in case of overflow. This is acceptable, because today the application would already be in an illegal state in that case (negative character position caused by the int overflow). See jflex-de/jflex#605 for more details.

regisd added 12 commits November 24, 2019 00:00

Add a test for a very large input.

2ad5921

The input is generated by a custom Reader.

Improve performance of the RepeatContentReader, using an internal buffer that repeats the given content, to reduce iterations when we read a large amount of data.

27fb669

Make test expect current behavior jflex-de#536.

d2a0ac5

Oops, forgot build dep.

6992e99

Update RepeatContentReaderTest

bcd2172

Mark test large_input:LargeInputTest as enormous.

55a04ae

We actually need a lot of RAM to make this test pass.

Increase Cirrus memory ask from 4 to 6GB

f0273fe

> By default a container is given 2 CPUs and 4 GB of memory but it can be configured in .cirrus.yml https://cirrus-ci.org/guide/linux/

Remove unused testRuntimeDir in LargeInputTest.java

0bdb4c6

Replace yychar by long in skeletons (default & nested)

a512f66

Update example for long yyChar

138eb8b

Update doc

86de0d2

Merge branch 'master' into long-yychar

ef69cf6

regisd requested a review from lsf37 as a code owner November 24, 2019 11:30

regisd added the bug Not working as intended label Nov 24, 2019

regisd self-assigned this Nov 24, 2019

regisd added 3 commits November 24, 2019 12:47

Update test.

9e02a26

Improve doc of RepeatContentReader

dbb3e13

Improve RepeatContentReader

6b73a1d

Don't prepare a large buffer if we read only a few characters.

regisd force-pushed the long-yychar branch from f123fb4 to 6b73a1d Compare November 24, 2019 12:54

regisd added 4 commits November 24, 2019 13:59

Update FR number in changelog.md

742de71

Also update testcase/example/simple for long yychar.

72c6905

And remove old `Yyxlex.java` which was submitted (probably by mistake) in commit 4db21cb

Delete testcase/simple which is just redundant with the example.

a67620c

Add some positivity assertion in example/simple/Yytoken

108f2ec

regisd requested a review from sarowe as a code owner November 24, 2019 13:51

regisd added this to the 1.8.0 milestone Nov 25, 2019

regisd mentioned this pull request Nov 25, 2019

make yychar long #558

Closed

lsf37 approved these changes Nov 26, 2019

regisd merged commit 8c9c006 into jflex-de:master Nov 26, 2019

regisd deleted the long-yychar branch November 26, 2019 23:02

regisd mentioned this pull request Dec 3, 2019

Make LargeInputTest run much faster by faking the scanner position #647

Merged

regisd mentioned this pull request Mar 25, 2020

Cast long yychar to int. JesusFreke/smali#760

Merged

andreasabel mentioned this pull request Jul 26, 2023

Java: jflex-generated lexer with line numbers fails to build BNFC/bnfc#453

Closed
