New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] buffer expansion bug in yy_refill()? [sf#60] #62

Closed
lsf37 opened this Issue Feb 15, 2015 · 15 comments

Comments

Projects
None yet
1 participant
@lsf37
Member

lsf37 commented Feb 15, 2015

Reported by smagoun on 2004-01-30 20:35 UTC
yy_refill() in skeleton.default and skeleton.nested seems to
have a problem expanding the buffer correctly. The bug
manifests itself when reading a lot of data at once. I ran into
this using the Piccolo XML parser, which uses JFlex to parse
XML. Piccolo died while reading a very long CDATA element
in the XML. I tracked it to yy_refill(), which seems to have
been copied from one of the skeleton files JFlex ships with.

The problem is that the buffer never expands properly when
reading long input, which results in an
ArrayIndexOutOfBoundsException. The following patch fixes
Piccolo; I'm not sure if it applies to JFlex, but I'm guessing it
might.

(I'm not convinced that the if() should check
yy_currentPos>=buffer.length at all, but it seems harmless)

--- PiccoloLexer.java Sun Jul 7 14:21:18 2002
+++ PiccoloLexer copy.java Fri Jan 30 15:07:44 2004
@@ -3291,9 +3291,10 @@
}

/* is the buffer big enough? */

  • if (yy_currentPos >= yy_buffer.length) {
  • if (yy_currentPos >= yy_buffer.length)
  •    || yy_markedPos >= yy_buffer.length) {
    

/* if not: blow it up */

  •  char newBuffer[] = new char[yy_currentPos*2];
    
  •  char newBuffer[] = new char[yy_buffer.length*2];
    

System.arraycopy(yy_buffer, 0, newBuffer, 0,
yy_buffer.length);
yy_buffer = newBuffer;
}

@lsf37 lsf37 changed the title from buffer expansion bug in yy_refill()? to [Bug] buffer expansion bug in yy_refill()? [sf#60] Feb 15, 2015

@lsf37 lsf37 added this to the jflex bug milestone Feb 15, 2015

@lsf37 lsf37 closed this Feb 15, 2015

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Updated by lsf37 on 2004-02-12 00:52 UTC

  • priority: 5 --> 9
  • assigned_to: nobody --> lsf37
  • labels: --> backend/codegen
  • milestone: --> jflex bug
  • status: open --> closed
Member

lsf37 commented Feb 15, 2015

Updated by lsf37 on 2004-02-12 00:52 UTC

  • priority: 5 --> 9
  • assigned_to: nobody --> lsf37
  • labels: --> backend/codegen
  • milestone: --> jflex bug
  • status: open --> closed
@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by lsf37 on 2004-02-12 09:38 UTC
Logged In: YES
user_id=93534

Thanks for your bug report! Unfortunately, I couldn't
reproduce any buffer refill error and I also don't really
see a bug. That doesn't rule out there is one of course,
it may very well just be me.

Two things make me think it's not the bug you mean: 1. the
piccolo lexer is produced by a patched version of JFlex
that I don't really know well. A bug there doesn't
necessarily translate to a bug in JFlex. 2. if there is a
bug, the fix shouldn't work, because:

yy_refill expects and should preserve the following
invariant:
startRead <= markedPos <= currentPos <= endRead <=
buffer.length

yy_refill is usually called because currentPos==endRead.
It may be that endRead == buffer.len if the first
arraycopy didn't manage to clear up some space.

It is possible that this is not true in the piccolo
runtime, but for JFlex proper it should be. In this light
it doesn't really make sense to add "|| markedPos >=
buffer.len" because "currentPos >= buffer.len" is stronger
(it's true then anyway). "currentPos >= buffer.len" may be
confusing, in reality it's equivalent to "currentPos ==
buffer.len", the >= is just defensive programming.
Replacing currentPos2 by buffer.len2 should make no
difference.

In my tests buffer expansion works without problems, but
that doesn't mean there is no bug either. It's my guess
that the invariant I talked about above is violated
somewhere in your example. That would explain why your
change made the problem go way. It also means that the
change propably didn't really fix the bug, only the
symptom.

Where did the ArrayIndexOutOfBoundsException occur? Can
you reproduce the problem with a normal JFlex scanner?
(you can go to smaller buffer sizes with the %buffer
switch, so you don't need overly long inputs to trigger
bugs in buffer expansion). Could you send me the input
that made the piccolo lexer crash? I would very much like
to get to the bottom of this problem.

Member

lsf37 commented Feb 15, 2015

Commented by lsf37 on 2004-02-12 09:38 UTC
Logged In: YES
user_id=93534

Thanks for your bug report! Unfortunately, I couldn't
reproduce any buffer refill error and I also don't really
see a bug. That doesn't rule out there is one of course,
it may very well just be me.

Two things make me think it's not the bug you mean: 1. the
piccolo lexer is produced by a patched version of JFlex
that I don't really know well. A bug there doesn't
necessarily translate to a bug in JFlex. 2. if there is a
bug, the fix shouldn't work, because:

yy_refill expects and should preserve the following
invariant:
startRead <= markedPos <= currentPos <= endRead <=
buffer.length

yy_refill is usually called because currentPos==endRead.
It may be that endRead == buffer.len if the first
arraycopy didn't manage to clear up some space.

It is possible that this is not true in the piccolo
runtime, but for JFlex proper it should be. In this light
it doesn't really make sense to add "|| markedPos >=
buffer.len" because "currentPos >= buffer.len" is stronger
(it's true then anyway). "currentPos >= buffer.len" may be
confusing, in reality it's equivalent to "currentPos ==
buffer.len", the >= is just defensive programming.
Replacing currentPos2 by buffer.len2 should make no
difference.

In my tests buffer expansion works without problems, but
that doesn't mean there is no bug either. It's my guess
that the invariant I talked about above is violated
somewhere in your example. That would explain why your
change made the problem go way. It also means that the
change propably didn't really fix the bug, only the
symptom.

Where did the ArrayIndexOutOfBoundsException occur? Can
you reproduce the problem with a normal JFlex scanner?
(you can go to smaller buffer sizes with the %buffer
switch, so you don't need overly long inputs to trigger
bugs in buffer expansion). Could you send me the input
that made the piccolo lexer crash? I would very much like
to get to the bottom of this problem.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-08-11 23:19 UTC
Logged In: YES
user_id=255188

FWIW, I've been experiencing this problem for quite a while
now. In my case, I'm feeding socket bytes to the lexer, so
it is hard to really test, but the stack trace looks like this:

java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.twdata.kokua.tw.ChatLexer.yy_refill(ChatLexer.java:325)
at org.twdata.kokua.tw.ChatLexer.yylex(ChatLexer.java:526)
at org.twdata.kokua.LexerRunner.run(LexerRunner.java:88)
at java.lang.Thread.run(Unknown Source)

The application is here: http://www.twdata.org/kokua is is
using the latest version of JFlex. The lexer specification
is here:
http://cvs.sourceforge.net/viewcvs.py/twdata/kokua/src/java/org/twdata/kokua/tw/ChatLexer.flex?rev=1.2&view=markup

HTH,

Don

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-08-11 23:19 UTC
Logged In: YES
user_id=255188

FWIW, I've been experiencing this problem for quite a while
now. In my case, I'm feeding socket bytes to the lexer, so
it is hard to really test, but the stack trace looks like this:

java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.twdata.kokua.tw.ChatLexer.yy_refill(ChatLexer.java:325)
at org.twdata.kokua.tw.ChatLexer.yylex(ChatLexer.java:526)
at org.twdata.kokua.LexerRunner.run(LexerRunner.java:88)
at java.lang.Thread.run(Unknown Source)

The application is here: http://www.twdata.org/kokua is is
using the latest version of JFlex. The lexer specification
is here:
http://cvs.sourceforge.net/viewcvs.py/twdata/kokua/src/java/org/twdata/kokua/tw/ChatLexer.flex?rev=1.2&view=markup

HTH,

Don

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-08-12 00:12 UTC
Logged In: YES
user_id=255188

Ok, more information. It looks like the exception is
different than the one Steve describes, but it could be the
same cause.

I slightly modifed the code to add a try block and print out
some debugging. Here is the modified section:

if (yy_startRead > 0) {
try {
System.arraycopy(yy_buffer, yy_startRead,
yy_buffer, 0,
yy_endRead-yy_startRead);
} catch (ArrayIndexOutOfBoundsException ex) {
System.err.println("buffer.len:"+yy_buffer.length+"
yystart:"+yy_startRead+" yyend:"+yy_endRead);
}

The exception was first being thrown on that arraycopy.
When the problem occurred again, this is what printed out:

buffer.len:16384 yystart:1159 yyend:1025

According to your invariant, start should never be larger
than end. Let me know if I be of any more help.

Don

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-08-12 00:12 UTC
Logged In: YES
user_id=255188

Ok, more information. It looks like the exception is
different than the one Steve describes, but it could be the
same cause.

I slightly modifed the code to add a try block and print out
some debugging. Here is the modified section:

if (yy_startRead > 0) {
try {
System.arraycopy(yy_buffer, yy_startRead,
yy_buffer, 0,
yy_endRead-yy_startRead);
} catch (ArrayIndexOutOfBoundsException ex) {
System.err.println("buffer.len:"+yy_buffer.length+"
yystart:"+yy_startRead+" yyend:"+yy_endRead);
}

The exception was first being thrown on that arraycopy.
When the problem occurred again, this is what printed out:

buffer.len:16384 yystart:1159 yyend:1025

According to your invariant, start should never be larger
than end. Let me know if I be of any more help.

Don

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Updated by lsf37 on 2004-08-12 00:28 UTC

  • status: closed --> open
Member

lsf37 commented Feb 15, 2015

Updated by lsf37 on 2004-08-12 00:28 UTC

  • status: closed --> open
@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-08-13 16:08 UTC
Logged In: YES
user_id=255188

I captured all the input that was causing it to die to file,
and wrote a unit test to try to replicate the problem.
Unfortunately, that didn't work. It looks like the problem
only crops up when the data is being fed to the lexer
through a piped output/input stream from a socket.

The code that feeds Socket input is here:
http://cvs.sourceforge.net/viewcvs.py/twdata/kokua/src/java/org/twdata/kokua/LexerManager.java?rev=1.11&view=markup

The code that runs the lexer is here:
http://cvs.sourceforge.net/viewcvs.py/twdata/kokua/src/java/org/twdata/kokua/LexerRunner.java?rev=1.2&view=markup

HTH, Don

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-08-13 16:08 UTC
Logged In: YES
user_id=255188

I captured all the input that was causing it to die to file,
and wrote a unit test to try to replicate the problem.
Unfortunately, that didn't work. It looks like the problem
only crops up when the data is being fed to the lexer
through a piped output/input stream from a socket.

The code that feeds Socket input is here:
http://cvs.sourceforge.net/viewcvs.py/twdata/kokua/src/java/org/twdata/kokua/LexerManager.java?rev=1.11&view=markup

The code that runs the lexer is here:
http://cvs.sourceforge.net/viewcvs.py/twdata/kokua/src/java/org/twdata/kokua/LexerRunner.java?rev=1.2&view=markup

HTH, Don

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-09-11 23:52 UTC
Logged In: YES
user_id=255188

Interesting. I have confirmed this problem only occurs when
the data is fed through a piped input/output stream. I
wrote a unit test that uses two threads to setup this
situation and I can consistently reproduce the error now.
If I instead pass the file directly to the lexer's
constructor, everything works fine. If you want the test, I
can zip everything up and post it on my website.

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-09-11 23:52 UTC
Logged In: YES
user_id=255188

Interesting. I have confirmed this problem only occurs when
the data is fed through a piped input/output stream. I
wrote a unit test that uses two threads to setup this
situation and I can consistently reproduce the error now.
If I instead pass the file directly to the lexer's
constructor, everything works fine. If you want the test, I
can zip everything up and post it on my website.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by lsf37 on 2004-09-12 02:52 UTC
Logged In: YES
user_id=93534

Yes, please zip together the test case and post it. Still wasn't able
to reproduce the problem.

Gerwin

Member

lsf37 commented Feb 15, 2015

Commented by lsf37 on 2004-09-12 02:52 UTC
Logged In: YES
user_id=93534

Yes, please zip together the test case and post it. Still wasn't able
to reproduce the problem.

Gerwin

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-09-12 03:05 UTC
Logged In: YES
user_id=255188

Ok, I think I'm starting to narrow it down: the problem
seems to occur whenever \r is used. In my case, the stream
uses \r\n for line endings. I have some states that, as
long as there is more than one character for the line, want
to simply grab the whole line and pass it to a secondary
parser. If there is a line like ^$, then the state ends and
changes to YYINITIAL. The problem is I'm having difficulty
doing this without doing something like ^[^\r]+ or ^\r The
same problem seems to happen when $ is used for line endings
as well.

I traced it down in the generated code to the yylex()
method, in particular the yy_forAction block seems to be
where yy_markedPos_1 gets incremented past yy_endRead, which
is a problem since right after that block yy_startRead gets
set to yy_markedPos_1.

My project is at http://www.twdata.org/dakine/jflex-bug.zip

Unzip it and, using Apache Ant, run "ant run-tests" in the
new kokua directory. The flex file is
kokua/src/org/kokua/tw/TWLexer.flex and "ant lexer"
regenerates the lexer Java file. In particular, the
"<SECTORDISPLAY> ^\r {" rule seems to be causing a problem
in this case, but as soon as I fix something like this,
another problem crops up later (I run the program and
capture all the text getting fed to the lexer).

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-09-12 03:05 UTC
Logged In: YES
user_id=255188

Ok, I think I'm starting to narrow it down: the problem
seems to occur whenever \r is used. In my case, the stream
uses \r\n for line endings. I have some states that, as
long as there is more than one character for the line, want
to simply grab the whole line and pass it to a secondary
parser. If there is a line like ^$, then the state ends and
changes to YYINITIAL. The problem is I'm having difficulty
doing this without doing something like ^[^\r]+ or ^\r The
same problem seems to happen when $ is used for line endings
as well.

I traced it down in the generated code to the yylex()
method, in particular the yy_forAction block seems to be
where yy_markedPos_1 gets incremented past yy_endRead, which
is a problem since right after that block yy_startRead gets
set to yy_markedPos_1.

My project is at http://www.twdata.org/dakine/jflex-bug.zip

Unzip it and, using Apache Ant, run "ant run-tests" in the
new kokua directory. The flex file is
kokua/src/org/kokua/tw/TWLexer.flex and "ant lexer"
regenerates the lexer Java file. In particular, the
"<SECTORDISPLAY> ^\r {" rule seems to be causing a problem
in this case, but as soon as I fix something like this,
another problem crops up later (I run the program and
capture all the text getting fed to the lexer).

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-09-15 19:53 UTC
Logged In: YES
user_id=255188

Well, in the test case I sent you, I can reproduce the bug
in the test that doesn't use piped input/output streams by
modifying the lexer-generated code to read a smaller number,
say 128 bytes - Math.min(128, yy_buffer.length-yy_endRead).
The PipedInputStream's buffer is only 1024 bytes,
preventing more from being able to be read at a time.

I don't know if jflex just can't handle it when it can't get
all the data it wants, or the smaller buffer reads are
affecting another parts of the code in indeterminate ways.
I'm hoping the issue isn't with the generated state tables
as that would be about impossible to troubleshoot.

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-09-15 19:53 UTC
Logged In: YES
user_id=255188

Well, in the test case I sent you, I can reproduce the bug
in the test that doesn't use piped input/output streams by
modifying the lexer-generated code to read a smaller number,
say 128 bytes - Math.min(128, yy_buffer.length-yy_endRead).
The PipedInputStream's buffer is only 1024 bytes,
preventing more from being able to be read at a time.

I don't know if jflex just can't handle it when it can't get
all the data it wants, or the smaller buffer reads are
affecting another parts of the code in indeterminate ways.
I'm hoping the issue isn't with the generated state tables
as that would be about impossible to troubleshoot.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-10-19 06:21 UTC
Logged In: YES
user_id=255188

Ok, got HEAD, and have narrowed down the problem. It seems
in the zzForAction loop, the variable zzEndReadL is used to
ensure zzCurrentPosL doesn't read too far. The problem is
zzEndReadL somehow gets out of sync with zzEndRead, so when
the loop goes to zzRefill, zzStartRead, which is set with
zzMarkedPosL (previously set by zzCurrentPosL), is bad and
so tries to read past the end of the array set by zzEndRead.

If I changed:
zzForAction: {
while (true) {

if (zzCurrentPosL < zzEndReadL)

to read:
zzForAction: {
while (true) {

if (zzCurrentPosL < zzEndRead)

my tests pass. Perhaps that breaks something else, I don't
know, but at least it ensures jflex doesn't try to read too far.

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-10-19 06:21 UTC
Logged In: YES
user_id=255188

Ok, got HEAD, and have narrowed down the problem. It seems
in the zzForAction loop, the variable zzEndReadL is used to
ensure zzCurrentPosL doesn't read too far. The problem is
zzEndReadL somehow gets out of sync with zzEndRead, so when
the loop goes to zzRefill, zzStartRead, which is set with
zzMarkedPosL (previously set by zzCurrentPosL), is bad and
so tries to read past the end of the array set by zzEndRead.

If I changed:
zzForAction: {
while (true) {

if (zzCurrentPosL < zzEndReadL)

to read:
zzForAction: {
while (true) {

if (zzCurrentPosL < zzEndRead)

my tests pass. Perhaps that breaks something else, I don't
know, but at least it ensures jflex doesn't try to read too far.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by mrdon on 2004-10-19 20:54 UTC
Logged In: YES
user_id=255188

Actually, it is a bit cleaner to just have zzEndReadL
updated after every zzRefill rather my previous fix. I
modified Emitter.java as follows (diff -u format)

Index: Emitter.java

RCS file: /cvsroot/jflex/jflex/src/JFlex/Emitter.java,v
retrieving revision 2.23
diff -u -r2.23 Emitter.java
--- Emitter.java 21 Apr 2004 08:09:50 -0000 2.23
+++ Emitter.java 19 Oct 2004 20:49:59 -0000
@@ -941,6 +941,7 @@
println(" zzPeek = false;");
println(" else {");
println(" boolean eof = zzRefill();");

  •    println("          zzEndReadL = zzEndRead;");
    

println(" zzMarkedPosL = zzMarkedPos;");
println(" zzBufferL = zzBuffer;");
println(" if (eof) ");
@@ -974,6 +975,7 @@
println(" zzAtBOL = false;");
println(" else {");
println(" boolean eof = zzRefill();");

  •  println("            zzEndReadL = zzEndRead;");
    

println(" zzMarkedPosL = zzMarkedPos;");
println(" zzBufferL = zzBuffer;");
println(" if (eof) ");

Member

lsf37 commented Feb 15, 2015

Commented by mrdon on 2004-10-19 20:54 UTC
Logged In: YES
user_id=255188

Actually, it is a bit cleaner to just have zzEndReadL
updated after every zzRefill rather my previous fix. I
modified Emitter.java as follows (diff -u format)

Index: Emitter.java

RCS file: /cvsroot/jflex/jflex/src/JFlex/Emitter.java,v
retrieving revision 2.23
diff -u -r2.23 Emitter.java
--- Emitter.java 21 Apr 2004 08:09:50 -0000 2.23
+++ Emitter.java 19 Oct 2004 20:49:59 -0000
@@ -941,6 +941,7 @@
println(" zzPeek = false;");
println(" else {");
println(" boolean eof = zzRefill();");

  •    println("          zzEndReadL = zzEndRead;");
    

println(" zzMarkedPosL = zzMarkedPos;");
println(" zzBufferL = zzBuffer;");
println(" if (eof) ");
@@ -974,6 +975,7 @@
println(" zzAtBOL = false;");
println(" else {");
println(" boolean eof = zzRefill();");

  •  println("            zzEndReadL = zzEndRead;");
    

println(" zzMarkedPosL = zzMarkedPos;");
println(" zzBufferL = zzBuffer;");
println(" if (eof) ");

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by lsf37 on 2004-10-24 07:36 UTC
Logged In: YES
user_id=93534

Yes! That seems to be the one. Thank you Don for your help!

I've committed the patch to the repository. It will be released together with
the next version.

Gerwin

Member

lsf37 commented Feb 15, 2015

Commented by lsf37 on 2004-10-24 07:36 UTC
Logged In: YES
user_id=93534

Yes! That seems to be the one. Thank you Don for your help!

I've committed the patch to the repository. It will be released together with
the next version.

Gerwin

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Updated by lsf37 on 2004-10-24 07:36 UTC

  • status: open --> open-fixed
Member

lsf37 commented Feb 15, 2015

Updated by lsf37 on 2004-10-24 07:36 UTC

  • status: open --> open-fixed
@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Updated by lsf37 on 2004-11-07 06:04 UTC

  • status: open-fixed --> closed
Member

lsf37 commented Feb 15, 2015

Updated by lsf37 on 2004-11-07 06:04 UTC

  • status: open-fixed --> closed

@lsf37 lsf37 added the bug label Feb 17, 2015

@lsf37 lsf37 modified the milestone: jflex bug Feb 17, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment