New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Error in skeleton.nested [sf#132] #133

Closed
lsf37 opened this Issue Feb 15, 2015 · 27 comments

Comments

Projects
None yet
2 participants
@lsf37
Member

lsf37 commented Feb 15, 2015

Reported by jeningar on 2014-09-23 10:37 UTC
Hi Steve, hi Gerwin,

using the skeleton.nested, the resulting program showed the following behaviour:

echo "list sessions;" | sdmsh

hangs forever. But

echo "list sessions;" > /tmp/x
sdmsh < /tmp/x

teminates as should be.

After merging the skeleton.default and skeleton.nested this erroneous behaviour vanished.

Diffing the old and new skeleton.nested shows (< == new; > == old):

[ronald@cheetah shell]$ diff skeleton.nested* | more
39c39
< 

---
>   
169c169
<     while (numRead == 0) { // bug #130 discussion; while is better than if

---
>     if (numRead == 0) {
403,409d402
<     // cached fields:
<     int zzCurrentPosL;
<     int zzMarkedPosL;
<     int zzEndReadL = zzEndRead;
<     char [] zzBufferL = zzBuffer;
<     char [] zzCMapL = ZZ_CMAP;
< 
413c406,411
<       zzMarkedPosL = zzMarkedPos;

---
>       // cached fields:
>       int zzCurrentPosL;
>       int zzMarkedPosL = zzMarkedPos;
>       int zzEndReadL = zzEndRead;
>       char [] zzBufferL = zzBuffer;
>       char [] zzCMapL = ZZ_CMAP;

The HUGE difference is that the variables (notably zzEndReadL) aren't initialized by skeleton.default in every iteration of the while loop.

During the merge I wondered why there are two skeletons in the first place. Those who don't want to read from multiple streams just don't call yypushStream() and friends. One skeleton would be more than enough (and a link for backward compatibility).

for the sake of completeness I attached the new skeleton.nested.

Regards,

Ronald

@lsf37 lsf37 changed the title from Error in skeleton.nested to [Bug] Error in skeleton.nested [sf#132] Feb 15, 2015

@lsf37 lsf37 added this to the jflex bug milestone Feb 15, 2015

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by jeningar on 2014-09-25 13:14 UTC
small add-on.

Today I tried to write a minimal "working" example. So far I didn't succeed, unfortunately. It was even so, that our software seemed to work as intended, regardless of the skeleton used.
After a complete recompile, the software failed again, blocking as described.
Odd enought, it now also blocks when using a file as input. (Actually I think the original behaviour is stranger than the actual behaviour. The use of "odd" refers to the fact that a system behaves differently after a recompile of unchanged sources).
There's no difference in behaviour between the two skeletons any more (the original and the patched version).

It's all very weird. I'll do some more investigations. If I get any results, I'll report them. In the mean time I accept this to be a low priority bug, as everything works flawlessly with jflex 1.4.x

Member

lsf37 commented Feb 15, 2015

Commented by jeningar on 2014-09-25 13:14 UTC
small add-on.

Today I tried to write a minimal "working" example. So far I didn't succeed, unfortunately. It was even so, that our software seemed to work as intended, regardless of the skeleton used.
After a complete recompile, the software failed again, blocking as described.
Odd enought, it now also blocks when using a file as input. (Actually I think the original behaviour is stranger than the actual behaviour. The use of "odd" refers to the fact that a system behaves differently after a recompile of unchanged sources).
There's no difference in behaviour between the two skeletons any more (the original and the patched version).

It's all very weird. I'll do some more investigations. If I get any results, I'll report them. In the mean time I accept this to be a low priority bug, as everything works flawlessly with jflex 1.4.x

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by jeningar on 2014-09-25 13:22 UTC
For the record, the relevant part of jstack output:

"main" prio=10 tid=0x00007f9cfc009000 nid=0x1b29 runnable [0x00007f9d002df000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:272)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0x0000000688c3cf70> (a java.io.BufferedInputStream)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    - locked <0x0000000688a06ce8> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at de.independit.scheduler.shell.MiniScanner.zzRefill(MiniScanner.java:648)
    at de.independit.scheduler.shell.MiniScanner.yylex(MiniScanner.java:982)
    at de.independit.scheduler.shell.MiniScanner.advance(MiniScanner.java:523)
    at de.independit.scheduler.shell.MiniParser.yyparse(MiniParser.java:832)
    at de.independit.scheduler.sdmsh.go(sdmsh.java:196)
    at de.independit.scheduler.SDMSApp.App.doTry(App.java:325)
    at de.independit.scheduler.SDMSApp.App.reTry(App.java:349)
    at de.independit.scheduler.SDMSApp.App.run(App.java:440)
    at de.independit.scheduler.sdmsh.main(sdmsh.java:215)
Member

lsf37 commented Feb 15, 2015

Commented by jeningar on 2014-09-25 13:22 UTC
For the record, the relevant part of jstack output:

"main" prio=10 tid=0x00007f9cfc009000 nid=0x1b29 runnable [0x00007f9d002df000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:272)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0x0000000688c3cf70> (a java.io.BufferedInputStream)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    - locked <0x0000000688a06ce8> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at de.independit.scheduler.shell.MiniScanner.zzRefill(MiniScanner.java:648)
    at de.independit.scheduler.shell.MiniScanner.yylex(MiniScanner.java:982)
    at de.independit.scheduler.shell.MiniScanner.advance(MiniScanner.java:523)
    at de.independit.scheduler.shell.MiniParser.yyparse(MiniParser.java:832)
    at de.independit.scheduler.sdmsh.go(sdmsh.java:196)
    at de.independit.scheduler.SDMSApp.App.doTry(App.java:325)
    at de.independit.scheduler.SDMSApp.App.reTry(App.java:349)
    at de.independit.scheduler.SDMSApp.App.run(App.java:440)
    at de.independit.scheduler.sdmsh.main(sdmsh.java:215)
@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by jeningar on 2014-09-26 09:46 UTC
My next attempt to build a minimal working example has been a bit more successful. Attached are:

  1. the jflex file (AReader.jlex)
  2. the jflex 1.7 output (AReader.java.1.7)
  3. the jflex 1.4 output using the same 1.7 skeleton (AReader.java.1.4)

after a

cp AReader.java.1.4 AReader.java
javac AReader.java

resp.

cp AReader.java.1.7 AReader.java
javac AReader.java

the following test shows the difference:

echo "some text no matter what but ending with semicolon;" | java AReader

The 1.7 version hangs, the 1.4 version doesn't.
Obviously the title of the bug is wrong. Sorry for that.
I'll continue to investigate the matter, but would gratefully accept any help.

Although this source code works flawlessly since years, I don't exclude the possibility of a small flaw in my jflex code. The fact that the 1.4 version requires two "^D" to terminate might be a clue here.

The java version I'm using is

[ronald@cheetah shell]$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
Member

lsf37 commented Feb 15, 2015

Commented by jeningar on 2014-09-26 09:46 UTC
My next attempt to build a minimal working example has been a bit more successful. Attached are:

  1. the jflex file (AReader.jlex)
  2. the jflex 1.7 output (AReader.java.1.7)
  3. the jflex 1.4 output using the same 1.7 skeleton (AReader.java.1.4)

after a

cp AReader.java.1.4 AReader.java
javac AReader.java

resp.

cp AReader.java.1.7 AReader.java
javac AReader.java

the following test shows the difference:

echo "some text no matter what but ending with semicolon;" | java AReader

The 1.7 version hangs, the 1.4 version doesn't.
Obviously the title of the bug is wrong. Sorry for that.
I'll continue to investigate the matter, but would gratefully accept any help.

Although this source code works flawlessly since years, I don't exclude the possibility of a small flaw in my jflex code. The fact that the 1.4 version requires two "^D" to terminate might be a clue here.

The java version I'm using is

[ronald@cheetah shell]$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

@lsf37 lsf37 added the bug label Feb 15, 2015

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37
Member

lsf37 commented Feb 15, 2015

Attachments on SF tracker: https://sourceforge.net/p/jflex/bugs/132/

@lsf37 lsf37 removed this from the jflex bug milestone Feb 15, 2015

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

It looks like this one is closely related to #131

Member

lsf37 commented Feb 15, 2015

It looks like this one is closely related to #131

@lsf37 lsf37 added this to the 1.6.1 milestone Feb 15, 2015

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 18, 2015

After picking up this issue again and some experimenting, I finally found out something useful.
There is a new code part in the 1.7 scanner which looks like

      // set up zzAction for empty match case:
      int zzAttributes = zzAttrL[zzState];
      if ( (zzAttributes & 1) == 1 ) {
        zzAction = zzState;
      }

After commenting out the assignment to zzAction, the scanner works flawlessly. If I leave it as it is, there seems to be a problem with EOF handling.
The 1.7 version hangs when using file input, the 1.4 version doesn't.
After "including" a file, the 1.7 version doesn't "return" to the original input stream, 1.4 does.

Is there a strong reason for this piece of code? What does it do? (Or better said: what is it meant to do?).

Regards,

Ronald

PS. if required I can provide a test set
PPS. We use jflex in combination with the Jay Parser generator. If you want I can write some documentation on that. Granted, Jay is not very mainstream, but it is very stable and easy to use.

schedulix commented Feb 18, 2015

After picking up this issue again and some experimenting, I finally found out something useful.
There is a new code part in the 1.7 scanner which looks like

      // set up zzAction for empty match case:
      int zzAttributes = zzAttrL[zzState];
      if ( (zzAttributes & 1) == 1 ) {
        zzAction = zzState;
      }

After commenting out the assignment to zzAction, the scanner works flawlessly. If I leave it as it is, there seems to be a problem with EOF handling.
The 1.7 version hangs when using file input, the 1.4 version doesn't.
After "including" a file, the 1.7 version doesn't "return" to the original input stream, 1.4 does.

Is there a strong reason for this piece of code? What does it do? (Or better said: what is it meant to do?).

Regards,

Ronald

PS. if required I can provide a test set
PPS. We use jflex in combination with the Jay Parser generator. If you want I can write some documentation on that. Granted, Jay is not very mainstream, but it is very stable and easy to use.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 21, 2015

Member

Thanks for tracing this down further. Will need to look through the history to figure out what this code was supposed to do. It may still only be a symptom, not the root cause, but it looks like we're getting closer.

A small test case would be very helpful, not just for fixing it but also for making sure it's in the regression suite.

Documentation on Jay would also be very useful, happy to add a small section to the manual for it.

Member

lsf37 commented Feb 21, 2015

Thanks for tracing this down further. Will need to look through the history to figure out what this code was supposed to do. It may still only be a symptom, not the root cause, but it looks like we're getting closer.

A small test case would be very helpful, not just for fixing it but also for making sure it's in the regression suite.

Documentation on Jay would also be very useful, happy to add a small section to the manual for it.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 21, 2015

Member

A note while I'm trying to get to get to the bottom of this: the lexer spec AReader.jlex contains multiple expressions that match the empty string. For instance:

nl =        [\n\r]*
...
<YYINITIAL>{nl}         { if(inStatement) addVal();                 break;      }

This can always lead to non-termination if nothing else matches. In general you would want to avoid this pattern and make sure that each rule matches at least one character. E.g. use nl = [\n\r]+, or later <YYINITIAL>[a-zA-Z]+ instead of <YYINITIAL>[a-zA-Z]*. If I do this for all of the rules, I get a terminating scanner.

However, this is not the real cause of the problem here. Since there is a possible longer match (because the same scanner does work with +), the scanner should never just return the empty match for your spec. So I'll keep looking.

Member

lsf37 commented Feb 21, 2015

A note while I'm trying to get to get to the bottom of this: the lexer spec AReader.jlex contains multiple expressions that match the empty string. For instance:

nl =        [\n\r]*
...
<YYINITIAL>{nl}         { if(inStatement) addVal();                 break;      }

This can always lead to non-termination if nothing else matches. In general you would want to avoid this pattern and make sure that each rule matches at least one character. E.g. use nl = [\n\r]+, or later <YYINITIAL>[a-zA-Z]+ instead of <YYINITIAL>[a-zA-Z]*. If I do this for all of the rules, I get a terminating scanner.

However, this is not the real cause of the problem here. Since there is a possible longer match (because the same scanner does work with +), the scanner should never just return the empty match for your spec. So I'll keep looking.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 21, 2015

Member

Just logging that the "empty match case" originated from issue #109, and that at least my minimized AReader.jlex correctly matches the input before it goes into the loop (still need to confirm that this is also the case in the original).

Being able to match the empty string is legit in principle, even though the comment on #109 already says that you have to be more careful with writing your rules now. The EOF match should still take precedence, and I guess this is the actual problem.

Member

lsf37 commented Feb 21, 2015

Just logging that the "empty match case" originated from issue #109, and that at least my minimized AReader.jlex correctly matches the input before it goes into the loop (still need to confirm that this is also the case in the original).

Being able to match the empty string is legit in principle, even though the comment on #109 already says that you have to be more careful with writing your rules now. The EOF match should still take precedence, and I guess this is the actual problem.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 21, 2015

Member

Confirmed that original spec matches initial text correctly and empty string match occurs only at EOF in the example. Which means my reasoning that there is a longer match is misleading, because EOF isn't a real character in the input stream.

Member

lsf37 commented Feb 21, 2015

Confirmed that original spec matches initial text correctly and empty string match occurs only at EOF in the example. Which means my reasoning that there is a longer match is misleading, because EOF isn't a real character in the input stream.

@lsf37 lsf37 self-assigned this Feb 21, 2015

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 21, 2015

Still, I think that an EOF (as far as there is a rule for it) is a positive result compared to an empty string and should take precedence.
Nevertheless your critics regarding my scanner code is more than valid and I'm going to eliminate all possibilities of an empty string matching some rule. Until now, my life has been saved by greedy matching :-)
Thank you for pointing this out.

schedulix commented Feb 21, 2015

Still, I think that an EOF (as far as there is a rule for it) is a positive result compared to an empty string and should take precedence.
Nevertheless your critics regarding my scanner code is more than valid and I'm going to eliminate all possibilities of an empty string matching some rule. Until now, my life has been saved by greedy matching :-)
Thank you for pointing this out.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 21, 2015

Member

Didn't mean to say that EOF shouldn't take precedence over the empty match. I've already started looking into how it can be fixed.

Empty matches will still remain dangerous if they can occur somewhere in the middle of the input, unless something in the lexer changes state, but EOF should count as one character of input.

Member

lsf37 commented Feb 21, 2015

Didn't mean to say that EOF shouldn't take precedence over the empty match. I've already started looking into how it can be fixed.

Empty matches will still remain dangerous if they can occur somewhere in the middle of the input, unless something in the lexer changes state, but EOF should count as one character of input.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 21, 2015

Member

(this all also means this issue is not related to issue #131)

Member

lsf37 commented Feb 21, 2015

(this all also means this issue is not related to issue #131)

lsf37 added a commit that referenced this issue Feb 21, 2015

Warn about expressions that match the empty string.
There are legitimate cases where you might want to match the empty string
(see issue #109), but it can easily lead to non-termination.

Triggered by issue #133, but is not a fix.

lsf37 added a commit that referenced this issue Feb 22, 2015

Fix issue #133 "Error in skeleton.nested"
The problem was that empty matches preceded `<<EOF>>` actions. This change set
pulls the EOF test including action and return code before the action switch
statement.

The drawback is that we now test for the infrequent EOF on every single action.

Since the issue can only occur if there are empty matches, an optimization
would be to only move the EOF test before the action switch if the spec
contains potential empty matches.

Since action code will usually dominate the EOF test anyway, I'm deferring this
optimization for now.

Test cases updated to the new skeleton.nested; the test case `genlook`
actually contained an instance of an empty match before EOF, but avoided
non-termination because it changed state on that match. The test case
`emptymatch` shows that empty matches within the stream still work.

Caveats:
  * Can't match the empty string at the end of the stream before EOF any
    more, because EOF will take precedence. Should be able to pull any such
    action into EOF code to avoid this problem.
  * `break` statements at the end of EOF code will cause a compile error now,
    because they are not in a switch statement any more. Solution: remove
    such `break` statements, they shouldn't be there in the first place.
@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 22, 2015

Member

f312d56 should fix this issue. @schedulix, can you please confirm that the fix works for you?

It is possible that your initial problem was a combination of the empty-match issue and issue #131.

Note that the break statement in your <<EOF>> code will now lead to a compile error. You can just remove this break, including the other break statements at the end of actions: the generated code is designed such that a break is inserted automatically with a fall-through switch. (Some static analysers don't like this, so it's also Ok to leave them in, apart from in <<EOF>>)

Member

lsf37 commented Feb 22, 2015

f312d56 should fix this issue. @schedulix, can you please confirm that the fix works for you?

It is possible that your initial problem was a combination of the empty-match issue and issue #131.

Note that the break statement in your <<EOF>> code will now lead to a compile error. You can just remove this break, including the other break statements at the end of actions: the generated code is designed such that a break is inserted automatically with a fall-through switch. (Some static analysers don't like this, so it's also Ok to leave them in, apart from in <<EOF>>)

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 22, 2015

Removing the break in the <<EOF>> rule doesn't seem to be a problem. In all other rules it is. Not because of the generated 1.7 code, but because of backward compatibility. Many Linux distributions (so far fortunately) still ship the 1.4.3 version of jflex. Removing the breaks would definitely break the generated code. (Actually, I don't really mind adding those break statements).

Now for some results after removing the break in the <<EOF>> rule:

  1. everything compiled without problems with the 1.4.3 version of jflex
  2. It didn't compile with the f312d5...3ca53c git version:
MiniScanner.java:946: error: orphaned default
        default: 
        ^
MiniScanner.java:1183: error: 'else' without 'if'
          else {
          ^

For some reason the default: action is written before the start of the switch statement.
The generated code (line 939ff) looks like

...
      // store back cached position
      zzMarkedPos = zzMarkedPosL;

              {
                if(yymoreStreams()) { endInclude(); yypopStream(); } else return YYEOF;
                                  /* break; not necessary here because of fallthrough of default rule that ends the switch */
              }
        default:
          if (zzInput == YYEOF && zzStartRead == zzCurrentPos) {
            zzAtEOF = true;
        switch (zzAction < 0 ? zzAction : ZZ_ACTION[zzAction]) {
          case 1:
            { addVal();         inStatement = true;                     break;
            }
          case 38: break;
...

and (line 1173ff)

...
          case 37:
            { addVal();
                                  inStatement = true;
                                  inMultiCmd = true;
                                  // System.out.print("BMULTI ");
                                  return MiniParser.BMULTI;
            }
          case 74: break;
          }
          else {
            zzScanError(ZZ_NO_MATCH);
          }
...

schedulix commented Feb 22, 2015

Removing the break in the <<EOF>> rule doesn't seem to be a problem. In all other rules it is. Not because of the generated 1.7 code, but because of backward compatibility. Many Linux distributions (so far fortunately) still ship the 1.4.3 version of jflex. Removing the breaks would definitely break the generated code. (Actually, I don't really mind adding those break statements).

Now for some results after removing the break in the <<EOF>> rule:

  1. everything compiled without problems with the 1.4.3 version of jflex
  2. It didn't compile with the f312d5...3ca53c git version:
MiniScanner.java:946: error: orphaned default
        default: 
        ^
MiniScanner.java:1183: error: 'else' without 'if'
          else {
          ^

For some reason the default: action is written before the start of the switch statement.
The generated code (line 939ff) looks like

...
      // store back cached position
      zzMarkedPos = zzMarkedPosL;

              {
                if(yymoreStreams()) { endInclude(); yypopStream(); } else return YYEOF;
                                  /* break; not necessary here because of fallthrough of default rule that ends the switch */
              }
        default:
          if (zzInput == YYEOF && zzStartRead == zzCurrentPos) {
            zzAtEOF = true;
        switch (zzAction < 0 ? zzAction : ZZ_ACTION[zzAction]) {
          case 1:
            { addVal();         inStatement = true;                     break;
            }
          case 38: break;
...

and (line 1173ff)

...
          case 37:
            { addVal();
                                  inStatement = true;
                                  inMultiCmd = true;
                                  // System.out.print("BMULTI ");
                                  return MiniParser.BMULTI;
            }
          case 74: break;
          }
          else {
            zzScanError(ZZ_NO_MATCH);
          }
...
@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 22, 2015

Apparently there are a few code parts.
Part 1 (the EOF action):

              {
                if(yymoreStreams()) { endInclude(); yypopStream(); } else return YYEOF;
                                  /* break; not necessary here because of fallthrough of default rule that ends the switch */
              }

Part 2 (the switch default):

        default:

Part 3 (first part of an if Statement):

          if (zzInput == YYEOF && zzStartRead == zzCurrentPos) {
            zzAtEOF = true;

Part 4 (Rest of the switch statement):

        switch (zzAction < 0 ? zzAction : ZZ_ACTION[zzAction]) {
          case 1:
            { addVal();         inStatement = true;                     break;
            }
          case 38: break;
...
                                  return MiniParser.BMULTI;
            }
          case 74: break;

Part 5 (the else part of the if statement):

          }
          else {
            zzScanError(ZZ_NO_MATCH);
          }

The test I did was to change the sequence of the parts to
Part 4 (switch)
Part 2 (default)
Part 1 (EOF-action)
Part 3 (if)
Part 5 (else)
That compiled and seems to work. (And this sequence would also throw a compile error if the break would remain in the action).

The "test case" I used was

[ronald@cheetah schedulix]$ echo "list sessions;
show system;
include '/tmp/x.sdms'
list sessions;" | bin/sdmsh

As "clearly visible" this test also changes input from stdin to /tmp/x.sdms and back. I won't provide the lengthy output here. You'll have to believe me it was correct. ;-)

schedulix commented Feb 22, 2015

Apparently there are a few code parts.
Part 1 (the EOF action):

              {
                if(yymoreStreams()) { endInclude(); yypopStream(); } else return YYEOF;
                                  /* break; not necessary here because of fallthrough of default rule that ends the switch */
              }

Part 2 (the switch default):

        default:

Part 3 (first part of an if Statement):

          if (zzInput == YYEOF && zzStartRead == zzCurrentPos) {
            zzAtEOF = true;

Part 4 (Rest of the switch statement):

        switch (zzAction < 0 ? zzAction : ZZ_ACTION[zzAction]) {
          case 1:
            { addVal();         inStatement = true;                     break;
            }
          case 38: break;
...
                                  return MiniParser.BMULTI;
            }
          case 74: break;

Part 5 (the else part of the if statement):

          }
          else {
            zzScanError(ZZ_NO_MATCH);
          }

The test I did was to change the sequence of the parts to
Part 4 (switch)
Part 2 (default)
Part 1 (EOF-action)
Part 3 (if)
Part 5 (else)
That compiled and seems to work. (And this sequence would also throw a compile error if the break would remain in the action).

The "test case" I used was

[ronald@cheetah schedulix]$ echo "list sessions;
show system;
include '/tmp/x.sdms'
list sessions;" | bin/sdmsh

As "clearly visible" this test also changes input from stdin to /tmp/x.sdms and back. I won't provide the lengthy output here. You'll have to believe me it was correct. ;-)

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 22, 2015

Member

I forgot to say that you'll need the new skeleton.nested file from f312d56, because I changed the sections. This means it's no longer compatible with earlier jflex versions.

It's at https://github.com/jflex-de/jflex/blob/master/jflex/src/main/jflex/skeleton.nested

Member

lsf37 commented Feb 22, 2015

I forgot to say that you'll need the new skeleton.nested file from f312d56, because I changed the sections. This means it's no longer compatible with earlier jflex versions.

It's at https://github.com/jflex-de/jflex/blob/master/jflex/src/main/jflex/skeleton.nested

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 22, 2015

Member

ps: the break statement should not be necessary with any jflex version, including 1.4.3. If they are, then there is something strange going on that we should investigate.

Member

lsf37 commented Feb 22, 2015

ps: the break statement should not be necessary with any jflex version, including 1.4.3. If they are, then there is something strange going on that we should investigate.

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 22, 2015

mmmh, this means that I have to do something special? I did a git pull origin master in my jflex directory this morning and after building the software I called

[ronald@cheetah shell]$ make
/home/ronald/jflex-master/jflex/bin/jflex -d /tmp -skel skeleton.nested MiniScanner.jlex
Reading skeleton file "skeleton.nested".
Reading "MiniScanner.jlex"
Constructing NFA : 436 states in NFA
Converting NFA to DFA : 
.................................................................................................................................................................
173 states before minimization, 158 states in minimized DFA
Writing code to "/tmp/MiniScanner.java"
sed -e 's/zzCMapL\[zzInput\]/(((zzInput >= 0) \&\& (zzInput < zzCMapL.length)) ? zzCMapL\[zzInput\] : 0)/' /tmp/MiniScanner.java >MiniScanner.java

(I didn't actively remember any more, but obviously I use sed to fix some other ancient problem which seems to address outside of an array. As long as zzInput isn't out of bounds, it's a void change).

I checked which jflex version is installed on the computer I work on at the moment. It's

[ronald@cheetah bin]$ ./jflex -version
This is JFlex 1.4

So it is probably older than 1.4.3.

Since the breaks don't harm (except for the <<EOF>> rule in 1.7), I leave them where they are.
The missing one in the <<EOF>> rule doesn't harm in all other jflex versions because it's physically the last part of the switch statement. And since the older jflex are unlikely to change, everything's perfect from my point of view.

schedulix commented Feb 22, 2015

mmmh, this means that I have to do something special? I did a git pull origin master in my jflex directory this morning and after building the software I called

[ronald@cheetah shell]$ make
/home/ronald/jflex-master/jflex/bin/jflex -d /tmp -skel skeleton.nested MiniScanner.jlex
Reading skeleton file "skeleton.nested".
Reading "MiniScanner.jlex"
Constructing NFA : 436 states in NFA
Converting NFA to DFA : 
.................................................................................................................................................................
173 states before minimization, 158 states in minimized DFA
Writing code to "/tmp/MiniScanner.java"
sed -e 's/zzCMapL\[zzInput\]/(((zzInput >= 0) \&\& (zzInput < zzCMapL.length)) ? zzCMapL\[zzInput\] : 0)/' /tmp/MiniScanner.java >MiniScanner.java

(I didn't actively remember any more, but obviously I use sed to fix some other ancient problem which seems to address outside of an array. As long as zzInput isn't out of bounds, it's a void change).

I checked which jflex version is installed on the computer I work on at the moment. It's

[ronald@cheetah bin]$ ./jflex -version
This is JFlex 1.4

So it is probably older than 1.4.3.

Since the breaks don't harm (except for the <<EOF>> rule in 1.7), I leave them where they are.
The missing one in the <<EOF>> rule doesn't harm in all other jflex versions because it's physically the last part of the switch statement. And since the older jflex are unlikely to change, everything's perfect from my point of view.

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 22, 2015

OK, sorry, my fault. There's a skeleton.nested in the local directory. I'm going to think about how to get this portable for several versions of jflex.

schedulix commented Feb 22, 2015

OK, sorry, my fault. There's a skeleton.nested in the local directory. I'm going to think about how to get this portable for several versions of jflex.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 23, 2015

Member

Ah yes, the same skeleton over multiple jflex versions will be a challenge. You could give them separate names, query the jflex version in the build, and use the right skeleton based on that.

Leaving the breaks in shouldn't do any harm. Did I read your previous message right that it now works correctly if you use the new skeleton?

Member

lsf37 commented Feb 23, 2015

Ah yes, the same skeleton over multiple jflex versions will be a challenge. You could give them separate names, query the jflex version in the build, and use the right skeleton based on that.

Leaving the breaks in shouldn't do any harm. Did I read your previous message right that it now works correctly if you use the new skeleton?

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 23, 2015

That's what I did (the new skeleton is called skeleton.nested.1.7; I'm sure I'll get a prize for originality on this one ;).
The problematic utility seems to work flawlessly. I'm doing the changes in the master branch of our git and soon I'll be doing some thorough testing because of some other features we newly developed. The piggyback testing of the JFlex 1.7 compatibility will be part of it. I'll give you a final feedback within the next few days, but so far everything looks good.

schedulix commented Feb 23, 2015

That's what I did (the new skeleton is called skeleton.nested.1.7; I'm sure I'll get a prize for originality on this one ;).
The problematic utility seems to work flawlessly. I'm doing the changes in the master branch of our git and soon I'll be doing some thorough testing because of some other features we newly developed. The piggyback testing of the JFlex 1.7 compatibility will be part of it. I'll give you a final feedback within the next few days, but so far everything looks good.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 24, 2015

Member

Excellent. I have another change in the pipeline that might affect people (from issue #131), although it usually shouldn't.

Do you have an opinion on that discussion from your application? (The change may lead to IOExceptions being thrown if the underlying Reader returns 0 on read, which it is not supposed to do, but some existing Readers do. The "proper" fix is to let the user wrap these in another Reader that conforms to the read specification and knows what the source is, instead JFlex trying to predict what is going on).

Member

lsf37 commented Feb 24, 2015

Excellent. I have another change in the pipeline that might affect people (from issue #131), although it usually shouldn't.

Do you have an opinion on that discussion from your application? (The change may lead to IOExceptions being thrown if the underlying Reader returns 0 on read, which it is not supposed to do, but some existing Readers do. The "proper" fix is to let the user wrap these in another Reader that conforms to the read specification and knows what the source is, instead JFlex trying to predict what is going on).

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 24, 2015

Well, I have quite a clear opinion on the Reader returning 0. In fact I think that it's the users responsibility to make sure it doesn't happen.
In Java spawning a thread is very simple. Hence there's no need for non-blocking readers; if you want one, which is legitimate, just spawn a thread that does the reading for you. Depending on the policy one wants to implement that might not be a trivial task. But the less trivial it is, the more it should be separated from lexically analyzing the stream.
Therefor, the only valid reason to return a zero I can think of doesn't apply.

Hey, did you read the book? Yes, I read zero characters! My teachers would have killed me for such an answer since it's a clear No.
Hence, everything else is a bogus implementation. I'm perfectly fine if an IOException is thrown because a Reader returns zero though it is not at EOF.

But it is necessary that the Reader isn't obligated to read an entire buffer. That would be like a bartender that doesn't sell you coffee if your request isn't exactly 16384 characters long. "Coffee! fast!" should be sufficient ;-)

I'm a true fan of fail-fast modules since I am convinced it's the only way to yield stable systems. This might sound odd at first, but think of the implications.
If some error occurs which hasn't been thought of, the process halts and the error can be investigated. Important is that, because of the immediate halt, no side effects occur. The core dump (or whatever error describing information) contains the exact situation at the time of error which makes it easier to analyze it.
As soon as the error has been analyzed and understood the software can be extended to handle the situation, which transforms the error to a valid (maybe annoying, but still valid) state.
Since there weren't any side effects, the recovery of the system will be easier.

When talking about job scheduling, we advocate something we call "Process Decomposition", which is basically the partitioning of a monolithic process into small units each doing exactly one thing (the old Unix philosophy: do only one thing and do it right). Data processing stops if one of the units runs into a non handled exception. Recovery afterwards is easy or at least still possible. In my opinion this is the only way to survive in a Big Data/Data Warehouse environment.

schedulix commented Feb 24, 2015

Well, I have quite a clear opinion on the Reader returning 0. In fact I think that it's the users responsibility to make sure it doesn't happen.
In Java spawning a thread is very simple. Hence there's no need for non-blocking readers; if you want one, which is legitimate, just spawn a thread that does the reading for you. Depending on the policy one wants to implement that might not be a trivial task. But the less trivial it is, the more it should be separated from lexically analyzing the stream.
Therefor, the only valid reason to return a zero I can think of doesn't apply.

Hey, did you read the book? Yes, I read zero characters! My teachers would have killed me for such an answer since it's a clear No.
Hence, everything else is a bogus implementation. I'm perfectly fine if an IOException is thrown because a Reader returns zero though it is not at EOF.

But it is necessary that the Reader isn't obligated to read an entire buffer. That would be like a bartender that doesn't sell you coffee if your request isn't exactly 16384 characters long. "Coffee! fast!" should be sufficient ;-)

I'm a true fan of fail-fast modules since I am convinced it's the only way to yield stable systems. This might sound odd at first, but think of the implications.
If some error occurs which hasn't been thought of, the process halts and the error can be investigated. Important is that, because of the immediate halt, no side effects occur. The core dump (or whatever error describing information) contains the exact situation at the time of error which makes it easier to analyze it.
As soon as the error has been analyzed and understood the software can be extended to handle the situation, which transforms the error to a valid (maybe annoying, but still valid) state.
Since there weren't any side effects, the recovery of the system will be easier.

When talking about job scheduling, we advocate something we call "Process Decomposition", which is basically the partitioning of a monolithic process into small units each doing exactly one thing (the old Unix philosophy: do only one thing and do it right). Data processing stops if one of the units runs into a non handled exception. Recovery afterwards is easy or at least still possible. In my opinion this is the only way to survive in a Big Data/Data Warehouse environment.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 26, 2015

Member

I like clear statements ;-) and I agree with the argument.

Just wanted to hear more opinions before I implement it and prepare to get email about scanners that don't work any more. It's the right way forward, though.

Member

lsf37 commented Feb 26, 2015

I like clear statements ;-) and I agree with the argument.

Just wanted to hear more opinions before I implement it and prepare to get email about scanners that don't work any more. It's the right way forward, though.

@schedulix

This comment has been minimized.

Show comment
Hide comment
@schedulix

schedulix Feb 26, 2015

From my point of view this issue can be closed.

We identified the exact situation that caused the problem: a not-so-optimal implementation of a scanner in combination with the wrong choice between EOF and an empty string match. The solution was to have an EOF take precedence over the empty string match. On my side I changed the not-so-optimal scanner into a little-less-not-so-optimal scanner by eliminating the possibility of an empty string to match some rule.
This is: One bug and two improved systems :-) This is also called a win-win situation. Thanks for that!

The last three comments are about bug #131. And I think we have a consensus here. Implement a strict policy (i.e. throw an exception) and spend a few lines of documentation on it.

schedulix commented Feb 26, 2015

From my point of view this issue can be closed.

We identified the exact situation that caused the problem: a not-so-optimal implementation of a scanner in combination with the wrong choice between EOF and an empty string match. The solution was to have an EOF take precedence over the empty string match. On my side I changed the not-so-optimal scanner into a little-less-not-so-optimal scanner by eliminating the possibility of an empty string to match some rule.
This is: One bug and two improved systems :-) This is also called a win-win situation. Thanks for that!

The last three comments are about bug #131. And I think we have a consensus here. Implement a strict policy (i.e. throw an exception) and spend a few lines of documentation on it.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 26, 2015

Member

I agree!

Member

lsf37 commented Feb 26, 2015

I agree!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment