Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #40, EOF/EOL #45

Merged
merged 14 commits into from
Jun 18, 2021
Merged

Issue #40, EOF/EOL #45

merged 14 commits into from
Jun 18, 2021

Conversation

meisl
Copy link
Contributor

@meisl meisl commented Jun 12, 2021

Finally, I can build myself :D

So I just tried what I had been proposing in #40, and it turned out to just work. I did the "automagic newline normalization" as well, including support for Mac newlines (sole \r).

I wasn't quite sure what you had in mind with COMMENT and LINECOMMENT, so I went for maximally defensive = same behavior as before, just with added Windows and Mac line endings.

What's missing are tests, I'd need your help with that. Haven't worked with IDEA, gradle, Kotlin before. Maybe you could just add one trivial test to the parser module, so that I can see what setup is needed. I'd try to adapt then.

@irmen
Copy link
Owner

irmen commented Jun 13, 2021

Hi !
Have a look at a2588a1
I added TestAntlrParser.kt with some simple tests to get started
You can run it with "gradle test" from the command line, or from IDEA with its test runner in the gui.

By the way I don't think it's useful to put (java) tests in the (java) Parser submodule as I think it's very cumbersome to directly test the antlr parse tree. So I put it in the CompilerAst module. You can still directly access the antlr nodes there if you like but also (preferably) the transformed Ast nodes from the prog8 parse tree

@meisl
Copy link
Contributor Author

meisl commented Jun 13, 2021

Great! I'll add tests to the PR.

You're right, let's not add java tests. I realized that the parser module isn't Kotlin but java just before I read your comment :)

Right now I'm having my fights with IDEA, it won't build because of some strange "missing/conflicting dependency" re the JDK for module compilerAst - but only for one file, and on top of that, it's a different file after restart of the IDE...! Well, doesn't matter too much, I can still build and run tests with gradle.

@meisl
Copy link
Contributor Author

meisl commented Jun 13, 2021

Hi @irmen,
I don't understand at all what you did in 3f58eca but thanks a lot for it anyways :)
It did make quite some warnings/problems go away for me. However, IDEA still doesn't seem to like me much. This weird JDK conflict is gone, but now it's complaining about not knowing prog8Lexer, prog8Parser,... from inside parser/ModuleParsing.kt, even though that's in the same package (prog8.parser) - I don't get it.

Anyhow, the gradle stuff - building and testing - does work, and with no more warnings now.

I've managed to create (hopefully) reasonable tests, and as you might see from my numerous additional commits, went to great lengths to make sure that what I did actually does test what it should. Bear in mind that I'm in the midst of learning all that stuff still.

There's one strange thing I couldn't wrap my head around yet: one test, testWindowsAndMacNewlinesAreAlsoFine I cannot get to fail (!) That is, neither in IDEA nor with gradlew test. However, when I do build the big jar (gradlew installShadowDist) with the old grammar (newlines just \n), and throw a Windows or Mac .p8 file at it, then this does indeed fail to parse it (and succeeds with the new grammar).
There must be something really hideous going on inside CharStreams.fromString I suspect, that "automagically" normalizes the newlines when running the tests, and only then.

So, can this PR be merged now?
Well, it depends on this never-failing one test. It might just be my screwed up config or something. But if indeed this test doesn't actually test what it's supposed to, then this should probably be investigated and resolved first. Maybe you have the time to clone my fork here locally and try it yourself. Right now it's the "new" grammar, but the old newline handling is just commented out there. So, "as is" the test should pass, and when you swap the commenting in parser/antlr/prog8.g4 it should fail.

@meisl
Copy link
Contributor Author

meisl commented Jun 14, 2021

I found the problem, hope you didn't waste time on this yet. I'll add a commit and explain soon.

@irmen
Copy link
Owner

irmen commented Jun 14, 2021

It's probably in the weekend that I have time to actually investigate this PR anyway, so you can keep working on it if you like no problem

meisl added 3 commits June 14, 2021 22:02
…): also test sole \r AND do not allow any recovery, neither from parser not lexer.
… a "synthesized double EOF" (behavior remains exactly the same)
@meisl
Copy link
Contributor Author

meisl commented Jun 14, 2021

The strange non-failing test (7c1de81, ce76a7d):

  • Not the CharStream but the lexer recovered by just skipping over \r
  • This could succeed only because I did not test a sole \r (was always followed by \r\n or \n)
  • The difference between tests and "big jar" was a different setup of error strategy and error listener.

...so I learned quite a bit of ANTLR, which led to

Other things I found out:

  • There's a cool ANTLR plugin for IDEA, see below
  • The old workaround, newline normalization in Kotlin, had introduced a bug (I think) to the effect that the source file name was missing in error messages. That was in ModuleParsing.kt, importModule and had to do with CharStreams.fromString. It seems to have been fixed by doing it in the grammar. Unfortunately I was unable to reconstruct it, neither to write a test. I'll see what I can do, in another PR; see Bug: file name missing in err msgs for imported files #46 for details.
  • My original fix, (EOL | EOF) at the end of rule block was in fact a little bit of cheating, as it required the synthetic token EOF to be consumable more than once. Even though this just works due to ANTLR smartness, it's nevertheless ugly, so I did it properly (fcb1a7e), including a comprehensive comment in the grammar. To illustrate (using mentioned plugin):
    • input source: foo {\n} (= no trailing newline)
    • old rules:
      module: (directive | block | EOL)* EOF;
      block: identifier integerliteral? '{' EOL (block_statement | EOL)* '}' (EOL | EOF);
    • old parse tree:
      parseTree_EOF-old
    • new rules:
      module: EOL? ((directive | block) (EOL (directive | block))*)? EOL? EOF;
      block: identifier integerliteral? '{' EOL (block_statement | EOL)* '}';
    • new parse tree:
      parseTree_EOF-new
    • Notes
      • The new module rule is considerably more complicated, but it has the exact same effect (just without the need for ANTLR to recover by "inserting" EOFs)
      • I was wondering if what it does is really what we want. The comment I added says:
        // A module (file) consists of zero or more directives or blocks, in any order.
        // If there are more than one, then they must be separated by EOL (one or more newlines).
        // However, trailing EOL is NOT required.
        For example this also allows a completely empty file, which indeed can currently be imported without any warning or error. On the other hand, if e.g. at least one block would be required, then that would make the rule simpler (!). Also: which directives are allowed where could also be partly handled in the grammar, leading to better error messages.
  • There are a few (more) quirks regarding directives, to curious effects. But that's for another PR.

So far, I'd say this PR is ready now.

  • Of course I'd be pleased by any comments and/or further requirements from you :)

@irmen
Copy link
Owner

irmen commented Jun 15, 2021

yeah the antlr plugin for IDEA is quite useful! I used it a lot when first constructing the grammar

@meisl
Copy link
Contributor Author

meisl commented Jun 15, 2021

Tracked down mentioned bug, finally: #46

@irmen
Copy link
Owner

irmen commented Jun 17, 2021

@meisl

if e.g. at least one block would be required, then that would make the rule simpler

Then we lose the possibility of just having a module file containing just %import directives.
While that is a bit silly (why would you %import your main block from another module?), I think it shouldn't be forbidden.

which directives are allowed where could also be partly handled in the grammar, leading to better error messages.

possibly, but I found it not really worth it. Do you think "directive xxx is only allowed on module scope" is too vague? This should be one of the errors printed by the ast checker step if you make a mistake with the directives

@meisl
Copy link
Contributor Author

meisl commented Jun 17, 2021

  • Only %imports in a file: I was wondering the same; what I could think of was a way of selecting different sets of libraries for different targets, maybe. Feels a bit ad hoc, though. Anyways, I'm fine with the new rule - plus the comment - if you are.
  • Order of directives and blocks: didn't say it explicitly, but things like %target, %imports and a few others should imho really come first, before the first block (if any). This also would have the potential of making the rule slightly simpler (but as said, that isn't the main concern for me).
  • Which directive where? The docs are actually fine, it's the error messages you get: they tend to state a bunch of alternatives, which are semantically disallowed. Less is better in this case. Keep in mind: the user (=programmer) does not - and should not need to - know about the actual grammar (workings). And: getting an error message that's to the point is better than having to consult the docs.
  • Where to implement such checks/messages? I understand that you want to keep the non-Kotlin lines of code to a minimum, that's ok. However, this is not an either-or question, it will remain a compromise (as long as we stick with ANTLR, to be precise). So I'd opt for easiest/most simple, while aiming for good (=better) error messages.
  • Directives in general are worth a separate discussion, in a separate issue I think. But hey, I can't resist anyway: try %importfooblah (without space)...

@meisl
Copy link
Contributor Author

meisl commented Jun 17, 2021

Oh, maybe I didn't get you quite right re "Which directive where?". I suppose you meant that the error messages already say "directive xxx ...". But that's not always the case, in fact rather seldom. Rather as I described. Needs of course more thorough investigation - and tests ;)

@irmen
Copy link
Owner

irmen commented Jun 18, 2021

Just checked this out. Looking very good! Thanks for your work on this!

@irmen irmen merged commit cfe4753 into irmen:master Jun 18, 2021
@meisl meisl mentioned this pull request Jun 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants