Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Rust Parser from Java instead of AST.scala #3611

Merged
merged 325 commits into from
Nov 13, 2022

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Jul 26, 2022

Enabling Rust based parser as the default Enso parser. Disabling few currently failing tests - marking their appropriate reproducers as @Ignored in EnsoCompilerTest. Introducing ENSO_PARSERproperty to allow anyone to switch to old parser in case of problems. Just use:

$ ENSO_PARSER=scala ./bin/enso --run ....
$ ENSO_PARSER=scala ./run ide watch

and you temporarily get the exact behavior you are used to.

Checklist

  • All code conforms to the
    Scala,
    Java,
    style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH
      ./run ide build and ./run ide watch.

@JaroslavTulach
Copy link
Member Author

With 3ef9022 the testOnly *MethodsTest* test suite succeeds fine including one test which is using the new parser: "throw an exception when non-existent"!

@JaroslavTulach JaroslavTulach changed the base branch from wip/kw/parser/dont-panic to develop July 28, 2022 05:17
@kazcw
Copy link
Contributor

kazcw commented Aug 29, 2022

@JaroslavTulach How can I run these tests? I've tried ./run backend test standard-library but it fails with compile errors (cannot find Tree while compiling Parser).

@JaroslavTulach
Copy link
Member Author

@JaroslavTulach How can I run these tests? I've tried ./run backend test standard-library but it fails with compile errors (cannot find Tree while compiling Parser).

There is a primitive build system in LoadParser.sh that invokes necessary commands.

@JaroslavTulach
Copy link
Member Author

Updated to most recent develop branch + fixed compilation problems. Now the ./LoadParser.sh score is:

Found 190 files. 105 failed to parse
0 files failed because they have comments
From 76 files 65 failed to produce IR

@JaroslavTulach
Copy link
Member Author

Twelve more files produce IR now:

Found 190 files. 105 failed to parse
0 files failed because they have comments
From 76 files 53 failed to produce IR

@JaroslavTulach JaroslavTulach force-pushed the wip/jtulach/UseRustParserFromJava_182743471 branch from 79b58d1 to dc87923 Compare September 15, 2022 03:50
build.sbt Outdated Show resolved Hide resolved
@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Sep 17, 2022

I am trying:

enso$ rm -rf *; git checkout -f .
enso$ sbt --java-home ~/bin/graalvm bootstrap
enso$ sbt --java-home ~/bin/graalvm "runtime/testOnly *EnsoCompilerTest"

and that fails complaining there is no parser library .so and yes, there is none:

[error] Test org.enso.compiler.EnsoCompilerTest failed: java.lang.UnsatisfiedLinkError: Can't load library: /target/rust/debug/libenso_parser.so, took 0.0 sec
[error]     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2630)
[error]     at java.lang.Runtime.load0(Runtime.java:768)
[error]     at java.lang.System.load(System.java:1835)
[error]     at org.enso.syntax2.Parser.<clinit>(Parser.java:24)
[error]     at org.enso.compiler.EnsoCompiler.<init>(EnsoCompiler.java:13)
[error]     at org.enso.compiler.EnsoCompilerTest.initEnsoCompiler(EnsoCompilerTest.java:27)

running sbt generateRustParserLib manually before running the test fixes the problem. However, I'd like this dependency to be built automatically, if possible.

Copy link
Contributor

@kazcw kazcw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see that we're passing tests with the new parser--it is working for correct inputs. However there are a few (small) things necessary to handle syntax errors gracefully, and I think that's important enough to the libraries-developer experience that we should do it before changing the default parser.

The 3 things we need are:

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Nov 9, 2022

https://www.pivotaltracker.com/story/show/183740897: Make errors (IR$Error$Syntax), not exceptions, in TreeToIr.

Created #3861 to address the problem of errors (IR$Error$Syntax).

When an invalid escape sequence is encountered, getValue() returns -1. Currently TreeToIr throws an exception in that case; I think inserting the Unicode "replacement character" U+FFFD (�) would be a better way to handle it.

Shouldn't the getValue() return 0xFFFD instead of -1 then? Anyway addressing this on any side of the parser is easy once we have a testcase. So far I found that:

main = '\x'

behaves differently between the old and new parser. However there is no -1, just 0 returned from getValue(). If you have a testcase in mind, please share.

Copy link
Contributor

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -39,7 +39,7 @@ class AtomFixtures extends DefaultInterpreterRunner {
val reverseListCode =
"""from Standard.Base.Data.List import all
|
|main = list ->
|main = self -> list ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a TODO to figure out why self parameter is needed?

@kazcw
Copy link
Contributor

kazcw commented Nov 9, 2022

When an invalid escape sequence is encountered, getValue() returns -1. Currently TreeToIr throws an exception in that case; I think inserting the Unicode "replacement character" U+FFFD (�) would be a better way to handle it.

Shouldn't the getValue() return 0xFFFD instead of -1 then?

I think it's important to maintain a distinction between a valid escape with value 0xFFFD, and an invalid escape where we are just using that value as a placeholder in the backend to allow compilation to proceed in spite of an error. It may make a difference in editing or refactoring tools, where we wouldn't want to accidentally "promote" an invalid escape to a valid escape of the placeholder character.

Anyway addressing this on any side of the parser is easy once we have a testcase. So far I found that:

main = '\x'

behaves differently between the old and new parser. However there is no -1, just 0 returned from getValue(). If you have a testcase in mind, please share.

Ah, that's a great test case! The lexer accepts up to 2 hexadecimal characters in the \x type of escape. I will update it to specifically disallow a zero-length hex number, because we don't need \x to be an alias for \0 and \x0.

In the meantime, an example of an invalid escape would be: \c

mergify bot pushed a commit that referenced this pull request Nov 10, 2022
This PR mimics test cases from #3860 and makes sure `IR.Syntax.Error` is constructed at appropriate places rather than just yielding an `UnhandledEntity` exception.

# Important Notes
Merge before #3611 to minimize disruption when changing the parser.
@JaroslavTulach
Copy link
Member Author

In the meantime, an example of an invalid escape would be: \c

Done in 56f04d6. Let's integrate.

@JaroslavTulach JaroslavTulach added the CI: Ready to merge This PR is eligible for automatic merge label Nov 10, 2022
@JaroslavTulach JaroslavTulach removed the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Nov 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants