Use Rust Parser from Java instead of AST.scala #3611

JaroslavTulach · 2022-07-26T06:02:23Z

Enabling the Rust-based parser as the default Enso parser. Disabling a few currently failing tests and marking their appropriate reproducers as @Ignored in EnsoCompilerTest. Introducing the ENSO_PARSER property to allow anyone to switch to the old parser in case of problems. Just use:

$ ENSO_PARSER=scala ./bin/enso --run ....
$ ENSO_PARSER=scala ./run ide watch

and you temporarily get the exact behavior you are used to.
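
For readers wondering how such a switch could be wired up, here is a minimal sketch. It assumes the setting is read from the ENSO_PARSER environment variable, that only the value scala selects the old parser, and that matching is case-insensitive. The names ParserSelection and fromSetting are hypothetical and are not taken from this PR.

```java
import java.util.Locale;

/** Hypothetical helper, not the actual Enso implementation. */
final class ParserSelection {
  enum Kind { RUST, SCALA }

  /** Maps the raw setting to a parser kind; anything but "scala" means Rust. */
  static Kind fromSetting(String value) {
    if (value != null && value.trim().toLowerCase(Locale.ROOT).equals("scala")) {
      return Kind.SCALA;
    }
    // Assumption: an unset or unknown value keeps the new default parser.
    return Kind.RUST;
  }

  public static void main(String[] args) {
    Kind kind = fromSetting(System.getenv("ENSO_PARSER"));
    System.out.println("Selected parser: " + kind);
  }
}
```

Keeping the decision in a pure function of the string makes it easy to test without touching the process environment.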

Checklist

- All code conforms to the Scala and Java style guides.
- All code has been tested:
  - Unit tests have been written where possible.
  - If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run ide build and ./run ide watch.

lib/rust/parser/generate-java/java/org/enso/syntax2/Parser.java

JaroslavTulach · 2022-07-27T10:06:41Z

With 3ef9022 the testOnly *MethodsTest* test suite succeeds fine including one test which is using the new parser: "throw an exception when non-existent"!

lib/rust/metamodel/src/java/implementation.rs

kazcw · 2022-08-29T16:58:16Z

@JaroslavTulach How can I run these tests? I've tried ./run backend test standard-library but it fails with compile errors (cannot find Tree while compiling Parser).

JaroslavTulach · 2022-09-01T01:37:39Z

> @JaroslavTulach How can I run these tests? I've tried ./run backend test standard-library but it fails with compile errors (cannot find Tree while compiling Parser).

There is a primitive build system in LoadParser.sh that invokes the necessary commands.

JaroslavTulach · 2022-09-05T18:54:11Z

Updated to most recent develop branch + fixed compilation problems. Now the ./LoadParser.sh score is:

Found 190 files. 105 failed to parse
0 files failed because they have comments
From 76 files 65 failed to produce IR

JaroslavTulach · 2022-09-06T09:48:57Z

Twelve more files produce IR now:

Found 190 files. 105 failed to parse
0 files failed because they have comments
From 76 files 53 failed to produce IR

engine/runtime/src/main/java/org/enso/compiler/TreeToIr.java

build.sbt

JaroslavTulach · 2022-09-17T17:50:31Z

I am trying:

enso$ rm -rf *; git checkout -f .
enso$ sbt --java-home ~/bin/graalvm bootstrap
enso$ sbt --java-home ~/bin/graalvm "runtime/testOnly *EnsoCompilerTest"

and that fails, complaining that there is no parser library .so. And yes, there is none:

[error] Test org.enso.compiler.EnsoCompilerTest failed: java.lang.UnsatisfiedLinkError: Can't load library: /target/rust/debug/libenso_parser.so, took 0.0 sec
[error]     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2630)
[error]     at java.lang.Runtime.load0(Runtime.java:768)
[error]     at java.lang.System.load(System.java:1835)
[error]     at org.enso.syntax2.Parser.<clinit>(Parser.java:24)
[error]     at org.enso.compiler.EnsoCompiler.<init>(EnsoCompiler.java:13)
[error]     at org.enso.compiler.EnsoCompilerTest.initEnsoCompiler(EnsoCompilerTest.java:27)

Running sbt generateRustParserLib manually before running the test fixes the problem. However, I'd like this dependency to be built automatically, if possible.
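
Independently of the build question, the failure itself can be softened by catching the UnsatisfiedLinkError and falling back to the old parser. The sketch below is only an illustration of that idea: NativeParserLoader and tryLoad are hypothetical names, and the relative library path is an assumption modeled on the error message above.

```java
import java.io.File;

/** Hypothetical loader, not the code used by org.enso.syntax2.Parser. */
final class NativeParserLoader {
  /** Returns true if the native library was loaded, false if it is unavailable. */
  static boolean tryLoad(File library) {
    try {
      // System.load requires an absolute path to the library file.
      System.load(library.getAbsolutePath());
      return true;
    } catch (UnsatisfiedLinkError e) {
      System.err.println("Cannot load " + library + ": " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed location of the library relative to the working directory.
    File lib = new File("target/rust/debug/libenso_parser.so");
    String parser = tryLoad(lib) ? "rust" : "scala";
    System.out.println("Using parser: " + parser);
  }
}
```

A fallback like this hides the missing build step rather than fixing it, so it complements, and does not replace, making the library a proper build dependency.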

kazcw

It's great to see that we're passing tests with the new parser; it is working for correct inputs. However, there are a few (small) things necessary to handle syntax errors gracefully, and I think that's important enough to the libraries-developer experience that we should do it before changing the default parser.

The 3 things we need are:

- https://www.pivotaltracker.com/story/show/183405907: 2 days of work on the Rust side; this is the next thing on my agenda. (Edit: implemented in "Ensure parses of invalid inputs represent all tokens" #3860.)
- https://www.pivotaltracker.com/story/show/183740897: Make errors (IR$Error$Syntax), not exceptions, in TreeToIr.
- When an invalid escape sequence is encountered, getValue() returns -1. Currently TreeToIr throws an exception in that case; I think inserting the Unicode "replacement character" U+FFFD (�) would be a better way to handle it. (In the future, the lexer will also emit a warning when encountering such a case; this is part of https://www.pivotaltracker.com/story/show/182963507 because lexer warnings depend on frontend-related UUID work.)
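
The replacement-character suggestion in the last item can be sketched as follows. This is an illustration under stated assumptions, not the TreeToIr code: EscapeValues and toText are hypothetical names, and the only thing taken from the discussion is that -1 marks an invalid escape.

```java
/** Hypothetical helper showing the U+FFFD substitution. */
final class EscapeValues {
  /** Value reported for an invalid escape, per the discussion above. */
  static final int INVALID = -1;
  /** Unicode replacement character used as a placeholder. */
  static final int REPLACEMENT = 0xFFFD;

  /** Converts an escape value to text, substituting U+FFFD instead of throwing. */
  static String toText(int value) {
    // INVALID (-1) is not a valid code point, so it takes the placeholder branch.
    int codePoint = Character.isValidCodePoint(value) ? value : REPLACEMENT;
    return new String(Character.toChars(codePoint));
  }

  public static void main(String[] args) {
    System.out.println(toText(0x41));
    System.out.println(toText(INVALID));
  }
}
```

The substitution happens only when the text is produced; the -1 marker itself is left untouched, so a caller can still tell an invalid escape from a literal U+FFFD.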

JaroslavTulach · 2022-11-09T08:47:45Z

> https://www.pivotaltracker.com/story/show/183740897: Make errors (IR$Error$Syntax), not exceptions, in TreeToIr.

Created #3861 to address the problem of errors (IR$Error$Syntax).

> When an invalid escape sequence is encountered, getValue() returns -1. Currently TreeToIr throws an exception in that case; I think inserting the Unicode "replacement character" U+FFFD (�) would be a better way to handle it.

Shouldn't getValue() return 0xFFFD instead of -1 then? Anyway, addressing this on either side of the parser is easy once we have a test case. So far I found that:

main = '\x'

behaves differently between the old and new parser. However, there is no -1, just 0 returned from getValue(). If you have a test case in mind, please share.

hubertp

lgtm

hubertp · 2022-11-09T13:28:48Z

engine/runtime/src/bench/scala/org/enso/interpreter/bench/fixtures/semantic/AtomFixtures.scala

@@ -39,7 +39,7 @@ class AtomFixtures extends DefaultInterpreterRunner {
  val reverseListCode =
    """from Standard.Base.Data.List import all
      |
-      |main = list ->
+      |main = self -> list ->


Add a TODO to figure out why the self parameter is needed?

kazcw · 2022-11-09T20:28:19Z

> > When an invalid escape sequence is encountered, getValue() returns -1. Currently TreeToIr throws an exception in that case; I think inserting the Unicode "replacement character" U+FFFD (�) would be a better way to handle it.

> Shouldn't the getValue() return 0xFFFD instead of -1 then?

I think it's important to maintain a distinction between a valid escape with value 0xFFFD, and an invalid escape where we are just using that value as a placeholder in the backend to allow compilation to proceed in spite of an error. It may make a difference in editing or refactoring tools, where we wouldn't want to accidentally "promote" an invalid escape to a valid escape of the placeholder character.

> Anyway addressing this on any side of the parser is easy once we have a testcase. So far I found that:
> main = '\x'
> behaves differently between the old and new parser. However there is no -1, just 0 returned from getValue(). If you have a testcase in mind, please share.

Ah, that's a great test case! The lexer accepts up to 2 hexadecimal characters in the \x type of escape. I will update it to specifically disallow a zero-length hex number, because we don't need \x to be an alias for \0 and \x0.

In the meantime, an example of an invalid escape would be: \c
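
To make the intended \x rule concrete, here is a small sketch of a decoder that accepts one or two hexadecimal digits and rejects a zero-length number. It is written in Java for illustration only; the real lexer lives on the Rust side, and HexEscape and decodeHex are hypothetical names. The sketch models only the \x form, so every other escape body (valid or not, including \c) simply comes back as INVALID here.

```java
/** Hypothetical decoder for the \x escape form only. */
final class HexEscape {
  static final int INVALID = -1;

  /**
   * Decodes the text that follows the backslash, e.g. "x41" for \x41.
   * Returns INVALID unless the body is 'x' followed by 1 or 2 hex digits.
   */
  static int decodeHex(String body) {
    if (body.isEmpty() || body.charAt(0) != 'x') {
      return INVALID; // not a \x escape; other escape kinds are out of scope
    }
    String digits = body.substring(1);
    if (digits.isEmpty() || digits.length() > 2) {
      return INVALID; // zero-length (plain \x) or too many digits
    }
    int value = 0;
    for (int i = 0; i < digits.length(); i++) {
      int digit = Character.digit(digits.charAt(i), 16);
      if (digit < 0) {
        return INVALID; // non-hexadecimal character
      }
      value = value * 16 + digit;
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(decodeHex("x41"));
    System.out.println(decodeHex("x"));
  }
}
```

With this rule, \x0 still decodes to 0, while a bare \x is reported as invalid instead of silently becoming an alias for \0.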

This PR mimics test cases from #3860 and makes sure `IR.Syntax.Error` is constructed at appropriate places rather than just yielding an `UnhandledEntity` exception. # Important Notes Merge before #3611 to minimize disruption when changing the parser.

JaroslavTulach · 2022-11-10T10:26:14Z

> In the meantime, an example of an invalid escape would be: \c

Done in 56f04d6. Let's integrate.

JaroslavTulach requested review from 4e6, MichaelMauderer, mwu-tow, farmaazon, wdanilo, kazcw, PabloBuchu and jdunkerley as code owners July 26, 2022 06:02

JaroslavTulach self-assigned this Jul 26, 2022

JaroslavTulach marked this pull request as draft July 26, 2022 06:02

JaroslavTulach removed request for 4e6, MichaelMauderer, mwu-tow, PabloBuchu, farmaazon and jdunkerley July 26, 2022 06:03

JaroslavTulach mentioned this pull request Jul 26, 2022

Parser: don't panic for any standard library files #3609

Merged

4 tasks

kazcw approved these changes Jul 26, 2022

lib/rust/parser/generate-java/java/org/enso/syntax2/Parser.java (outdated)

JaroslavTulach changed the base branch from wip/kw/parser/dont-panic to develop July 28, 2022 05:17

JaroslavTulach commented Aug 9, 2022

lib/rust/metamodel/src/java/implementation.rs (outdated)

JaroslavTulach mentioned this pull request Aug 22, 2022

Parser: Parse UUIDs; implement comments in AST; implement type annotations and signatures; fix field names #3653

Merged

4 tasks

kazcw reviewed Sep 8, 2022

engine/runtime/src/main/java/org/enso/compiler/TreeToIr.java (outdated)

JaroslavTulach force-pushed the wip/jtulach/UseRustParserFromJava_182743471 branch from 79b58d1 to dc87923 Compare September 15, 2022 03:50

JaroslavTulach commented Sep 15, 2022

build.sbt (outdated)

kazcw reviewed Nov 8, 2022

JaroslavTulach added 2 commits November 9, 2022 04:59

Merge branch 'develop' into wip/jtulach/UseRustParserFromJava_182743471

54857c1

Fallback to old parser when the .so file of the new one cannot be loaded

8f33e0a

JaroslavTulach force-pushed the wip/jtulach/UseRustParserFromJava_182743471 branch from 2a61405 to 8f33e0a Compare November 9, 2022 05:06

JaroslavTulach mentioned this pull request Nov 9, 2022

Construct IR.Syntax.Error rather than throwing an exception #3861

Merged

2 tasks

hubertp approved these changes Nov 9, 2022

JaroslavTulach added 2 commits November 10, 2022 10:29

Merge remote-tracking branch 'origin/develop' into wip/jtulach/UseRus…

c6db855

…tParserFromJava_182743471

Syntax error for InvalidEscapeSequence

56f04d6

JaroslavTulach added the CI: Ready to merge This PR is eligible for automatic merge label Nov 10, 2022

JaroslavTulach added 2 commits November 11, 2022 16:11

Don't scare devs with too verbose messages

010c6aa

To avoid long starvation limit the test to 5s

c0a12a9

JaroslavTulach mentioned this pull request Nov 11, 2022

More improvements that work with both parsers #3868

Merged

2 tasks

JaroslavTulach added 3 commits November 12, 2022 08:03

Accept changes from PR-3868

c44ff59

Eliminating differences between CRLF and LF positions by stripping of CR

7641db8

More CRLF -> LF conversions

7d8137b

JaroslavTulach removed the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Nov 12, 2022

Mitigating another set of CRLF vs. LF differences

648dc63

mergify bot merged commit 7f2d02a into develop Nov 13, 2022

mergify bot deleted the wip/jtulach/UseRustParserFromJava_182743471 branch November 13, 2022 06:22

JaroslavTulach mentioned this pull request Mar 14, 2023

New documentation parser #5917

Merged

5 tasks
