Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mini-language standalone parser for observe #1066

Merged
merged 9 commits into from
May 7, 2020

Conversation

kitchoi
Copy link
Contributor

@kitchoi kitchoi commented May 5, 2020

Part of item 5 in #977

This PR:

  • Adds the grammar file for generating a standalone parser using lark-parser
  • Adds the generated parser
  • Add the developer tool for regenerating the parser whenever the grammar is modified. This introduces a (build) dependency on lark-parser.

Tests are added to verify the parser can differentiate good texts versus bad ones. The parser converts text into a syntax tree. Further interpretation on the syntax tree will require item 4 (Expression) in #977, which has not been introduced yet. But here is an example of how it will get used:

def _handle_trait(trees, default_notifies):
""" Handle an element for a named trait.
Parameters
----------
trees : list of lark.tree.Tree
The children tree for the "trait" rule.
It contains only one item.
default_notifies : list of boolean
The notify flag stack.
Returns
-------
expression : traits.observers.expressions.Expression
"""
token, = trees
# sanity check
if token.type != _NAME_TOKEN:
raise ValueError("Unexpected token: {!r}".format(token))
name = token.value
notify = default_notifies[-1]
return _expr_module.trait(name, notify=notify)

The standalone parser is only for internal use. There will be a user-facing one later when Expression is finalized.

Checklist

  • Tests
  • Update API reference (docs/source/traits_api_reference)
  • Update User manual (docs/source/traits_user_manual)
  • Update type annotation hints in traits-stubs

@kitchoi
Copy link
Contributor Author

kitchoi commented May 5, 2020

The generated 2000+ lines parser file _generated_parser.py is probably NOT reviewable.

Copy link
Contributor

@corranwebster corranwebster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grammar file needs to be added to the project manifest file so that it gets bundled with source distributions. It doesn't need to be added to the package data in setup.py, however.

Other than that, it looks good.

etstool.py Outdated
"{grammar_path}"
)

with open(out_path, "w") as out_file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a command-line option to get lark to output to a file? that would be preferable to capturing stdout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and unfortunately no:
https://github.com/lark-parser/lark/blob/master/lark/tools/standalone.py#L87

Well spotted about the source distribution. I will add the file to MANIFEST.in

@kitchoi kitchoi requested a review from corranwebster May 7, 2020 09:15
Copy link
Contributor

@corranwebster corranwebster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -0,0 +1,54 @@
// Grammar for Traits Mini Language used in observe
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized nowhere in the file says what this lark file is for (it is kind of implicitly there in the filename/context). So I added this line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also record somewhere the exact command-line command needed to generate the parser from the grammar? (We probably eventually want that wrapped up in an etstool task, too, but that's separate.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, ignore. I should have looked harder at the PR first. (/me crawls back into his hole)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all, you helped me realize that I misspelt the filename for etstool.py as etstools.py on line 3. Now fixed. Waiting for CI...

etstool.py Outdated
"{grammar_path}"
)

with open(out_path, "w") as out_file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include an explicit encoding here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted, thanks.

@mdickinson
Copy link
Member

It appears that the lark output can differ from run to run: if I do python etstool.py generate-parser, I get a big diff in the DATA and MEMO constants from the existing _generated_parser.py.

Do you know if there's any way to control this? Reproducibility would be good.

@mdickinson
Copy link
Member

Do you know if there's any way to control this?

From lark-parser/lark#278, it looks as though the answer for now is "not really, no".

@mdickinson
Copy link
Member

From lark-parser/lark#278, it looks as though the answer for now is "not really, no".

Hmm. But that issue's closed, and apparently fixed. I'm not sure what's going on here.

@codecov-io
Copy link

Codecov Report

Merging #1066 into master will decrease coverage by 3.34%.
The diff coverage is 56.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1066      +/-   ##
==========================================
- Coverage   76.15%   72.80%   -3.35%     
==========================================
  Files          54       63       +9     
  Lines        6493     8036    +1543     
  Branches     1263     1538     +275     
==========================================
+ Hits         4945     5851     +906     
- Misses       1205     1806     +601     
- Partials      343      379      +36     
Impacted Files Coverage Δ
traits/api.py 100.00% <ø> (+9.67%) ⬆️
traits/base_trait_handler.py 61.76% <ø> (ø)
traits/ctrait.py 71.07% <ø> (ø)
traits/has_traits.py 72.40% <ø> (-0.37%) ⬇️
traits/observers/_i_observable.py 0.00% <0.00%> (ø)
traits/traits.py 75.10% <ø> (-2.45%) ⬇️
traits/util/resource.py 15.25% <ø> (ø)
traits/observers/_generated_parser.py 51.96% <51.96%> (ø)
traits/trait_types.py 72.15% <80.00%> (-0.31%) ⬇️
traits/observers/_trait_event_notifier.py 96.61% <96.61%> (ø)
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9f101c...07b2f84. Read the comment docs.

@kitchoi
Copy link
Contributor Author

kitchoi commented May 7, 2020

It appears that the lark output can differ from run to run: if I do python etstool.py generate-parser, I get a big diff in the DATA and MEMO constants from the existing _generated_parser.py.

I can reproduce the different outputs with lark 0.8, Python 3.8 (where dictionaries are ordered), even with PYTHONHASHSEED set to a fixed value. No idea why.

@kitchoi
Copy link
Contributor Author

kitchoi commented May 7, 2020

Merging. It is not clear if there is a trivial way to get the output consistent and it does not seem important enough to block the parser from going in...

@kitchoi kitchoi merged commit 36d3d5d into master May 7, 2020
@kitchoi kitchoi deleted the 977-mini-language-parser branch May 7, 2020 13:47
@mdickinson
Copy link
Member

It is not clear if there is a trivial way to get the output consistent and it does not seem important enough to block the parser from going in...

Agreed, but please could you open an issue? We may want to report upstream too, but a local issue would be the first step for that.

@kitchoi
Copy link
Contributor Author

kitchoi commented May 7, 2020

I typed and removed ("will open an issue upstream") because I did not want to promise that - I will need to work out a reproducible example first.

@mdickinson
Copy link
Member

Yep, just a local issue would be fine; that issue will act as a trigger for us to send an upstream issue when we're ready.

@kitchoi
Copy link
Contributor Author

kitchoi commented May 8, 2020

Opened lark-parser/lark#584

I think lark-parser/lark#278 is about the parsed syntax tree showing nondeterministic behaviour.

Here the nondeterministic behaviour is about the generated code not being static, creating noisy diff despite using the same grammar.

@mdickinson
Copy link
Member

@kitchoi Thank you!

One option we might consider is not tracking the parser code in version control, but instead generating it at package install time (i.e., as a result of running the setup script). I suspect that that would be more annoying than helpful, but there are tradeoffs either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants