Add mini-language standalone parser for observe #1066

kitchoi · 2020-05-05T10:24:33Z

Part of item 5 in #977

This PR:

Adds the grammar file for generating a standalone parser using lark-parser
Adds the generated parser
Add the developer tool for regenerating the parser whenever the grammar is modified. This introduces a (build) dependency on lark-parser.

Tests are added to verify the parser can differentiate good texts versus bad ones. The parser converts text into a syntax tree. Further interpretation on the syntax tree will require item 4 (Expression) in #977, which has not been introduced yet. But here is an example of how it will get used:

traits/traits/observers/parsing.py

Lines 139 to 160 in 318f284

    
           def _handle_trait(trees, default_notifies): 
        
               """ Handle an element for a named trait. 
        
               Parameters 
        
               ---------- 
        
               trees : list of lark.tree.Tree 
        
                   The children tree for the "trait" rule. 
        
                   It contains only one item. 
        
               default_notifies : list of boolean 
        
                   The notify flag stack. 
        
               Returns 
        
               ------- 
        
               expression : traits.observers.expressions.Expression 
        
               """ 
        
               token, = trees 
        
               # sanity check 
        
               if token.type != _NAME_TOKEN: 
        
                   raise ValueError("Unexpected token: {!r}".format(token)) 
        
               name = token.value 
        
               notify = default_notifies[-1] 
        
               return _expr_module.trait(name, notify=notify)

The standalone parser is only for internal use. There will be a user-facing one later when Expression is finalized.

Checklist

Tests
~~Update API reference (docs/source/traits_api_reference)~~
~~Update User manual (docs/source/traits_user_manual)~~
~~Update type annotation hints in traits-stubs~~

kitchoi · 2020-05-05T10:27:08Z

The generated 2000+ lines parser file _generated_parser.py is probably NOT reviewable.

corranwebster

The grammar file needs to be added to the project manifest file so that it gets bundled with source distributions. It doesn't need to be added to the package data in setup.py, however.

Other than that, it looks good.

corranwebster · 2020-05-07T08:24:55Z

etstool.py

+        "{grammar_path}"
+    )
+
+    with open(out_path, "w") as out_file:


Is there a command-line option to get lark to output to a file? that would be preferable to capturing stdout.

I checked and unfortunately no:
https://github.com/lark-parser/lark/blob/master/lark/tools/standalone.py#L87

Well spotted about the source distribution. I will add the file to MANIFEST.in

corranwebster

LGTM

kitchoi · 2020-05-07T11:54:14Z

traits/observers/_dsl_grammar.lark

@@ -0,0 +1,54 @@
+// Grammar for Traits Mini Language used in observe


Just realized nowhere in the file says what this lark file is for (it is kind of implicitly there in the filename/context). So I added this line.

Could we also record somewhere the exact command-line command needed to generate the parser from the grammar? (We probably eventually want that wrapped up in an etstool task, too, but that's separate.)

Sorry, ignore. I should have looked harder at the PR first. (/me crawls back into his hole)

Not at all, you helped me realize that I misspelt the filename for etstool.py as etstools.py on line 3. Now fixed. Waiting for CI...

mdickinson · 2020-05-07T12:02:27Z

etstool.py

+        "{grammar_path}"
+    )
+
+    with open(out_path, "w") as out_file:


Please include an explicit encoding here.

(related: https://www.python.org/dev/peps/pep-0597/)

Well spotted, thanks.

mdickinson · 2020-05-07T12:13:10Z

It appears that the lark output can differ from run to run: if I do python etstool.py generate-parser, I get a big diff in the DATA and MEMO constants from the existing _generated_parser.py.

Do you know if there's any way to control this? Reproducibility would be good.

mdickinson · 2020-05-07T12:15:54Z

Do you know if there's any way to control this?

From lark-parser/lark#278, it looks as though the answer for now is "not really, no".

mdickinson · 2020-05-07T12:38:11Z

From lark-parser/lark#278, it looks as though the answer for now is "not really, no".

Hmm. But that issue's closed, and apparently fixed. I'm not sure what's going on here.

codecov-io · 2020-05-07T12:41:40Z

Codecov Report

Merging #1066 into master will decrease coverage by 3.34%.
The diff coverage is 56.78%.

@@            Coverage Diff             @@
##           master    #1066      +/-   ##
==========================================
- Coverage   76.15%   72.80%   -3.35%     
==========================================
  Files          54       63       +9     
  Lines        6493     8036    +1543     
  Branches     1263     1538     +275     
==========================================
+ Hits         4945     5851     +906     
- Misses       1205     1806     +601     
- Partials      343      379      +36

Impacted Files	Coverage Δ
traits/api.py	`100.00% <ø> (+9.67%)`	⬆️
traits/base_trait_handler.py	`61.76% <ø> (ø)`
traits/ctrait.py	`71.07% <ø> (ø)`
traits/has_traits.py	`72.40% <ø> (-0.37%)`	⬇️
traits/observers/_i_observable.py	`0.00% <0.00%> (ø)`
traits/traits.py	`75.10% <ø> (-2.45%)`	⬇️
traits/util/resource.py	`15.25% <ø> (ø)`
traits/observers/_generated_parser.py	`51.96% <51.96%> (ø)`
traits/trait_types.py	`72.15% <80.00%> (-0.31%)`	⬇️
traits/observers/_trait_event_notifier.py	`96.61% <96.61%> (ø)`
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9f101c...07b2f84. Read the comment docs.

kitchoi · 2020-05-07T12:55:19Z

It appears that the lark output can differ from run to run: if I do python etstool.py generate-parser, I get a big diff in the DATA and MEMO constants from the existing _generated_parser.py.

I can reproduce the different outputs with lark 0.8, Python 3.8 (where dictionaries are ordered), even with PYTHONHASHSEED set to a fixed value. No idea why.

kitchoi · 2020-05-07T13:46:51Z

Merging. It is not clear if there is a trivial way to get the output consistent and it does not seem important enough to block the parser from going in...

mdickinson · 2020-05-07T14:11:11Z

It is not clear if there is a trivial way to get the output consistent and it does not seem important enough to block the parser from going in...

Agreed, but please could you open an issue? We may want to report upstream too, but a local issue would be the first step for that.

kitchoi · 2020-05-07T14:33:38Z

I typed and removed ("will open an issue upstream") because I did not want to promise that - I will need to work out a reproducible example first.

mdickinson · 2020-05-07T16:05:38Z

Yep, just a local issue would be fine; that issue will act as a trigger for us to send an upstream issue when we're ready.

kitchoi · 2020-05-08T12:49:18Z

Opened lark-parser/lark#584

I think lark-parser/lark#278 is about the parsed syntax tree showing nondeterministic behaviour.

Here the nondeterministic behaviour is about the generated code not being static, creating noisy diff despite using the same grammar.

mdickinson · 2020-05-11T08:38:26Z

@kitchoi Thank you!

One option we might consider is not tracking the parser code in version control, but instead generating it at package install time (i.e., as a result of running the setup script). I suspect that that would be more annoying than helpful, but there are tradeoffs either way.

kitchoi added 5 commits May 5, 2020 11:10

Add grammar source file for the mini-language

1b439e6

Add tests for parsing valid and invalid text

192db1f

Set up tool for generating the parser, then generate the parser

b08296e

Add a comment in the grammar file.

a6bf595

Add tests for metadata

ee7406f

corranwebster suggested changes May 7, 2020

View reviewed changes

Add the grammar file to source distribution

4e7f2ef

kitchoi requested a review from corranwebster May 7, 2020 09:15

corranwebster approved these changes May 7, 2020

View reviewed changes

Add a short description to the grammar file

1670573

kitchoi commented May 7, 2020

View reviewed changes

mdickinson reviewed May 7, 2020

View reviewed changes

kitchoi added 2 commits May 7, 2020 13:29

Add the missing encoding for output file

3ee33e4

Fix filename of etstool.py in the command

07b2f84

kitchoi merged commit 36d3d5d into master May 7, 2020

kitchoi deleted the 977-mini-language-parser branch May 7, 2020 13:47

kitchoi mentioned this pull request May 8, 2020

Implement new framework to observe and notify for trait changes #977

Closed

19 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add mini-language standalone parser for observe #1066

Add mini-language standalone parser for observe #1066

kitchoi commented May 5, 2020

kitchoi commented May 5, 2020

corranwebster left a comment •

edited

corranwebster May 7, 2020

kitchoi May 7, 2020

corranwebster left a comment

kitchoi May 7, 2020

mdickinson May 7, 2020

mdickinson May 7, 2020

kitchoi May 7, 2020

mdickinson May 7, 2020

mdickinson May 7, 2020

kitchoi May 7, 2020

mdickinson commented May 7, 2020

mdickinson commented May 7, 2020

mdickinson commented May 7, 2020

codecov-io commented May 7, 2020

kitchoi commented May 7, 2020

kitchoi commented May 7, 2020

mdickinson commented May 7, 2020

kitchoi commented May 7, 2020

mdickinson commented May 7, 2020

kitchoi commented May 8, 2020

mdickinson commented May 11, 2020

	def _handle_trait(trees, default_notifies):
	""" Handle an element for a named trait.

	Parameters
	----------
	trees : list of lark.tree.Tree
	The children tree for the "trait" rule.
	It contains only one item.
	default_notifies : list of boolean
	The notify flag stack.

	Returns
	-------
	expression : traits.observers.expressions.Expression
	"""
	token, = trees
	# sanity check
	if token.type != _NAME_TOKEN:
	raise ValueError("Unexpected token: {!r}".format(token))
	name = token.value
	notify = default_notifies[-1]
	return _expr_module.trait(name, notify=notify)

		@@ -0,0 +1,54 @@
		// Grammar for Traits Mini Language used in observe

Add mini-language standalone parser for observe #1066

Add mini-language standalone parser for observe #1066

Conversation

kitchoi commented May 5, 2020

kitchoi commented May 5, 2020

corranwebster left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

corranwebster left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdickinson commented May 7, 2020

mdickinson commented May 7, 2020

mdickinson commented May 7, 2020

codecov-io commented May 7, 2020

Codecov Report

kitchoi commented May 7, 2020

kitchoi commented May 7, 2020

mdickinson commented May 7, 2020

kitchoi commented May 7, 2020

mdickinson commented May 7, 2020

kitchoi commented May 8, 2020

mdickinson commented May 11, 2020

corranwebster left a comment •

edited