Run parser twice; enable invalid_* rules only on the second run #86289

lysnikolaou · 2020-10-22T23:35:43Z

BPO	42123
Nosy	@gvanrossum, @terryjreedy, @lysnikolaou, @pablogsal
PRs	bpo-42123: Run the parser two times and only enable invalid rules on the second run #22111; [3.9] bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111) #23011

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/lysnikolaou'
closed_at = <Date 2020-10-26.22:42:38.482>
created_at = <Date 2020-10-22.23:35:43.266>
labels = ['interpreter-core', '3.10', 'performance']
title = 'Run parser twice; enable invalid_* rules only on the second run'
updated_at = <Date 2020-10-28.00:14:18.810>
user = 'https://github.com/lysnikolaou'

bugs.python.org fields:

activity = <Date 2020-10-28.00:14:18.810>
actor = 'lys.nikolaou'
assignee = 'lys.nikolaou'
closed = True
closed_date = <Date 2020-10-26.22:42:38.482>
closer = 'lys.nikolaou'
components = ['Interpreter Core']
creation = <Date 2020-10-22.23:35:43.266>
creator = 'lys.nikolaou'
dependencies = []
files = []
hgrepos = []
issue_num = 42123
keywords = ['patch']
message_count = 5.0
messages = ['379384', '379508', '379697', '379698', '379811']
nosy_count = 4.0
nosy_names = ['gvanrossum', 'terry.reedy', 'lys.nikolaou', 'pablogsal']
pr_nums = ['22111', '23011']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue42123'
versions = ['Python 3.10']

lysnikolaou · 2020-10-22T23:35:43Z

We can avoid going through all the invalid_* rules if we run the parser twice. This might be a significant performance boost, since the invalid_* rules may call expensive rules such as primary.

On the first run, all the invalid rules are disabled and do not get expanded. If a parse failure occurs anywhere, then we run the parser a second time with all these rules enabled, in order to get the correct error message.

Some benchmarking by Pablo shows a ~4% speedup in the stdlib benchmark and a ~10% speedup in the xxl benchmark.
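
To illustrate the idea, here is a minimal sketch of a two-pass parse in Python. It is a toy recursive-descent parser, not CPython's generated parser: the class `ToyParser`, the flag `call_invalid_rules`, the rule `invalid_assignment`, and the tiny grammar (`NAME '=' NUMBER`) are all assumptions made up for this example. The first pass never enters the error-reporting rule; only when it fails is the parser run again with that rule enabled, so that a specific message can be produced.

```python
import re


def tokenize(source):
    # Hypothetical toy tokenizer: words/numbers, or any single non-space char.
    return re.findall(r"\w+|\S", source)


class ToyParser:
    """Toy parser for `NAME = NUMBER`; illustration only."""

    def __init__(self, tokens, call_invalid_rules=False):
        self.tokens = tokens
        self.pos = 0
        # When False, invalid_* alternatives are skipped entirely.
        self.call_invalid_rules = call_invalid_rules

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate):
        tok = self.peek()
        if tok is not None and predicate(tok):
            self.pos += 1
            return tok
        return None

    def assignment(self):
        # assignment: NAME '=' NUMBER | invalid_assignment
        start = self.pos
        name = self.expect(str.isidentifier)
        if name and self.expect(lambda t: t == "="):
            value = self.expect(str.isdigit)
            if value and self.peek() is None:
                return (name, int(value))
        self.pos = start  # backtrack before trying the next alternative
        if self.call_invalid_rules:
            self.invalid_assignment()
        return None

    def invalid_assignment(self):
        # invalid_assignment: NUMBER '=' ...  (exists only to raise a good error)
        if self.expect(str.isdigit) and self.expect(lambda t: t == "="):
            raise SyntaxError("cannot assign to literal")


def parse(source):
    tokens = tokenize(source)
    # First pass: fast path, invalid_* rules disabled.
    result = ToyParser(tokens).assignment()
    if result is not None:
        return result
    # Second pass: only reached on failure; invalid_* rules enabled.
    ToyParser(tokens, call_invalid_rules=True).assignment()
    # No invalid_* rule matched, so fall back to a generic message.
    raise SyntaxError("invalid syntax")
```

With this sketch, valid input such as `parse("x = 1")` never touches `invalid_assignment`, while `parse("1 = x")` fails the first pass and gets its specific error from the second.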

terryjreedy · 2020-10-24T01:17:29Z

Since I do a lot of interactive compiling, I appreciate faster feedback. How much will the slowdown be on errors?

lysnikolaou · 2020-10-26T22:41:15Z

We do not have a big corpus of SyntaxErrors to test against, but some manual testing (running a file that has a SyntaxError after a long, complex line, 1000 times) shows no slowdown.

We keep the token stream for the second run, so we don't need to run the tokenizer all over again and the parsing is done much more quickly.
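
As a rough sketch of why the second run is cheap, the snippet below caches tokens as they are produced and lets a parser rewind to the start without tokenizing again. The names `TokenBuffer`, `next_token`, `reset`, `mark`, and `lexed` are hypothetical and chosen for this example; it uses the standard-library `tokenize` module as a stand-in tokenizer and does not reproduce CPython's C implementation.

```python
import io
import tokenize


class TokenBuffer:
    """Tokenizes lazily, caches every token, and supports rewinding."""

    def __init__(self, source):
        self._gen = tokenize.generate_tokens(io.StringIO(source).readline)
        self._tokens = []  # tokens produced so far, kept across passes
        self.mark = 0      # index of the next token to hand out
        self.lexed = 0     # how many tokens came from the real tokenizer

    def next_token(self):
        # Only call the tokenizer when the cache has nothing left to offer.
        if self.mark == len(self._tokens):
            self._tokens.append(next(self._gen))
            self.lexed += 1
        tok = self._tokens[self.mark]
        self.mark += 1
        return tok

    def reset(self):
        # Rewind for a second parse pass; the cached tokens are kept.
        self.mark = 0


buf = TokenBuffer("1 = x\n")
first_pass = [buf.next_token().string for _ in range(3)]
buf.reset()
second_pass = [buf.next_token().string for _ in range(3)]
```

Here both passes see the same three tokens, but `buf.lexed` stays at 3 because the second pass is served entirely from the cache.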

lysnikolaou · 2020-10-26T22:42:11Z

New changeset bca7014 by Lysandros Nikolaou in branch 'master':
bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111)
bca7014

lysnikolaou · 2020-10-28T00:14:18Z

New changeset 24a7c29 by Lysandros Nikolaou in branch '3.9':
[3.9] bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111) (GH-23011)
24a7c29

lysnikolaou added the 3.10 only security fixes label Oct 22, 2020

lysnikolaou self-assigned this Oct 22, 2020

lysnikolaou added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage 3.10 only security fixes labels Oct 22, 2020

terryjreedy changed the title ~~Run the two times, only enable invalid_* rules on the second run~~ Run parser twice; enable invalid_* rules only on the second run Oct 24, 2020

lysnikolaou closed this as completed Oct 26, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022
