Run parser twice; enable invalid_* rules only on the second run #86289

Closed
lysnikolaou opened this issue Oct 22, 2020 · 5 comments
Assignees
Labels
3.10 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

Comments

@lysnikolaou
Contributor

BPO 42123
Nosy @gvanrossum, @terryjreedy, @lysnikolaou, @pablogsal
PRs
  • bpo-42123: Run the parser two times and only enable invalid rules on the second run #22111
  • [3.9] bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111) #23011
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/lysnikolaou'
    closed_at = <Date 2020-10-26.22:42:38.482>
    created_at = <Date 2020-10-22.23:35:43.266>
    labels = ['interpreter-core', '3.10', 'performance']
    title = 'Run parser twice; enable invalid_* rules only on the second run'
    updated_at = <Date 2020-10-28.00:14:18.810>
    user = 'https://github.com/lysnikolaou'

    bugs.python.org fields:

    activity = <Date 2020-10-28.00:14:18.810>
    actor = 'lys.nikolaou'
    assignee = 'lys.nikolaou'
    closed = True
    closed_date = <Date 2020-10-26.22:42:38.482>
    closer = 'lys.nikolaou'
    components = ['Interpreter Core']
    creation = <Date 2020-10-22.23:35:43.266>
    creator = 'lys.nikolaou'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42123
    keywords = ['patch']
    message_count = 5.0
    messages = ['379384', '379508', '379697', '379698', '379811']
    nosy_count = 4.0
    nosy_names = ['gvanrossum', 'terry.reedy', 'lys.nikolaou', 'pablogsal']
    pr_nums = ['22111', '23011']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue42123'
    versions = ['Python 3.10']

    @lysnikolaou
    Contributor Author

    If we run the parser twice, we can avoid going through all the invalid_* rules when the input is valid. This might be a significant performance boost, since these rules may call expensive rules such as primary.

    On the first run, all the invalid rules are disabled and do not get expanded. If a parse failure occurs anywhere, then we run the parser a second time with all these rules enabled, in order to get the correct error message.

    Some benchmarking by Pablo shows a ~4% speedup in the stdlib benchmark and a ~10% speedup in the xxl benchmark.
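
To illustrate the two-run idea described above, here is a minimal sketch in Python. It is not the CPython implementation (the real parser is generated C code); the toy grammar, the `ToyParser` class, and the `call_invalid_rules` flag are all hypothetical names chosen for this example. The first run skips the error-reporting rule entirely, and only a failed parse triggers a second run with that rule enabled.

```python
import re


def tokenize(source):
    """Split source into numbers, names, and single non-space characters."""
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", source)


class ToyParser:
    """Toy grammar (hypothetical): statement: NAME '=' NUMBER."""

    def __init__(self, tokens, call_invalid_rules=False):
        self.tokens = tokens
        self.pos = 0
        # Assumption: a single flag gates every invalid_* rule.
        self.call_invalid_rules = call_invalid_rules

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate):
        tok = self.peek()
        if tok is not None and predicate(tok):
            self.pos += 1
            return tok
        return None

    def invalid_assignment(self):
        # Only exists to produce a better error message; never builds a tree.
        start = self.pos
        if self.expect(str.isdigit) and self.expect(lambda t: t == "="):
            raise SyntaxError("cannot assign to literal")
        self.pos = start

    def statement(self):
        start = self.pos
        if self.call_invalid_rules:
            self.invalid_assignment()
        name = self.expect(str.isidentifier)
        if name and self.expect(lambda t: t == "="):
            value = self.expect(str.isdigit)
            if value is not None and self.peek() is None:
                return ("assign", name, int(value))
        self.pos = start
        return None


def parse(source):
    tokens = tokenize(source)
    # First run: invalid rules disabled, so valid input never pays for them.
    tree = ToyParser(tokens, call_invalid_rules=False).statement()
    if tree is not None:
        return tree
    # Second run: only reached on failure; may raise a specific SyntaxError.
    ToyParser(tokens, call_invalid_rules=True).statement()
    raise SyntaxError("invalid syntax")
```

With this sketch, `parse("x = 1")` succeeds on the first run, while `parse("1 = 2")` fails the first run and gets its specific message from the second.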

    @lysnikolaou lysnikolaou added the 3.10 only security fixes label Oct 22, 2020
    @lysnikolaou lysnikolaou self-assigned this Oct 22, 2020
    @lysnikolaou lysnikolaou added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage 3.10 only security fixes labels Oct 22, 2020
    @terryjreedy
    Member

    Since I do a lot of interactive compiling, I appreciate faster feedback. How much will the slowdown be on errors?

    @terryjreedy terryjreedy changed the title Run the two times, only enable invalid_* rules on the second run Run parser twice; enable invalid_* rules only on the second run Oct 24, 2020
    @lysnikolaou
    Contributor Author

    We do not have a big corpus of SyntaxErrors to test against, but some manual testing shows no slowdown: we ran a file with a SyntaxError after a long, complex line 1000 times.

    We keep the token stream for the second run, so we don't need to run the tokenizer all over again and the parsing is done much more quickly.
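
The token reuse mentioned above could look roughly like the following sketch. It uses the standard-library `tokenize` module purely for illustration; the `CachingTokenStream` class and its `tokenizer_calls` counter are hypothetical names, not part of CPython's parser. Tokens are produced lazily and stored, so rewinding for a second run replays the stored tokens instead of tokenizing again.

```python
import io
import tokenize


class CachingTokenStream:
    """Lazily tokenizes the source once and keeps every token for replay."""

    def __init__(self, source):
        self._gen = tokenize.generate_tokens(io.StringIO(source).readline)
        self._tokens = []          # tokens seen so far, kept across runs
        self._pos = 0              # read position of the current run
        self.tokenizer_calls = 0   # how often the real tokenizer was asked

    def next_token(self):
        if self._pos == len(self._tokens):
            # Cache miss: pull one new token from the underlying tokenizer.
            self._tokens.append(next(self._gen))
            self.tokenizer_calls += 1
        tok = self._tokens[self._pos]
        self._pos += 1
        return tok

    def reset(self):
        # Rewind for the second run; the cached tokens stay in place.
        self._pos = 0


def consume_all(stream):
    """Stand-in for a parser run: read token types up to ENDMARKER."""
    kinds = []
    while True:
        tok = stream.next_token()
        kinds.append(tok.type)
        if tok.type == tokenize.ENDMARKER:
            return kinds
```

In this sketch, calling `consume_all(stream)`, then `stream.reset()`, then `consume_all(stream)` again leaves `tokenizer_calls` unchanged after the first run, which is the property the comment above relies on.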

    @lysnikolaou
    Contributor Author

    New changeset bca7014 by Lysandros Nikolaou in branch 'master':
    bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111)
    bca7014

    @lysnikolaou
    Contributor Author

    New changeset 24a7c29 by Lysandros Nikolaou in branch '3.9':
    [3.9] bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111) (GH-23011)
    24a7c29

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022