In IDLE colorizer, replace re with tokenizer #140347

@terryjreedy

Description

The current IDLE colorizer was designed when the Python grammar was restricted to a context-free subtype. Parsing with regular expressions was sufficient, or nearly so, to pick out the substrings to be color tagged. The current grammar is sometimes context-dependent, which makes this more difficult or even impossible.

A few versions ago, the new C tokenizer used to compile Python code was exposed as a function in the tokenize module. It replaced a Python-coded tokenizer that did not always match the old C tokenizer and that must have been much slower. I assume that it was not seen as suitable for the IDLE colorizer. However, the currently exposed C tokenizer is being used to power the new PyREPL colorizer. The initial (draft) patch copies portions of that colorizer.
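To make the idea concrete, here is a minimal sketch of token-based tagging using only the public `tokenize` module. It is not the PyREPL colorizer and not the draft patch; the function name `tag_spans` and the tag names are hypothetical and only loosely modeled on color categories.

```python
import io
import keyword
import tokenize


def tag_spans(source):
    """Yield (tag, start, end) for tokens worth coloring.

    start and end are (row, col) pairs as reported by tokenize.
    The tag names are illustrative, not IDLE's actual tag set.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.COMMENT:
            tag = "COMMENT"
        elif tok.type == tokenize.STRING:
            tag = "STRING"
        elif tok.type == tokenize.NUMBER:
            tag = "NUMBER"
        elif tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            tag = "KEYWORD"
        else:
            continue  # operators, plain names, layout tokens: no tag
        yield tag, tok.start, tok.end


spans = list(tag_spans("x = 1  # set x\nif x:\n    print('hi')\n"))
for span in spans:
    print(span)
```

Incomplete input, such as an unclosed bracket, can make `tokenize` raise `tokenize.TokenError`, so an editor colorizer built this way would presumably need to handle partially typed code.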

My main concerns are compatibility and speed. Some initial questions:

  1. The syntax categories in the REPL colorizer are not the same as IDLE's (listed on the Highlights settings page). What are they?
  2. IDLE colorizes 10000-line editor contents as well as single interactive lines. Will the PR colorizer do the same, with similar speed?
  3. Is the REPL colorizer stable? Are there open bug reports? (They should be labelled topic-repl.) The code is private and undocumented, I presume intentionally, and I do not expect that issues relevant only to IDLE would be welcome.
  4. In the IDLE Issues project, colorizer issues are in the Highlights section. Does this fix any of them, or appear to make fixes easier?
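For question 2, a rough first measurement could look like the sketch below. It times only the public `tokenize.generate_tokens` pass over a synthetic 10000-line source; it does not measure the PR colorizer or the cost of applying tags in the text widget, and the helper name `time_tokenize` and the sample line are made up for illustration.

```python
import io
import time
import tokenize


def time_tokenize(source):
    """Return (token_count, elapsed_seconds) for one full tokenize pass."""
    start = time.perf_counter()
    readline = io.StringIO(source).readline
    count = sum(1 for _ in tokenize.generate_tokens(readline))
    return count, time.perf_counter() - start


# Synthetic stand-in for a large editor buffer (10000 identical lines).
sample = "value = compute(1, 'text')  # comment\n" * 10_000
count, seconds = time_tokenize(sample)
print(f"{count} tokens in {seconds:.3f} s")
```

Running the same harness against the existing regex colorizer's matching step would give a comparable baseline, though real files will behave differently from this repetitive sample.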


    Labels

    stdlib (Standard Library Python modules in the Lib/ directory), topic-IDLE, type-feature (A feature request or enhancement)
