Python: Fix syntax error when = is used as a format fill character #21274

Merged: tausbn merged 4 commits into main from tausbn/python-fix-parsing-of-format-specifiers on Feb 5, 2026

tausbn commented Feb 5, 2026

An example (provided by @redsun82 from a report by @grahamcracker1234) is the string `f"{x:=^20}"`. Parsing this (with unnamed nodes shown) illustrates the problem:

```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: named_expression [0, 3] - [0, 9]
          name: identifier [0, 3] - [0, 4]
          ":=" [0, 4] - [0, 6]
          ERROR [0, 6] - [0, 7]
            "^" [0, 6] - [0, 7]
          value: integer [0, 7] - [0, 9]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```

Observe that the lexer has combined the format specifier token `:` and the fill character `=` into a single `:=` token (which doesn't match the `:` we expect in the grammar rule), and hence we get a syntax error.

If we change the `=` to some other character (e.g. a `-`), we instead get

```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: identifier [0, 3] - [0, 4]
        format_specifier: format_specifier [0, 4] - [0, 9]
          ":" [0, 4] - [0, 5]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```

and in particular no syntax error.
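
For context, CPython itself accepts both spellings, so the error comes purely from our parser. The following is a minimal sketch using only the standard `ast` module; the value `"hi"` and the helper `format_spec_of` are illustrative names and are not taken from the PR's test files.

```python
import ast

x = "hi"  # illustrative value, not from the PR

# `=` and `-` are both ordinary fill characters for the `^` (centre) alignment.
print(f"{x:=^20}")  # =========hi=========
print(f"{x:-^20}")  # ---------hi---------

# A real walrus inside a replacement field has to be parenthesised.
print(f"{(y := 5)}")  # 5


def format_spec_of(source: str) -> str:
    """Return the literal format spec of the first replacement field.

    Hypothetical helper: assumes `source` is a single f-string whose first
    part is a replacement field with a constant format spec.
    """
    fstring = ast.parse(source, mode="eval").body  # ast.JoinedStr
    field = fstring.values[0]                      # ast.FormattedValue
    return field.format_spec.values[0].value


# ast.parse would raise SyntaxError if CPython rejected either source.
print(format_spec_of('f"{x:=^20}"'))
print(format_spec_of('f"{x:-^20}"'))
```

Because an unparenthesised `:=` directly after the expression is never a walrus in a replacement field, reading it as `:` followed by the fill character `=` is the only valid interpretation there.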

To fix this, we want to ensure that the `:` is lexed on its own, and the `token(prec(1, ...))` construction can be used to do exactly this.

Finally, you may wonder why `=` is special here. I think what's going on is that the lexer knows that `:=` is a token on its own (because it's used in the walrus operator), and so it greedily consumes the following `=` with this in mind.
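
The greedy behaviour described above can be illustrated with a toy longest-match lexer. This is only a sketch: `TOKENS`, `lex` and its `preferred` parameter are hypothetical names, and the real tree-sitter lexer is generated from the grammar and also restricts itself to tokens that are valid in the current parse state, which this toy does not model.

```python
import re

# Hypothetical token table; the real grammar has many more tokens.
TOKENS = [":=", ":", "=", "^", "{", "}"]


def lex(text, preferred=()):
    """Split `text` into tokens.

    By default the longest matching literal wins (so `:=` beats `:`).
    Any literal listed in `preferred` wins over longer matches, loosely
    mimicking a token that has been given higher lexical precedence.
    """
    out, i = [], 0
    while i < len(text):
        candidates = [t for t in TOKENS if text.startswith(t, i)]
        favoured = [t for t in candidates if t in preferred]
        if favoured:
            tok = max(favoured, key=len)
        elif candidates:
            tok = max(candidates, key=len)
        else:
            # Group identifier characters and digits; otherwise take one char.
            m = re.match(r"\w+", text[i:])
            tok = m.group() if m else text[i]
        out.append(tok)
        i += len(tok)
    return out


# Longest match: the fill character is swallowed into a walrus token.
print(lex("x:=^20"))                    # ['x', ':=', '^', '20']
# With `:` preferred, the specifier delimiter is lexed on its own.
print(lex("x:=^20", preferred=(":",)))  # ['x', ':', '=', '^', '20']
```

In the toy, preferring `:` plays the role that `token(prec(1, ':'))` plays in the grammar, on the assumption that lexical precedence is consulted before match length. Unlike this sketch, the real fix only affects the `:` inside `format_specifier`, so the walrus operator elsewhere is unchanged.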

github-actions bot added the Python label Feb 5, 2026
tausbn marked this pull request as ready for review February 5, 2026 14:03
tausbn requested review from a team as code owners February 5, 2026 14:03
Copilot AI review requested due to automatic review settings February 5, 2026 14:03
Copilot AI left a comment

Pull request overview

This pull request fixes a syntax error that occurred when using `=` as a fill character in f-string format specifiers (e.g., `f"{x:=^20}"`). The issue was caused by the lexer greedily consuming `:=` as a single token (the walrus operator) instead of lexing `:` separately followed by `=`.

Changes:

  • Modified the grammar to use `token(prec(1, ':'))` in format specifiers to ensure `:` is lexed independently
  • Added test cases for the fixed behavior in both `strings.py` and `template_strings_new.py`
  • Regenerated tree-sitter parser artifacts (`grammar.json`, `node-types.json`, `parser.h`, `array.h`)
  • Bumped extractor version from 7.1.7 to 7.1.8

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `python/ql/lib/change-notes/2026-02-05-fix-format-fill-character-misparse.md` | Documents the fix for format fill character parsing |
| `python/extractor/tsg-python/tsp/grammar.js` | Core fix: wraps `:` in `format_specifier` with `token(prec(1, ...))` |
| `python/extractor/tsg-python/tsp/src/grammar.json` | Regenerated from `grammar.js` with the `format_specifier` change |
| `python/extractor/tsg-python/tsp/src/node-types.json` | Regenerated parser metadata |
| `python/extractor/tsg-python/tsp/src/tree_sitter/parser.h` | Updated tree-sitter runtime header |
| `python/extractor/tsg-python/tsp/src/tree_sitter/array.h` | Updated tree-sitter runtime header |
| `python/extractor/tests/parser/strings.py` | Added test case for f-string with `=` fill character |
| `python/extractor/tests/parser/template_strings_new.py` | Added test case for template string with format specifier |
| `python/extractor/tests/parser/template_strings_new.expected` | Regenerated expected output including new test |
| `python/extractor/semmle/util.py` | Version bump to 7.1.8 |

```python
if 6:
    t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
if 7:
    t"With a format specifier: {name:=^20}"
```
Copilot AI commented Feb 5, 2026

Syntax Error (in Python 3).

See below for a potential fix:

    ""
if 2:
    f"Hello, {name}!"
if 3:
    f"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    "Just a regular string."
if 5:
    f"Multiple {first} and {second} placeholders."
if 6:
    "Implicit concatenation: " f"Hello, {name}!" " How are you?"
if 7:
    f"With a format specifier: {name:=^20}"

tausbn (Contributor, Author) replied:

This is pretty funny -- the alert is based on the current analysis, which indeed has a syntax error here (because of the parser issue that this PR fixes).

redsun82 left a comment

LGTM, thanks for the quick fix!

I understand the tree_sitter C/C++ header changes may come from a tooling version bump. Might it make sense to mark those files as generated too?

tausbn commented Feb 5, 2026

> I understand the tree_sitter C/C++ header changes may come from a tooling version bump. Might it make sense to mark those files as generated too?

Yeah, I really ought to exclude anything in /tsp that isn't grammar.js. Everything else is generated.

I'll create a separate PR for this.

@tausbn tausbn merged commit 5adc9f8 into main Feb 5, 2026
24 checks passed
@tausbn tausbn deleted the tausbn/python-fix-parsing-of-format-specifiers branch February 5, 2026 15:37
Successfully merging this pull request may close these issues.

Python: failing to parse t-strings and f-strings with an alignment modifier and = as fill character
