Issue Description
While preparing a PR for PEP 723 support in pip, I noticed that the reference parser defined by the PEP and listed in the PyPA docs will collate multiple adjacent /// TYPE blocks as a single match, even when separated by a comment line (the spec refers to it as a "content line"). This greedy collation is surprising and makes distinguishing error cases a little complicated, so I think it merits a warning in the docs if it is not possible to update the specification itself.
I believe this quirk is caused by the last + in the reference regex being greedy and matching all the way to the trailing /// instead of to the first available one. In my limited experimentation, replacing this quantifier with +? resolves the issue, producing the expected number of matches.
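As a rough sketch of that experiment, the snippet below runs the reference pattern and a variant whose content group uses `+?` against the same input. The names `GREEDY`, `LAZY` and `count_blocks` are made up for illustration and are not part of the PEP. It covers only this one input, so it does not show that the lazy variant is correct in general.

```python
import re

# Pattern adapted from PEP 723's reference parser (greedy `+` on the content group).
GREEDY = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
# Hypothetical variant: identical except the content group's quantifier is lazy (`+?`).
LAZY = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"

# Two blocks separated only by a bare "#" comment line.
script = """\
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""


def count_blocks(pattern: str, text: str) -> int:
    """Count non-overlapping matches of `pattern` in `text`."""
    return sum(1 for _ in re.finditer(pattern, text))


print("greedy:", count_blocks(GREEDY, script))
print("lazy:  ", count_blocks(LAZY, script))
```

With the greedy pattern the content group runs through to the last closing `# ///`, so both blocks come back as one match. The lazy variant stops at the first closing line and reports each block separately.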
This shouldn't slip through anybody's code unnoticed, as the collation will produce invalid TOML (the interior /// is invalid syntax), but it is a surprising enough edge case that I thought to report it here.
Code to reproduce:
import re
script_A = """
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""
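# In script_B the two blocks are separated by a blank line rather than a
# comment line, so the regex matches them separately.
script_B = """
# /// script
# data (1)
# ///

# /// script
# data (2)
# ///
"""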
# These lines adapted from PEP 723's reference parser:
# https://peps.python.org/pep-0723/#reference-implementation
REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
name = "script"
matches_A = list(
filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_A))
)
matches_B = list(
filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_B))
)
# output:
# 1
# 2
print(len(matches_A))
print(len(matches_B))