Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-98401: Invalid escape sequences emits SyntaxWarning #99011

Merged
merged 2 commits into from Nov 3, 2022
Merged

gh-98401: Invalid escape sequences emits SyntaxWarning #99011

merged 2 commits into from Nov 3, 2022

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Nov 2, 2022

A backslash-character pair that is not a valid escape sequence now generates a SyntaxWarning, instead of DeprecationWarning. For example, re.compile("\d+.\d+") now emits a SyntaxWarning ("\d" is an invalid escape sequence), use raw strings for regular expression: re.compile(r"\d+.\d+"). In a future Python version, SyntaxError will eventually be raised, instead of SyntaxWarning.

Octal escapes with value larger than 0o377 (ex: "\477"), deprecated in Python 3.11, now produce a SyntaxWarning, instead of DeprecationWarning. In a future Python version they will be eventually a SyntaxError.

codecs.escape_decode() and codecs.unicode_escape_decode() are left unchanged: they still emit DeprecationWarning.

  • The parser only emits SyntaxWarning for Python 3.12 (feature version), and still emits DeprecationWarning on older Python versions.
  • Fix SyntaxWarning by using raw strings in Tools/c-analyzer/ and wasm_build.py.

Parser/string_parser.c Outdated Show resolved Hide resolved
@vstinner
Copy link
Member Author

vstinner commented Nov 2, 2022

Previous attempt in 2018:

  • Commit: 6543912
  • Serhiy only changed the Python parser, not codecs: rationale
  • The new SyntaxWarning was seen in multiple projects: docutils (fixed), bottle, SymPy (bug in a docstring)
  • SyntaxWarning are only emitted once, and only when a Python file is parsed: when a cached PYC file is used, not warning is emitted
  • Serhiy reverted his change

@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

I tested the Python test suite with SyntaxWarning treated as error: it does pass.

$ PYTHONWARNINGS=error::SyntaxWarning ./python -W error::SyntaxWarning -m test -j0 -r -u all,-gui

@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

This issue mostly hit code defining regular expressions. Example in BeautifulSoup 3.2.2:

$ pycodestyle --select W605 $(find -name "*.py")
./BeautifulSoup.py:455:52: W605 invalid escape sequence '\d'
./BeautifulSoup.py:455:73: W605 invalid escape sequence '\w'

./BeautifulSoup.py:574:47: W605 invalid escape sequence '\d'
./BeautifulSoup.py:574:66: W605 invalid escape sequence '\w'

./BeautifulSoup.py:1080:38: W605 invalid escape sequence '\s'

./BeautifulSoup.py:1589:36: W605 invalid escape sequence '\s'

./BeautifulSoup.py:1933:16: W605 invalid escape sequence '\?'
./BeautifulSoup.py:1933:46: W605 invalid escape sequence '\?'
./BeautifulSoup.py:1935:35: W605 invalid escape sequence '\s'

Examples of code, re.compile() calls:

    BARE_AMPERSAND_OR_BRACKET = re.compile("([<>]|"
                                           + "&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)"
                                           + ")")
(...)
                      (re.compile('<!\s+([^<>]*)>'),
(...)
        xml_encoding_match = re.compile(
            '^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)

Doc/reference/lexical_analysis.rst Outdated Show resolved Hide resolved
for i in range(97, 123):
b = bytes([i])
if b not in b'abfnrtvx':
with self.assertWarns(DeprecationWarning):
with self.assertWarns(SyntaxWarning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SyntaxWarning is not related to codecs. It only should be emitted by the compiler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which warning should be emitted if SyntaxWarning is not the best choice? UnicodeWarning?

Does UnicodeWarning make sense for codecs.escape_decode()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeprecationWarning, as for all other deprecated features.

Parser/string_parser.c Outdated Show resolved Hide resolved
@serhiy-storchaka
Copy link
Member

I tested the Python test suite with SyntaxWarning treated as error: it does pass.

Did you remove all pyc files and regenerate frozen modules before tests? Also try to regenerate generated code using the new Python binary as PYTHON_FOR_REGEN.

@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

Did you remove all pyc files and regenerate frozen modules before tests? Also try to regenerate generated code using the new Python binary as PYTHON_FOR_REGEN.

I ran PYTHONWARNINGS=error::SyntaxWarning make regen-all PYTHON_FOR_REGEN=./python: it works as expected.

@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

@serhiy-storchaka:

Perhaps SyntaxWarning could be raised for invalid octals too. It is a more obvious error, and less common, so it may have lesser deprecation period.

Ok, I updated my PR: Use SyntaxWarning for invalid octal sequence.

A backslash-character pair that is not a valid escape sequence now
generates a SyntaxWarning, instead of DeprecationWarning.  For
example, re.compile("\d+\.\d+") now emits a SyntaxWarning ("\d" is an
invalid escape sequence), use raw strings for regular expression:
re.compile(r"\d+\.\d+"). In a future Python version, SyntaxError will
eventually be raised, instead of SyntaxWarning.

Octal escapes with value larger than 0o377 (ex: "\477"), deprecated
in Python 3.11, now produce a SyntaxWarning, instead of
DeprecationWarning. In a future Python version they will be
eventually a SyntaxError.

codecs.escape_decode() and codecs.unicode_escape_decode() are left
unchanged: they still emit DeprecationWarning.

* The parser only emits SyntaxWarning for Python 3.12 (feature
  version), and still emits DeprecationWarning on older Python
  versions.
* Fix SyntaxWarning by using raw strings in Tools/c-analyzer/ and
  wasm_build.py.
@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

Hum, I messed up my PR so I squashed two commits and I fixed my PR:

  • invalid octal sequence and invalid escape sequence now both emit SyntaxWarning (and will emit SyntaxError in the future)
  • codecs.escape_decode() and codecs.unicode_escape_decode() are left unchanged

I also mentioned the future convertion to SyntaxError in the doc (What's New / NEWS entries).

@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

@mdickinson @serhiy-storchaka @hugovk: Would you mind to review my PR?

@serhiy-storchaka
Copy link
Member

Did you remove all pyc files?

find -name '*.py[co]' -exec rm -rf '{}' +

@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

Let me try these commands:

export PYTHONWARNINGS=error::SyntaxWarning 
git clean -fdx
./configure --with-pydebug CFLAGS=-O0
make
make regen-all PYTHON_FOR_REGEN=./python
make

./python -c 's = "\477"'

The last command fails as expected with:

  File "<string>", line 1
    s = "\477"
        ^^^^^^
SyntaxError: invalid octal escape sequence '\477'

@vstinner
Copy link
Member Author

vstinner commented Nov 3, 2022

Oops, test_string_literals didn't work when run with ./python -W error::SyntaxWarning: it's now fixed.

I ran the test suite with:

export PYTHONWARNINGS=error::SyntaxWarning 
./python -W error::SyntaxWarning -m test -j0 -r -u all,-gui 

Note: I would prefer to run the whole test suite with -Werror, but it fails even without this PR :-( Python currently emits many deprecation warnings for example :-(

@vstinner vstinner merged commit a60ddd3 into python:main Nov 3, 2022
@vstinner vstinner deleted the syntax_warning branch November 3, 2022 16:53
markmentovai added a commit to markmentovai/bits that referenced this pull request Jan 21, 2024
Fixes this warning experienced with Python 3.12
(python/cpython#98401,
python/cpython#99011):

faa_cs_aan.py:690: SyntaxWarning: invalid escape sequence '\.'
  email_match = re.search('For Inquiries: ([0-9a-z._-]+@[0-9a-z.-]+)\.?$',
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants