python re bug #81508

Closed
aixianle mannequin opened this issue Jun 18, 2019 · 5 comments
Labels
3.7 (EOL) end of life topic-regex type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@aixianle
Mannequin

aixianle mannequin commented Jun 18, 2019

BPO 37327
Nosy @ezio-melotti, @aldwinaldwin

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2019-06-18.10:19:49.677>
created_at = <Date 2019-06-18.06:17:11.564>
labels = ['expert-regex', '3.7', 'invalid', 'type-crash']
title = 'python re bug'
updated_at = <Date 2019-06-18.10:19:49.674>
user = 'https://bugs.python.org/aixianle'

bugs.python.org fields:

activity = <Date 2019-06-18.10:19:49.674>
actor = 'mrabarnett'
assignee = 'none'
closed = True
closed_date = <Date 2019-06-18.10:19:49.677>
closer = 'mrabarnett'
components = ['Regular Expressions']
creation = <Date 2019-06-18.06:17:11.564>
creator = 'aixian le'
dependencies = []
files = []
hgrepos = []
issue_num = 37327
keywords = []
message_count = 5.0
messages = ['345953', '345955', '345957', '345958', '345978']
nosy_count = 4.0
nosy_names = ['ezio.melotti', 'mrabarnett', 'aldwinaldwin', 'aixian le']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue37327'
versions = ['Python 3.6', 'Python 3.7']

@aixianle
Mannequin Author

aixianle mannequin commented Jun 18, 2019

The code is:
import re

banner = "HTTP/1.0 404 Not Found\r\nDate: Mon, 17 Jun 2019 13:15:44 GMT\r\nServer: \r\nConnection: close\r\nContent-Type: text/html\r\n\r\n<HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD>\r\n<BODY><H1>404 Not Found</H1>\r\nThe requested URL /PSIA/index was not found on this server.\r\n</BODY></HTML>\r\n"
regex = r"^HTTP/1\.0 404 Not Found\r\n(?:[^\<]+|<(?!/head>))*?<style>"
print("start")
regex_re = re.compile(regex)
print("start1")
regex_re.search(banner)  # in practice this call never returns
print("end")
When I execute this code, Python never finishes.

@aixianle aixianle mannequin added topic-regex type-crash A hard crash of the interpreter, possibly with a core dump labels Jun 18, 2019
@aixianle aixianle mannequin added the 3.7 (EOL) end of life label Jun 18, 2019
@aldwinaldwin
Mannequin

aldwinaldwin mannequin commented Jun 18, 2019

When I run the regex on https://regex101.com/ (after escaping the slashes as "HTTP\/1\.0" and "\/head"), it reports 'Catastrophic backtracking has been detected and the execution of your expression has been halted.' I don't know much about regex, but it seems to get stuck in something like an endless loop.

I'd suggest getting the regex to work in another regex engine first, before calling it a Python bug.
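Catastrophic backtracking can be observed safely by timing the core of the pattern on short inputs. This is an illustration under stated assumptions, not code from the thread: the reduced pattern `(?:[^<]+)*?<style>`, the all-`a` test strings, and the helper name `time_failed_search` are hypothetical.

```python
import re
import time

# Reduced form of the reported pattern (an assumption for illustration):
# a lazy repeat wrapped around a greedy repeat, followed by a literal
# that never occurs in the input, so every match attempt must fail.
PATTERN = re.compile(r"(?:[^<]+)*?<style>")


def time_failed_search(n):
    """Match PATTERN against n non-'<' characters; return (result, seconds)."""
    text = "a" * n
    start = time.perf_counter()
    result = PATTERN.match(text)  # anchored at position 0, like the "^" in the report
    elapsed = time.perf_counter() - start
    return result, elapsed


for n in range(12, 21, 2):
    result, elapsed = time_failed_search(n)
    print(f"n={n:2d}  result={result}  time={elapsed:.4f}s")
```

On CPython, each extra character should roughly double the time, so each step of two in the loop should take about four times as long as the one before. A run of several dozen characters, like the HTTP header lines in the reported banner, is far out of reach.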

@aldwinaldwin
Mannequin

aldwinaldwin mannequin commented Jun 18, 2019

Also, the banner doesn't contain "<style>" at all.

@mrabarnett
Mannequin

mrabarnett mannequin commented Jun 18, 2019

The problem is the "(?:[^\<]+|<(?!/head>))*?".

If I simplify it a little, I get "(?:[^\<]+)*?", which is a repeat within a repeat.
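The "repeat within a repeat" matters because a run of n non-"<" characters can be covered by consecutive `[^<]+` pieces in 2**(n - 1) different ways. A small dynamic-programming count makes that concrete; it is a sketch written for illustration, and `count_splits` is a hypothetical helper, not part of `re`.

```python
def count_splits(n):
    """Count the ways (?:[^<]+)* can cover a run of n non-'<' characters
    with consecutive [^<]+ pieces, each at least one character long."""
    # ways[i] = number of ways to cover the first i characters of the run
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty prefix: zero iterations of the outer repeat
    for i in range(1, n + 1):
        # the last piece ends at i and starts at some j < i
        ways[i] = sum(ways[:i])
    return ways[n]


# The first run of non-'<' characters in the reported banner: the header
# lines between the status line and "<HTML>".
run = ("Date: Mon, 17 Jun 2019 13:15:44 GMT\r\nServer: \r\n"
       "Connection: close\r\nContent-Type: text/html\r\n\r\n")

for n in (1, 2, 3, 10):
    print(n, count_splits(n))  # counts 1, 2, 4, 512: doubling per character
print(len(run), count_splits(len(run)))
```

For the 93-character header run in the banner this gives 2**92 ways, which is why the search in the report never finishes in practice.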

There are exponentially many ways in which it could match, and if what follows fails to match (it does fail here, because there's no "<style>" in the target string, as Aldwin pointed out), it'll try them all, which can take an extremely long time.
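A possible fix, sketched here as an illustration rather than anything proposed in the thread, is to let each iteration consume exactly one character. `(?:[^<]|<(?!/head>))*?` accepts the same strings as the original group, but since `[^<]` and `<` can never match the same character, there is only one way to match any given prefix, so a failing search gives up quickly instead of trying every split. The name `fixed` is hypothetical.

```python
import re

banner = ("HTTP/1.0 404 Not Found\r\nDate: Mon, 17 Jun 2019 13:15:44 GMT\r\n"
          "Server: \r\nConnection: close\r\nContent-Type: text/html\r\n\r\n"
          "<HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD>\r\n"
          "<BODY><H1>404 Not Found</H1>\r\n"
          "The requested URL /PSIA/index was not found on this server.\r\n"
          "</BODY></HTML>\r\n")

# Same language as the reported pattern, but each iteration consumes exactly
# one character, and the alternatives are mutually exclusive ([^<] vs. '<'),
# so there is nothing for the engine to re-split when "<style>" is missing.
fixed = re.compile(r"^HTTP/1\.0 404 Not Found\r\n(?:[^<]|<(?!/head>))*?<style>")

print(fixed.search(banner))  # None, returned quickly
```

On Python 3.11 and later, possessive quantifiers and atomic groups are also supported, so writing the inner repeat as `[^<]++` should have a similar effect while keeping the original structure.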

@mrabarnett mrabarnett mannequin closed this as completed Jun 18, 2019
@mrabarnett mrabarnett mannequin added the invalid label Jun 18, 2019
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022