re.match raises MemoryError #56386

Closed
EungJunYi mannequin opened this issue May 25, 2011 · 6 comments
Labels
performance Performance or resource usage topic-regex

Comments


EungJunYi mannequin commented May 25, 2011

BPO 12177
Nosy @pitrou, @ezio-melotti, @skrah, @serhiy-storchaka
Superseder
  • bpo-9669: regexp: zero-width matches in MIN_UNTIL
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-02-05.17:31:08.714>
    created_at = <Date 2011-05-25.18:04:24.523>
    labels = ['expert-regex', 'performance']
    title = 're.match raises MemoryError'
    updated_at = <Date 2013-02-05.17:31:08.712>
    user = 'https://bugs.python.org/EungJunYi'

    bugs.python.org fields:

    activity = <Date 2013-02-05.17:31:08.712>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-02-05.17:31:08.714>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions']
    creation = <Date 2011-05-25.18:04:24.523>
    creator = 'EungJun.Yi'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 12177
    keywords = []
    message_count = 6.0
    messages = ['136880', '136906', '136913', '136929', '137147', '181464']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'ezio.melotti', 'mrabarnett', 'skrah', 'EungJun.Yi', 'Matthew.Boehm', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '9669'
    type = 'resource usage'
    url = 'https://bugs.python.org/issue12177'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2', 'Python 3.3']


    EungJunYi mannequin commented May 25, 2011

    re.match raises MemoryError when trying to match r'()+?1' to 'a1', as shown below.

    ~$ python
    Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
    [GCC 4.5.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.match(r'()+?1', 'a1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/re.py", line 137, in match
        return _compile(pattern, flags).match(string)
    MemoryError
    >>>
    
    ~$ python3
    Python 3.2 (r32:88445, Mar 25 2011, 19:28:28) 
    [GCC 4.5.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.match(r'()+?1', 'a1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.2/re.py", line 153, in match
        return _compile(pattern, flags).match(string)
    MemoryError
    >>>

    EungJunYi mannequin added the topic-regex and performance (Performance or resource usage) labels May 25, 2011

    skrah mannequin commented May 25, 2011

    Confirmed. The test case quickly uses 8GB of memory.
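Because the runaway allocation can exhaust the whole machine, one way to reproduce the report more safely is to run it in a child interpreter with an address-space cap, so an affected build hits MemoryError early instead of swapping. This is a minimal sketch, assuming Linux (it uses `RLIMIT_AS` from the standard `resource` module; on other platforms only the timeout applies). The 1 GiB cap and the helper names `run_repro` and `_limit_address_space` are illustrative choices, not anything from this thread.

```python
import subprocess
import sys

# Hypothetical cap for the child's address space; 1 GiB is an arbitrary
# choice, well below the ~8GB reported in the comment above.
MEMORY_LIMIT_BYTES = 1 << 30

# The reproducer from the original report.
SNIPPET = "import re; print(re.match(r'()+?1', 'a1'))"


def _limit_address_space() -> None:
    """Runs in the child just before exec (Linux only in this sketch)."""
    import resource

    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Keep the hard limit unchanged; set the soft limit to at most the hard one.
    cap = MEMORY_LIMIT_BYTES if hard == resource.RLIM_INFINITY else min(MEMORY_LIMIT_BYTES, hard)
    resource.setrlimit(resource.RLIMIT_AS, (cap, hard))


def run_repro(timeout: float = 30.0) -> tuple:
    """Run SNIPPET in a fresh interpreter; return (returncode, stdout, stderr)."""
    preexec = _limit_address_space if sys.platform.startswith("linux") else None
    proc = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        capture_output=True,
        text=True,
        timeout=timeout,
        preexec_fn=preexec,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


if __name__ == "__main__":
    print(run_repro())
```

On an interpreter that includes the bpo-9669 fix, the child should print `None` and exit with status 0; on an affected build, one would expect a non-zero exit status with a MemoryError traceback in stderr.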


    mrabarnett mannequin commented May 25, 2011

    This also raises MemoryError:

    re.match(r'()*?1', 'a1')
    

    but none of these do:

    re.match(r'()+1', 'a1')
    re.match(r'()*1', 'a1')
    


    EungJunYi mannequin commented May 26, 2011

    This also raises MemoryError in 2.6.5:

    Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.match('()+?1', 'a1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/re.py", line 137, in match
        return _compile(pattern, flags).match(string)
    MemoryError


    MatthewBoehm mannequin commented May 28, 2011

    Here are some Windows results with Python 2.7:

    >>> import re
    >>> re.match("()*?1", "1")
    <_sre.SRE_Match object at 0x025C0E60>
    >>> re.match("()+?1", "1")
    >>> re.match("()+?1", "11")
    <_sre.SRE_Match object at 0x025C0E60>
    >>> re.match("()*?1", "11")
    <_sre.SRE_Match object at 0x025C3C60>
    >>> re.match("()*?1", "a1")
    
    Traceback (most recent call last):
      File "<pyshell#12>", line 1, in <module>
        re.match("()*?1", "a1")
      File "C:\Python27\lib\re.py", line 137, in match
        return _compile(pattern, flags).match(string)
    MemoryError
    >>> re.match("()+?1", "a1")
    
    Traceback (most recent call last):
      File "<pyshell#13>", line 1, in <module>
        re.match("()+?1", "a1")
      File "C:\Python27\lib\re.py", line 137, in match
        return _compile(pattern, flags).match(string)
    MemoryError

    Note that when matching a string that starts with "1", the matcher does not raise MemoryError.
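One reading of this observation: when the subject starts with "1", the tail of the pattern succeeds as soon as the lazy quantifier has met its minimum, so the engine never needs to add more empty iterations. With "a1", the tail fails at position 0 and the lazy loop keeps extending the zero-width repetition. A sketch contrasting `re.match` (anchored at position 0) with `re.search` (which retries at later positions), assuming an interpreter that includes the bpo-9669 fix:

```python
import re

LAZY = r'()+?1'

# re.match only tries position 0; re.search also tries later positions.
for subject in ('1', '11', 'a1'):
    m = re.match(LAZY, subject)
    print(f"match({subject!r}):", m.span() if m else None)

found = re.search(LAZY, 'a1')
print("search('a1'):", found.span() if found else None)
```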

    serhiy-storchaka commented Feb 5, 2013

    This is a duplicate of bpo-9669.
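The superseding issue's title, "zero-width matches in MIN_UNTIL", points at sre's lazy-repeat loop. As a rough illustration only (a toy model in plain Python, not sre's actual C code; all names here are hypothetical), the sketch below shows why an empty body under a lazy quantifier never terminates once the tail fails: each extra iteration consumes nothing, so the engine ends up exactly where it started. In sre each such iteration presumably also saves backtracking state, which would explain why the symptom is MemoryError rather than a plain hang. The `guard` flag models a "no progress" check of the kind the title suggests.

```python
from typing import Callable, Optional

# Toy model only, not sre's C implementation. A matcher takes the subject
# and a start position and returns the end position, or None on failure.
Matcher = Callable[[str, int], Optional[int]]


def literal(ch: str) -> Matcher:
    def match(s: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(s) and s[pos] == ch else None
    return match


def empty_group(s: str, pos: int) -> Optional[int]:
    # Models "()": always succeeds and consumes nothing.
    return pos


def lazy_repeat(body: Matcher, minimum: int, tail: Matcher,
                guard: bool = True, max_steps: int = 10_000) -> Matcher:
    """Model of body{minimum,}? followed by tail (no backtracking into body)."""
    def match(s: str, pos: int) -> Optional[int]:
        count = 0
        for _ in range(max_steps):
            if count >= minimum:
                end = tail(s, pos)          # lazy: try the tail first
                if end is not None:
                    return end
            new_pos = body(s, pos)          # tail failed: one more iteration
            if new_pos is None:
                return None
            if guard and count >= minimum and new_pos == pos:
                return None                 # zero-width iteration: no progress
            pos, count = new_pos, count + 1
        raise RuntimeError("no progress after max_steps iterations")
    return match
```

With `guard=False`, `lazy_repeat(empty_group, 1, literal("1"))` spins on "a1" until `max_steps` and raises; with the guard it fails at position 0 immediately, mirroring the `None` a fixed interpreter returns.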

    ezio-melotti transferred this issue from another repository Apr 10, 2022