Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search is to long with regex like ^(.+|dontmatch)*$ #42394

Closed
vstinner opened this issue Sep 21, 2005 · 6 comments
Closed

Search is to long with regex like ^(.+|dontmatch)*$ #42394

vstinner opened this issue Sep 21, 2005 · 6 comments

Comments

@vstinner
Copy link
Member

BPO 1297193
Nosy @pitrou, @vstinner, @orivej
Files
  • bug2.py: Test file (bug2.py)
  • bug.c: Test file in C language
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2008-09-09.11:35:35.865>
    created_at = <Date 2005-09-21.03:09:18.000>
    labels = ['expert-regex']
    title = 'Search is to long with regex like ^(.+|dontmatch)*$'
    updated_at = <Date 2008-09-09.11:35:35.863>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2008-09-09.11:35:35.863>
    actor = 'pitrou'
    assignee = 'niemeyer'
    closed = True
    closed_date = <Date 2008-09-09.11:35:35.865>
    closer = 'pitrou'
    components = ['Regular Expressions']
    creation = <Date 2005-09-21.03:09:18.000>
    creator = 'vstinner'
    dependencies = []
    files = ['1785', '1786']
    hgrepos = []
    issue_num = 1297193
    keywords = []
    message_count = 6.0
    messages = ['26345', '26346', '26347', '26348', '72846', '72848']
    nosy_count = 5.0
    nosy_names = ['niemeyer', 'pitrou', 'vstinner', 'rsc', 'orivej']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1297193'
    versions = ['Python 2.4']

    @vstinner
    Copy link
    Member Author

    Hi,

    I don't know if it's a bug or a feature, but Python
    looks very slow to process the regex ^(.+|D)*A$ for
    string like "xxxxxxB" (add x to run python slower).

    I found this bug when a try to match strings like "abc"
    or "a\"b" or "\"abc" or "" with the regex:

    >\"((?:[^\\"]+|\\\")*)\"<<.

    I know that ".+" construction in not ... optimal, but
    Python should optimize regex like (.+|something)
    .

    Haypo

    @vstinner
    Copy link
    Member Author

    Logged In: YES
    user_id=365388

    I try the same regex with Linux Regex library, and it looks
    fast (but, can I compare C with Python ?) and not "buggy".

    See bug.c file, and type gcc -o bug bug.c && time ./bug (I
    get 0.000s for user ...).

    Victor

    @vstinner
    Copy link
    Member Author

    Logged In: YES
    user_id=365388

    The "bug" only occurs when the regex doesn't match, else
    it's very fast to match the string.

    Haypo

    @vstinner
    Copy link
    Member Author

    Logged In: YES
    user_id=365388

    First fix: (a+|b)* can be replaced by (a|b). Other
    optimizations : (a|b+)
    and (a+|b+) can be replaced by (a|b)*.

    For the case (a*|b)+ ... Wait, I'm thinking on the problem
    :-) Hum, looks like (?:(a|b)+)?. Im' not sure.

    This optimizatons should be done in compile() function.

    Haypo

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 9, 2008

    It's not a bug, it's a feature. I wrote a library
    (http://hachoir.org/wiki/hachoir-regex) to "optimize" regex, so you
    can close this issue.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 9, 2008

    Fair enough :)

    @pitrou pitrou closed this as completed Sep 9, 2008
    @pitrou pitrou closed this as completed Sep 9, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants