Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regex URL converter matches incorrect value #2481

Closed
tunetheweb opened this issue Aug 1, 2022 · 22 comments
Closed

regex URL converter matches incorrect value #2481

tunetheweb opened this issue Aug 1, 2022 · 22 comments
Assignees

Comments

@tunetheweb
Copy link

tunetheweb commented Aug 1, 2022

I have the following code in a Flask App using Werkzeug

# Redirect requests for the older image URLs to new URLs
@app.route(
    '/static/images/2019/<regex("(privacy|jamstack|capabilities)"):folder>/<image>'
)
def redirect_old_hero_images(folder, image):
    return redirect("/static/images/2020/%s/%s" % (folder, image)), 301

prior to 2.2.0 this would allow the following redirect:

/static/images/2019/jamstack/random.png

to

/static/images/2020/jamstack/random.png

Now however it incorrectly sends us to:

/static/images/2020/jamstack/jamstack

Pinning werkzeug to 2.1.2 fixes the issue.

Environment:

  • Python version: 3.8
  • Werkzeug version: 2.2.0 and 2.2.1
@davidism davidism changed the title Bug in regex path matching in 2.2.1 regex URL converter matches incorrect value Aug 1, 2022
@davidism
Copy link
Member

davidism commented Aug 1, 2022

Werkzeug does not provide a regex() converter. Please provide a minimal reproducible example.

@tunetheweb
Copy link
Author

tunetheweb commented Aug 2, 2022

Apologies for that, wasn't aware that this wrapped some other code.

Here's a minimal reproducible pytest example which works in 2.1.2 but not in 2.2.0 nor 2.2.1:

from werkzeug import routing as r
from werkzeug.routing import BaseConverter

class RegexConverter(BaseConverter):
    def __init__(self, url_map, *items):
        super(RegexConverter, self).__init__(url_map)
        self.regex = items[0]

def test_regex():
    m = r.Map(
        [
            r.Rule("/<regex('(privacy|jamstack|capabilities)'):folder>/<image>", endpoint="simple"),
        ], converters={'regex': RegexConverter}
    )
    a = m.bind("localhost", path_info="/jamstack/random.png")

    assert a.match() == ("simple", {"folder": "jamstack", "image": "random.png"})

Result from 2.2.1:

>       assert a.match() == ("chapter_image", {"folder": "jamstack", "image": "random.png"})
E       AssertionError: assert ('chapter_ima...: 'jamstack'}) == ('chapter_ima...'random.png'})
E         At index 1 diff: {'folder': 'jamstack', 'image': 'jamstack'} != {'folder': 'jamstack', 'image': 'random.png'}
E         Use -v to get more diff

You can see it's incorrectly providing the folder match (jamstack) to both the folder AND the image variable.

@pgjones
Copy link
Member

pgjones commented Aug 2, 2022

If you change the rule to r.Rule("/<regex('privacy|jamstack|capabilities'):folder>/<image>", endpoint="simple") i.e. remove the parenthesis within the regex definition it will work as expected. This is a definite change from the previous router, however I think it makes more sense (converters shouldn't define more than one group, so they shouldn't need to define groups at all). I think its best for me to update the documentation for this, however I'd like your views.

@tunetheweb
Copy link
Author

That works find for me so I'm happy enough with that.

However, I do wonder if it limits routing somewhat for more complex regexs?

For example if you wanted to match 2020-2021 you might use the following:

r.Rule("/<regex('20(20|21|22)'):folder>/<image>", endpoint="simple")

Here the grouping is used to group the options from anything before or after them.
Not sure how much of an issue this would really be though.

Anyway, I'm happy to close this issue and will let you make the call on that.

@davidism
Copy link
Member

davidism commented Aug 2, 2022

You can use the non-grouping syntax if you need parens.

@pgjones
Copy link
Member

pgjones commented Aug 2, 2022

You could do r.Rule("/20<regex('20|21|22'):folder>/<image>", endpoint="simple") for that case.

@tunetheweb
Copy link
Author

You can use the non-grouping syntax if you need parens.

Yeah I've confirmed that r.Rule("/<regex('20(?:20|21|22)'):folder>/<image>", endpoint="simple"), works for that.

You could do r.Rule("/20<regex('20|21|22'):folder>/<image>", endpoint="simple") for that case.

That doesn't seem to work in 2.1.2 nor 2.2.x

Anyway, think with the first option, we've enough to close this. Or want to keep it open until you update the docs?

@davidism davidism closed this as completed Aug 2, 2022
@pgjones
Copy link
Member

pgjones commented Aug 2, 2022

This works for me,

class RegexConverter(r.BaseConverter):
    def __init__(self, url_map, *items):
        super().__init__(url_map)
        self.regex = items[0]

def test_regex():
    m = r.Map(
        [
            r.Rule("/20<regex('21|22|23'):folder>/<image>", endpoint="simple"),
        ], converters={'regex': RegexConverter}
    )
    a = m.bind("localhost", path_info="/2021/random.png")

    assert a.match() == ("simple", {"folder": "21", "image": "random.png"})

What didn't work for you?

@davidism
Copy link
Member

davidism commented Aug 2, 2022

Oh, didn't notice that @pgjones wanted to document it.

@davidism davidism reopened this Aug 2, 2022
@tunetheweb
Copy link
Author

This works for me,

class RegexConverter(r.BaseConverter):
    def __init__(self, url_map, *items):
        super().__init__(url_map)
        self.regex = items[0]

def test_regex():
    m = r.Map(
        [
            r.Rule("/20<regex('21|22|23'):folder>/<image>", endpoint="simple"),
        ], converters={'regex': RegexConverter}
    )
    a = m.bind("localhost", path_info="/2021/random.png")

    assert a.match() == ("simple", {"folder": "21", "image": "random.png"})

What didn't work for you?

The folder value should be 2021 not just 21. Obviously you can work around it, but as a pure substitute for the previous grouping functionality available prior to 2.2.0, it's not the same.

@davidism
Copy link
Member

davidism commented Aug 2, 2022

This demonstrates why we don't have a regex converter built-in. Regex is the behind-the-scenes implementation of the routing system. It's very hard to validate that any arbitrary regex can be inserted into that safely and correctly.

For the original example, you'd be better off using the any(a, b, ...) converter. For other stuff, we would encourage you to write specific converters rather than leaving regex wide open.

@tunetheweb
Copy link
Author

Yup today I learned! Will convert all my Regex converters to inbuilt ones as they seem to be sufficient for my needs.

pgjones added a commit to pgjones/werkzeug that referenced this issue Aug 3, 2022
This should clarify that the regex itself is matched as a group and
hence discourage gropuing within. See also pallets#2481.
@pgjones
Copy link
Member

pgjones commented Aug 3, 2022

See #2488

@snarfed
Copy link

snarfed commented Aug 3, 2022

This demonstrates why we don't have a regex converter built-in. Regex is the behind-the-scenes implementation of the routing system. It's very hard to validate that any arbitrary regex can be inserted into that safely and correctly.

Thanks for this! I'm seeing a related failure in werkzeug 2.2.0's new router on the same custom RegexConverter: it hangs in an infinite loop on on moderately complex regexps, details in snarfed/webutil#8. My use case is matching domains and fediverse addresses (which contain domains), examples here and here.

I'm only using RegexConverter as an easy way to accept valid domains and 404 anything else. It'll be easy for me to switch to doing that check and returning 404 manually when necessary, I'll plan to do that.

@davidism
Copy link
Member

davidism commented Aug 3, 2022

It looks like that's the same issue, you have a capturing group in your regex. The regexes for converters are intended to be fairly simple, with further validation done in to_python. Your solution to accept a string and do the check in the view is fine too.

@snarfed
Copy link

snarfed commented Aug 3, 2022

@davidism thanks! Just FYI, I still see the hang even if I remove the capturing group, eg:

app.add_url_rule(r'/<regex("[^/:]+\.[^/:]+")>', view_func=lambda: 'foo')

@snarfed
Copy link

snarfed commented Aug 3, 2022

Looks like the [...]s are to blame, minimal repro is a character set with a single character:

app.add_url_rule(r'/<regex("[X]")>', view_func=lambda: 'foo')

(Still just FYI! I'm still happy to fix it on my end by simplifying or dropping my RegexConverter usage.)

@davidism
Copy link
Member

davidism commented Aug 4, 2022

@pgjones this seems like an issue, because I'd expect a custom converter might legitimately want to use a character group []. We should figure out why that doesn't work now.

@pgjones
Copy link
Member

pgjones commented Aug 4, 2022

'/<regex("[X]")>' is an invalid rule as regex("[X]") is not a valid variable name, needs to be something like `'/<regex("[X]"):name>'. The current router hangs rather than reporting this though, fix incoming.

'/<regex("[^/:]+\.[^/:]+")>' is problematic as it has / not used as a path part.

@snarfed
Copy link

snarfed commented Aug 4, 2022

Thanks for looking! Apologies, the missing variable names were oversights on my part, trying a bit too hard to minimize test cases. It does hang similarly with eg /<regex("[X]":foo)>, as you mentioned, so the missing variable names don't seem to be the root cause.

@pgjones
Copy link
Member

pgjones commented Aug 5, 2022

Added a second commit to #2489 I think fixes these issues.

@snarfed
Copy link

snarfed commented Aug 5, 2022

@pgjones this fix is great, all of my tests are passing again. Thank you! Looking forward to seing it released!

snarfed added a commit to snarfed/flask-gae-static that referenced this issue Aug 9, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants