Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnyUrl adds trailing slash #7186

Open
1 task done
charel-felten-rq opened this issue Aug 21, 2023 · 25 comments
Open
1 task done

AnyUrl adds trailing slash #7186

charel-felten-rq opened this issue Aug 21, 2023 · 25 comments
Assignees
Labels
help wanted Pull Request welcome urls Related to `Url` types

Comments

@charel-felten-rq
Copy link

charel-felten-rq commented Aug 21, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

AnyURL now adds a trailing slash in validation of URLs that dont have a trailing slash. This caused CORS issues for us, because CORS doesnt allow trailing slash (I think, seems to be written here: https://www.w3.org/TR/2008/WD-access-control-20080214/)

In my opinion i see this as a bug because I dont see the option toggle this addition of trailing slash. Maybe its expected behaviour, please let me know then how I can turn it off.

Example Code

from pydantic import AnyHttpUrl as AnyUrlv2
from pydantic.v1 import AnyHttpUrl as AnyUrlv1

print(AnyUrlv2("http://localhost:3000"))
print(AnyUrlv1("http://localhost:3000", scheme="http"))


prints:

http://localhost:3000/
http://localhost:3000

Python, Pydantic & OS Version

pydantic version: 2.2.1
        pydantic-core version: 2.6.1
          pydantic-core build: profile=release pgo=false
                 install path: /Users/charelfeltenrq/.pyenv/versions/3.10.8/lib/python3.10/site-packages/pydantic
               python version: 3.10.8 (main, Dec  2 2022, 11:54:08) [Clang 14.0.0 (clang-1400.0.29.202)]
                     platform: macOS-13.3.1-arm64-arm-64bit
     optional deps. installed: ['email-validator', 'typing-extensions']

Selected Assignee: @dmontagu

@charel-felten-rq charel-felten-rq added bug V2 Bug related to Pydantic V2 unconfirmed Bug not yet confirmed as valid/applicable labels Aug 21, 2023
@Kludex Kludex removed the unconfirmed Bug not yet confirmed as valid/applicable label Aug 21, 2023
@charel-felten-rq
Copy link
Author

charel-felten-rq commented Aug 21, 2023

somewhat related: pydantic/pydantic-settings#104

At this point, to my understanding, it seems its intended to add the slash. Is it? In that case, I am interested in whats the optimal way to remove this trailing slash, without forgoing the other validation (as seems to be the proposed solution in the related issue)

@charel-felten-rq
Copy link
Author

This problem also becomes apparent in the url builder:

from pydantic import AnyHttpUrl as AnyHttpUrl

AnyHttpUrl.build(
    scheme="https",
    host="google.com",
    path="/search",
)

produces: Url('https://google.com//search')

Is that the desired outcome? If yes, then does it mean that all Url builds we have in our codebase should now have a path that starts without a slash?

@samuelcolvin
Copy link
Member

The second case is unrelated.

This is done by the URL rust crate, so it probably mandated by a rfc somewhere.

@samuelcolvin
Copy link
Member

Sorry, the second case is related.

@charel-felten-rq
Copy link
Author

charel-felten-rq commented Aug 22, 2023

Another weird situation

from pydantic import AnyHttpUrl as AnyHttpUrl

AnyHttpUrl.build(
    scheme="https",
    host="google.com/",
    path="search",
)

Url('https://google.com//search')

@adriangb
Copy link
Member

Another weird situation

That's technically valid and works, not sure we need to do anything special there.

I think we might come up with a better solution for this long term but for now if you don't need a Url instance you could do something like:

from typing import Annotated, Any, List

from pydantic_core import CoreSchema, core_schema

from pydantic import AfterValidator, BeforeValidator, GetCoreSchemaHandler, TypeAdapter
from pydantic.networks import AnyUrl


class Chain:
    def __init__(self, validations: List[Any]) -> None:
        self.validations = validations
    
    def __get_pydantic_core_schema__(self, source_type: Any, handler: GetCoreSchemaHandler) -> CoreSchema:
        return core_schema.chain_schema(
            [
                *(handler.generate_schema(v) for v in self.validations),
                handler(source_type)
            ]
        )


def strip_slash(value: str) -> str:
    return value.rstrip('/')


MyAnyUrl = Annotated[str, AfterValidator(strip_slash), BeforeValidator(str), Chain([AnyUrl])]

ta = TypeAdapter(MyAnyUrl)

print(ta.validate_python('https://www.google.com'))
# https://www.google.com

@charel-felten-rq
Copy link
Author

charel-felten-rq commented Aug 24, 2023

Thank you for posting this workaround, I am not 100% sure whats going on there, but to my understanding it seems that it creates a type MyAnyUrl which is a str, but includes the validation from AnyUrl. That could be useful, maybe we will use it or a similar idea.

However, I am still a bit baffled that it seems to be seen as not a problem that AnyUrl adds trailing / to all Urls, to me this is not validation anymore as it actually changes the Url. And also .build creating Urls with // in them is problematic for our situation.

For now, we have decided to roll back to pydantic v1. Will there be any way to keep using the pydantic v1 Url behaviour (i.e. not adding /) in v2? We tried via from pydantic.v1 import AnyUrl but this caused errors inside BaseSettings. Or is this really the desired behaviour and you are not planning on changing it back?

Lastly, just wanted to mention, I am a very happpy user of pydantic, and I love the work you all do, just this change of behaviour of Url is giving us a bit of a headache.

@tworedz
Copy link

tworedz commented Aug 24, 2023

Also finding this new behavior very annoying. In pydantic v1 urls inherited from str, so it can be easily passed to library codes, where they usually expects string representation of urls. But now we need to str(model.url_field) or model.url_field.unicode_string() when passing urls, because now it is pydantic_core._pydantic_core.Url

I'm not sure which variant right (old or new url), but this little slash causes very strange and hours of uncatchable bug fixing.

FYI, I come up with this solution:

from typing import Annotated
from pydantic import AfterValidator, AnyUrl, BaseModel

URL = Annotated[AnyUrl, AfterValidator(lambda x: str(x).rstrip("/"))]

class Model(BaseModel):
    url: URL

print(Model(url="https://google.com").url)
# > https://google.com

But sadly we cannot use this solution everywhere, because sometimes we need the last slash.

Seems, that v2 url validator adding slashes only to naked urls, i.e. without path, which is acceptable:

from pydantic import AnyUrl, BaseModel

class Model(BaseModel):
    url: AnyUrl

print(Model(url="https://google.com").url)
# > https://google.com/

print(Model(url="https://google.com/").url)
# > https://google.com/

print(Model(url="https://google.com/api").url)
# > https://google.com/api

print(Model(url="https://google.com/api/").url)
# > https://google.com/api/

Would be nice, if old behavior preserves, or like author suggested, we can choose behavior type. Thanks in advance!

@Tradunsky
Copy link

+1 this is a bug.

In addition to a broken behavior when config URLs used to be used to append a path, like:
https://google.com +"/search" (and all other variants with appending path in python)

There is also a violated principle of a least surprise when original value modified by a validation framework as when an URL is configured without slash is returned or saved in other components: DB, external services, they receive a new value which silently violated previous protocol until those actually start to execute it in a request.

@chbndrhnns
Copy link
Contributor

Stumbled upon this, as well. It's not possible to take an instance of pydantic.AnyHttpUrl and construct the same new instance of pydantic.AnyHttpUrl using the attributes of the former. This worked perfectly fine in v1.

I consider this a breaking change. Can we please add it to the migration docs at least?

To my knowledge, the double slash will cause webservers to respond with a 404 as they expect a single slash as a path separator.

def test_v2():
    from pydantic import TypeAdapter, AnyHttpUrl

    redirect_uri = TypeAdapter(AnyHttpUrl).validate_python(
        "http://localhost:3000/api/v1"
    )
    assert (
        str(
            AnyHttpUrl.build(
                scheme=redirect_uri.scheme,
                host=redirect_uri.host,
                port=redirect_uri.port,
                path=redirect_uri.path,
            )
        )
        == "http://localhost:3000/api/v1"
    )


def test_v1():
    from pydantic.v1 import parse_obj_as, AnyHttpUrl

    redirect_uri = parse_obj_as(AnyHttpUrl, "http://localhost:3000/api/v1")
    assert (
        str(
            AnyHttpUrl.build(
                scheme=redirect_uri.scheme,
                host=redirect_uri.host,
                port=redirect_uri.port,
                path=redirect_uri.path,
            )
        )
        == "http://localhost:3000/api/v1"
    )

@chbndrhnns
Copy link
Contributor

chbndrhnns commented Oct 24, 2023

That's technically valid and works

@adriangb
It might be technically valid. However, a URL containing a double slash is pointing to a different endpoint (not sure about the semantic) and thus I expect webservers to respond with a 404.

The use case I have in mind is sending a redirect URI to a client during an OAuth2 dance.

swelborn added a commit to swelborn/distiller that referenced this issue Oct 31, 2023
new behavior adds trailing `/` to end of links without a path:

pydantic/pydantic#7186 (comment)
@alexsantos
Copy link

alexsantos commented Nov 3, 2023

I had some validation checks because of this issue: the validator was testing for http://loinc.org but I was getting http://loinc.org/ on the model_dump_json()
This is what I did:

from typing import Optional

from pydantic import BaseModel, AnyUrl, Extra, field_serializer


class Coding(BaseModel):
    system: Optional[AnyUrl] = None
    version: Optional[str] = None
    code: Optional[str] = None
    display: Optional[str] = None
    userSelected: Optional[bool] = None

    @field_serializer('system')
    def serialize_system(self, system: AnyUrl):
        return str(system).rstrip('/')

    class Config:
        extra = Extra.forbid

@Saran33
Copy link

Saran33 commented Nov 21, 2023

Another less-than-ideal workaround (if it's acceptable to strip all trailing slashes):

from typing import Annotated
from pydantic import AnyHttpUrl, AfterValidator, BaseModel, PlainValidator, TypeAdapter

AnyHttpUrlAdapter = TypeAdapter(AnyHttpUrl)

HttpUrlStr = Annotated[
    str,
    PlainValidator(lambda x: AnyHttpUrlAdapter.validate_strings(x)),
    AfterValidator(lambda x: str(x).rstrip("/")),
]


class Model(BaseModel):
    url: HttpUrlStr


url = Model(url="https://google.com").url
print(url)
# > https://google.com

This also solves another somewhat related issue: #6395
In addition to the above points, IMO, when appending paths, this is more readable:

URL = f"{BASE_URL}/api/v1"

than this:

URL = f"{BASE_URL}api/v1"

@rafrafek
Copy link

rafrafek commented Dec 1, 2023

I tried:

service_url: HttpUrl

@field_validator("service_url", mode="after")
@classmethod
def service_url_remove_trailing_slash(cls, service_url: HttpUrl) -> str:
    return str(service_url).rstrip("/")

But then I got warnings:

/site-packages/pydantic/main.py:308: UserWarning: Pydantic serializer warnings:
    Expected `url` but got `str` - serialized value may not be as expected
    return self.__pydantic_serializer__.to_python(

How to keep the field as HttpUrl without the trailing slash?

Edit:

HttpUrl no longer supports + operation with strings, e.g. foo.my_url + "/bar" #8281

And if I workaround this by using custom type based on str then I can no longer use properties like .host on it.

@slingshotvfx
Copy link

slingshotvfx commented Dec 6, 2023

Seems, that v2 url validator adding slashes only to naked urls, i.e. without path, which is acceptable:

from pydantic import AnyUrl, BaseModel

class Model(BaseModel):
    url: AnyUrl

print(Model(url="https://google.com").url)
# > https://google.com/

print(Model(url="https://google.com/").url)
# > https://google.com/

print(Model(url="https://google.com/api").url)
# > https://google.com/api

print(Model(url="https://google.com/api/").url)
# > https://google.com/api/

This inconsistency is my biggest issue, especially when trying to add on paths or sub-paths as others have mentioned. If you can't be certain if a url has a trailing slash or not out of the box then we almost need an os.path.join style method that ensures exactly one forward slash between each path component.

For now I guess I'll just do this:

new_url = f"{str(URL).rstrip('/')}/path/here"

@pavradev
Copy link

pavradev commented Jan 2, 2024

We got this problem when we migrated from pydantic 1 as well. In addition to that, a lot of code was broken since AnyUrl is not accepted as str any more. This is our solution if it will be helpful to anyone:

AnyUrlTypeAdapter = TypeAdapter(AnyUrl)
AnyUrlStr = Annotated[
    str,
    BeforeValidator(lambda value: AnyUrlTypeAdapter.validate_python(value) and value),
]

So we currently just use our "wrapper" AnyUrlStr everywhere for compatibility. It also keeps the original string without any modification if it passes the validation.

@JensHeinrich
Copy link

Wouldn't it be better to enhance the AnyUrl type by giving it it's own implementation of __add__ providing a feature like urllib's urljoin ?

@vgavro
Copy link

vgavro commented Jan 25, 2024

It's definitely a BUG, not "documentation issue".
Consider this example:

In [14]: url = AnyUrl.build(scheme='http', host='example.com', path='/')

In [15]: url = AnyUrl.build(scheme=url.scheme, host=url.host, path=url.path)

In [16]: url = AnyUrl.build(scheme=url.scheme, host=url.host, path=url.path)

In [17]: print(url)
http://example.com////

@samuelcolvin
Copy link
Member

Thanks everyone for commenting.

I'd be happy to accept a PR to allow an opt-in change of behaviour. While i know many people here think the current behaviour is a bug, there will be others who are relying on it, so we can't change the default behavior in a minor release.

Here's my proposal for what needs to change:

  • change this code to add another fiels to the PyUrl rust struct called something like omit_trailing_slash: bool, default false only applies when lib_url.path == "/"
  • Based on that flag, change the behaviour of __str__, __repr__, unicode_string and path to omit the trailing slash if set
  • Same changes for PyMultiHostUrl
  • change the url_schema() core schema to accept omit_trailing_slash
  • same for multi_host_url_schema
  • we'll then need a PR to pydantic to add a config flag url_omit_trailing_slash (default False) which changes how the core schema is built

If anyone's interested, feel free to submit a PR. Might be a nice start for anyone interested in getting their hands dirty with rust as (AFAIK) it shouldn't be a too complicated change, but also not trivial.

@samuelcolvin samuelcolvin added help wanted Pull Request welcome and removed documentation labels Jan 26, 2024
@tworedz
Copy link

tworedz commented Jan 26, 2024

If no one is against it, I want to try!

@samuelcolvin
Copy link
Member

@tworedz, go for it.

Wouldn't it be better to enhance the AnyUrl type by giving it it's own implementation of __add__ providing a feature like urllib's urljoin ?

@JensHeinrich, interesting idea - PR welcome.

@vgavro
Copy link

vgavro commented Jan 26, 2024

@samuelcolvin that's the problem with this ticket - it puts the problem wrong.

The problem is not with trailing slashes, it's about leading path slash.

/api and /api/ are different paths, pydantic handles them correctly, flag name omit_trailing_slash would be confusing!

There are TWO problems:

  1. Behavior with normalizing urls - adding slash to path if path is empty. From HTTP protocol point of view it makes perfect sense, (because you can't send a request to "" path) - but for other protocols may not. So option like "ensure_absolute_path" is possible, enabled by default for compatibility, and may be overridden, but I believe this is sane default anyway.

  2. BUG with Url.build (why I created separate Url.build adds additional leading "/" to path #8640 for that) - that adding LEADING slashes no matter if there WAS another leading slash already. Strictly saying, it doesn't relate to normalizing urls problem.

@davidhewitt
Copy link
Contributor

For what it's worth, the outcome of pydantic/pydantic-core#1173 was that we think it's best to get the following upstream feature added for us to do Url.build properly: servo/rust-url#835

Based on the upvotes on that issue, anyone who is interested in taking that on would make a lot of people happy!

@kfreezen
Copy link

I keep running into this issue in a project I work on. The real problem isn't omission of slashes, it's the fact that hostnames get slashes added to them even if the user or developer did not specify them.

So

http://localhost:3000 -> http://localhost:3000/

But

http://localhost:3000/api -> http://localhost:3000/api

I feel like I've been on a discussion about this somewhere already, but I keep running into this issue. If we had a way to make it so that we could use the type validation features but not add a slash to hostname (like it was in v1) that would be fantastic.

@samuelcolvin would omit_trailing_slash: bool then omit slashes in all cases, even if user specifies them? I don't think it should, because then you have the same problem as before, except now you're just omitting the slash instead of adding it.

@shawnwall
Copy link
Contributor

also having this issue with CORS and need to work around

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Pull Request welcome urls Related to `Url` types
Projects
None yet
Development

No branches or pull requests