Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix URL encoding issue in HttpDownloader to handle special characters #5809

Merged
merged 1 commit into from
Sep 20, 2024

Conversation

hstct
Copy link
Contributor

@hstct hstct commented Sep 17, 2024

closes #5686

@lubosmj
Copy link
Member

lubosmj commented Sep 18, 2024

Please, write a changelog message named after the issue number, like so: CHANGES/5686.bugfix.

https://pulpproject.org/pulpcore/docs/dev/tutorials/quickstart/?h=changelog#3-update-the-changelog-entry

@hstct
Copy link
Contributor Author

hstct commented Sep 18, 2024

Please, write a changelog message named after the issue number, like so: CHANGES/5686.bugfix.

https://pulpproject.org/pulpcore/docs/dev/tutorials/quickstart/?h=changelog#3-update-the-changelog-entry

Ah yea, sorry it was late yesterday xD

@hstct
Copy link
Contributor Author

hstct commented Sep 18, 2024

For additional context:

The issue we faced wasn't simply that the filename contained %3a, which should be decoded to :, but rather that the decoding process shouldn't have occurred in the first place. In reality, %3A is the encoded version of :, but only the % symbol was being encoded by the web server, resulting in the string becoming %253a. This allowed the file to be found despite the unexpected encoding.

@hstct hstct force-pushed the encode_url_path_before_download branch 2 times, most recently from 1ac61f8 to 0a433ce Compare September 19, 2024 11:21
@hstct
Copy link
Contributor Author

hstct commented Sep 19, 2024

Added some tests for this as well.

@hstct hstct force-pushed the encode_url_path_before_download branch from 0a433ce to e9f80d4 Compare September 19, 2024 13:10
@hstct
Copy link
Contributor Author

hstct commented Sep 19, 2024

idk why the test is failing on github 😢

@git-hyagi
Copy link
Contributor

hi @hstct

I was checking this error, and I think it might be related to the way aiohttp server is parsing the request (but it is just a guess, I could not make it work yet):

def gen_fixture_server(gen_threaded_aiohttp_server):
def _gen_fixture_server(fixtures_root, ssl_ctx):
app = web.Application()
call_record = add_recording_route(app, fixtures_root)
return gen_threaded_aiohttp_server(app, ssl_ctx, call_record)
yield _gen_fixture_server

I will do more tests and let you know if I find something.

@git-hyagi
Copy link
Contributor

I managed to make the test work by modifying the pulpcore/pytest_plugin.py file with:

@pytest.fixture
def gen_fixture_server(gen_threaded_aiohttp_server):
    def _gen_fixture_server(fixtures_root, ssl_ctx):
        app = web.Application(middlewares=[_aiohttp_request_middleware])
        call_record = add_recording_route(app, fixtures_root)
        return gen_threaded_aiohttp_server(app, ssl_ctx, call_record)

    yield _gen_fixture_server


@web.middleware
async def _aiohttp_request_middleware(request,handler):
    unquoted_url = urllib.parse.unquote(request.url.human_repr())
    url = urllib.parse.urlparse(unquoted_url)
    response = await handler(request.clone(rel_url=url.path))
    return response

but I don't know the impact of this change in the other tests, or if this could be even considered a viable "solution".

@hstct
Copy link
Contributor Author

hstct commented Sep 20, 2024

hi @git-hyagi thanks for the help!

As far as I can tell and this has been the whole problem with this issue in general is that it always is dependent on how the web server is serving the files, whether or not it will encode %3a to %253a or read it as an encoded :. So your approach really did help out here. I've been running tests for pulp_rpm, pulpcore, and pulp_deb locally and they seem to work fine. However I doubt that there are many tests that purposefully use problematic filenames. And that is what it is in the end, files having problematic names with special characters that get encoded when served online. So this is a workaround to make it more robust but I also can't tell if it covers all edge cases.

@hstct hstct force-pushed the encode_url_path_before_download branch from e9f80d4 to 35eaf4a Compare September 20, 2024 06:30
@hstct hstct force-pushed the encode_url_path_before_download branch from 35eaf4a to bd062e7 Compare September 20, 2024 06:31
Copy link
Contributor

@git-hyagi git-hyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution!
LGTM

Copy link
Contributor

@gerrod3 gerrod3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice testing!

@gerrod3 gerrod3 merged commit d32580d into pulp:main Sep 20, 2024
12 checks passed
@quba42 quba42 deleted the encode_url_path_before_download branch September 23, 2024 08:32
@quba42
Copy link
Contributor

quba42 commented Sep 23, 2024

I am afraid we may have shot ourselves in the foot here. Our internal integration tests revealed that while this fixes what we set out to fix, it broke the far more important use case of "download URL's with special characters in them".

I have just verified this using latest pulpcore main branch in oci-env. I tried syncing a fairly standard APT repo, which failed because Pulp tried to download adduser_3.118%252Bdeb11u1_all.deb instead of adduser_3.118+deb11u1_all.deb.

I suggest we revert this PR for now, until we can figure out a better approach. Unfortunately I fear there may be no good way to make the down-loader robust for both use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pulp_file fails to sync file repos literally containing URL encoding strings like %3a in a file name
5 participants