perf: use filename field instead of parsing filename from URL by radoering · Pull Request #10895 · python-poetry/poetry

radoering · 2026-05-15T07:38:41Z

This makes locking 15 % faster in some cases.

Pull Request Check List

Requires: python-poetry/poetry-core#941

Added tests for changed code.
Updated documentation for changed code.

sourcery-ai

Hey - I've found 3 issues, and left some high level feedback:

Using file["filename"] directly in _link_cache will raise a KeyError for existing JSON sources that don’t provide filename; consider falling back to parsing the URL when the field is absent to maintain backward compatibility.
In test_attributes, the second file has url ending with .tar.gz but filename set to .whl; align these values or add an explicit comment if the mismatch is intentional to avoid confusion.
The temporary poetry-core @ git+... dependency in pyproject.toml should be reverted to a released version (or made conditional) before merging, once the upstream change is available.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Using `file["filename"]` directly in `_link_cache` will raise a `KeyError` for existing JSON sources that don’t provide `filename`; consider falling back to parsing the URL when the field is absent to maintain backward compatibility.
- In `test_attributes`, the second file has `url` ending with `.tar.gz` but `filename` set to `.whl`; align these values or add an explicit comment if the mismatch is intentional to avoid confusion.
- The temporary `poetry-core @ git+...` dependency in `pyproject.toml` should be reverted to a released version (or made conditional) before merging, once the upstream change is available.

## Individual Comments

### Comment 1
<location path="tests/repositories/link_sources/test_json.py" line_range="36" />
<code_context>
             # all (with non-default values)
             {
                 "url": "https://example.org/demo-0.1.tar.gz",
+                "filename": "demo-0.1.whl",
                 "requires-python": ">=3.6",
                 "yanked": True,
</code_context>
<issue_to_address>
**issue (testing):** Filename in the "all (with non-default values)" test case does not match the URL extension

In `test_attributes`, this entry uses URL `demo-0.1.tar.gz` but sets `filename` to `demo-0.1.whl`. That mismatch could mask differences in how `.tar.gz` vs `.whl` are handled. Please set `"filename": "demo-0.1.tar.gz"` so the test reflects a realistic, consistent case.
</issue_to_address>

### Comment 2
<location path="tests/repositories/link_sources/test_json.py" line_range="202" />
<code_context>
 )
 def test_base_url(url: str, repo_url: str, expected: str) -> None:
-    page = SimpleJsonPage(repo_url, {"files": [{"url": url}]})
+    page = SimpleJsonPage(
+        repo_url, {"files": [{"url": url, "filename": url.rpartition("/")[-1]}]}
+    )
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where `filename` differs from the URL basename (e.g. query string) to prove the new behavior

Right now `test_base_url` still has `filename` equal to the URL basename, so it doesn’t exercise the new behavior. Please add a parametrized case where the URL includes extra components (e.g. a query string) but `filename` is a clean value like `demo-0.1.whl`, and assert that `link.filename` (or equivalent) matches `filename` and is not affected by the query string to demonstrate the correctness gain from this change.

Suggested implementation:

```python
def test_base_url(url: str, repo_url: str, expected: str) -> None:
    filename = url.rpartition("/")[-1]
    page = SimpleJsonPage(
        repo_url,
        {"files": [{"url": url, "filename": filename}]},
    )
    link = next(iter(page.links))
    assert link.url == expected
    assert getattr(link, "filename", None) == filename


def test_base_url_filename_with_query_string() -> None:
    url = "https://example.org/packages/demo-0.1.whl?download=1"
    repo_url = "https://example.org/simple/"
    filename = "demo-0.1.whl"

    page = SimpleJsonPage(
        repo_url,
        {"files": [{"url": url, "filename": filename}]},
    )
    link = next(iter(page.links))

    # The link's filename must come from the JSON `filename` field and not include the query string.
    assert getattr(link, "filename", None) == filename

```

The comment asked for a "parametrized case". If the surrounding code uses `@pytest.mark.parametrize("url, repo_url, expected", [...])` for `test_base_url`, you may want to:
1. Extend that parametrize list to include a new tuple where the URL has a query string but `expected` is whatever your `SimpleJsonPage` logic expects for `link.url`.
2. Optionally add `expected_filename` as a fourth parameter in the parametrize decorator and function signature, and then assert `link.filename == expected_filename` instead of deriving `filename` from the URL.
Adjust the new `test_base_url_filename_with_query_string` or merge it into the existing parametrized `test_base_url` depending on how strictly you want to follow the "single parametrized test" style in your test suite.
</issue_to_address>

### Comment 3
<location path="tests/repositories/link_sources/test_json.py" line_range="29-38" />
<code_context>
     content = {
         "files": [
             # minimal
</code_context>
<issue_to_address>
**suggestion (testing):** Assert that the `Link` object actually uses the provided `filename` field

In tests like `test_metadata` and `test_attributes`, you now pass a `filename` in the JSON but never assert on `link.filename`. Please add an assertion that `link.filename` matches the provided value (e.g. `"demo-0.1.whl"`) so the new code path is explicitly exercised and we know it’s not just falling back to URL parsing.

Suggested implementation:

```python
    assert link.url == "https://example.org/demo-0.1.whl"
    assert link.filename == "demo-0.1.whl"

```

```python
    assert link.url == "https://example.org/demo-0.1.tar.gz"
    assert link.filename == "demo-0.1.whl"

```

If the exact URL/assert strings differ in your codebase (e.g. additional parameters, different quoting, or attribute order), adjust the `SEARCH` fragments to match the actual `assert link.url == ...` lines in `test_metadata` and `test_attributes`. The key is to add `assert link.filename == "demo-0.1.whl"` immediately after the existing `link.url` assertions for the respective `Link` instances.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

This makes locking 15 % faster in some cases.

sourcery-ai Bot reviewed May 15, 2026

View reviewed changes

Comment thread tests/repositories/link_sources/test_json.py Outdated

Comment thread tests/repositories/link_sources/test_json.py

Comment thread tests/repositories/link_sources/test_json.py

perf: use filename field instead of parsing filename from URL

6c32445

This makes locking 15 % faster in some cases.

radoering force-pushed the perf/link-filename branch from b05dbef to 6c32445 Compare May 15, 2026 11:21

radoering marked this pull request as draft May 15, 2026 11:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: use filename field instead of parsing filename from URL#10895

perf: use filename field instead of parsing filename from URL#10895
radoering wants to merge 1 commit into
python-poetry:mainfrom
radoering:perf/link-filename

radoering commented May 15, 2026

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

radoering commented May 15, 2026

Pull Request Check List

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant