Skip to content

perf: use filename field instead of parsing filename from URL#10895

Draft
radoering wants to merge 1 commit into
python-poetry:mainfrom
radoering:perf/link-filename
Draft

perf: use filename field instead of parsing filename from URL#10895
radoering wants to merge 1 commit into
python-poetry:mainfrom
radoering:perf/link-filename

Conversation

@radoering
Copy link
Copy Markdown
Member

This makes locking 15 % faster in some cases.

Pull Request Check List

Requires: python-poetry/poetry-core#941

  • Added tests for changed code.
  • Updated documentation for changed code.

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • Using file["filename"] directly in _link_cache will raise a KeyError for existing JSON sources that don’t provide filename; consider falling back to parsing the URL when the field is absent to maintain backward compatibility.
  • In test_attributes, the second file has url ending with .tar.gz but filename set to .whl; align these values or add an explicit comment if the mismatch is intentional to avoid confusion.
  • The temporary poetry-core @ git+... dependency in pyproject.toml should be reverted to a released version (or made conditional) before merging, once the upstream change is available.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Using `file["filename"]` directly in `_link_cache` will raise a `KeyError` for existing JSON sources that don’t provide `filename`; consider falling back to parsing the URL when the field is absent to maintain backward compatibility.
- In `test_attributes`, the second file has `url` ending with `.tar.gz` but `filename` set to `.whl`; align these values or add an explicit comment if the mismatch is intentional to avoid confusion.
- The temporary `poetry-core @ git+...` dependency in `pyproject.toml` should be reverted to a released version (or made conditional) before merging, once the upstream change is available.

## Individual Comments

### Comment 1
<location path="tests/repositories/link_sources/test_json.py" line_range="36" />
<code_context>
             # all (with non-default values)
             {
                 "url": "https://example.org/demo-0.1.tar.gz",
+                "filename": "demo-0.1.whl",
                 "requires-python": ">=3.6",
                 "yanked": True,
</code_context>
<issue_to_address>
**issue (testing):** Filename in the "all (with non-default values)" test case does not match the URL extension

In `test_attributes`, this entry uses URL `demo-0.1.tar.gz` but sets `filename` to `demo-0.1.whl`. That mismatch could mask differences in how `.tar.gz` vs `.whl` are handled. Please set `"filename": "demo-0.1.tar.gz"` so the test reflects a realistic, consistent case.
</issue_to_address>

### Comment 2
<location path="tests/repositories/link_sources/test_json.py" line_range="202" />
<code_context>
 )
 def test_base_url(url: str, repo_url: str, expected: str) -> None:
-    page = SimpleJsonPage(repo_url, {"files": [{"url": url}]})
+    page = SimpleJsonPage(
+        repo_url, {"files": [{"url": url, "filename": url.rpartition("/")[-1]}]}
+    )
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where `filename` differs from the URL basename (e.g. query string) to prove the new behavior

Right now `test_base_url` still has `filename` equal to the URL basename, so it doesn’t exercise the new behavior. Please add a parametrized case where the URL includes extra components (e.g. a query string) but `filename` is a clean value like `demo-0.1.whl`, and assert that `link.filename` (or equivalent) matches `filename` and is not affected by the query string to demonstrate the correctness gain from this change.

Suggested implementation:

```python
def test_base_url(url: str, repo_url: str, expected: str) -> None:
    filename = url.rpartition("/")[-1]
    page = SimpleJsonPage(
        repo_url,
        {"files": [{"url": url, "filename": filename}]},
    )
    link = next(iter(page.links))
    assert link.url == expected
    assert getattr(link, "filename", None) == filename


def test_base_url_filename_with_query_string() -> None:
    url = "https://example.org/packages/demo-0.1.whl?download=1"
    repo_url = "https://example.org/simple/"
    filename = "demo-0.1.whl"

    page = SimpleJsonPage(
        repo_url,
        {"files": [{"url": url, "filename": filename}]},
    )
    link = next(iter(page.links))

    # The link's filename must come from the JSON `filename` field and not include the query string.
    assert getattr(link, "filename", None) == filename

```

The comment asked for a "parametrized case". If the surrounding code uses `@pytest.mark.parametrize("url, repo_url, expected", [...])` for `test_base_url`, you may want to:
1. Extend that parametrize list to include a new tuple where the URL has a query string but `expected` is whatever your `SimpleJsonPage` logic expects for `link.url`.
2. Optionally add `expected_filename` as a fourth parameter in the parametrize decorator and function signature, and then assert `link.filename == expected_filename` instead of deriving `filename` from the URL.
Adjust the new `test_base_url_filename_with_query_string` or merge it into the existing parametrized `test_base_url` depending on how strictly you want to follow the "single parametrized test" style in your test suite.
</issue_to_address>

### Comment 3
<location path="tests/repositories/link_sources/test_json.py" line_range="29-38" />
<code_context>
     content = {
         "files": [
             # minimal
</code_context>
<issue_to_address>
**suggestion (testing):** Assert that the `Link` object actually uses the provided `filename` field

In tests like `test_metadata` and `test_attributes`, you now pass a `filename` in the JSON but never assert on `link.filename`. Please add an assertion that `link.filename` matches the provided value (e.g. `"demo-0.1.whl"`) so the new code path is explicitly exercised and we know it’s not just falling back to URL parsing.

Suggested implementation:

```python
    assert link.url == "https://example.org/demo-0.1.whl"
    assert link.filename == "demo-0.1.whl"

```

```python
    assert link.url == "https://example.org/demo-0.1.tar.gz"
    assert link.filename == "demo-0.1.whl"

```

If the exact URL/assert strings differ in your codebase (e.g. additional parameters, different quoting, or attribute order), adjust the `SEARCH` fragments to match the actual `assert link.url == ...` lines in `test_metadata` and `test_attributes`. The key is to add `assert link.filename == "demo-0.1.whl"` immediately after the existing `link.url` assertions for the respective `Link` instances.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread tests/repositories/link_sources/test_json.py Outdated
Comment thread tests/repositories/link_sources/test_json.py
Comment thread tests/repositories/link_sources/test_json.py
This makes locking 15 % faster in some cases.
@radoering radoering force-pushed the perf/link-filename branch from b05dbef to 6c32445 Compare May 15, 2026 11:21
@radoering radoering marked this pull request as draft May 15, 2026 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant