perf: use filename field instead of parsing filename from URL#10895
Draft
radoering wants to merge 1 commit into
Draft
perf: use filename field instead of parsing filename from URL#10895radoering wants to merge 1 commit into
radoering wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- Using
file["filename"]directly in_link_cachewill raise aKeyErrorfor existing JSON sources that don’t providefilename; consider falling back to parsing the URL when the field is absent to maintain backward compatibility. - In
test_attributes, the second file hasurlending with.tar.gzbutfilenameset to.whl; align these values or add an explicit comment if the mismatch is intentional to avoid confusion. - The temporary
poetry-core @ git+...dependency inpyproject.tomlshould be reverted to a released version (or made conditional) before merging, once the upstream change is available.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Using `file["filename"]` directly in `_link_cache` will raise a `KeyError` for existing JSON sources that don’t provide `filename`; consider falling back to parsing the URL when the field is absent to maintain backward compatibility.
- In `test_attributes`, the second file has `url` ending with `.tar.gz` but `filename` set to `.whl`; align these values or add an explicit comment if the mismatch is intentional to avoid confusion.
- The temporary `poetry-core @ git+...` dependency in `pyproject.toml` should be reverted to a released version (or made conditional) before merging, once the upstream change is available.
## Individual Comments
### Comment 1
<location path="tests/repositories/link_sources/test_json.py" line_range="36" />
<code_context>
# all (with non-default values)
{
"url": "https://example.org/demo-0.1.tar.gz",
+ "filename": "demo-0.1.whl",
"requires-python": ">=3.6",
"yanked": True,
</code_context>
<issue_to_address>
**issue (testing):** Filename in the "all (with non-default values)" test case does not match the URL extension
In `test_attributes`, this entry uses URL `demo-0.1.tar.gz` but sets `filename` to `demo-0.1.whl`. That mismatch could mask differences in how `.tar.gz` vs `.whl` are handled. Please set `"filename": "demo-0.1.tar.gz"` so the test reflects a realistic, consistent case.
</issue_to_address>
### Comment 2
<location path="tests/repositories/link_sources/test_json.py" line_range="202" />
<code_context>
)
def test_base_url(url: str, repo_url: str, expected: str) -> None:
- page = SimpleJsonPage(repo_url, {"files": [{"url": url}]})
+ page = SimpleJsonPage(
+ repo_url, {"files": [{"url": url, "filename": url.rpartition("/")[-1]}]}
+ )
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where `filename` differs from the URL basename (e.g. query string) to prove the new behavior
Right now `test_base_url` still has `filename` equal to the URL basename, so it doesn’t exercise the new behavior. Please add a parametrized case where the URL includes extra components (e.g. a query string) but `filename` is a clean value like `demo-0.1.whl`, and assert that `link.filename` (or equivalent) matches `filename` and is not affected by the query string to demonstrate the correctness gain from this change.
Suggested implementation:
```python
def test_base_url(url: str, repo_url: str, expected: str) -> None:
filename = url.rpartition("/")[-1]
page = SimpleJsonPage(
repo_url,
{"files": [{"url": url, "filename": filename}]},
)
link = next(iter(page.links))
assert link.url == expected
assert getattr(link, "filename", None) == filename
def test_base_url_filename_with_query_string() -> None:
url = "https://example.org/packages/demo-0.1.whl?download=1"
repo_url = "https://example.org/simple/"
filename = "demo-0.1.whl"
page = SimpleJsonPage(
repo_url,
{"files": [{"url": url, "filename": filename}]},
)
link = next(iter(page.links))
# The link's filename must come from the JSON `filename` field and not include the query string.
assert getattr(link, "filename", None) == filename
```
The comment asked for a "parametrized case". If the surrounding code uses `@pytest.mark.parametrize("url, repo_url, expected", [...])` for `test_base_url`, you may want to:
1. Extend that parametrize list to include a new tuple where the URL has a query string but `expected` is whatever your `SimpleJsonPage` logic expects for `link.url`.
2. Optionally add `expected_filename` as a fourth parameter in the parametrize decorator and function signature, and then assert `link.filename == expected_filename` instead of deriving `filename` from the URL.
Adjust the new `test_base_url_filename_with_query_string` or merge it into the existing parametrized `test_base_url` depending on how strictly you want to follow the "single parametrized test" style in your test suite.
</issue_to_address>
### Comment 3
<location path="tests/repositories/link_sources/test_json.py" line_range="29-38" />
<code_context>
content = {
"files": [
# minimal
</code_context>
<issue_to_address>
**suggestion (testing):** Assert that the `Link` object actually uses the provided `filename` field
In tests like `test_metadata` and `test_attributes`, you now pass a `filename` in the JSON but never assert on `link.filename`. Please add an assertion that `link.filename` matches the provided value (e.g. `"demo-0.1.whl"`) so the new code path is explicitly exercised and we know it’s not just falling back to URL parsing.
Suggested implementation:
```python
assert link.url == "https://example.org/demo-0.1.whl"
assert link.filename == "demo-0.1.whl"
```
```python
assert link.url == "https://example.org/demo-0.1.tar.gz"
assert link.filename == "demo-0.1.whl"
```
If the exact URL/assert strings differ in your codebase (e.g. additional parameters, different quoting, or attribute order), adjust the `SEARCH` fragments to match the actual `assert link.url == ...` lines in `test_metadata` and `test_attributes`. The key is to add `assert link.filename == "demo-0.1.whl"` immediately after the existing `link.url` assertions for the respective `Link` instances.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This makes locking 15 % faster in some cases.
b05dbef to
6c32445
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This makes locking 15 % faster in some cases.
Pull Request Check List
Requires: python-poetry/poetry-core#941