New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype supporting the same package hash json API response that pypi provides #440
base: master
Are you sure you want to change the base?
Conversation
Hey @matteius! Thanks a lot for this PR, this is really awesome! I will try to look closer into this asap, but first will get the CI issues fixed. I see the |
Thanks @dee-me-tree-or-love -- I have updated the branch from master. Willing to discuss or amend any parts of this PR as it makes sense. We merged in the changes to pipenv so pypiserver is our test runner's private pypiserver, so you'll probably be hearing more from me. Thanks for your help! |
Hey @matteius, thanks for your news, this is super nice to hear 😊 I'm very happy that I've been a bit busy last week with some other chores, but I will try to give a proper review to your PR here this week, thank you for being on the line! I'll keep you posted. |
We may be able to have this behavior driven off the preferred hash that is passed in on the CLI -- I think when I first took a stab at this I had no experience with the code base, so I didn't quite pickup on the fact that is configurable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @matteius! I'm terribly sorry for taking such a while for the review, but I got my hands on it at last! It's looking great and I agree with your comments here about the configurable hash function. I really like how the current prototype is written, so I just left some tiny comments with some cleanups and suggestions. Most importantly, do you think it feasible to adjust it so that it will use the configured hash and then we can wait for #459 so that no changes on the client side are necessary for all the new server boots? :) In any case, one last tiny request, if you could write a couple of tests for this new feature, I'd be super grateful 😊 🙏 Let me know what you think and sorry for taking ages once again! :D Thanks for the proposal, if I can be of any help further with this, do let me know ✌️
@@ -373,6 +373,8 @@ def server_static(filename): | |||
|
|||
|
|||
@app.route("/:project/json") | |||
@app.route("/pypi/:project/json") | |||
@app.route("/simple/:project/json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matteius, oh, right! could you elaborate a little on why these routes were added? :) I'm not sure I recognize it fully, in case this is not strictly critical for this feature, do you mind if we add the routes in a separate PR? 🤔
@@ -389,12 +391,29 @@ def json_info(project): | |||
if not packages: | |||
raise HTTPError(404, f"package {project} not found") | |||
|
|||
package_links = defaultdict(list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool! I really like the use of defaultdict
👍
package_links = defaultdict(list) | ||
for pkg in packages: | ||
package_links[pkg.version].append(pkg.relfn_unix) | ||
# links = [pkg.relfn_unix for pkg in packages] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a little cleanup of unused things 😊
# links = [pkg.relfn_unix for pkg in packages] |
for x in packages: | ||
releases[x.version] = [ | ||
{"url": urljoin(req_url, "../../packages/" + x.relfn)} | ||
for package in packages: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for renaming this to package
, imho, I like it also a bit better this way 👍
matching_links = [] | ||
for version, links in package_links.items(): | ||
if version == package.version: | ||
matching_links += links |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's sure a nit, but what do you think about doing it this way? 😀
matching_links = [] | |
for version, links in package_links.items(): | |
if version == package.version: | |
matching_links += links | |
matching_links = [links for version, links in package_links.items() if version == package.version] |
) | ||
}, | ||
} | ||
for link in matching_links |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, just a wild thought got onto me here, do you think it would be possible to maintain packages
and package_links
as a single data structure? 🤔
thinking of something along those lines:
packages = sorted(
...
)
# ...
packages_with_links = [dict(package=pkg, version=pkg.version, link=pkg.relfn_unix) for pkg in packages]
# ... and then working with `packages_with_links` in place of `matching_links` and `packages` nestedly?
But it's just a rough fantasy at this point, it sure needs working out better. What do you think?
{ | ||
"url": urljoin(req_url, "../../packages/" + link), | ||
"digests": { | ||
"sha256": config.backend.digest_sha256( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here indeed, I thought it would be awesome to use the "configured" hash mechanism instead :) I think in relation to #459 upcoming in the next major release, I hope that this will make things work out of the box. What do you think? Would this be possible to experiment with?
def digest_sha256( | ||
self, pkg: PkgFile, file_name: str = None | ||
) -> t.Optional[str]: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking as you mention too, if we could reuse the configured hash and then rely on the changes with #459 in the future, this would probably be not necessary. Do you think I figure this out correctly? :)
Note: This was fairly slow until I added a
@functools.lru_cache(maxsize=1000)
to thedigest_file
, then it became super fast even for the initial load.Fixes #437
I see that the project has a configurable hashing algorithm, but its really the sha256 hash that is important to pipenv, and the md5 (project default) would not be sufficient.
Sample output from local testing of this prototype.