Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype supporting the same package hash json API response that pypi provides #440

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

matteius
Copy link

@matteius matteius commented Aug 23, 2022

Note: This was fairly slow until I added a @functools.lru_cache(maxsize=1000) to the digest_file, then it became super fast even for the initial load.

Fixes #437

I see that the project has a configurable hashing algorithm, but its really the sha256 hash that is important to pipenv, and the md5 (project default) would not be sufficient.

Sample output from local testing of this prototype.

{
	"info": {
		"version": "2.19.1"
	},
	"releases": {
		"2.19.1": [{
			"url": "http://127.0.0.1:8080/packages/requests/requests-2.19.1-py2.py3-none-any.whl",
			"digests": {
				"sha256": "sha256:63b52e3c866428a224f97cab011de738c36aec0185aa91cfacd418b5d58911d1"
			}
		}, {
			"url": "http://127.0.0.1:8080/packages/requests/requests-2.19.1.tar.gz",
			"digests": {
				"sha256": "sha256:ec22d826a36ed72a7358ff3fe56cbd4ba69dd7a6718ffd450ff0e9df7a47ce6a"
			}
		}],
		"2.18.4": [{
			"url": "http://127.0.0.1:8080/packages/requests/requests-2.18.4-py2.py3-none-any.whl",
			"digests": {
				"sha256": "sha256:6a1b267aa90cac58ac3a765d067950e7dbbf75b1da07e895d1f594193a40a38b"
			}
		}, {
			"url": "http://127.0.0.1:8080/packages/requests/requests-2.18.4.tar.gz",
			"digests": {
				"sha256": "sha256:9c443e7324ba5b85070c4a818ade28bfabedf16ea10206da1132edaa6dda237e"
			}
		}],
		"2.14.0": [{
			"url": "http://127.0.0.1:8080/packages/requests/requests-2.14.0-py2.py3-none-any.whl",
			"digests": {
				"sha256": "sha256:a90555c0be723f5c711de36f256b21a65fc599602274fb3d5c4f83ac23aae3c5"
			}
		}, {
			"url": "http://127.0.0.1:8080/packages/requests/requests-2.14.0.tar.gz",
			"digests": {
				"sha256": "sha256:8c4f778459cb4a6bad7ceff4aa65a75697db28c21a6b41ea9a6c371df2a822c2"
			}
		}],
		"1.0.0": [{
			"url": "http://127.0.0.1:8080/packages/requests/requests-1.0.0.tar.gz",
			"digests": {
				"sha256": "sha256:f10d8fbcc02a58056ab44f79ff9b3f9fe78e410296527885250bbb36d15be8c6"
			}
		}]
	}
}

@dee-me-tree-or-love
Copy link
Member

Hey @matteius! Thanks a lot for this PR, this is really awesome! I will try to look closer into this asap, but first will get the CI issues fixed. I see the test-docker failed steps are caused by some external issue, so I'd like to get this out of the way. Besides, I'll try to give a more thorough review soon, really sorry for all the waiting and thanks for your time :)

@dee-me-tree-or-love
Copy link
Member

Hey @matteius! I've fixed the CI issue #439 in #444. It should now work well if you update your branch from master :) Thanks for your patience, I will get to the reviewing soon! ✌️

@matteius
Copy link
Author

matteius commented Sep 7, 2022

Thanks @dee-me-tree-or-love -- I have updated the branch from master. Willing to discuss or amend any parts of this PR as it makes sense. We merged in the changes to pipenv so pypiserver is our test runner's private pypiserver, so you'll probably be hearing more from me. Thanks for your help!

@dee-me-tree-or-love
Copy link
Member

Hey @matteius, thanks for your news, this is super nice to hear 😊 I'm very happy that pypiserver suits your needs! And I'm certainly looking forward to more collaborations. All ideas and input are very welcome :D 👍

I've been a bit busy last week with some other chores, but I will try to give a proper review to your PR here this week, thank you for being on the line! I'll keep you posted.

@matteius
Copy link
Author

We may be able to have this behavior driven off the preferred hash that is passed in on the CLI -- I think when I first took a stab at this I had no experience with the code base, so I didn't quite pickup on the fact that is configurable.

Copy link
Member

@dee-me-tree-or-love dee-me-tree-or-love left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @matteius! I'm terribly sorry for taking such a while for the review, but I got my hands on it at last! It's looking great and I agree with your comments here about the configurable hash function. I really like how the current prototype is written, so I just left some tiny comments with some cleanups and suggestions. Most importantly, do you think it feasible to adjust it so that it will use the configured hash and then we can wait for #459 so that no changes on the client side are necessary for all the new server boots? :) In any case, one last tiny request, if you could write a couple of tests for this new feature, I'd be super grateful 😊 🙏 Let me know what you think and sorry for taking ages once again! :D Thanks for the proposal, if I can be of any help further with this, do let me know ✌️

@@ -373,6 +373,8 @@ def server_static(filename):


@app.route("/:project/json")
@app.route("/pypi/:project/json")
@app.route("/simple/:project/json")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matteius, oh, right! could you elaborate a little on why these routes were added? :) I'm not sure I recognize it fully, in case this is not strictly critical for this feature, do you mind if we add the routes in a separate PR? 🤔

@@ -389,12 +391,29 @@ def json_info(project):
if not packages:
raise HTTPError(404, f"package {project} not found")

package_links = defaultdict(list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool! I really like the use of defaultdict 👍

package_links = defaultdict(list)
for pkg in packages:
package_links[pkg.version].append(pkg.relfn_unix)
# links = [pkg.relfn_unix for pkg in packages]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a little cleanup of unused things 😊

Suggested change
# links = [pkg.relfn_unix for pkg in packages]

for x in packages:
releases[x.version] = [
{"url": urljoin(req_url, "../../packages/" + x.relfn)}
for package in packages:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for renaming this to package, imho, I like it also a bit better this way 👍

Comment on lines +403 to +406
matching_links = []
for version, links in package_links.items():
if version == package.version:
matching_links += links
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sure a nit, but what do you think about doing it this way? 😀

Suggested change
matching_links = []
for version, links in package_links.items():
if version == package.version:
matching_links += links
matching_links = [links for version, links in package_links.items() if version == package.version]

)
},
}
for link in matching_links
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, just a wild thought got onto me here, do you think it would be possible to maintain packages and package_links as a single data structure? 🤔

thinking of something along those lines:

    packages = sorted(
        ...
    )

    # ...

    packages_with_links = [dict(package=pkg, version=pkg.version, link=pkg.relfn_unix) for pkg in packages]

    # ... and then working with `packages_with_links` in place of `matching_links` and `packages` nestedly?

But it's just a rough fantasy at this point, it sure needs working out better. What do you think?

{
"url": urljoin(req_url, "../../packages/" + link),
"digests": {
"sha256": config.backend.digest_sha256(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here indeed, I thought it would be awesome to use the "configured" hash mechanism instead :) I think in relation to #459 upcoming in the next major release, I hope that this will make things work out of the box. What do you think? Would this be possible to experiment with?

def digest_sha256(
self, pkg: PkgFile, file_name: str = None
) -> t.Optional[str]:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking as you mention too, if we could reuse the configured hash and then rely on the changes with #459 in the future, this would probably be not necessary. Do you think I figure this out correctly? :)

@dee-me-tree-or-love dee-me-tree-or-love added the features.endpoints Focusing on the endpoints / server functionality label Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
features.endpoints Focusing on the endpoints / server functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support pypi API for fetching package hashes/digests
2 participants