Validate the contents of identity centric metadata #8635
Comments
I recall issues like this coming up at least once, if not a few times, over in pypi-support. Someone would fork a repository, change the name in So I'm 👍 for some sort of blue verified checkmark or something from that perspective. With my publisher hat on, though, I would hope this would be completely automated and I wouldn't have to do anything special to earn that blue checkmark.
One idea: we could add a blue checkmark for all links in the sidebar that contain a link back to the project's PyPI page or That being said, it wouldn't help if they point to forked versions, but in that case, the GitHub star count might be a tell.
👍 Any progress on this issue? I've been looking at malware from PyPI and it is common for the Some related context is this HN discussion: https://news.ycombinator.com/item?id=33438678 Many commenters are asking about providing this sort of information. I see some considerations that need discussion:
Some validation is easier than others as well: email validation is pretty straightforward, but homepage validation would require something like the ACME protocol.
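To make the homepage case concrete, here is a minimal sketch of a challenge-token check in the spirit of ACME's HTTP-01 challenge. It is not the ACME protocol itself and not anything PyPI implements: the well-known path, the function names, and the idea that the fetched body contains only the token are all assumptions for illustration. Fetching the URL is left out so the logic stays self-contained.

```python
import hmac
import secrets

# Hypothetical path chosen for illustration. ACME's HTTP-01 challenge uses
# /.well-known/acme-challenge/<token> and a key-bound response instead.
WELL_KNOWN_PATH = "/.well-known/pypi-project-verification"


def make_challenge(domain: str) -> tuple[str, str]:
    """Return (token, url): the domain owner must serve the token at url."""
    token = secrets.token_urlsafe(32)
    return token, f"https://{domain}{WELL_KNOWN_PATH}"


def response_matches(served_body: str, token: str) -> bool:
    """Compare the fetched body with the issued token in constant time."""
    return hmac.compare_digest(served_body.strip().encode(), token.encode())
```

A project would be shown the token, publish it at the URL, and the homepage would count as verified only once the fetched body matches.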
Haha, rereading my 2-year-old comment above about blue check marks seems to resonate strangely in today's terms 😅 Who would have guessed...
My general thought here is that for metadata we can 'verify', we should probably elevate it in the UI over 'unverified' metadata. We can already validate email addresses that correspond to verified emails of maintainers. That won't include the ability to verify mailing-list-style emails, but that could potentially be added to organizations once that feature lands. With #12465, we'll be able to 'validate' the source repository as well, so any metadata that references the given upstream source repository can be considered verified too. I agree that domains/URLs will need to use the ACME protocol or something similar. I think there's probably a UX question on how these would be done per-project, if we wanted to go that route.
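A rough sketch of the two checks mentioned in that comment, assuming the set of verified maintainer emails and the verified repository URL are already known. The function names are hypothetical, and the normalization assumes a GitHub-style `host/owner/name` layout, which would not hold for hosts with nested groups.

```python
from urllib.parse import urlsplit


def normalize_repo(url: str) -> str:
    """Reduce a repository URL to 'host/owner/name' for comparison."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[:-4]
    # Keep only the owner and repository segments; drop /issues, /tree/..., etc.
    segments = [s.lower() for s in path.split("/") if s][:2]
    return "/".join([(parts.hostname or "").lower(), *segments])


def email_is_verified(email: str, verified_emails: set[str]) -> bool:
    """True if the metadata email matches a verified maintainer email."""
    return email.strip().lower() in {e.lower() for e in verified_emails}


def url_is_verified(url: str, verified_repo: str) -> bool:
    """True if url is the verified repository or a page beneath it."""
    return normalize_repo(url) == normalize_repo(verified_repo)
```

With this, `https://github.com/pypa/pip/issues` would be treated as verified against `https://github.com/pypa/pip`, while a fork under a different owner would not.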
Mastodon has a link verification system; something like that might be nice. It's never going to be foolproof, though.
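For reference, Mastodon's approach is that the linked page must contain a link back to the profile carrying `rel="me"`. A minimal sketch of the same idea applied to a project page follows, using only the standard library and working on HTML that has already been fetched. The class and function names are made up for illustration, and the exact-match URL comparison is a simplifying assumption.

```python
from html.parser import HTMLParser


class RelMeLinks(HTMLParser):
    """Collect href values of <a> and <link> tags whose rel includes 'me'."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "me" in rels and href:
            self.hrefs.add(href.rstrip("/"))


def links_back(page_html: str, project_url: str) -> bool:
    """True if the page has a rel="me" link pointing at project_url."""
    parser = RelMeLinks()
    parser.feed(page_html)
    return project_url.rstrip("/") in parser.hrefs
```

A plain link without `rel="me"` does not count, which is what keeps arbitrary pages that merely mention a project from verifying it.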
From attempting to perform identity-assurance checks on packages manually: bidirectional references can be a reassuring indicator. In context here: when a PyPI package points to a GitHub repository as its source code, that's interpretable as a useful but as-yet-untrusted statement. When up-to-date references are inspected within the contents of the cloned linked repository and they point back to the same original package on PyPI, confidence in the statement increases. For reproducible-build-compliant packages the situation improves further: any third party can confirm not only that the source origin and package destination are in concordance, but also whether the published artifact from the destination is bit-for-bit genuine, by comparing it to a from-scratch build of the corresponding raw origin source materials. This can be verified on both a historic and ongoing basis. So that's two orthogonal identity validation mechanisms:
These don't prevent an attacker from copying the source in its entirety and creating a duplicate under a different name with an internally consistent reference graph. Given widespread free communication, I think it's reasonable to expect that enough of the package consumer population will be (or become) aware of and gravitate towards the authentic package to solve that problem.
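The two checks described above (a back-reference from the repository to the PyPI package, and a bit-for-bit comparison against a rebuilt artifact) could be sketched roughly as below. The function names are hypothetical; the inputs are assumed to be text already read from the cloned repository and artifact bytes already downloaded or rebuilt. The name normalization follows the usual PyPI rule that case is ignored and runs of `-`, `_` and `.` are equivalent.

```python
import hashlib
import re


def normalize(name: str) -> str:
    """Normalize a project name the way PyPI compares names."""
    return re.sub(r"[-_.]+", "-", name.strip("._-")).lower()


def repo_points_back(repo_file_text: str, package_name: str) -> bool:
    """True if text from the repository links to the package's PyPI page."""
    found = re.findall(r"pypi\.org/project/([A-Za-z0-9._-]+)", repo_file_text)
    return any(normalize(n) == normalize(package_name) for n in found)


def same_artifact(published: bytes, rebuilt: bytes) -> bool:
    """True if the published artifact and a from-scratch rebuild are identical."""
    return hashlib.sha256(published).digest() == hashlib.sha256(rebuilt).digest()
```

As the comment notes, both checks can pass for a wholesale copy published under another name, so they raise confidence rather than prove authenticity.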
Following on to my previous comment, here's a mockup of what I'm imagining to separate the metadata we can verify today (source repository, maintainer email, GitHub statistics, Owner/Maintainers) from the unverifiable metadata:
Over time we can move things from below the fold to above it, but this should be a big improvement as-is for now. I pushed the diff for the mockup here. There's some hacky stuff in there just to get the mockup to look good, but it could be a good starting point.
I'm starting to work on this, beginning with creating the verified section and adding "Owner"/"Maintainers" to it :)
I wonder if it makes more sense to have verified details and then unverified details, or to have each category with a verified sub-section and a non-verified sub-section. It feels weird to break the project links apart from one another. When your eyes have reached the place where the repository is, it's not very clear that if the documentation isn't there, you have to look somewhere else entirely to find a different link section that might contain the link to the docs. I'd even argue that in this case, the whole thing would look more readable if the project doesn't use trusted publishers, which is What about something like this? (Not arguing it's better, just a suggestion for the discussion.) (Would @nlhkabu have an opinion on the matter?)
Currently if I'm looking at a project on PyPI, it can be difficult to determine if it's "real" or not. I can look and see the user names that are publishing the project as well as certain key pieces of metadata such as the project home page, the source repository, etc.
Unfortunately, there's no way to verify that a project that has, say,
https://github.com/pypa/pip
in its home page is actually the real pip, and isn't an imposter pip. The same could go for other URLs, email addresses, etc. Thus it would be useful if there was some way to actually prove ownership of those URLs/emails, and either differentiate them in the UI somehow, or hide them completely unless they've been proven to be owned by one of the publishing users.