In pypi, it is impossible to reupload a removed file. #74

Open
Natim opened this issue Sep 4, 2015 · 93 comments

@Natim Natim commented Sep 4, 2015

HTTPError: 400 Client Error: This filename has previously been used, you should use a different version.
Author

@Natim Natim commented Sep 4, 2015

Also the previous version has been removed and is impossible to find.


@daenney daenney commented Sep 4, 2015

It's probably still available in the Fastly caches, which is why you need to use a new filename. The old filename will have been marked to be cached indefinitely, so even if you could upload a file with the same name, anyone who had already fetched the old version would never get the new one.

Author

@Natim Natim commented Sep 4, 2015

In my case it isn't a problem because it is the exact same file.

Contributor

@hickford hickford commented Sep 4, 2015

See Donald's email at http://comments.gmane.org/gmane.comp.python.distutils.devel/22739

I've pushed changes to PyPI where it is no longer possible to reuse a filename and attempting to do it will give a 400 error "This filename has previously been used, you should use a different version."

Contributor

@hickford hickford commented Sep 4, 2015

Npm did the same in 2014. See http://blog.npmjs.org/post/77758351673/no-more-npm-publish-f

While it is annoying to have to bump the version number for typos or documentation changes, I believe in the long run, the benefits of greater reliability and data integrity are well worth it.

I presume the justification is the same for PyPI. It's an FAQ, so should probably go in documentation somewhere.

Author

@Natim Natim commented Sep 4, 2015

Then we shouldn't allow people to remove their files if they cannot put them back.

Author

@Natim Natim commented Sep 4, 2015

I think we should allow re-uploading the same removed file.


@tylerdave tylerdave commented Sep 4, 2015

There are very good reasons for the current behavior. Authors should be able to delete for any number of reasons (legal, security, etc.). Users of the package should be able to rely on getting the exact same thing every time they install a package of a specific version.

If you delete a package that someone relies on, they know the version is gone and they need to make a change to fix it. If you could delete a package and replace it with something different but with the same version, it can break their program in any number of subtle ways and it would be very hard to determine the cause of the problem.

Allowing this would break the entire version number contract. You may have what seems to be a good reason to replace a version but allowing it is not worth making versions unreliable.

Contributor

@hickford hickford commented Sep 4, 2015

Absolutely.

Author

@Natim Natim commented Sep 4, 2015

If you delete a package that someone relies on

You broke their package and you cannot put it back.

Author

@Natim Natim commented Sep 4, 2015

If you could delete a package and replace it with something different but with the same version, it can break their program

That's not what I am asking for.

I am asking for putting back the package I removed.

Author

@Natim Natim commented Sep 4, 2015

Allowing this would break the entire version number contract.

Putting back the version you removed doesn't break any contract. Besides, you already have the previous package hash, so you can check that the version didn't change and that you are really re-uploading the file that you removed.


@tylerdave tylerdave commented Sep 4, 2015

There I agree. If it can be ensured via the hash that only the exact same package is uploaded to the same version then I don't see this being a problem in concept.
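
A minimal sketch of that hash check, assuming the index remembers the SHA-256 digest first uploaded under every filename. The `seen` mapping and the `upload_allowed` helper are hypothetical names for illustration, not part of PyPI:

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(content).hexdigest()


def upload_allowed(filename: str, content: bytes, seen: dict) -> bool:
    """Accept a filename that is new, or a byte-identical re-upload.

    `seen` (hypothetical) maps every filename ever used to the digest
    that was first uploaded under that name, including removed files.
    """
    previous = seen.get(filename)
    if previous is None:
        return True  # the filename has never been used
    return previous == sha256_hex(content)


# The digest survives the removal of the file itself.
seen = {"example-1.0.tar.gz": sha256_hex(b"original bytes")}
print(upload_allowed("example-1.0.tar.gz", b"original bytes", seen))
print(upload_allowed("example-1.0.tar.gz", b"different bytes", seen))
```

Only the byte-identical file passes, so the "same version, different content" case stays blocked.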

Contributor

@hickford hickford commented Sep 4, 2015

So long as the documentation and confirmation make it clear that unpublishing is permanent, then I think it's reasonable and prudent.

It is generally considered bad behavior to remove versions of a library that others are depending on! Even if a package version is unpublished, that specific name and version combination can never be reused. In order to publish the package again, a new version number must be used.

https://docs.npmjs.com/cli/unpublish

Contributor

@hickford hickford commented Sep 4, 2015

To prevent malicious abuse, perhaps the policy should be strengthened to 'no uploads to old versions' #75

Member

@dstufft dstufft commented Sep 4, 2015

Unless you still have the physical file lying around, it's unlikely you're going to have something that matches the same hash. The setup.py sdist command does not have deterministic output: each time you run it, the resulting file differs even if the code hasn't changed. This also means you can't use setup.py to upload the file, since setup.py will only let you upload a file that it has created in the currently executing command, not an already created file. That doesn't make it impossible to upload a file with the same hash, but it makes it tricky which suggests it's a bad UX to expect authors to have to navigate.
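
To illustrate why a rebuilt archive is unlikely to match the old hash, here is a simplified stand-in (not the actual sdist code). It assumes the timestamps embedded in a .tar.gz are one source of the difference. `build_archive` and the file names are made up for the example:

```python
import gzip
import hashlib
import io
import tarfile


def build_archive(payload: bytes, mtime: int) -> bytes:
    """Pack one file into an in-memory .tar.gz stamped with `mtime`."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo(name="example-1.0/setup.py")
        info.size = len(payload)
        info.mtime = mtime  # recorded in the tar header
        tar.addfile(info, io.BytesIO(payload))
    out = io.BytesIO()
    # The gzip header carries its own timestamp as well.
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=mtime) as gz:
        gz.write(tar_buf.getvalue())
    return out.getvalue()


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


source = b"from setuptools import setup\nsetup(name='example', version='1.0')\n"
# Same source, built one minute apart (arbitrary timestamps).
first = digest(build_archive(source, mtime=1441324800))
second = digest(build_archive(source, mtime=1441324860))
print(first == second)  # False: identical code, different bytes
```

With the timestamp held fixed, the two digests match, which is why only the original file, not a fresh build, would pass a hash comparison.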

Most likely the eventual solution to this is that "delete" won't actually be a full out absolute deletion, it'll be more like a soft delete where it just acts as if it's deleted without actually deleting it (so it won't show up in the API, won't appear anywhere, etc) but there will be a list of these deleted things when the author logs in and a button that says "Restore" that allows them to restore a file they've previously deleted. Possibly this would have a periodic cleanup where if something was soft deleted for some period of time (a month? 6 months? a year?) we'll go through and clean it up and actually hard delete it then. Perhaps we'd also enable it for authors to trigger an immediate hard delete of something they've soft deleted, but there would be plenty of big warnings that if they press that button there is no recovery possible.
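
A rough sketch of that soft-delete idea, under stated assumptions: `FileIndex`, its method names, and the 180-day retention period are all hypothetical, since the time frame is left open above.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=180)  # assumed grace period, not a decided value


class FileIndex:
    """Toy index where delete hides a file and restore brings it back."""

    def __init__(self):
        self._content = {}      # filename -> bytes still stored
        self._deleted_at = {}   # filename -> time of the soft delete
        self._used_names = set()  # every filename ever uploaded

    def upload(self, name, content):
        if name in self._used_names:
            raise ValueError("This filename has previously been used")
        self._used_names.add(name)
        self._content[name] = content

    def visible(self):
        """Files that show up in the API (not soft deleted)."""
        return sorted(n for n in self._content if n not in self._deleted_at)

    def soft_delete(self, name, now):
        if name not in self._content:
            raise KeyError(name)
        self._deleted_at[name] = now

    def restore(self, name):
        # Raises KeyError when the file was never soft deleted or was purged.
        self._deleted_at.pop(name)

    def purge_expired(self, now):
        """Hard delete files soft deleted at least RETENTION ago."""
        for name, when in list(self._deleted_at.items()):
            if now - when >= RETENTION:
                del self._deleted_at[name]
                del self._content[name]


index = FileIndex()
index.upload("example-1.0.tar.gz", b"data")
index.soft_delete("example-1.0.tar.gz", datetime(2015, 9, 4))
index.restore("example-1.0.tar.gz")
print(index.visible())
```

The filename stays reserved even after the hard delete, so restoring is the only way the original bytes can come back.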

Author

@Natim Natim commented Sep 4, 2015

That doesn't make it impossible to upload a file with the same hash, but it makes it tricky which suggests it's a bad UX to expect authors to have to navigate.

With twine it is as simple as:

twine upload cliquet-2.5.0-py2.py3-none.whl
Member

@dstufft dstufft commented Sep 4, 2015

Right, I wrote twine, but not everyone uses that, so you have to explain to them that they have to use twine, not setup.py upload, to be able to re-upload. In addition you have to explain to them that they need the exact same file, not one created the same way. It's fiddly and people will get confused.

Author

@Natim Natim commented Sep 4, 2015

People are not dumb; if they need to do something complicated, they will eventually succeed. The fact is that even if they know all of this, they won't be able to do it.

But yeah, #75 is a workaround for now (using .zip instead of .tar.gz, for instance).


@domibarton domibarton commented Dec 27, 2015

As I already wrote in #75:

I think that behaviour is quite OK for the live repo.

Though, to be honest, it's a huge PITA for the test repo. I support integrity and all that stuff on "production" systems. However, developers need to have their code / packages checked somewhere and it's a PITA if you can't upload the same version twice while testing a new release.

There's no other way than the test repo to test your package. With git (or any other SCM) you can easily create a new branch and test it until you're sure everything works. Or, if you have a look at PHP Packagist (Composer), there's a -dev version for each development branch. The same goes for Docker: you can test your feature/release branches before tagging and going "live".

With the new policy you basically say: You've ONE SINGLE TRY and that one SHOULD WORK. No chance for a 2nd try. IMHO this isn't the purpose of a testing system and breaks the whole "we've a testing repo" idea. To be honest, I think this only leads to annoyed developers and a lot of "crippled versions" because developers couldn't properly test their versions before going live.

tl;dr:
I suppose you do that on the live system but not on the test system.


@brianmay brianmay commented Mar 6, 2016

In my case, I forgot to sign the upload. It appears once you have uploaded the package it is impossible to fix any problems you made with the upload without making a new release. Even if you just want to upload the exact same version again.


@daenney daenney commented Mar 6, 2016

But how do you know it is "the exact same version"? Unless it checks the uploads are binary identical it would allow you to upload a totally different release with the same version which can cause any amount of problems.


@torarnv torarnv commented Mar 14, 2016

Just hit what @domibarton is describing. What's the point of a test repo if you can't make mistakes?

Author

@Natim Natim commented Mar 14, 2016

Just hit what @domibarton is describing. What's the point of a test repo if you can't make mistakes?

Why cannot you do package x.y.z.dev0 and then package x.y.z.dev1?


@torarnv torarnv commented Mar 14, 2016

I could, but then I'd have to remember to wipe those temporary changes from my working tree before pushing to the live PyPI repo.
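
One possible way around the working-tree concern, sketched under assumptions: derive the `.devN` suffix from an environment variable at build time, so nothing in the source tree changes between a test upload and the real one. `EXAMPLE_DEV_BUILD`, `BASE_VERSION`, and `build_version` are hypothetical names for illustration.

```python
import os

BASE_VERSION = "1.2.3"  # hypothetical release version kept in the source tree


def build_version(base=BASE_VERSION, environ=os.environ):
    """Return `base`, or `base.devN` when EXAMPLE_DEV_BUILD=N is set.

    EXAMPLE_DEV_BUILD is a made-up variable name for this sketch.
    """
    dev = environ.get("EXAMPLE_DEV_BUILD")
    if dev is None:
        return base
    if not dev.isdigit():
        raise ValueError("EXAMPLE_DEV_BUILD must be a non-negative integer")
    return "{}.dev{}".format(base, int(dev))


# Test uploads bump the counter; the live upload leaves the variable unset.
print(build_version(environ={"EXAMPLE_DEV_BUILD": "0"}))
print(build_version(environ={"EXAMPLE_DEV_BUILD": "1"}))
print(build_version(environ={}))
```

The value would then be passed as `version=` in setup.py, with the variable set only for uploads to the test repo.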


@snare snare commented Apr 24, 2016

I uploaded a new version of Voltron yesterday and the server threw a 500 error during the upload. This resulted in a partial file being hosted as the current wheel for this package. The file size was smaller than my local one, and the hash differed.

This operation needs to be atomic. If the upload fails, you have no opportunity to try again. The only option is to use a different version number, which is not an appropriate solution.

IMO it should be a requirement that the hash of the upload is verified by the author before it is marked as "published".
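
A sketch of what such an atomic, hash-verified publish could look like on the server side. This is an illustration under assumptions, not how PyPI stores files. `publish_atomically` is a hypothetical helper, and the expected digest is assumed to be sent by the uploading client:

```python
import hashlib
import os
import tempfile


def publish_atomically(chunks, expected_sha256, final_path):
    """Write to a temp file, verify the digest, then rename into place.

    Nothing appears at `final_path` unless the whole upload arrived
    and its SHA-256 matches the digest supplied by the author.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    hasher = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                hasher.update(chunk)
                tmp.write(chunk)
        if hasher.hexdigest() != expected_sha256:
            raise ValueError("digest mismatch; nothing was published")
        # Rename within one directory, so readers never see a partial file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial temp file on any failure, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A truncated upload then fails the digest check and leaves the filename unused, so the author can simply retry.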

Author

@Natim Natim commented Apr 25, 2016

Yes I have the same problem with my last uploaded packages.


@alexlaurence alexlaurence commented May 16, 2019

You have to update your Github repo, then you can reupload. Just make a commit, and PyPi will allow the upload.

scls19fr referenced this issue in python-windrose/windrose Jun 7, 2019

@OneAdder OneAdder commented Jun 16, 2019

I think it still looks pretty weird. I just had a problem with this: I accidentally built my package the wrong way and uploaded it. It was not a problem within the package: I just forgot to delete the old build.

Now I had to change the version name for PyPI and explicitly say in the description that versions 0.8.5 and 0.8.5.post0 are the same version

Member

@pradyunsg pradyunsg commented Jun 16, 2019

You don't have to say that - post releases are meant to be the same as the existing release and only differ in metadata.

njdister added a commit to njdister/njdister-github3.py that referenced this issue Jun 24, 2019

@MartinThoma MartinThoma commented Aug 19, 2019

One important thing seems not to have been mentioned so far: it is better for security and reliability NOT to allow changing the code of an uploaded version. As a user of a 3rd party library, I can check it once and then pin the version. My program is guaranteed to work the same way. Even if the maintainers' account gets hacked, the worst thing that can happen is that the package is deleted. So it either works the same way or not at all. But at least the attacker cannot bring malicious code into another code base.

Author

@Natim Natim commented Aug 20, 2019

Note that if you remove the project altogether and create it again, you can reupload code from previous version.


@1313e 1313e commented Aug 20, 2019

One important thing seems not to have been mentioned so far: it is better for security and reliability NOT to allow changing the code of an uploaded version. As a user of a 3rd party library, I can check it once and then pin the version. My program is guaranteed to work the same way. Even if the maintainers' account gets hacked, the worst thing that can happen is that the package is deleted. So it either works the same way or not at all. But at least the attacker cannot bring malicious code into another code base.

Good point.
In that case, you could be allowed to delete and reupload a release if that happens within (let's say) a week after first release.
That way, we can replace releases when we made a mistake, while the problem you mentioned will not be an issue.


@brianmay brianmay commented Aug 20, 2019

Note that this has limitations. e.g. AFAIK I could upload a source file, and sometime later a malicious actor could upload a wheel file for the same version that has malicious content (i.e. it doesn't have to match the source).


@im-n1 im-n1 commented Aug 20, 2019

IMHO pypi.org should be immutable. test.pypi.org should be mutable because it's a test environment.


@joelmiller joelmiller commented Aug 23, 2019

I don't think many of the commenters have been suggesting that PyPI itself should allow resubmissions with the same name, and I agree it should be immutable (though the error messages shouldn't say "deleting can't be undone" if in fact "deleted files can never be replaced"). There are a lot of problems there: different mirrors might have different versions, malicious actors, etc.

I'm okay with different mirrors of a test environment having different versions, and a malicious actor can't really get far by infecting something that's explicitly a test.


@ikamensh ikamensh commented Jan 29, 2020

It is essential to disallow deleting and re-uploading an existing file. Without this you can't guarantee anything: a malicious actor could inject code into your project and you wouldn't be able to do anything about it.


@brianmay brianmay commented Jan 30, 2020

@ikamensh It isn't quite that simple. I believe there is nothing stopping an upload of the same version of the package in another format, e.g. if you only upload a source file, a malicious wheel file can be uploaded at a later date, which the clients will use in preference. So I am not entirely convinced this is a security gain. At least it does stop replacing already uploaded files however.

Strategies such as pipenv, which store the checksum of the file, are probably the better approach to ensure the package doesn't change unexpectedly.

Member

@brainwane brainwane commented Feb 4, 2020


@mnm678 mnm678 commented Feb 4, 2020

I agree with the general sentiment of this thread: once a package is uploaded to PyPI, it should not change. If an attacker was able to compromise a project, they should not be able to replace an existing package with arbitrary code. Not only that, but even a well intentioned change to an existing package version could break a client's code. If a client pins a version of a package, they should get the same code every time that package is downloaded.

Earlier in the thread, @dstufft suggested a soft-delete function which would allow developers to undo the deletion of a package without reuploading the file. This seems like a good compromise between the security risks of new uploads and the usability problem of accidental deletion. In this case an attacker could undo the deletion of a version, but would not be able to add malicious code to that version (although they could still add a new, bad version unless something like PEP 480 is used to verify the developer's identity).

However, the discussion about PyPI-test is a different story. The security risks in a test repository are less important as any packages are not used in production, so I don't see a problem with allowing test versions of packages to be replaced.

Member

@uranusjr uranusjr commented Feb 4, 2020

FYI the soft-deletion feature has been formalised as PEP 592 “yank.” It is not yet implemented for PyPI (pypa/warehouse#5837).

Member

@brainwane brainwane commented Apr 20, 2020

Relatedly: an implementation proposal for a "draft releases" feature on PyPI pypa/warehouse#726 is now being discussed at
https://discuss.python.org/t/feature-proposal-for-pypi-draft-releases/3903 . Check it out to see whether you have comments to share there. @alanbato would like comments on the proposal by 30 April (10 days from now) so he can go ahead and start implementation work.

(More context on distutils-sig.)

Member

@brainwane brainwane commented Apr 23, 2020

Now that PEP 592 is accepted and implemented (pypa/warehouse#5837), I hope people in this thread will consider looking at the yanking feature to see which of your use cases it suits.
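
For anyone wondering what yanking looks like from the client side: PEP 592 marks a yanked file with a `data-yanked` attribute (optionally carrying a reason) on its link in the simple index. Below is a minimal sketch of reading that attribute. The sample page, the `YankAwareParser` class, and the `candidates` helper are made up for illustration, and real installers apply more rules, such as still allowing a yanked file when it is pinned exactly.

```python
from html.parser import HTMLParser

# Hypothetical simple-index page; the second file has been yanked.
PAGE = """
<a href="/files/example-1.0.tar.gz">example-1.0.tar.gz</a>
<a href="/files/example-1.1.tar.gz" data-yanked="broken build">example-1.1.tar.gz</a>
"""


class YankAwareParser(HTMLParser):
    """Collect filename -> None (not yanked) or the yank reason string."""

    def __init__(self):
        super().__init__()
        self.files = {}
        self._in_link = False
        self._yank = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        self._in_link = True
        # A bare data-yanked attribute parses as None, so normalise to "".
        if "data-yanked" in attrs:
            self._yank = attrs["data-yanked"] or ""
        else:
            self._yank = None

    def handle_data(self, data):
        if self._in_link and data.strip():
            self.files[data.strip()] = self._yank

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


def candidates(files):
    """Filenames an installer would consider by default (yanked ones skipped)."""
    return sorted(name for name, yank in files.items() if yank is None)


parser = YankAwareParser()
parser.feed(PAGE)
print(candidates(parser.files))
```

The yanked file is still hosted and its filename stays taken, so this covers the "hide a bad release" use case rather than the re-upload one.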


@TylerGubala TylerGubala commented May 2, 2020

With the new policy you basically say: You've ONE SINGLE TRY and that one SHOULD WORK. No chance for a 2nd try. IMHO this isn't the purpose of a testing system and breaks the whole "we've a testing repo" idea. To be honest, I think this only leads to annoyed developers and a lot of "crippled versions" because developers couldn't properly test their versions before going live.

Literally this. In my repo, Blenderpy, I have the additional headache that I often don't know exactly everything to test in advance, as there can be a lot of edge cases in the precompiled C/C++ code that may be remedied by configuring with different cmake arguments.

Why cannot you do package x.y.z.dev0 and then package x.y.z.dev1?

For example, in my case people performing pip install bpy==2.81 want the api version 2.81.

So now I effectively have my own version squatting on me, wherein people factually cannot install the proper version because I made an honest mistake.

So I think there are a lot of important considerations here. There is the "version contract", but then there are case by case times where I think that does not apply. I should think that if I accidentally upload some bad version of my own repo, then go back and try to remedy the issue, that does not critically break the version contract.

I'm sorry but I do not understand the concept of shackling yourself to a "version contract" that basically screws yourself and your users in the end.
