Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download cache is not concurrency safe #1141

Closed
rbtcollins opened this issue Aug 14, 2013 · 1 comment
Closed

Download cache is not concurrency safe #1141

rbtcollins opened this issue Aug 14, 2013 · 1 comment
Labels
auto-locked Outdated issues that have been locked by automation type: bug A confirmed bug or unintended behavior

Comments

@rbtcollins
Copy link

The PIP download cache has files added to it by cache_download. However this is the body of that method:
def cache_download(target_file, temp_location, content_type):
logger.notify('Storing download in cache at %s' % display_path(target_file))
shutil.copyfile(temp_location, target_file)
fp = open(target_file+'.content-type', 'w')
fp.write(content_type)
fp.close()
os.unlink(temp_location)

There are two racey operations here:
target_file might be partially written if something interrupts the copy - e.g. the machine is powered off, or python killed etc. That is somewhat tolerable, since the reading code looks for both the file name and the content-type file.

the content-type file is also written unsafely, without the mitigating aspect of the target_file race.

So you can have other processes observe:

  • a full target file
  • an empty content type file
    because moving multiple files isn't atomic, and this creates the third race -

which will result in a pip process trying to reuse that file reading an invalid content type in unpack_http_url and passing that to unpack_file.

There are various ways to solve this:

  • locks
  • use directories [remember though that you can't have more than HARD_LINK_LIMIT direct children of a directory on ext* file systems, so need to nest the directories]
  • embed the content type in the file name
  • make the reader more resilient
@qwcode
Copy link
Contributor

qwcode commented Aug 14, 2013

One pip process doesn't involve any concurrent use of the download cache. This is about wanting to share a download cache across multiple/concurrent pip processes?

derekhiggins pushed a commit to derekhiggins/tripleo-image-elements that referenced this issue Apr 17, 2014
The element bind mounts a pip cache inside the image build chroot so
that pip downloads can be reused across image builds.  While similar
in purpose to the PyPi element that sets up a mirror, this element
just allows for a reusable download cache and doesn't require anything
to be setup beforehand.

The pip-cache element is not concurrency safe, and that is indicated
in the README for the element.  An upstream bug was file as well:
pypa/pip#1141

Change-Id: Ibd1d4ea17c24923ed939357ada95b781e3179cfd
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 5, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

No branches or pull requests

2 participants