Download cache is not concurrency safe #1141

rbtcollins · 2013-08-14T22:07:47Z

The pip download cache has files added to it by cache_download. However, this is the body of that method:
    def cache_download(target_file, temp_location, content_type):
        logger.notify('Storing download in cache at %s' % display_path(target_file))
        shutil.copyfile(temp_location, target_file)
        fp = open(target_file+'.content-type', 'w')
        fp.write(content_type)
        fp.close()
        os.unlink(temp_location)

There are two racy operations here:

- target_file might be partially written if something interrupts the copy, e.g. the machine is powered off or Python is killed. That is somewhat tolerable, since the reading code looks for both the file name and the content-type file.
- The content-type file is also written unsafely, without the mitigating aspect of the target_file race.

So other processes can observe:

- a full target file
- an empty content-type file

because writing multiple files isn't atomic. This creates a third race, in which a pip process trying to reuse the cached file reads an invalid content type in unpack_http_url and passes it to unpack_file.
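To make the observable states concrete, here is a minimal sketch of what a concurrent reader could see for one cache entry. The function name cache_entry_state and the state labels are hypothetical and not part of pip; only the '.content-type' suffix comes from the method quoted above.

```python
import os


def cache_entry_state(target_file):
    """Classify what a concurrent reader might see for one cache entry.

    Hypothetical helper for illustration; pip has no function by this name.
    """
    content_type_file = target_file + '.content-type'
    if not os.path.exists(target_file):
        return 'missing'
    if not os.path.exists(content_type_file):
        # The copy has started (or finished), but the sidecar file
        # has not been created yet.
        return 'no-content-type'
    with open(content_type_file) as fp:
        content_type = fp.read().strip()
    if not content_type:
        # The sidecar was opened for writing but nothing has been
        # written to it yet: the state described above.
        return 'empty-content-type'
    # Both files exist and the sidecar is non-empty. This check cannot
    # detect a content type that was only partially written.
    return 'ready'
```

A reader that treats anything other than 'ready' as a cache miss would avoid passing an empty content type onward, though it would still not catch a truncated value.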

There are various ways to solve this:

- locks
- use directories [remember, though, that you can't have more than HARD_LINK_LIMIT direct children of a directory on ext* file systems, so the directories need to be nested]
- embed the content type in the file name
- make the reader more resilient
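As one illustration of the "embed the content type in the file name" option, the sketch below stores each entry as a single file and publishes it with one rename. It is an assumption-laden example, not pip's code: the '.ct=' naming scheme, the cache_dir and name parameters, and the lookup helper are all hypothetical, and it uses Python 3's os.replace, which was not available on every Python version pip supported at the time.

```python
import os
import shutil
import tempfile
from urllib.parse import quote, unquote


def cache_download(cache_dir, name, temp_location, content_type):
    """Hypothetical variant: one file per entry, content type in its name."""
    # Percent-encode the content type so that '/' is safe in a file name.
    final_name = name + '.ct=' + quote(content_type, safe='')
    final_path = os.path.join(cache_dir, final_name)
    # Copy into a temporary file in the same directory, so the final
    # rename stays on one file system.
    fd, partial_path = tempfile.mkstemp(dir=cache_dir, prefix='.partial-')
    try:
        with os.fdopen(fd, 'wb') as dst, open(temp_location, 'rb') as src:
            shutil.copyfileobj(src, dst)
        # A single rename publishes the data and its content type together.
        os.replace(partial_path, final_path)
    except BaseException:
        if os.path.exists(partial_path):
            os.unlink(partial_path)
        raise
    os.unlink(temp_location)
    return final_path


def lookup(cache_dir, name):
    """Return (path, content_type) for a cached entry, or None."""
    prefix = name + '.ct='
    for entry in os.listdir(cache_dir):
        if entry.startswith(prefix):
            return os.path.join(cache_dir, entry), unquote(entry[len(prefix):])
    return None
```

With this layout a reader sees either no entry or a complete one, since the content type cannot exist separately from the data. The sketch does not call fsync, so it makes no durability claim for the power-off case mentioned above.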

qwcode · 2013-08-14T23:31:14Z

One pip process doesn't involve any concurrent use of the download cache. This is about wanting to share a download cache across multiple/concurrent pip processes?

The element bind mounts a pip cache inside the image build chroot so that pip downloads can be reused across image builds. While similar in purpose to the PyPi element that sets up a mirror, this element just allows for a reusable download cache and doesn't require anything to be set up beforehand. The pip-cache element is not concurrency safe, and that is indicated in the README for the element. An upstream bug was filed as well: pypa/pip#1141 Change-Id: Ibd1d4ea17c24923ed939357ada95b781e3179cfd

dstufft mentioned this issue Apr 24, 2014

Use CacheControl instead of custom cache code #1748

Merged

11 tasks

dstufft closed this as completed in #1748 May 9, 2014

lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 5, 2019

lock bot locked as resolved and limited conversation to collaborators Jun 5, 2019
