
Many Disk Cache Instances Cause Inode Overflow After Consistent Use #220

Closed · SoundsSerious opened this issue Aug 20, 2021 · 16 comments

@SoundsSerious commented Aug 20, 2021

Hi Grant,

I've been using diskcache on my AWS systems for probably a year, and I love it quite a lot! I have many types of data I'm caching, thus many caches (~100), and I'm using a high-performance, low-capacity NVMe disk to keep things fast.

Because of the small size of the disk, the number of inodes is 3.2M. Today, with all the folders created by diskcache that are never cleaned up, I got an Error 28 - no space left on device.

Is there a recommended way to clean up a disk cache filesystem? It creates a ton of references that are never cleaned or managed. Could we ensure that all previous references are used before assigning new ones?

Deleting its contents takes quite a while and I can't spare that downtime. Is there a cleanup callback or behavior we could add when a file is removed from a folder and that folder no longer has references? In general I would like better control and knowledge of what is being purged / cleaned up.

@SoundsSerious (Author)

Similar issue mentioned in #138

@grantjenks (Owner)

I like the username, @SoundsSerious. And I agree now. If it's bitten two people then let's fix it.

Can you take a look at the Cache.check code? There's a fix=True parameter that I think cleans up empty directories. It won't work for you though if you're worried about downtime. The "fix" parameter will cause it to take an exclusive lock on the cache, which might lock things up for an indeterminate amount of time. (Example call below the list.)

So I think there are three ideas worth exploring here:

  1. On cache delete, if the directory is empty, then delete it.
  2. Reduce the fan-out of directories.
  3. Create a command that’ll go and clean up empty directories asynchronously.

I’ll need to give each of these some thought. Your input is welcome.

@grantjenks (Owner)

As a temporary workaround, consider running `find {CACHE_PATH} -type d -empty -delete`. That'll clean up the empty dirs. There may be a race condition when a key is set though.

@grantjenks (Owner)

How big are your caches? Are they the default 1GB or bigger?

I’m thinking of using a single subdir with just 256 possibilities. How would that affect your setup?

@grantjenks (Owner)

Follow-up thought: maybe the directory structure should be a function of the total cache size and the min file size. With the default settings, we need only 2 characters in the dir layout. But at a terabyte you may want the current 2x2. (Back-of-envelope arithmetic below.)

I'm also unsure how to clean up the dirs without causing a race condition. I think on cache set, there'd have to be retry logic for directory creation in case a different process deleted the directory in between steps.
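
To make the sizing concrete, here is the back-of-envelope arithmetic, assuming the library defaults of a 1 GiB size limit and a 32 KiB minimum file size (values below that threshold are stored in SQLite rather than as files):

```python
# Assumed defaults: 1 GiB cache size limit, 32 KiB minimum file size.
size_limit = 2 ** 30        # 1 GiB
min_file_size = 2 ** 15     # 32 KiB; smaller values live in SQLite, not files

max_files = size_limit // min_file_size
print(max_files)            # 32768 files at most

# Current layout: two levels of 2 hex chars each.
print(16 ** 2 * 16 ** 2)    # 65536 directories, more dirs than max files

# Single level of 2 hex chars.
print(max_files / 16 ** 2)  # 128.0 files per directory at the limit
```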

@SoundsSerious (Author) commented Aug 22, 2021

Hey @grantjenks! Thank you for all the thoughtful feedback!

I realize this is definitely a challenging problem, since it seems like most of the disk cache work is done on access, unless I'm mistaken. Especially with all the sub-optimizations that are probably going on for a wide range of platforms / use cases.

My scenario is like this: I have a web server with some complicated responses that need to be cached for a variety of sources (maybe 100 caches). None of them is all that substantial; 1GB would probably be the max size.

Your temporary solution is a good one, I didn't think of using find! I'll be using that to solve the issue for now, so no rush on the other solutions. I'm also using cache.check occasionally, so maybe I can test out the fix parameter, although for bigger caches that can cause some downtime.

Some thoughts on those:

  1. Is probably the easiest, and seems like it would fit with the current style.
  2. This one is also good for weird scenarios like mine where you have shitty hardware, or the opposite, where you have a huge cache on a gigantic drive.
  3. I think this one is probably the best, but it seems like it would be a lot of work! Async seems like it would solve a lot of issues with blocking and disk access in worker routines, or for use with some kind of access-client interface. For example, multiple clients could access the same cache and the code could keep making progress at the same time. However, maybe this is more of a diskcache2 thing :)

Anyway, just some thoughts. I think you could close this issue if you'd like, since the cache.check fix will work, as well as a find command on a cron routine.

@grantjenks (Owner) commented Aug 31, 2021

I created a new branch at origin/cleanup-dirs with a WIP of (1). The plan is to keep the two levels of directories but delete the lower level when it's empty. So you could end up with 256 empty dirs, but that's not as bad as 65k.
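
Not the branch code itself, but idea (1) reduces to something like this minimal sketch: after removing a value file, try to remove its parent directory and treat "not empty" as success:

```python
import os

def remove_value_file(full_path):
    # Sketch only, not the actual branch code: remove a cache value file,
    # then prune its parent directory if it is now empty.
    os.remove(full_path)
    try:
        os.rmdir(os.path.dirname(full_path))  # succeeds only when empty
    except OSError:
        pass  # directory not empty (or already gone); leave it alone
```

The race mentioned earlier is visible here: a concurrent writer may pick a filename in this directory between the `os.remove` and the `os.rmdir`, which is why the write path needs retry logic.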

@SoundsSerious (Author)

I like that, it's simple and efficient! Yeah, we can definitely handle 256 inodes!

In the meantime, your find-empty-directories bash command is working great as a nightly cron job.
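
For reference, the nightly job can be a crontab entry along these lines (the path and schedule are illustrative):

```
# Illustrative crontab entry: prune empty cache dirs at 03:00 every night.
0 3 * * * find /srv/cache -type d -empty -delete
```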

@SoundsSerious (Author)

I'll see if I can give this a test this week. Thanks again!

@SoundsSerious (Author) commented Sep 4, 2021

@grantjenks

Hey, finally found the time to test this. Pulled in the latest code and tried this stress test:

```python
import diskcache as dc
import numpy

cache = dc.Cache('/tmp/dc_test_rm')
N = 255 * 255
for i in range(N):
    print(i)
    data = {'dict': 'hey', 'value': i, 'data': numpy.array(range(i - N, i))}
    cache.add(i, data, expire=600)
```

@SoundsSerious (Author)

I get this error:

```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_1516/80055230.py in <module>
      7     print(i)
      8     data = {'dict':'hey','value':i, 'data':numpy.array(range(0,i))}
----> 9     cache.add(i,data,expire=600)

~/python-diskcache/diskcache/core.py in add(self, key, value, expire, read, tag, retry)
    996         db_key, raw = self._disk.put(key)
    997         expire_time = None if expire is None else now + expire
--> 998         size, mode, filename, db_value = self._disk.store(value, read, key=key)
    999         columns = (expire_time, tag, size, mode, filename, db_value)
   1000 

~/python-diskcache/diskcache/core.py in store(self, value, read, key)
    256                 filename, full_path = self.filename(key, value)
    257 
--> 258                 with open(full_path, 'xb') as writer:
    259                     writer.write(result)
    260 

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/dc_test_rm/11/2d/fbc930685edbd263d586c38f688c.val'
```

@SoundsSerious (Author)

This seems like it would be a worst-case scenario with rapid sequential access, but admittedly my knowledge of the system could be better.

Maybe this could be solved with a custom wrapper, say ensure_and_open(full_path, 'xb'), that catches FileNotFoundError, re-creates the missing directory, and then retries the open. (Rough sketch below.)
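
A minimal sketch of that hypothetical ensure_and_open wrapper; the name, signature, and retry bound are illustrative, not part of the diskcache API:

```python
import os

def ensure_and_open(full_path, mode, retries=2):
    # Hypothetical helper: if a concurrent cleanup pruned the parent
    # directory between filename selection and open, re-create it and retry.
    for _ in range(retries):
        try:
            return open(full_path, mode)
        except FileNotFoundError:
            os.makedirs(os.path.dirname(full_path), exist_ok=True)
    return open(full_path, mode)  # final attempt; let any error propagate
```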

@grantjenks (Owner)

My apologies, @SoundsSerious. The origin/cleanup-dirs branch is a work in progress (WIP) and does not work yet. I may make some more progress tonight.

@grantjenks (Owner)

I've updated the origin/cleanup-dirs branch with working changes. There are tests for the new functionality, and both levels of directories are cleaned up.

@grantjenks (Owner)

Fixed by #222. To be released this month.

@SoundsSerious (Author)

@grantjenks Hey, nice work! Hopefully I can give this a test run in the coming week.
