
Many Disk Cache Instances Cause Inode Overflow After Consistent Use #220

Closed · SoundsSerious opened this issue Aug 20, 2021 · 16 comments

@SoundsSerious commented Aug 20, 2021

Hi Grant,

I've been using diskcache on my AWS systems for probably a year, and I love it quite a lot! I have many types of data I'm caching, thus many caches (~100), and I'm using a high-performance, low-capacity NVMe disk to keep things fast.

Because of the small size of the disk, the number of inodes is 3.2M. Today, with all the folders created by diskcache that are never cleaned up, I got an Error 28 - no space left on device.

Is there a recommended way to clean up a disk cache filesystem? It creates a ton of references that are never cleaned or managed. Could we ensure that all previous references are used before assigning new ones?

Deleting its contents takes quite a while and I can't spare that downtime. Is there a cleanup callback or behavior we could add when a file is removed from a folder and that folder no longer has references? In general I would like better control and knowledge of what is being purged / cleaned up.

@SoundsSerious (Author)

Similar issue mentioned in #138

@grantjenks (Owner)

I like the username, @SoundsSerious. And I agree now. If it's bitten two people then let's fix it.

Can you take a look at the Cache.check code? There's a fix=True parameter that I think cleans up empty directories. It won't work for you though if you're worried about downtime. The "fix" parameter will cause it to take an exclusive lock on the cache, which might lock things up for an indeterminate amount of time. (Example call below the list.)

So I think there are three ideas worth exploring here:

  1. On cache delete, if the directory is empty, then delete it.
  2. Reduce the fan-out of directories.
  3. Create a command that’ll go and clean up empty directories asynchronously.

I’ll need to give each of these some thought. Your input is welcome.

@grantjenks (Owner)

As a temporary workaround, consider running `find {CACHE_PATH} -type d -empty -delete`. That'll clean up the empty dirs. There may be a race condition when a key is set though.

@grantjenks (Owner)

How big are your caches? Are they the default 1GB or bigger?

I’m thinking of using a single subdir with just 256 possibilities. How would that affect your setup?

@grantjenks (Owner)

Follow-up thought: maybe the directory structure should be a function of the total cache size and the min file size. With the default settings, we need only 2 characters in the dir layout. But at a terabyte you may want the current 2x2. (Back-of-envelope arithmetic below.)

I'm also unsure how to clean up the dirs without causing a race condition. I think on cache set, there'd have to be retry logic for directory creation in case a different process deleted the directory in between steps.
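
To make the sizing concrete, here is the back-of-envelope arithmetic, assuming the library defaults of a 1 GiB size limit and a 32 KiB minimum file size (values below that threshold are stored in SQLite rather than as files):

```python
# Assumed defaults: 1 GiB cache size limit, 32 KiB minimum file size.
size_limit = 2 ** 30        # 1 GiB
min_file_size = 2 ** 15     # 32 KiB; smaller values live in SQLite, not files

max_files = size_limit // min_file_size
print(max_files)            # 32768 files at most

# Current layout: two levels of 2 hex chars each.
print(16 ** 2 * 16 ** 2)    # 65536 directories, more dirs than max files

# Single level of 2 hex chars.
print(max_files / 16 ** 2)  # 128.0 files per directory at the limit
```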

@SoundsSerious (Author) commented Aug 22, 2021

Hey @grantjenks! Thank you for all the thoughtful feedback!

I realize this is definitely a challenging problem, since it seems like most of the disk cache work is done on access, unless I'm mistaken. Especially with all the sub-optimizations that are probably going on for a wide range of platforms / use cases.

My scenario is like this: I have a web server with some complicated responses that need to be cached for a variety of sources (maybe 100 caches). None of them is all that substantial; 1GB would probably be the max size.

Your temporary solution is a good one, I didn't think of using find! I'll be using that to solve the issue for now, so no rush on the other solutions. I'm also using cache.check occasionally, so maybe I can test out the fix parameter, although for bigger caches that can cause some downtime.

Some thoughts on those:

  1. Is probably the easiest, and seems like it would fit with the current style.
  2. This one is also good for weird scenarios like mine where you have shitty hardware, or the opposite, where you have a huge cache on a gigantic drive.
  3. I think this one is probably the best, but it seems like it would be a lot of work! Async seems like it would solve a lot of issues with blocking and disk access in worker routines, or for use with some kind of access-client interface. For example, multiple clients could access the same cache and the code could keep making progress at the same time. However, maybe this is more of a diskcache2 thing :)

Anyway, just some thoughts. I think you could close this issue if you'd like, since the cache.check fix will work, as well as a find command on a cron routine.

@grantjenks (Owner) commented Aug 31, 2021

I created a new branch at origin/cleanup-dirs with a WIP of (1). The plan is to keep the two levels of directories but delete the lower level when it's empty. So you could end up with 256 empty dirs, but that's not as bad as 65k.
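
Not the branch code itself, but idea (1) reduces to something like this minimal sketch: after removing a value file, try to remove its parent directory and treat "not empty" as success:

```python
import os

def remove_value_file(full_path):
    # Sketch only, not the actual branch code: remove a cache value file,
    # then prune its parent directory if it is now empty.
    os.remove(full_path)
    try:
        os.rmdir(os.path.dirname(full_path))  # succeeds only when empty
    except OSError:
        pass  # directory not empty (or already gone); leave it alone
```

The race mentioned earlier is visible here: a concurrent writer may pick a filename in this directory between the `os.remove` and the `os.rmdir`, which is why the write path needs retry logic.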

@SoundsSerious (Author)

I like that, it's simple and efficient! Yeah, we can definitely handle 256 inodes!

In the meantime, your find-empty-directories bash command is working great as a nightly cron job.
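
For reference, the nightly job can be a crontab entry along these lines (the path and schedule are illustrative):

```
# Illustrative crontab entry: prune empty cache dirs at 03:00 every night.
0 3 * * * find /srv/cache -type d -empty -delete
```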

@SoundsSerious (Author)

I'll see if I can give this a test this week. Thanks again!

@SoundsSerious (Author) commented Sep 4, 2021

@grantjenks

Hey, finally found the time to test this. Pulled in the latest code and tried this stress test:

```python
import diskcache as dc
import numpy

cache = dc.Cache('/tmp/dc_test_rm')
N = 255 * 255
for i in range(N):
    print(i)
    data = {'dict': 'hey', 'value': i, 'data': numpy.array(range(i - N, i))}
    cache.add(i, data, expire=600)
```

@SoundsSerious (Author)

I get this error:

```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_1516/80055230.py in <module>
      7     print(i)
      8     data = {'dict':'hey','value':i, 'data':numpy.array(range(0,i))}
----> 9     cache.add(i,data,expire=600)

~/python-diskcache/diskcache/core.py in add(self, key, value, expire, read, tag, retry)
    996         db_key, raw = self._disk.put(key)
    997         expire_time = None if expire is None else now + expire
--> 998         size, mode, filename, db_value = self._disk.store(value, read, key=key)
    999         columns = (expire_time, tag, size, mode, filename, db_value)
   1000 

~/python-diskcache/diskcache/core.py in store(self, value, read, key)
    256                 filename, full_path = self.filename(key, value)
    257 
--> 258                 with open(full_path, 'xb') as writer:
    259                     writer.write(result)
    260 

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/dc_test_rm/11/2d/fbc930685edbd263d586c38f688c.val'
```

@SoundsSerious (Author)

This seems like it would be a worst-case scenario with rapid sequential access, but admittedly my knowledge of the system could be better.

Maybe this could be solved with a custom wrapper, say ensure_and_open(full_path, 'xb'), that catches FileNotFoundError, re-creates the missing directory, and then retries the open. (Rough sketch below.)
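
A minimal sketch of that hypothetical ensure_and_open wrapper; the name, signature, and retry bound are illustrative, not part of the diskcache API:

```python
import os

def ensure_and_open(full_path, mode, retries=2):
    # Hypothetical helper: if a concurrent cleanup pruned the parent
    # directory between filename selection and open, re-create it and retry.
    for _ in range(retries):
        try:
            return open(full_path, mode)
        except FileNotFoundError:
            os.makedirs(os.path.dirname(full_path), exist_ok=True)
    return open(full_path, mode)  # final attempt; let any error propagate
```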

@grantjenks (Owner)

My apologies, @SoundsSerious. The origin/cleanup-dirs branch is a work in progress (WIP) and does not work yet. I may make some more progress tonight.

@grantjenks (Owner)

I've updated the origin/cleanup-dirs branch with working changes. There are tests for the new functionality, and both levels of directories are cleaned up.

@grantjenks (Owner)

Fixed by #222. To be released this month.

@SoundsSerious (Author)

@grantjenks Hey, nice work! Hopefully I can give this a test run in the coming week.
