Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The file system cache is not removing files properly #61

Open
lolbyte-code opened this issue Apr 5, 2023 · 8 comments
Open

The file system cache is not removing files properly #61

lolbyte-code opened this issue Apr 5, 2023 · 8 comments

Comments

@lolbyte-code
Copy link

Would be great to have this option available. I've found that the file system cache slowly eats up disk space and I have to intermittently clear it out (TTL of 1 day doesn't seem to actually clean up the fs).

@LamentOfKaito
Copy link

Not the owner or a maintainer, but may I ask a few questions?

Add full support for in-memory caching

What features is the in-memory cache? Like, what are you trying to do? Or what do you want it to do?

I have to intermittently clear it out

Do you mean you manually clean up the FS cache?

TTL of 1 day doesn't seem to actually clean up the fs

What do you mean by this? TTL of 1 day is not good enough for your use case, or are expired cache files not removed?

@lolbyte-code
Copy link
Author

Some background: I maintain an app called LolByte which relies on a web server that uses this library to talk to the Riot API.

What features is the in-memory cache? Like, what are you trying to do? Or what do you want it to do?

Cache data in-memory rather than on the file system. Even if the cache grew unbounded it’s easier to restart the JVM / container than manually clean up files. If the issue with the FS growing were fixed I wouldn’t need the in-memory implementation at all.

Do you mean you manually clean up the FS cache?

Yes eventually the FS cache takes up all the disk space on my server. I try to clean it up when the disk utilization hits 80% but I’d like to remove this toil.

What do you mean by this? TTL of 1 day is not good enough for your use case, or are expired cache files not removed?

TTL of 1 day is fine. I suspect the TTL may be getting enforced correctly but for whatever reason files are still being left on the FS. I haven’t looked deep into the code to verify this but like I said my disk utilization slowly climbs up over time.

@LamentOfKaito
Copy link

LamentOfKaito commented Aug 7, 2023

Cache data in-memory rather than on the file system.

R4j supports in-memory caching, but I assume you want to cache everything and not only Match data.

I suspect the TTL may be getting enforced correctly but for whatever reason files are still being left on the FS. I haven’t looked deep into the code to verify this but like I said my disk utilization slowly climbs up over time.

That's what I was curious about. I haven't tested this project yet, but I have read its source code, esp the caching part.
The main logic is:

I can only think of 3+1 things:

  • You are not setting TTL properly. If TTL = INFINITY, nothing will happen.
  • Some other program is modifying the file (or at least its modification time) which keeps the file "up-to-date".
  • Unlikely, but your FS (or the JRE itself) does not support storing modification dates. According to the spec (javadocs), JRE may return an implementation-specific value for the modification date if the FS does not support it. This can mess up how cache invalidation/clean-up works.
  • Or, there is a bug in the implementation. I will need to test it first.

One last question (might help us fix this issue): What OS/FS/JRE are you using?

@LamentOfKaito
Copy link

LamentOfKaito commented Aug 7, 2023

I don't know how I missed this, but the update method sets lastModifiedTime while clearPath checks lastAccessTime.
Maybe that was unintentional.
IMO, the code should be updated to use only Files.setLastModifiedTime and getLastModifiedTime (clearer intentions and avoids needless casting and intermediate values).

However, AFAIK, that does not directly affect how it works.

  • Reading the attributes of a file (last modified or last accessed) does not update the last accessed value.
  • Updating "last modified" will* update "last accessed".
  • The code will work fine as long as "expired" cache files are not read.
    If they are read, they effectively get automatically "update"d. This is an implementation bug.
    Does it pertain to your particular case though?

* It will eventually get updated. Apparently, setLastModifiedTime returns before changes are fully applied (flushed?).
Here are some tests:

// Run using `jshell -R -ea -- Test.java` (enable assertions)
// Tested on Windows 10 x64, the default FS (NTFS), and OpenJDK 15.
import java.nio.file.*;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

var path = Paths.get("some.txt");

// Scenario 1: Reading attributes does not change last accessed time.
{    
    var accessed1 = ((FileTime)Files.getAttribute(path, "lastAccessTime")).toMillis();
    Thread.sleep(2 * 1000); // in seconds * 1000
    var accessed2 = ((FileTime)Files.getAttribute(path, "lastAccessTime")).toMillis();
    assert accessed2 == accessed1;
}

// Scenario 2: Setting modified time updates last accessed time.
{
    var accessed1 = ((FileTime)Files.getAttribute(path, "lastAccessTime")).toMillis();
    Files.setLastModifiedTime(path, FileTime.from(Instant.now()));
    Thread.sleep(2 * 1000); // in seconds * 1000
    var accessed2 = ((FileTime)Files.getAttribute(path, "lastAccessTime")).toMillis();
    // The following throws an assertion error if `sleep` is commented out.
    assert accessed2 > accessed1;    
}

/exit

@LamentOfKaito
Copy link

One last point:
I suggest that we tag this issue as a Bug and retitle it to something like The file system cache is not removing files properly.
(Since this is the actual issue and you are willing to use the FS cache instead of the in-memory cache if it worked correctly).

@lolbyte-code
Copy link
Author

Thanks for all your help on this!

R4j supports in-memory caching, but I assume you want to cache everything and not only Match data.

Ideally, yes!

You are not setting TTL properly. If TTL = INFINITY, nothing will happen.

Here is where I initialize R4J: https://github.com/lolbyte-code/lolbyte-service/blob/4c99f9ea13c88634d369403cd0e6ea8c4f582a29/src/main/kotlin/com/badger/lolbyte/client/R4JClient.kt#L42C19-L42C22. I don't think this is setting the TTL to INFINITY but correct me if I am wrong.

One last question (might help us fix this issue): What OS/FS/JRE are you using?

Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-76-generic x86_64)
ext4
OpenJDK 11

@lolbyte-code lolbyte-code changed the title Add full support for in-memory caching The file system cache is not removing files properly Aug 7, 2023
@lolbyte-code
Copy link
Author

@ReprimandedKaito - Doesn't look like I have permissions to tag this as a bug?

@stelar7
Copy link
Owner

stelar7 commented Aug 7, 2023

If they are read, they effectively get automatically "update"d. This is an implementation bug.

No its not, thats on purpose.

The cache clearing timer isnt started with the default cache provider. Since it sets the TTL to "use-hints", it will be left on disk, and updated when the timer expires.

Looks like it doesnt clear based on the cache-hints if thats not set tho, so Ill look into that later when i have a chance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants