
MMap based chunk cache and disk based caching #406

Merged

Conversation

felixbuenemann
Collaborator

This changes the chunk cache to use MMap for allocating cache chunks.

This significantly reduces plexdrive memory usage, since most allocations bypass the Go garbage collector.

If persistent chunk cache is enabled (--chunk-disk-cache), the cache is persisted to the file or device specified by --chunk-file (defaults to "chunks.dat" in the config directory).

If no persistent cache is used, plexdrive uses an anonymous mmap and will efficiently use the swap file as a disk cache if the size of the chunk cache exceeds the amount of free memory.

The chunk cache uses CRC32 to verify the integrity of cache chunks and protect against corruption, e.g. if plexdrive crashes. The checksum is verified lazily once a cached chunk is loaded.

Another change is that plexdrive now signals changed files to fuse, so it can enable kernel caching of open files. This means that files cached in memory by the kernel can be accessed with nearly no cpu usage by plexdrive.

Since it is possible for files on Google Drive to change, plexdrive now downloads them using the cached RevisionID of the drive object, which prevents chunks from different versions of a file from being mixed up. You always get the version that corresponds to the cached metadata in cache.bolt.

Since additional metadata is stored per file, the cache.bolt in the configuration directory should be deleted when upgrading to this new version of plexdrive.

The code for this was mostly written two years ago and has been used successfully in production since then. I've rebased the code onto the current master and did some quick testing to ensure nothing broke. I've kept the original commits, since the sheer number of changes would have made condensing them into more focused commits a major effort. So there are quite a few changes that are replaced by later commits in the feature branch.

Resolves #360

We didn't raise the index version when swapping the chunk fields, so
this swaps the fields by guessing the existing field order. This
workaround should be removed once the code is stable.
Also skip storing the pageToken if it hasn't changed.
This avoids encoding/decoding and unneeded copies, but it makes the
chunk cache file endianness dependent. To detect an endianness
difference, the PD magic bytes are now stored as a uint16, so they flip
to DP if the endianness has changed and the cache becomes invalid.
Note that resizing is currently not supported, since we lack information
to discover the last position of the journal.
Instead of creating a separate mmap region for each chunk this maps one
or multiple regions at the maximum allowed size and creates the chunks
as slices from these mappings.

This should in theory also work on 32-bit Linux systems with LFS
support, where a single mmap region is limited to about 2 GB, so
multiple mappings are used if the chunk file is bigger than 2 GB.
This isn't needed, since we're at EOF anyway, so nothing after the TOC
needs to be aligned, and the size itself doesn't have to be aligned.
This should ensure consistent downloads, since it was previously
possible that the cached MD5 sum of a recently changed object was used
as the chunk cache / request id for the download. By also using the
cached revision id, this should no longer be a problem.
This ensures that drive objects have stable node hashes, enables keeping
kernel buffer caches across file opens and invalidates the buffer cache
when changes are detected.

Also added code to ensure the root node is a folder.
Since both the persistent cache to the chunk-file and the non-persistent
cache use mmap, the naming was confusing.
@felixbuenemann felixbuenemann merged commit 420a892 into plexdrive:master Jan 30, 2022
@felixbuenemann
Collaborator Author

felixbuenemann commented Jan 31, 2022

There's more detailed info on the cache storage data structure in #389.

Successfully merging this pull request may close these issues.

[Feature Request] Disk Cache for Plexdrive v5