MMap based chunk cache and disk based caching #406
Merged
felixbuenemann
merged 17 commits into
plexdrive:master
from
felixbuenemann:mmap-new-md5sum-rebased
Jan 30, 2022
We didn't raise the index version when swapping the chunk fields, so this swaps the fields by guessing the existing field order. Should be removed once the code is stable.
and skip storing pageToken if it hasn't changed.
This avoids encoding/decoding and unneeded copies, but it makes the chunk cache file endianness dependent. To detect an endianness difference, the PD magic bytes are now stored as a uint16, so they flip to DP if the endianness has changed and the cache becomes invalid.
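A minimal sketch of how such a check could work, assuming a two-byte header that spells "PD" when the magic is written big-endian. The function names and the `bigEndian` flag are hypothetical and only stand in for a raw native-endian write on two different machines; this is not plexdrive's actual header code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// magic is a hypothetical 16-bit value whose bytes spell "PD" in big-endian order.
const magic uint16 = 'P'<<8 | 'D'

// order picks a byte order, simulating the native order of a given machine.
func order(bigEndian bool) binary.ByteOrder {
	if bigEndian {
		return binary.BigEndian
	}
	return binary.LittleEndian
}

// writeMagic returns the two header bytes as a machine with the given
// byte order would store the uint16 magic.
func writeMagic(bigEndian bool) []byte {
	buf := make([]byte, 2)
	order(bigEndian).PutUint16(buf, magic)
	return buf
}

// magicMatches reports whether the header reads back as the expected magic
// on a machine with the given byte order.
func magicMatches(header []byte, bigEndian bool) bool {
	return len(header) >= 2 && order(bigEndian).Uint16(header) == magic
}

func main() {
	header := writeMagic(false) // written on a little-endian machine
	fmt.Printf("bytes on disk: %q\n", header)
	fmt.Println("same endianness valid:", magicMatches(header, false))
	fmt.Println("other endianness valid:", magicMatches(header, true))
}
```

Because the same two bytes decode to a different integer under the other byte order, a cache file moved between machines of different endianness fails the magic check without any extra header field.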
Note that resizing is currently not supported, since we lack information to discover the last position of the journal.
Instead of creating a separate mmap region for each chunk, this maps one or multiple regions at the maximum allowed size and creates the chunks as slices from these mappings. This should in theory also work with 32-bit Linux systems and LFS support, where a single mmap region will be limited to about 2GB, so multiple mappings are used if the chunk file is bigger than 2GB.
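The layout arithmetic behind this can be sketched as follows. This is an illustration under assumptions, not the code from this PR: the function names are hypothetical, and rounding each region down to a whole number of chunks (so that no chunk straddles two mappings) is an assumed design choice.

```go
package main

import "fmt"

// chunksPerRegion returns how many whole chunks fit into one mapping.
func chunksPerRegion(maxRegion, chunkSize int64) int64 {
	if chunkSize <= 0 {
		return 0
	}
	return maxRegion / chunkSize
}

// regionSizes splits a chunk file of fileSize bytes into mapping sizes.
// Each mapping holds a whole number of chunks; the last one may be smaller.
func regionSizes(fileSize, maxRegion, chunkSize int64) []int64 {
	perRegion := chunksPerRegion(maxRegion, chunkSize) * chunkSize
	if perRegion <= 0 {
		return nil // a single chunk does not fit into one mapping
	}
	var sizes []int64
	for remaining := fileSize; remaining > 0; remaining -= perRegion {
		size := perRegion
		if remaining < perRegion {
			size = remaining
		}
		sizes = append(sizes, size)
	}
	return sizes
}

// locateChunk maps a chunk index to its region and byte offset inside it.
// It assumes chunkSize > 0 and chunkSize <= maxRegion.
func locateChunk(index, maxRegion, chunkSize int64) (region, offset int64) {
	per := chunksPerRegion(maxRegion, chunkSize)
	return index / per, (index % per) * chunkSize
}

func main() {
	const MiB = int64(1) << 20
	// Example values only: a 5 GiB chunk file, 2 GiB mappings, 10 MiB chunks.
	fmt.Println(regionSizes(5*1024*MiB, 2*1024*MiB, 10*MiB))
	fmt.Println(locateChunk(204, 2*1024*MiB, 10*MiB))
}
```

With the region and offset known, a chunk is simply a sub-slice of the mapped region, so no per-chunk mapping is needed.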
This isn't needed, since we're at EOF anyways, so nothing after the toc needs to be aligned and the size itself doesn't have to be aligned.
This should ensure consistent downloads, since it was previously possible that the cached MD5 sum of a recently changed object was used as the chunk cache / request id for the download. By also using the cached revision id, this should no longer be a problem.
This ensures that drive objects have stable node hashes, enables keeping kernel buffer caches across file opens and invalidates the buffer cache when changes are detected. Also added code to ensure the root node is a folder.
Since both the persistent cache to the chunk-file and the non-persistent cache use mmap, the naming was confusing.
There's more detailed info on the cache storage data structure in #389.
This was referenced Feb 2, 2022
This changes the chunk cache to use MMap for allocating cache chunks.
This significantly reduces plexdrive memory usage, since most allocations bypass the Go garbage collector.
If persistent chunk cache is enabled (--chunk-disk-cache), the cache is persisted to the file or device specified by --chunk-file (defaults to "chunks.dat" in the config directory).
If no persistent cache is used, plexdrive will use anonymous mmap and will efficiently use the swap file as disk cache if the size of the chunk cache exceeds the amount of free memory.
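As a rough illustration of the non-persistent case, the sketch below allocates one anonymous mapping with Go's `syscall.Mmap` and carves fixed-size chunks out of it. It assumes a Unix-like system; `allocChunks` and `release` are hypothetical helper names, and the sizes are example values rather than plexdrive defaults.

```go
package main

import (
	"fmt"
	"syscall"
)

// allocChunks maps count*chunkSize bytes of anonymous memory and returns the
// whole region plus one slice per chunk. The memory lives outside the Go heap,
// so the garbage collector neither scans nor moves it.
func allocChunks(count, chunkSize int) (region []byte, chunks [][]byte, err error) {
	region, err = syscall.Mmap(-1, 0, count*chunkSize,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, nil, err
	}
	chunks = make([][]byte, count)
	for i := range chunks {
		start, end := i*chunkSize, (i+1)*chunkSize
		// The three-index slice caps capacity so a chunk cannot grow into its neighbour.
		chunks[i] = region[start:end:end]
	}
	return region, chunks, nil
}

// release unmaps the region; the chunk slices must not be used afterwards.
func release(region []byte) error {
	return syscall.Munmap(region)
}

func main() {
	region, chunks, err := allocChunks(4, 4096)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer release(region)
	copy(chunks[2], "cached data")
	fmt.Println(len(chunks), "chunks of", len(chunks[0]), "bytes")
}
```

Pages of an anonymous private mapping that do not fit in RAM can be written to swap by the kernel, which is what lets the swap file act as a disk cache in this mode.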
The chunk cache uses CRC32 to verify the integrity of cache chunks to protect against corruption, e.g. if plexdrive crashes. The checksum is verified lazily, the first time a cached chunk is loaded.
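Lazy verification can be sketched like this. The `cachedChunk` type and its helpers are hypothetical, and the IEEE polynomial is an assumption, since the description only says "crc32".

```go
package main

import (
	"errors"
	"fmt"
	"hash/crc32"
)

var errCorrupt = errors.New("chunk checksum mismatch")

// cachedChunk pairs chunk data with the checksum recorded when it was stored.
type cachedChunk struct {
	data     []byte
	checksum uint32
	verified bool // set once the checksum has been checked in this process
}

// store records a freshly downloaded chunk; it is trusted without a re-check.
func store(data []byte) *cachedChunk {
	return &cachedChunk{data: data, checksum: crc32.ChecksumIEEE(data), verified: true}
}

// reopen simulates finding a chunk in a persisted cache after a restart:
// the stored checksum is known, but the data has not been verified yet.
func reopen(data []byte, checksum uint32) *cachedChunk {
	return &cachedChunk{data: data, checksum: checksum}
}

// load verifies the checksum on first access only, then returns the data.
func (c *cachedChunk) load() ([]byte, error) {
	if !c.verified {
		if crc32.ChecksumIEEE(c.data) != c.checksum {
			return nil, errCorrupt
		}
		c.verified = true
	}
	return c.data, nil
}

func main() {
	sum := store([]byte("chunk payload")).checksum

	_, err := reopen([]byte("chunk payload"), sum).load()
	fmt.Println("intact chunk error:", err)

	_, err = reopen([]byte("chunk pay\x00oad"), sum).load()
	fmt.Println("damaged chunk error:", err)
}
```

Deferring the check to first load avoids scanning the whole chunk file at startup; a chunk that fails verification would simply be treated as a cache miss and downloaded again.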
Another change is that plexdrive now signals changed files to FUSE, so it can enable kernel caching of open files. This means that files cached in memory by the kernel can be accessed with nearly no CPU usage by plexdrive.
Since it is possible for files on Google Drive to change, plexdrive now downloads them using the cached RevisionID of the drive object, which prevents chunks from different versions of the file from being mixed up. So you always get the version that corresponds to the cached metadata in the cache.bolt.
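One way to picture why the revision helps is a chunk identifier that includes it. The sketch below is purely illustrative: `chunkID`, its parameters, and the use of SHA-256 are assumptions and not how plexdrive actually derives its chunk or request ids.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkID derives a cache key from the object id, the cached revision id and
// the chunk offset. The NUL separators keep the fields from running together.
func chunkID(objectID, revisionID string, offset int64) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s\x00%s\x00%d", objectID, revisionID, offset)))
	return hex.EncodeToString(sum[:8])
}

func main() {
	// Hypothetical ids: the same object and offset under two revisions.
	fmt.Println(chunkID("file-1", "rev-a", 0))
	fmt.Println(chunkID("file-1", "rev-b", 0))
}
```

Because the revision is part of the key, a chunk cached for an older revision can never be returned for a newer one, even if the object id and offset are identical.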
Since additional metadata is stored per file, the cache.bolt in the configuration directory should be deleted when upgrading to this new version of plexdrive.
The code for this was mostly written two years ago and has been used successfully in production since then. I've rebased the code onto the current master and did some quick testing to ensure nothing broke. I've kept the original commits, since squashing the sheer number of changes into more focused commits would have been a major effort. So there are quite a few changes that are replaced by later commits in the feature branch.
Resolves #360