Backend cache #1585
Conversation
Seems like I need to rebase master too.
I have written up my ideas for object serialization in #1589. Yes it does involve touching every remote, but it is worth it I think. Your comments on that issue would be appreciated.
That doesn't account for writes, does it? I think the new optional interface might be clearer, but I can be convinced either way I'm sure!
This means you are corrupting the data. The crypt remote will detect any form of data corruption so it is a very sensitive test!
Good idea. As far as code organization vs commits, what I'd like to see is a different commit (or PR, if you think you can get them in earlier) for each logical change to stuff outside the /cache directory. For changes inside the /cache directory, can you squash everything into one commit at the end? Make sure you check out the commit message guidelines and the following section about vendor stuff (can you put vendor commits in a separate commit). I just had a quick look at the travis log. It is moaning about lots of places where error returns need checking, and you seem to have broken the crypt unit tests somehow! Try to get the build tests working locally if you can so you can run them yourself.
A few things I noticed with a quick skim over the code!
crypt/crypt.go
Outdated
@@ -87,7 +87,7 @@ func NewFs(name, rpath string) (fs.Fs, error) {
 	remotePath := path.Join(remote, cipher.EncryptFileName(rpath))
 	wrappedFs, err := fs.NewFs(remotePath)
 	// if that didn't produce a file, look for a directory
-	if err != fs.ErrorIsFile {
+	if err != fs.ErrorIsFile && err != nil {
This is what broke the crypt unit tests!
no longer needed so removed
fstest/fstests/gen_tests.go
Outdated
@@ -164,5 +164,6 @@ func main() {
 	generateTestProgram(t, fns, "Box")
 	generateTestProgram(t, fns, "QingStor", buildConstraint("!plan9"))
 	generateTestProgram(t, fns, "AzureBlob", buildConstraint("go1.7"))
+	generateTestProgram(t, fns, "Cache", "")
You need to lose the `, ""` here - I changed the interface recently.
done
Makefile
Outdated
@@ -18,6 +18,10 @@ rclone:
 	go install -v --ldflags "-s -X github.com/ncw/rclone/fs.Version=$(TAG)" $(BUILDTAGS)
 	cp -av `go env GOPATH`/bin/rclone .

+rclone_linux:
This looks like a local change!
by mistake, removed
Seems like Bolt isn't compiling on mips.
Force-pushed from 710ca12 to 1d81a60
Time for a status update. PR:
Now on functionality:
Still pending:
Sounds like you are making great progress - well done :-) I agree 100% with you about punting ReadBlockAt, Serialization, etc for the moment. I think we should try to merge the minimum working code first and then iterate from there. Let me know when you are ready to merge the minimum and I'll do a thorough code review. Keep rebasing the branch and squashing the commits where appropriate - I know it is annoying for anyone following the branch, but it will make merging so much easier! @remusb wrote
Absolutely. Check out the sftp remote which does the same thing. Don't forget an equivalent of the sftp_unsupported.go file, and check out
Just a quick update: I need to research a better storage DB or just store the file data on the disk. Bolt isn't the ideal choice for this. The DB on the disk grows beyond reasonable sizes and since it's locked all the time during mounts there is no way to compact the file during mounting. This is after days/weeks of keeping a build up with cache and it functions just fine after adding several extra features:
I just need to move the bulk of the data to another DB storage and will likely push a working version. As it is, with Bolt, this isn't usable after a couple of days of rclone being on with caching.
That sounds really good, apart from the problems with bolt.
Using the filesystem sounds like a reasonable idea to me depending on exactly how many keys you want into the data; the filesystem is a pretty good multi-user persistent key/value store ;-)
@remusb I would say the key information that needs to be cached is the directory structure. This will allow faster access from the disk and less back-and-forth reading and re-reading of everything. Bolt would probably be fine for that, but for actual disk caching you should just use local file storage. If you want some ideas from another Go project that did caching in bolt, look at https://github.com/dweidenfeld/plexdrive
I second this. The filesystem is well suited to storing cached file chunks.
Nice to see that progress is being made.
Finally, after a few good days of letting this run as a Plex encrypted permanent storage in Google Drive, I can say it works ok (at least for my needs) with a reasonably sized drive (500GB+). And yes, I completely went to disk-based storage of the chunks. And I was also thinking the same as @seuffert about switching to a RAM-based structure for the file lists. I even know rclone has this support
Avoiding bans was something I had to struggle a bit with. Plex is weird in a lot of ways. It constantly scans and, to my latest surprise, it's not always the same file parts causing rclone to hit Google Drive very often. The problem is that by default rclone will preload chunks through the cache workers, which added more gdrive hits than necessary during scans. I had to add some logic per stream to check if it is a long-running or a short-running one and adapt how rclone loads the chunks from Google Drive based on that. This also removes the need to configure rclone first for a warm-up and then disable it, which I found annoying. It also now means that playback during scans won't be affected, as playback streams will adapt to use preloading while scans will still use the warm-up one.
@zenjabba I studied Plexdrive a bit (it's a very good caching software and I applaud the creators for it) before starting on this. I think they switched to bolt before I started on this module so I might be a bit disconnected from that progress. Still, I know they were not using it for chunk storage either.
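The per-stream adaptation described above could be sketched like this; the threshold, worker count, and all names are invented for illustration and are not the actual logic in this PR:

```go
package main

import "fmt"

// streamState tracks sequential reads on one open file handle.
type streamState struct {
	bytesRead int64
}

// longRunningThreshold is a made-up cutoff: once a stream has read this
// much we treat it as playback rather than a scanner probe.
const longRunningThreshold = 32 << 20 // 32 MiB

// preloadWorkers decides how many background chunk fetchers to run:
// scanner-like short reads get none (avoiding extra API hits during
// Plex scans), while established playback streams get readahead.
func preloadWorkers(s *streamState) int {
	if s.bytesRead < longRunningThreshold {
		return 0
	}
	return 4
}

func main() {
	s := &streamState{}
	fmt.Println(preloadWorkers(s)) // 0: looks like a scan probe
	s.bytesRead = 64 << 20
	fmt.Println(preloadWorkers(s)) // 4: looks like playback
}
```

The point of the heuristic is that no up-front warm-up configuration is needed: each stream self-classifies as it reads.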
Great work!!
@naeloob yes, write is supported by default as in any other rclone backend. What I've been asked offline was to add support for writes through cache to be saved locally, thus saving the need to download the file afterwards if it's needed immediately after write.
@remusb Nice! I cannot wait to test it!!
Same here, I wonder if @ncw will set it in the v1.39 milestone. As for write/cache, it would be great if files were stored locally (e.g. the same way we use unionfs now) and then moved to the remote asap. The main problem atm is that direct copy is unreliable with bigger files, more prone to errors since it's not going through all the checks/retries etc... It would also be quite important that we could set flags for how those cached files are moved to the remote, e.g. bwlimit, number of retries etc... We should first test cached endpoints and get proper optimizations for best results. And his latest stable release with boltdb works great, I have it on 2 servers on which I can run full scans, radarr, sonarr, where each of them are opening files as well if you have media analysis enabled + plex scan, and I never got any bans. (my library is over 100TB... 5K+ movies and 20K+ episodes) @remusb really awesome you picked it up, since the first serious talks about implementing this started at the beginning of 2017
Imho read cache should be stable first. Write cache must not be a blocker for a first release.
I agree
@remusb wrote
Great news :-) Can you adjust the build tags so the build passes? According to the travis logs it looks like the build is failing on plan9, so can you disable those architectures? (As an aside, that output is really tricky to read!) You can run this locally with
Not sure I understand the question? The file lists need to be persisted so they can't be stored entirely in RAM. I'd say storing them on disk is fine - the OS will cache hot data for us. I'd like to merge the first version of this for 1.39 marked as experimental - how does that sound to you? In the future I'd like to be able to add the caching layer just by using a
Force-pushed from 04fb8b6 to d07e0a7
Ok, I would say the final first version is done. I'm ok with marking it as experimental; there's no more test data or tests that I can perform to provide any other value. Regarding file and dir info in RAM, I dropped it for now as it would just add more changes that would take a long time to implement before even releasing a first version. Bolt works fine for this use case only, but chunks (file data) are stored on disk as files (+ RAM if not disabled). Updates:
@ncw open for a review when you have the time. Thanks!
Later edit: @ncw I thought it only failed locally, but it seems that even with the plan9 ignores copied from sftp it still tries to compile for it.
Want to be the first to say I'm excited by the direction and progress!
Force-pushed from 40d718d to ae6696a
@ncw updated some of the inline comments.
The config file looks interesting. Are those values intentionally blank? It seems I just need to get a working environment for Windows, which is trickier than it seems; the most basic tests don't even start on it (backslash issues, import errors on fstests). And thanks for the extensive review, quite helpful.
@@ -66,7 +66,7 @@ import (
 	"github.com/ncw/rclone/fs"
 	"github.com/ncw/rclone/fstest/fstests"
 	"github.com/ncw/rclone/{{ .FsName }}"
-	{{ if eq .FsName "crypt" }} _ "github.com/ncw/rclone/local"
+	{{ if (or (eq .FsName "crypt") (eq .FsName "cache")) }} _ "github.com/ncw/rclone/local"
@ncw just want to flag this to your attention because it's outside the standard changes. Since `cache` also acts as a wrapper (along with `crypt`) I had to add it to this `if` to have that import added.
That looks fine - cache and crypt are similar in that respect.
Force-pushed from e9e571d to e5d0a98
Great
That sounds fine.
I'll add an fs.CacheDirectory() call at some point to be the parent for the cache, as I think according to the LSB the cache stuff is supposed to go in .cache
:-)
It is a belt and braces thing to make sure the NilObject prints properly - it is much better than a traceback in your log files!
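The belt-and-braces pattern being referred to is, in a minimal illustrative form (the `Object` type here is a stand-in, not rclone's), a String method that tolerates a nil receiver so logging a nil object prints a token instead of panicking:

```go
package main

import "fmt"

// Object is a stand-in for a remote object that appears in log lines.
type Object struct {
	remote string
}

// String handles a nil receiver explicitly: a nil *Object prints a
// friendly token instead of producing a traceback inside fmt.
func (o *Object) String() string {
	if o == nil {
		return "<nil>"
	}
	return o.remote
}

func main() {
	var o *Object
	fmt.Println(o) // prints <nil>, not a panic
}
```

fmt still calls the Stringer method on the nil pointer; the method just never dereferences it.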
I just ran through
Yes you need to use the filepath module instead of the path module for cross platform paths.
No worries! BTW I have made a start on getting mount to be able to use Read and Write handles. I've implemented a very simple cache there, but at some point I'd like to be re-using the internals from your cache backend directly.
@ncw I can't replicate the last failure in travis. I even ran the test in a loop trying to hit it, but nothing. It could be environmental, as it just means an I/O interruption during a read. Have you seen this before in other PRs?
Hmm, normally the tests are pretty reliable so it seems likely to me that it is in your code. I note that the failure is using
I've pressed the rebuild button to see if we get a different result this time!
Not to sidetrack, but are there plans later on to expand this to take advantage of the changelog endpoints, to keep this up to date outside of changes made within
Something broke somewhere and I can't pin it down. Did you add some sort of buffering in between fuse and the remotes? Was there any other major change that could help me understand what it could be? Tests pass and content matches, but media players don't seem to want to play the files anymore for reasons I can't find. Still trying to get to the bottom of it. I rebased, by the way; I suspect I was running locally on an older base than what was running in travis.
@toomuchio ideally yes. Right now cached data expires after configurable times, but eventually I would like to add support for that.
What I did was factor the VFS layer out of mountlib so I could re-use it elsewhere. This was a really big refactor so it is possible I broke stuff (though the tests all pass).
This is likely the readahead done in the Accounting layer. If you set
If the chunks are on disk, they are very likely still in the OS buffercache so are probably effectively in RAM too - I wouldn't have thought this should make very much difference?
It would probably be a good idea to measure the max latency of each request somehow - I suspect that, if anything, the players are sensitive to that. A few lines of code in the mount ReadAt command would show that. I think to make this work really well we are going to have to give mount more access to the cache internals. I'd like to do that in the future though!
Force-pushed from 31f816f to a8268d9
@ncw Ok, I found the issue. It's because I removed the SeekOption from reading on the source remote. Local doesn't support it, and naturally when I'm performing my tests I'm using that. The cache workers were essentially just downloading the first part of the file, and it broke the file completely. Tests didn't pick it up either because it happened after the first chunk, and that's MBs large and wasn't exercised anywhere. I added a test that goes over a large file byte by byte on cache to ensure it doesn't happen again. Thing is, I added both options now. Remotes that support one or the other will just work. Remotes that support both of them will be kind of unknown. I've tested with drive and it seems that RangeOption wins over SeekOption. This is in dire need of a feature to be advertised by remotes on how they support reading. I can open a PR at some point for it, but not in this one; it will touch a lot, although the risk should be minimal.
@remusb glad you fixed it! Is it ready to merge now? I made an issue about the RangeOption #1825 - your thoughts on there would be appreciated.
Looking at the code I think rclone will supply two
@ncw I don't have any other reason not to merge now. I see that the travis
The current state has been running locally on my end for the past couple of days and things look good. We might get reports of file reading not working properly across other remotes as that Range option is flaky, but we can deal with it in the issue you already opened.
Fantastic. I'm going to merge this now then!
Opening this here to continue the discussion from #711
Thanks @ncw for the suggestion. I'll reopen the PR with adequate commits once this gets to a near merge state.
To pick up from the issue:
I would be very interested. My initial idea was to allow each backend to export their objects and just manage the storage and cache retrieval from the cache backend. But then I realized that meant modifying most of the backends and FS interfaces, and I wanted to avoid a big, impactful change.
Yes, makes sense. If I understand correctly, a cache backend would be able to allow this even if the source FS wouldn't.
I actually had to do some changes to the ReadHandle which could relate to this. It's doing buffered reads and calls Open multiple times on the object to build up the buffer. While it doesn't stop it from working, I had terrible performance while doing a similar buffering on the backend.
I added a new feature flag `FileReadRaw` that basically tells `ReadFileHandle` to allow the backend to return the chunks that fuse asks for directly (latest commit has this). I assume this is what you were thinking too, no? Your naming makes more sense though.
I did have something in mind while doing that but I forgot what it was.
Anyway, I want to rewrite the structure part (listings). Bolt supports nested buckets and there's no reason not to replicate a proper folder structure in it too. That way it also makes Object storage easier.
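To show the nested-bucket layout idea without depending on Bolt itself, here is a toy model using nested maps; `bucket`, `put`, and `list` are invented names, but the structure mirrors what one nested Bolt bucket per path segment would give:

```go
package main

import (
	"fmt"
	"strings"
)

// bucket models Bolt's nested buckets with plain maps, purely to show
// the layout: one child bucket per directory component mirrors the
// remote's folder tree.
type bucket struct {
	children map[string]*bucket
	objects  map[string][]byte
}

func newBucket() *bucket {
	return &bucket{children: map[string]*bucket{}, objects: map[string][]byte{}}
}

// put walks (creating as needed) one nested bucket per directory
// component, then stores the serialized object under its base name.
func (b *bucket) put(p string, data []byte) {
	parts := strings.Split(p, "/")
	cur := b
	for _, dir := range parts[:len(parts)-1] {
		next, ok := cur.children[dir]
		if !ok {
			next = newBucket()
			cur.children[dir] = next
		}
		cur = next
	}
	cur.objects[parts[len(parts)-1]] = data
}

// list returns the entries directly inside one directory - the payoff:
// a directory listing is a single bucket scan, not a whole-DB prefix scan.
func (b *bucket) list(dir string) []string {
	cur := b
	if dir != "" {
		for _, d := range strings.Split(dir, "/") {
			cur = cur.children[d]
			if cur == nil {
				return nil
			}
		}
	}
	var out []string
	for name := range cur.children {
		out = append(out, name+"/")
	}
	for name := range cur.objects {
		out = append(out, name)
	}
	return out
}

func main() {
	root := newBucket()
	root.put("movies/a.mkv", []byte("meta"))
	root.put("movies/b.mkv", []byte("meta"))
	root.put("readme.txt", []byte("meta"))
	fmt.Println(len(root.list("movies"))) // 2
}
```

In real Bolt the same walk would use `Bucket.CreateBucketIfNotExists` per segment inside one transaction, but the tree shape is the point here.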
I'm also struggling with wrapping crypt. If I wrap crypt under cache, I get a black screen during video playback and no weird errors in the log, which suggests the bytes aren't right. If I wrap cache under crypt, I get `ErrorEncryptedBadBlock` mostly all the time. It originates from the Poly1305 one-time message authentication code failing, but I never got to understand why. I studied the crypt code as best as I could and, apart from decrypting the stream on the fly, it doesn't seem to be doing anything wrong. The same symptoms happen regardless of the caching technique (which I changed quite a few times). Oh and yeah, if I mount crypt directly or cache directly, both work, but if I wrap one in the other then it doesn't, which is even more frustrating.
Anyway, to get back to it, stuff to do: `ReadBlockAt` and `WriteBlockAt` to minimize the blast radius? @ncw
I'll probably split these and get a PR open with a first version to get some feedback