Backend cache #1585

Merged: 2 commits merged into rclone:master on Nov 15, 2017

Conversation

@remusb (Collaborator) commented Aug 7, 2017

Opening this here to continue the discussion from #711
Thanks @ncw for the suggestion. I'll reopen the PR with adequate commits once this gets to a near-merge state.

To pick up from the issue:

I see you are serializing fs.Objects. I have a plan to add a couple of optional methods to Fses to allow object serialization and deserialization - I think this will make your life easier. You then wouldn't have to call NewObject in CachedObject.RefreshObject (which can be expensive). I could bring this plan forward if you want.

I would be very interested. My initial idea was to let each backend export its objects and just manage the storage and cache retrieval from the cache backend. But then I realized that meant modifying most of the backends and Fs interfaces, and I wanted to avoid such an impactful change.

To be truly useful with mount, mount needs to be able to do non-sequential writes and reads on the same object. We could make two new optional interfaces on the Fs to support that, say ReadBlockAt and WriteBlockAt, or maybe doing it on the open handle might be easier.

Yes, makes sense. If I understand correctly, a cache backend would be able to allow this even if the source FS wouldn't.

I actually had to make some changes to the ReadHandle which could relate to this. It does buffered reads and calls Open multiple times on the object to build up the buffer. While that doesn't stop it from working, I had terrible performance doing similar buffering on the backend.
I added a new feature flag FileReadRaw that basically tells ReadFileHandle to let the backend return the chunks that fuse asks for directly (the latest commit has this).

I assume this is what you were thinking too, no? Your naming makes more sense though.
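
Going back to the ReadBlockAt/WriteBlockAt idea, a rough Go sketch of what such optional interfaces could look like (hypothetical shapes modeled on io.ReaderAt/io.WriterAt, not an actual rclone API):

```go
package fs // hypothetical placement, purely a sketch

// Hypothetical optional interfaces, using the names suggested above. A
// wrapping backend such as cache could implement them even when the
// underlying remote only supports sequential access.

// RandomReader reads len(p) bytes from the object starting at offset off.
type RandomReader interface {
	ReadBlockAt(p []byte, off int64) (n int, err error)
}

// RandomWriter writes len(p) bytes to the object starting at offset off.
type RandomWriter interface {
	WriteBlockAt(p []byte, off int64) (n int, err error)
}
```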

BTW your ListR implementation looks wrong - it doesn't seem to be recursing...

I did have something in mind while doing that but I forgot what it was.
Anyway, I want to rewrite the structure part (listings). Bolt supports nested buckets and there's no reason not to replicate a proper folder structure in it too. That also makes Object storage easier.

I'm also struggling with wrapping crypt. If I wrap crypt under cache, I get a black screen during video playback and no weird errors in the log, which suggests the bytes aren't right. If I wrap cache under crypt, I get ErrorEncryptedBadBlock almost all the time. It originates from the Poly1305 one-time message authentication code failing, but I never got to understand why.
I studied the crypt code as best I could and, apart from decrypting the stream on the fly, it doesn't seem to be doing anything wrong. The same symptoms happen regardless of the caching technique (which I changed quite a few times). Oh, and if I mount crypt directly or cache directly, both work, but when they wrap each other it doesn't, which is even more frustrating.

Anyway, to get back to it, stuff to do:

  • Since the chunks are read quite a few times, I added an option to cache in RAM too, but that doesn't work yet so it's disabled. They should expire after some time or even after the stream passes them
  • I would like to rewrite the file structure caching by using nested buckets in Bolt
  • Maybe open a different PR for ReadBlockAt and WriteBlockAt to minimize the blast radius? @ncw
  • Fix / finish feature methods
  • Review / expose more params that are currently hard coded
  • Change serialization method - @ncw, if you have something started on it or ideas laid down, send them my way
  • Hammer down this pesky crypt-cache issue
  • Tweak logging

I'll probably split these up and open a PR with a first version to get some feedback.

@remusb (Collaborator, author) commented Aug 7, 2017

Seems like I need to rebase onto master too.

@ncw (Member) commented Aug 9, 2017

I have written up my ideas for object serialization in #1589. Yes, it does involve touching every remote, but I think it is worth it. Your comments on that issue would be appreciated.

I added a new feature flag FileReadRaw that basically tells ReadFileHandle to let the backend return the chunks that fuse asks for directly (the latest commit has this).

That doesn't account for writes, does it? I think the new optional interface might be clearer, but I can be convinced either way I'm sure!

If I wrap cache under crypt, I get ErrorEncryptedBadBlock almost all the time.

This means you are corrupting the data. The crypt remote will detect any form of data corruption so it is a very sensitive test!

Maybe open a different PR for ReadBlockAt and WriteBlockAt to minimize the blast radius?

Good idea.

As far as code organization vs commits goes, what I'd like to see is a separate commit (or PR, if you think you can get them in earlier) for each logical change outside the /cache directory. So if you need to change cmd/mountlib, put that in a separate commit or PR.

Inside the /cache directory, can you squash everything into one commit at the end? Make sure you check the commit message guidelines and the following section about vendor stuff (and please put vendor changes in a separate commit).

I just had a quick look at the travis log. It is moaning about lots of places where error returns need checking, and you seem to have broken the crypt unit tests somehow! Try to get the build tests working locally if you can, so you can run make check and make quicktest - that will save you time fixing up the build problems.

@ncw (Member) left a review:

A few things I noticed with a quick skim over the code!

crypt/crypt.go Outdated
@@ -87,7 +87,7 @@ func NewFs(name, rpath string) (fs.Fs, error) {
 	remotePath := path.Join(remote, cipher.EncryptFileName(rpath))
 	wrappedFs, err := fs.NewFs(remotePath)
 	// if that didn't produce a file, look for a directory
-	if err != fs.ErrorIsFile {
+	if err != fs.ErrorIsFile && err != nil {

ncw (Member):

This is what broke the crypt unit tests!

remusb (Collaborator, author):

No longer needed, so removed.

@@ -164,5 +164,6 @@ func main() {
 	generateTestProgram(t, fns, "Box")
 	generateTestProgram(t, fns, "QingStor", buildConstraint("!plan9"))
 	generateTestProgram(t, fns, "AzureBlob", buildConstraint("go1.7"))
+	generateTestProgram(t, fns, "Cache", "")

ncw (Member):

You need to lose the , "" here - I changed the interface recently.

remusb (Collaborator, author):

done

Makefile Outdated
@@ -18,6 +18,10 @@ rclone:
 	go install -v --ldflags "-s -X github.com/ncw/rclone/fs.Version=$(TAG)" $(BUILDTAGS)
 	cp -av `go env GOPATH`/bin/rclone .
 
+rclone_linux:

ncw (Member):

This looks like a local change!

remusb (Collaborator, author):

Added by mistake; removed.

@remusb (Collaborator, author) commented Aug 31, 2017

Seems like Bolt isn't compiling on MIPS.

@remusb force-pushed the backend-cache branch 2 times, most recently from 710ca12 to 1d81a60, on August 31, 2017 21:16
@remusb (Collaborator, author) commented Aug 31, 2017

Time for a status update.

PR:

  • all changes are localized to the cache backend
  • persistent storage changed to https://github.com/coreos/bbolt for MIPS support and because it's generally better maintained
  • rebased onto latest master + fixed commits (sorry to anyone who already checked out the branch)
  • fixed the points mentioned earlier by @ncw

Now on functionality:

  • after moving to a dedicated reader, crypt now works when wrapped by cache (I'm not even testing the other way around; it's a bad choice)
  • I found a huge number of issues while running the integration tests, especially in how the file structure was handled
  • reading supports double buffering, which can be enabled via --cache-chunk-memory. If enabled, a second buffering layer works on top of the persistent storage. It surely boosts performance, but at a RAM cost (depending on how many simultaneous streams you're running and the --cache-chunk-size param); see the sketch after this list
  • all params are now configurable via CLI or in the remote config: details in the docs
  • most methods are implemented and the available features depend on what you're wrapping (I'm testing with local and drive)
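
A minimal sketch of how such a two-layer lookup could work (all names here are hypothetical, not the actual backend code): check RAM first, then the persistent store, and only download on a complete miss.

```go
package cache // illustrative placement, not the real backend package

import "sync"

// chunkCache is a hypothetical two-layer chunk cache: a RAM map in front
// of persistent storage, with the source remote as the backstop.
type chunkCache struct {
	mu    sync.Mutex
	ram   map[int64][]byte                // offset -> chunk data
	disk  func(off int64) ([]byte, error) // persistent storage lookup
	fetch func(off int64) ([]byte, error) // download from the source remote
}

func (c *chunkCache) get(off int64) ([]byte, error) {
	c.mu.Lock()
	data, ok := c.ram[off]
	c.mu.Unlock()
	if ok {
		return data, nil // fastest path: the RAM layer
	}
	if data, err := c.disk(off); err == nil {
		return data, nil // second layer: persistent storage
	}
	data, err := c.fetch(off) // complete miss: hit the source remote
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.ram[off] = data // warm the RAM layer for subsequent reads
	c.mu.Unlock()
	return data, nil
}
```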

Still pending:

  • ReadBlockAt and WriteBlockAt will be done at a later stage. I want to see the cache backend completed first, and it works without them. I'm also a bit unclear on the use case or how to test them, but let's take that out of this scope
  • Serialization method: nice to have, but similar to ReadBlockAt it will be done separately
  • Cached file info doesn't expire at the moment. Expiry is needed because changes made outside of the cache remote will not be seen otherwise
  • I'll start using this version for my use case (encrypted on drive -> cache -> Plex) to see how it behaves in practice before I submit the PR as ready for merge
  • Documentation

@remusb (Collaborator, author) commented Aug 31, 2017

Seems plan9 will fail now. @ncw would it be possible to remove support for this platform just for this backend and allow the build to pass? It's unlikely I'll be able to contribute support for it to bbolt.

@ncw (Member) commented Sep 1, 2017

Sounds like you are making great progress - well done :-)

I agree 100% with you about punting ReadBlockAt, Serialization, etc for the moment. I think we should try to merge the minimum working code first and then iterate from there.

Let me know when you are ready to merge the minimum and I'll do a thorough code review.

Keep rebasing the branch and squashing the commits where appropriate - I know it is annoying for anyone following the branch, but it will make merging so much easier!

@remusb wrote

Seems plan9 will fail now. @ncw would it be possible to remove support for this platform just for this backend and allow the build to pass? It's unlikely I'll be able to contribute support for it to bbolt.

Absolutely. Check out the sftp remote, which does the same thing. Don't forget an equivalent of the sftp_unsupported.go file, and check out fstest/fstests/gen_tests.go for how to add a build constraint to the auto-generated tests.
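
For reference, the sftp pattern is a pair of files guarded by complementary build constraints, roughly like this (sketched with the // +build syntax of the era; the cache file names are assumptions):

```go
// cache.go: the real implementation, excluded on plan9.

// +build !plan9

package cache
```

```go
// cache_unsupported.go: empty placeholder so the package still builds on plan9.

// +build plan9

package cache
```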

@remusb (Collaborator, author) commented Oct 2, 2017

Just a quick update: I need to research a better storage DB, or just store the file data on disk. Bolt isn't the ideal choice for this: the DB on disk grows beyond reasonable sizes, and since it's locked all the time during mounts, there is no way to compact the file while mounted.
More explanations here: boltdb/bolt#308 (comment)

This is after days/weeks of keeping a cache-enabled build running, and it functions just fine after adding several extra features:

  • mount a cache FS in warmup mode. This is for intensive lookups of small pieces on the source where preloading isn't needed. Think Plex initial scans
  • an rps (requests per second) limiter on workers. It isn't really needed if the previous feature is used correctly, but I got lots of bans while doing initial warmups and it was useful enough to keep around in case you want a hard limit on cold spots of the cache

I just need to move the bulk of the data to another storage mechanism and will likely push up a working version. As it is, with Bolt, this isn't usable after a couple of days of rclone running with caching.

@ncw (Member) commented Oct 11, 2017

That sounds really good, apart from the problems with bolt.

I need to research a better storage DB, or just store the file data on disk. Bolt isn't the ideal choice for this: the DB on disk grows beyond reasonable sizes, and since it's locked all the time during mounts, there is no way to compact the file while mounted.

Using the filesystem sounds like a reasonable idea to me, depending on exactly how many keys you want into the data; the filesystem is a pretty good multi-user persistent key/value store ;-)

@zenjabba commented:

@remusb I would say the key information that needs to be cached is the directory structure. This will allow faster access from disk and less back-and-forth reading and re-reading of everything. Bolt would probably be fine for that, but for actual disk caching you should just use local file storage.

If you want ideas from another Go project that did caching in bolt, look at https://github.com/dweidenfeld/plexdrive

@seuffert (Contributor) commented:

Bolt would probably be fine for that, but for actual disk caching you should just use local file storage.

I second this. The filesystem is well suited to storing cached file chunks.
I would even consider caching the structure just in memory. Mounts often exist over a long period; why bother making this persist across remounts?

@ajkis commented Oct 27, 2017

Nice to see that progress is being made.

@remusb (Collaborator, author) commented Oct 31, 2017

Finally, after a good few days of letting this run as permanent encrypted Plex storage in Google Drive, I can say it works OK (at least for my needs) with a reasonably sized drive (500GB+).

And yes, I moved completely to disk-based storage for the chunks. I was also thinking the same as @seuffert about switching to a RAM-based structure for the file lists. I know rclone has support for this, but it's somehow hard-wired to ACD from what I saw (ACD is just using the module, it isn't hardwired to it). Still, I would rather use an existing module than add a new one. And from my tests, listing isn't that expensive for mount usage, but it likely will be for one-off operations.
@ncw @zenjabba how do you feel about this?

Avoiding bans was something I had to struggle with a bit. Plex is weird in a lot of ways. It constantly scans and, to my latest surprise, it's not always the same file parts, which causes rclone to hit Google Drive very often. The problem is that by default rclone will preload chunks through the cache workers, which added more gdrive hits than necessary during scans. I had to add some logic per stream to check whether it is long-running or short-running, and adapt how rclone loads chunks from Google Drive based on that. This also removes the need to configure rclone first for a warm-up and then disable it, which I found annoying. It also means that playback during scans won't be affected, as playback streams will adapt to use preloading while scans will still use the warm-up behaviour.
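
A rough sketch of that kind of per-stream heuristic (entirely illustrative names and thresholds, not the backend's actual logic):

```go
package cache // illustrative placement

// streamStats decides per open stream whether to preload: sequential
// readers (playback) get readahead, scattered readers (scans) do not.
type streamStats struct {
	sequentialReads int   // consecutive in-order reads seen so far
	lastOffset      int64 // where the previous read ended
}

const playbackThreshold = 8 // illustrative: in-order reads before preloading kicks in

// chunksToPreload returns how many chunks to queue ahead of a read at off.
func (s *streamStats) chunksToPreload(off, chunkSize int64) int {
	if off == s.lastOffset {
		s.sequentialReads++
	} else {
		s.sequentialReads = 0 // a seek: treat as a scan until proven otherwise
	}
	s.lastOffset = off + chunkSize
	if s.sequentialReads >= playbackThreshold {
		return 4 // long-running stream: preload ahead of the reader
	}
	return 0 // warm-up behaviour: fetch only what was asked for
}
```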

@zenjabba I studied Plexdrive a bit (it's very good caching software and I applaud its creators) before starting on this. I think they switched to bolt before I started on this module, so I might be a bit disconnected from that progress. Still, I know they were not using it for chunk storage either.

@naeloob commented Oct 31, 2017

Great work!!
Will there be any type of write/upload caching?

@remusb (Collaborator, author) commented Oct 31, 2017

@naeloob yes, writes are supported by default, as with any other rclone backend. What I've been asked offline is to add support for writes through cache to be saved locally too, which saves the need to download the file afterwards if it's needed immediately after the write.
I believe the use case is torrent files, which would be used intensively for a short period of time (seed time) and can then expire their data and be read again only when needed. I'm not sure about this though, and I don't plan to use it in this manner either. But the option is there (I need to add an internal test, now that I've remembered it).

@naeloob commented Oct 31, 2017

@remusb Nice! I can't wait to test it!!
How will it handle appending data to a file? Will it download, append locally and upload on file close? Or upload on an expiry date?

@ajkis commented Oct 31, 2017

Nice! I can't wait to test it!!

Same here. I wonder if @ncw will put it in the v1.39 milestone.

As for write caching, it would be great if files were stored locally (e.g. the same way we use unionfs now) and then moved to the remote ASAP.

The main problem at the moment is that direct copy is unreliable with bigger files and more prone to errors, since it's not going through all the checks/retries etc.

It would also be quite important to be able to set flags for how those cached files are moved to the remote, e.g. bwlimit, number of retries etc.

We should first test cached endpoints and get proper optimizations for best results.
When dweidenfeld started with plexdrive there were tons of changes to get the best results and avoid gdrive bans.

And his latest stable release with boltdb works great. I have it on 2 servers on which I can run full scans, radarr, sonarr, where each of them opens files as well if you have media analysis enabled, plus Plex scans, and I never got any bans. (My library is over 100TB: 5K+ movies and 20K+ episodes.)

@remusb it's really awesome you picked this up, since the first serious talks about implementing it started at the beginning of 2017.

@seuffert (Contributor) commented:

IMHO the read cache should be stable first. A write cache must not be a blocker for a first release.

@ajkis commented Oct 31, 2017

IMHO the read cache should be stable first. A write cache must not be a blocker for a first release.

I agree

@ncw (Member) commented Nov 1, 2017

@remusb wrote

Finally, after a good few days of letting this run as permanent encrypted Plex storage in Google Drive, I can say it works OK (at least for my needs) with a reasonably sized drive (500GB+).

Great news :-)

Can you adjust the build tags so the build passes? According to the travis logs it looks like the build is failing on plan9, so can you disable those architectures? (As an aside, that output is really tricky to read!) You can run this locally with make compile_all.

And yes, I moved completely to disk-based storage for the chunks. I was also thinking the same as @seuffert about switching to a RAM-based structure for the file lists. I know rclone has support for this, but it's somehow hard-wired to ACD from what I saw (ACD is just using the module, it isn't hardwired to it). Still, I would rather use an existing module than add a new one. And from my tests, listing isn't that expensive for mount usage, but it likely will be for one-off operations.
@ncw @zenjabba how do you feel about this?

Not sure I understand the question? The file lists need to be persisted, so they can't be stored entirely in RAM. I'd say storing them on disk is fine - the OS will cache hot data for us.

I'd like to merge the first version of this for 1.39, marked as experimental - how does that sound to you?

In the future I'd like to be able to add the caching layer just by using a --cache flag or something like that which would auto configure a cache remote.

@remusb force-pushed the backend-cache branch 2 times, most recently from 04fb8b6 to d07e0a7, on November 4, 2017 20:23
@remusb (Collaborator, author) commented Nov 4, 2017

OK, I'd say the final first version is done. I'm fine with marking it as experimental; there's no more test data or tests that I can perform to provide any other value.

Regarding keeping file and dir info in RAM, I dropped it for now as it would just add more changes that would take a long time to implement before even releasing a first version. Bolt works fine for this use case alone, but chunks (file data) are stored on disk as files (+ RAM if not disabled).

Updates:

  • went through the code a couple more times to clean it up after a good amount of in-flight changes
  • rebased onto master
  • updated docs

@ncw open for a review when you have the time

Thanks

Later edit: @ncw I thought it only failed locally, but it seems that even with the plan9 ignores copied from sftp it still tries to compile for it.
I think I know what the issue is: the failures happen in a vendored package, and the unsupported flags don't apply to that, do they?

@remusb changed the title from "[DRAFT] Backend cache" to "Backend cache" on Nov 4, 2017
@danielloader (Contributor) commented:

Want to be the first to say I'm excited by the direction and progress!
Thank you!

@remusb force-pushed the backend-cache branch 3 times, most recently from 40d718d to ae6696a, on November 7, 2017 19:17
@remusb (Collaborator, author) commented Nov 7, 2017

@ncw, I updated some of the inline comments.

  • plan9 - it passes now after adding the constraint in all the files. The link from the article helped me understand this better. Thanks
  • reduce API surface - good point. I already started doing this for what you flagged and for pretty much any other method that doesn't make sense to be public. I still have doubts about the cache storage interfaces; I'm thinking someone else could use them too
  • config file: the remote name gets appended to the DB file and the path to the chunks. But I agree, having them tidied up in another folder is better, so I added cache as a parent to everything the cache remote adds to the configurable path
  • internal test run interrupted - an annoying piece that hit me too, but I never got to it. Fixed with a purge in the init
  • panic - this one is interesting. For some reason I was passing all the checks yet these tests were failing. It's an interesting test that uses the NilObject, and the String method needs to check first whether the object is nil (see the sketch below). While working on getting this to pass, I found several other issues which I had to address too
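
For illustration, the usual nil-safe String pattern looks something like this (a sketch; the receiver type and return text are assumptions):

```go
// String returns a description of the Object. The nil check matters
// because the generated tests exercise the method on a NilObject, and a
// bare o.Remote() call would panic there.
func (o *Object) String() string {
	if o == nil {
		return "<nil (Object)>"
	}
	return o.Remote()
}
```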

The config file looks interesting. Are those values blank on purpose?

It seems I just need to get a working environment for Windows, which is trickier than it sounds; the most basic tests don't even start on it (backslash issues, import errors in fstests).

And thanks for the extensive review, quite helpful.

@@ -66,7 +66,7 @@ import (
 	"github.com/ncw/rclone/fs"
 	"github.com/ncw/rclone/fstest/fstests"
 	"github.com/ncw/rclone/{{ .FsName }}"
-{{ if eq .FsName "crypt" }} _ "github.com/ncw/rclone/local"
+{{ if (or (eq .FsName "crypt") (eq .FsName "cache")) }} _ "github.com/ncw/rclone/local"

remusb (Collaborator, author), Nov 7, 2017:

@ncw I just want to flag this for your attention because it's kind of outside the standard changes. Since cache (like crypt) acts as a wrapper, I had to add it to this if to get that import added.

ncw (Member):

That looks fine - cache and crypt are similar in that respect.

@remusb force-pushed the backend-cache branch 7 times, most recently from e9e571d to e5d0a98, on November 8, 2017 11:44
@ncw (Member) commented Nov 8, 2017

plan9 - it passes now after adding the constraint in all the files. The link from the article helped me understand this better. Thanks

Great

reduce API surface - good point. I already started doing this for what you flagged and for pretty much any other method that doesn't make sense to be public. I still have doubts about the cache storage interfaces; I'm thinking someone else could use them too

That sounds fine.

config file: the remote name gets appended to the DB file and the path to the chunks. But I agree, having them tidied up in another folder is better, so I added cache as a parent to everything the cache remote adds to the configurable path

I'll add an fs.CacheDirectory() call at some point to be the parent for the cache, as I think that according to the LSB, cache stuff is supposed to go in .cache.

internal test run interrupted - an annoying piece that hit me too, but I never got to it. Fixed with a purge in the init

:-)

panic - this one is interesting. For some reason I was passing all the checks yet these tests were failing. It's an interesting test that uses the NilObject, and the String method needs to check first whether the object is nil. While working on getting this to pass, I found several other issues which I had to address too

It is a belt-and-braces thing to make sure the NilObject prints properly - it is much better than a traceback in your log files!

The config file looks interesting. Are those values blank on purpose?

I just ran through rclone config pressing enter at each value, like I normally do, expecting to get the default value. I would expect an empty (or non-existent) value in the config file to use the default. It looks like your code should do this though?

It seems I just need to get a working environment for Windows, which is trickier than it sounds; the most basic tests don't even start on it (backslash issues, import errors in fstests).

Yes, you need to use the filepath module instead of the path module for cross-platform paths.
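
The difference in one small program (standard library behaviour, shown for reference):

```go
package main

import (
	"fmt"
	"path"          // always uses forward slashes
	"path/filepath" // uses the OS-specific separator
)

func main() {
	fmt.Println(path.Join("cache", "chunks", "file.bin"))     // cache/chunks/file.bin everywhere
	fmt.Println(filepath.Join("cache", "chunks", "file.bin")) // cache\chunks\file.bin on Windows
}
```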

And thanks for the extensive review, quite helpful.

No worries!

BTW I have made a start on getting mount to be able to use Read and Write handles. I've implemented a very simple cache there, but at some point I'd like to be re-using the internals from your cache backend directly.

@remusb (Collaborator, author) commented Nov 8, 2017

@ncw I can't replicate the last failure from travis. I even ran the test in a loop trying to hit it, but got nothing. It could be environmental, as it just indicates an I/O interruption during a read.

Have you seen this before in other PRs?
If not, I'm concerned that it may be transient, which means it could affect other PRs.
If yes, can I retest it somehow?

@ncw (Member) commented Nov 9, 2017

Hmm, normally the tests are pretty reliable, so it seems likely to me that the problem is in your code. I note that the failure is with go test -cpu=2 -race - did you try that? Have you got more than one CPU on your dev machine?

I've pressed the rebuild button to see if we get a different result this time!

@toomuchio (Contributor) commented:

Not to sidetrack, but are there plans later on to expand this to take advantage of the changelog endpoints, so the cache stays up to date with changes made outside of rclone mount?

@remusb (Collaborator, author) commented Nov 11, 2017

Something broke somewhere and I can't pin it down. Did you add some sort of buffering between fuse and the remotes? Was there any other major change that could help me understand what it could be?

Tests pass and the content matches, but media players don't seem to want to play the files anymore, for reasons I can't find. I'm still trying to get to the bottom of it. I rebased, by the way; I suspect I was running locally on an older base than what was running in travis.
The behaviour I'm seeing:
These are all offsets, not sizes. ReadFileHandle.Read is directly on the mount handle while everything else is from cache. You can see how it's kind of disconnected.
Something is reading from cache way in the future, and at some points the FS handle seeks to where it actually is and everything has to pick up from a cold start again (roughly a cold start: the chunks are already on disk, just not in RAM). I suspect these disconnects cause the players to panic, as I saw this often before adding the RAM support for streams.

2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 8k
2017/11/11 15:11:39 ERROR : small/file.mkv: seeked 8k
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 0
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 5M
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 10M
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 15M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 12k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 20k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 20k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 36k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 68k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 28k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 132k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 260k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 516k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 1.004M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 2.004M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 3.004M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 4.004M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 5M
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 20M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 5.004M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 36k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 6.004M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 7.004M
2017/11/11 15:11:39 ERROR : small/file.mkv: seeked 36k
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 0
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 40k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 48k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 64k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 96k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 160k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 288k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 544k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 1.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 2.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 3.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 4.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 5M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 5.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 6.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 7.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 8.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 9.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 40k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 10M
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 25M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 10.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 48k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 11.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 60k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 12.031M
2017/11/11 15:11:39 ERROR : worker-2 <file.mkv>: downloaded 25M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 68k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 13.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 136k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 14.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 204k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 272k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 15M
2017/11/11 15:11:39 ERROR : small/file.mkv: queueing 30M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 15.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 340k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 408k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 472k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 540k
2017/11/11 15:11:39 ERROR : worker-1 <file.mkv>: downloaded 30M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 16.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 608k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 676k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 744k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 812k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 880k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 944k
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1012k
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 17.031M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.055M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.121M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.188M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.254M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.316M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.383M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.449M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.516M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.582M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.648M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.715M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.777M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.844M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.910M
2017/11/11 15:11:39 ERROR : small/file.mkv: ReadFileHandle.Read offset 1.977M
2017/11/11 15:11:39 ERROR : small/file.mkv: reading 18.031M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.043M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.109M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.172M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.238M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.305M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.371M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.438M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.504M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.570M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.633M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.699M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.766M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.832M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.898M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 2.965M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.027M
2017/11/11 15:11:40 ERROR : small/file.mkv: reading 19.031M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.094M
2017/11/11 15:11:40 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.160M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.227M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.293M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.359M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.426M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.488M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.555M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.621M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.688M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.754M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.820M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.883M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 3.949M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 4M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.016M
2017/11/11 15:11:41 ERROR : small/file.mkv: reading 20M
2017/11/11 15:11:41 ERROR : small/file.mkv: queueing 35M
2017/11/11 15:11:41 ERROR : small/file.mkv: reading 20.031M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.066M
2017/11/11 15:11:41 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.133M
2017/11/11 15:11:41 ERROR : worker-0 <file.mkv>: downloaded 35M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.199M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.266M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.332M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.398M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.461M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.527M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.594M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.660M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.727M
2017/11/11 15:11:42 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.793M
2017/11/11 15:11:43 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.859M
2017/11/11 15:11:43 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.922M
2017/11/11 15:11:43 ERROR : small/file.mkv: ReadFileHandle.Read offset 4.988M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 21.031M
2017/11/11 15:11:43 ERROR : small/file.mkv: ReadFileHandle.Read offset 6.402M
2017/11/11 15:11:43 ERROR : small/file.mkv: seeked 6.402M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 0
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 5M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 10M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 15M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 6.406M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 6.414M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 6.430M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 6.461M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 6.523M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 6.648M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 6.898M
2017/11/11 15:11:43 ERROR : small/file.mkv: ReadFileHandle.Read offset 119.168M
2017/11/11 15:11:43 ERROR : small/file.mkv: reading 7.398M
2017/11/11 15:11:43 ERROR : small/file.mkv: seeked 119.168M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 110M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 115M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 120M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 125M
2017/11/11 15:11:43 ERROR : small/file.mkv: queueing 130M
2017/11/11 15:11:43 ERROR : worker-3 <file.mkv>: downloaded 115M
2017/11/11 15:11:43 ERROR : worker-1 <file.mkv>: downloaded 125M
2017/11/11 15:11:43 ERROR : worker-0 <file.mkv>: downloaded 120M
2017/11/11 15:11:43 ERROR : worker-2 <file.mkv>: downloaded 110M
2017/11/11 15:11:43 ERROR : worker-3 <file.mkv>: downloaded 130M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 119.172M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 119.180M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 119.195M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 119.227M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 119.289M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 119.414M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 119.664M
2017/11/11 15:11:44 ERROR : small/file.mkv: ReadFileHandle.Read offset 276.527M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 120M
2017/11/11 15:11:44 ERROR : small/file.mkv: queueing 135M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 120.164M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 121.164M
2017/11/11 15:11:44 ERROR : small/file.mkv: reading 122.164M
2017/11/11 15:11:44 ERROR : small/file.mkv: seeked 276.527M
2017/11/11 15:11:44 ERROR : small/file.mkv: queueing 270M
2017/11/11 15:11:44 ERROR : small/file.mkv: queueing 275M
2017/11/11 15:11:44 ERROR : small/file.mkv: queueing 280M
2017/11/11 15:11:44 ERROR : small/file.mkv: queueing 285M
2017/11/11 15:11:44 ERROR : small/file.mkv: queueing 290M
2017/11/11 15:11:44 ERROR : worker-1 <file.mkv>: downloaded 135M
2017/11/11 15:11:44 ERROR : worker-3 <file.mkv>: downloaded 280M
2017/11/11 15:11:44 ERROR : worker-2 <file.mkv>: downloaded 270M
2017/11/11 15:11:44 ERROR : worker-0 <file.mkv>: downloaded 275M
2017/11/11 15:11:44 ERROR : worker-1 <file.mkv>: downloaded 285M
2017/11/11 15:11:44 ERROR : worker-3 <file.mkv>: downloaded 290M

2017/11/11 15:11:45 ERROR : small/file.mkv: reading 276.531M
2017/11/11 15:11:45 ERROR : small/file.mkv: ReadFileHandle.Read offset 276.531M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 276.539M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 276.555M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 276.586M
2017/11/11 15:11:45 ERROR : small/file.mkv: ReadFileHandle.Read offset 278.242M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 276.648M
2017/11/11 15:11:45 ERROR : small/file.mkv: seeked 278.242M
2017/11/11 15:11:45 ERROR : small/file.mkv: queueing 270M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 278.246M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 278.254M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 278.270M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 278.301M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 278.363M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 278.488M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 278.738M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 279.238M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 280M
2017/11/11 15:11:45 ERROR : small/file.mkv: queueing 295M
2017/11/11 15:11:45 ERROR : small/file.mkv: reading 280.238M

@toomuchio ideally, yes. Right now cached data expires after configurable times, but eventually I would like to add support for that.

@ncw (Member) commented Nov 11, 2017

Something broke somewhere and I can't pin it down. Did you add some sort of buffering between fuse and the remotes? Was there any other major change that could help me understand what it could be?

What I did was factor the VFS layer out of mountlib so I could re-use it elsewhere. This was a really big refactor so it is possible I broke stuff (though the tests all pass).

Something is reading from cache way in the future

This is likely the readahead done in the Accounting layer. If you set --buffer-size 0 does it disappear?

and at some points the FS handle seeks where it is and everything has to pickup from a cold start again (roughly cold start, the chunks are already on disk, just not in the RAM).

If the chunks are on disk, they are very likely still in the OS buffer cache so are probably effectively in RAM too - I wouldn't have thought this should make very much difference?

I suspect that these disconnects causes the players to panic as I have seen this often before adding the RAM support for streams.

It would probably be a good idea to measure the max latency of each request somehow - I suspect that, if anything, the players are sensitive to that. A few lines of code in the mount ReadAt command would show it.
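
Something along these lines would do it (a sketch; the placement, handle type and logging details are placeholders, not the actual mountlib code):

```go
package mountlib // illustrative placement only

import (
	"io"
	"log"
	"time"
)

// maxReadLatency tracks the worst-case read latency seen so far.
var maxReadLatency time.Duration

// timedReadAt wraps a ReadAt-style call and logs whenever the maximum
// observed latency grows. Purely illustrative instrumentation.
func timedReadAt(r io.ReaderAt, p []byte, off int64) (int, error) {
	start := time.Now()
	n, err := r.ReadAt(p, off)
	if d := time.Since(start); d > maxReadLatency {
		maxReadLatency = d
		log.Printf("new max ReadAt latency: %v at offset %d", d, off)
	}
	return n, err
}
```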

I think to make this work really well we are going to have to give mount more access to the cache internals. I'd like to do that in the future though!

@remusb (Collaborator, author) commented Nov 12, 2017

@ncw OK, I found the issue. It's because I removed the SeekOption from reads on the source remote. Local doesn't support it, and naturally that's what I'm using when performing my tests. The cache workers were essentially just downloading the first part of the file, which broke the file completely.

The tests didn't pick it up either, because it happened after the first chunk, and that's MBs large so it wasn't exercised anywhere. I added a test that goes over a large file byte by byte through cache to ensure it doesn't happen again.

The thing is, I now send both options. Remotes that support one or the other will just work. Remotes that support both of them are kind of an unknown. I've tested with drive and it seems that RangeOption wins over SeekOption. This is in dire need of a feature advertised by remotes describing how they support reading. I can open a PR for it at some point, just not in this one; it will touch a lot, although the risk should be minimal.
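
A minimal sketch of what sending both options looks like (based on the rclone Open API of that era, which took variadic fs.OpenOption values; treat the exact signature and option shapes as assumptions, and the fs import is github.com/ncw/rclone/fs):

```go
// openChunk is a sketch: open the source object at a chunk boundary,
// passing both a SeekOption and a RangeOption so a remote supporting
// either one can honour the offset. Which option wins when a remote
// supports both is backend-dependent (drive appeared to prefer
// RangeOption).
func openChunk(o fs.Object, offset, end int64) (io.ReadCloser, error) {
	return o.Open(
		fs.SeekOption{Offset: offset},
		fs.RangeOption{Start: offset, End: end},
	)
}
```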

@ncw (Member) commented Nov 15, 2017

@remusb glad you fixed it!

Is it ready to merge now?

I made an issue about the RangeOption #1825 - your thoughts on there would be appreciated.

Remotes that support both of them are kind of an unknown. I've tested with drive and it seems that RangeOption wins over SeekOption.

Looking at the code, I think rclone will supply two Range: headers - I have no idea what will happen in that case! It depends on the server...

@remusb (Collaborator, author) commented Nov 15, 2017

@ncw I don't have any other reason not to merge now. I see that the travis "Unexpected EOF" error stopped appearing after the latest batch of fixes. If it does appear again in another PR, I would want to disable the internal tests and isolate the troubleshooting.

The current state has been running locally on my end for the past couple of days and things look good.

We might get reports of file reading not working properly across other remotes, as that Range option is flaky, but we can deal with that in the issue you already opened.

@ncw (Member) commented Nov 15, 2017

Fantastic.

I'm going to merge this now then!

@ncw merged commit 016abf8 into rclone:master on Nov 15, 2017