
make writes more reliable through cache #1936

Closed
remusb opened this issue Dec 22, 2017 · 9 comments

Comments

@remusb
Collaborator

commented Dec 22, 2017

This is to track future work on cache to make uploads and general writes through cache more fault-tolerant.

Ideas I have so far:

  1. cache will add the files to its storage and make them available for reading once that is done
  2. in the background, it will use rclone's syncing functionality to reliably upload each file to its originally intended destination
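A minimal Go sketch of the two steps above (Go because rclone is written in it). It assumes a hypothetical `Uploader` interface in place of rclone's real upload/sync code and keeps staged files in memory for brevity, so it illustrates the idea rather than how the cache backend actually implements it. A write becomes readable as soon as it is staged, and a background goroutine drops the staged copy only after a successful upload.

```go
package main

import (
	"fmt"
	"sync"
)

// Uploader is a hypothetical stand-in for rclone's upload/sync code.
type Uploader interface {
	Upload(path string, data []byte) error
}

type entry struct {
	data []byte
	gen  uint64 // bumped on every write so a stale upload never drops newer data
}

// WriteBack stages writes locally and uploads them in the background.
type WriteBack struct {
	mu     sync.Mutex
	staged map[string]entry
	gen    uint64
	queue  chan string
	up     Uploader
	wg     sync.WaitGroup
}

func NewWriteBack(up Uploader) *WriteBack {
	wb := &WriteBack{staged: map[string]entry{}, queue: make(chan string, 64), up: up}
	wb.wg.Add(1)
	go wb.worker()
	return wb
}

// Write is step 1: stage the data so it is readable at once, then queue the upload.
func (wb *WriteBack) Write(path string, data []byte) {
	wb.mu.Lock()
	wb.gen++
	wb.staged[path] = entry{append([]byte(nil), data...), wb.gen}
	wb.mu.Unlock()
	wb.queue <- path
}

// Read serves a file from staging while its upload is still pending.
func (wb *WriteBack) Read(path string) ([]byte, bool) {
	wb.mu.Lock()
	defer wb.mu.Unlock()
	e, ok := wb.staged[path]
	return e.data, ok
}

// worker is step 2: upload in the background and drop the staged copy only on success.
func (wb *WriteBack) worker() {
	defer wb.wg.Done()
	for path := range wb.queue {
		wb.mu.Lock()
		e, ok := wb.staged[path]
		wb.mu.Unlock()
		if !ok || wb.up.Upload(path, e.data) != nil {
			continue // already uploaded earlier, or failed; a real implementation would retry
		}
		wb.mu.Lock()
		if cur, ok := wb.staged[path]; ok && cur.gen == e.gen {
			delete(wb.staged, path)
		}
		wb.mu.Unlock()
	}
}

// Close stops accepting writes and waits for queued uploads to finish.
func (wb *WriteBack) Close() {
	close(wb.queue)
	wb.wg.Wait()
}

// memUploader is an in-memory fake; each upload blocks on gate until gate is closed.
type memUploader struct {
	files map[string][]byte
	gate  chan struct{}
}

func (m *memUploader) Upload(path string, data []byte) error {
	<-m.gate
	m.files[path] = append([]byte(nil), data...) // only the single worker goroutine writes here
	return nil
}

func main() {
	up := &memUploader{files: map[string][]byte{}, gate: make(chan struct{})}
	wb := NewWriteBack(up)
	wb.Write("movies/a.mkv", []byte("hello"))
	d, ok := wb.Read("movies/a.mkv")
	fmt.Println(string(d), ok) // hello true: readable before the upload finishes
	close(up.gate)             // let the background upload proceed
	wb.Close()
	_, ok = wb.Read("movies/a.mkv")
	fmt.Println(string(up.files["movies/a.mkv"]), ok) // hello false
}
```

The generation counter covers a file that is rewritten while its previous version is still uploading: the older upload finishing must not remove the newer staged data before it too has been uploaded.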

A couple of things I can think of:

  1. Cache will now be responsible for data. Before this, data was only manipulated in response to user actions, but now cache is going to do the backup to the cloud provider on its own.
  2. Naturally, the temporary storage for files yet to be uploaded must be independent of the cache storage. This ensures that point 1 is (partially) met and that pending uploads can be resumed if rclone crashes for some reason.
  3. Point 2 might double the disk space a file takes up. It's safer but more costly. I need to think about this a bit more.
  4. Folder merges, overwrites, etc.: none of the cases that require user input would be possible in the background. At the same time, I would like cache to throttle the backups so that they don't put pressure on the cloud provider.
@zenjabba

commented Dec 22, 2017

On point 3: disk space isn't a major issue these days, given how much we already store in the cloud. I would suggest doing things the "safe" way.
This was referenced Jan 28, 2018

@remusb remusb added this to the v1.40 milestone Jan 30, 2018

@remusb

Collaborator Author

commented Jan 31, 2018

@JAC2703 wrote:

> Thanks, this behaves strangely with `--cache-writes` in that nothing seems to end up in the `--cache-tmp-upload-path` directory, but it does end up in the `--cache-db-path` directories.
>
> I guess I shouldn't really be using `--cache-writes`? I think I want to, though: although the data is duplicated for a while, they are different caches. For example, when the upload completes (after the specified time) the file is deleted, but I may still want it cached as part of the main cache, if that makes any sense at all?! ;)

Actually it's either `--cache-writes` or `--cache-tmp-upload-path`. `--cache-tmp-upload-path` has priority. Basically, if you specify `--cache-tmp-upload-path`, then `--cache-writes` is ignored at this point.

The temporary upload storage does the same job as `--cache-writes`: reads are served directly from it too, so there isn't much point in the added complexity of keeping two cached copies in sync (which would hurt performance).

But I do see your point. I actually considered this when starting the feature, and I partially agreed that caching the data even while it is in temporary storage would be OK, but I dropped the idea.

I'll give it some more thought and maybe leave it open for a discussion here.

@JAC2703

commented Jan 31, 2018

It's an interesting one!

It seems a shame to 'lose' the data held in the upload cache once the upload has completed, especially if I'm explicitly passing `--cache-writes`. To me, at the point the upload cache item is removed, it should be transferred to the cache backend. Ideally there would be no performance penalty, because any changes would be managed by the upload cache until the cutover. I can, however, see a performance problem during the cutover, as I suspect (sorry, I haven't read the code to understand the mechanism) you'd have to convert the upload-cached item into chunks to match the cache backend format.

> Actually it's either `--cache-writes` or `--cache-tmp-upload-path`. `--cache-tmp-upload-path` has priority. Basically if you specify `--cache-tmp-upload-path` then `--cache-writes` is ignored at this point.

What I noticed with the beta v99 was that when I specified both, only the backend cache directory was filling up and nothing was going into the upload cache. I might have been looking at it boss-eyed, so I'll double-check.

@danielloader

Contributor

commented Mar 2, 2018

Instead of opening a new issue I thought I'd ask here:

Is it expected behaviour that the upload cache deletes the local file from the upload cache directory but leaves the empty directory tree behind? After weeks of cached writes, the directory tree is now quite big and full of empty dirs. I'm running `find /path/to/uploadcache -type d -empty -delete` from cron, but was wondering if this is the best way to do it or if it's meant to be handled by the cache itself.

@zenjabba

commented Mar 2, 2018

@danielloader

Contributor

commented Mar 2, 2018

I wouldn't say I'm worried, but when I'm running `watch` with `tree` to monitor what's in the cache, it'd be nice not to have the empty directories cluttering the output.

@remusb

Collaborator Author

commented Mar 2, 2018

Yep, that is annoying, I know. Especially in their encrypted form: if I want to check a file, it's damn near impossible to locate it.

It should be fairly fast and easy to clean up empty folders after a file is uploaded. I'll look into it after I finish the round of changes I have lined up.

@danielloader

Contributor

commented Mar 2, 2018

Just want to say you're doing great work, and it's obviously a low-urgency issue.

I tried wrapping the cacheupload dir with a crypt mount so I could see what's being uploaded, but it was spotty at best.

@remusb

Collaborator Author

commented Mar 15, 2018

I'm closing this issue for 1.40.

The dir cleanup will be handled in a new issue after 1.40 is released.

@remusb remusb closed this Mar 15, 2018

4 participants