
make writes more reliable through cache #1936

Closed
remusb opened this issue Dec 22, 2017 · 9 comments

@remusb
Collaborator

remusb commented Dec 22, 2017

This is to track future work on cache to make uploads and general writes through cache more fault tolerant.

Ideas I have so far:

  1. cache will add the files to its local storage and make them available for reading once that is done
  2. in the background, it will use rclone's syncing functionality to reliably upload each file to the destination it was originally intended for
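The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not rclone code: it assumes a local staging directory and a hypothetical `upload(local_path, remote_name)` callable standing in for rclone's sync machinery. A write lands in staging and is readable immediately; a background thread then uploads it and deletes the staged copy only after a successful upload, so a failed upload leaves the file on disk to be retried.

```python
import queue
import threading
from pathlib import Path


class WriteThroughCache:
    """Hypothetical sketch: stage writes locally, upload them in the background.

    `upload` stands in for rclone's transfer logic; it is an assumption for
    illustration, not an rclone API.
    """

    def __init__(self, staging_dir, upload):
        self.staging_dir = Path(staging_dir)
        self.upload = upload          # callable(local_path, remote_name) -> None
        self.pending = queue.Queue()  # (local_path, remote_name) awaiting upload
        self.failed = []              # staged files kept on disk for a later retry
        threading.Thread(target=self._run, daemon=True).start()

    def write(self, remote_name: str, data: bytes) -> Path:
        # Step 1: store the file locally; it is readable as soon as this returns.
        local = self.staging_dir / remote_name
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(data)
        # Step 2: queue it for background upload to its original destination.
        self.pending.put((local, remote_name))
        return local

    def read(self, remote_name: str) -> bytes:
        # While an upload is pending, reads are served from the staged copy.
        return (self.staging_dir / remote_name).read_bytes()

    def _run(self):
        while True:
            local, remote_name = self.pending.get()
            try:
                self.upload(local, remote_name)
                local.unlink()  # remove the staged copy only after a successful upload
            except Exception:
                # Leave the staged file in place so it can be retried (e.g. after a restart).
                self.failed.append((local, remote_name))
            finally:
                self.pending.task_done()
```

Keeping the staged file until the upload succeeds is what would let uploads resume after a crash, at the cost of holding a local copy in the meantime.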

A couple of things I can think of:

  1. Cache will now be responsible for the data. Before this, data was only manipulated in response to user actions, but now cache will back it up to the cloud provider on its own.
  2. Naturally, the temporary storage for files yet to be uploaded must be independent of the cache storage. This ensures that 1 is (partially) met and that pending uploads can be resumed if rclone crashes for some reason.
  3. 2 might double the disk space a file takes up. It's safer but more costly; I need to think about this a bit more.
  4. Folder merges, overwrites, etc.: none of the cases that require user input would be possible in the background. At the same time, I would like cache to throttle the uploads so they don't put pressure on the cloud provider.
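The throttling in point 4 could be as simple as enforcing a minimum gap between upload starts. A minimal sketch, assuming a hypothetical `UploadThrottle` class and `min_interval` parameter (neither is an rclone option); the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class UploadThrottle:
    """Hypothetical sketch: space background uploads at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last_start = None  # when the previous upload was allowed to start

    def wait(self) -> float:
        """Block until the next upload may start; return how long we waited."""
        now = self.clock()
        waited = 0.0
        if self._last_start is not None:
            remaining = self._last_start + self.min_interval - now
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
                now += remaining
        self._last_start = now
        return waited
```

A background upload worker would call `wait()` before each transfer; a fixed interval is the simplest policy, and a token bucket would be the natural next step if bursts should be allowed.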
@zenjabba

  1. Disk space isn't a major issue these days given we have so much being stored in the cloud. I would suggest doing things the "safe" way.

This was referenced Jan 28, 2018
@remusb remusb added this to the v1.40 milestone Jan 30, 2018
@remusb
Collaborator Author

remusb commented Jan 31, 2018

@JAC2703

> Thanks - this behaves strangely with `--cache-writes`, in that nothing seems to end up in the `--cache-tmp-upload-path` directory, but it does end up in the `--cache-db-path` directories.
>
> I guess I shouldn't really be using `--cache-writes`? I think I want to, though: although it's duplication for a while, they are different caches. For example, when the upload completes (after the specified time) the file is deleted, but I may still need it cached as part of the main cache, if that makes any sense at all?! ;)

Actually, it's either `--cache-writes` or `--cache-tmp-upload-path`; `--cache-tmp-upload-path` has priority.
Basically, if you specify `--cache-tmp-upload-path`, then `--cache-writes` is ignored at this point.

The temporary upload storage does the same thing as `--cache-writes`. Reads are served directly from it too, so there's not much point in the complexity of keeping two cached copies of the data in sync (which would result in poor performance).
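The precedence rule described here fits in a few lines. This is a hypothetical Python helper for illustration only: the `CacheOptions` fields simply mirror the two flags, and treating "neither flag set" as writing straight through to the wrapped remote is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheOptions:
    # Illustrative stand-ins for the two flags discussed above.
    cache_writes: bool = False                   # --cache-writes
    cache_tmp_upload_path: Optional[str] = None  # --cache-tmp-upload-path


def write_mode(opts: CacheOptions) -> str:
    """Hypothetical helper: which write path applies under the rule above."""
    if opts.cache_tmp_upload_path:
        # The temporary upload storage takes priority; --cache-writes is ignored.
        return "tmp-upload"
    if opts.cache_writes:
        return "cache-writes"
    # Assumption: with neither flag, writes go straight to the wrapped remote.
    return "direct"
```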

But I do see your point. I actually considered this when I started on this feature, and partially agreed that caching the data even while it is in temporary storage would be OK, but I dropped the idea.

I'll give it some more thought and maybe leave it open for a discussion here.

@JAC2703

JAC2703 commented Jan 31, 2018

It's an interesting one!

It seems a shame to lose the data held in the upload cache once the upload has completed, especially if I'm explicitly passing `--cache-writes`. To me, at the point the upload cache item is removed, it should be transferred to the cache backend. Ideally there would be no performance penalty, because any changes would be managed by the upload cache until the cut-over. I can, however, see a performance problem during the cut-over, as I suspect (sorry, I haven't read the code to understand the mechanism) you'd have to convert the upload-cached item into chunks to match the cache backend format.
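The cut-over conversion guessed at above amounts to re-reading the finished file and splitting it into fixed-size pieces. A minimal sketch, assuming a hypothetical chunk store where each chunk is a file named by its starting byte offset; this layout and the `chunk_size` value are illustrative, not the cache backend's actual on-disk format.

```python
from pathlib import Path


def split_into_chunks(src: Path, chunk_dir: Path, chunk_size: int) -> list:
    """Hypothetical sketch: copy a completed upload into offset-named chunk files.

    Returns the starting offsets of the chunks that were written.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk_dir.mkdir(parents=True, exist_ok=True)
    offsets = []
    with src.open("rb") as f:
        offset = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break  # end of file
            (chunk_dir / str(offset)).write_bytes(data)
            offsets.append(offset)
            offset += len(data)
    return offsets
```

This reads and writes the whole file once, which is the performance cost the comment anticipates during the cut-over.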

> Actually it's either `--cache-writes` or `--cache-tmp-upload-path`. `--cache-tmp-upload-path` has priority. Basically if you specify `--cache-tmp-upload-path` then `--cache-writes` is ignored at this point.

What I noticed with the beta v99 was that when I specified both, only the backend cache directory was filling up and nothing was going into the upload cache. I might have been looking at it boss-eyed, so I'll double-check.

@danielloader
Contributor

danielloader commented Mar 2, 2018

Instead of opening a new issue I thought I'd ask here:

Is it intended that the upload cache deletes the local file in the upload cache directory but leaves the empty directory tree behind? After weeks of cached writes, the directory tree is quite big and full of empty dirs. I'm running `find /path/to/uploadcache -type d -empty -delete` from cron, but was wondering if this is the best way to do it or if it's meant to be handled by the cache itself.

@zenjabba

zenjabba commented Mar 2, 2018 via email

@danielloader
Contributor

I wouldn't say I'm worried, but when I'm running a `watch` command with `tree` to monitor what's in the cache, it'd be nice not to have the empty directories cluttering the output.

@remusb
Collaborator Author

remusb commented Mar 2, 2018

Yep, that is annoying, I know. Especially in their encrypted form: if I want to check a file, it's damn near impossible to locate it.

It should be fairly fast and easy to clean up empty folders after a file is uploaded. I'll look into it after I finish the round of changes I have lined up.
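Cleaning up after each upload could mean removing the now-empty parent folders of the uploaded file, stopping at the upload root. A minimal sketch with a hypothetical `prune_empty_parents` helper (not rclone code); because `rmdir` only succeeds on empty directories, folders that still hold pending uploads are left alone.

```python
from pathlib import Path


def prune_empty_parents(uploaded_file: Path, root: Path) -> list:
    """Hypothetical sketch: after an uploaded file has been removed from the
    upload directory, delete the parent folders it left empty, stopping at `root`.

    Returns the directories that were removed, deepest first.
    """
    removed = []
    root = root.resolve()
    parent = uploaded_file.resolve().parent
    # Climb upwards while we are still strictly inside the upload root.
    while parent != root and root in parent.parents:
        try:
            parent.rmdir()  # only succeeds if the directory is empty
        except OSError:
            break  # not empty (or not removable): stop climbing
        removed.append(parent)
        parent = parent.parent
    return removed
```

Unlike the cron job with `find ... -empty -delete`, this only touches the folders along one file's path, so it stays cheap on a large upload tree. A real implementation would still need to coordinate with a concurrent write that is about to create a file in one of those folders.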

@danielloader
Contributor

Just want to say you're doing great work and it's obviously a low urgency issue.

I tried wrapping the cacheupload dir with the crypt mount so I could see what's being uploaded but it was spotty at best.

@remusb
Collaborator Author

remusb commented Mar 15, 2018

I'm closing this issue for 1.40.

The directory cleanup will be handled in a new issue after 1.40 is released.

@remusb remusb closed this as completed Mar 15, 2018
4 participants