
Add hasher backend and KV store #5587

Merged
merged 5 commits into from
Oct 20, 2021

Conversation

@ivandeex (Member) commented Sep 9, 2021

What is the purpose of this change?

This PR adds a new hasher backend, following the preliminary design proposal in #949 (comment). The actual design is described below and meets the following goals:

  1. Fix slow hashing of large files on LocalFS, FTP, SFTP
  2. Emulate hash types unimplemented by backends
  3. Warm up checksum cache from external SUM files for large pre-hashed data arrays
  4. Implemented as a transitive backend because VFS does NOT need checksums but CLI operations DO

Getting Started

So you want to cache or otherwise handle checksums for existing myRemote:path or /local/path?
Install patched rclone.
Read documentation.
Open ~/.config/rclone.conf in a text editor and add new section(s):

[Hasher]
type = hasher
remote = myRemote:path
hashes = md5
max_age = 30d

[Hasher2]
type = hasher
remote = /local/path
hashes = dropbox,sha1
max_age = 24h

The backend takes the following parameters:

  • remote (like alias, required)
  • hashes - comma-separated list of hash types (default md5,sha1)
  • max_age - maximum time to keep a checksum value, e.g. 1h30m or 30d;
    0 disables caching completely,
    off caches "forever" (i.e. until the files change)

UPDATE: hash_names has been renamed to hashes; values can be separated only by commas (blanks are no longer allowed)

Use it as Hasher2:subdir/file instead of base remote. Hasher will transparently update cache with new checksums when a file is fully read or overwritten, like:

rclone copy External:path/file Hasher:dest/path

rclone cat Hasher:path/to/file > /dev/null

The way to refresh all cached checksums (even those unsupported by the base backend) for a subtree is to re-download all files in the subtree. For example, run hashsum --download with any supported hash type on the command line (the point is just to re-read the data):

rclone hashsum MD5 --download Hasher:path/to/subtree > /dev/null

rclone backend dump Hasher:path/to/subtree

How It Works

rclone hashsum (or md5sum or sha1sum):

  1. if the requested hash is supported by the lower level, just pass the request through.
  2. if the object size is below auto_size, download the object and calculate the requested hashes on the fly.
  3. if the hash is unsupported and the size is big enough, build the object fingerprint (including size, modtime if supported, and the first-found other hash if any).
  4. if a strict match is found in the cache for the requested remote, return the stored hash.
  5. if the remote is found but the fingerprint mismatches, purge the entry and proceed to step 6.
  6. if the remote is not found, its entry lacks the requested hash type, or we arrive here via step 5: download the object, calculate all supported hashes on the fly, store them in the cache, and return the requested hash.
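The six steps above can be sketched in Go. This is an illustrative model only, with hypothetical names (object, cacheEntry, getHash), not the backend's actual code:

```go
package main

import "fmt"

type object struct {
	remote  string
	size    int64
	modtime string
}

type cacheEntry struct {
	fingerprint string
	hashes      map[string]string
}

// fingerprint mirrors step 3: size plus modtime (a first-found base-backend
// hash could be appended when available).
func fingerprint(o object) string {
	return fmt.Sprintf("%d,%s", o.size, o.modtime)
}

// getHash walks steps 1-6. download stands in for reading the object and
// hashing it on the fly with every supported hash type.
func getHash(o object, hashType string, supported bool, autoSize int64,
	cache map[string]cacheEntry, download func(object) map[string]string) string {
	if supported { // step 1: base backend already supports this hash
		return download(o)[hashType]
	}
	if o.size < autoSize { // step 2: small object, hash on the fly
		return download(o)[hashType]
	}
	fp := fingerprint(o) // step 3
	if e, ok := cache[o.remote]; ok {
		if e.fingerprint == fp { // step 4: strict match, use stored hash
			if h, found := e.hashes[hashType]; found {
				return h
			}
		} else { // step 5: fingerprint mismatch, purge and fall through
			delete(cache, o.remote)
		}
	}
	// step 6: download, compute all hashes, store, return the requested one
	hashes := download(o)
	cache[o.remote] = cacheEntry{fingerprint: fp, hashes: hashes}
	return hashes[hashType]
}

func main() {
	cache := map[string]cacheEntry{}
	downloads := 0
	dl := func(o object) map[string]string {
		downloads++
		return map[string]string{"sha1": "da39a3ee"}
	}
	o := object{remote: "dir/file", size: 1 << 20, modtime: "2021-09-09"}
	getHash(o, "sha1", false, 1024, cache, dl) // miss: downloads once
	getHash(o, "sha1", false, 1024, cache, dl) // hit: served from cache
	fmt.Println("downloads:", downloads)       // downloads: 1
}
```

Note how step 5 falls through into step 6: a stale entry is purged and immediately replaced by a fresh one built from a full read.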

Other operations:

  • whenever a file is uploaded or downloaded in full, capture the stream to calculate all supported hashes on the fly and update the database
  • a server-side move will update the keys of existing cache entries
  • delete will remove the cache entry
  • purge will remove all cache entries underneath
  • entries are pruned periodically (if max_age is not off)
    (this helps against bit-rot or database thrashing when a file is removed externally)

Pre-Seed from a SUM File

Hasher supports two backend commands: generic SUM file import and faster but less consistent stickyimport.

rclone backend import Hasher:dir/subdir SHA1 /path/to/SHA1SUM [--checkers 4]

Instead of SHA1 it can be any hash supported by the remote. The last argument can point to either a local or an other-remote:path text file in SUM format. The command will parse the SUM file, then walk down the path given by the first argument, snapshot current fingerprints and fill in the cache entries correspondingly.

  • Paths in the SUM file are treated as relative to hasher:dir/subdir.
  • The command will not check that supplied values are correct. You must know what you are doing.
  • This is a one-time action. The SUM file will not get "attached" to the remote. Cache entries can still be overwritten later, should the object's fingerprint change.
  • The tree walk can take a long time depending on the tree size. You can increase --checkers to make it faster. Or use stickyimport if you don't care about fingerprints and consistency.
rclone backend stickyimport hasher:path/to/data sha1 remote:/path/to/sum.sha1

stickyimport is similar to import but works much faster because it does not need to stat existing files and skips the initial tree walk. Instead of binding cache entries to file fingerprints it creates sticky entries bound to the file name alone, ignoring size, modification time etc. Such hash entries can be replaced only by purge, delete, backend drop, or by a full re-read/re-write of the files.
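For reference, SUM files follow the familiar md5sum/sha1sum layout: a hex digest, a two-character separator, then the path. A hypothetical one-line parser (rclone's real parser may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// parseSumLine splits one md5sum/sha1sum-style line: "HASH  path" for text
// mode or "HASH *path" for binary mode. Illustrative helper only.
func parseSumLine(line string) (sum, path string, ok bool) {
	i := strings.IndexByte(line, ' ')
	if i <= 0 || i+2 > len(line) {
		return "", "", false
	}
	sum, rest := line[:i], line[i+1:]
	// the second separator character is ' ' (text) or '*' (binary)
	if rest[0] == ' ' || rest[0] == '*' {
		rest = rest[1:]
	}
	return sum, rest, true
}

func main() {
	sum, path, _ := parseSumLine("aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d  dir/subdir/file.txt")
	fmt.Println(sum, path)
}
```

Paths recovered this way are then interpreted relative to the first command argument, as the bullets above describe.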

Cache Storage

Note: Details updated after recent KV redesign

Cached checksums are stored as bolt database files under ~/.cache/rclone/kv/, one per base backend, named like BaseRemote~hasher.bolt. Checksums for multiple aliases into a single base backend are stored in a single database. All local paths are treated as aliases into the local backend (unless crypted or chunked) and stored in ~/.cache/rclone/kv/local~hasher.bolt.
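A hypothetical helper showing the naming scheme described above (purely illustrative; not code from this PR):

```go
package main

import "fmt"

// dbFileName sketches the naming scheme: one bolt file per base backend
// under ~/.cache/rclone/kv/.
func dbFileName(remoteName string) string {
	if remoteName == "" {
		remoteName = "local" // bare local paths map to the local backend
	}
	return remoteName + "~hasher.bolt"
}

func main() {
	fmt.Println(dbFileName("BaseRemote")) // BaseRemote~hasher.bolt
	fmt.Println(dbFileName(""))           // local~hasher.bolt
}
```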

By default the database is opened in read-only mode and can be shared by multiple rclone instances. When a change is needed, rclone reopens the database in exclusive read-write mode for a moment, applies the change, then reopens it in shared mode so other instances can access it.

The bolt engine shows good performance for databases with up to a million entries on ordinary hardware. Reopening is cheap and takes less than a millisecond.

You can print or drop the database using custom backend commands:

rclone backend dump Hasher:dir/subdir

rclone backend drop Hasher:

Boilerplate

Was the change discussed in an issue or in the forum before?

Fixes #949
Fixes #157
Fixes #626

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

@ivandeex ivandeex self-assigned this Sep 9, 2021
This was referenced Sep 9, 2021
@ivandeex ivandeex requested a review from ncw September 9, 2021 07:54
@ivandeex ivandeex added this to the v1.57 milestone Sep 9, 2021

@ncw (Member) commented Sep 9, 2021

This is a great implementation @ivandeex. I was imagining that we'd patch all the backends but this is much better.

I wonder whether we should support easy integration of this as nested remotes are a bit of a pain to set up... I had the idea that wrapping backends should support doing :hasher:remote:mypath. This is easy to do, if remote is empty in NewFs then use root instead. I made a zip and a tar backend using this and it worked quite well. What do you think?

  1. Documentation's not in the patch yet.
    After a code review I'll slightly edit the top post here and add to the commit.

I'll give it a quick once over now.

  1. Generic backend integration test passed successfully.
  2. Unit tests for hasher specific features not done yet. Probably in the next PR.
  3. Code quality test is addressed by build: apply gofmt from golang 1.17 #5588
    (will rebase when that's merged)

Reviewed :-)

  1. Betas to play with are published as iva releases at https://github.com/ivandeex/rclone/releases

I'd like to get all devs on the same page with releasing test binaries to users. At the moment if you push to a branch on the main repo you'll get betas built etc. What I'd like to do is make it so that if a team member makes a PR then this also makes a beta - what do you think of that idea? Ideally we'd make some actions bot which posted a link in the PR once it was built.

@ivandeex (Member, Author) commented Sep 9, 2021

I wonder whether we should support easy integration of this as nested remotes are a bit of a pain to set up...

👍 👍 👍

I had the idea that wrapping backends should support doing :hasher:remote:mypath. This is easy to do, if remote is empty in NewFs then use root instead.

👍 An API like fs.NewFsChain(":wrapper,wrap_opt1=val1:remote,inner_opt1=val2:mypath") for

  • building an FS chain from extended remote path and
  • returning the top-most Fs from the new chain

seems reasonable and straightforward in the case when wrapping remote has empty root relative to the inner remote (we could even stuff the chain builder right into good old fs.NewFs ???)

👍 FS chains with empty wrapper root should be straightforward on the command line too.

👎 I can't imagine INI syntax for FS chains even with empty wrapper root. Probably needs TOML, not INI ???

👎 I can't imagine what would extended remote path be for wrappers with non-empty root like :wrapper,opt1=val1[[[wrapper-root-goes-somewhere-here]]]:remote,opt2=val2:/path/to/remote/root

I'll give it a quick once over now.
[5min]
Reviewed :-)

I will add docs tomorrow. I'm not nearly as lightning fast :)

I'd like to get all devs on the same page with releasing test binaries to users. At the moment if you push to a branch on the main repo you'll get betas built etc.

I'd be happy to make a beta branch for bisync!
Upd: https://github.com/rclone/rclone/tree/pr-bisync

What I'd like to do is make it so that if a team member makes a PR then this also makes a beta - what do you think of that idea? Ideally we'd make some actions bot which posted a link in the PR once it was built.

I don't know. I can just test out whatever you make.

@ivandeex ivandeex force-pushed the pr-hasher branch 2 times, most recently from 55b93aa to 4709c9c, on September 9, 2021 18:23
@ivandeex (Member, Author) commented Sep 9, 2021

@ncw Please verify whether this patch fixes #157

@ncw (Member) left a review comment

I've given this a quick once over - looking great!

I'd like to have a discussion about the use of bolt? I'd like to only have one KV database in rclone (once we have disabled the cache backend). Is bolt the right one?

Review comments (since resolved) were left on:

  • backend/hasher/bolt.go
  • backend/hasher/hasher.go (3 comments)
  • backend/hasher/hasher_test.go
  • backend/hasher/object.go
  • docs/content/hasher.md (3 comments)
@ncw (Member) commented Sep 10, 2021

@ivandeex wrote:

I wonder whether we should support easy integration of this as nested remotes are a bit of a pain to set up...
I had the idea that wrapping backends should support doing :hasher:remote:mypath. This is easy to do, if remote is empty in NewFs then use root instead.

An API like fs.NewFsChain(":wrapper,wrap_opt1=val1:remote,inner_opt1=val2:mypath") for
building an FS chain from extended remote path and returning the top-most Fs from the new chain seems reasonable and straightforward in the case when wrapping remote has empty root relative to the inner remote (we could even stuff the chain builder right into good old fs.NewFs ???)

I was thinking that if you called wrapper.NewFs with remote="" it would say, ah no remote, lets use root instead.

This would enable this syntax to just work...

:wrapper,wrap_opt1=val1:remote,inner_opt1=val2:mypath
wrapper.NewFs would be called with opt={wrap_opt1=val1}, root=remote,inner_opt1=val2:mypath

It would then decide that since its config remote was empty to set remote = remote,inner_opt1=val2:mypath and root="" and then call remote.NewFs with inner_opt and root=mypath.

So I don't think a new API is needed as long as we are happy with that syntax.
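A naive sketch of that parsing step, assuming the syntax above (hypothetical splitChain helper; it ignores quoting, so option values containing ':' would break it):

```go
package main

import (
	"fmt"
	"strings"
)

// splitChain peels the wrapper spec off a connection string like
// ":wrapper,opt=val:rest", handing the remainder down as the wrapper's
// remote, as described above.
func splitChain(s string) (wrapper, rest string, ok bool) {
	if !strings.HasPrefix(s, ":") {
		return "", "", false
	}
	i := strings.IndexByte(s[1:], ':')
	if i < 0 {
		return "", "", false
	}
	return s[1 : 1+i], s[2+i:], true
}

func main() {
	w, rest, _ := splitChain(":wrapper,wrap_opt1=val1:remote,inner_opt1=val2:mypath")
	fmt.Println(w)    // wrapper,wrap_opt1=val1
	fmt.Println(rest) // remote,inner_opt1=val2:mypath
}
```

The wrapper's NewFs would then treat rest as its remote with root="" and recurse, which is exactly the "no new API" behavior proposed above.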

FS chains with empty wrapper root should be straightforward on the command line too.
I can't imagine INI syntax for FS chains even with empty wrapper root. Probably needs TOML, not INI ???
I can't imagine what would extended remote path be for wrappers with non-empty root like :wrapper,opt1=val1[[[wrapper-root-goes-somewhere-here]]]:remote,opt2=val2:/path/to/remote/root

Yes that is a potential problem. So we are saying that if we had a hasher set up on /backup how do we access a path /backup/current?

What would hasher do if you set up a hasher for /backup and then tried to use /backup/current? Is that effectively a new hasher? The VFS cache is designed so that you can do that, but it is quite a heavy load to impose on all backends.

Here are some ideas

:wrapper,opt1=val1,remote='remote,opt2=val2:/backup':current  # this would work now
:wrapper,opt1=val1,root=current:remote,opt2=val2:/backup         # use root as a parameter as well?
:wrapper,opt1=val1:remote,opt2=val2:/backup//current                # parse off wrapper root from end?

I'd like to get all devs on the same page with releasing test binaries to users. At the moment if you push to a branch on the main repo you'll get betas built etc.

I'd be happy to make a beta branch for bisync! Upd: https://github.com/rclone/rclone/tree/pr-bisync

Great :-)

What I'd like to do is make it so that if a team member makes a PR then this also makes a beta - what do you think of that idea? Ideally we'd make some actions bot which posted a link in the PR once it was built.

I don't know. I just can test out whatever you make.

I figured out how to do everything except work out if the PR is from a team member, but I'll carry on!

[ivandeex] I intended to edit a quoted reply but edited your post by mistake. I tried to do the best to restore.

@ivandeex (Member, Author) commented Sep 10, 2021

I'd like to only have one KV database in rclone (once we have disabled the cache backend).

Currently only hasher plans to use it.
Probably the resumer framework could also use it as a kind of resumption store.

We could however make it a common API:

  • factor hasher/bolt.go to lib/kv/bolt.go
  • rename interface hashOp to kvOp
  • make --hasher-lock-time a global option --kv-lock-time
  • change database location from ~/.cache/rclone/hasher/REMOTE to ~/.cache/rclone/kv/REMOTE
  • run a central pool of database goroutines per rclone instance (one goroutine per database)
  • factor hasher.Fs.do(hasherOp) into lib/kv.Do(remote, kvOp)
  • develop new kinds of kvOp to satisfy more internal consumers if any (after this PR)
  • etc

I'd be happy to factor KV store away and make the backend just use an API.
Anyway it will be a transient cache-oriented KV store not suitable for object metadata as the latter will have to be shared between rclone instances on multiple computers.

Is bolt the right one?

bolt is a satisfactory one.

@ivandeex (Member, Author) commented Sep 10, 2021

What would hasher do if you set up a hasher for /backup and then tried to use /backup/current?

Hasher will jump in if you set up Hasher{remote=/backup} and use the path Hasher:current.
I provided examples in the top post here (and will make them into the backend docs).

Is that effectively a new hasher?

No. /backup/current is short for local:/backup/current, handled by the local backend.
Hasher does not install hooks in the local backend. It reacts to its own paths.

The VFS cache is designed so that you can do that, but it is quite a heavy load to impose on all backends.

VFS jumps in when you mount VFS-controlled stuff somewhere and access the mount point.
In fact you install a hook in the kernel and let it trigger.
rclone mounts do not use file checksums so far, so Hasher is irrelevant here (maybe in future mounts will return checksums as extended file attributes).

@ivandeex (Member, Author) commented Sep 10, 2021

Currently wrapping (transient) rclone backends only provide means for wrapping other remotes. You are talking about "injecting" (or mixing in) behaviors of transient backends into other remotes. I didn't set such a goal. rclone lacks internal APIs for such mixins. It's out of scope for this PR.

@ivandeex (Member, Author) commented Sep 10, 2021

Here are some ideas

:wrapper,opt1=val1,remote='remote,opt2=val2:/backup':current  # this would work now
:wrapper,opt1=val1,root=current:remote,opt2=val2:/backup      # use root as a parameter as well?
:wrapper,opt1=val1:remote,opt2=val2:/backup//current          # parse off wrapper root from end?

Let me experiment with your sketch for some time. Maybe I can deliver a PR with new rpath parser in 2-3 months.
At any rate I don't have enough time to complete it before 1.57.

@ncw
I planned to use September for publishing a number of patches developed this summer:
a few ftp fixes, automount support, mount helper, hasher...
and probably bisync (only docs fixes and if you accept it without --backup-dir, modtime support and sessions, otherwise I'll keep it on beta branch and postpone till 1.58).

@ivandeex (Member, Author) commented Sep 10, 2021

I was imagining that we'd patch all the backends but this is much better.

Probably adding an optional hashes parameter to every backend and wrapping backend.Object.Hash is a viable alternative...

Update: hashes was previously known as hash_names

@ivandeex (Member, Author) left a review comment

:(

Review comment (since resolved) on backend/hasher/hasher.go
@ncw (Member) commented Sep 11, 2021

I'd like to only have one KV database in rclone (once we have disabled the cache backend).

Currently only hasher plans to use it.
We could however make it a common API:

  • factor hasher/bolt.go to lib/kv/bolt.go
  • rename interface hashOp to kvOp
  • make --hasher-lock-time a global option --kv-lock-time
  • change database location from ~/.cache/rclone/hasher/REMOTE to ~/.cache/rclone/kv/REMOTE

This should probably have a separate name for each user ~/.cache/rclone/kv/hasher/REMOTE ~/.cache/rclone/kv/VFS/REMOTE (or something!)

  • run a central pool of database goroutines per rclone instance (one goroutine per database)
  • etc

Bolt is really the right solution for the VFS metadata - when there gets to be a lot of it, scanning lots of little JSON files is very slow. However I didn't want to commit to Bolt at the time and was wary of the multiple process limitations.

Another place I'd like a KV database is doing syncs larger than memory - eg sync two 20 million entry directories - without using 20 GB of RAM.

I think your idea to stick it in lib is a good one and probably worth doing now.

Is bolt the right one?

bolt is a satisfactory one.

I was looking also at badger however my main concern is longevity and stability rather than absolute performance.

Currently wrapping (transient) rclone backends only provide means for wrapping other remotes. You are talking about "injecting" (or mixing in) behaviors of transient backends into other remotes. I didn't set such a goal. rclone lacks internal APIs for such mixins. It's out of scope for this PR.

Yes lets discuss this later. I just wanted to share my ideas.

Here are some ideas

:wrapper,opt1=val1,remote='remote,opt2=val2:/backup':current  # this would work now
:wrapper,opt1=val1,root=current:remote,opt2=val2:/backup         # use root as a parameter as well?
:wrapper,opt1=val1:remote,opt2=val2:/backup//current                # parse off wrapper root from end?

I planned to use September for publishing a number of prepared patches: ftp, autofs mounts, probably bisync.
Let me experiment with your sketch for a month. Maybe I can deliver a PR with new rpath parser in 2-3 months.

It would be a good idea to finish some of the things we've started. I've got a list of things too sitting in branches here.

@ivandeex (Member, Author) commented Sep 11, 2021

This should probably have a separate name for each user ~/.cache/rclone/kv/hasher/REMOTE ~/.cache/rclone/kv/VFS/REMOTE (or something!)

Alternatively we could use multiple bolt buckets per database, but the more files the better, due to fine-grained locking.

The "user" approach also calls for:

  • extended API like kv.Do(ctx context.Context, user string, f fs.Fs, op kvOp) error
  • something like a map[user_string][fs.Fs]func (...) (pool) of goroutines
  • a common function to clean and encode remote names like kv.CleanFsName(fs.Fs)
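A toy in-memory sketch of that Do-style entry point (all names hypothetical; a real kv.Op would run against a bolt transaction rather than a map):

```go
package main

import "fmt"

// Op is the hypothetical kvOp: a callback run against one bucket.
type Op func(bucket map[string][]byte) error

// Store keeps one bucket per (user, remote) pair, standing in for one bolt
// database per subsystem/remote as discussed above.
type Store struct {
	buckets map[string]map[string][]byte
}

func NewStore() *Store {
	return &Store{buckets: make(map[string]map[string][]byte)}
}

// Do routes an operation to the bucket keyed by user (subsystem) and remote.
func (s *Store) Do(user, remote string, op Op) error {
	key := user + "~" + remote
	b, ok := s.buckets[key]
	if !ok {
		b = make(map[string][]byte)
		s.buckets[key] = b
	}
	return op(b)
}

func main() {
	s := NewStore()
	s.Do("hasher", "local", func(b map[string][]byte) error {
		b["file.txt"] = []byte("md5:abc")
		return nil
	})
	s.Do("hasher", "local", func(b map[string][]byte) error {
		fmt.Println(string(b["file.txt"])) // md5:abc
		return nil
	})
}
```

The (user, remote) key is what makes separate consumers (hasher, VFS, sync) share the machinery without sharing each other's data.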

Another place I'd like a KV database is doing syncs larger than memory - eg sync two 20 million entry directories - without using 20 GB of RAM.

I also had an idea to use a struct Trie (details don't matter a.t.m.) for large recursive listings to replace long paths by 32-bit integers around the code. Could save some RAM especially in sync operations.

I was looking also at badger however my main concern is longevity and stability rather than absolute performance.

If we make it an API then "procedural users" will accept anything.
If we decide to switch to another database engine later, the "human users" will notice just a one-time delay due to database regeneration.

Yes lets discuss this later. I just wanted to share my ideas.

I'll create a dedicated ticket and copy the discussion there.

It would be a good idea to finish some of the things we've started.

Holy true.

@ncw (Member) commented Sep 12, 2021

This should probably have a separate name for each user ~/.cache/rclone/kv/hasher/REMOTE ~/.cache/rclone/kv/VFS/REMOTE (or something!)

Alternatively we could use multiple bolt buckets per database, but the more files the better due to fine grained locking.

I like the idea of one file per "purpose" if you see what I mean.

The "user" approach also calls for:

  • extended API like kv.Do(ctx context.Context, user string, f fs.Fs, op kvOp) error
  • something like a map[user_string][fs.Fs]func (...) (pool) of goroutines
  • a common function to clean and encode remote names like kv.CleanFsName(fs.Fs)

There is the stuff in the VFS cache which has had many patches and we should probably start from! This could be factored out into fs maybe?

// Get cache root path.
// We need it in two variants: OS path as an absolute path with UNC prefix,
// OS-specific path separators, and encoded with OS-specific encoder. Standard path
// without UNC prefix, with slash path separators, and standard (internal) encoding.
// Care must be taken when creating OS paths so that the ':' separator following a
// drive letter is not encoded (e.g. into unicode fullwidth colon).
var err error
parentOSPath := config.CacheDir // Assuming string contains a local path in OS encoding
if parentOSPath, err = filepath.Abs(parentOSPath); err != nil {
	return nil, errors.Wrap(err, "failed to make --cache-dir absolute")
}
fs.Debugf(nil, "vfs cache: root is %q", parentOSPath)
parentPath := fromOSPath(parentOSPath)

// Get a relative cache path representing the remote.
relativeDirPath := fremote.Root() // This is a remote path in standard encoding
if runtime.GOOS == "windows" {
	if strings.HasPrefix(relativeDirPath, `//?/`) {
		relativeDirPath = relativeDirPath[2:] // Trim off the "//" for the result to be valid when appending to another path
	}
}
relativeDirPath = fremote.Name() + "/" + relativeDirPath
relativeDirOSPath := toOSPath(relativeDirPath)

Another place I'd like a KV database is doing syncs larger than memory - eg sync two 20 million entry directories - without using 20 GB of RAM.

I also had an idea to use a struct Trie (details don't matter a.t.m.) for large recursive listings to replace long paths by 32-bit integers around the code. Could save some RAM especially in sync operations.

Interesting idea.

I was looking also at badger however my main concern is longevity and stability rather than absolute performance.

If we make it an API then "procedural users" will accept anything.
If we decide to switch to another database engine later, the "human users" will notice just a one-time delay due to database regeneration.

Agreed

Yes lets discuss this later. I just wanted to share my ideas.

I'll create a dedicated ticket and copy the discussion there.

👍

It would be a good idea to finish some of the things we've started.

Holy true.

:-)

@ivandeex (Member, Author) commented Sep 12, 2021

This should probably have a separate name for each user ~/.cache/rclone/kv/hasher/REMOTE ~/.cache/rclone/kv/VFS/REMOTE (or something!)

We can consider various db file naming schemes below ~/.cache/rclone/kv:

  • SUBSYS/REMOTE - db files grouped in dirs by subsystem (vfs, hasher, etc)
  • REMOTE/SUBSYS - grouped by remote,
    for easier cleanup when remotes get deprecated
  • REMOTE~SUBSYS - flat scheme with some delimiter,
    for cleanup by shell globs *~vfs or myremote~*,
    delimiter can be ~, #, wide :, wide / or whatever prohibited in remote names
  • ✊🏻 REMOTE~SUBSYS.bolt - variation with explicit db format,
    for experiments with badger or schema migrations
  • etc

@ivandeex ivandeex changed the title Add hasher backend Add hasher backend and KV store Sep 12, 2021
@ivandeex (Member, Author) commented Sep 15, 2021

common function to clean and encode remote names like kv.CleanFsName(fs.Fs)

There is the stuff in the VFS cache which has had many patches and we should probably start from! This could be factored out into fs maybe?

// Get cache root path.
// We need it in two variants: OS path as an absolute path with UNC prefix,
// OS-specific path separators, and encoded with OS-specific encoder. Standard path
// without UNC prefix, with slash path separators, and standard (internal) encoding.
// Care must be taken when creating OS paths so that the ':' separator following a
// drive letter is not encoded (e.g. into unicode fullwidth colon).
var err error
parentOSPath := config.CacheDir // Assuming string contains a local path in OS encoding
if parentOSPath, err = filepath.Abs(parentOSPath); err != nil {
	return nil, errors.Wrap(err, "failed to make --cache-dir absolute")
}
fs.Debugf(nil, "vfs cache: root is %q", parentOSPath)
parentPath := fromOSPath(parentOSPath)

// Get a relative cache path representing the remote.
relativeDirPath := fremote.Root() // This is a remote path in standard encoding
if runtime.GOOS == "windows" {
	if strings.HasPrefix(relativeDirPath, `//?/`) {
		relativeDirPath = relativeDirPath[2:] // Trim off the "//" for the result to be valid when appending to another path
	}
}
relativeDirPath = fremote.Name() + "/" + relativeDirPath
relativeDirOSPath := toOSPath(relativeDirPath)

Note to self: look at what was factored out in reply to
#4547 (comment) and coordinate.

@ivandeex ivandeex mentioned this pull request Sep 15, 2021
@ivandeex ivandeex modified the milestones: v1.57, v1.58 Sep 26, 2021
@ncw (Member) commented Oct 13, 2021

@ivandeex I've stuck c437002 onto your PR and reverted the test fix - that passes the hasher tests for me - does that look OK to you?

I didn't want to force push your branch - it needs squashing and re-ordering; my commit should go first if you think it is OK.

@ivandeex (Member, Author) commented Oct 13, 2021

I didn't want to force push your branch - it needs squashing and re-ordering; my commit should go first if you think it is OK.

I think you know it better...
Squashing and reordering...

@ivandeex (Member, Author) commented Oct 14, 2021

A quick note about the last two commits...
After testing concurrent calls to kv.Start and db.Stop I had to put more of their internals under the mutex to make results deterministic without Sleeps in the test body.
It's safer, but it can block Start for up to 1-2 seconds due to db.open.
Not fond of it; I will look for ways to improve performance in the future.
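The tradeoff can be sketched as follows: the whole open sequence runs under one mutex, so concurrent Start/Stop calls serialize deterministically at the cost of callers waiting out a slow open (hypothetical names, not the actual lib/kv code):

```go
package main

import (
	"fmt"
	"sync"
)

// DB sketches the Start/Stop serialization: everything, including the
// potentially slow open, runs under a single mutex.
type DB struct {
	mu     sync.Mutex
	opened bool
}

func (d *DB) Start() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if !d.opened {
		// in the real code this step (db.open) may take 1-2 seconds;
		// holding the lock across it is what buys deterministic results
		d.opened = true
	}
}

func (d *DB) Stop() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.opened = false
}

func main() {
	db := &DB{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // concurrent Starts are safe and idempotent
		wg.Add(1)
		go func() { defer wg.Done(); db.Start() }()
	}
	wg.Wait()
	fmt.Println(db.opened) // true
}
```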

@ncw (Member) left a review comment

Let's merge this for 1.57

I see new backends as relatively low risk. This one is marked experimental so if we need to change the config/commands/interface the users have been warned!

ncw and others added 5 commits October 20, 2021 18:46
Before this change we checked that features.ReadMimeTime was set if
and only if the Object.MimeType method was implemented.

However this test is overly general - we don't care if Objects
advertise MimeType when features.ReadMimeTime is set provided that
they always return an empty string (which is what a wrapping backend
might do).

This patch implements that logic.
Add bolt-based key-value database support.

Quick API description:
rclone#5587 (comment)
After testing concurrent calling of `kv.Start` and `db.Stop` I had to restrict
more parts of these under mutex to make results deterministic without Sleep's
in the test body. It's more safe but has potential to lock Start for up to
2 seconds due to `db.open`.
@ivandeex (Member, Author) commented

Merging
Deleting beta branch https://github.com/rclone/rclone/commits/pr-hasher

@ivandeex ivandeex merged commit 0d7426a into rclone:master Oct 20, 2021
ivandeex pushed a commit that referenced this pull request Oct 20, 2021
Before this change we checked that features.ReadMimeTime was set if
and only if the Object.MimeType method was implemented.

However this test is overly general - we don't care if Objects
advertise MimeType when features.ReadMimeTime is set provided that
they always return an empty string (which is what a wrapping backend
might do).

This patch implements that logic.
ivandeex added a commit that referenced this pull request Oct 20, 2021
Add bolt-based key-value database support.

Quick API description:
#5587 (comment)
ivandeex added a commit that referenced this pull request Oct 20, 2021
After testing concurrent calling of `kv.Start` and `db.Stop` I had to restrict
more parts of these under mutex to make results deterministic without Sleep's
in the test body. It's more safe but has potential to lock Start for up to
2 seconds due to `db.open`.
ivandeex added a commit that referenced this pull request Oct 20, 2021
@ivandeex ivandeex deleted the pr-hasher branch October 20, 2021 16:12
@ivandeex (Member, Author) commented

@ncw
You might want to clean up https://beta.rclone.org/branch/pr-hasher/

@ivandeex (Member, Author) commented Nov 2, 2021

@Sarke
This ticket is implemented and closed.
Please post your question as a new ticket.

NyaMisty pushed a commit to NyaMisty/fclone that referenced this pull request Feb 5, 2022
Add bolt-based key-value database support.

Quick API description:
rclone/rclone#5587 (comment)