-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor cache metadata interface. #2316
Conversation
|
||
const keyDeleted = "cache.deleted" | ||
type MetadataStore interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't like this. Cache/metadata package should not know about every single possible caller it might have. A new caller can define a unique name for itself and get access to store information. If there are helper methods they should be on caller side. Same for GetSharedKey/GetGitRemote/GetGitSnapshot/Etag
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked it because it made it made it possible to easily know all the metadata that's being tracked for a given cache record by just checking one place (in the cache package) rather than searching all over the codebase. Understanding what metadata is being tracked for a given record can obviously be important when you are making behavioral changes to the cache package.
If you disagree with the approach, what would you think of exposing the generic methods for setting/getting arbitrary values and then having outside callers just use those? While not as convenient, you can at least still find usages of those generic getters/setters to track down metadata that's being associate with records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
know all the metadata that's being tracked
There shouldn't be any need to manage this together. A caller only knows about its use case, it should not have dependency/knowledge of other callers. If new callers are added or removed it should not concern cache pkg or other callers. Cache package just provides an arbitrary amount of callers access to persistent storage for a ref.
what would you think of exposing the generic methods for setting/getting arbitrary values
Yes, generic helpers without fixed keys are fine. Helpers with fixed keys are also fine on caller side (should be private though if they are only used by one package).
cache/metadata.go
Outdated
return si.SetValue(b, keyDiffID, v) | ||
}) | ||
return nil | ||
type Metadata interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more like RefMetadata
now. No changes needed if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that clarifies that it's specific to a record rather than metadata for the cache as a whole, will update
cache/manager.go
Outdated
if err := MigrateV2( | ||
context.TODO(), | ||
filepath.Join(opt.MetadataStoreRoot, "metadata.db"), | ||
filepath.Join(opt.MetadataStoreRoot, "metadata_v2.db"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this change. Why would the cache database need to have a constant name from the package standpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that the cache package is the only one with direct access to the raw store and then just provides an interface for interacting with it to other packages (as opposed to before where a bunch of different packages had direct access to the raw store). It still needs to be told where to put the data on disk though, so I switched to this parameter.
This approach makes more sense to me but if you disagree I don't have any issue with switching back to passing the store as long as NewManager
is the only place it gets passed to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to me that you can't create two instances of manager with metadata.db in same dir. It is some internal structure limitation that leaks to the public API. I'm ok with for example just adding MetadataDBV1
or smth in manageropt if you want to move migration here. It is bit weird though that all other stores are initialized and passed in as interface, and these are now passed as strings.
cache/refs.go
Outdated
@@ -32,33 +30,19 @@ import ( | |||
// Ref is a reference to cacheable objects. | |||
type Ref interface { | |||
Mountable | |||
ID() string | |||
Metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change needed, but I kind of liked it as separate from the rest of the implementation. Guess it can still be nullable if implementation just defines embedded interface as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found that having all the getters/setters as methods on refs rather than detached functions made it way easier to write code and remember what metadata exists. It works much better with IDE autocompletion and is slightly less verbose.
Release(context.Context) error | ||
Size(ctx context.Context) (int64, error) | ||
Metadata() *metadata.StorageItem | ||
IdentityMapping() *idtools.IdentityMapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is IdentityMapping
removed. It is not fully connected (used atm only in moby) but seems like required data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this can be accessed after Mount
is called from snapshot.Mountable
. Not sure if this is enough for all cases or something might need this info without actually doing a mount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't see that it was ever used so I got rid of it. If moby needs it though then I'll just re-add it, not a big deal at all.
0336182
to
8cbffe8
Compare
cache/metadata.go
Outdated
GetString(string) string | ||
GetExternal(string) ([]byte, error) | ||
|
||
SetValue(string, interface{}, string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used where it requires to be public and interface{}
? Current interface is weird as there is GetString/SetValue
that don't have a matching getter/setter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it didn't make sense anymore now that the methods were public, changed to SetString
GetExternal(string) ([]byte, error) | ||
|
||
SetValue(string, interface{}, string) error | ||
SetExternal(string, []byte) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pair with GetExternal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a single random test failure in CI: https://gist.github.com/sipsma/a171fd146fa01cee69175e5a69ca793c Failure is from this line: buildkit/client/client_test.go Line 2665 in f314c4b
I can't reproduce locally, even running with |
There are a few goals with this refactor: 1. Remove external access to fields that no longer make sense and/or won't make sense soon due to other potential changes. For example, there can now be multiple blobs associated with a ref (for different compression types), so the fact that you could access the "Blob" field from the Info method on Ref incorrectly implied there was just a single blob for the ref. This is on top of the fact that there is no need for external access to blob digests. 2. Centralize use of cache metadata inside the cache package. Previously, many parts of the code outside the cache package could obtain the bolt storage item for any ref and read/write it directly. This made it hard to understand what fields are used and when. Now, the Metadata method has been removed from the Ref interface and replaced with getters+setters for metadata fields we want to expose outside the package, which makes it much easier to track and understand. Similar changes have been made to the metadata search interface. 3. Use a consistent getter+setter interface for metadata, replacing the mix of interfaces like Metadata(), Size(), Info() and other inconsistencies. Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Rebased. There was another ephemeral test failure on a previous run, different from the one I mentioned previously: https://gist.github.com/sipsma/593107a30f4886ff2b3dbb02a4862361 I looked around and can't see any clear way those failure could be related to the changes here and can't reproduce them locally after many runs. I also looked through past CI runs and found similar failures from before this PR: So I don't think it's likely those ephemeral failures are related to the change here. |
There are a few goals with this refactor:
won't make sense soon due to other potential changes. For example,
there can now be multiple blobs associated with a ref (for different
compression types), so the fact that you could access the "Blob"
field from the Info method on Ref incorrectly implied there was just
a single blob for the ref. This is on top of the fact that there is
no need for external access to blob digests.
Previously, many parts of the code outside the cache package could
obtain the bolt storage item for any ref and read/write it directly.
This made it hard to understand what fields are used and when. Now,
the Metadata method has been removed from the Ref interface and
replaced with getters+setters for metadata fields we want to expose
outside the package, which makes it much easier to track and
understand. Similar changes have been made to the metadata search
interface.
the mix of interfaces like Metadata(), Size(), Info() and other
inconsistencies.
Signed-off-by: Erik Sipsma erik@sipsma.dev