Updates to ModelKit format#635
Conversation
Add fields 'digest' and 'diffId' to Kitfile fields that represent ModelKit layers, which are only serialized in JSON format. When the Kitfile is used as the ModelKit OCI artifact config, it should contain digests and diff IDs, but when it is deserialized to YAML it should be the Kitfile as normal.
Change how files are packed into tarballs for ModelKit layers: instead of packing files relative to the layer's path within a Kitfile, we instead pack them relative to the context directory for the kit command (i.e. kit pack <path>). This means that a ModelKit's layers can be extracted directly to reproduce the files/directories from the original repository, without relying on the CLI to place files at the correct path. Additionally, this commit uses UID 1000 for the file owner inside the tarball instead of 0 (root). This makes it easier to consume ModelKit layers as layers in an OCI image.
We need to handle existing ModelKits in addition to the new tar format -- for new ModelKits, we can unpack the tars directly (and simply), but for older ones, we need to pre-create the layer's directory and concatenate paths from the tarball. To avoid overcomplicating things (since we're still relatively early), we detect new ModelKits via the digest/diffId fields in the config; this is a lot simpler to support than using new versions on our media types.
| // Clean path to ensure consistent format (./path vs path/ vs path) | ||
| path = filepath.Clean(path) | ||
|
|
||
| if layerIgnored, err := ignore.Matches(path, path); err != nil { |
There was a problem hiding this comment.
this call to ignore.Matches is ood to me.
There was a problem hiding this comment.
This is because the ignore matcher is used for both the .kitignore file and for intersecting layers (where e.g. a dataset layer packs a subdirectory of a code layer). The second argument is used to determine whether the path is ignored due to it being packed by another layer in the ModelKit -- using path, path here functionally disables the intersecting dirs ignore check. The function signature is
func Matches(path, layerPath string) (bool, error)Alternatively, we could split the ignore into two checks (i.e. ignored due to kitignore vs ignored due to being part of another layer, I just wanted to keep the check simple.
| aToB, errA := filepath.Rel(a, b) | ||
| bToA, errB := filepath.Rel(b, a) | ||
| if errA != nil || errB != nil { | ||
| plog.Logf(output.LogLevelWarn, "Cannot compare directories %s and %s, skipping path", a, b) | ||
| return false | ||
| } | ||
| if strings.Contains(aToB, "..") && strings.Contains(bToA, "..") { | ||
| return false | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
Would HasPrefix be more approprate instead of Contains?
There was a problem hiding this comment.
It might be, I opted for Contains so as to not worry about ./.., C:path, etc.
I'd have to test how relative paths are handled on Windows to be sure. We might also have weird bugs around switching drives on Windows, I'm not sure (not familiar with how it all works on that side of things)
| _, relPath, err = filesystem.VerifySubpath(opts.unpackDir, config.Model.Path) | ||
| if err != nil { | ||
| return fmt.Errorf("error resolving model path: %w", err) | ||
| if config.Model.LayerInfo != nil { |
There was a problem hiding this comment.
Should this also check if there is a config.Model ?
There was a problem hiding this comment.
We only enter this branch of the switch if we encounter a model layer, which implies the existence of a model field in the Kitfile (i.e. how did we pack a model layer without a model section in the Kitfile?).
| _, relPath, err = filesystem.VerifySubpath(opts.unpackDir, docsEntry.Path) | ||
| if err != nil { | ||
| return fmt.Errorf("error resolving path %s for docs: %w", docsEntry.Path, err) | ||
| } |
There was a problem hiding this comment.
Duplicated code for handling every layer. Should it be refactored ?
There was a problem hiding this comment.
This is just kind of the way things end up happening in Golang. I can try to rework it but in my experience it ends up decreasing readability :)
There was a problem hiding this comment.
Latest commit should make this a little clearer.
Updates to handle empty directories, since we're hitting a few edge cases: * If there are no files to be packed, don't initialize a progress bar, since it will never be filled. This causes pack to hang * Print a warning if the computed size of a layer is 0 (ignoring directories) to warn the user of a potential mistake * Check that target directory exists; since we're walking from the pack context, we no longer automatically fail when a layer's path does not exist.
Description
This PR implements the updates discussed in #489:
This change is backwards compatible with ModelKits packed prior to this change -- they can still be unpacked, etc. This change does, however, break reproducible digests for older ModelKits (packing the same data with Kit prior to this change will result in a ModelKit with a different digest due to the changes to tar files and the config). Within Kit versions, digests are still reproducible.
Linked issues
Closes #489