Metadata / labels document and cleanup? #3

Open
kdvolder opened this issue Sep 9, 2020 · 14 comments
Labels
semver:major A change requiring a major version bump type:enhancement A general enhancement

Comments

@kdvolder

kdvolder commented Sep 9, 2020

I'm trying to make use of the metadata produced by the buildpacks in the labels of the image that it produces.

I presume that it is possible to obtain information such as whether a given dependency/jar was included in the image and if so what version. This is definitely useful information and I want to know it!

However...

  1. the format of the metadata needs to be clearly documented so that potential consumers of the data can
    a) understand how to parse the data
    b) rely on this parsing / format / structure to remain stable in the future (i.e. the documentation of the metadata format is to represent a contract of sorts that consumers of the data can rely on).

  2. I think there may be a bit too much metadata being attached. I think this because when I use 'docker inspect' on a buildpacked container the result is a file large enough to break some editors. The file I have is 500k in size. Granted, this is 'manageable' if handled with care, but parsing that data is still costly (memory and CPU). And some tools cannot handle it at all (for example, the gedit Linux text editor freezes up as soon as I try to search for text in this file). So I question whether all this data is really needed/useful. (Hard to say now as I don't fully understand yet what is actually there. Some of it, though, seems to be the complete textual documentation for Spring Boot metadata properties; I think we probably do not really need all that documentation embedded in the metadata.)

@nebhale nebhale transferred this issue from paketo-buildpacks/bellsoft-liberica Sep 21, 2020
@nebhale
Member

nebhale commented Sep 21, 2020

  1. @ekcasey is working hard on documenting this (much of it is in the spec, but should be surfaced in a more user-friendly way). I'll have her run the documentation past you once it's published.
  2. The amount of metadata attached is by design. By adding all possible Boot configurations for a given application (as described by Boot itself; it's the standard Boot configuration metadata), the goal is to enable GUIs that offer collections of properties (complete with their documentation) to users.

@dmikusa dmikusa added the type:documentation A documentation update label Sep 28, 2021
@dmikusa
Contributor

dmikusa commented Sep 28, 2021

@kdvolder

An update:

  1. I believe you're referring to the bill of materials (BOM). It is accessible with pack inspect apps/maven --bom or looking at the label directly. This format is currently custom and I do not believe documented, so here's some info.

The format is as such:

  • pack inspect will return JSON, the label is JSON as well
  • pack inspect will return a dict with two keys: remote and local. They indicate the same information, but show the information for the label on your local Docker daemon vs on the remote registry. Often they are the same but could differ if there is a newer image in the registry, for example.
  • The label, or remote/local entries are a JSON array. The array contains a dict, one for each layer, that is contributed to the image.
  • Each layer contributes the following info:
    • name -> the BOM entry's name
    • metadata -> a custom set of information, specific to each buildpack (I can't document all of these here, but the format is pretty simple and hopefully obvious)
    • buildpacks -> the buildpack that contributed the BOM entry
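
As a rough illustration of the structure just described, here is a minimal Python sketch that reads `pack inspect`-style JSON and looks up a BOM entry by name. The sample document, the entry names, and the metadata keys are invented for illustration; only the remote/local keys and the name, metadata, and buildpacks fields come from the description above, and the real per-buildpack metadata may look different.

```python
import json

# Hypothetical sample shaped like the description above. The entry names,
# metadata keys, and values are invented and are not real buildpack output.
SAMPLE_PACK_OUTPUT = """
{
  "remote": null,
  "local": [
    {"name": "jre",
     "metadata": {"version": "11.0.12"},
     "buildpacks": {"id": "paketo-buildpacks/bellsoft-liberica"}},
    {"name": "dependencies",
     "metadata": {"dependencies": [{"name": "spring-core", "version": "5.3.9"}]},
     "buildpacks": {"id": "paketo-buildpacks/spring-boot"}}
  ]
}
"""


def bom_entries(pack_output: str, source: str = "local") -> list:
    """Return the BOM array for 'local' or 'remote', or [] when it is null or absent."""
    doc = json.loads(pack_output)
    return doc.get(source) or []


def find_entry(entries: list, name: str):
    """Return the first BOM entry with the given name, or None if there is none."""
    for entry in entries:
        if entry.get("name") == name:
            return entry
    return None


entries = bom_entries(SAMPLE_PACK_OUTPUT)
jre = find_entry(entries, "jre")
print(jre["metadata"] if jre else "no jre entry")
```

Treating a null local/remote value as an empty list keeps callers from special-casing images that carry no label-based BOM.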

I don't anticipate this format to change until we implement this RFC. The RFC is focused on getting a standard format available for the bill of materials. Final comments are I believe tomorrow, and once it's finalized we can look at implementing it. The format will be standard, like SPDX or CycloneDX, so after this is implemented the format won't change & will be documented.

  2. We have taken some steps to address item #2. See "Apps with huge amount of files possible lead to K8s node DoS", spring-boot#80 (comment), for details. There is still some information in the labels though.

If there are any labels you're still concerned about let me know which ones and I can get some more details.

Hope that helps!

@kdvolder
Author

kdvolder commented Sep 29, 2021

This format is currently custom and I do not believe documented

Yeah, so that's what this issue is about. For the format to be documented.

metadata -> a custom set of information, specific to each buildpack (I can't document all of these here, but the format is pretty simple and hopefully obvious)

This is really the most interesting and crucial part. E.g. for my use case it was about discovering dependencies in metadata from the java buildpack.

but the format is pretty simple and hopefully obvious

It really isn't obvious, and even if it was, the fact that these things are wholly undocumented essentially raises the questions:

  • how much can you rely on the format remaining unchanged and compatible with whatever use you are making of it?
  • how confident can you be that the format doesn't vary in some breaking way depending on factors you may not be aware of? E.g. some 'hypothetical' parameters that could impact the data: maven vs gradle project, project using custom layering vs default structure, project using a non-default base image / builder, project using war vs jar packaging, etc. So how much variability do these parameters introduce into the part of the metadata that one is interested in?

Without documentation spelling out what the format is, it is hard / impossible to answer such questions with any confidence. Also, even if you think you have it 'figured out', you are still running the risk that it may change tomorrow, because there really isn't any kind of contract a consumer of this data can rely upon.

@dmikusa
Contributor

dmikusa commented Sep 29, 2021

I get what you're saying, it should be documented and explained. That makes the format an API and something that won't change out from under you. I understand that's important for anyone that wants to integrate with the tools.

Being completely transparent and as direct as possible, the present format isn't going to be documented or be any sort of official or guaranteed format. It'll continue to exist in its present format for the time being. We have no plans to change it until, as I mentioned before, the upcoming RFC arrives that will prescribe a BOM format based on industry-wide standards like SPDX and CycloneDX. We'll be implementing that RFC once it is formalized.

This RFC is also important because it will unify the format across all of the buildpacks. Right now, we can only control what the Java buildpacks are contributing to the BOM & its format. Other buildpacks and the stack make contributions as well. Those formats are out of our control, at least until this RFC is implemented.

I believe that will address your concerns, but if not, please let me know. I can also leave this issue open if you like until that RFC has been implemented & we roll it out. Just let me know.

@kdvolder
Author

Whatever you want to do is fine, re keeping the ticket open or not. This ticket is old and I sort of forgot about it. At the moment I have no real issue. That may change if something breaks in our tools and I have to look at fixing it :-). In that case I may be back here with some questions.

@dmikusa
Contributor

dmikusa commented Sep 30, 2021

I'll keep it open & I will update it when we start on and when we finish the transition for buildpacks/rfcs#166. If anyone is interested in this feature set, feel free to watch this issue.

It also seems reasonable that this RFC implementation/change will trigger a major version bump, since it's changing existing established behaviors. So you can watch for that as well.

@dmikusa dmikusa added semver:major A change requiring a major version bump type:enhancement A general enhancement and removed type:documentation A documentation update labels Sep 30, 2021
@dmikusa
Contributor

dmikusa commented Nov 29, 2021

The Paketo buildpacks have implemented the new Buildpacks RFC for SBOM.

Here are some notes on the transition:

  1. The lifecycle was not implemented to support both the old style & new style SBOM formats. If you specify both, the lifecycle will error.
  2. Due to # 1, Paketo Java buildpacks cannot offer a transition period where we support both formats. We had hoped to be able to do this, but it's just not possible with the lifecycle.
  3. Due to # 2 and this being a breaking change, we have bumped the major version on all of the Paketo Java buildpacks.
  4. To use the new SBOM format in the lifecycle, you need the following:
    • Platform API 0.8+
    • Buildpack API 0.7+
  5. At the moment of writing this, there is no release of pack that includes platform 0.8. There is an RC you can use and a release should be available soon. The Spring Boot integration at present only supports platform API 0.4, and there is a plan to upgrade it to 0.8 for Spring Boot 2.7.
  6. With the major version bump of Paketo Java buildpacks, we have bumped the buildpack API to 0.7.
  7. If you have a compatible platform and you are using the latest major buildpack versions, you will get the new-style SBOM output. This is bundled in the container image under /layers/sbom/.
  8. For bundled dependencies & helper dependencies, we are presently outputting Syft JSON SBOM format. We'll be adding CycloneDX format in the future.
  9. For application dependencies, we are presently outputting Syft JSON & CycloneDX JSON SBOM format.

Here's an example:

[Screenshot: Screen Shot 2021-11-29 at 12 23 56 PM]

  10. If you are using an older platform, you cannot get new-style SBOM output. You can continue to get old-style SBOM output by keeping your buildpack API set to 0.6. You can do this by sticking with paketo-buildpacks/java v5.21.1 or by packaging your own version of the buildpacks with the API set to 0.6 (it needs to be set for every buildpack that generates SBOM output).

I'm going to keep this issue open for a little while, so feel free to post questions/feedback here, or reach out in our Slack channel.

@kdvolder
Author

Okay so I tried the command but all I get is a bom which is 'null' for both 'local' and 'remote'.

$ pack inspect-image test --bom
{
  "remote": null,
  "local": null
}

It looks like some stuff did break in our tools. It is related to determining whether a given image contains a specific Java dependency. The code we had for this used to look at the labels directly. And that code broke. Also, looking at all the labels on the image, I do not see any label there that still has this information. So I wonder how we are supposed to access that information now. If you can provide some pointers that would be appreciated.

@kdvolder
Author

Note: the image was built using pack cli with the builder paketobuildpacks/builder:full set as default builder. The project we built was a very vanilla spring-boot web app created using Spring Initializr.

@kdvolder
Author

So here is where our code is looking for the bom:

$ docker inspect test | jq '.[].Config.Labels."io.buildpacks.build.metadata"'

But as you can see from the output below, the 'bom' there is null.

"{\"bom\":null,\"buildpacks\":[{\"homepage\":\"https://github.com/paketo-buildpacks/ca-certificates\",\"id\":\"paketo-buildpacks/ca-certificates\",\"version\":\"3.0.2\"},{\"homepage\":\"https://github.com/paketo-buildpacks/bellsoft-liberica\",\"id\":\"paketo-buildpacks/bellsoft-liberica\",\"version\":\"9.0.3\"},{\"homepage\":\"https://github.com/paketo-buildpacks/syft\",\"id\":\"paketo-buildpacks/syft\",\"version\":\"1.6.0\"},{\"homepage\":\"https://github.com/paketo-buildpacks/maven\",\"id\":\"paketo-buildpacks/maven\",\"version\":\"6.1.0\"},{\"homepage\":\"https://github.com/paketo-buildpacks/executable-jar\",\"id\":\"paketo-buildpacks/executable-jar\",\"version\":\"6.0.2\"},{\"homepage\":\"https://github.com/paketo-buildpacks/apache-tomcat\",\"id\":\"paketo-buildpacks/apache-tomcat\",\"version\":\"7.0.4\"},{\"homepage\":\"https://github.com/paketo-buildpacks/dist-zip\",\"id\":\"paketo-buildpacks/dist-zip\",\"version\":\"5.0.2\"},{\"homepage\":\"https://github.com/paketo-buildpacks/spring-boot\",\"id\":\"paketo-buildpacks/spring-boot\",\"version\":\"5.3.1\"}],\"launcher\":{\"version\":\"0.13.3\",\"source\":{\"git\":{\"repository\":\"github.com/buildpacks/lifecycle\",\"commit\":\"9f48e5a1\"}}},\"processes\":[{\"type\":\"executable-jar\",\"command\":\"java\",\"args\":[\"org.springframework.boot.loader.JarLauncher\"],\"direct\":true,\"buildpackID\":\"paketo-buildpacks/executable-jar\"},{\"type\":\"task\",\"command\":\"java\",\"args\":[\"org.springframework.boot.loader.JarLauncher\"],\"direct\":true,\"buildpackID\":\"paketo-buildpacks/executable-jar\"},{\"type\":\"web\",\"command\":\"java\",\"args\":[\"org.springframework.boot.loader.JarLauncher\"],\"direct\":true,\"buildpackID\":\"paketo-buildpacks/executable-jar\"}],\"buildpack-default-process-type\":\"web\"}"

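For anyone reading the same label from code, here is a minimal Python sketch of the decoding involved, using a shortened copy of the output above. The value jq prints is a JSON string that itself contains JSON, so it has to be decoded twice. The helper name is made up for this example, and the sample keeps only two of the buildpack entries.

```python
import json

# Shortened copy of the jq output above: a JSON string whose content is JSON.
JQ_OUTPUT = r'"{\"bom\":null,\"buildpacks\":[{\"homepage\":\"https://github.com/paketo-buildpacks/ca-certificates\",\"id\":\"paketo-buildpacks/ca-certificates\",\"version\":\"3.0.2\"},{\"homepage\":\"https://github.com/paketo-buildpacks/spring-boot\",\"id\":\"paketo-buildpacks/spring-boot\",\"version\":\"5.3.1\"}],\"buildpack-default-process-type\":\"web\"}"'


def decode_build_metadata(jq_output: str) -> dict:
    """Decode the label value: the first loads() unquotes it, the second parses it."""
    label_value = json.loads(jq_output)
    return json.loads(label_value)


metadata = decode_build_metadata(JQ_OUTPUT)

# A null 'bom' means the image carries no label-based BOM.
has_legacy_bom = metadata.get("bom") is not None

# The buildpack ids and versions are still available in the label.
buildpack_versions = {bp["id"]: bp["version"] for bp in metadata["buildpacks"]}
print(has_legacy_bom, buildpack_versions)
```

A consumer that checks `has_legacy_bom` first can at least report "no BOM in labels" rather than failing on a null value.
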
@dmikusa
Contributor

dmikusa commented Jan 28, 2022

The bad news first:

pack inspect-image test --bom

At the moment, pack hasn't been updated to use the new BOM format. That's why it's not reporting anything. Sorry, updating it is out of my control. I hope it'll happen soon. I can't really say for sure though.

So here is where our code is looking for the bom:

We cannot store BOM information on labels going forward. Labels have a hard size limit in Kubernetes, and BOM information can grow to be quite large, easily going over this limit. As such, the BOM information going forward is stored in the image, in its own layer.

You can use this tool to extract the new BOM information, https://github.com/sclevine/cnb-sbom/. When you run it, the tool should write files in the current working directory with the BOM files (or you can look at the code for the tool, and it's an example of how you could write a custom solution to pull out that info). You can also use docker cp, or similar if that is more convenient.

The good news:

After further discussions with the Buildpacks team, we were able to get the lifecycle updated to have backward compatibility. In short, starting with lifecycle 0.13.3 we're now able to support both the older style label-based BOM information and the new layer-based BOM information at the same time.

We do need to make some updates to the Paketo Java buildpacks before this will work. I'm hoping to have that out in next week's release cycle (Fri 2/4). I will post back here when we've made the change.

This doesn't mean there will be continued long-term support for the old-style label-based BOM format. We're still considering the older label-based BOM formats to be deprecated and they will be removed at some point. I'm just glad we'll be able to offer some overlap between the two so that users have a chance to move at their own pace.

I hope that's helpful for folks. As always, please reach out and let us know if you have questions/comments. Thanks

@kdvolder
Author

At the moment, pack hasn't been updated to use the new BOM format. That's why it's not reporting anything. Sorry, updating it is out of my control. I hope it'll happen soon. I can't really say for sure though.

Actually we don't really care about the pack cli. Using pack cli is kind of off the table for our use case anyway; we have to access the information from inside of a Java process using a Java library / docker client. I only tried pack cli to try and see if the information is there in the image at all, using the 'officially documented way' to access the info.

You can use this tool to extract the new BOM information

Hmmm... that is really rather impractical; as mentioned above, our code is written in Java. I suppose we could somehow package up the binaries for that tool and then launch it somehow from code, but it isn't a great solution and requires complex packaging to accommodate different OSes, or else we have to request that users install that tool themselves, complicating the installation process.

As such, the BOM information going forward is stored in the image, in its own layer.

Okay... hmmm, that brings up a whole lot of questions. Does this mean you have to pull/download the entire image to access that info then? That would be impractical because the image can be large.

We are using this library: https://github.com/docker-java/docker-java/blob/master/docs/README.md and it isn't clear to me whether we can use it to access information from layers (somehow I doubt it). If you have any advice on how we might read the info (hopefully without downloading the whole image) please share.

@dmikusa
Contributor

dmikusa commented Feb 4, 2022

At the moment, pack hasn't been updated to use the new BOM format. That's why it's not reporting anything. Sorry, updating it is out of my control. I hope it'll happen soon. I can't really say for sure though.

For reference for those reading along and using pack, this will be supported in the 0.24.0 release. As I write this, there's an RC available and if testing goes well, a release should be official in a few days.

As such, the BOM information going forward is stored in the image, in its own layer.

Okay... hmmm, that brings up a whole lot of questions. Does this mean you have to pull/download the entire image to access that info then? That would be impractical because the image can be large.

We are using this library: https://github.com/docker-java/docker-java/blob/master/docs/README.md and it isn't clear to me whether we can use it to access information from layers (somehow I doubt it). If you have any advice on how we might read the info (hopefully without downloading the whole image) please share.

It's my understanding that because it's a layer you do have to fetch the image. I don't know enough about interacting with an OCI registry to know if you can only fetch a particular layer or if you're stuck fetching them all. I have heard similar complaints from others about this change.

I am just the messenger here though. As a buildpack author, we don't deal with the layers directly. The buildpacks tooling does all that. The way this is stored was a decision made by the Buildpacks project. I would suggest reaching out either on their Github or on their Slack. Given they adopted this design, they might have tips on how to efficiently extract the BOM. That would also allow you to get feedback about this approach directly to them & hear any future plans they have on the topic directly.

Also, updates on:

We do need to make some updates to the Paketo Java buildpacks before this will work. I'm hoping to have that out in next week's release cycle (Fri 2/4). I will post back here when we've made the change.

This slipped and will be out next week, Fri 2/11. Sorry for the inconvenience.

This doesn't mean there will be continued long-term support for the old-style label-based BOM format. We're still considering the older label-based BOM formats to be deprecated and they will be removed at some point.

I talked with some more folks about timelines and it looks like we'll be supporting the label-based BOM format, as well as the new layer-based BOM format through the end of 2022.

@dmikusa
Contributor

dmikusa commented Feb 17, 2022

We are using this library: https://github.com/docker-java/docker-java/blob/master/docs/README.md and it isn't clear to me whether we can use it to access information from layers (somehow I doubt it). If you have any advice on how we might read the info (hopefully without downloading the whole image) please share.

I asked around about this and you do not need to download the whole image. I am not sure about the particular library you're using, but the way it works is this:

  1. At least with Docker Hub, you need to pull a token.

  2. Fetch the image manifest and pull the config digest out of it.

  3. Use the config digest to pull the image config.

  4. From the config, you can fetch the diff id for the layer. There are two ways you can do this.

4a. The labels io.buildpacks.base.sbom and io.buildpacks.app.sbom. The former indicates the layer that has the SBOM information for the base image (I don't believe this is being populated with current tools at the time I write this, so it's always empty) and the latter is the layer that has the SBOM information for the application and buildpack-contributed dependencies (this is being populated since the switch to the 6.x line of Java buildpacks). It's my understanding that these label ids are associated with an open RFC, so while it appears these have been settled on, there is the potential they could change. At the time of writing, the RFC that defines these labels is still open.

4b. If you have an image where those labels are not present (most at the time of me writing this), you can look at the io.buildpacks.lifecycle.metadata label. This contains a JSON blob with one of two possible top-level fields, either bom or sbom. My understanding is that going forward it should be sbom, but for a period of time with some of the RC releases things were called bom, so for absolute best compatibility you'd want to support both. Supporting only sbom is probably OK; it depends on your users.

  5. The diff id is the hash of the uncompressed layer, so you need to convert that to the digest of the layer blob as listed in the manifest. There isn't a direct way to do this, but the conventional way seems to be by layer order. The position of the diff id in the layer list in the config is the same as the position of the layer in the manifest. So you find the position in one, then look up the layer id from the other list. This is described better here.

  6. Take the layer from the manifest and use the media type and digest to download the actual image blob. This will be a tar-gzip file that contains the SBOM layer, which is just the SBOM files. You can extract them out of the archive and use them for whatever purpose you need.
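
The lookup by position described in the steps above can be sketched in Python roughly as follows. This is only an illustration with made-up digests. It assumes the bom/sbom field in the io.buildpacks.lifecycle.metadata label is an object whose `sha` key holds the diff id, which is not spelled out in this thread. The `rootfs.diff_ids` and `layers` fields are the standard image config and manifest fields.

```python
import json


def sbom_diff_id(config: dict):
    """Read the SBOM layer's diff id from the lifecycle metadata label, if present."""
    labels = config.get("config", {}).get("Labels") or {}
    raw = labels.get("io.buildpacks.lifecycle.metadata")
    if raw is None:
        return None
    metadata = json.loads(raw)
    # Accept both spellings mentioned above. The {"sha": ...} shape is an assumption.
    entry = metadata.get("sbom") or metadata.get("bom") or {}
    return entry.get("sha")


def layer_for_diff_id(manifest: dict, config: dict, diff_id: str) -> dict:
    """Map a diff id to its manifest layer by position (raises ValueError if absent)."""
    index = config["rootfs"]["diff_ids"].index(diff_id)
    return manifest["layers"][index]


# Made-up image config and manifest; real digests are full sha256 values.
config = {
    "config": {
        "Labels": {
            "io.buildpacks.lifecycle.metadata": json.dumps({"sbom": {"sha": "sha256:bbb"}})
        }
    },
    "rootfs": {"type": "layers", "diff_ids": ["sha256:aaa", "sha256:bbb"]},
}
manifest = {
    "layers": [
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "digest": "sha256:111"},
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "digest": "sha256:222"},
    ]
}

layer = layer_for_diff_id(manifest, config, sbom_diff_id(config))
print(layer["digest"], layer["mediaType"])
```

The returned layer entry carries both the digest and the media type needed for the blob download in the last step.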

The logic above is from the tool I'd previously mentioned: https://github.com/sclevine/cnb-sbom/blob/main/main.go#L148-L190

The logic it's using to extract the layer is here: https://github.com/sclevine/cnb-sbom/blob/main/main.go#L192-L207.
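
For readers not using Go, a comparable extraction step might look like the Python sketch below. It is not a port of the linked code. It assumes the downloaded blob is a gzip-compressed tarball, as described above, and it builds a tiny stand-in layer in memory so it runs without a registry. The file path inside the sample archive is hypothetical.

```python
import io
import tarfile


def read_sbom_files(layer_blob: bytes) -> dict:
    """Return {path: content} for every regular file in a gzip-compressed layer tarball."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(layer_blob), mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read()
    return files


def make_sample_layer() -> bytes:
    """Build a stand-in layer in memory; the path and content are invented."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        payload = b'{"bomFormat": "CycloneDX"}'
        info = tarfile.TarInfo(name="layers/sbom/launch/sbom.cdx.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


files = read_sbom_files(make_sample_layer())
print(sorted(files))
```

Reading members into memory instead of extracting to disk avoids writing archive-controlled paths to the filesystem, which seems a reasonable default for small SBOM layers.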

Here's a gist of a bash script I wrote that uses curl to download the layer. I didn't test it extensively, but the couple of images I tried did work OK. I think this breaks down the process a bit more, if you want to implement it in Java, than the Go code, as that's using a library that hides some of this away.

Last thing. This only works against a registry. It's using the Registry API. Getting this information from a Docker daemon would be different. That said, if your docker daemon has the image already then this matters a lot less. You've already spent the time to pull the image and can copy out the SBOM in a number of ways (like docker cp).
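
As a rough guide to the Registry API requests involved, the Python sketch below only builds the URLs and the Accept header for Docker Hub and performs no network calls. The function names are made up for this example. The hosts are Docker Hub's, so other registries will use different hosts and may use a different token flow.

```python
from urllib.parse import urlencode

# Docker Hub endpoints; other registries differ.
AUTH_HOST = "https://auth.docker.io"
REGISTRY_HOST = "https://registry-1.docker.io"

# Manifest media types to request via the Accept header.
MANIFEST_ACCEPT = ", ".join([
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
])


def token_url(repository: str) -> str:
    """URL that returns a pull token for the repository (step 1)."""
    query = urlencode({
        "service": "registry.docker.io",
        "scope": f"repository:{repository}:pull",
    })
    return f"{AUTH_HOST}/token?{query}"


def manifest_url(repository: str, reference: str) -> str:
    """URL of the image manifest for a tag or digest (step 2)."""
    return f"{REGISTRY_HOST}/v2/{repository}/manifests/{reference}"


def blob_url(repository: str, digest: str) -> str:
    """URL of a blob: the image config (step 3) or a single layer (last step)."""
    return f"{REGISTRY_HOST}/v2/{repository}/blobs/{digest}"


print(token_url("paketobuildpacks/builder"))
print(manifest_url("paketobuildpacks/builder", "full"))
print(blob_url("paketobuildpacks/builder", "sha256:<digest>"))
```

Each request after the first carries the token as an `Authorization: Bearer` header. The config and the SBOM layer are both fetched through the blob URL, which is why the whole image never needs to be pulled.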

There are other tools you can use to interact with the registry API too, like crane. That can more easily fetch the manifest and config. I don't think it has a command for fetching the specific layer though.

Not sure if you're still looking for this info, but I wanted to understand it myself so I figured I'd write it up and post it for reference. Hope that helps!


3 participants