Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Node.js Installation Buildpacks BOM #98

Merged
merged 7 commits into from
Aug 20, 2021
Merged

Conversation

sophiewigmore
Copy link
Member

@sophiewigmore sophiewigmore commented Jul 28, 2021

Summary

Resolves paketo-buildpacks/nodejs#425

Readable

Use Cases

Checklist

  • I have viewed, signed, and submitted the Contributor License Agreement.
  • I have linked issue(s) that this PR should close using keywords or the Github UI (See docs)
  • I have added an integration test, if necessary.
  • I have reviewed the styleguide for guidance on my code quality.
  • I'm happy with the commit history on this PR (I have rebased/squashed as needed).

@sophiewigmore sophiewigmore changed the title [WIP] nodejs installation bom [WIP] Node.js Installation Buildpacks BOM Jul 28, 2021
@sophiewigmore sophiewigmore changed the title [WIP] Node.js Installation Buildpacks BOM Node.js Installation Buildpacks BOM Jul 29, 2021
@sophiewigmore sophiewigmore marked this pull request as ready for review July 29, 2021 17:39
@sophiewigmore sophiewigmore requested a review from a team as a code owner July 29, 2021 17:39
@sophiewigmore sophiewigmore changed the title Node.js Installation Buildpacks BOM RFC: Node.js Installation Buildpacks BOM Jul 29, 2021
NPM Install and Yarn Install Buildpacks.

* After the `npm install` or `yarn install` command is run during the build
phase, we should run `npm install -g @cyclonedx/bom` to install the tool on
Copy link
Member

@arjun024 arjun024 Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPM Install/Yarn Install buildpacks are designed such that they can be used offline in Internet-less environments. This installation of cyclonedx/bom doesn't seem to work in offline mode. Can this be packaged as a dependency? Or even be in the build image? Regardless, I think we should maintain install buildpack's ability to work fully in offline mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arjun024 that's a good point. I'm more inclined to have this run as a part of this buildpack since the goal is just to collect insights around the install process, and it doesn't really have a role in adding something to the final app image (besides metadata).

I can look into packaging this as a dependency and update you here.

Copy link
Member Author

@sophiewigmore sophiewigmore Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @ForestEckhardt and I have two options we are considering after looking into this:

A. Extract the tool as a tarball with its dependencies, and make it available as a standalone dep-server dependency. Install it with it’s own buildpack in both an online and offline scenario. Run BOM creation using the tool in its own buildpack after NPM Install/Yarn Install/ or before Node Start in the no package manager scenario.

B. Extract the tool as a tarball with its dependencies, and then run BOM tool installation and BOM creation together in one buildpack after NPM Install/Yarn Install/ before Node Start in the no package manager scenario.

We are leaning towards having this process be a part of its own buildpack(s) for a couple reasons. First, it keeps buildpack modularity intact. While it seems intuitive to run BOM creation alongside the NPM Install and Yarn Install buildpacks, the issue is that we need to support the Node Start case, and it seems unintuitive to create the BOM in this buildpack, since it's a start-command buildpack.

The other reason we are leaning to one of these two set ups is because it feels slightly sneaky to slip the dependency into the image within another buildpack. We could add it as a part of the node-engine dependency or install it in the node-engine buildpack, but this doesn't seem that transparent especially since the inclusion of this dependency opens the door for additional CVEs.

@arjun024 do you have any thoughts?

Copy link
Member Author

@sophiewigmore sophiewigmore Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are ideally going with option B here. We will host the CycloneDX BOM tool as a dependency in the dep-server, and then introduce a buildpack that runs before start commands to install the tool and run BOM generation together. This will work in both the online and offline settings because the tool in the dep-server will have all of its dependencies vendored in. See 491a220

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

* A single BOM generation buildpack will be added. It's primary functions are:
* perform the BOM dependency tool retrieval from the dep-server,
* execute BOM generation with the tool (`cyclonedx-bom -o bom.json`), and
* contribute the BOM to the layer metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: You are not writing to the "layer metadata" right? You're gonna write to the {build,launch}.toml? (build/launch environment metadata - I don't know the technical term?) I think of layer metadata as <layers>/<layer>.toml. I think spelling it out would be clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 7f6e30e


The buildpack will pass detection in two different cases:

* Detection will pass if there is a `node_modules` directory in the source directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think apps would have a node_modules dir at detect time unless they are vendored. The x-install buildpacks generate node_modules dir during the build phase

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea the point here was that. If the modules are vendored, we don't have to require node_modules. My wording is a bit misleading I can fix it up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried to fix the wording in 7f6e30e

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fail detection if BP_*_BOM is set to false? - so we can use it as an optional buildpack. If that's too detailed for the RFC, that's fine too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea!! That's what we're proposing. Is that unclear in our explanation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text says "Node Module BOM Generator CNB always detects". I'd expect it to say it detects unless BP_*_BOM is set to false like Node Run Script but kind of opposite logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are no longer implementing this environment variable, I don't think this is necessary now?


* (Optionally) generate `CPE` and `pURL` from the data and add it to the
1. Perform the BOM dependency tool retrieval from the dep-server. It will get
added to a build-time layer. It will contribute a BOM entry for the tool itself
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will get added to a build-time layer

Why is this tool added to a layer (& a BOM entry for the tool itself added) instead of just using the tool to generate the BOM and throwing away the tool?

Copy link
Member Author

@sophiewigmore sophiewigmore Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wanted to cache the tool so that if any node modules change, we can still generate the BOM again without having to re-download the tool.

A BOM entry for the tool is necessary because it's something we are installing onto the image we are using during the build process, and we should record of everything we install via our buildpacks for transparency's sake


Users of the Node.JS buildpack can opt out of this behaviour by creating a
custom order grouping that does not include the BOM Generation buildpack, or
they can set a `BP_ENABLE_MODULE_BOM` flag during container build-time to `false`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to use this as a generic flag across language families? If it's nodejs-specific, I suggest we stick to the convention of the BP_NODE prefix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to have both. Imagine that a user might want to disable the BOM for just a single buildpack, or all BOM generation across the entire build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing worth considering is to invert the variable to BP_DISABLE_MODULE_BOM. This way unset and set to false align.

Copy link
Contributor

@ForestEckhardt ForestEckhardt Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would y'all feel if for this RFC and the initial implementation of the buildpack we removed the ability to skip detection via and environment variable?

I think that we could hold off for now and survey the landscape of language module BOM generating buildpacks and make a more informed decision on what kind of control mechanism we would like. I think that this is a relatively safe approach because if we get user feedback in the interim for such a feature this control structure is easily added and will help us solidify a pattern for similar types of buildpacks that may come in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the bom generation do not eat up a lot time, I'm fine with it. You can also propose the skip-detection in the RFC and not do it in the initial implementation if you want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arjun024 we have updated the RFC to mention this environment variable as a future optimization pending user interest. We are leaving it out of our initial implementation

Signed-off-by: Forest Eckhardt <feckhardt@pivotal.io>
* execute BOM generation with the tool (`cyclonedx-bom -o bom.json`), and
* contribute the BOM to the `launch.toml` `[bom]` label.

* For each of the three Node.JS buildpack order groups, the BOM generation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if instead of generating the BOM during the build phase of the BOM generation buildpack, we should instead have the buildpack that produces installs the modules generate the BOM. There are a couple reasons for this:

Once CNB introduces the ability associate BOM entries with a specific layer, we will want to:

  1. e.g. associate these BOM entries with a layer like paketo-buildpacks/yarn-install:modules
  2. save time by avoiding rerunning the tool if the layer has not changed (the yarn-install buildpack should be responsible for making this determination)

In the case where there is no package manage we will still need the new buildpack to generate the BOM, or I guess node-start could potentially do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need a new buildpack regardless and as it stands we can get away with having one buildpack do all of the BOM generation. I would prefer to implement the smallest change first and then potentially expand the functionality of other buildpacks or add additional buildpacks as the need arises. I have some concerns about having to duplicate logic across three different buildpacks just so I have the BOM on the right layer and so I would like to dive into this deeper when the feature is ready to play.

Is the lack of consideration for layer specific BOM a blocker for this initial implementation in your eyes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just added a performance section, but when we checked the time it took to run the tool against an app with 1300 node modules, it only took around 3 seconds. It's pretty quick, so we're not overly worried about the re-running of the tool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it only took around 3 seconds. It's pretty quick, so we're not overly worried about the re-running of the tool

While I don't think 3 seconds is a deal-breaker, it's not completely insignificant either. For example a no-op rebuild of the paketo yarn sample app currently takes about 4.6 seconds on my workstation.

I have some concerns about having to duplicate logic across three different buildpacks just so I have the BOM on the right layer and so I would like to dive into this deeper when the feature is ready to play.

My assumption would be that the vast majority of the logic lives in the bom generation tool itself, leaving very little logic that would need to be duplicated?

Is the lack of consideration for layer specific BOM a blocker for this initial implementation in your eyes?

No. In general I don't see my suggestions here as a blocker. I think associating the BOM with the correct layer and shaving a few seconds off the build whenever possible are both good long term goals, but if the node maintainers believe that accomplishing those goals would significantly slow down our ability to deliver ther feature or make the buildpacks harder to maintain, I trust the maintainers weigh those tradeoffs. This also seems like something that could be refactored later if need be without impacting end users, which lowers the stakes.

Copy link
Member Author

@sophiewigmore sophiewigmore Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't think 3 seconds is a deal-breaker, it's not completely insignificant either. For example a no-op rebuild of the paketo yarn sample app currently takes about 4.6 seconds on my workstation.

This is fair. The app we used is quite large so I'd expect that to be more of a worst-case time. Regardless, you are right though that it's significant enough that we should probably consider optimizing.

My assumption would be that the vast majority of the logic lives in the bom generation tool itself, leaving very little logic that would need to be duplicated?

Right now the logic includes the installation of the tool (/cache reuse of the tool), running the tool, and converting the BOM JSON file into packit.BOMEntry types (for now).

It's not terrible to duplicate, but the fact of the matter is that we can get the basic functionality for the BOM via a single buildpack, rather than adding a separate buildpack for one case, and making changes to NPM Install and Yarn Install in the other cases. The optimizations for performance and layer inclusion are definitely worthwhile, but I am more inclined to go with the basic implementation that we have outlined first (pending maintainer approval) and then work towards these optimizations down the line as you said.

@paketo-buildpacks/nodejs-maintainers can you weigh in?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM for the first implementation. If the node_modules hasn't changed, you should still be able to save time by avoiding downloading/rerunning the tool, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also on board with the single-buildpack proposal as a first pass


The Node Module BOM Generator CNB always detects with the following contract:
* Requires {`node`, `node_modules`} during `build`
* Provides none
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it should provide cyclonedx-bom. If we take my sugestion above regarding, shifting responsibility for executing the tool to the buildpack providing the described layer (when appropriate), then

  1. if node_modules is vendored, it will also require cyclonedx-bom
  2. yarn-install and npm-install can require cyclonedx-bom in their first build plan entry (and fall back to passing without generating a bom in a second entry).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct if we do divide up the generation amongst buildpacks, however I am hesitant to do this fo the reasons mentioned above.

Signed-off-by: Forest Eckhardt <feckhardt@pivotal.io>
@robdimsdale robdimsdale mentioned this pull request Oct 21, 2021
5 tasks
modulo11 pushed a commit to sap-contributions/rfcs that referenced this pull request Oct 26, 2021
[paketo-buildpacks#98]

Signed-off-by: Ben Hale <bhale@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC for installation buildpack BOM
6 participants