Build CI/CD for building versioned boefje image containers #2443

Closed
underdarknl opened this issue Feb 6, 2024 · 5 comments · Fixed by #2709
Assignees
Labels
kubernetes 😸 QA feedback QA feedback provided

Comments

@underdarknl
Contributor

Is your feature request related to a problem? Please describe.
Currently our boefjes are mostly locally stored folders with Python code. This needs to be transformed into a container- or archive-based system where each boefje is self-contained with a well-defined entrypoint. These artifacts should be distributable as an archive and an OCI image and contain all requirements needed for the internals of the boefje to function.

Describe the solution you'd like

  • Describe the requirements per boefje in their own local requirements file.
  • Create a build script for each boefje (or a general one if possible)
  • Build CI/CD to create versioned artifacts (archive + OCI image) of released boefjes, possibly for various architectures.
  • Set up a distribution channel for artifacts
  • Connect katalogus to distribution channel
  • Have boefje runner collect required artifact (based on local arch, runner type) from distribution channel.
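The last bullet could be sketched as follows. This is only an illustration under assumptions: the runner types (`"container"`, `"archive"`), the artifact index layout, and the image references are all hypothetical and not something the thread has settled on. It maps the local machine name to the architecture names used by OCI images and picks the matching artifact.

```python
import platform

# Map common platform.machine() values to OCI architecture names.
_OCI_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def oci_arch(machine=None):
    """Return the OCI architecture name for a machine string (default: this host)."""
    machine = (machine or platform.machine()).lower()
    try:
        return _OCI_ARCH[machine]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None


def pick_artifact(artifacts, runner_type, machine=None):
    """Pick the artifact reference for a runner type and architecture.

    `artifacts` is a hypothetical index: {(runner_type, arch): reference}.
    """
    key = (runner_type, oci_arch(machine))
    if key not in artifacts:
        raise LookupError(f"no artifact published for {key}")
    return artifacts[key]
```

Passing `machine` explicitly keeps the lookup testable; a real runner would leave it unset and rely on the host value.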
@Donnype
Contributor

Donnype commented Feb 28, 2024

Related:

Suggested Approach

This ticket has three components I think:

How to build images

We need to think about the flow of building these Images with the right versions and metadata while minimizing the impact on the current boefje structure. This could just be a generic Dockerfile as a template for which we later build ways to extend it, e.g. using a custom base image or supplying optional build-args, etc. Building local images from a new boefje you wrote should be as simple as:

$ docker build -f boefjes/builder boefjes/boefjes/plugins/kat_my_boefje

Or perhaps even:

$ make boefje path=kat_my_boefje

We then use these commands in a GitHub Action to build all boefjes and push the images to ghcr.io, tagged both latest and perhaps with their version. If we use the same base image and layers, this hopefully shouldn't take ages.

Which brings me to a question: are we going to use (semantic) versioning for the boefjes now? Do we bump the version of all boefjes whenever boefjes or octopoes changes? Or are we just going to iterate, perhaps tagging with the short git hash, and once in a while tag all boefjes with a new semantic version once we consider it "stable" or fix bugs? Perhaps abuse OpenKAT version numbers for major releases at least?
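To make the tagging options concrete, here is a minimal sketch of one possible scheme: always tag latest and the short git hash, and add a release tag when there is one. The registry namespace `ghcr.io/example-org` and the function name are assumptions for illustration, not a decision made in this thread.

```python
from typing import List, Optional

REGISTRY = "ghcr.io/example-org"  # hypothetical namespace, not the real one


def image_tags(boefje_id: str, git_sha: str, release: Optional[str] = None) -> List[str]:
    """Return the image references to push for one boefje build."""
    repo = f"{REGISTRY}/{boefje_id}"
    tags = ["latest", git_sha[:7]]  # short git hash for iterating between releases
    if release:
        tags.append(release)  # e.g. the OpenKAT version of a tagged release
    return [f"{repo}:{tag}" for tag in tags]
```

For example, `image_tags("kat_my_boefje", "0123456789abcdef", "v1.0.0")` yields three references for the same build.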

When to build images

For the development flow we need to consider when to build these images. Since we probably want to build the images in the CI as well, we can also consider not building the images locally by default and just pulling the needed image the first time we run a boefje (like Docker does). While developing, you need to build the image in the end to test it (or use a generic integration test setup for this).

One tricky part is that the boefjes that are available could either be:

  • A local image
  • A remote image
  • An image that first needs to be (re)built from the local system.

The simplest solution might be:

  1. Define remote tags for the containerized boefjes (ghcr.io in this case)
  2. Try to run the image from the local repository
  3. If not present, try to pull from remote
  4. If not on remote, check if we can build it locally with the right tag
  5. If we cannot build it locally, fail
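The steps above can be sketched as a small resolver. The three callables are placeholders (assumptions) for whatever ends up wrapping the Docker CLI or SDK; only the fallback order comes from the list.

```python
from typing import Callable


class ImageUnavailable(RuntimeError):
    """Raised when an image is not local, not on the remote, and cannot be built."""


def ensure_image(
    tag: str,
    has_local: Callable[[str], bool],
    pull: Callable[[str], bool],
    build: Callable[[str], bool],
) -> str:
    """Make sure `tag` is runnable and report where it came from.

    Each callable returns True on success. They are only called in order,
    so a local image never triggers a pull or a build.
    """
    if has_local(tag):
        return "local"
    if pull(tag):
        return "pulled"
    if build(tag):
        return "built"
    raise ImageUnavailable(tag)
```

Injecting the callables keeps the order of the fallbacks testable without a Docker daemon.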

How to find images [future]

This leaves one aspect. Currently, the boefje definition json files are the files we use to find individual boefjes. When we let users add their own repository, suddenly we cannot rely on looping over json files locally anymore. We need to query this repository for all available boefjes. Different repositories support different APIs for this. We could just add a configuration (file) containing all metadata of the boefjes in a certain repository. However, we could also just add one boefje at a time with the repository field, either still with json files or by starting to consider a boefje database model.

@Donnype
Contributor

Donnype commented Mar 20, 2024

Also, in the future it would be great if we could provide easy tooling to build the images outside of kat, but the current scripts at least depend on some imports outside of that scope.

@dekkers dekkers changed the title Build CI/CD for building versioned Image containers, and zip file per Boefje Build CI/CD for building versioned boefje image containers Mar 20, 2024
@dekkers
Contributor

dekkers commented Mar 20, 2024

Building images

I don't think we can build every boefje that currently already uses an existing container image using a single builder Dockerfile. So we would need some logic to use a Dockerfile in the boefje directory if it exists and fall back to using the general builder Dockerfile if it doesn't. But maybe it is easier to just put a Dockerfile in the boefje directory even if that would cause a bit of duplication.
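A minimal sketch of that fallback logic, assuming the per-boefje file is simply named `Dockerfile` and the generic builder lives at a path passed in by the caller (both are assumptions):

```python
from pathlib import Path
from typing import List


def select_dockerfile(boefje_dir: Path, generic: Path) -> Path:
    """Prefer a Dockerfile inside the boefje directory, else the generic builder."""
    custom = boefje_dir / "Dockerfile"
    return custom if custom.is_file() else generic


def build_command(boefje_dir: Path, generic: Path, tag: str) -> List[str]:
    """Assemble the `docker build` invocation for one boefje."""
    dockerfile = select_dockerfile(boefje_dir, generic)
    # The boefje directory is the build context in both cases.
    return ["docker", "build", "-f", str(dockerfile), "-t", tag, str(boefje_dir)]
```

The returned list can be passed to `subprocess.run(..., check=True)`; it is kept separate here so the selection can be checked without running Docker.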

I think it would be good to not increase the scope too much and tag the boefje containers the same way we do the current containers, so with the OpenKAT version when we add a git tag. This also leaves a bit of room to change the boefjes HTTP API, because we haven't really used it yet and it would be nice to be able to make changes to it if necessary.

I think it shouldn't be a problem to just build all boefje images with a single build command in a development setup from source. It takes a bit of time the first time, but after that it would just use the default docker image caching. Those images will then be available in the local docker registry for KAT to use. So I don't think it is necessary to have logic to decide to pull images or build them.

For GitHub Actions I think we can use the path filter that is used in the Debian action to only build images when a PR changes the boefje. For releases, and maybe also main, I think we always want to build all the images, because you want to make sure that you have the latest base image / security updates.
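Independent of how the workflow's path filter is configured, the selection itself could look like the sketch below. It assumes the plugin root `boefjes/boefjes/plugins` from the `docker build` example earlier in the thread and a list of changed file paths supplied by the CI; the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Plugin root taken from the docker build example in this thread.
PLUGINS_ROOT = PurePosixPath("boefjes/boefjes/plugins")


def boefjes_to_build(
    changed_files: Iterable[str],
    all_boefjes: Iterable[str] = (),
    build_all: bool = False,
) -> List[str]:
    """Return the boefje directories whose images need a rebuild.

    With build_all=True (releases, maybe main) every boefje is rebuilt so
    the images pick up the latest base image and security updates.
    """
    if build_all:
        return sorted(set(all_boefjes))
    changed = set()
    for name in changed_files:
        try:
            rel = PurePosixPath(name).relative_to(PLUGINS_ROOT)
        except ValueError:
            continue  # file is outside the plugins tree
        if len(rel.parts) >= 2:  # <boefje dir>/<file>, not a file in the root
            changed.add(rel.parts[0])
    return sorted(changed)
```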

One aspect that is not mentioned yet is that we need some shared code to do the communication with the boefjes HTTP API. We either need to directly copy that from the source with a statement in the Dockerfile, use multi-stage builds for this, or create a PyPI package that can be installed.

Finding images

I think discoverability / creating a repository is another discussion we should probably have in another issue. But with regard to the boefje definition and picture, I think that should be included in the container image, either as a file in one of the layers or added as OCI image metadata.
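As an illustration of the metadata option only, the definition could be serialized into an image label at build time. The label key below is hypothetical, and the fields in the usage example are illustrative rather than the real definition schema.

```python
import json
from typing import List

LABEL_KEY = "org.example.boefje.definition"  # hypothetical label key


def definition_label_args(definition: dict) -> List[str]:
    """Turn a boefje definition into `docker build --label` arguments."""
    # Compact, key-sorted JSON keeps the label stable between builds.
    payload = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return ["--label", f"{LABEL_KEY}={payload}"]
```

The result of `definition_label_args({"id": "kat_my_boefje", "name": "My boefje"})` can be appended to a `docker build` argument list. A picture is probably better shipped as a file in a layer than as a label.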

@Donnype
Contributor

Donnype commented Mar 21, 2024

@dekkers True about the boefjes that use containers and agreed on the approach for the fallbacks. I think 60% could use a generic base.Dockerfile and the others either use some duplication or we juggle with multistage builds.

I also think that the versioning in tags is a discussion to have later. Wanted to at least point out all the considerations to take into account.

Building all images with one command will cover a lot of cases, but I do see scenarios where I clean up some old images and suddenly see obscure failures in my install because the images are missing. Wouldn't we save some potential future headaches if we try to build a resource if possible? Perhaps in practice most devs spam make kat, but I still mostly talk through docker compose itself.

I indeed realized the last part soon after I started experimenting with this. It would be nice if we could have some generic tooling around this that works with multiple base images and does not require (a lot of) dependencies to bake those images into openkat-compatible boefjes. In the MVP I just built, I combine a bash boefje_entrypoint.sh that curls the boefje API and calls a Python-specific docker_adapter.py that calls the run method from the boefjes. This could be improved over time once we have learned more about the setup.

Finding Images

Agreed to discuss this in another issue.

@Donnype
Contributor

Donnype commented Mar 22, 2024

With respect to Boefjes that already use a container, as discussed with @dekkers, the simplest solution would be to extend the existing container and install whatever is necessary (i.e. python, perhaps curl) to call and process the scripts in the same container using the tooling we also have for the python boefjes. This allows us to:

  • Move the complexity of tweaking the current boefje images to building working Docker images, which aligns with the current experience in the team
  • Iterate over the current APIs quickly in the first phase in a language we're comfortable with, i.e. Python
  • Rewrite all the boefjes we're currently supporting

We'd lose some build time, but that's worth the benefits. The same holds for the fact that we'll be shipping bigger images. To remove this dependency we could consider trying to bundle or compile the Python code to pass to the image, or actually switch to a language such as Go or Rust and compile binaries that the images can run directly to turn them into a boefje.

Projects
Archived in project
Status: Done

4 participants