@pioorg
Contributor

@pioorg pioorg commented Aug 13, 2025

Details:

  • bumped JDK to the latest 21 in the builder image
  • switched the runtime base image to chainguard-base
  • updated git version to 2.50.1-r1
  • used jlink to create a smaller image without unneeded JDK modules, man pages, and so on
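
For readers unfamiliar with jlink, the last bullet can be sketched roughly as below. This is only an illustration: the module list and output path are hypothetical placeholders, not the values used in this PR's Dockerfile, and the function merely prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Sketch only. MODULES and OUTPUT_DIR are assumptions for illustration;
# the real module list could be derived with `jdeps --print-module-deps`.
MODULES="java.base,java.logging"
OUTPUT_DIR="/opt/jre-min"

# Print a jlink command line that builds a trimmed runtime:
#   --strip-debug      drops debug information
#   --no-man-pages     omits man pages
#   --no-header-files  omits C header files
jlink_cmd() {
  printf 'jlink --add-modules %s --strip-debug --no-man-pages --no-header-files --output %s\n' "$1" "$2"
}

# Dry run: show the command instead of executing it.
jlink_cmd "$MODULES" "$OUTPUT_DIR"
```

Piping the printed line to `sh` in a builder stage that has a full JDK would produce the runtime directory, which a later stage can then copy into the final image.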

Closes #370

Checklists

Pre-Review Checklist

  • This PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check crawler.yml.example and elasticsearch.yml.example)
  • This PR has a meaningful title
  • This PR links to all relevant GitHub issues that it fixes or partially addresses
    • If there is no GitHub issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v0.1.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • Ran make notice if any dependencies have been added

Changes Requiring Extra Attention

This PR has to be well tested before merging, to ensure all necessary modules are present in the customised JDK image. Until proper CI and tests are completed, please treat it as a work in progress.
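
One possible way to check for missing modules, sketched under assumptions: the `REQUIRED` list and the sample input below are hypothetical, and the helper only compares names against the output format of `java --list-modules` (lines such as `java.base@21.0.8`).

```shell
#!/bin/sh
# Sketch only. REQUIRED is a hypothetical list, not the crawler's real module set.
REQUIRED="java.base java.logging java.naming"

# Read `java --list-modules` output on stdin and print every module named
# in the arguments that is absent from it, one per line.
missing_modules() {
  available=$(sed 's/@.*//')   # strip the "@version" suffix
  for m in "$@"; do
    printf '%s\n' "$available" | grep -qxF "$m" || printf '%s\n' "$m"
  done
}

# Demo with sample input standing in for a trimmed runtime.
printf 'java.base@21.0.8\njava.logging@21.0.8\n' | missing_modules $REQUIRED
# prints: java.naming
```

Against the built image, the same helper could be fed from `java --list-modules` run inside the container; an empty result would mean every listed module is present.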

  • Security-related changes (encryption, TLS, SSRF, etc)
  • New external service dependencies added.

Related Pull Requests

Release Note

@pioorg pioorg requested a review from a team as a code owner August 13, 2025 11:37
@pioorg
Contributor Author

pioorg commented Aug 13, 2025

PS. on my machine this made the image size ~360MiB.

@lorenabalan
Contributor

buildkite test this

@lorenabalan
Contributor

PS. on my machine this made the image size ~360MiB.

Hey @pioorg thanks for coming back! I expected the main benefit of this change would've been the image size, but 360MB is larger than the current size (which I believe is sitting somewhere around 240-250MB). Or is the main aim to use most up-to-date versions (git, jdk), and while doing that also minimise the impact of the bump on the overall image size?

@pioorg
Contributor Author

pioorg commented Aug 13, 2025

PS. on my machine this made the image size ~360MiB.

Hey @pioorg thanks for coming back! I expected the main benefit of this change would've been the image size, but 360MB is larger than the current size (which I believe is sitting somewhere around 240-250MB). Or is the main aim to use most up-to-date versions (git, jdk), and while doing that also minimise the impact of the bump on the overall image size?

Hi @lorenabalan
I currently see over 500MiB on my machine:
[screenshot]

The JDK version update is a kind of by-product, which IMHO should happen anyway, because 21.35 is almost two years old.

@lorenabalan
Contributor

buildkite test this

Contributor

@lorenabalan lorenabalan left a comment

Hi @lorenabalan I currently see over 500MiB on my machine: image

Ooh okay, that's not what I see in the registry. Locally I get a different number too, but I think the size shown in Docker Desktop is a bit misleading, so I've been looking more at what docker inspect returns.

docker pull docker.elastic.co/integrations/crawler:latest 
docker inspect docker.elastic.co/integrations/crawler:latest --format='{{.Size}}' | awk '{printf "%.2f MB\n", $1/1024/1024}'

I checked that the new wolfi image is indeed smaller:

make build-docker-wolfi
docker inspect crawler-ci-wolfi --format='{{.Size}}' | awk '{printf "%.2f MB\n", $1/1024/1024}'

159.01 MB on this branch vs 242.98 MB on main.
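
As a side note on the arithmetic, dividing a byte count by 1024 twice yields MiB, so the "MB" label in the awk commands above is strictly MiB. A small sketch of the conversion and of the relative saving between the two figures just quoted (the helper names are made up for illustration):

```shell
#!/bin/sh
# Convert a byte count, as printed by `docker inspect --format='{{.Size}}'`, to MiB.
to_mib() {
  LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.2f MiB\n", b / 1024 / 1024 }'
}

# Percentage by which the second size is smaller than the first.
pct_smaller() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

to_mib 1048576             # prints: 1.00 MiB
pct_smaller 242.98 159.01  # prints: 34.6
```

So on that machine the branch image comes out roughly a third smaller than main.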

Approving based on the above, cc @mattnowzari for visibility. 😊

@pioorg
Contributor Author

pioorg commented Aug 14, 2025

On my Mac this

make build-docker-wolfi
docker inspect crawler-ci-wolfi --format='{{.Size}}' | awk '{printf "%.2f MB\n", $1/1024/1024}'

yields 343.32 MB vs 602.47 MB 🤷
All in all, the image should be smaller, because the unneeded parts of the JDK are eliminated.

@mattnowzari
Contributor

Thank you for this work @pioorg! Also, thanks for closing #368! Go ahead and merge this PR when you are ready 🫡

@mattnowzari mattnowzari merged commit 6e66f73 into elastic:main Aug 14, 2025
2 checks passed
@github-actions

💔 Failed to create backport PR(s)

The backport operation could not be completed due to the following error:
There are no branches to backport to. Aborting.

The backport PRs will be merged automatically after passing CI.

To backport manually run:
backport --pr 371 --autoMerge --autoMergeMethod squash

artem-shelkovnikov pushed a commit that referenced this pull request Aug 15, 2025
Details:
* bumped JDK to the latest 21 in the builder image
* switched the runtime base image to chainguard-base
* updated git version to 2.50.1-r1
* used `jlink` to create a smaller image without unneeded JDK modules,
man pages, and so on

### Closes #370

### Checklists

<!--You can remove unrelated items from checklists below and/or add new
items that may help during the review.-->

#### Pre-Review Checklist
- [x] This PR does NOT contain credentials of any kind, such as API keys
or username/passwords (double check `crawler.yml.example` and
`elasticsearch.yml.example`)
- [ ] This PR has a meaningful title
- [ ] This PR links to all relevant GitHub issues that it fixes or
partially addresses
- If there is no GitHub issue, please create it. Each PR should have a
link to an issue
- [ ] this PR has a thorough description
- [ ] Covered the changes with automated tests
- [x] Tested the changes locally
- [ ] Added a label for each target release version (example: `v0.1.0`)
- [ ] Considered corresponding documentation changes
- [ ] Contributed any configuration settings changes to the
configuration reference
- [ ] Ran `make notice` if any dependencies have been added

#### Changes Requiring Extra Attention

<!--Please call out any changes that require special attention from the
reviewers and/or increase the risk to availability or security of the
system after deployment. Remove the ones that don't apply.-->

This PR **has to be well tested before merging**, to ensure all
necessary modules are present in the customised JDK image. Until proper
CI and tests are completed, please treat it as a work in progress.

- [ ] Security-related changes (encryption, TLS, SSRF, etc)
- [ ] New external service dependencies added.

### Related Pull Requests

<!--List any relevant PRs here or remove the section if this is a
standalone PR.

* https://github.com/elastic/.../pull/123-->

### Release Note

<!--If you think this enhancement/fix should be included in the release
notes,
please write a concise user-facing description of the change here.
You should also label the PR with `release_note` so the release notes
author(s) can easily look it up.-->

(cherry picked from commit 6e66f73)
@artem-shelkovnikov
Member

💚 All backports created successfully

Status Branch Result
0.4

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

artem-shelkovnikov added a commit that referenced this pull request Aug 15, 2025
# Backport

This will backport the following commits from `main` to `0.4`:
- [Switched Docker runtime image to jlink
(#371)](#371)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Piotr Przybył <23506256+pioorg@users.noreply.github.com>
mattnowzari pushed a commit that referenced this pull request Oct 6, 2025
(cherry picked from commit 6e66f73)
@mattnowzari
Contributor

💚 All backports created successfully

Status Branch Result
0.3

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

mattnowzari added a commit that referenced this pull request Oct 7, 2025
# Backport

This will backport the following commits from `main` to `0.3`:
- [Switched Docker runtime image to jlink
(#371)](#371)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Piotr Przybył <23506256+pioorg@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Make the runtime Docker image smaller by using jlink

4 participants