@lucian-tosa lucian-tosa commented Sep 4, 2025

Summary

Adding the --remove-destination flag to cp in setup-agent-files.sh prevents the agent container from getting stuck whenever the agent or the utilities container is restarted.
If the utilities container is restarted, the pid of the utilities marker changes. The probe symlinks then still point into the filesystem of the old pid, so they become dangling symlinks. When the agent container is restarted, cp cannot overwrite such a symlink without this flag.
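
For reviewers who want to reproduce the failure mode locally, here is a minimal sketch. It assumes GNU cp (coreutils) and uses hypothetical scratch paths. None of the file names below are taken from setup-agent-files.sh, and the dangling link is only a stand-in for a symlink into the old pid's filesystem.

```shell
#!/usr/bin/env bash
# Sketch only: all paths are hypothetical and live in a throwaway directory.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

src="$workdir/source-probe"
dst="$workdir/probe"
echo "new probe" > "$src"

# Simulate the stale state: a symlink whose target no longer exists,
# standing in for a link into the filesystem of the old pid.
ln -s "$workdir/old-pid-root/probe" "$dst"

# Plain cp follows the destination symlink, so the copy fails
# because the link target cannot be written.
if cp "$src" "$dst" 2>/dev/null; then
  plain_cp_status="succeeded"
else
  plain_cp_status="failed"
fi
echo "plain cp: $plain_cp_status"

# --remove-destination unlinks the existing destination first,
# then creates a fresh regular file in its place.
cp --remove-destination "$src" "$dst"
echo "after --remove-destination: $(cat "$dst")"
```

The point of the sketch is that the flag replaces the stale link itself rather than trying to write through it, which is why a restart no longer blocks the copy.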

This change will require re-releasing all agent images.

Proof of Work

Before the change
image

After the change, the containers are ready even if they were restarted
image

Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?

github-actions bot commented Sep 4, 2025

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.3.0 Release Notes

New Features

Multi-Architecture Support

We've added comprehensive multi-architecture support for the Kubernetes operator. This enhancement enables deployment on IBM Power (ppc64le) and IBM Z (s390x) architectures alongside existing x86_64 support. Core images (operator, agent, init containers, database, readiness probe) now support multiple architectures. We have not added IBM or ARM support for Ops-Manager or the init-ops-manager image.

  • MongoDB Agent images have been migrated to new container repository: quay.io/mongodb/mongodb-agent.
    • The agents in the new repository support the x86-64, ARM64, s390x, and ppc64le architectures. More can be read in the public docs.
    • Operators running MCK >= 1.3.0 with static containers cannot use the agent images from the old container repository quay.io/mongodb/mongodb-agent-ubi.
  • quay.io/mongodb/mongodb-agent-ubi should no longer be used; it remains only for backwards compatibility.

Bug Fixes

  • This change fixes the current complex and difficult-to-maintain architecture for stateful set containers, which relies on an "agent matrix" to map operator and agent versions and has led to a very large number of images.
  • We solve this by shifting to a 3-container setup. This new design eliminates the need for the operator-version/agent-version matrix by adding one additional container containing all required binaries. This architecture maps to what we already do with the mongodb-database container.
  • Fixed an issue where the readiness probe reported the node as ready even when its authentication mechanism was not in sync with the other nodes, potentially causing premature restarts.
  • Fixed an issue where the MongoDB Agents did not adhere to the NO_PROXY environment variable configured on the operator.
  • Changed webhook ClusterRole and ClusterRoleBinding default names to include the namespace. This ensures that multiple operator installations in different namespaces don't conflict with each other.

Other Changes

  • Optional permissions for PersistentVolumeClaim moved to a separate role. When managing the operator with Helm, it is possible to disable permissions for PersistentVolumeClaim resources by setting the operator.enablePVCResize value to false (true by default). Previously, when enabled, these permissions were part of the primary operator role; they now have a separate role.
  • subresourceEnabled Helm value was removed. This setting used to be true by default and made it possible to exclude subresource permissions from the operator role by specifying false as the value. We are removing this configuration option, making the operator roles always have subresource permissions. This setting was introduced as a temporary solution for this OpenShift issue. The issue has since been resolved and the setting is no longer needed.
  • We have deliberately not published the container images for OpsManager versions 7.0.16, 8.0.8, 8.0.9 and 8.0.10 due to a bug in OpsManager that prevents MCK customers from upgrading their OpsManager deployments to those versions.

@lucian-tosa lucian-tosa changed the title from "Add --remove-destination to cp" to "CLOUDP-342878 - Add --remove-destination to cp" Sep 4, 2025
@lucian-tosa lucian-tosa marked this pull request as ready for review September 4, 2025 15:20
@lucian-tosa lucian-tosa requested a review from a team as a code owner September 4, 2025 15:20
@lucian-tosa lucian-tosa added the skip-changelog label (use this label in a pull request to not require a new changelog entry file) Sep 4, 2025
@lucian-tosa lucian-tosa enabled auto-merge (squash) September 4, 2025 15:34
@lucian-tosa lucian-tosa merged commit 85d4a02 into master Sep 4, 2025
5 of 6 checks passed
@lucian-tosa lucian-tosa deleted the fix-setup-agent-files branch September 4, 2025 15:36
mihaigalos pushed a commit to mihaigalos/mongodb-kubernetes that referenced this pull request Sep 10, 2025