viveksinghggits
Contributor

@viveksinghggits viveksinghggits commented Oct 1, 2025

Summary

Update:
Thanks to @MaciejKaras for the comment here. It looks like we were linting the helm chart in parallel with other jobs that were updating it. This PR changes that and runs the helm chart linting after the update_jobs job has finished updating the chart.

Old Desc:
The chart linting command that we run as part of the lint_repo task:

ct lint --charts="${PROJECT_DIR}/helm_chart/" \
    --chart-yaml-schema "${PROJECT_DIR}/helm_chart/tests/schemas/chart_schema.yaml" \
    --lint-conf "${PROJECT_DIR}/helm_chart/tests/schemas/lintconf.yaml"

seems to be flaky. It does not fail regularly, but we have seen the failures that are reported here. If we look at the error:

[ERROR] templates/: template: mongodb-kubernetes/templates/secret-config.yaml:1:14: executing "mongodb-kubernetes/templates/secret-config.yaml" at <.Values.operator.vaultSecretBackend>: nil pointer evaluating interface

It looks like the .Values object that is passed to the template file does not have the field .Values.operator.vaultSecretBackend. The only values files that do not define vaultSecretBackend are values-openshift.yaml and values-multi-cluster.yaml.
I suspect that one of the following is happening:

  • Either these values files are merged before being passed to helm template, the merge does not work properly, and .Values.operator ends up with the value defined in values-openshift.yaml or values-multi-cluster.yaml (which does not have vaultSecretBackend).
  • Or helm template is run with one of the other values files, which does not have operator.vaultSecretBackend. This is less likely, because it would have caused consistent failures in CI.
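
For anyone who wants to check which values files lack the key, a plain grep is enough. This is only a minimal sketch and not part of this PR: values_files_missing_key is a hypothetical helper name, and the match is textual rather than YAML-aware, so it cannot tell whether the key sits under operator or somewhere else.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print every file in the
# argument list that never mentions "<key>:". This is a text search only;
# it does not parse YAML or check nesting.
values_files_missing_key() {
  local key="$1"
  shift
  local file
  for file in "$@"; do
    # -q: no output, -F: fixed string, --: end of grep options
    if ! grep -qF -- "${key}:" "${file}"; then
      echo "${file}"
    fi
  done
}

# Example call (the location of the values files is an assumption):
#   values_files_missing_key vaultSecretBackend "${PROJECT_DIR}"/helm_chart/values*.yaml
```

An empty output would mean every listed file mentions the key at least once.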

One option that I considered was to explicitly pass only the main values.yaml file to the ct lint command using the --helm-extra-args flag, but that did not resolve the issue: I was still able to reproduce the failure once in 15 manual patches.

The other option was to add the vaultSecretBackend field to the other values files as well, and that is what I have done in this PR. I am not sure that this actually fixes the problem, but I was not able to reproduce it in 20 manual patches. The other small change in this PR is to run helm template before ct lint, so that if the test ever fails again we can check whether vaultSecretBackend is present in the rendered template.

Proof of Work

Output of successful run of make precommit: https://gist.github.com/viveksinghggits/45a6f80f2b5b85d26d7d21dcc3dfb56c
Output of failed run of make precommit: https://gist.github.com/viveksinghggits/7bb5a35e8bfe3a9be93e668ee8e734b9

Ran 20 manual evg patches and all of them are successful.

Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?

@viveksinghggits viveksinghggits requested a review from a team as a code owner October 1, 2025 12:41
@viveksinghggits viveksinghggits added the skip-changelog Use this label in Pull Request to not require new changelog entry file label Oct 1, 2025

github-actions bot commented Oct 1, 2025

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.5.0 Release Notes

New Features

  • Improve automation agent certificate rotation: the agent now restarts automatically when its certificate is renewed, ensuring smooth operation without manual intervention and allowing seamless certificate updates without requiring manual Pod restarts.

Bug Fixes

  • MongoDBMultiCluster: fix resource stuck in Pending state if any clusterSpecList item has 0 members. After the fix, a value of 0 members is handled correctly, similarly to how it's done in the MongoDB resource.

@viveksinghggits viveksinghggits requested a review from m1kola October 1, 2025 13:55
Collaborator

@MaciejKaras MaciejKaras left a comment

I'm not convinced this will definitely fix the helm linting issue. More likely we are observing flakiness because dependent tasks run in parallel. The lint_helm_chart func depends on the helm values generated by the update_values_yaml_files func, but they do not run sequentially.

pre_commit() {
  run_job_in_background "update_jobs"         # <- this rebuilds helm values
  run_job_in_background "update_licenses"
  run_job_in_background "lint_code"
  run_job_in_background "start_shellcheck"
  run_job_in_background "regenerate_public_rbac_multi_cluster"
  run_job_in_background "python_formatting"
  run_job_in_background "check_erroneous_kubebuilder_annotations"
  run_job_in_background "validate_snippets"
  run_job_in_background "lint_helm_chart"    # <- this requires helm values ready

  if wait_for_all_background_jobs; then
    echo -e "${GREEN}pre-commit: All checks passed!${NO_COLOR}"
    return 0
  else
    return 1
  fi
}

update_jobs() {
    # Update release.json first in case there is a newer version
    time update_release_json
    # We need to generate the values files first
    time update_values_yaml_files               # <- this rebuilds helm values
    # The values files are used for generating the standalone yaml
    time generate_standalone_yaml
}

I think the solution should be to run lint_helm_chart in update_jobs function as the last step. Or maybe run it after all other jobs have finished?
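
The first option could be sketched roughly as below. This is an illustration under assumptions, not the change that was merged: the job bodies are stubs that only record that they ran, update_jobs_then_lint and pre_commit_sketch are hypothetical names, and plain `&` / `wait` stand in for run_job_in_background and wait_for_all_background_jobs, whose implementations are not shown here.

```shell
#!/usr/bin/env bash
# Stub jobs standing in for the real ones; each only records that it ran.
ORDER_LOG="$(mktemp)"

update_jobs()     { sleep 1; echo "update_jobs" >> "${ORDER_LOG}"; }
lint_helm_chart() { echo "lint_helm_chart" >> "${ORDER_LOG}"; }
lint_code()       { echo "lint_code" >> "${ORDER_LOG}"; }

# Hypothetical wrapper: lint only after the values files were regenerated,
# and skip linting altogether if the regeneration failed.
update_jobs_then_lint() {
  update_jobs && lint_helm_chart
}

pre_commit_sketch() {
  local update_pid lint_code_pid status=0
  update_jobs_then_lint &   # one background job, sequential inside
  update_pid=$!
  lint_code &               # an unrelated job still runs in parallel
  lint_code_pid=$!
  # Collect the exit status of each background job.
  wait "${update_pid}" || status=1
  wait "${lint_code_pid}" || status=1
  return "${status}"
}

pre_commit_sketch
```

With this shape the chart lint can never start before update_jobs has finished, while the unrelated jobs keep running in parallel.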

@viveksinghggits
Contributor Author

Hi @MaciejKaras,
You are right. I somehow misread the code (that I wrote 😢) and thought that we were still running the helm chart lint separately and not as part of pre-commit. I will make the changes and push them soon.

@viveksinghggits
Contributor Author

@m1kola can you please have another look/approve?

@viveksinghggits viveksinghggits merged commit 3ec0ca1 into master Oct 3, 2025
37 checks passed
@viveksinghggits viveksinghggits deleted the fix-lint-helm-chart-issue branch October 3, 2025 11:08