Fix corrupted splunk-operator-3.0.0.tgz causing Helm test failures #1863

Open
gabrielm-splunk wants to merge 2 commits into develop from fix-corrupted-helm-operator-3.0.0-tgz

gabrielm-splunk (Collaborator) commented Apr 21, 2026

Problem

After PR #1832 was merged to restore Helm support for 3.0.0, the helm tests fail with:

Error: INSTALLATION FAILED: template: splunk-enterprise/charts/splunk-enterprise/templates/enterprise_v4_ingestorcluster.yaml:1:14: 
executing "splunk-enterprise/charts/splunk-enterprise/templates/enterprise_v4_ingestorcluster.yaml" at <.Values.ingestorCluster.enabled>: 
nil pointer evaluating interface {}.enabled

Root Cause

The splunk-operator-3.0.0.tgz file was corrupted: it contained the full splunk-enterprise chart package (4.5MB) instead of only the splunk-operator chart (5.8KB). Helm therefore loaded a stale copy of splunk-enterprise as a subchart in place of the operator chart, which produced the nested template path (splunk-enterprise/charts/splunk-enterprise/...) seen in the error above.

File size comparison:

  • Corrupted: splunk-operator-3.0.0.tgz = 4.5MB (contains full splunk-enterprise chart)
  • Correct: splunk-operator-3.0.0.tgz = 5.8KB (contains only operator chart)
  • Reference: splunk-operator-3.1.0.tgz = 6.7KB (correct structure)
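
The size and structure difference can be checked by hand before reaching for any tooling. The following is a minimal sketch, not part of this PR; the helper names `top_level_dirs` and `archive_bytes` are hypothetical and only wrap standard `tar`, `cut`, `sort`, and `wc` calls.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a chart archive by hand.

# Print the distinct top-level directories inside a .tgz archive.
top_level_dirs() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Print the archive size in bytes (tr strips the padding BSD wc adds).
archive_bytes() {
  wc -c < "$1" | tr -d ' '
}
```

Running `top_level_dirs helm-chart/splunk-enterprise/charts/splunk-operator-3.0.0.tgz` on a healthy archive should list only `splunk-operator`, going by the description above; a `splunk-enterprise` entry or a multi-megabyte size would point to the corruption described here.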

The corruption happened in multiple "Restore helm chart version 3.0.0 to repository index" commits (b9766dd and related).

Solution

1. Restored correct tgz file

  • Extracted good version from git history (commit a3737ba)
  • File now contains only splunk-operator chart content, not nested splunk-enterprise
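
For reference, one common way to pull a single file back from an earlier commit is `git checkout <commit> -- <path>`. The sketch below assumes that approach; the PR does not say which command was actually used, and the function name `restore_file_from_commit` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restore one file to its state at a given commit.
# Overwrites the working-tree copy and stages the restored version.
restore_file_from_commit() {
  local commit="$1" path="$2"
  git checkout "$commit" -- "$path"
}
```

Run from the repository root, this would look like `restore_file_from_commit a3737ba helm-chart/splunk-enterprise/charts/splunk-operator-3.0.0.tgz`, after which the restored file can be inspected and committed.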

2. Added validation tooling

Created tools/validate-helm-charts.sh to detect future corruption:

  • ✅ Validates tgz starts with splunk-operator/ directory (not splunk-enterprise/)
  • ✅ Checks for embedded splunk-enterprise/Chart.yaml content
  • ✅ Verifies file sizes are reasonable for the chart version (3.x ~5-10KB, 2.x ~400-430KB with CRDs)
  • ✅ Detects files over 1MB (clear sign of full chart package)
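
To make the checks concrete, here is a minimal sketch of how three of them could be expressed. It is an illustration only and not the contents of tools/validate-helm-charts.sh: the function name `validate_operator_tgz` is hypothetical, the embedded-chart check is interpreted as a path match on the archive listing, and the per-version size ranges are left out in favor of the 1MB ceiling.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual tools/validate-helm-charts.sh.
MAX_BYTES=$((1024 * 1024))  # 1MB ceiling from the check list above

validate_operator_tgz() {
  local tgz="$1" listing first size
  listing="$(tar -tzf "$tgz")" || { echo "FAIL: cannot read $tgz"; return 1; }
  first="$(printf '%s\n' "$listing" | head -n 1)"

  # Check 1: the archive must be rooted at splunk-operator/.
  case "$first" in
    splunk-operator/*) ;;
    *) echo "FAIL: $tgz starts with '$first'"; return 1 ;;
  esac

  # Check 2: no splunk-enterprise chart embedded anywhere in the archive.
  if printf '%s\n' "$listing" | grep -q 'splunk-enterprise/Chart.yaml'; then
    echo "FAIL: $tgz embeds splunk-enterprise/Chart.yaml"; return 1
  fi

  # Check 3: anything over 1MB is treated as a full chart package.
  size="$(wc -c < "$tgz" | tr -d ' ')"
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "FAIL: $tgz is $size bytes"; return 1
  fi

  echo "OK: $tgz"
}
```

A caller could loop over `helm-chart/splunk-enterprise/charts/splunk-operator-*.tgz` and fail the run if any archive returns non-zero. Returning a status per file, rather than exiting on the first failure, lets such a loop report every bad archive in one pass.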

3. Automated CI/CD validation

Created .github/workflows/validate-helm-charts.yml that runs on:

  • PRs touching helm-chart files
  • Pushes to main/develop
  • Manual workflow dispatch

The workflow performs:

  • Tgz structure validation - Runs tools/validate-helm-charts.sh on all operator chart tgz files
  • Helm lint - Validates both splunk-operator and splunk-enterprise charts
  • Template rendering tests - Tests default, c3, and s1 deployment patterns
  • PR comments - Automatically comments on PRs if validation fails with helpful diagnostics

This CI check would have caught the corruption before it was merged to develop.

Testing

  • helm lint passes on both splunk-operator and splunk-enterprise charts
  • helm template successfully renders c3 deployment without nil pointer errors
  • ✅ Validation script detects corruption when tested with intentionally bad file
  • ✅ All operator chart tgz files validated successfully (2.3.0 - 3.1.0)
  • ✅ Template rendering tests pass for default, c3, and s1 deployments

Verification Commands

# Validate all charts
./tools/validate-helm-charts.sh

# Test template rendering for c3 deployment
helm template test-c3 helm-chart/splunk-enterprise \
  --set sva.c3.enabled=true \
  --set "sva.c3.indexerClusters[0].name=idx1" \
  --set "sva.c3.searchHeadClusters[0].name=shc1" \
  --set clusterManager.enabled=true

# Lint charts
helm lint helm-chart/splunk-operator
helm lint helm-chart/splunk-enterprise

Files Changed

  • helm-chart/splunk-enterprise/charts/splunk-operator-3.0.0.tgz - Restored correct 5.8KB version
  • tools/validate-helm-charts.sh - Validation script for detecting tgz corruption
  • .github/workflows/validate-helm-charts.yml - Automated CI/CD validation workflow

Related Issues

Fixes the helm test error reported after PR #1832 merge and adds safeguards to prevent future occurrences.

The splunk-operator-3.0.0.tgz file was corrupted - it contained the full
splunk-enterprise chart (4.5MB) instead of just the operator chart (5.8KB).
This caused Helm to load a stale splunk-enterprise as a subchart, leading
to template rendering errors:

  Error: INSTALLATION FAILED: template: splunk-enterprise/charts/
  splunk-enterprise/templates/enterprise_v4_ingestorcluster.yaml:1:14:
  executing "splunk-enterprise/charts/splunk-enterprise/templates/
  enterprise_v4_ingestorcluster.yaml" at <.Values.ingestorCluster.enabled>:
  nil pointer evaluating interface {}.enabled

Root cause: The file was replaced with a packaged splunk-enterprise chart
in multiple "Restore helm chart version 3.0.0" commits (see b9766dd and
related commits in git history).

Fix:
- Restored correct splunk-operator-3.0.0.tgz from commit a3737ba (5.8KB)
- File now contains only splunk-operator chart content, not splunk-enterprise

Validation:
- helm lint passes on both splunk-operator and splunk-enterprise charts
- helm template successfully renders c3 deployment without errors
- Added tools/validate-helm-charts.sh script to detect future corruption

The validation script checks:
- tgz files start with "splunk-operator/" directory (not "splunk-enterprise/")
- Files don't contain splunk-enterprise/Chart.yaml content
- File sizes are reasonable (detects 4.5MB corruption vs expected 5-400KB)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Created .github/workflows/validate-helm-charts.yml to automatically
validate Helm chart tgz files and prevent corruption.

The workflow runs on:
- Pull requests that modify helm-chart files (tgz, Chart.yaml, values.yaml)
- Pushes to main/develop that touch helm-chart files
- Manual trigger via workflow_dispatch

Validation checks:
1. Operator chart tgz structure validation (via tools/validate-helm-charts.sh)
   - Ensures tgz files contain only splunk-operator/ content
   - Detects embedded splunk-enterprise chart corruption
   - Verifies file sizes are reasonable

2. Helm lint on both splunk-operator and splunk-enterprise charts

3. Template rendering tests for common deployment patterns:
   - Default values
   - C3 deployment (cluster manager + indexer cluster + search head cluster)
   - S1 deployment (standalone)

Benefits:
- Catches corrupted tgz files before merge
- Validates template syntax and rendering
- Provides early feedback on PRs via automated comments
- Prevents helm test failures in CI

This would have caught the splunk-operator-3.0.0.tgz corruption before
it was merged to develop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Labels

helm splunk operator helm chart
