
[BUG] spark-operator v1beta2-1.4.2-3.5.0 install with helm timeout #2035

Open
Jay-boo opened this issue May 22, 2024 · 4 comments

Jay-boo commented May 22, 2024

Description


  • ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

I encounter the problem in GitHub CI/CD with this job:

  create-cluster:
    runs-on: ubuntu-latest
    steps:

      - name: Checkout current branch (full)
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Create kind cluster
        uses: helm/kind-action@v1
        with:
          config: ./kind/k8s_config/kind-config.yaml

      - name: Helm install
        run: |
          helm repo add spark-operator https://kubeflow.github.io/spark-operator
          helm search repo spark-operator
          helm repo update
          helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set webhook.enable=true --debug


Steps to reproduce the behavior:

Expected behavior

Successful spark-operator install

Actual behavior

Installation times out after 5 minutes.

Terminal Output Screenshot(s)

  helm repo add spark-operator https://kubeflow.github.io/spark-operator
  helm search repo spark-operator
  helm repo update
  helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set webhook.enable=true --debug
  shell: /usr/bin/bash -e {0}
"spark-operator" has been added to your repositories
NAME                         	CHART VERSION	APP VERSION        	DESCRIPTION                                  
spark-operator/spark-operator	1.3.0        	v1beta2-1.4.2-3.5.0	A Helm chart for Spark on Kubernetes operator
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "spark-operator" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
install.go:218: [debug] Original chart version: ""
install.go:235: [debug] CHART PATH: /home/runner/.cache/helm/repository/spark-operator-1.3.0.tgz
client.go:142: [debug] creating 1 resource(s)
client.go:142: [debug] creating 1 resource(s)
wait.go:48: [debug] beginning wait for 2 resources with timeout of 1m0s
install.go:205: [debug] Clearing REST mapper cache
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator" ServiceAccount
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator" /v1, Kind=ServiceAccount: serviceaccounts "my-release-spark-operator" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator" ClusterRole
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator" rbac.authorization.k8s.io/v1, Kind=ClusterRole: clusterroles.rbac.authorization.k8s.io "my-release-spark-operator" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator" ClusterRoleBinding
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator" rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io "my-release-spark-operator" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator-webhook-init" Job
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator-webhook-init" batch/v1, Kind=Job: jobs.batch "my-release-spark-operator-webhook-init" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:712: [debug] Watching for changes to Job my-release-spark-operator-webhook-init with timeout of 5m0s
client.go:740: [debug] Add/Modify event for my-release-spark-operator-webhook-init: ADDED
client.go:779: [debug] my-release-spark-operator-webhook-init: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for my-release-spark-operator-webhook-init: MODIFIED
client.go:779: [debug] my-release-spark-operator-webhook-init: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed pre-install: 1 error occurred:
	* timed out waiting for the condition
helm.go:84: [debug] failed pre-install: 1 error occurred:
	* timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:158
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.8.0/command.go:1039
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:267
runtime.goexit
	runtime/asm_amd64.s:1650
Error: Process completed with exit code 1.
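The log shows the install hanging on the pre-install `webhook-init` Job. A sketch for inspecting why that Job never completes (release and namespace names are the ones from the command above; run this while the install is waiting; the `job-name` label is set automatically by the Kubernetes Job controller):

```shell
NS=spark-operator
JOB=my-release-spark-operator-webhook-init
# Run only where a cluster and kubectl are available.
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" get job "$JOB"                     # is the Job stuck in Active?
  kubectl -n "$NS" get pods -l job-name="$JOB"        # pods created for the Job
  kubectl -n "$NS" logs -l job-name="$JOB" --tail=50  # why the init pod fails
  kubectl -n "$NS" describe job "$JOB"                # events (image pull, RBAC, ...)
fi
```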

Environment & Versions

  • Spark Operator App version: v1beta2-1.4.2-3.5.0
  • Helm Chart Version: 1.3.0
  • Kubernetes Version:
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
  • Apache Spark version: None at this stage

Additional context


Jay-boo commented May 22, 2024

I was forced to use chart version 1.2.7 to make it work.


Timoniche commented May 23, 2024

Forced to use Chart version 1.2.7 to make it work

Can you please specify here the helm install command?

I have a similar problem (timeout as well, Mac M2):

helm install spark-operator/spark-operator --namespace spark-operator --set sparkJobNamespace=default --set webhook.enable=true --generate-name --debug

UPD: this seems to work:

helm install eee spark-operator/spark-operator --namespace spark-operator --set sparkJobNamespace=default --set webhook.enable=true --debug --version 1.2.7


Timoniche commented May 24, 2024

Forced to use Chart version 1.2.7 to make it work

Fun fact: this seems to be the only working version (1.2.5 also times out).

Do you have any problems with 1.2.7? For example, I don't see driver pods being created while running the spark-pi example (maybe because this is my first touch of k8s).

#
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:3.5.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  sparkVersion: "3.5.0"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.5.0
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.5.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

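A quick way to check whether the driver is being created at all (a sketch; assumes kubectl's context points at your cluster and the `spark-pi` application above has been applied; `spark-role=driver` is a label the operator puts on driver pods):

```shell
APP=spark-pi
NS=default
# Run only where a cluster and kubectl are available.
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" get sparkapplication "$APP"              # overall state (SUBMITTED/RUNNING/FAILED)
  kubectl -n "$NS" describe sparkapplication "$APP"         # events usually explain a missing driver
  kubectl -n "$NS" get pods -l spark-role=driver            # driver pods, if any were created
  kubectl -n "$NS" get serviceaccount spark-operator-spark  # the serviceAccount the spec references
fi
```

If the serviceAccount in the last line does not exist (its name depends on the Helm release name), the operator cannot create the driver pod, which would match the symptom described.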
@ChenYi015
Contributor

@Jay-boo Fixed in chart v1.3.2 with #2044.

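For anyone landing here later, a sketch of moving to the fixed chart (standard Helm flags; release and namespace names taken from the original report):

```shell
# Hedged sketch: upgrade (or fresh-install) with the fixed chart version.
CHART_VERSION=1.3.2
RELEASE=my-release
NS=spark-operator
# Run only where helm is available.
if command -v helm >/dev/null 2>&1; then
  helm repo update
  helm upgrade --install "$RELEASE" spark-operator/spark-operator \
    --namespace "$NS" --create-namespace \
    --version "$CHART_VERSION" \
    --set webhook.enable=true
fi
```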