[BUG] Failed to pull image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1" #1991

Open
vzhao12 opened this issue Apr 17, 2024 · 9 comments
Labels
kind/bug Something isn't working

Comments

@vzhao12

vzhao12 commented Apr 17, 2024

Description

Unable to start a Spark job in Kubernetes.

  • [*] ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Steps to reproduce the behavior:

  1. Set up a new Kubernetes cluster. I set one up in gcloud.
  2. Get the Kubernetes cluster config.
  3. helm repo add spark-operator https://kubeflow.github.io/spark-operator
  4. helm install spark-operator spark-operator/spark-operator \
       --namespace default \
       --set 'image.tag=v1beta2-1.3.3-3.1.1' \
       --set sparkJobNamespace=default

Expected behavior

The Spark operator pod spins up successfully.

Actual behavior

The pod failed with ImagePullBackOff and reported the following error:

Failed to pull image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": rpc error: code = NotFound desc = failed to pull and unpack image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": failed to resolve reference "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1: not found

The errors started on 04/13/2024 at 1:00 AM.
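
To confirm exactly which reference the kubelet is trying to pull, the image name can be lifted out of the event message and split into repository and tag. This is a minimal sketch, assuming a POSIX shell with sed; the helper name `failed_image` is hypothetical, and the sample message is the error quoted above, shortened.

```shell
# failed_image (hypothetical helper): read event text on stdin and print the
# image reference named in a 'Failed to pull image "..."' message.
failed_image() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p'
}

# Sample input: the error from this report, cut off after the first clause.
msg='Failed to pull image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": rpc error: code = NotFound'

ref="$(printf '%s\n' "$msg" | failed_image)"
repo="${ref%:*}"    # everything before the last colon
tag="${ref##*:}"    # everything after the last colon

echo "repository: $repo"
echo "tag:        $tag"
```

Against a live cluster, the same helper could be fed from `kubectl describe pod <pod-name> --namespace default`, where `<pod-name>` is whatever `kubectl get pods` lists for the operator.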

Terminal Output Screenshot(s)

[Two screenshots attached: 2024-04-17 at 3:14:30 PM and 3:14:38 PM]

Environment & Versions

  • Spark Operator App version: 3.1.1
  • Helm Chart Version: v3.12.3
  • Kubernetes Version: v1.28.7-gke.1026000
  • Apache Spark version:

Additional context

@vzhao12
Author

vzhao12 commented Apr 17, 2024

I checked https://github.com/kubeflow/spark-operator/pkgs/container/spark-operator
It looks like we didn't publish version v1beta2-1.3.3-3.1.1 at all.

@yuchaoran2011 Can you push this version to fix the issue? Thanks
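
Besides browsing the package page, a tag can be checked from the command line. The sketch below assumes a Docker CLI recent enough to provide `docker manifest inspect` and a registry that allows anonymous reads; the helper names `image_ref` and `tag_exists` are hypothetical.

```shell
# image_ref REPO TAG: join repository and tag into a pullable reference.
image_ref() {
  printf '%s:%s\n' "$1" "$2"
}

# tag_exists REPO TAG: succeed only if the registry serves a manifest for the tag.
# Assumes the installed Docker CLI provides `docker manifest inspect`.
tag_exists() {
  docker manifest inspect "$(image_ref "$1" "$2")" >/dev/null 2>&1
}

ref="$(image_ref ghcr.io/kubeflow/spark-operator v1beta2-1.3.3-3.1.1)"
if ! command -v docker >/dev/null 2>&1; then
  echo "docker not found; to check manually run: docker manifest inspect $ref"
elif tag_exists ghcr.io/kubeflow/spark-operator v1beta2-1.3.3-3.1.1; then
  echo "published: $ref"
else
  echo "not published (or registry unreachable): $ref"
fi
```

A failing check does not distinguish a missing tag from a network or authentication problem, so the package page remains the authoritative list.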

@vzhao12
Author

vzhao12 commented Apr 17, 2024

The root cause is #1937.

@bharathk005

/kind bug

@zevisert
Contributor

@vzhao12 Until this is addressed, you can use images from the old registry by invoking helm with an extra option

--set 'image.repository=ghcr.io/googlecloudplatform/spark-operator'
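
Combined with the install command from the original report, the workaround might look like the sketch below. The release name, namespace, and tag are taken from the report; the helper `helm_install_cmd` is hypothetical, and the thread does not confirm that this exact tag exists in the old registry. The function only prints the command so it can be reviewed first.

```shell
# helm_install_cmd REPO TAG (hypothetical helper): print the install command
# from the report with an explicit image repository and tag.
helm_install_cmd() {
  printf 'helm install spark-operator spark-operator/spark-operator'
  printf ' --namespace default'
  printf ' --set image.repository=%s' "$1"
  printf ' --set image.tag=%s' "$2"
  printf ' --set sparkJobNamespace=default\n'
}

# Print the command for review; pipe the output to `sh` to actually run it.
helm_install_cmd ghcr.io/googlecloudplatform/spark-operator v1beta2-1.3.3-3.1.1
```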

@JunseoChoJJ

@vzhao12 I am still getting an ImagePullBackOff error. Does anyone have an idea? I am using this command:
helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set 'image.repository=ghcr.io/googlecloudplatform/spark-operator'
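
This command overrides only the repository, so the tag falls back to the chart's default, and the resulting repository and tag pair may not exist. One way to see which image a given set of options resolves to is to render the chart locally and list the `image:` fields. A minimal sketch, assuming sed, tr, and sort are available; the helper name `rendered_images` is hypothetical, and the manifest below is a stand-in rather than real chart output.

```shell
# rendered_images (hypothetical helper): read Kubernetes manifests on stdin
# and print each distinct value of an `image:` field.
rendered_images() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' |
    tr -d "\"'" |
    sort -u
}

# Stand-in for rendered chart output; the tag is a placeholder, not a real release.
rendered_images <<'EOF'
spec:
  containers:
    - name: spark-operator
      image: "ghcr.io/googlecloudplatform/spark-operator:example-tag"
      imagePullPolicy: IfNotPresent
EOF
```

With the chart repository added, the real input would come from `helm template my-release spark-operator/spark-operator --set 'image.repository=ghcr.io/googlecloudplatform/spark-operator' | rendered_images`.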

@iva3682

iva3682 commented Apr 22, 2024

Use 'image.repository=ghcr.io/kubeflow/spark-operator' and 'image.tag=v1beta2-1.4.3-3.5.0'.

@vara-bonthu
Contributor

We just released a new image update with important registry fixes. Check it out:

Image tag: https://github.com/kubeflow/spark-operator/tree/v1beta2-1.4.5-3.5.0
Helm chart: https://github.com/kubeflow/spark-operator/releases/tag/spark-operator-chart-1.2.14

Please give it a try and let us know if you encounter any issues. We're working on a new Kubeflow Spark Operator release, and your testing will help make it stable! Feel free to share feedback on the Kubeflow Spark Operator channel.

@zevisert
Contributor

zevisert commented Apr 26, 2024

@vara-bonthu Users will still need to --set=image.repository=... if they are using any tag other than v1beta2-1.4.5-3.5.0 since previous docker images have not yet been replicated to the chart's default repository (docker.io/kubeflow/spark-operator).

Still only one tag exists in the default container registry: https://hub.docker.com/r/kubeflow/spark-operator/tags

Edit: Changed tag to match @RyanZotti's comment
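
The rule described in this comment (only v1beta2-1.4.5-3.5.0 is in the chart's default repository, so any other tag needs an explicit image.repository) can be sketched as a small helper. The function name and its fallback-repository argument are hypothetical; the comment elides which repository to pass, so the caller supplies it. The example uses the old registry mentioned earlier in the thread.

```shell
# Only tag reported in the chart's default repository when this comment was written.
DEFAULT_REPO_TAG="v1beta2-1.4.5-3.5.0"

# repository_override TAG FALLBACK_REPO (hypothetical helper):
# print the extra helm option needed for TAG, or nothing if the default works.
repository_override() {
  if [ "$1" != "$DEFAULT_REPO_TAG" ]; then
    printf '%s\n' "--set image.repository=$2"
  fi
}

# Compare the one tag in the default repository with the tag from the original report.
for tag in v1beta2-1.4.5-3.5.0 v1beta2-1.3.3-3.1.1; do
  extra="$(repository_override "$tag" ghcr.io/googlecloudplatform/spark-operator)"
  echo "$tag -> ${extra:-no override needed}"
done
```

The hard-coded tag reflects the state of the registry on the date of this comment and would need updating as more images are replicated.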

@RyanZotti

I think you meant any tag other than v1beta2-1.4.5-3.5.0. The 1.4.3 version isn't available but 1.4.5 is.
