
[QUESTION] Error related with webhook #2032

Open
alstjs37 opened this issue May 20, 2024 · 5 comments
Labels: question (Further information is requested)

Comments

@alstjs37

Hello,

I've just encountered an error like this,

When I first installed the spark-operator and ran pyspark-pi without the --set webhook.enable=true option, it worked fine.

After that, in order to mount a volume, I removed the spark-operator with helm uninstall and reinstalled it with the --set webhook.enable=true option, but now pyspark-pi no longer works.
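
Roughly the commands I used to reinstall (the chart repo URL and release name are reconstructed from the output below, so treat them as an approximation of my setup):

helm uninstall sparkoperator -n spark-operator
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install sparkoperator spark-operator/spark-operator \
  --namespace spark-operator \
  --set webhook.enable=true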

My resources in the spark-operator namespace:

$ sudo kubectl get all -n spark-operator

NAME                                                  READY   STATUS      RESTARTS   AGE
pod/sparkoperator-spark-operator-6994c8bcfd-vns8k     1/1     Running     0          137m
pod/sparkoperator-spark-operator-webhook-init-ww2lw   0/1     Completed   0          137m

NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/sparkoperator-spark-operator-webhook   ClusterIP   10.107.69.123   <none>        443/TCP   137m

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sparkoperator-spark-operator   1/1     1            1           137m

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/sparkoperator-spark-operator-6994c8bcfd   1         1         1       137m

NAME                                                  STATUS     COMPLETIONS   DURATION   AGE
job.batch/sparkoperator-spark-operator-webhook-init   Complete   1/1           3s         137m

When I apply the YAML file below to Kubernetes, I get SUBMISSION_FAILED from the SparkApplication ...

Here is my pyspark-pi.yaml:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: spark-operator
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "msleedockerhub/spark-py:py3.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.5.1
    serviceAccount: sparkoperator-spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.5.1
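
For context, the reason I enabled the webhook is that I want to mount a volume into the driver and executor. The extra fields I plan to add look roughly like this (the volume name and hostPath are placeholders, and the fields would be merged into the existing spec/driver/executor sections):

  volumes:
    - name: test-volume
      hostPath:
        path: /tmp/spark-data
        type: Directory
  driver:
    volumeMounts:
      - name: test-volume
        mountPath: /tmp/spark-data
  executor:
    volumeMounts:
      - name: test-volume
        mountPath: /tmp/spark-data

As far as I understand, mounting volumes like this is exactly the part that needs the mutating webhook.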

I'm sure there's no problem with the image I built.
With the webhook enabled, is there anything else I need to set in the YAML file?

How can I solve this problem?
Please help.

alstjs37 added the question label May 20, 2024
@imtzer

imtzer commented May 21, 2024

@alstjs37 Can you run kubectl describe on your SparkApplication and provide the output? And if the driver pod was created, check the pod log too.
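
For example (application name and namespace taken from your YAML; the driver pod name is a guess based on the default <app-name>-driver naming):

kubectl describe sparkapplication pyspark-pi -n spark-operator
kubectl logs pyspark-pi-driver -n spark-operator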

@alstjs37
Author

@imtzer Thanks for your answer.

I've already checked; here is the part of the status that contains the start of the error.

Status:
  Application State:
    Error Message:  failed to run spark-submit for SparkApplication spark-operator/pyspark-pi: 24/05/21 04:51:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/05/21 04:51:06 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
24/05/21 04:51:07 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
24/05/21 04:51:07 WARN DriverCommandFeatureStep: spark.kubernetes.pyspark.pythonVersion was deprecated in Spark 3.1. Please set 'spark.pyspark.python' and 'spark.pyspark.driver.python' configurations or PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables instead.
24/05/21 04:51:48 ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes.
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)

I ran the suggested check, kubectl auth can-i create pod, and got yes from Kubernetes.

And then the application goes to SUBMISSION_FAILED, so no pod is created 😭

Do you suspect anything else?

@imtzer
Copy link

imtzer commented May 30, 2024

kubectl auth can-i create pod

The error message Please check "kubectl auth can-i create pod" first is thrown in the Spark repo, in KubernetesClientApplication.scala, when the KubernetesClient API is used to create the driver pod.
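
So what matters is whether the service account that actually runs spark-submit (the operator's service account) can create pods, not your own kubectl user. A rough way to check that (the service account name here is a guess based on your deployment name, so adjust it to your install):

kubectl auth can-i create pods -n spark-operator \
  --as=system:serviceaccount:spark-operator:sparkoperator-spark-operator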

@jcunhafonte

jcunhafonte commented Jul 1, 2024

I'm facing the same issue with version v1beta2-1.4.3-3.5.0. @alstjs37 Were you able to fix this issue?

@youngsol

I think it's more likely that you did not create the service account with the proper role, or don't have IRSA bound to your service account.
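
A rough sketch of what that looks like for the driver service account referenced in the SparkApplication above (resource names are just examples; the Helm chart can also create these for you, and IRSA only applies if you are on EKS):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sparkoperator-spark
  namespace: spark-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-driver-role
  namespace: spark-operator
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
    verbs: ["create", "get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-driver-rolebinding
  namespace: spark-operator
subjects:
  - kind: ServiceAccount
    name: sparkoperator-spark
    namespace: spark-operator
roleRef:
  kind: Role
  name: spark-driver-role
  apiGroup: rbac.authorization.k8s.io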
