Batch Job Stuck at "Starting" #9
Hi! Glad to hear that you are trying out Lighter! Since you changed all the namespaces in your code to … Another thing is that we see that you are using …
@Minutis thanks for your help, and sorry I did not see this earlier! I replaced the … You were right about the data problems. I currently mount an NFS volume in all my workers (extra settings for the Helm chart):

```yaml
service:
  type: ClusterIP
worker:
  replicaCount: 3
  extraVolumes:
    - name: spark-data
      persistentVolumeClaim:
        claimName: spark-data-pvc
  extraVolumeMounts:
    - name: spark-data
      mountPath: /data
```

but I guess that Lighter creates a new container without that extra volume. I sent the following request:

```shell
curl -X POST -H "Content-type: application/json" -d '{
  "name": "Test",
  "file": "/data/spark-examples.jar",
  "args": ["/data/test.txt"],
  "files": ["/data/test.txt"],
  "className": "org.apache.spark.examples.JavaWordCount"
}' 'localhost:8081/lighter/api/batches'
```

and it was correctly received, but the logs show that it can't find the file.
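One generic way to confirm whether the path is actually visible inside the dynamically created driver pod (the pod name below is a placeholder, not from this thread):

```shell
# Hedged sketch: if this listing fails, the PVC was never mounted into the
# driver container that Lighter/Spark created, which would explain the
# "file not found" error despite the workers having the volume.
kubectl exec <driver-pod-name> -- ls -l /data
```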
Is there any way to add this volume via Lighter's deployment configuration, environment variables, etc? If not, what do you guys recommend for getting files available in the new container? I'm also getting an error after the exception shown above:
I checked and I have no … Any ideas on what's going on? Thanks for your help, and sorry for the delay!
To use volume mounts on your driver/executor pods you'll have to follow the official Spark documentation on the topic: … The other option would be to use custom pod templates and attach your mounts there. To do that, you can create a ConfigMap on … I'm not sure about your last exception; did you really get it after changing …?
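For the first option, Spark's Kubernetes volume properties can be set per job. A minimal sketch, assuming Lighter forwards a `conf` map from the batch request to `spark-submit` (the `spark-data` volume name and `spark-data-pvc` claim reuse the Helm values shown earlier in this thread):

```shell
# Hedged sketch: mount the existing PVC into both the driver and executor
# pods using Spark-on-Kubernetes volume properties, passed per batch.
curl -X POST -H "Content-type: application/json" -d '{
  "name": "Test",
  "file": "/data/spark-examples.jar",
  "className": "org.apache.spark.examples.JavaWordCount",
  "args": ["/data/test.txt"],
  "conf": {
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-data.mount.path": "/data",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-data.options.claimName": "spark-data-pvc",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-data.mount.path": "/data",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-data.options.claimName": "spark-data-pvc"
  }
}' 'localhost:8081/lighter/api/batches'
```

The property names are from Spark's "Running on Kubernetes" documentation; whether the `conf` field is honored this way by Lighter is an assumption here.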
Hello!
Thanks for your work on Lighter! I've been looking for a replacement for Livy for a while, and this is the closest thing on the Internet!
I'm running Lighter on a minikube Kubernetes environment. I have a Spark cluster deployed through Helm charts. I also deployed PostgreSQL through Helm since it seems like Lighter needs it. Here's how my environment looks (in the default namespace):
Based on your instructions, I did:
I did not bother with the ingress portion because I am just testing and use port-forwarding to see the Spark and Lighter UIs.
I am sending a POST request to Lighter's Batch API with the following:
After the request gets accepted, other fields get automatically filled in and the UI reflects the submission. However, the state of the job is always "Starting" (also I can't delete it if I press the "X" button).
I'm not sure what is going on. I tried checking the pod's logs but there is nothing related. I was wondering if you could point me in the right direction to figure out what is missing/misconfigured for it to work properly. I'm attaching a screenshot of the UI.
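A job that never leaves "Starting" usually means the driver pod was never created, or is stuck in Pending. A few generic kubectl checks that narrow this down (the default namespace matches the setup above; the `lighter` deployment name and the driver pod name are assumptions, not taken from this thread):

```shell
# See whether a driver pod was created at all for the submitted batch.
kubectl get pods -n default

# If a driver pod exists but is Pending, its events usually name the cause
# (unschedulable, image pull failure, missing volume, RBAC denial, ...).
kubectl describe pod <driver-pod-name> -n default

# Lighter's own log often records why the submission to Kubernetes failed.
kubectl logs deployment/lighter -n default
```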
Thank you for your time!