
How to run a test class from a script with spark-operator instead of spark-submit? #719

Closed
jkleckner opened this issue Dec 6, 2019 · 7 comments

Comments

@jkleckner
Contributor

TL;DR:
How can I run a test class with spark-operator from a script, wait until the application has either succeeded or failed, and recover that success or failure?

Detail:

I have a number of test classes that I can run with spark-submit with something like:

spark-submit \
        various env options... \
        --master 'local[*]' \
        --class $testClassName \
        ${path_to_jar_file_with_tests} \
        arguments to test class

What I would like is a recipe to employ the spark-operator from a shell script.

I can create the YAML equivalent of the spark-submit command, but it doesn't appear straightforward to launch an application, wait for it to finish, and check its status reliably and without race conditions.
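
For reference, here is roughly what I mean by the YAML equivalent, submitted from a shell script. This is only a sketch assuming the spark-operator v1beta2 CRD; the application name, image, class, jar path, and resource settings are placeholders.

# Sketch: submit the SparkApplication equivalent of the spark-submit above.
# Assumes the spark-operator v1beta2 CRD; names, image, and jar path are placeholders.
kubectl apply -f - <<EOF
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-test-app
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: my-registry/my-spark-image:latest
  mainClass: com.example.MyTestClass
  mainApplicationFile: local:///opt/spark/jars/my-tests.jar
  arguments:
    - "arg1"
    - "arg2"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
EOF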

I'm hoping that someone out there has already figured out how to do this.
My background is primarily Scala/Java/Python, with almost no exposure to Go, and I'm just getting up to speed on k8s.

Ideally, I would like to just be a consumer of spark-operator, with my learning focused on setting up the right YAML and/or config maps to run applications.

All help appreciated!

It looks like this issue indicates that kubectl wait is in need of generalization: kubernetes/kubernetes#83094

I have created a branch located here that contains a script to run in a development cluster:

It asks this question as well as these questions:

  • How can you launch a spark application and then reliably wait for it to finish? (See the sketch after this list.)
    • This needs to be race free.
    • For a restartPolicy of Never, what are the application states that
      indicate completion, just COMPLETED or FAILED?
    • Is there documentation about application states?
  • How can you know whether the application succeeded or failed?
    • Does COMPLETED imply success, as the driver pod's exit code should?
  • Why does the SparkPi example show all executor states as FAILED?
    • I've heard that this happens if sys.exit(0) is not called, which supposedly
      should be avoided. Why doesn't spark.stop() cause the executors to exit cleanly?
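
To make the launch-and-wait question concrete, this is the shape of the polling loop I have in mind. It is only a sketch: it assumes the state is exposed at .status.applicationState.state and that COMPLETED, FAILED, and SUBMISSION_FAILED are the terminal states, which is part of what I'm asking.

#!/usr/bin/env bash
# Sketch: launch-and-wait by polling the SparkApplication status.
# Assumes .status.applicationState.state holds the state and that
# COMPLETED, FAILED, and SUBMISSION_FAILED are the terminal states.
set -euo pipefail

app_name="my-test-app"
timeout_secs=600
interval_secs=10
elapsed=0

while (( elapsed < timeout_secs )); do
  state=$(kubectl get sparkapplication "${app_name}" \
            -o jsonpath='{.status.applicationState.state}' 2>/dev/null || true)
  case "${state}" in
    COMPLETED)
      echo "${app_name} succeeded"
      exit 0
      ;;
    FAILED|SUBMISSION_FAILED)
      echo "${app_name} failed (state=${state})" >&2
      exit 1
      ;;
    *)
      sleep "${interval_secs}"
      (( elapsed += interval_secs ))
      ;;
  esac
done

echo "Timed out waiting for ${app_name}" >&2
exit 2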
@jkleckner
Contributor Author

Here is a gist with the output of a run of that script:

@liyinan926
Collaborator

What about doing some enhancement to sparkctl and making it support what you would like to achieve here?

@liyinan926
Collaborator

How can you launch a spark application and then reliably wait for it to finish?

I would explore extending sparkctl to support this kind of launch-and-wait use case.

How can you know whether the application succeeded or failed?

Yes, the COMPLETED state implies a successful run of the application, or more specifically a successful exit of the driver.

Why does the SparkPi example show all executor states as FAILED?

There's an issue with keeping track of executor states, mainly because executors get deleted after they finish, before the operator gets a chance to list and check their status. We have put in some fixes for this issue.

@jkleckner
Contributor Author

jkleckner commented Dec 12, 2019

How can you launch a spark application and then reliably wait for it to finish?

I would explore extending sparkctl to support this kind of launch-and-wait use case.

That could work, but it would be nice if the underlying API had this as a primitive, so that there is no need to install another tool and the Spark application could be waited on from kubectl alone.
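
Something along these lines is what I have in mind, if kubectl wait were generalized per kubernetes/kubernetes#83094 (hypothetical syntax; not something kubectl supports as of this writing):

# Hypothetical: wait on an arbitrary status field directly from kubectl,
# assuming kubectl wait grows support for jsonpath-based conditions.
kubectl wait sparkapplication/my-test-app \
  --for=jsonpath='{.status.applicationState.state}'=COMPLETED \
  --timeout=600s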

How can you know whether the application succeeded or failed?

Yes, the COMPLETED state implies a successful run of the application, or more specifically a successful exit of the driver.

Great. That is what we will use. It would also be great for this information to find a home in the docs.
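
As a belt-and-braces check from the script, I believe the driver pod's exit code can also be read directly, assuming the operator's default "<app-name>-driver" pod naming:

# Sketch: cross-check the driver's exit code; assumes the default
# "<app-name>-driver" pod name used by the operator.
kubectl get pod my-test-app-driver \
  -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'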

Why does the SparkPi example show all executor states as FAILED?

There's an issue with keeping track of executor states, mainly because executors get deleted after they finish, before the operator gets a chance to list and check their status. We have put in some fixes for this issue.

Excellent. Can you point to the patch set so I can roll a version to try?

@liyinan926
Collaborator

Excellent. Can you point to the patch set so I can roll a version to try?

The latest master has the fix.

@jkleckner
Contributor Author

I built using the gitlab-ci.yml and tested that the spark-pi example now works with a master image that contains #727 - thanks!

@jkleckner
Contributor Author

Closing this with reference to #732 as a more focused discussion.
