3. Spark tools

Simon Renauld edited this page Nov 22, 2021 · 7 revisions

3.1. Submitting Production Applications

The spark-submit script in Spark’s bin directory launches applications on a cluster. It does one thing: it sends your application code to a cluster and starts it there. Once submitted, the application runs until it completes its task or encounters an error.

https://spark.apache.org/docs/latest/submitting-applications.html

./bin/spark-submit \
--master local \
./examples/src/main/python/pi.py 10

Some of the commonly used options are:

  • --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
  • --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
  • --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
  • --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap the whole “key=value” pair in quotes (e.g. --conf "spark.app.name=My App").
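
The options above can be assembled programmatically before handing them to the shell. The following is a minimal sketch, not part of Spark itself: build_spark_submit is a hypothetical helper, and the spark.app.name value is an illustrative assumption. It only builds the argument list; it does not launch anything.

```python
import shlex
from typing import List, Mapping, Optional, Sequence


def build_spark_submit(
    app: str,
    app_args: Sequence[object] = (),
    master: str = "local",
    deploy_mode: Optional[str] = None,
    main_class: Optional[str] = None,
    conf: Optional[Mapping[str, str]] = None,
    spark_submit: str = "./bin/spark-submit",
) -> List[str]:
    """Hypothetical helper: build a spark-submit argument list."""
    cmd = [spark_submit, "--master", master]
    if main_class:
        # Only needed for JVM applications (Java/Scala).
        cmd += ["--class", main_class]
    if deploy_mode:
        cmd += ["--deploy-mode", deploy_mode]
    for key, value in (conf or {}).items():
        # Each property is passed as its own --conf key=value argument.
        cmd += ["--conf", f"{key}={value}"]
    # The application file comes after all options, followed by its own arguments.
    cmd.append(app)
    cmd.extend(str(arg) for arg in app_args)
    return cmd


cmd = build_spark_submit(
    "./examples/src/main/python/pi.py",
    app_args=[10],
    conf={"spark.app.name": "Pi Estimate"},  # assumed value, for illustration
)
# shlex.join quotes any argument containing spaces for safe shell use.
print(shlex.join(cmd))
```

Because the list keeps each key=value pair as a single argument, it could be passed directly to subprocess.run without any manual quoting; quoting only matters when the command is typed or pasted into a shell.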

3.2. Structured APIs

In Spark, the Structured APIs are DataFrames, Datasets and SQL. In practice, most applications work with the Structured APIs rather than the low-level Core API, i.e. RDDs. When code is submitted through the Spark console or spark-submit, Spark converts it into a logical plan, optimizes that plan, and then generates a physical plan for execution.
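
To make the plan generation visible, here is a small PySpark sketch that runs the same aggregation through the DataFrame API and through SQL, then prints the plans Spark produces for each. The function name compare_plans, the sample rows, the view name events and the app name are all assumptions made for illustration, and running it requires a local PySpark installation.

```python
# Hypothetical query text; the view name "events" is registered below.
SQL_QUERY = "SELECT category, SUM(amount) AS total FROM events GROUP BY category"


def compare_plans(master: str = "local[1]") -> None:
    """Print the plans for one aggregation written two ways."""
    # Imported here so this file can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master)
        .appName("structured-api-sketch")  # hypothetical app name
        .getOrCreate()
    )
    try:
        # Hypothetical sample data, not taken from the original page.
        df = spark.createDataFrame(
            [("a", 1), ("b", 2), ("a", 3)],
            ["category", "amount"],
        )

        # DataFrame API version of the aggregation.
        by_category_df = df.groupBy("category").sum("amount")

        # SQL version over the same data, via a temporary view.
        df.createOrReplaceTempView("events")
        by_category_sql = spark.sql(SQL_QUERY)

        # explain(True) prints the logical plans followed by the physical plan.
        by_category_df.explain(True)
        by_category_sql.explain(True)
    finally:
        spark.stop()


if __name__ == "__main__":
    compare_plans()
```

Saved to a file, this could be launched with spark-submit in the same way as the pi.py example above.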
