# Spark Quick Reference

This document gathers Spark-related tips and hints.

## Abbreviations

| Abbreviation | Description |
|---|---|
| AES | Advanced Encryption Standard |
| DAG | Directed Acyclic Graph |
| HDFS | Hadoop Distributed File System |
| ML | Machine Learning |
| OLAP | Online Analytical Processing |
| RDD | Resilient Distributed Dataset |
| UDF | User-defined Function |
| UDTF | User-defined Table Function |
| YARN | Yet Another Resource Negotiator |

## Spark Properties

Spark configuration can be specified in three ways:

- Using a property file: either the `--properties-file FILE` option or the default location `conf/spark-defaults.conf` (a sketch of this file's format follows the list).
- Programmatically, with the setter methods of `org.apache.spark.SparkConf`.
- Using dedicated command-line options, or `-c PROP=VALUE` for arbitrary properties.

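A `spark-defaults.conf` file is a plain-text list of whitespace-separated property/value pairs. The snippet below is a minimal illustration using the same example values as the table further down; adjust the values to your environment.

```
spark.master            local[2]
spark.executor.cores    8
spark.executor.memory   128m
```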
For instance (see also the article "Spark submit --num-executors --executor-cores --executor-memory", March 2022):

| Programmatically | Command-line option |
|---|---|
| `.set("spark.executor.cores", "8")` | `--executor-cores 8` |
| `.set("spark.executor.memory", "128m")` | `--executor-memory 128m` |
| `.setAppName("name")` | `--name "name"` |
| `.setMaster("local[2]")` | `--master "local[2]"` |
| `.setSparkHome("<some path>")` | -- |
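The following is a minimal, self-contained sketch (not taken from the cited article) that combines the programmatic settings from the left-hand column; the application name, master URL, and resource values are placeholder examples.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object QuickRefExample {
  def main(args: Array[String]): Unit = {
    // Each setter below mirrors the spark-submit option shown in the table above.
    val conf = new SparkConf()
      .setAppName("name")                   // --name "name"
      .setMaster("local[2]")                // --master "local[2]"
      .set("spark.executor.cores", "8")     // --executor-cores 8
      .set("spark.executor.memory", "128m") // --executor-memory 128m

    val spark = SparkSession.builder().config(conf).getOrCreate()
    // ... your job logic here ...
    spark.stop()
  }
}
```

When submitting a pre-built application instead, the same values would be passed with the command-line options from the right-hand column (for example `--executor-cores 8 --executor-memory 128m`).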

Note: the `spark-submit` command internally uses the `org.apache.spark.deploy.SparkSubmit` class with the options and command-line arguments you specify.


mics/May 2024