Various Spark code snippets that may be useful to you, mainly utilities for common tasks.
Things you will find here:
- Spark Scala on Jupyter
- Save a DataFrame (SQL) to a Kafka topic (see the sketch after this list)
- Run a Spark fat JAR shared among Spark Workers using HDFS
- Geolocation obfuscation algorithm (see the sketch after this list)
- Load CSV from the local file system and to/from partitioned HDFS (see the sketch after this list)
- Kafka Cluster using Docker
- HDFS Cluster using Docker
- Spark Cluster using Docker
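
A minimal sketch of saving a DataFrame to a Kafka topic with Spark's batch Kafka sink (it needs the spark-sql-kafka-0-10 package on the classpath); the broker address, topic name, and sample data below are placeholder assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("KafkaWriteExample").getOrCreate()
import spark.implicits._

// hypothetical sample data; replace with your own DataFrame
val df = Seq(("key1", "value1"), ("key2", "value2")).toDF("key", "value")

// the Kafka sink expects string or binary "key" and "value" columns
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka:9092") // placeholder broker address
  .option("topic", "example-topic")                // placeholder topic name
  .save()
```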
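The geolocation obfuscation algorithm lives in the example sources; as a rough illustration of the general idea only (not necessarily this repository's exact method), one common approach shifts each coordinate by bounded random noise:

```scala
import scala.util.Random

// illustrative sketch: displace a point by up to maxOffsetMeters in each direction
def obfuscate(lat: Double, lon: Double, maxOffsetMeters: Double = 500.0): (Double, Double) = {
  val metersPerDegreeLat = 111320.0 // rough length of one degree of latitude in meters
  val latOffset = (Random.nextDouble() * 2 - 1) * maxOffsetMeters / metersPerDegreeLat
  // longitude degrees shrink with latitude, so scale by cos(lat)
  val lonOffset = (Random.nextDouble() * 2 - 1) * maxOffsetMeters /
    (metersPerDegreeLat * math.cos(math.toRadians(lat)))
  (lat + latOffset, lon + lonOffset)
}
```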
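A minimal sketch of loading a CSV from the local file system and writing it to, then reading it back from, partitioned HDFS; the paths, namenode address, and partition column are placeholder assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("CsvHdfsExample").getOrCreate()

// read a CSV from the local file system (placeholder path)
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("file:///data/input.csv")

// write to HDFS partitioned by a column assumed to exist in the CSV
df.write
  .partitionBy("year") // placeholder partition column
  .option("header", "true")
  .mode("overwrite")
  .csv("hdfs://namenode:8020/data/output")

// read it back; Spark discovers the partition column from the directory layout
val reloaded = spark.read
  .option("header", "true")
  .csv("hdfs://namenode:8020/data/output")
reloaded.show()
```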
To run the Jupyter example:

- Copy these example files to your project
- Update the docker-compose.yml file so that you use your own container name
- Run `docker-compose up --build`
- Create a new notebook with the following contents:
```scala
// import your custom jar in the notebook with a special Toree directive
%AddJar file:///app/app.jar

// import custom libraries from Maven (Vegas is a visualization lib)
%AddDeps org.vegas-viz vegas_2.11 0.3.11 --transitive
%AddDeps org.vegas-viz vegas-spark_2.11 0.3.11

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

println("Initializing Spark context...")
val conf = new SparkConf().setAppName("Example App")
val spark: SparkSession = SparkSession.builder.config(conf).getOrCreate()

println("************")
println("Hello, world!")

// parallelize the range itself; Array(1 to 10) would create an RDD with a single Range element
val rdd = spark.sparkContext.parallelize(1 to 10)
rdd.count()

println("************")
println("Stop Spark session")
spark.stop()
```
- Run the notebook cells
- Open http://localhost:8080 and check that a Spark Application is running for each active notebook instance
- To add more Spark Workers, run `docker-compose up --scale spark-worker=5`
- For an example of clustered HDFS with multiple namenodes/datanodes, go to https://github.com/flaviostutz/spark-scala-hdfs-docker-example/blob/master/docker-compose.yml