This repo shows how to run (Py)Spark in Orchest (locally)

ricklamers/orchest-hello-spark

Orchest: Hello Spark

This repo shows how to run (Py)Spark in Orchest (locally).

For details on how Spark is installed, check out setup_script.sh. The actual Spark code is a minimal example that counts the words in the Python LICENSE text file. Check out the notebook for the code.
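As a rough illustration of what such a word count can look like, here is a minimal sketch. It is not the notebook's actual code: the function names `tokenize` and `count_words`, the app name, and the `local[*]` master are assumptions made for this example.

```python
import re
from operator import add


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def count_words(path, master="local[*]"):
    """Return (word, count) pairs for the text file at `path`, most frequent first.

    Hypothetical helper for illustration; the notebook may structure this differently.
    """
    # Imported here so that tokenize() stays usable without a Spark install.
    import pyspark

    conf = (
        pyspark.SparkConf()
        .setMaster(master)  # local[*] runs Spark in-process on all cores
        .setAppName("hello-spark-wordcount")  # assumed app name
    )
    sc = pyspark.SparkContext.getOrCreate(conf=conf)
    try:
        counts = (
            sc.textFile(path)              # one record per line
            .flatMap(tokenize)             # one record per word
            .map(lambda word: (word, 1))   # pair each word with a count of 1
            .reduceByKey(add)              # sum the counts per word
        )
        return counts.sortBy(lambda pair: pair[1], ascending=False).collect()
    finally:
        sc.stop()
```

Calling `count_words("LICENSE")[:10]` would then give the ten most frequent words, where `"LICENSE"` stands in for wherever the license file lives in your project.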

To connect to a cluster instead, use a different PySpark context initializer:

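import pyspark

conf = pyspark.SparkConf()
conf.setMaster('spark://head_node:7077')  # replace head_node with your master's host
# Shared-secret authentication: the cluster must be configured with the same secret.
conf.set('spark.authenticate', 'true')
conf.set('spark.authenticate.secret', 'secret-key')
sc = pyspark.SparkContext(conf=conf)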
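
If you would rather not hard-code the secret in a notebook, one option is to read the connection details from environment variables. The sketch below assumes two hypothetical variable names, `SPARK_MASTER_URL` and `SPARK_AUTH_SECRET`, and the helper names `cluster_settings` and `connect` are made up for this example. Nothing here is prescribed by Orchest or Spark beyond the configuration keys themselves.

```python
import os


def cluster_settings(env=os.environ):
    """Build Spark settings for an authenticated cluster connection.

    SPARK_MASTER_URL and SPARK_AUTH_SECRET are hypothetical variable names
    chosen for this sketch.
    """
    secret = env.get("SPARK_AUTH_SECRET")
    if not secret:
        raise ValueError("SPARK_AUTH_SECRET must be set")
    return [
        # 'spark.master' is the key that SparkConf.setMaster() sets.
        ("spark.master", env.get("SPARK_MASTER_URL", "spark://head_node:7077")),
        ("spark.authenticate", "true"),
        ("spark.authenticate.secret", secret),
    ]


def connect(env=os.environ):
    """Create a SparkContext from the settings above."""
    # Imported here so cluster_settings() can be checked without Spark installed.
    import pyspark

    conf = pyspark.SparkConf().setAll(cluster_settings(env))
    return pyspark.SparkContext(conf=conf)
```

With both variables set, `sc = connect()` takes the place of the initializer shown above.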

Pipeline

PySpark pipeline
