GitHub - mrjhonyvidal/sparkLab: Distributed Computing with Spark

What is Spark?

A unified computing engine and set of libraries for distributed data processing, which supports:

Spark is not concerned with data sources:

Spark is not part of Hadoop.

Build and start the PostgreSQL container that Spark will interact with:

docker-compose up

Connect to the database:

chmod +x ./psql.sh
./psql.sh
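
Once the container is running, Spark can read from it through its JDBC data source. The following is a minimal PySpark sketch, not the repository's own code. The host, port, database name, credentials, the table name public.movies, and the PostgreSQL driver version are all placeholder assumptions; check docker-compose.yml and the sql directory for the real values.

```python
def jdbc_options(host="localhost", port=5432, database="postgres",
                 user="postgres", password="postgres", table="public.movies"):
    """Build the option map for Spark's JDBC source.

    Every default here is a placeholder assumption, not a value taken
    from this repository.
    """
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, **overrides):
    """Load one PostgreSQL table as a DataFrame via the JDBC source."""
    return spark.read.format("jdbc").options(**jdbc_options(**overrides)).load()


def main():
    # Imported lazily so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("postgres-read-sketch")
        # Assumed driver coordinates; any recent PostgreSQL JDBC driver should work.
        .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
        .getOrCreate()
    )
    try:
        read_table(spark).show(5)
    finally:
        spark.stop()
```

Calling main() with pyspark installed and the container up should print the first rows of the chosen table. Keeping the option map in its own function makes it easy to point the same read at a different host or table.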

In another terminal, go to spark-cluster and build the images with the commands below, then run docker-compose up --scale spark-worker=3 to start the cluster with three workers:

chmod +x build-images.sh
./build-images.sh
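
With the cluster up, a small job can confirm that the workers accept tasks. The sketch below assumes the standalone master is reachable at localhost on port 7077 (Spark's default standalone port); the actual host and port mapping depend on the compose file in spark-cluster. The driver's Python version also has to be compatible with the one inside the worker images. The function names are illustrative only.

```python
def master_url(host="localhost", port=7077):
    """Return a Spark standalone master URL (host and port are assumptions)."""
    return f"spark://{host}:{port}"


def count_even(spark, n=1_000_000, partitions=6):
    """Spread range(n) over several partitions and count the even numbers."""
    rdd = spark.sparkContext.parallelize(range(n), partitions)
    return rdd.filter(lambda x: x % 2 == 0).count()


def main():
    # Imported lazily so master_url() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master_url())
        .appName("cluster-smoke-test")
        .getOrCreate()
    )
    try:
        print(count_even(spark))
    finally:
        spark.stop()
```

Six partitions are used so that each of the three workers can receive more than one task; the number is arbitrary.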
