hydra-spark provides a declarative, intuitive interface for creating and submitting [Apache Spark](http://spark-project.org) data flow pipelines, leveraging the flexibility of Spark's DataFrame API.
This repo contains the complete Hydra Spark project, including unit tests and deploy scripts.
- Declarative Spark jobs: a simple JSON/HOCON-based syntax for describing Spark jobs
- Support for Hadoop, Hive, Kafka (as both a source and a sink), Elasticsearch, and many others. See Sources.
- Support for both batch and streaming jobs through a unified API.
- Support for different Spark deploy modes (local, yarn-client), which can also be overridden at the DSL level.
- Support for Scala 2.10 and 2.11
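To give a feel for the declarative syntax, here is a minimal sketch of a HOCON job description. The block and field names below are illustrative assumptions, not the verified hydra-spark schema; consult the Sources documentation for the actual elements supported.

```hocon
# Hypothetical job description -- names are illustrative, not the official schema.
transport {
  name = "kafka-to-hive-example"

  # Where records come from (Kafka topic assumed as the source)
  source {
    kafka {
      topics = ["user-events"]
    }
  }

  # DataFrame operations applied in order
  operations {
    select-columns {
      columns = ["userId", "eventType", "timestamp"]
    }
  }
}
```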
| Version | Spark Version |
|---------|---------------|
| master  | 2.2           |
For release notes, look in the notes/ directory. They should also be up on notes.implicit.ly.
We host non-release jars at Jitpack.
The easiest way to get started is to try the Docker container which prepackages a Spark distribution with the Hydra Spark DSL assembly included.
Other ways to run:
- Build and run directly from an IDE. IntelliJ instructions follow below.
- Run using sbt
- Run sbt assembly and copy the jar to the Spark cluster.
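For the last option, the build-and-copy step might look like the following. The assembly jar path, main class, and job file name are assumptions for illustration; check the `target/` directory and the project's entry point for the exact values.

```shell
# Build a fat jar with all dependencies
sbt assembly

# Submit it to a Spark cluster (class name, jar path, and job.conf are
# hypothetical -- adjust to your build output and DSL file)
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class hydra.spark.Main \
  target/scala-2.11/hydra-spark-assembly.jar \
  job.conf
```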
The steps below show how to use hydra-spark with an example DSL by running Spark in-process. This setup is for experimentation only and is not representative of production usage.
You need to have SBT installed.
If you are using a Scala IDE (such as IntelliJ), you can import the project and start by running any of the test specs. To run a specific DSL <>
Documentation is coming soon.
Contributions via GitHub pull requests are welcome. See the TODO for some ideas.
Profiling software provided by YourKit. YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.
Please report bugs/problems to: https://github.com/pluralsight/hydra-spark/issues
Apache 2.0, see LICENSE.md