hydra-spark

Join the chat at https://gitter.im/pluralsight/hydra

hydra-spark provides a declarative interface for creating and submitting Apache Spark (http://spark-project.org) data flow pipelines, leveraging the flexibility of Spark's DataFrame API.

This repo contains the complete Hydra Spark project, including unit tests and deploy scripts.

Features

  • "Declarative Spark Jobs": Simple JSON/HOCON-based syntax to describe Spark jobs.
  • Support for Hadoop, Hive, Kafka (both as a source and sink), Elasticsearch and many others. See Sources.
  • Support for both batch and streaming jobs using a unified API.
  • Supports different Spark deploy modes (local, yarn-client), which can also be overridden at the DSL level.
  • Supports Scala 2.10 and 2.11
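
To make the idea of a declarative job with a DSL-level deploy mode override more concrete, here is a minimal sketch in plain Scala. It is not hydra-spark's actual DSL or API: `JobSpec`, its fields, and `resolveMaster` are hypothetical names invented for illustration, and the job is modelled as a plain case class rather than parsed from JSON/HOCON. The only point it shows is the precedence rule implied above, assuming a master set in the job description wins over the environment default.

```scala
// Hypothetical model of a declaratively described job.
// These names are illustrative only; they are NOT part of hydra-spark.
final case class JobSpec(
  name: String,
  source: String,                 // e.g. "kafka", "hive"
  operations: List[String],       // ordered steps applied to the data
  master: Option[String] = None   // optional DSL-level deploy mode override
)

object DeclarativeJobSketch {
  // Assumed precedence: a master declared in the job description
  // overrides the default supplied by the environment.
  def resolveMaster(spec: JobSpec, default: String): String =
    spec.master.getOrElse(default)

  def main(args: Array[String]): Unit = {
    val spec = JobSpec(
      name = "word-count",
      source = "kafka",
      operations = List("split", "count"),
      master = Some("local[*]")
    )
    println(resolveMaster(spec, default = "yarn-client")) // prints: local[*]
  }
}
```

Removing the `master` field from the job description would make the same call fall back to `yarn-client`.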

Version Information

Version   Spark Version
master    2.2

For release notes, look in the notes/ directory. They should also be up on notes.implicit.ly.

We host non-release jars at Jitpack.

Getting Started with Hydra Spark

The easiest way to get started is to try the Docker container, which prepackages a Spark distribution with the Hydra Spark DSL assembly included.

Other ways to run:

  • Build and run directly from an IDE. IntelliJ instructions follow below.
  • Run using sbt
  • Run sbt assembly and copy the jar to the Spark cluster.

Development mode

The steps below show you how to use hydra-spark with an example DSL, by running Spark in-process. This is not an example of usage in production.

You need to have sbt installed.

Using an IDE

If you are using a Scala IDE (such as IntelliJ), you can import the project and start by running any of the test specs. To run a specific DSL <>

WordCountExample walk-through

Package Jar - Send to Cluster

Docs coming

Contribution and Development

Contributions via GitHub pull requests are welcome. See the TODO for some ideas.

Profiling software provided by

YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.

Contact

Please report bugs/problems to: https://github.com/pluralsight/hydra-spark/issues

License

Apache 2.0, see LICENSE.md
