License: MIT

Zparkio

Boilerplate framework for using Spark and ZIO together.

The goal of this framework is to blend Spark and ZIO into an easy-to-use system for data engineers.

It lets them use Spark in a new, faster, more reliable way by leveraging the power of ZIO.


What is this library for?

This library implements all the boilerplate you need to use Spark and ZIO together in your ML project.

It can be tricky to use ZIO to hold an instance of Spark (the SparkSession) and reuse it throughout your code; this library solves that boilerplate problem for you.

Why would you want to use ZIO and Spark together?

In my experience, combining ZIO (or Future) with Spark can drastically speed up your job. The reason is that sources (BigQuery, PostgreSQL, S3 files, etc.) can be fetched in parallel, so the computation is not left on hold while each one loads in turn. ZIO is much better than Future, but it is harder to set up. Not anymore!
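To make the parallel-fetching argument concrete, here is a minimal sketch using plain Scala `Future` (the standard-library option mentioned above), not Zparkio's API. The three `fetch*` functions are hypothetical stand-ins for slow reads from BigQuery, PostgreSQL and S3; in a real job each would return a Spark `DataFrame` instead of a fake row count.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

object ParallelFetch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-ins for slow, I/O-bound source reads.
  // Each one sleeps to simulate latency and returns a fake row count.
  def fetchBigQuery(): Int = { Thread.sleep(200); 100 }
  def fetchPostgres(): Int = { Thread.sleep(200); 200 }
  def fetchS3(): Int = { Thread.sleep(200); 300 }

  def loadAll(): (Int, Int, Int) = {
    // Creating the Futures up front starts all three reads at once;
    // `blocking` tells the pool these tasks wait on I/O so it may add threads.
    val bigQuery = Future(blocking(fetchBigQuery()))
    val postgres = Future(blocking(fetchPostgres()))
    val s3       = Future(blocking(fetchS3()))

    // Combine the results once every read has finished: roughly one read's
    // latency in total, instead of the sum of all three run sequentially.
    val all = for {
      a <- bigQuery
      b <- postgres
      c <- s3
    } yield (a, b, c)

    Await.result(all, 5.seconds)
  }
}
```

With ZIO itself, the same shape would typically be written with a parallel combinator such as `zipPar` or `ZIO.collectAllPar`.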

Other nice aspects of ZIO are its error/exception handling and its built-in retry helpers, which make retrying failed tasks within Spark a breeze.
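In ZIO a retry is declared on the effect itself, e.g. `effect.retry(Schedule.recurs(3))`. As a dependency-free illustration of the behaviour such a helper gives you (not ZIO's implementation), here is a small retry function in plain Scala; `flakyRead` is a hypothetical task that fails twice before succeeding.

```scala
import scala.util.{Failure, Try}

object Retry {
  // Runs `task`; on failure, re-runs it up to `retries` more times.
  // Returns the first success, or the last failure if every attempt fails.
  def retry[A](retries: Int)(task: () => A): Try[A] =
    Try(task()) match {
      case Failure(_) if retries > 0 => retry(retries - 1)(task)
      case result                    => result
    }
}

object RetryDemo {
  // Hypothetical flaky source: fails on the first two calls, then succeeds.
  private var calls = 0

  def flakyRead(): String = {
    calls += 1
    if (calls < 3) throw new RuntimeException(s"transient failure #$calls")
    "data"
  }
}
```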

How to use it?

I hope you are now convinced that ZIO and Spark are a perfect match. Let's see how to use Zparkio.

Include dependencies

First include the library in your project:

libraryDependencies += "com.leobenkel" %% "zparkio" % "[VERSION]"

where [VERSION] is the latest version published on Maven Central.

This library depends on Spark, ZIO and Scallop (a command-line argument parser).

How to use it in your code?

There is an example project you can look at, but here are the details.

Main

The first thing to do is extend the ZparkioApp trait. For an example, look at the ProjectExample: Application.
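Extending a trait like `ZparkioApp` follows the familiar template pattern: the trait owns the start-up boilerplate and you only fill in your job's logic. The real abstract members of `ZparkioApp` are defined by the library (see ProjectExample: Application); the stdlib-only sketch below uses hypothetical names (`TemplateApp`, `Env`, `runApp`, `MyJob`) purely to illustrate the shape.

```scala
object TemplateSketch {
  // Hypothetical environment prepared once for the whole job
  // (in Zparkio, things like the SparkSession and arguments would live here).
  final case class Env(appName: String, args: List[String])

  trait TemplateApp {
    // What the user provides: a name and the job logic.
    def appName: String
    def runApp(env: Env): Int

    // What the trait owns: building the environment, then running the logic.
    // The returned Int plays the role of an exit code.
    final def run(args: Array[String]): Int = {
      val env = Env(appName, args.toList)
      runApp(env)
    }
  }

  // A user's application only fills in the abstract members.
  object MyJob extends TemplateApp {
    val appName = "my-job"
    def runApp(env: Env): Int = if (env.args.nonEmpty) 0 else 1
  }
}
```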

Spark

By using this architecture, you have access to the SparkSession anywhere in your ZIO code, via:

import com.leobenkel.zparkio.Services._

for {
  spark <- SparkModule()
} yield {
  ???
}

For instance, you can see it used here.
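This works because a ZIO effect carries an environment type: conceptually, a `ZIO[R, E, A]` is a program that, given an `R`, produces either an `E` or an `A`. The stdlib-only model below drops the error channel to show how a service can be read from the environment inside a for-comprehension without being passed around as a parameter. `Program`, `HasSpark`, `FakeSparkSession` and `sparkModule` are hypothetical names, not ZIO or Zparkio types.

```scala
object EnvironmentSketch {
  // A tiny model of an effect that needs an environment R to produce an A.
  final case class Program[-R, +A](run: R => A) {
    def map[B](f: A => B): Program[R, B] =
      Program((r: R) => f(run(r)))

    def flatMap[R1 <: R, B](f: A => Program[R1, B]): Program[R1, B] =
      Program((r: R1) => f(run(r)).run(r))
  }

  // Hypothetical stand-in for a SparkSession.
  final case class FakeSparkSession(appName: String)

  // The environment exposes the session; a framework would build it once.
  trait HasSpark { def spark: FakeSparkSession }

  // Plays the role of `SparkModule()`: read the session from the environment.
  val sparkModule: Program[HasSpark, FakeSparkSession] =
    Program((env: HasSpark) => env.spark)

  // Business logic gets the session without it being passed explicitly.
  val job: Program[HasSpark, String] =
    for {
      spark <- sparkModule
    } yield s"running in ${spark.appName}"
}
```

The environment is only supplied at the very edge, when the program is run, which is the part a framework can do for you.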

Command lines

You also get access to all your command-line arguments, automatically parsed and generated, via CommandLineArguments. It is recommended to write this helper function to make the rest of your code easier to use.

Then using it, like here, is easy.
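Zparkio delegates the actual parsing to Scallop. The underlying idea, parsing once into a typed value and reading fields from it instead of handling raw strings everywhere, can be sketched without Scallop. Everything below (`JobArguments`, `--input`, `--parallelism`, the default of 4) is hypothetical and only illustrates the idea.

```scala
import scala.util.Try

object CommandLineSketch {
  // Hypothetical typed view of a job's command-line arguments.
  final case class JobArguments(inputPath: String, parallelism: Int)

  // Parses "--name value" pairs once (Zparkio uses Scallop for this step).
  def parse(args: Seq[String]): Either[String, JobArguments] = {
    val pairs: Map[String, String] =
      args.grouped(2).collect {
        case Seq(key, value) if key.startsWith("--") => key.drop(2) -> value
      }.toMap

    // Hypothetical default of 4 when --parallelism is not given.
    val parallelism: Either[String, Int] = pairs.get("parallelism") match {
      case None      => Right(4)
      case Some(raw) => Try(raw.toInt).toOption.toRight("--parallelism must be an integer")
    }

    for {
      input <- pairs.get("input").toRight("missing --input")
      par   <- parallelism
    } yield JobArguments(input, par)
  }
}
```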

Helpers

The implicits object, which you can import anywhere, gives you specific helper functions to help streamline your projects.
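An implicits object is the usual Scala way to ship helpers like these: each helper is an implicit class (an extension method), and one import brings them all into scope. The `orFail` helper below is hypothetical and only illustrates the mechanism, not the helpers Zparkio actually provides.

```scala
object ImplicitsSketch {
  object implicits {
    // Hypothetical helper: turn a missing value into a descriptive error.
    implicit class RichOption[A](value: Option[A]) {
      def orFail(message: String): Either[String, A] = value.toRight(message)
    }
  }
}
```

Usage is then a single `import ImplicitsSketch.implicits._` at the top of any file that needs the helpers.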

Unit test

This architecture lets you run your whole main program as a unit test.
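This is possible because the job's logic only depends on what its environment provides, so a test can supply a small local environment and call the same entry point. A stdlib-only sketch of that idea, with hypothetical names (`Source`, `runJob`, `InMemorySource`) and an in-memory source standing in for a real reader (and, in a Spark job, for a local SparkSession):

```scala
object TestableJobSketch {
  // Hypothetical service the job reads from; production code would back it
  // with a real source (e.g. S3 or PostgreSQL), tests with in-memory data.
  trait Source { def readLines(): Seq[String] }

  // The "main" logic: count the non-blank lines provided by the source.
  def runJob(source: Source): Int =
    source.readLines().count(_.trim.nonEmpty)

  // A test double with fixed data, so the whole job runs without a cluster.
  final class InMemorySource(lines: Seq[String]) extends Source {
    def readLines(): Seq[String] = lines
  }
}
```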

Examples

Simple example

Take a look at the simple example project to see working code that uses this library: SimpleProject.

More complex architecture

A fully fledged, production-ready project will need more code than the simple example. For this purpose, and at the suggestion of several awesome people, I added a more complex project. It is a work in progress and more will be added as I go: MoreComplexProject.

Authors

Leo Benkel
