Spark Alchemy is a collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive in our demanding petabyte-scale environment with rich data (thousands of columns).


Add the following resolver and dependency to your SBT build:

resolvers += Resolver.bintrayRepo("swoop-inc", "maven")

libraryDependencies += "com.swoop" %% "spark-alchemy" % "<version>"

You can find all released versions here.

For Spark users

  • Native HyperLogLog functions that offer fast, reaggregatable approximate distinct counting, with capabilities far beyond those in OSS Spark and interoperability with Postgres and even JavaScript.
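
To illustrate what "reaggregatable" means above, here is a minimal, self-contained HyperLogLog sketch in plain Scala. It is not spark-alchemy's implementation or API: the names `Hll`, `add`, `merge`, `cardinality` and `Hll.of` are hypothetical, and the hash and bias correction are simplified. The point it shows is that sketches built over separate partitions (say, one per day) can be merged later and give exactly the same sketch as one built over all the data at once.

```scala
import scala.util.hashing.MurmurHash3

// Illustrative only: a toy HyperLogLog, not the library's implementation.
// 2^p registers, each holding the largest "rank" observed for its bucket.
final case class Hll(p: Int, registers: Vector[Byte]) {
  private val m = 1 << p

  // The top p bits of the hash pick a register; the remaining bits give the rank
  // (position of the first 1 bit, counted from 1).
  def add(value: String): Hll = {
    val h = Hll.hash64(value)
    val idx = (h >>> (64 - p)).toInt
    val rest = h << p
    val rank = (math.min(java.lang.Long.numberOfLeadingZeros(rest), 64 - p) + 1).toByte
    if (rank > registers(idx)) copy(registers = registers.updated(idx, rank)) else this
  }

  // Merging is a register-wise max. This is what makes sketches reaggregatable.
  def merge(other: Hll): Hll = {
    require(p == other.p, "precision must match")
    copy(registers = registers.zip(other.registers).map { case (a, b) => if (a >= b) a else b })
  }

  // Standard HLL estimate with the small-range (linear counting) correction.
  // The alpha formula assumes m >= 128.
  def cardinality: Double = {
    val alpha = 0.7213 / (1 + 1.079 / m)
    val raw = alpha * m * m / registers.map(r => math.pow(2.0, -r.toDouble)).sum
    val zeros = registers.count(_ == 0)
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }
}

object Hll {
  def empty(p: Int): Hll = Hll(p, Vector.fill(1 << p)(0.toByte))

  def of(values: Iterable[String], p: Int): Hll =
    values.foldLeft(empty(p))(_ add _)

  // Assumption: two seeded 32-bit MurmurHash3 hashes are good enough for a demo.
  def hash64(s: String): Long =
    (MurmurHash3.stringHash(s, 0x1234567).toLong << 32) |
      (MurmurHash3.stringHash(s, 0x7654321) & 0xffffffffL)
}

object HllDemo {
  def main(args: Array[String]): Unit = {
    // Two overlapping "daily" partitions of user IDs.
    val day1 = Hll.of((0 until 6000).map(i => s"user-$i"), 12)
    val day2 = Hll.of((4000 until 10000).map(i => s"user-$i"), 12)
    val merged = day1.merge(day2)
    println(f"approximate distinct users across both days: ${merged.cardinality}%.0f")
  }
}
```

Because the per-partition sketches are small and mergeable, they can be stored as a column and rolled up along any dimension later without rescanning the raw data, which is something a final approximate count alone cannot offer.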

For Spark framework developers

What's coming

  • Configuration Addressable Production (CAP), Automatic Lifecycle Management (ALM) and Just-in-time Dependency Resolution (JDR) as outlined in our Spark+AI Summit talk Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments.

  • Hundreds of productivity-enhancing extensions to the core user-level data types: Column, Dataset, SparkSession, etc.

  • Data discovery and cleansing tools we use to ingest and clean up large amounts of dirty data from third parties.

  • Cross-cluster named lock manager, which simplifies data production by removing the need for workflow servers much of the time.

  • Versioned data source, which allows a new version to be written while the current version is being read.

  • case class code generation from Spark schema, with easy implementation customization.

  • Tools for deploying Spark ML pipelines to production.

  • Lots more, as we are constantly building up our internal toolset.

More from Swoop

  • spark-records: bulletproof Spark jobs with fast root cause analysis in the case of failures

Community & contributing

Contributions and feedback of any kind are welcome. Please create an issue and/or pull request.

Spark Alchemy is maintained by the team at Swoop. If you'd like to contribute to our open-source efforts, whether by joining our team or from within your own company, let us know at spark-interest at swoop dot com.


spark-alchemy is Copyright © 2018 Swoop, Inc. It is free software, and may be redistributed under the terms of the LICENSE.
