Wikipedia Data Parsing in Spark and Scala
A Python wrapper for LibFFM
ūnus - one from union
Docker build for Zeppelin, a web-based Spark notebook
A dot repository
Code Less, Build More. Clean, automated Feature Generation and Selection for Apache Spark!
Dependency and data pipeline management framework for Spark and Scala
Compile-time Language Integrated Queries for Scala
Mirror of Apache Spark
This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Spark implementation of Fayyad's discretizer, based on the Minimum Description Length Principle (MDLP)
Airflow is a system to programmatically author, schedule and monitor data pipelines.
The Nak Machine Learning Library
DBSCAN clustering algorithm on top of Apache Spark
Java-based GraphViz HTTP server
Mirror of Apache Zeppelin (Incubating)
Use Apache Spark straight from the browser
Scripts used to set up a Spark cluster on EC2