kaitumisuuringute-keskus/mist

 
 

Hydrosphere Mist

Join the chat at https://gitter.im/Hydrospheredata/mist

Fork

This fork exists mainly to support Spark 3.0. It also differs from upstream in the following ways:

  • Artifacts can be up to 1 GB in size
  • It is built on Hadoop 3.2, because earlier versions had issues with AWS S3
  • It is built with AWS Fargate and AWS ECS in mind

To build the Docker image, run: sbt -DscalaVersion=2.12.7 -DsparkVersion=3.0.1 docker

This fork adds a third variable, imagePath, for building an image destined for a private repository. Use it as follows: sbt -DscalaVersion=2.12.7 -DsparkVersion=3.0.1 -DimagePath=REPO_NAME_HERE docker
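The two build invocations above differ only in the optional imagePath variable. A minimal sketch of assembling them from a script follows; the helper name `sbt_docker_command` is hypothetical and not part of this repository, and the default versions are simply the ones quoted above.

```python
from typing import List, Optional


def sbt_docker_command(
    scala_version: str = "2.12.7",
    spark_version: str = "3.0.1",
    image_path: Optional[str] = None,
) -> List[str]:
    """Assemble the sbt invocation from this README as an argv list.

    Hypothetical helper for illustration; only the -D variables and the
    `docker` task come from the README itself.
    """
    cmd = [
        "sbt",
        f"-DscalaVersion={scala_version}",
        f"-DsparkVersion={spark_version}",
    ]
    if image_path:
        # imagePath is only passed when the image targets a private repo.
        cmd.append(f"-DimagePath={image_path}")
    cmd.append("docker")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(sbt_docker_command(image_path="REPO_NAME_HERE")))
```

The list form can be passed directly to `subprocess.run` if you want to launch the build from Python.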

Docker images:

  • kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2

Running it locally

First, pull the Mist image: docker pull kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2

Then run it with Docker: docker run -p 2004:2004 -p 4040:4040 -v /var/run/docker.sock:/var/run/docker.sock kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2

If the container needs access to services on the host's localhost, use: docker run -p 2004:2004 -p 4040:4040 --add-host=host.docker.internal:host-gateway -v /var/run/docker.sock:/var/run/docker.sock kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2

With this option, the host machine's localhost is reachable from inside the container as "host.docker.internal".
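The two docker run variants above differ only in the --add-host flag. The sketch below builds either one from a single function; `docker_run_command` and its `host_access` parameter are hypothetical names used for illustration, while the ports, volume mount and image tag are copied from the commands above.

```python
from typing import List

# Image tag as listed in this README.
IMAGE = "kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2"


def docker_run_command(image: str = IMAGE, host_access: bool = False) -> List[str]:
    """Build the `docker run` argv shown in this README.

    Hypothetical helper: host_access=True adds the host-gateway mapping so
    that the container can reach the host as host.docker.internal.
    """
    cmd = ["docker", "run", "-p", "2004:2004", "-p", "4040:4040"]
    if host_access:
        cmd.append("--add-host=host.docker.internal:host-gateway")
    cmd += ["-v", "/var/run/docker.sock:/var/run/docker.sock", image]
    return cmd


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(docker_run_command(host_access=True)))
```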

You can now connect to the instance at localhost:2004.
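Before pointing a client at the instance, it can help to confirm that the published port accepts connections. The following is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper, and it checks only TCP reachability, not that Mist itself has finished starting.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 2004, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print("Port 2004 reachable:", is_port_open())
```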

Mist

Hydrosphere Mist is a serverless proxy for Spark clusters. Mist provides a new functional programming framework and deployment model for Spark applications.

Please see our quick start guide and documentation.

Features:

  • Spark Function as a Service. Deploy Spark functions rather than notebooks or scripts.
  • Spark Cluster and Session management. Fully managed Spark sessions backed by on-demand EMR, Hortonworks, Cloudera, DC/OS and vanilla Spark clusters.
  • Typesafe programming framework that clearly defines inputs and outputs of every Spark job.
  • REST HTTP & Messaging (MQTT, Kafka) API for Scala & Python Spark jobs.
  • Multi-cluster mode: seamless on-demand Spark cluster provisioning, autoscaling and termination (pending).

It creates a unified API layer for building enterprise solutions and microservices on top of Spark functions.
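As an illustration of the REST HTTP API mentioned in the feature list, the sketch below prepares a job submission request. It rests on assumptions that this README does not confirm: the endpoint path `/v2/api/functions/<name>/jobs` is taken from the upstream Mist HTTP API as we understand it and should be checked against the documentation, and the function name `hello-mist` and the `samples` parameter are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict


def build_job_request(
    function_name: str,
    params: Dict[str, Any],
    base_url: str = "http://localhost:2004",
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that starts a job.

    Assumption: the endpoint path follows the upstream Mist v2 HTTP API;
    verify it against the Mist documentation before relying on it.
    """
    url = f"{base_url}/v2/api/functions/{function_name}/jobs"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "hello-mist" and "samples" are placeholder values for illustration.
    req = build_job_request("hello-mist", {"samples": 10000})
    print(req.get_method(), req.full_url)
    # To send it against a running instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch usable without a running Mist instance.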

Mist use cases

High Level Architecture

Contact

Please report bugs/problems to: https://github.com/Hydrospheredata/mist/issues.

http://hydrosphere.io/

LinkedIn

Facebook

Twitter
