Processing framework for containerized algorithms
Clone or download
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.github 🖋 fixed typo Dec 13, 2018
dcos Updates to use SERVICE_SECRET in place of all old DCOS environment va… Nov 8, 2018
dockerfiles Eliminate use of Mesos DNS in favor of VIPs 🤘 Nov 27, 2018
graphics Fixed scale logo model that was missing parts Jan 4, 2018
hooks Bumped to trigger build Oct 18, 2017
scale-ui Removing 'in collaboration with statements' Nov 19, 2018
scale Merge pull request #1475 from ngageoint/1384-v6DataFilters Jan 11, 2019
web-docs 🎄updated links to v5 api Dec 25, 2018
.dockerignore Quick and dirty builds to skip UI / doc generation during dev Apr 5, 2018
.gitattributes Fix line endings Dec 3, 2015
.gitignore ⚡️ rebase fixes Aug 30, 2018
.gitlab-ci.yml Updates to support testing build on Docker Hub Aug 15, 2018
.travis.yml Removed docker steps from travis in favor of docker cloud Oct 18, 2017
Dockerfile Removed comments 🔥 Nov 8, 2018
Dockerfile-dev v0 is out, HTTP v1 is in 🔥 Nov 8, 2018
LICENSE Initial commit Nov 18, 2015
README.md Removing 'collaboration with' statements Nov 19, 2018
build-scale.sh Removing debug statement Aug 15, 2018
build-support.sh Quoted password to protect against special characters Aug 15, 2018
dev-build-push.sh Quick cleanup for dev marathon apps May 15, 2018
generate-release.sh Added ARG for configuration of base image Sep 8, 2017
push-docs.sh Removed master:gh-pages branch mapping. We are already on gh-pages br… Jul 6, 2017
travis-build.sh 👌 Addressing PR changes Nov 19, 2018

README.md

Scale

Join the chat at https://gitter.im/ngageoint/scale Build Status

Scale is a system that provides management of automated processing on a cluster of machines. It allows users to define jobs, which can be any type of script or algorithm. These jobs run on ingested source data and produce product files. The produced products can be disseminated to appropriate users and/or used to evaluate the producing algorithm in terms of performance and accuracy.

Mesos and Nodes

Scale runs across a cluster of networked machines (called nodes) that process the jobs. Scale utilizes Apache Mesos, a free and open source project, for managing the available resources on the nodes. Mesos informs Scale of available computing resources and Scale schedules jobs to run on those resources.

Ingest

Scale ingests source files using a Scale component called Strike. Strike is a process that monitors an ingest directory into which source data files are being copied. After a new source data file has been ingested, Scale produces and places jobs on the queue depending on the type of the ingested file. Many Strike processes can be run simultaneously, allowing Scale to monitor many different ingest directories.

Jobs

Scale creates jobs based on its known job types. A job type defines key characteristics about an algorithm that Scale needs to know in order to run it (what command to run, the algorithm.s inputs and outputs, etc.) Job types are labeled with versions, allowing Scale to run multiple versions of the same algorithm. Jobs may be created automatically due to an event, such as the ingest of a particular type of source data file, or they may be created manually by a user. Jobs that need to be executed are placed onto and prioritized within a queue before being scheduled onto an available node. When multiple jobs need to be run in a serial or parallel sequence, a recipe can be created that defines the job workflow.

Products

Jobs can produce products as a result of their successful execution. Products may be disseminated to users or used to analyze and improve the algorithms that produced them. Scale allows the creation of different workspaces. A workspace defines a separate location for storing source or product files. When a job is created, it is given a workspace to use for storing its results, allowing a user to control whether the job.s results are available to a wider audience or are restricted to a private workspace for the user's own use.

Scale Dependencies

Scale requires several external components to run as intended. PostgreSQL is used to store all internal system state and must be accessible to both the scheduler and web server processes. Logstash along with Elasticsearch are used to collect and store all algorithm logs. A message broker is required for in-flight storage of internal Scale messages and must be accessible to all system components. The following versions of these services are required to support Scale:

  • Elasticsearch 2.4
  • Logstash 2.4
  • PostgreSQL 9.4+
  • PostGIS 2.0+
  • Message Broker (RabbitMQ 3.6+ or Amazon SQS)

Note: We strongly recommend using managed services for PostgreSQL (AWS RDS), Messaging (AWS SQS) and Elasticsearch (AWS Elasticsearch Service), if available to you. Use of these services in Docker containers should be avoided in all but development environments. Reference the Architecture documentation for additional details on configuring supporting services.

Quick Start

While Scale can be entirely run on a pure Apache Mesos cluster, we strongly recommend using Data Center Operating System (DC/OS). DC/OS provides service discovery, load-balancing and fail-over for Scale, as well as deployment scripts for nearly all imaginable target infrastructures. This stack allows Scale users to focus on use of the framework while minimizing effort spent on deployment and configuration. A complete quick start guide can be found at:

https://ngageoint.github.io/scale/quickstart.html

Algorithm Development

Scale is designed to allow development of recipes and jobs for your domain without having to concern yourself with the complexities of cluster scheduling or data flow management. As long as your processing can be accomplished with discrete inputs on a Linux command line, it can be run in Scale. Simple examples of a complete processing chain can be found within the above quick start or you can refer to our in-depth documentation for step-by-step Scale integration:

https://ngageoint.github.io/scale/docs/algorithm_integration/index.html

Scale Development

If you want to contribute to the actual Scale open source project, we welcome your contributions. There are 2 primary components of Scale:

The links provide specific development environment setup instructions for each individual component.

Build

Scale is tested and built using a combination of Travis CI and Docker Hub. All unit test execution and documentation generation are done using Travis CI. We require that any pull request fully pass unit test checks prior to being merged. Docker Hub builds are saved to x.x.x-snapshot image tags between releases and on release tags are matched to release version.

A new release can be cut using the generate-release.sh shell script from a cloned Scale repository (where numbers refer to MAJOR MINOR PATCH versions respectively):

./generate-release.sh 4 0 0 

There is no direct connection between the Travis CI and Docker Hub builds, but both are launched via push to the GitHub repository.

Contributing

Scale was developed at the National Geospatial-Intelligence Agency (NGA). The government has "unlimited rights" and is releasing this software to increase the impact of government investments by providing developers with the opportunity to take things in new directions. The software use, modification, and distribution rights are stipulated within the Apache 2.0 license.

All pull request contributions to this project will be released under the Apache 2.0 or compatible license. Software source code previously released under an open source license and then modified by NGA staff is considered a "joint work" (see 17 USC § 101); it is partially copyrighted, partially public domain, and as a whole is protected by the copyrights of the non-government authors and must be released according to the terms of the original open source license.