HPCC Systems (High Performance Computing Cluster) is an open source, massively parallel processing computing platform for big data processing and analytics.
C++ ECL XSLT JavaScript CMake Yacc Other
ghalliday Community Edition 7.0.0-rc1 Release Candidate 1
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Latest commit 88eda09 Aug 10, 2018
.github HPCC-18020 Add new option to pull request template Jul 17, 2017
build_utils HPCC-13448 Source Code needs Marca Registrada next to HPCC Systems® Aug 4, 2015
charm HPCC-11289 Add README file for HPCC Juju Charm Development Jan 22, 2015
clienttools HPCC-17408 Add Clienttools bin to Windows PATH variable Jul 31, 2018
cmake_modules HPCC-20168 Include Spark conditionally into platform package Jul 21, 2018
common Merge pull request #11542 from ghalliday/issue20252 Aug 9, 2018
configuration HPCC-20200 Improve attribute group processing Aug 6, 2018
dali HPCC-20284 Dali calls to get scope permissions can fail Aug 10, 2018
deploy HPCC-19467 Fix problems with windows stand alone compiling May 29, 2018
deployment HPCC-16183 Output slave port and slaves per node in configgen Jul 25, 2018
devdoc HPCC-20089 Rationalize and consolidate the developer documentation Jul 18, 2018
docs Merge pull request #11549 from g-pan/H20262-init Aug 9, 2018
ecl Merge pull request #11559 from shamser/issue19175b Aug 9, 2018
ecllibrary HPCC-20259 Add Std.Date.TimestampToString function Aug 8, 2018
esp Merge branch 'candidate-6.4.26' Aug 10, 2018
githooks Merge remote-tracking branch 'origin/candidate-3.10.x' Dec 20, 2012
initfiles HPCC-20195 Roxie MySQL cache options for ConfigManager 2 Aug 6, 2018
lib2 HPCC-15414 clean lib2 for lib name changes on Mac OS Jan 25, 2017
misc HPCC-9508 Add eclipse code layout settings file to project Jun 19, 2013
package HPCC-16491 Work-around CMake productbuild packaging issue Oct 26, 2016
plugins HPCC-20220 Dali cores when getPermissions called internally Aug 7, 2018
roxie Merge branch 'candidate-6.4.26' Aug 10, 2018
rtl HPCC-19763 Add activity level control over the translation mode of a … Aug 3, 2018
services HPCC-18585 Replace uses of rand() with fastRand() Oct 23, 2017
spark HPCC-20168 Include Spark conditionally into platform package Jul 21, 2018
system HPCC-20284 Dali calls to get scope permissions can fail Aug 10, 2018
testing HPCC-20252 refactor index read generation to allow remote projection Aug 9, 2018
thorlcr HPCC-20185 Ensure loop again flag is synchronized before reading Jul 30, 2018
tools HPCC-18183 Direct ESDL.exe errors to stderr Jul 19, 2018
.gitattributes HPCC-17425 Various fixes for running HPCC in windows linux subsystem May 4, 2017
.gitignore HPCC-17851 New config manager core library Feb 1, 2018
.gitmodules HPCC-18512 Update ECL Watch stats to use WebPack Jan 2, 2018
.travis.yml HPCC-18512 Switch to WebPack for ECL Watch build Oct 24, 2017
BUILD_ME.md HPCC-19312 Update to latest cassandra driver Mar 16, 2018
CMakeLists.txt HPCC-20168 Include Spark conditionally into platform package Jul 21, 2018
CNAME Add CNAME entry for GitHub pages redirection Aug 23, 2011
CONTRIBUTORS HPCC-16014 Contributors file needs some refreshing Sep 6, 2016
FUTURE Initial version of FUTURE document Sep 14, 2011
LICENSE.txt HPCC-13448 Source Code needs Marca Registrada next to HPCC Systems® Aug 4, 2015
R-LICENSE.txt HPCC-14457 Split R plugin to its own package Dec 15, 2015
README.md Merge branch 'candidate-6.4.0' Jun 29, 2017
VERSIONS Preparation for 6.0.0-beta1 release Sep 22, 2015
baseaddr.txt HPCC-13448 Source Code needs Marca Registrada next to HPCC Systems® Aug 4, 2015
build-config.h.cmake HPCC-9902 Use the build version as the ecl version reported by eclcc Sep 3, 2013
cmake_uninstall.cmake.in HPCC-15142 Minimal changes needed for DESTDIR Aug 10, 2016
package-lock.json HPCC-19093 Update to latest hpcc-js Mar 15, 2018
version.cmake Community Edition 7.0.0-rc1 Release Candidate 1 Aug 10, 2018

README.md

Description / Rationale

HPCC Systems offers an enterprise-ready, open source supercomputing platform to solve big data problems. Compared to Hadoop, the platform analyzes big data using less code and fewer nodes for greater efficiency, and it offers a single programming language, a single platform and a single architecture for efficient processing. HPCC Systems is a technology division of LexisNexis Risk Solutions.

Getting Started

Architecture

The HPCC Systems architecture incorporates the Thor and Roxie clusters as well as common middleware components, an external communications layer, client interfaces which provide both end-user services and system management tools, and auxiliary components to support monitoring and to facilitate loading and storing of filesystem data from external sources. An HPCC environment can include only Thor clusters, or both Thor and Roxie clusters. Each of these cluster types is described in more detail in the sections that follow.

Thor

Thor (the Data Refinery Cluster) is responsible for consuming vast amounts of data and for transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. A cluster can scale from a single node to thousands of nodes.

  • Single-threaded
  • Distributed parallel processing
  • Distributed file system
  • Powerful parallel processing programming language (ECL)
  • Optimized for Extraction, Transformation, Loading, Sorting, Indexing and Linking
  • Scales from one to thousands of nodes
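
As a rough illustration of the distributed processing idea described above, the following Python sketch spreads records across a few partitions, sorts each partition independently, and merges the results. It is a toy, single-process model and not HPCC Systems code: the function names (`partition`, `local_sort`, `distributed_sort`), the record layout and the round-robin distribution are all assumptions made for the example, and Thor's actual algorithms may differ.

```python
from heapq import merge


def partition(records, node_count):
    """Spread records across node_count partitions, round-robin.

    Each partition stands in for the slice of a file held by one node.
    """
    return [records[i::node_count] for i in range(node_count)]


def local_sort(part):
    """Sort one partition by id; on a cluster each node would do this locally."""
    return sorted(part, key=lambda r: r["id"])


def distributed_sort(records, node_count=3):
    """Partition, sort each partition, then merge the sorted partitions.

    Here the partitions are processed one after another in a single
    process; the point is only that each partition's work is independent.
    """
    parts = partition(records, node_count)
    sorted_parts = [local_sort(p) for p in parts]
    return list(merge(*sorted_parts, key=lambda r: r["id"]))


records = [{"id": n} for n in (5, 2, 9, 1, 7, 3)]
print([r["id"] for r in distributed_sort(records)])  # [1, 2, 3, 5, 7, 9]
```

Because each partition is sorted without reference to the others, adding partitions (nodes) divides the per-node work, which is the property the bullet list above refers to as scaling across nodes.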

Roxie

Roxie (the Query Cluster) provides separate high-performance online query processing and data warehouse capabilities. Roxie (Rapid Online XML Inquiry Engine) is the data delivery engine used in HPCC to serve data quickly and can support many thousands of requests per node per second.

  • Multi-threaded
  • Distributed parallel processing
  • Distributed file system
  • Powerful parallel processing programming language (ECL)
  • Optimized for concurrent query processing
  • Scales from one to thousands of nodes
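
To make the contrast with batch processing concrete, here is a minimal Python sketch of concurrent query serving: several queries are answered at once by keyed lookups against an index that was prepared ahead of time. It is only a conceptual model and not Roxie's implementation; the `INDEX` contents, the `run_query` and `serve` names, and the use of a thread pool are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pre-built index (key -> record), standing in for data
# that was indexed offline before any query arrives.
INDEX = {
    "alice": {"city": "Dayton"},
    "bob": {"city": "Atlanta"},
    "carol": {"city": "London"},
}


def run_query(key):
    """Answer one query with a keyed lookup instead of a full scan."""
    return INDEX.get(key)


def serve(keys, workers=4):
    """Answer many queries concurrently; results keep the request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_query, keys))


print(serve(["bob", "nobody", "alice"]))
```

The index is read-only while queries run, so the worker threads need no locking; a key that is not present simply yields `None`.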

ECL

ECL (Enterprise Control Language) is a powerful programming language ideally suited to the manipulation of big data.

  • Transparent and implicitly parallel programming language
  • Non-procedural and dataflow oriented
  • Modular, reusable, extensible syntax
  • Combines data representation and algorithm implementation
  • Easily extended using C++ libraries
  • ECL is compiled into optimized C++

ECL IDE

ECL IDE is a modern IDE used to code, debug and monitor ECL programs.

  • Access to shared source code repositories
  • Complete development, debugging and testing environment for developing ECL dataflow programs
  • Built-in access to the ECL Watch tool, allowing developers to watch job graphs as they execute
  • Access to current and historical job workunits

ESP

ESP (Enterprise Services Platform) provides an easy-to-use interface to access ECL queries using XML, HTTP, SOAP and REST.

  • Standards-based interface to access ECL functions

Developer documentation

The following links describe the structure of the system and detail some of the key components: