OVIS High Performance Computing monitoring, analysis, and visualization project
Latest commit ee1a127 Jul 13, 2018
0_AVOID_GITHUB_ZIP_FILES Github user interface tip for the unwary. Dec 8, 2016
automake backport cray's relocation support build changes to fix bugs in toss … Apr 16, 2018
baler Fix bquery code that passes pos as * instead of value Jul 31, 2017
config update rpath macro to latest Apr 16, 2018
gpcd-support @ 44f97c9 update gpcd-support tag Nov 29, 2016
helper-scripts helper-script: Fix a typo in ldmsd_sampler_services Dec 18, 2014
kernel/kldms kldms: Fix the calculation of transaction duration Jan 5, 2017
komondor Py27/lanl packaging and updates for rhine/redwood. Nov 6, 2015
ldms Ver 3.4.7 Jul 12, 2018
lib add mm_stat feature to help users discover -m. May 2, 2018
m4 backport cray's relocation support build changes to fix bugs in toss … Apr 16, 2018
me Add environment variable substitution to the avl Mar 20, 2017
ocm Remove procstatutil obsoleted by procstat sampler. Dec 9, 2016
packaging fix missing prefix in printing, and turn store_csv noise. Jul 10, 2018
sos @ ebf1736 sos submodule downgrade Feb 7, 2018
util octopus file Jul 10, 2018
.gitignore backport gitignore for pyc, swp Apr 16, 2018
.gitmodules sos submodule update Mar 29, 2017
INSTALL port v2 build system, less man, to v3. Oct 20, 2015
LICENSE license file fixed Oct 31, 2016
Makefile.am backport cray's relocation support build changes to fix bugs in toss … Apr 16, 2018
README.libevent2 port v2 build system, less man, to v3. Oct 20, 2015
README.md ovis: Update the dependency list in README.md file Nov 3, 2016
autogen.sh 2013-10-30 Benjamin Allan <baallan@sandia.gov> Nov 20, 2013
configure.ac Ver 3.4.7 Jul 12, 2018

README.md

Introduction

OVIS is a modular system for HPC data collection, transport, storage, analysis, and visualization, as well as log message exploration. OVIS 3.3.0 comes with LDMS and Baler, and includes SOS as a git submodule.

Lightweight Distributed Metric Service (LDMS)

LDMS is a low-overhead, low-latency framework for collecting, transferring, and storing metric data on a large distributed computer system. The framework includes

  • a public API with a reference implementation,
  • tools for collecting, aggregating, transporting, and storing metric values,
  • collectors for several common types of metrics, and
  • data transport over sockets, RDMA (InfiniBand/iWARP/RoCE), and the Cray Gemini and Aries interconnects.
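To make the collect/aggregate/store roles above concrete, here is a minimal in-process Python sketch. It is not the LDMS API: `MetricSet`, `Aggregator`, and the metric names are hypothetical, and the real framework moves data between processes and nodes over the transports listed above rather than sharing Python objects.

```python
import time


class MetricSet(object):
    """Hypothetical stand-in for a metric set: a fixed schema of named values."""

    def __init__(self, name, metric_names):
        self.name = name
        self.values = dict((m, 0) for m in metric_names)
        self.timestamp = None

    def update(self, new_values):
        # A sampler may only write metrics that exist in the schema.
        for key, value in new_values.items():
            if key not in self.values:
                raise KeyError("unknown metric: %s" % key)
            self.values[key] = value
        self.timestamp = time.time()


class Aggregator(object):
    """Hypothetical aggregator: pulls snapshots of metric sets into a store."""

    def __init__(self, store):
        self.sources = []
        self.store = store

    def add_source(self, metric_set):
        self.sources.append(metric_set)

    def collect(self):
        for ms in self.sources:
            # Copy the values so later sampler updates do not alter stored data.
            self.store.append((ms.name, ms.timestamp, dict(ms.values)))


store = []
node1 = MetricSet("node1/meminfo", ["MemFree", "MemTotal"])
node1.update({"MemFree": 1024, "MemTotal": 4096})
agg = Aggregator(store)
agg.add_source(node1)
agg.collect()
print("%s %d" % (store[0][0], store[0][2]["MemFree"]))
```

Copying on collection keeps the stored record stable while the sampler keeps overwriting its live values, which mirrors the separation between sampling and storing described above.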

The API gives vendors a uniform way to expose system information without requiring them to publish the source code that accesses it, which might reveal proprietary methods or information (although we advise including it).

Metric information can be updated by a kernel module that runs only when applications yield the processor, and it can be transported using RDMA-like operations; together these keep jitter during collection minimal. LDMS has been run on 10,000 cores, collecting over 100,000 metric values per second with less than 0.2% overhead.
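The scaling figures above imply a small per-core budget. This back-of-the-envelope estimate assumes the metric values are spread evenly over the 10,000 cores and reads "0.2% overhead" as the share of each core's time spent on collection; neither assumption is stated above.

```python
US_PER_SECOND = 1000000

# Figures quoted above.
cores = 10000
values_per_second = 100000
# 0.2% written as 2 parts per 1000, kept as integers to avoid rounding noise.
overhead_num, overhead_den = 2, 1000

# Assumption: values are collected evenly across all cores.
values_per_core = values_per_second // cores
# Assumption: overhead means the fraction of each core-second spent on collection.
budget_us = US_PER_SECOND * overhead_num // overhead_den
# Upper bound on the time each collected value may cost on one core.
budget_per_value_us = budget_us // values_per_core

print("%d values/s per core, %d us per value" % (values_per_core, budget_per_value_us))
```

Under these assumptions each core handles 10 values per second within a 2000 microsecond budget, so each value may cost at most about 200 microseconds.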

Scalable Storage System (SOS)

SOS is a high-performance, indexed, object-oriented database designed to efficiently manage structured data on persistent media. More information can be found at the SOS GitHub website https://github.com/opengridcomputing/SOS.
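As a rough illustration of what an "indexed" store buys, the toy class below keeps objects in insertion order plus one sorted index, so range queries on the indexed attribute avoid a full scan. It is a hypothetical sketch built on Python's `bisect` module, not the SOS API or its on-disk format.

```python
import bisect


class IndexedStore(object):
    """Toy object store with a single sorted index (hypothetical, not SOS)."""

    def __init__(self, key_attr):
        self.key_attr = key_attr
        self.objects = []      # objects in insertion order
        self.index_keys = []   # sorted index keys
        self.index_refs = []   # positions in self.objects, parallel to index_keys

    def insert(self, obj):
        ref = len(self.objects)
        self.objects.append(obj)
        key = obj[self.key_attr]
        # Keep the index sorted; equal keys stay in insertion order.
        pos = bisect.bisect_right(self.index_keys, key)
        self.index_keys.insert(pos, key)
        self.index_refs.insert(pos, ref)

    def range(self, lo, hi):
        """Return objects whose key k satisfies lo <= k < hi, in key order."""
        start = bisect.bisect_left(self.index_keys, lo)
        end = bisect.bisect_left(self.index_keys, hi)
        return [self.objects[r] for r in self.index_refs[start:end]]


store = IndexedStore("timestamp")
for ts, comp in [(30, "c1"), (10, "c2"), (20, "c3")]:
    store.insert({"timestamp": ts, "component": comp})
print([o["component"] for o in store.range(10, 25)])
```

A real database also has to persist the index and handle concurrent writers; the sketch only shows why a sorted index turns a time-range lookup into two binary searches.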

There is no need to clone the SOS project separately. We advise installing SOS from the git submodule under OVIS to ensure compatibility with OVIS v3.3.0. See Obtaining OVIS source code below.

Baler

Baler is a collection of log message exploration and analysis tools. balerd tokenizes log messages using user-specified dictionaries and groups the messages according to their token sequences. Each group is represented by a Baler pattern, which is a token sequence.
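The tokenize-then-group step can be sketched as follows. For illustration it assumes whitespace tokenization and a rule in which dictionary words are kept while every other token becomes a `*` wildcard; the function names are hypothetical and this is not balerd's implementation.

```python
from collections import defaultdict

WILDCARD = "*"


def to_pattern(message, dictionary):
    """Map a log message to a pattern: dictionary words stay, other tokens become '*'."""
    # Assumption: whitespace tokenization and a case-insensitive dictionary lookup.
    return tuple(tok if tok.lower() in dictionary else WILDCARD
                 for tok in message.split())


def group_by_pattern(messages, dictionary):
    """Group messages that share the same token-sequence pattern."""
    groups = defaultdict(list)
    for msg in messages:
        groups[to_pattern(msg, dictionary)].append(msg)
    return groups


dictionary = {"link", "down", "up", "on", "port"}
messages = ["link down on port 3", "link down on port 7", "link up on port 3"]
for pattern, members in sorted(group_by_pattern(messages, dictionary).items()):
    print("%s  (%d messages)" % (" ".join(pattern), len(members)))
```

Because port numbers are not dictionary words, the two "link down" messages collapse into one pattern while "link up" forms a second one.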

Baler stores the log message patterns, the raw log messages, and other information in its database. bquery queries the Baler database for Baler patterns, raw log messages, and message statistics by host and/or time.
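The kind of host and time filtering that bquery offers can be pictured with a toy query over in-memory `(timestamp, host, pattern)` records. The record layout, host names, and the `query`/`pattern_counts` helpers are hypothetical; they do not reflect bquery's options or Baler's database schema.

```python
from collections import Counter


def query(records, host=None, start=None, end=None):
    """Keep (timestamp, host, pattern) records matching a host and a [start, end) window."""
    selected = []
    for ts, rec_host, pattern in records:
        if host is not None and rec_host != host:
            continue
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        selected.append((ts, rec_host, pattern))
    return selected


def pattern_counts(records):
    """Message statistics: number of records matching each pattern."""
    return Counter(pattern for _, _, pattern in records)


records = [
    (100, "nid001", "link down on port *"),
    (160, "nid002", "link down on port *"),
    (220, "nid001", "link up on port *"),
]
print(pattern_counts(query(records, host="nid001", start=100, end=300)))
```

Leaving a filter as `None` means "do not restrict on this field", so the same helper covers host-only, time-only, and combined queries.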

Baler also comes with an association mining tool, bassoc, which can be used to discover sequences in which log message patterns occur and to perform causal analysis.
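One simple way to mine associations between patterns is to bucket events into time windows and compute rule confidence. The sketch below is a generic window-based association rule count, not bassoc's algorithm; the window width and pattern names are assumptions, and it ignores the order of events within a window.

```python
from collections import Counter
from itertools import permutations


def window_sets(events, width):
    """Bucket (timestamp, pattern) events into fixed-width time windows."""
    windows = {}
    for ts, pattern in events:
        windows.setdefault(ts // width, set()).add(pattern)
    return list(windows.values())


def rule_confidence(windows):
    """Confidence of A -> B = (#windows containing A and B) / (#windows containing A)."""
    single = Counter()
    pair = Counter()
    for w in windows:
        single.update(w)                  # each pattern counted once per window
        pair.update(permutations(w, 2))   # ordered pairs of distinct patterns
    return dict(((a, b), pair[(a, b)] / float(single[a])) for (a, b) in pair)


events = [(0, "fan fail"), (5, "temp high"), (12, "fan fail"),
          (15, "temp high"), (25, "fan fail")]
conf = rule_confidence(window_sets(events, 10))
for rule in sorted(conf):
    print("%s -> %s: %.2f" % (rule[0], rule[1], conf[rule]))
```

Here "temp high" never appears without "fan fail" in the same window, so that rule has confidence 1.0, while the reverse rule holds in two of three windows.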

Obtaining OVIS source code

You may clone OVIS source (and its submodules) from the official Git repository:

	git clone https://github.com/ovis-hpc/ovis.git
	cd ovis
	# If you want to store LDMS data in SOS or use Baler,
	# also run the following two commands.
	git submodule init sos
	git submodule update sos

Dependencies

  • autoconf (>=2.63), automake, libtool
  • glib2
  • libreadline
  • libevent2 (>=2.0.21)
    • For recent Ubuntu and CentOS 7, libevent2 can be installed from the distribution's package repository.
    • To install from source, download it from http://libevent.org/.
  • OpenSSL development library, used by OVIS for LDMS authentication
  • For LDMS and Baler Python Interface:
    • Python 2.7
    • swig
  • For Baler bclient Python Interface:
  • doxygen if you want to build OVIS documentation.
  • Some LDMS plug-ins depend on additional libraries. For the dependencies of Cray-related LDMS sampler plug-ins, see each plug-in's man page in ldms/man/.
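When checking a build host against the minimum versions listed above, a small helper like the following can compare dotted version strings. The helper and its name are hypothetical; installed versions can be read, for example, from `autoconf --version` or `pkg-config --modversion libevent`.

```python
# Minimum versions stated in the dependency list; other entries give no minimum.
MINIMUM_VERSIONS = {
    "autoconf": "2.63",
    "libevent": "2.0.21",
}


def parse_version(text):
    """Turn '2.0.22-stable' into (2, 0, 22); non-numeric suffixes are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(name, installed):
    """True if the installed version satisfies the listed minimum (or none is listed)."""
    minimum = MINIMUM_VERSIONS.get(name)
    if minimum is None:
        return True
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("libevent", "2.0.22-stable"))
```

Comparing integer tuples avoids the classic string-comparison bug in which "2.0.9" would sort after "2.0.21".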

Building OVIS

From the top-level OVIS directory, run:

	./autogen.sh
	mkdir <build directory>
	cd <build directory>
	../configure --prefix=<installed path> [options]
	make
	make install

To build SOS and Baler, give --enable-sos and --enable-baler, respectively, on the configure line. Note that Baler depends on SOS.
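A build script can encode the SOS/Baler rule above when assembling the configure line. This hypothetical helper builds the argument list for the `../configure` step; rejecting Baler without SOS is a policy of this sketch that follows from the stated dependency, not a documented configure check.

```python
def configure_command(prefix, enable_sos=False, enable_baler=False, extra=()):
    """Build the ../configure argument list for a build directory (hypothetical helper)."""
    if enable_baler and not enable_sos:
        # Baler depends on SOS, so refuse to request Baler on its own.
        raise ValueError("--enable-baler requires --enable-sos")
    cmd = ["../configure", "--prefix=" + prefix]
    if enable_sos:
        cmd.append("--enable-sos")
    if enable_baler:
        cmd.append("--enable-baler")
    cmd.extend(extra)   # any further [options] for configure
    return cmd


print(" ".join(configure_command("/opt/ovis", enable_sos=True, enable_baler=True)))
```

The resulting list can be passed to `subprocess.check_call(cmd, cwd=build_dir)` from the build directory, followed by `make` and `make install`.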

Supported platforms

  • Ubuntu and friends
  • CentOS and friends
  • Cray XE6, Cray XK, Cray XC

Unsupported features

LDMS sampler plugins

  • perfevent sampler
  • papi sampler
  • hadoop sampler
  • switchx sampler