Releases: ovis-hpc/ovis

OVIS-4.4.2

23 Feb 20:28
This is release OVIS-4.4.2

The principal improvements in this release are as follows:

- Enhancements needed for production convenience
- New Slingshot switch samplers designed to run on the Slingshot switch itself
- Configuration changes in support of slingshot_metrics
- Fix mis-sizing of a string in the jbuf implementation
- Invalidate RDMA memory descriptors on set delete
- Fix memory leaks in the as_is decomposition plugin
- Fix segfault caused by attempting to flush syslog
- Fix handling of DCGM string fields
- Fix `zap_sock` rejected endpoint leak

OVIS-4.4.1

10 Jan 23:02
OVIS Release 4.4.1

Release OVIS-4.3.11

10 Apr 14:19

What's Changed

  • Combined remote configuration into a single Python3 module called ldmsd_communicator
  • Disabled support for multiple lists in store_sos and store_csv when decomposition is not used
  • Added linux_proc_sampler streams store for SOS
  • Fixed wildcard address handling in ldms_xprt_listen_by_name()
  • Added a global message logging library (ovis_log) to incrementally replace message log pointers
  • Moved all sampler plugins to their own sub-directory
  • Deprecated asynchronous sampling mode
  • Added missing ldms_list_tail API
  • Added LDMS_V_TIMESTAMP accessor functions
  • Added support for the Slurm2 sampler in sampler_base
  • Made metric types signed in the lustre_mdc sampler
  • Added an Avro-Kafka Store to republish metric set data as Avro encoded Kafka messages
  • Added ovis_log messages to the LDMS authentication plugins
  • Added user_debug option to control job logging when debugging the slurm_notifier

Release OVIS-4.3.10

29 Jan 18:23
LDMS Release 4.3.10

LDMS 4.3.10 remains binary compatible with OVIS 4.3 releases back to
OVIS 4.3.3.

If you are storing data from the new samplers that use lists
and records, please do so with a storage decomposition configuration.
Storing these metric sets (procstat2, procnet2, slurm2, ...) without
a decomposition configuration can lead to confusing results.

There are 78 commits in this release that add new samplers and
include many scalability and resiliency improvements.

Release OVIS-4.3.9

29 Sep 21:18

Welcome to the long-awaited OVIS 4.3.9

OVIS 4.3.9 includes some very exciting new features:

  • Variable-length metric values
    - lists
    - records
  • Automatic scaling of I/O threads based on demand
  • Decomposition of LDMS metric set data into multiple storage rows based on a configurable decomposition strategy
  • ...

LDMS 4.3.9 remains binary compatible with OVIS 4.3 releases back to OVIS 4.3.3.

OVIS-4.3.8

03 Feb 17:19
* Numerous bug fixes
* Multi-threaded low-level Zap transport event handlers
* Command line option support in configuration files
* Summary set, transport, producer, and thread statistics
* Kokkos Appmon store
* Darshan store
* Non-blocking event logging
* Netlink notifier stream sampler

Release OVIS-4.3.7

19 Apr 21:03
This is the OVIS-4.3.7 release

New Features:
* Improved LDMSD Streams Performance
* Improved ib_verbs backward compatibility
* Per-device procnet sampler
* Per-device ibmad sampler
* AMD GPU sampler
* Per-mount Lustre samplers
* Various reliability and resiliency improvements

Fixes:
* LDMSD Streams Memory Leak fixes
* Resolved confusing uGNI error messages on exit
* Fixed store rename issues in CSV store

Release OVIS-4.3.6

19 Jan 06:31
OVIS-4.3.6 Release tag

Features:

* prdcr_stat command to report ldmsd producer statistics
* set_stat command to report active ldmsd set counts and memory usage
* Support for multi-step slurm jobs in the PAPI sampler
  - the app_id in the metric set is now the step id.
* Partial support for multi-step slurm jobs in the Slurm sampler
  - the app_id in the metric set is now the step id.
* TimescaleDB storage plugin

Bug Fixes:

* Fix spinning IO thread bug in the socket transport
* Fix build failure for older OFA (ib_verbs) libraries
* Fix build failure for missing openssl when auth enabled
* Fix use after free bug in RBD cleanup
* Fix RBD leak in the set delete path
* Fix potential deadlock in Zap RDMA

Release OVIS-4.3.5

14 Dec 17:00
This is the OVIS-4.3.5 G/A Release

This release includes the following features and fixes:

* Compatibility with OVIS-4.3.3 and OVIS-4.3.4
* Support for the Maestro load balancer
* Allow root user to access ldmsd configuration objects
  regardless of euid/egid of the process
* Zap socket performance improvements
* Zap fabric performance and resiliency improvements
* Zap RDMA support for OmniPath
* Zap uGNI resiliency improvements
* Fix LDMS Streams Service data loss on process exit
* Metric set permission handling improvements
* Fixes for memory leaks and uninitialized data found by
  static analysis tools
* Numerous build and packaging improvements

Release OVIS-4.3.4

07 Nov 16:07
This is the OVIS-4.3.4 G/A Release

The socket, RDMA, and uGNI transports have undergone significant testing.
The socket and uGNI transports have been scaled to three levels of
aggregation and 30,000 sets in the aggregate.

The RDMA transport has been tested with a few thousand sets.

The fabric transport should be considered Alpha and is suitable
for development, but not deployment at this time.

This release includes the following new features:

* LDMS Transport performance statistics (ldmsd_controller xprt_stats command)
* Zap Thread utilization tracking (ldmsd_controller thread_stats command)
* uGNI resiliency improvements to aid with resource error handling
* Packaging updates and GitHub automation to help with tarball generation and release tagging
* A reference counting service has been implemented that supports 'named references'. In debug mode (when REF_TRACK is defined), each reference is tracked by function name and line number when it is taken and when it is released, and a separate reference count is kept for each name. This makes it easier to debug reference tracking during development.
* The new ref_t reference counting mechanism has been added to struct ldms_set and struct ldms_rbuf_desc in support of a robust set-delete capability
* An "end-to-end" protocol has been added for deleting metric sets. When an ldmsd deletes a set, each peer that has a memory handle on the set is notified. The set resources are not freed until all peers acknowledge that they have received the delete notification.
* A service (zap_zerr2errno) has been added to consistently map Zap errors to Unix errno
* Updates to the lustre2_client sampler to support newer versions of Lustre
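The C sketch below illustrates the named-reference idea from the 4.3.4 notes. It is hypothetical: the `nref` type and the `nref_get`/`nref_put` helpers are invented for this example. They are not the OVIS `ref_t` API or its REF_TRACK implementation. The sketch keeps an overall count, which decides when the object may be freed. It also keeps a per-name count and records the caller's function and line, which is the debugging aid the notes describe.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of named reference counting. This is NOT the OVIS
 * ref_t API; every name below is illustrative only. Reference names are
 * assumed to be shorter than NREF_NAME_LEN.
 */
#define NREF_MAX_NAMES 8
#define NREF_NAME_LEN  32

struct nref_entry {
	char name[NREF_NAME_LEN]; /* reference name; empty string = unused slot */
	int count;                /* references currently held under this name */
	const char *last_func;    /* caller that last took or released it */
	int last_line;
};

struct nref {
	int total;                               /* decides when to free */
	struct nref_entry names[NREF_MAX_NAMES]; /* per-name debug counts */
};

/* Find the entry for name, optionally creating it in the first free slot. */
static struct nref_entry *nref_find(struct nref *r, const char *name, int create)
{
	struct nref_entry *free_slot = NULL;

	for (int i = 0; i < NREF_MAX_NAMES; i++) {
		if (r->names[i].name[0] == '\0') {
			if (!free_slot)
				free_slot = &r->names[i];
		} else if (strcmp(r->names[i].name, name) == 0) {
			return &r->names[i];
		}
	}
	if (create && free_slot) {
		snprintf(free_slot->name, sizeof(free_slot->name), "%s", name);
		return free_slot;
	}
	return NULL;
}

/* Take a reference under name and record where it was taken.
 * Returns 0 on success, -1 if the name table is full (no reference taken). */
static int nref_get_at(struct nref *r, const char *name,
		       const char *func, int line)
{
	struct nref_entry *e = nref_find(r, name, 1);

	if (!e)
		return -1;
	r->total++;
	e->count++;
	e->last_func = func;
	e->last_line = line;
	return 0;
}

/*
 * Release a reference under name. Returns 1 when the total reaches zero
 * (the caller would then free the object), 0 otherwise, or -1 if no
 * reference is held under that name; the total is left unchanged then.
 */
static int nref_put_at(struct nref *r, const char *name,
		       const char *func, int line)
{
	struct nref_entry *e = nref_find(r, name, 0);

	if (!e || e->count == 0) {
		fprintf(stderr, "%s:%d: put of unheld reference '%s'\n",
			func, line, name);
		return -1;
	}
	e->count--;
	e->last_func = func;
	e->last_line = line;
	return --r->total == 0;
}

/* Capture the caller's function name and line automatically. */
#define nref_get(r, name) nref_get_at((r), (name), __func__, __LINE__)
#define nref_put(r, name) nref_put_at((r), (name), __func__, __LINE__)

/* Number of references currently held under name. */
static int nref_count(struct nref *r, const char *name)
{
	struct nref_entry *e = nref_find(r, name, 0);

	return e ? e->count : 0;
}
```

For example, a lookup path and a transport could each hold references under their own names. Releasing a name that holds no reference is reported and rejected rather than silently lowering the total. This is what makes mismatched get/put pairs easy to locate.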
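The "end-to-end" set-delete protocol from the 4.3.4 notes can be sketched as a small state machine. This is a hypothetical illustration. The `set_sketch` struct and the `set_delete_begin`, `set_delete_ack`, `notify_peer` and `set_free` functions are invented names, not LDMS functions, and the transport send is replaced by a `printf`. The owner notifies every peer that holds a memory handle and releases the set only when the last outstanding acknowledgement arrives.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch of an end-to-end set-delete handshake. These names
 * are illustrative and are not the LDMS API.
 */
#define MAX_PEERS 16

struct set_sketch {
	const char *name;
	int npeers;            /* peers holding a memory handle (<= MAX_PEERS) */
	bool acked[MAX_PEERS]; /* which peers acknowledged the delete */
	int pending;           /* acknowledgements still outstanding */
	bool deleting;         /* delete notifications have been sent */
	bool freed;            /* set resources have been released */
};

/* Stand-in for sending a delete notification over a transport. */
static void notify_peer(const struct set_sketch *s, int peer)
{
	printf("notify peer %d: delete set '%s'\n", peer, s->name);
}

/* Stand-in for releasing the set's memory. */
static void set_free(struct set_sketch *s)
{
	s->freed = true;
}

/* Start the delete: notify every peer, or free at once if there are none. */
static void set_delete_begin(struct set_sketch *s)
{
	if (s->deleting)
		return; /* delete already in progress */
	s->deleting = true;
	s->pending = s->npeers;
	for (int p = 0; p < s->npeers; p++) {
		s->acked[p] = false;
		notify_peer(s, p);
	}
	if (s->pending == 0)
		set_free(s);
}

/* Record one peer's acknowledgement; free after the last one arrives. */
static void set_delete_ack(struct set_sketch *s, int peer)
{
	if (!s->deleting || s->freed || peer < 0 || peer >= s->npeers)
		return; /* stray acknowledgement */
	if (s->acked[peer])
		return; /* duplicate acknowledgement */
	s->acked[peer] = true;
	if (--s->pending == 0)
		set_free(s);
}
```

The sketch tracks acknowledgements per peer rather than only decrementing a counter. As a result, a duplicate acknowledgement from one peer is ignored and cannot free the set while another peer still holds a handle.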