HPC Goodies is a collection of tools created to provide simple execution of common commands on HPC and Big Data clusters.
As of this writing, about a dozen of the tools/commands have been committed to this repo; many more exist and have not yet been added. These scripts and commands were created and refined over the past few years during the deployment of large HPC systems, and they encapsulate best practices in an easily repeatable form.
HPC Goodies includes multiple sub-packages that can be installed independently. Each is described below.
hpc-goodies-cpu Video Demo
- Installs an init script with a simple config file to dynamically control key CPU characteristics at the OS level including Turbo (on/off), Hyperthreading (on/off), active real core count, governor, max frequency, min frequency, Max C State, and C1E State (on/off).
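To inspect some of the characteristics listed above by hand, the following is a minimal sketch that reads the standard Linux sysfs files. It is not part of hpc-goodies-cpu and does not reflect how its init script works. The function name `read_cpu_settings` and the `SETTINGS` table are hypothetical, which files exist depends on the kernel and the cpufreq driver (for example, `intel_pstate/no_turbo` is only present with the intel_pstate driver), and `online` lists logical CPUs rather than physical cores.

```python
from pathlib import Path

# Standard Linux sysfs root for CPU attributes.
SYSFS_CPU = Path("/sys/devices/system/cpu")

# Hypothetical name -> sysfs file mapping, relative to SYSFS_CPU.
# Per-CPU values are sampled from cpu0 only, as an illustration.
SETTINGS = {
    "governor": "cpu0/cpufreq/scaling_governor",
    "min_freq_khz": "cpu0/cpufreq/scaling_min_freq",
    "max_freq_khz": "cpu0/cpufreq/scaling_max_freq",
    "no_turbo": "intel_pstate/no_turbo",  # "1" means Turbo is disabled
    "smt_control": "smt/control",         # e.g. "on", "off", "notsupported"
    "online_cpus": "online",              # logical CPUs, e.g. "0-15"
}


def read_cpu_settings(root=SYSFS_CPU):
    """Return a dict of setting name -> string value, or None if unavailable."""
    root = Path(root)
    result = {}
    for name, rel_path in SETTINGS.items():
        try:
            result[name] = (root / rel_path).read_text().strip()
        except OSError:
            # File missing or unreadable on this kernel/driver combination.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_cpu_settings().items():
        print(f"{name}: {value if value is not None else 'n/a'}")
```

The `root` parameter exists only so the function can be pointed at a copy of the sysfs tree for testing.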
hpc-goodies-gpfs (requires xCAT)
- gpfs_syslogging: An init script that sends a copy of GPFS log messages to syslog, but leaves the default behavior untouched (by default, they are only logged to /var/adm/ras/mmfs.log.latest).
- test_gpfs_state: Report key state for each GPFS filesystem on an arbitrary set of nodes, including accessibility and whether RDMA is in use.
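The idea behind copying log messages to syslog without changing the default logging can be illustrated as follows. This is only a sketch under stated assumptions, not the gpfs_syslogging implementation: the helper names `forward_new_lines` and `follow`, the `mmfs` ident, and the polling interval are all hypothetical, and log rotation is not handled. The log file is only read, so the default behavior stays untouched.

```python
import os
import time

# Default GPFS log location, as noted above.
LOG_PATH = "/var/adm/ras/mmfs.log.latest"


def forward_new_lines(log_path, offset, emit):
    """Pass each complete line appended after `offset` to `emit`.

    Returns the new byte offset. A trailing partial line is left
    for the next call so that no message is forwarded in pieces.
    """
    with open(log_path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line is available
    for raw in chunk[:end].splitlines():
        line = raw.decode("utf-8", errors="replace").strip()
        if line:
            emit(line)
    return offset + end


def follow(log_path=LOG_PATH, interval=5.0):
    """Poll the log forever and copy new lines to the local syslog."""
    import syslog  # Unix-only standard library module

    syslog.openlog("mmfs")  # hypothetical ident
    offset = os.path.getsize(log_path)  # start at the end; skip old messages
    while True:
        offset = forward_new_lines(log_path, offset, syslog.syslog)
        time.sleep(interval)
```

Calling `follow()` starts the polling loop. Passing a different `emit` callable to `forward_new_lines` (for example, a list's `append`) makes the line handling easy to check without touching syslog.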
hpc-goodies-ib (requires xCAT)
- get_root_guids: One-command capture of root GUIDs from the spine for subnet manager configuration.
- get_node_guids: Produce a list of GUIDs from an arbitrary set of nodes.
- set_hca_firmware_update: Update HCA firmware across an entire cluster with one command (even with different HCA models).
- test_hca_state: Report the state of key HCA attributes and settings across named nodes in the cluster.
- test_infiniband_fabric_info: Produce P2P output describing an already-installed IB fabric. Can identify which node is on which port of which switch.
- test_infiniband_route_info: Display a route map between two arbitrary nodes showing each hop.