=============================================================================
== SPINDLE: Scalable Parallel Input Network for Dynamic Load Environments ==
=============================================================================

Authors:
  SPINDLE: W. Frings <W.Frings at fz-juelich dot de>
           Matthew LeGendre (legendre1 at llnl dot gov)
  COBO:    Adam Moody <moody20 at llnl dot gov>

Version: 0.7.2 (May 2013)

Summary:
========
Spindle is a tool for improving the performance of dynamic library and
python loading in HPC environments.

Documentation:
==============
https://scalability.llnl.gov/spindle/

Introduction:
=============
Dynamic linking and loading are widely used in High Performance Computing
(HPC). As HPC applications become increasingly complex, developers rely on
these techniques, which suit large-scale software development through their
support for modularization and computational steering. But the growing
number of dynamically linked libraries (DLLs) within applications poses a
challenge for application start-up. During start-up at scale, a large
number of processes simultaneously access many DLLs, creating file-access
storms that overwhelm shared file systems and act like a site-wide
denial-of-service attack.

The Scalable Parallel Input Network for Dynamic Loading Environments
(SPINDLE) is a novel approach that provides a scalable and transparent
dynamic linking and loading environment. SPINDLE extends the stock dynamic
loader (e.g., ld.so) with a scalable file-cache server overlay network. The
network forms a forest topology that allows SPINDLE to use a varying number
of communication trees; only the roots of the trees perform the file
operations of dynamic loading and then scalably propagate the results to
all other cache servers. The cached results are used by SPINDLE clients,
which intercept and re-route the file operations of the dynamic loaders.
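The client-side interception described above can be sketched with glibc's
rtld-audit interface (loaded via LD_AUDIT), in which the dynamic loader
calls la_objsearch() for each candidate library path and the audit library
may return a substitute path. This is a minimal illustrative sketch, not
SPINDLE's actual implementation: the cache directory /tmp/spindle_cache and
the flat one-file-per-library mapping are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <link.h>     /* rtld-audit: LAV_CURRENT, audit entry points */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Accept the loader's audit ABI version so auditing is enabled. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Called by ld.so for every library search path. Redirect the lookup
 * to a node-local cache directory (illustrative layout, not SPINDLE's). */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag)
{
    static char redirected[4096];
    const char *base = strrchr(name, '/');

    (void)cookie;
    (void)flag;
    snprintf(redirected, sizeof redirected, "/tmp/spindle_cache/%s",
             base ? base + 1 : name);
    return redirected;   /* the loader will open this path instead */
}

/* Built as a shared library and activated per-process, e.g.:
 *   gcc -shared -fPIC -o audit.so audit.c
 *   LD_AUDIT=./audit.so ./a.out
 */
```

SPINDLE itself serves the redirected opens from its cache-server overlay
rather than a local directory, but the interception point is the same,
which is why no changes to ld.so or the application are needed.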
Because it builds on the loader's existing auditing interface, SPINDLE
requires no modifications to either the loader or the application.

Scalability:
------------
Preliminary results on LLNL Linux clusters indicate that the SPINDLE
prototype improves the performance of Pynamic, a benchmark that stresses
the dynamic loading system, by a factor of 3.5 over the traditional
approach at 768 MPI processes, the largest scale at which we could run the
traditional scheme without significantly affecting other jobs. Further,
under SPINDLE this benchmark scales well on LLNL clusters, up to 8,400
processes, without pounding on shared resources.

Directories:
============
./auditclient      --> Implementation of the loader auditing interface for
                       SPINDLE; sample client applications
./auditserver      --> Implementation of the SPINDLE server; high-level
                       communication layer (cobo, msocket)
./beboot           --> Implementation of the bootstrapper, which relocates
                       executables and assists in Spindle startup
./cache            --> Internal data structure for file metadata (cache)
./client_comlib    --> Client side of the client/server communication
                       infrastructure
./comlib           --> Server side of the client/server communication
                       infrastructure
./launchmon        --> LaunchMON start-up command and BE server
./logging          --> Logging daemon that collects debug logs from clients
                       and servers
./m4               --> Autoconf helper routines
./scripts          --> Autoconf helper scripts
./testsuite        --> A test suite for SPINDLE
./tools/cobo/src   --> cobo source, used by one of the high-level SPINDLE
                       communication layers
./tools/sion_debug --> Implements the DPRINTF interface for selective
                       printf-debugging, from the SIONlib package

Compilation:
============
1) Run spindle/configure from a build directory
2) Run 'make install' from that build directory

Usage:
======
spindle <job launch command>

e.g., spindle mpirun -n 128 mpi_hello_world