To install Nanos6 the following tools and libraries must be installed:
- automake, autoconf, libtool, make and a C and C++ compiler
- boost >= 1.59
- Finally, it is highly recommended to have an installation of Mercurium with OmpSs-2 support enabled. When installing OmpSs-2 for the first time, you can break the chicken-and-egg dependency between Nanos6 and Mercurium from either side: on one hand, you can install Nanos6 without specifying a valid installation of Mercurium. On the other hand, you can install Mercurium without a valid installation of Nanos6 using the
Optional libraries and tools
In addition to the build requirements, the following libraries and tools enable additional features:
- extrae to generate execution traces for offline performance analysis with paraver
- elfutils and libunwind to generate sample-based profiling
- graphviz and pdfjam or pdfjoin from TeX to generate graphical representations of the dependency graph
- parallel to generate the graph representation in parallel
- PAPI to generate statistics that include hardware counters
- CUDA to enable CUDA tasks
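Before configuring, it can help to check which of the command-line tools listed above are reachable on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper, not part of Nanos6, and it only covers tools that are plain commands (Boost and the other libraries need a separate check). On some distributions the `libtool` command is packaged separately from the library, so a miss there does not necessarily mean libtool is absent.

```shell
# missing_tools TOOL...: print each tool that cannot be found on the PATH.
# Hypothetical helper for illustration only.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Build requirements that are plain commands; nothing is printed
# when all of them are available.
missing_tools automake autoconf libtool make
```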
Nanos6 uses the standard GNU automake and libtool toolchain. When cloning from a repository, the build environment must be prepared with the following command:
$ autoreconf -f -i -v
When the code is distributed through a tarball, it usually does not need that command.
Then execute the following commands:
$ ./configure --prefix=INSTALLATION_PREFIX ...other options...
$ make all check
$ make install
INSTALLATION_PREFIX is the directory into which to install Nanos6.
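The whole build sequence can be wrapped in a small script. The sketch below makes two assumptions that are not in the original text: the presence of an executable `./configure` is used as a hint that the sources come from a tarball (so `autoreconf` is skipped), and `run` and `build_nanos6` are hypothetical helper names. Setting `DRY_RUN` only prints the commands.

```shell
# run CMD...: print a command and execute it unless DRY_RUN is set.
# Hypothetical helper for illustration only.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# build_nanos6 PREFIX: prepare, configure, build, check and install.
build_nanos6() {
    prefix="$1"
    # Assumption: a missing configure script means a repository checkout,
    # which needs the build environment to be generated first.
    [ -x ./configure ] || run autoreconf -f -i -v || return 1
    run ./configure --prefix="$prefix" || return 1
    run make all check || return 1
    run make install
}

# Usage, from the top of the source tree:
#   DRY_RUN=1 build_nanos6 "$HOME/nanos6"   # only show the commands
#   build_nanos6 "$HOME/nanos6"             # actually build and install
```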
The configure script accepts the following options:
--with-nanos6-mercurium=prefix to specify the prefix of the Mercurium installation
--with-boost=prefix to specify the prefix of the Boost installation
--with-libunwind=prefix to specify the prefix of the libunwind installation
--with-papi=prefix to specify the prefix of the PAPI installation
--with-libnuma=prefix to specify the prefix of the numactl installation
--with-extrae=prefix to specify the prefix of the extrae installation
--enable-cuda to enable support for CUDA tasks
The location of elfutils and hwloc is always retrieved through pkg-config.
The location of PAPI can also be retrieved through pkg-config if it is not specified through the --with-papi option.
If they are installed in non-standard locations, pkg-config can be told where to find them through the PKG_CONFIG_PATH environment variable. For instance:
$ export PKG_CONFIG_PATH=$HOME/installations-mn4/elfutils-0.169/lib/pkgconfig:/apps/HWLOC/2.0.0/INTEL/lib/pkgconfig:$PKG_CONFIG_PATH
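When PKG_CONFIG_PATH is initially empty, appending `:$PKG_CONFIG_PATH` leaves a trailing colon. A small helper avoids that; this is only a sketch, `prepend_pkg_config_path` is a hypothetical name, and the directories are the example paths from above rather than locations that exist on every system.

```shell
# prepend_pkg_config_path DIR: put DIR in front of PKG_CONFIG_PATH without
# leaving a stray ':' when the variable starts out empty or unset.
# Hypothetical helper for illustration only.
prepend_pkg_config_path() {
    PKG_CONFIG_PATH="$1${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    export PKG_CONFIG_PATH
}

# Example directories; replace them with the real pkgconfig locations.
prepend_pkg_config_path "/apps/HWLOC/2.0.0/INTEL/lib/pkgconfig"
prepend_pkg_config_path "$HOME/installations-mn4/elfutils-0.169/lib/pkgconfig"
```

The last directory added is searched first.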
After Nanos6 has been installed, it can be used by compiling your C, C++ and Fortran codes with Mercurium using the --ompss-2 flag. For instance:
$ mcc -c --ompss-2 a_part_in_c.c
$ mcxx -c --ompss-2 a_part_in_c_plus_plus.cxx
$ mcxx --ompss-2 a_part_in_c.o a_part_in_c_plus_plus.o -o app
Nanos6 applications can be executed as is.
The number of cores that are used is controlled by running the application through the taskset command. For instance:
$ taskset -c 0-2,4 ./app
This runs app on cores 0, 1, 2 and 4.
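To make the core list syntax concrete, the following sketch expands a list such as 0-2,4 into the individual core numbers. `expand_cpu_list` is a hypothetical helper written for this illustration; it handles only comma-separated numbers and simple ranges, not every form that taskset accepts.

```shell
# expand_cpu_list LIST: print the cores named by a list such as "0-2,4".
# Hypothetical helper; the body runs in a subshell so that the IFS change
# does not leak into the caller.
expand_cpu_list() (
    IFS=,
    result=""
    for range in $1; do
        first=${range%-*}   # "0-2" -> "0", "4" -> "4"
        last=${range#*-}    # "0-2" -> "2", "4" -> "4"
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            result="$result${result:+ }$cpu"
            cpu=$((cpu + 1))
        done
    done
    echo "$result"
)

expand_cpu_list 0-2,4   # prints: 0 1 2 4
```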
Tracing, debugging and other options
Nanos6 applications, unlike Nanos++ applications, do not require recompiling their code to generate extrae traces or additional information. This is instead controlled at run time through environment variables (envars from now on).
Generating extrae traces
To generate an extrae trace, run the application with the
NANOS6 envar set to
Currently there is an incompatibility when generating traces with PAPI. To solve it, define the following envar:
$ export NANOS6_EXTRAE_AS_THREADS=1
The resulting trace will show the activity of the actual threads instead of the activity at each CPU. In the future, this problem will be fixed.
Generating a graphical representation of the dependency graph
To generate the graph, run the application with the
NANOS6 envar set to
By default, the graph nodes include the full path of the source code.
To remove the directories, set the
NANOS6_GRAPH_SHORTEN_FILENAMES envar to
The resulting file is a PDF that contains several pages.
Each page represents the graph at a given point in time.
Setting the NANOS6_GRAPH_SHOW_DEAD_DEPENDENCIES envar to 1 forces future and previous dependencies to be shown with different graphical attributes.
If the NANOS6_GRAPH_DISPLAY envar is set to 1, the resulting PDF is opened automatically.
The default viewer is xdg-open, but it can be overridden through the
For best results, we suggest displaying the PDF in "single page" view, showing a full page, and advancing page by page.
To enable verbose logging, run the application with the
NANOS6 envar set to
By default, it generates a lot of information.
This is controlled by the NANOS6_VERBOSE envar, which can contain a comma-separated list of areas.
The areas are the following:
| Area | Description |
|---|---|
| DependenciesByAccess | Dependencies by their accesses |
| DependenciesByAccessLinks | Dependencies by the links between the accesses to the same data |
| DependenciesByGroup | Dependencies by groups of tasks that determine common predecessors and common successors |
| TaskStatus | Task status transitions |
| TaskWait | Entering and exiting taskwaits |
| ThreadManagement | Thread creation, activation and suspension |
| UserMutex | User-side mutexes (critical) |
Case is ignored, and the all keyword enables all of them.
Additionally, an area can have a ! prepended to it to disable it.
For instance, NANOS6_VERBOSE=AddTask,TaskExecution,TaskWait is a good starting point.
By default, the output is emitted to standard error, but it can be sent to a file by specifying it through the
NANOS6_VERBOSE_DUMP_ONLY_ON_EXIT can be set to 1 to delay the output until the end of the program, so that it does not get mixed with the output of the program.
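The area list can be assembled in the shell as sketched below. `join_areas` is a hypothetical helper, not part of Nanos6. Note that an interactive bash may treat an unquoted ! as history expansion, so single quotes are the safer choice when disabling an area.

```shell
# join_areas AREA...: join the arguments with commas.
# Hypothetical helper; the subshell body keeps the IFS change local.
join_areas() (
    IFS=,
    echo "$*"
)

NANOS6_VERBOSE=$(join_areas AddTask TaskExecution TaskWait)
export NANOS6_VERBOSE
echo "$NANOS6_VERBOSE"   # prints: AddTask,TaskExecution,TaskWait

# Everything except one area; the single quotes protect the '!'.
NANOS6_VERBOSE='all,!DependenciesByGroup'
```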
To enable sample-based profiling, run the application with the
NANOS6 envar set to
In this mode, the runtime records backtraces of the threads up to a given depth and with a given frequency. These parameters can be set through the following envars:
| Variable | Default value | Description |
|---|---|---|
| NANOS6_PROFILE_NS_RESOLUTION | 1000 | Sampling interval in nanoseconds |
| NANOS6_PROFILE_BACKTRACE_DEPTH | 4 | Number of stack frames to collect (excluding inlines) in each sample. |
| NANOS6_PROFILE_BUFFER_SIZE | 1000000000 | Number of sampling events to preallocate together in a chunk. The default value corresponds to 1 second of samples. |
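When choosing a sampling interval, it can be useful to translate it into a sampling rate. The sketch below does only that arithmetic; `samples_per_second` is a hypothetical helper, and the 10000 ns interval is just an example value.

```shell
# samples_per_second INTERVAL_NS: sampling rate implied by an interval
# given in nanoseconds. Hypothetical helper for illustration only.
samples_per_second() {
    echo $((1000000000 / $1))
}

samples_per_second 1000    # default interval; prints 1000000

# Sample ten times less often and collect deeper backtraces.
export NANOS6_PROFILE_NS_RESOLUTION=10000
export NANOS6_PROFILE_BACKTRACE_DEPTH=8
samples_per_second "$NANOS6_PROFILE_NS_RESOLUTION"   # prints 100000
```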
At the end of the execution, the runtime generates the following files, which contain entries sorted by decreasing frequency. Their first column contains the sample count, and the rest, the actual entry values. Their contents are the following:
- line-profile-PID.txt: Source code lines
- function-profile-PID.txt: Function names
- inline-profile-PID.txt: Function names and source code lines including inlines. Since the sampling is performed over the return addresses in the stack, if the compiler performs inlining, a given address can correspond to several functions. This file shows the number of samples that have the same associated source code lines.
- backtrace-profile-by-line-PID.txt: Function names and source code lines including inlines of a full backtrace. Shows the number of samples that have a full backtrace that corresponds to the same exact source code lines.
- backtrace-profile-by-address-PID.txt: Function names and source code lines including inlines of a full backtrace. Shows the number of samples that have a full backtrace with the same exact return addresses.
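Since the first column is the sample count, the share of each entry can be computed with a short awk script. This is a sketch under the assumption that columns are separated by whitespace; `top_entries` is a hypothetical helper, and the PID in the usage line is made up.

```shell
# top_entries FILE [N]: print the first N entries (default 10) of a profile
# file, each preceded by its percentage of all samples in the file.
# Hypothetical helper; assumes whitespace-separated columns.
top_entries() {
    awk -v n="${2:-10}" '
        { count[NR] = $1; line[NR] = $0; total += $1 }
        END {
            for (i = 1; i <= NR && i <= n; i++)
                printf "%5.1f%%  %s\n", 100 * count[i] / total, line[i]
        }' "$1"
}

# Usage (the PID 12345 is only an example):
#   top_entries line-profile-12345.txt 5
```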
When compiling, Mercurium performs transformations to the original source code.
At this time, Mercurium cannot preserve the original source code lines and function names.
Hence, the outputs of the profiler are based on the transformed code.
However, the transformed source code can be preserved by passing the -keep parameter to Mercurium.
Mercurium generates additional functions that wrap the task code.
These appear in the backtraces, and their names begin with nanos6_unpack_ followed by a number.
To enable collecting statistics, run the application with the
NANOS6 envar set to either
The first collects timing statistics and the second also records hardware counters.
By default, the statistics are emitted to standard error when the program ends.
The output can be sent to a file through the
The output contains the average for each task type and the total task average of the following metrics:
- Number of instances
- Mean instantiation time
- Mean pending time (not ready due to dependencies)
- Mean ready time
- Mean execution time
- Mean blocked time (due to a critical or a taskwait)
- Mean zombie time (finished but not yet destroyed)
- Mean lifetime (time between creation and destruction)
The output also contains information about:
- Number of CPUs
- Total number of threads
- Mean threads per CPU
- Mean tasks per thread
- Mean thread lifetime
- Mean thread running time
Most codes consist of an initialization phase, a calculation phase and a final phase for verification or writing the results. Usually these phases are separated by a taskwait. The runtime uses the taskwaits at the outermost level to identify phases and will emit individual metrics for each phase.
By default, the runtime is optimized for speed and will assume that the application code is correct.
Hence, it will not perform most validity checks.
To enable validity checks, run the application with the
NANOS6 envar set to
This will enable many internal validity checks that may be violated when the application code is incorrect.
In the future we may include a validation mode that will perform extensive application code validation.
To debug an application with a regular debugger, please compile its code with the regular debugging flags and also the
This flag will force Mercurium to dump the transformed code in the local file system, so that it will be available for the debugger.
To debug dependencies, it is advised to reduce the problem size so that very few tasks trigger the problem, and then let the runtime generate a graphical representation of the dependency graph as shown previously.
Setting the NANOS6 envar involves selecting, at run time, a runtime compiled for the corresponding instrumentation.
This part of the bootstrap is performed by a component of the runtime called the "loader".
To debug problems due to the installation, run the application with the NANOS6_LOADER_VERBOSE environment variable set to any value.
Information about the runtime may be obtained by running the application with the NANOS6_REPORT_PREFIX envar set, or by invoking the following command:
$ nanos6-info --runtime-details
Runtime path /opt/nanos6/lib/libnanos6-optimized.so.0.0.0
Runtime Version 2017-11-07 09:26:03 +0100 5cb1900
Runtime Branch master
Runtime Compiler Version g++ (Debian 7.2.0-12) 7.2.1 20171025
Runtime Compiler Flags -DNDEBUG -Wall -Wextra -Wdisabled-optimization -Wshadow -fvisibility=hidden -O3 -flto
Initial CPU List 0-3
NUMA Node 0 CPU List 0-3
Scheduler priority
Dependency Implementation linear-regions-fragmented
Threading Model pthreads
The NANOS6_REPORT_PREFIX envar may contain a string that will be prepended to each line.
For instance, it can contain a sequence that starts a comment in the output of the program.
$ NANOS6_REPORT_PREFIX="#" ./app
Some application output ...
# string version 2017-11-07 09:26:03 +0100 5cb1900 Runtime Version
# string branch master Runtime Branch
# string compiler_version g++ (Debian 7.2.0-12) 7.2.1 20171025 Runtime Compiler Version
# string compiler_flags -DNDEBUG -Wall -Wextra -Wdisabled-optimization -Wshadow -fvisibility=hidden -O3 -flto Runtime Compiler Flags
# string initial_cpu_list 0-3 Initial CPU List
# string numa_node_0_cpu_list 0-3 NUMA Node 0 CPU List
# string scheduler priority Scheduler
# string dependency_implementation linear-regions-fragmented Dependency Implementation
# string threading_model pthreads Threading Model
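Because every report line starts with the chosen prefix, the report can be separated from the application output with a simple filter. The sketch below assumes that the report and the application output arrive on the same stream, which the text above does not state; `report_lines` and `program_lines` are hypothetical helpers.

```shell
# report_lines PREFIX: keep only the lines that start with PREFIX.
report_lines() {
    awk -v p="$1" 'index($0, p) == 1'
}

# program_lines PREFIX: drop the lines that start with PREFIX.
program_lines() {
    awk -v p="$1" 'index($0, p) != 1'
}

# Usage (assumes both kinds of output reach the pipe):
#   NANOS6_REPORT_PREFIX="#" ./app 2>&1 | report_lines "#" > runtime-report.txt
#   NANOS6_REPORT_PREFIX="#" ./app 2>&1 | program_lines "#"
```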