
C library for the emulation of reduced-precision floating point types


srcarroll/flexfloat

 
 


FlexFloat

FlexFloat is a C library for the emulation of reduced-precision floating point types.

Building FlexFloat

Required packages:

  • CMake 3.1 or higher
  • GCC 7.1 or higher

To build the library:

  • Create a directory (denoted as "<build_dir>" in this document) to hold the generated Makefiles, project files, object files and output binaries. For example: cd flexfloat && mkdir build

  • Run cmake [<optional configuration parameters>] <path to the FlexFloat source directory> from "<build_dir>". For example: cd build && cmake .. The optional configuration parameters are:

    • -DCMAKE_BUILD_TYPE=Release/Debug (default: Release) - The release mode compiles sources with "-O3 -DNDEBUG" flags, while the debug mode uses "-O0 -g3" flags.
    • -DBUILD_TESTS=ON/OFF (default: ON) - Enable unit testing of FlexFloat
    • -DBUILD_EXAMPLES=ON/OFF (default: ON) - Build usage examples
    • -DDISABLE_ROUNDING=ON/OFF (default: OFF) - Disable the library support for IEEE rounding modes (truncation is always applied)
    • -DSINGLE_BACKEND=ON/OFF (default: OFF) - Use single-precision type (float) as a backend type instead of double precision
    • -DQUAD_BACKEND=ON/OFF (default: OFF) - Use quad-precision type (_Float128) as a backend type instead of float or double precision
    • -DENABLE_FLAGS=ON/OFF (default: OFF) - Enable support for floating-point exception flags
    • -DENABLE_STATS=ON/OFF (default: OFF) - Enable collection of statistics
    • -DENABLE_TRACKING=ON/OFF (default: OFF) - Enable tracking of error accumulation on program variables
  • In the "<build_dir>" directory execute make

  • [optional] To run the library tests, execute make test (this requires the library to be configured with -DBUILD_TESTS=ON)

Base usage

To replace a floating-point type with a reduced-precision one, the native types used in the program must be replaced with flexfloat_t. Before its first use, each FlexFloat variable must be given exponent and mantissa bit-widths (two unsigned integers) by invoking ff_init (e.g., 5 bits for the exponent and 10 bits for the mantissa characterize the IEEE 754 half-precision format). Users can also (optionally) specify an initialization value expressed as a native C type using ff_init_float or ff_init_double. Since an initialization value might not be exactly representable in a target type with a lower number of bits, it is rounded to a representable value according to the current rounding mode.

The FlexFloat API includes a set of functions to perform arithmetic operations involving operands of the same floating-point type, such as ff_add and ff_mul. Arithmetic operations in the original source code must be replaced by function calls that implement equivalent functionality on top of the emulated types. See "flexfloat.h" for further details on the API.
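To make the rounding of an initialization value concrete, the following is a minimal sketch in plain C. It does not use FlexFloat and is not how the library is implemented: the helper name sketch_round_to_frac_bits is hypothetical, it only narrows the mantissa of a native float using round-to-nearest-even, and it ignores the exponent bit-width as well as NaN handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of FlexFloat): round a native float to
 * `frac_bits` mantissa bits using round-to-nearest-even.
 * Assumptions: IEEE 754 binary32 floats, frac_bits in [1, 23], finite input.
 * The exponent range of the target type is NOT emulated. */
float sketch_round_to_frac_bits(float x, unsigned frac_bits)
{
    assert(frac_bits >= 1 && frac_bits <= 23);

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float as raw bits */

    unsigned drop = 23u - frac_bits;     /* number of mantissa bits to discard */
    if (drop == 0)
        return x;                        /* nothing to round */

    uint32_t half = 1u << (drop - 1);    /* half of the last kept bit */
    uint32_t lsb = (bits >> drop) & 1u;  /* last kept bit, used to break ties */

    bits += half - 1u + lsb;             /* a carry may bump the exponent, which is correct */
    bits &= ~((1u << drop) - 1u);        /* clear the discarded bits */

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With 10 mantissa bits, as in the half-precision example above, sketch_round_to_frac_bits(0.1f, 10) yields 0.0999755859375, the nearest value that fits in 10 mantissa bits.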

The C++ wrapper provides a generic floating-point type by defining a template class (flexfloat<e,m>, where e and m are the exponent and mantissa bit-widths) and a set of auxiliary functions (useful for debugging and collecting statistics). Users only need to replace the original variable declarations with instantiations of this template class. No other part of the program needs modification, since the class overloads the arithmetic operators. See "flexfloat.hpp" for further details on the class methods.

Examples for the C API and for the C++ wrapper are provided in the "examples" folder (after building, the executables are available in "<build_dir>/examples").

Advanced features

FlexFloat includes basic support for floating-point exception flags. If enabled (see -DENABLE_FLAGS above), operations on FlexFloat variables raise floating-point exception flags within the floating-point environment. One notable limitation of this feature is that, once a divide-by-zero condition has been detected (FE_DIVBYZERO), the overflow exception (FE_OVERFLOW) is no longer raised on infinity results. Clearing the flags with feclearexcept(FE_ALL_EXCEPT) restores correct overflow detection until the next division by zero.

FlexFloat can collect a complete set of execution statistics related to FP types. The library API includes functions to start, stop and reset the collection of statistics, namely ff_start_stats, ff_stop_stats and ff_clear_stats. A report is generated by calling ff_print_stats, and it includes the number of arithmetic operations (grouped by operator name) and the number of casts (grouped by source and destination type pairs). This makes it possible to evaluate the overhead of the casts introduced in a transprecision scenario, where the type of each computation can be assigned at a very fine-grained level. An example of this feature can be found at "examples/example_stats.c".
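The start/stop/clear/print workflow can be illustrated with a small stand-alone sketch. Everything below is hypothetical and uses the prefix sketch_ to avoid any confusion with the real API: it counts operations on native doubles per operator name while collection is active, and it leaves out the per-type-pair cast counters that the library report also contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for operation statistics (not the FlexFloat API). */
enum sketch_op { SKETCH_ADD, SKETCH_MUL, SKETCH_OP_COUNT };

static const char *const sketch_op_names[SKETCH_OP_COUNT] = { "add", "mul" };
static unsigned long sketch_counts[SKETCH_OP_COUNT];
static bool sketch_collecting = false;

void sketch_start_stats(void) { sketch_collecting = true; }
void sketch_stop_stats(void)  { sketch_collecting = false; }

/* Reset every counter to zero without changing the collection state. */
void sketch_clear_stats(void)
{
    for (int op = 0; op < SKETCH_OP_COUNT; ++op)
        sketch_counts[op] = 0;
}

/* Read one counter. */
unsigned long sketch_count(enum sketch_op op)
{
    assert(op >= 0 && op < SKETCH_OP_COUNT);
    return sketch_counts[op];
}

/* Instrumented operations: counted only while collection is active. */
double sketch_add(double a, double b)
{
    if (sketch_collecting)
        ++sketch_counts[SKETCH_ADD];
    return a + b;
}

double sketch_mul(double a, double b)
{
    if (sketch_collecting)
        ++sketch_counts[SKETCH_MUL];
    return a * b;
}

/* Print one line per operator name. */
void sketch_print_stats(void)
{
    for (int op = 0; op < SKETCH_OP_COUNT; ++op)
        printf("%s: %lu\n", sketch_op_names[op], sketch_counts[op]);
}
```

Operations executed before sketch_start_stats or after sketch_stop_stats are not counted, which is the property that lets a region of interest be measured in isolation.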

FlexFloat also provides an advanced feature to keep track of error accumulation. With this feature enabled, library adopters (programmers or automatic tools) can retrieve the exact value of a computation stored in a variable (calling ff_track_get_exact) or its current error w.r.t. the exact value (calling ff_track_get_error) at any point of the program. In addition, users can attach a callback to a program variable (calling ff_track_callback), that is, a function invoked at every update of the variable. This feature serves several purposes: for instance, it can be used to find which internal expression has the most impact on the result quality, or to study the evolution of the error over time. An example of variable tracking can be found at "examples/example_tracking.c".
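The idea behind error tracking can be sketched as a variable that carries a higher-precision shadow value next to its reduced-precision value. The code below is an illustration under stated assumptions, not the library's implementation: all sketch_ names are hypothetical, a native float stands in for the reduced-precision type, and a double stands in for the "exact" value (which is itself only an approximation).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracked variable (not the FlexFloat API). */
typedef struct sketch_tracked sketch_tracked_t;
typedef void (*sketch_callback_t)(const sketch_tracked_t *var, void *user);

struct sketch_tracked {
    float value;                 /* stand-in for the reduced-precision value */
    double exact;                /* shadow value kept in higher precision */
    sketch_callback_t callback;  /* invoked on every update, may be NULL */
    void *user;                  /* opaque argument passed to the callback */
};

void sketch_init(sketch_tracked_t *var, double x)
{
    assert(var != NULL);
    var->value = (float)x;       /* rounding happens here */
    var->exact = x;
    var->callback = NULL;
    var->user = NULL;
}

void sketch_set_callback(sketch_tracked_t *var, sketch_callback_t cb, void *user)
{
    var->callback = cb;
    var->user = user;
}

/* dest = a + b, computed both in reduced precision and on the shadow values.
 * dest must have been initialized; it may alias a or b. */
void sketch_add(sketch_tracked_t *dest, const sketch_tracked_t *a,
                const sketch_tracked_t *b)
{
    float value = a->value + b->value;
    double exact = a->exact + b->exact;
    dest->value = value;
    dest->exact = exact;
    if (dest->callback != NULL)
        dest->callback(dest, dest->user);
}

double sketch_get_exact(const sketch_tracked_t *var) { return var->exact; }

double sketch_get_error(const sketch_tracked_t *var)
{
    return (double)var->value - var->exact;
}

/* Example callback: counts updates through a user-supplied unsigned counter. */
void sketch_count_updates(const sketch_tracked_t *var, void *user)
{
    (void)var;
    ++*(unsigned *)user;
}
```

Attaching sketch_count_updates to a result variable and then calling sketch_add on it increments the counter once per update, while sketch_get_error reports how far the float result has drifted from the shadow value.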

