FastMath - Fast Math Library for Delphi
FastMath is a Delphi math library that is optimized for fast performance (sometimes at the cost of not performing error checking or losing a little accuracy). It uses hand-optimized assembly code to achieve much better performance than the equivalent functions provided by the Delphi RTL.
This makes FastMath ideal for high-performance math-intensive applications such as multimedia applications and games. For even better performance, the library provides a variety of "approximate" functions (which all start with a Fast prefix). These can be very fast, but you will lose some (sometimes surprisingly little) accuracy. For gaming and animation, this loss in accuracy is usually perfectly acceptable and outweighed by the increase in speed. Don't use them for scientific calculations though.
You may want to call DisableFloatingPointExceptions at application startup to suppress any floating-point exceptions. Operations that cannot be performed will then return extreme values (like NaN or Infinity) instead of raising an exception. If you use FastMath in multiple threads, you should call DisableFloatingPointExceptions in the Execute block of those threads.
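The following is a minimal sketch of that startup pattern, not a prescribed setup. It assumes DisableFloatingPointExceptions takes no parameters and is declared in the Neslib.FastMath unit; TWorkerThread is a hypothetical class used only for illustration.

```pascal
program FloatExceptionsDemo;

{$APPTYPE CONSOLE}

uses
  System.Classes,
  Neslib.FastMath;

type
  { Hypothetical worker thread, used only for illustration. }
  TWorkerThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorkerThread.Execute;
begin
  { The floating-point exception mask is a per-thread setting, so every
    thread that uses FastMath disables exceptions for itself. }
  DisableFloatingPointExceptions;

  { ... FastMath calculations for this thread go here ... }
end;

var
  Worker: TWorkerThread;
begin
  { Main thread: disable floating-point exceptions once, at startup. }
  DisableFloatingPointExceptions;

  Worker := TWorkerThread.Create(False); // False = start running immediately
  try
    Worker.WaitFor;
  finally
    Worker.Free;
  end;
end.
```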
Table of Contents
- Superior Performance
- Architecture and Design Decisions
- Overloaded Operators
- Interoperability with the Delphi RTL
- Directory Organization
Superior Performance
Most operations can be performed on both single values (scalars) and vectors (consisting of 2, 3 or 4 values). SIMD-optimized assembly code is used to calculate multiple outputs at the same time. For example, adding two 4-value vectors together is almost as fast as adding two single values together, resulting in a 4-fold speed increase. Many functions are written in such a way that the performance is even better.
Here are some examples of speed up factors you can expect on different platforms:
| RTL | FastMath | | | | |
|---|---|---|---|---|---|
| TVector3D + TVector3D | TVector4 + TVector4 | 1.2x | 1.6x | 2.8x | 2.5x |
| Single * TVector3D | Single * TVector4 | 2.2x | 2.1x | 5.6x | 3.7x |
| TVector3D * TMatrix3D | TVector4 * TMatrix4 | 1.3x | 4.0x | 6.5x | 4.2x |
| TMatrix3D * TMatrix3D | TMatrix4 * TMatrix4 | 2.2x | 7.2x | 5.4x | 8.0x |
As you can see, some very common (3D) operations like matrix multiplication and inversion can be almost 10 times faster than their corresponding RTL versions. In addition, FastMath includes a number of Fast* approximation functions that sacrifice a little accuracy for an enormous speed increase. For example, using FastSinCos to calculate 4 sine and cosine functions in parallel can be up to 90 times faster than calling the RTL SinCos function 4 times, while still providing excellent accuracy for angles up to +/- 4000 radians (or +/- 230,000 degrees).
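As a rough sketch of the parallel call described above: the overload used here, FastSinCos(const ARadians: TVector4; out ASin, ACos: TVector4), is an assumption about the signature, so check the library documentation before relying on it. The X, Y, Z and W fields are set individually to avoid assuming a particular constructor.

```pascal
program FastSinCosDemo;

{$APPTYPE CONSOLE}

uses
  Neslib.FastMath;

var
  Angles, Sines, Cosines: TVector4;
begin
  { Four angles (in radians) packed into one vector. }
  Angles.X := 0.0;
  Angles.Y := 0.5;
  Angles.Z := 1.0;
  Angles.W := 1.5;

  { Assumed overload: one call computes approximate sines and cosines
    for all four angles. }
  FastSinCos(Angles, Sines, Cosines);

  WriteLn('sin: ', Sines.X:0:4, ' ', Sines.Y:0:4, ' ', Sines.Z:0:4, ' ', Sines.W:0:4);
  WriteLn('cos: ', Cosines.X:0:4, ' ', Cosines.Y:0:4, ' ', Cosines.Z:0:4, ' ', Cosines.W:0:4);
end.
```

Because these are approximations, comparing the printed values against the RTL SinCos results for your own angle range is a quick way to judge whether the accuracy is acceptable.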
On 32-bit and 64-bit desktop platforms (Windows and OS X), this performance is achieved by using the SSE2 instruction set. This means that the computer must support SSE2. Since SSE2 was introduced back in 2001, the vast majority of computers in use today will support it, and all 64-bit desktop computers have SSE2 support by default. If needed, you can always compile this library with the FM_NOSIMD define to disable SIMD optimization and use plain Pascal versions. This can also be useful to compare the speed of the Pascal versions with the SIMD-optimized versions.
On 32-bit mobile platforms (iOS and Android), the NEON instruction set is used for SIMD optimization. This means that your device needs to support NEON. But since Delphi already requires this, this poses no further restrictions.
On 64-bit mobile platforms (iOS), the Arm64/AArch64 SIMD instruction set is used.
There is no hardware accelerated support for the iOS simulator (it will use Pascal versions for all calculations).
Architecture and Design Decisions
FastMath operates on single-precision floating-point values only. Double-precision floating-point arithmetic is (currently) unsupported.
Most functions operate on single values (of type Single) and 2-, 3- and 4-dimensional vectors (of types TVector2, TVector3 and TVector4 respectively). Vectors are not only used to represent points or directions in space, but can also be regarded as arrays of 2, 3 or 4 values that can be used to perform calculations in parallel. In addition to floating-point vectors, there are also vectors that operate on integer values.
There is also support for 2x2, 3x3 and 4x4 matrices (called TMatrix2, TMatrix3 and TMatrix4). By default, matrices are stored in row-major order, like those in the RTL's System.Math.Vectors unit. However, you can change this layout with the FM_COLUMN_MAJOR define. This will store matrices in column-major order instead, which is useful for OpenGL applications (which work best with this layout). In addition, this define will also clip the depth of camera matrices to -1..1 instead of the default 0..1. Again, this is more in line with the default for OpenGL applications.
For representing rotations in 3D space, there is also a TQuaternion, which is similar to the RTL's TQuaternion3D type.
The operation of the library is somewhat inspired by shader languages (such as GLSL and HLSL). In those languages you can also treat single values and vectors similarly. For example, you can use the
Sin function to calculate a single sine value, but you can also use it with a
TVector4 type to calculate 4 sine values in one call. When combined with the approximate
Fast* functions, this can result in an enormous performance boost, as shown earlier.
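To make the shader-style usage concrete, here is a small sketch that calls Sin once with a Single and once with a TVector4. It assumes both overloads are visible through the Neslib.FastMath unit, as the paragraph above suggests; the variable names are illustrative only.

```pascal
program ShaderStyleDemo;

{$APPTYPE CONSOLE}

uses
  Neslib.FastMath;

var
  Angle, SineOfAngle: Single;
  Angles, Sines: TVector4;
begin
  { Scalar use: one input, one output. }
  Angle := 0.5;
  SineOfAngle := Sin(Angle);

  { Vector use: the same function name, four inputs, four outputs. }
  Angles.X := 0.0;
  Angles.Y := 0.5;
  Angles.Z := 1.0;
  Angles.W := 1.5;
  Sines := Sin(Angles);

  WriteLn('scalar: ', SineOfAngle:0:4);
  WriteLn('vector: ', Sines.X:0:4, ' ', Sines.Y:0:4, ' ', Sines.Z:0:4, ' ', Sines.W:0:4);
end.
```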
Overloaded Operators
All vector and matrix types support overloaded operators which allow you to negate, add, subtract, multiply and divide scalars, vectors and matrices.
There are also overloaded operators that compare vectors and matrices for equality. These operators check for exact matches (like Delphi's = operator). They don't allow for very small variations (like Delphi's SameValue functions).
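If a tolerance-based comparison is needed, one option is to compare components yourself. The sketch below uses the RTL's System.Math.SameValue; NearlyEqual is a hypothetical helper written for this example and is not part of FastMath.

```pascal
program VectorEqualityDemo;

{$APPTYPE CONSOLE}

uses
  System.Math,
  Neslib.FastMath;

{ Hypothetical helper (not part of FastMath): compares two vectors
  component by component, allowing a difference of up to AEpsilon. }
function NearlyEqual(const A, B: TVector4; const AEpsilon: Single): Boolean;
begin
  Result := System.Math.SameValue(A.X, B.X, AEpsilon)
        and System.Math.SameValue(A.Y, B.Y, AEpsilon)
        and System.Math.SameValue(A.Z, B.Z, AEpsilon)
        and System.Math.SameValue(A.W, B.W, AEpsilon);
end;

var
  A, B: TVector4;
begin
  A.X := 1;
  A.Y := 2;
  A.Z := 3;
  A.W := 4;

  { B differs from A by a tiny amount in one component. }
  B := A;
  B.X := B.X + 1e-6;

  WriteLn('Exact (=):       ', A = B);
  WriteLn('Within 1e-4:     ', NearlyEqual(A, B, 1e-4));
end.
```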
The arithmetic operators +, -, * and / usually work component-wise when applied to vectors. For example, if A and B are of type TVector4, then C := A * B will set C to (A.X * B.X, A.Y * B.Y, A.Z * B.Z, A.W * B.W). It will not perform a dot or cross product (you can use the Dot and Cross functions to compute those).
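The difference between the component-wise product and the dot and cross products can be sketched as follows. This example assumes a 3-dimensional type named TVector3 and assumes Dot and Cross are available as methods on it; if the library exposes them differently, adjust the two calls accordingly.

```pascal
program ComponentWiseDemo;

{$APPTYPE CONSOLE}

uses
  Neslib.FastMath;

var
  A, B, Product, CrossProduct: TVector3;
  DotProduct: Single;
begin
  A.X := 1;  A.Y := 2;  A.Z := 3;
  B.X := 4;  B.Y := 5;  B.Z := 6;

  { Component-wise: (1*4, 2*5, 3*6) = (4, 10, 18) }
  Product := A * B;

  { Assumed method. Dot product: 1*4 + 2*5 + 3*6 = 32 }
  DotProduct := A.Dot(B);

  { Assumed method. Cross product: (2*6 - 3*5, 3*4 - 1*6, 1*5 - 2*4) = (-3, 6, -3) }
  CrossProduct := A.Cross(B);

  WriteLn('A * B : ', Product.X:0:1, ' ', Product.Y:0:1, ' ', Product.Z:0:1);
  WriteLn('dot   : ', DotProduct:0:1);
  WriteLn('cross : ', CrossProduct.X:0:1, ' ', CrossProduct.Y:0:1, ' ', CrossProduct.Z:0:1);
end.
```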
For matrices, the + and - operators also operate component-wise. However, when multiplying (or dividing) matrices with vectors or other matrices, the usual linear algebraic multiplication (or division) is used. For example:
- M := M1 * M2 performs a linear algebraic matrix multiplication
- V := M1 * V1 performs a matrix * row vector linear algebraic multiplication
- V := V1 * M1 performs a column vector * matrix linear algebraic multiplication
To multiply matrices component-wise, you can use the CompMult method.
Interoperability with the Delphi RTL
FastMath provides its own vector and matrix types for superior performance. Most of them are equivalent in functionality and data storage to the Delphi RTL types. You can typecast between them or implicitly convert from the FastMath type to the RTL type or vice versa (e.g.
MyVector2 := MyPointF). The following table shows the mapping:
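The implicit conversion mentioned above (MyVector2 := MyPointF) might be used as in the sketch below. It assumes TVector2 is the FastMath counterpart of the RTL's TPointF, as that example suggests; the round trip is illustrative only.

```pascal
program InteropDemo;

{$APPTYPE CONSOLE}

uses
  System.Types,
  Neslib.FastMath;

var
  MyPointF: TPointF;
  MyVector2: TVector2;
begin
  MyPointF := TPointF.Create(1.5, 2.5);

  { RTL type -> FastMath type (implicit conversion). }
  MyVector2 := MyPointF;

  { Do the math with the FastMath type. }
  MyVector2 := MyVector2 + MyVector2;

  { FastMath type -> RTL type (implicit conversion). }
  MyPointF := MyVector2;

  WriteLn(MyPointF.X:0:2, ' ', MyPointF.Y:0:2);
end.
```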
Documentation can be found in the HTML Help file FastMath.chm in the Doc directory.
Alternatively, you can read the documentation online.
Directory Organization
The FastMath repository holds the following directories:
- Doc: documentation in HtmlHelp format. Also contains a spreadsheet (Benchmarks.xlsx) with results of performance tests on my devices (a Core i7 desktop and iPad3).
- DocSource: contains batch files for generating the documentation. You need PasDocEx if you want to generate the documentation yourself.
- FastMath: contains the main Neslib.FastMath unit as well as various include files with processor-specific optimizations, and static libraries with Arm-optimized versions for iOS and Android.
- Arm: Arm-specific source code and scripts.
- Arm32: contains the assembly source code for Arm Neon optimized functions.
- Arm64: contains the assembly source code for Arm64 optimized functions.
- fastmath-android: contains a batch file and helper files to build the static library for Android using the Android NDK.
- fastmath-ios: contains a macOS shell script to build a universal static library for iOS.
- Tests: contains a FireMonkey application that runs unit tests and performance tests.
FastMath is licensed under the Simplified BSD License. Some of its functions are based on other people's code licensed under the MIT, New BSD and ZLib licenses. Those licenses are as permissive as the Simplified BSD License used for the entire project.
See License.txt for details.