MathPresso's Shading Language with JIT Engine for C++.
- Official Repository (kobalicek/mpsl)
- Official Blog (asmbits)
- Official Chat (gitter)
- Permissive ZLIB license
This is a WORK-IN-PROGRESS that is far from complete. MPSL is a challenging project and I work on it mostly when I'm tired of other projects. Contact me if you find MPSL interesting and want to collaborate on its development - people joining this project are very welcome.
What is implemented:
- MPSL APIs (public) for embedding
- AST definition and source code to AST conversion (parser)
- AST-based semantic code analysis (happens after parsing)
- AST-based optimizations (constant folding and dead code elimination)
- IR concept and initial support for AST to IR mapping
What is a work-in-progress:
- AST-To-IR translation is only basic for now (doesn't implement control-flow and many operators)
- IR-based optimizations are not implemented yet
- IR-To-ASM translation is very basic and buggy
- IR is not in SSA form yet; this is still a subject to be researched
MPSL is a lightweight shader-like programming language written in C++. Its name is based on a sister project called MathPresso, which provided the basic idea and some building blocks. MPSL has been designed to be a safe programming language that can access the CPU's SIMD capabilities through shader-like programs compiled at runtime. The language is statically typed and allows the use of variables up to 256 bits wide that map directly to the CPU's SIMD registers (SSE, AVX, NEON, ...).
MPSL has been designed to be lightweight and embeddable - it doesn't depend on huge libraries like LLVM, it only uses a very lightweight library called AsmJit as a JIT backend. It implements its own abstract syntax tree (AST) and intermediate representation (IR) of the input program, and then uses an IRToAsm translator to convert IR to machine code.
Check out the working mp_tutorial.cpp to see how the MPSL APIs are designed and how the MPSL engine is embedded and used within an application.
MPSL is a statically typed language where variables have to be declared before they are used, like in any other C-like language. Variables can be declared anywhere and can shadow other variables and/or function arguments.
Available types:
- `int` - 32-bit signed integer
- `float` - 32-bit (single-precision) floating point
- `double` - 64-bit (double-precision) floating point
- `bool` - 32-bit boolean type (implicitly castable to any type)
- `__qbool` - 64-bit boolean type (implicitly castable to any type)
- All types can form a vector of up to 4 elements, like `int2`, `float3`, and `double4`
- 32-bit types can additionally form a vector of 8 elements, like `bool8`, `int8`, and `float8`
Constants are parsed in the following way:
- If the number has neither a fraction part nor an exponent part and is in the range [-2147483648, 2147483647], it's parsed as a 32-bit integer; otherwise it's parsed as `double`
- If the number has an "f" suffix it's parsed as `float`
- If the number has a "d" suffix it's parsed as `double`
- TODO: MPSL should allow customizing which floating-point type is preferred (`float` or `double`).
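The parsing rules above can be sketched as a small standalone C++ function. This is only an illustration: `LiteralType` and `classifyLiteral` are hypothetical names that are not part of MPSL, the input is assumed to be a well-formed decimal literal, and giving the suffix precedence over the other rules is an assumption of this sketch.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical names for illustration; this is not MPSL's parser.
enum class LiteralType { Int, Float, Double };

// Classifies a well-formed decimal literal following the rules listed above.
LiteralType classifyLiteral(const std::string& text) {
  if (text.empty())
    return LiteralType::Double; // Assumption: callers never pass empty input.

  // An explicit suffix decides the type directly (assumed to take precedence).
  char last = text.back();
  if (last == 'f') return LiteralType::Float;
  if (last == 'd') return LiteralType::Double;

  // A fraction or exponent part makes the literal a double.
  if (text.find_first_of(".eE") != std::string::npos)
    return LiteralType::Double;

  // Plain integers stay `int` only if they fit into 32 bits.
  errno = 0;
  long long value = std::strtoll(text.c_str(), nullptr, 10);
  if (errno == ERANGE || value < -2147483648LL || value > 2147483647LL)
    return LiteralType::Double;
  return LiteralType::Int;
}
```

Under these assumptions `2147483647` stays an integer, while `2147483648` no longer fits into 32 bits and is classified as `double`.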
Variables have to be declared before they are used:
- `[const] type var [= value];`
- Variables declared as `const` have to be assigned immediately and cannot be changed
- Expressions like `int x = x;` are not allowed (unlike C), unless `x` shadows another `x` from an outer scope
- TODO: Reading a variable that was not assigned yet is undefined, should be defined
Typedef (aka type alias):
- `typedef type newtype`
If-Else:
- `if (cond) { taken-body; } [else { not-taken-body; }]`
Implicit and explicit casts:
- `(type)expression` is an explicit cast
- Integer and float can implicitly cast to double
- Boolean types can implicitly or explicitly cast to any 32-bit or 64-bit type, converting a `true` value to `1` and a `false` value to `0`
- A scalar is implicitly promoted to a vector if used with a vector type
Loops:
- `for (init; cond; iter) { body; }`
- `do { body } while(cond);`
- `while(cond) { body; }`
Functions:
- `ret-type func([arg-type arg-name [, ...]]) { body; }`
- Functions can call other functions, but they can't recurse
- Each MPSL program contains a `main() { ... }` entry-point
Arithmetic operators:
- `-(x)` - negate
- `x + y` - add
- `x - y` - subtract
- `x * y` - multiply
- `x / y` - divide
- `x % y` - modulo
Bitwise and shift operators and intrinsics:
- `~(x)` - bitwise NOT
- `x & y` - bitwise AND
- `x | y` - bitwise OR
- `x ^ y` - bitwise XOR
- `x >> y` - shift arithmetic right
- `x >>> y` - shift logical right
- `x << y` - shift logical left
- `ror(x, y)` - rotate right
- `rol(x, y)` - rotate left
- `lzcnt(x)` - count leading zeros
- `popcnt(x)` - population count (count of bits set to `1`)
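The difference between the two right shifts, and the meaning of the rotate and bit-counting intrinsics, can be illustrated with plain C++ on a 32-bit integer. The helper names below are hypothetical, and masking the shift count to the range 0..31 is an assumption of this sketch rather than documented MPSL behavior.

```cpp
#include <cstdint>

// `x >>> y`: zeros are shifted in from the left.
int32_t shiftLogicalRight(int32_t x, int32_t y) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) >> (y & 31));
}

// `x >> y`: the sign bit is replicated into the vacated positions.
int32_t shiftArithmeticRight(int32_t x, int32_t y) {
  uint32_t n = static_cast<uint32_t>(y) & 31u;
  uint32_t u = static_cast<uint32_t>(x) >> n;
  if (x < 0 && n != 0) u |= ~(0xFFFFFFFFu >> n);
  return static_cast<int32_t>(u);
}

// `ror(x, y)`: bits leaving on the right re-enter on the left.
int32_t rotateRight(int32_t x, int32_t y) {
  uint32_t u = static_cast<uint32_t>(x);
  uint32_t n = static_cast<uint32_t>(y) & 31u;
  return static_cast<int32_t>((u >> n) | (u << ((32u - n) & 31u)));
}

// `rol(x, y)`: bits leaving on the left re-enter on the right.
int32_t rotateLeft(int32_t x, int32_t y) {
  uint32_t u = static_cast<uint32_t>(x);
  uint32_t n = static_cast<uint32_t>(y) & 31u;
  return static_cast<int32_t>((u << n) | (u >> ((32u - n) & 31u)));
}

// `lzcnt(x)`: number of zero bits above the highest set bit (32 for zero).
int32_t countLeadingZeros(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  int32_t count = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (u & mask) == 0; mask >>= 1)
    count++;
  return count;
}

// `popcnt(x)`: number of bits set to 1.
int32_t populationCount(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  int32_t count = 0;
  for (; u != 0; u &= u - 1) // Clears the lowest set bit each iteration.
    count++;
  return count;
}
```

For a negative input such as `-8`, the arithmetic shift by one yields `-4`, while the logical shift yields the large positive value `0x7FFFFFFC`.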
Logical operators:
- `!(x)` - logical NOT
- `x && y` - logical AND
- `x || y` - logical OR
Comparison operators:
- `x == y` - check if equal
- `x != y` - check if not equal
- `x > y` - check if greater than
- `x >= y` - check if greater than or equal
- `x < y` - check if less than
- `x <= y` - check if less than or equal
Assignment operators:
- `++x` - pre-increment
- `--x` - pre-decrement
- `x++` - post-increment
- `x--` - post-decrement
- `x = y` - assignment
- `x += y` - add with assignment
- `x -= y` - subtract with assignment
- `x *= y` - multiply with assignment
- `x /= y` - divide with assignment
- `x %= y` - modulo with assignment
- `x &= y` - bitwise AND with assignment
- `x |= y` - bitwise OR with assignment
- `x ^= y` - bitwise XOR with assignment
- `x >>= y` - shift arithmetic right with assignment
- `x >>>= y` - shift logical right with assignment
- `x <<= y` - shift logical left with assignment
Built-in intrinsics for special number handling:
- `isnan(x)` - check for NaN
- `isinf(x)` - check for infinity
- `isfinite(x)` - check for finite number
- `signbit(x)` - get a sign bit
- `copysign(x, y)` - copy sign
Built-in intrinsics for floating-point rounding:
- `round(x)` - round to nearest integer
- `roundeven(x)` - round to even integer
- `trunc(x)` - round towards zero (truncate)
- `floor(x)` - round down (floor)
- `ceil(x)` - round up (ceil)
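The rounding modes differ mainly for negative inputs and for ties such as `2.5`. A minimal sketch using the standard `<cmath>` functions is shown below; the wrapper names are hypothetical, and it is an assumption that MPSL's intrinsics agree with these functions in every edge case. `round(x)` is left out because the list above does not say how it resolves ties.

```cpp
#include <cmath>

// Models `roundeven(x)`. `std::nearbyint` honors the current rounding mode,
// which defaults to round-to-nearest with ties going to the even integer.
double roundEven(double x) { return std::nearbyint(x); }

// Models `trunc(x)`: drops the fraction, rounding towards zero.
double roundTowardsZero(double x) { return std::trunc(x); }

// Models `floor(x)`: rounds towards negative infinity.
double roundDown(double x) { return std::floor(x); }

// Models `ceil(x)`: rounds towards positive infinity.
double roundUp(double x) { return std::ceil(x); }
```

With the default rounding mode, `roundEven(2.5)` gives `2` and `roundEven(3.5)` gives `4`, whereas `roundTowardsZero(-1.7)` gives `-1` and `roundDown(-1.2)` gives `-2`.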
Built-in intrinsics that map easily to CPU instructions:
- `abs(x)` - absolute value
- `frac(x)` - extract fraction
- `sqrt(x)` - square root
- `min(x, y)` - minimum value
- `max(x, y)` - maximum value
Other built-in intrinsics:
- `exp(x)` - exponential
- `log(x)` - logarithm of base E
- `log2(x)` - logarithm of base 2
- `log10(x)` - logarithm of base 10
- `pow(x, y)` - power
- `sin(x)` - sine
- `cos(x)` - cosine
- `tan(x)` - tangent
- `asin(x)` - arcsine
- `acos(x)` - arccosine
- `atan(x)` and `atan2(x, y)` - arctangent
Built-in DSP intrinsics (`int` and `int2..8` only):
- `vabsb(x)` - absolute value of packed bytes
- `vabsw(x)` - absolute value of packed words
- `vabsd(x)` - absolute value of packed dwords
- `vaddb(x, y)` - add packed bytes
- `vaddw(x, y)` - add packed words
- `vaddd(x, y)` - add packed dwords
- `vaddq(x, y)` - add packed qwords
- `vaddssb(x, y)` - add packed bytes with signed saturation
- `vaddusb(x, y)` - add packed bytes with unsigned saturation
- `vaddssw(x, y)` - add packed words with signed saturation
- `vaddusw(x, y)` - add packed words with unsigned saturation
- `vsubb(x, y)` - subtract packed bytes
- `vsubw(x, y)` - subtract packed words
- `vsubd(x, y)` - subtract packed dwords
- `vsubq(x, y)` - subtract packed qwords
- `vsubssb(x, y)` - subtract packed bytes with signed saturation
- `vsubusb(x, y)` - subtract packed bytes with unsigned saturation
- `vsubssw(x, y)` - subtract packed words with signed saturation
- `vsubusw(x, y)` - subtract packed words with unsigned saturation
- `vmulw(x, y)` - multiply packed words (signed or unsigned)
- `vmulhsw(x, y)` - multiply packed words and store high word of a signed result
- `vmulhuw(x, y)` - multiply packed words and store high word of an unsigned result
- `vmuld(x, y)` - multiply packed dwords (signed or unsigned)
- `vminsb(x, y)` - minimum of packed bytes (signed)
- `vminub(x, y)` - minimum of packed bytes (unsigned)
- `vminsw(x, y)` - minimum of packed words (signed)
- `vminuw(x, y)` - minimum of packed words (unsigned)
- `vminsd(x, y)` - minimum of packed dwords (signed)
- `vminud(x, y)` - minimum of packed dwords (unsigned)
- `vmaxsb(x, y)` - maximum of packed bytes (signed)
- `vmaxub(x, y)` - maximum of packed bytes (unsigned)
- `vmaxsw(x, y)` - maximum of packed words (signed)
- `vmaxuw(x, y)` - maximum of packed words (unsigned)
- `vmaxsd(x, y)` - maximum of packed dwords (signed)
- `vmaxud(x, y)` - maximum of packed dwords (unsigned)
- `vsllw(x, y)` - shift left logical of packed words by scalar `y`
- `vsrlw(x, y)` - shift right logical of packed words by scalar `y`
- `vsraw(x, y)` - shift right arithmetic of packed words by scalar `y`
- `vslld(x, y)` - shift left logical of packed dwords by scalar `y`
- `vsrld(x, y)` - shift right logical of packed dwords by scalar `y`
- `vsrad(x, y)` - shift right arithmetic of packed dwords by scalar `y`
- `vsllq(x, y)` - shift left logical of packed qwords by scalar `y`
- `vsrlq(x, y)` - shift right logical of packed qwords by scalar `y`
- `vcmpeqb(x, y)` - compare packed bytes (signed) if equal
- `vcmpeqw(x, y)` - compare packed words (signed) if equal
- `vcmpeqd(x, y)` - compare packed dwords (signed) if equal
- `vcmpgtb(x, y)` - compare packed bytes (signed) if greater than
- `vcmpgtw(x, y)` - compare packed words (signed) if greater than
- `vcmpgtd(x, y)` - compare packed dwords (signed) if greater than
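To make "packed" and "saturation" concrete, the sketch below models `vaddusb` and `vaddssb` in scalar C++ on a single 32-bit `int` viewed as four packed bytes. The function names are hypothetical and the code only illustrates the arithmetic; the real intrinsics are expected to map to SIMD instructions instead of a loop.

```cpp
#include <cstdint>

// Scalar model of `vaddusb`: each byte lane is added independently and
// clamped to the unsigned range 0..255 instead of wrapping around.
int32_t addPackedBytesUnsignedSat(int32_t x, int32_t y) {
  uint32_t a = static_cast<uint32_t>(x);
  uint32_t b = static_cast<uint32_t>(y);
  uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    uint32_t sum = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu);
    if (sum > 0xFFu) sum = 0xFFu;
    out |= sum << shift;
  }
  return static_cast<int32_t>(out);
}

// Scalar model of `vaddssb`: each byte lane is treated as a signed value
// and clamped to the range -128..127.
int32_t addPackedBytesSignedSat(int32_t x, int32_t y) {
  uint32_t a = static_cast<uint32_t>(x);
  uint32_t b = static_cast<uint32_t>(y);
  uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    int lhs = static_cast<int8_t>((a >> shift) & 0xFFu);
    int rhs = static_cast<int8_t>((b >> shift) & 0xFFu);
    int sum = lhs + rhs;
    if (sum > 127) sum = 127;
    if (sum < -128) sum = -128;
    out |= (static_cast<uint32_t>(sum) & 0xFFu) << shift;
  }
  return static_cast<int32_t>(out);
}
```

The same input bytes give different results depending on signedness: the lane `0xFF + 0x01` saturates to `0xFF` when unsigned, but is `-1 + 1 = 0` when signed.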
Built-in special constants:
- `INF` - infinity
- `NaN` - not a number
Built-in math constants from C's math.h:
- `M_E = 2.71828182845904523536` - Euler's number
- `M_LOG2E = 1.44269504088896340736` - log2(e)
- `M_LOG10E = 0.434294481903251827651` - log10(e)
- `M_LN2 = 0.693147180559945309417` - ln(2)
- `M_LN10 = 2.30258509299404568402` - ln(10)
- `M_PI = 3.14159265358979323846` - PI
- `M_PI_2 = 1.57079632679489661923` - PI/2
- `M_PI_4 = 0.785398163397448309616` - PI/4
- `M_1_PI = 0.318309886183790671538` - 1/PI
- `M_2_PI = 0.636619772367581343076` - 2/PI
- `M_2_SQRTPI = 1.1283791670955125739` - 2/sqrt(pi)
- `M_SQRT2 = 1.4142135623730950488` - sqrt(2)
- `M_SQRT1_2 = 0.707106781186547524401` - 1/sqrt(2)
A typical MPSL program has an entry-point called `main()` and uses input and output variables provided by the embedder. If `a` and `b` variables of type `float4` are provided, then we can write a simple shader that returns their sum:
// Shader's entry-point - can have a return value (if the embedder defines it), but has no arguments.
float4 main() {
  // In case the embedder provides two arguments `a` and `b` of `float4` type.
  return a + b;
}
Where `a`, `b`, and a hidden return variable are provided by the embedder (including their types). This means that the same MPSL program is able to use different data layouts and a different number of data arguments passed to the compiled shader.
MPSL is written in C++ and provides C++ APIs for embedders. Here is a summary of MPSL's design choices:
- MPSL is written in C++ and exposes a simple C++ interface for embedders. It's possible to wrap it in a pure C interface, but that is not planned to be part of the MPSL project at the moment.
- MPSL uses error codes, not exceptions, and guarantees that every failure is propagated to the embedder as an error code. MPSL never aborts on an out-of-memory condition, never throws, and properly cleans up all resources in case of an error.
- MPSL has its own pooled memory allocator, which uses the OS allocator to allocate larger blocks of memory that are then split into smaller chunks and pools. This keeps allocation fast and limits memory fragmentation.
- MPSL allows embedders to specify the data layout of their own structures, which means that embedders generally don't have to change their data structures to use MPSL.
To use MPSL from your C++ code you must first include `mpsl/mpsl.h` to make all public APIs available within the `mpsl` namespace. The following concepts are provided:
- `mpsl::Context` - This is the shader's environment. A program cannot be created without having a `Context` associated. `Context` also manages virtual memory that is used by shaders.
- `mpsl::Program[1-4]<>` - Program represents a compiled MPSL shader. The `[1-4]` is the number of data arguments passed to the shader. Here a data argument doesn't represent a variable used in a shader; it represents a "pointer" passed to the shader, where each pointer can contain variables the shader has access to.
- `mpsl::Layout` - A layout of data arguments passed to the shader. Each data argument (sometimes called a slot) has its own `Layout` definition.
- `mpsl::LayoutTmp<N>` - A layout that uses `N` bytes of stack before it allocates dynamic memory. Since `Layout` instances are short-lived it's beneficial to allocate them on the stack.
- `mpsl::OutputLog` - An interface that can be used to catch messages produced by the MPSL parser, analyzer, optimizer, and JIT compiler.
- `mpsl::StringRef` - A string reference, which may be used to specify a string and its length. You can pass non-NULL terminated strings to all MPSL APIs.
- `mpsl::Int[2-4]` - `int` vectors, which map to MPSL's `int[2-4]` types.
- `mpsl::Float[2-4]` - `float` vectors, which map to MPSL's `float[2-4]` types.
- `mpsl::Double[2-4]` - `double` vectors, which map to MPSL's `double[2-4]` types.
- `mpsl::Error` - An error type (typedef to `uint32_t`) that is returned from MPSL APIs to report the result status.
Check out mpsl.h for more details about class member functions and public enumerations that can be used by embedders.
Below is a minimal example that creates a simple data layout and compiles a very simple shader:
#include <stdio.h>
#include <mpsl/mpsl.h>

// Data structure that will be passed to the program.
struct Data {
  double a, b;
  float c;
  double result;
};

int main(int argc, char* argv[]) {
  // Create the shader environment.
  mpsl::Context context = mpsl::Context::create();

  // Create the `Data` layout and register all variables that the shader has
  // access to. Special variables like `@ret` always have the `@` prefix to
  // prevent a collision with valid identifiers.
  mpsl::LayoutTmp<> layout;
  layout.addMember("a"   , mpsl::kTypeDouble | mpsl::kTypeRO, MPSL_OFFSET_OF(Data, a));
  layout.addMember("b"   , mpsl::kTypeDouble | mpsl::kTypeRO, MPSL_OFFSET_OF(Data, b));
  layout.addMember("c"   , mpsl::kTypeFloat  | mpsl::kTypeRO, MPSL_OFFSET_OF(Data, c));
  layout.addMember("@ret", mpsl::kTypeDouble | mpsl::kTypeWO, MPSL_OFFSET_OF(Data, result));

  // This is our shader-program that will be compiled by MPSL.
  const char body[] = "double main() { return sqrt(a * b) * c; }";

  // Create the program object and try to compile it. The `Program1<Data>` means
  // that the program accepts one data argument of type `Data`. You can just use
  // `Program1<>` if you want the argument untyped (void).
  mpsl::Program1<Data> program;
  mpsl::Error err = program.compile(context, body, mpsl::kNoOptions, layout);

  if (err) {
    printf("Compilation failed: ERROR 0x%08X\n", static_cast<unsigned int>(err));
  }
  else {
    Data data;
    data.a = 4.0;
    data.b = 16.0;
    data.c = 0.5f;

    err = program.run(&data);
    if (err)
      printf("Execution failed: ERROR 0x%08X\n", static_cast<unsigned int>(err));
    else
      printf("Return=%g\n", data.result);
  }

  // RAII - `Context` and `Program` are ref-counted and will
  // be automatically destroyed when they go out of scope.
  return 0;
}
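To show what an offset-based layout amounts to, here is a hand-written C++ equivalent of the shader from the example, one that reads and writes the `Data` members purely through byte offsets. It does not use MPSL at all: `loadAt`, `storeAt`, and `runByHand` are hypothetical helpers, and the standard `offsetof` macro stands in for `MPSL_OFFSET_OF` on the assumption that both yield a member's byte offset.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>

// Same structure as in the example above.
struct Data {
  double a, b;
  float c;
  double result;
};

// Reads a value of type T located `offset` bytes past `base`.
template<typename T>
T loadAt(const void* base, std::size_t offset) {
  T value;
  std::memcpy(&value, static_cast<const char*>(base) + offset, sizeof(T));
  return value;
}

// Writes a value of type T at `offset` bytes past `base`.
template<typename T>
void storeAt(void* base, std::size_t offset, T value) {
  std::memcpy(static_cast<char*>(base) + offset, &value, sizeof(T));
}

// Hand-written equivalent of "double main() { return sqrt(a * b) * c; }".
// It knows nothing about `Data` except the offsets and types of its members.
void runByHand(void* data) {
  double a = loadAt<double>(data, offsetof(Data, a));
  double b = loadAt<double>(data, offsetof(Data, b));
  float c = loadAt<float>(data, offsetof(Data, c));
  storeAt<double>(data, offsetof(Data, result), std::sqrt(a * b) * c);
}
```

For the inputs used in the example (`a = 4.0`, `b = 16.0`, `c = 0.5f`) this hand-written version stores `4` into `result`. Because only names, types, and offsets are involved, the embedder's structure can keep whatever member order and padding it already has.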
More documentation will come in the future.
- AsmJit - 1.0 or later.
Please consider a donation if you use the project and would like to keep it active in the future.
- Petr Kobalicek kobalicek.petr@gmail.com