Skip to content
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
773 lines (578 sloc) 38.7 KB

Reverse-mode automatic differentiation with Circle and Apex

struct terms_t {
  double x;
  double y;

terms_t my_grad(terms_t input) {
  return apex::autodiff_grad("sq(x / y) * sin(x * y)", input);

This example shows how to leverage the Apex DSL library to implement reverse-mode automatic differentation. The expression to differentiate is passed in as a compile-time string, and the primary inputs are provided as a vector of strings.

This illustrates shared object library development. There are three things you can count on:

  1. No template metaprogramming.
  2. No operator overloading.
  3. No performance compromise.

Do all your development in an ordinary C++/Circle shared object project. Call into this shared object library during source translation and capture the returned IR. Lower the IR to code using Circle macros. This is a new way forward for DSLs in C++.

Expression templates

In Standard C++, code must either be included in textual form or compiled to binary and linked with the program. Only the former form is generic--template libraries may be specialized to solve an application-specific problem.

EDSLs have been attempted by the C++ community for almost twenty years. In Standard C++, the expression template idiom is used to repurpose the C++ syntax as a domain-specific language. Operator overloading and template metaprogramming are combined to capture the result of a subexpression as a type. For example, if either terminal in the subexpression a + b is an EDSL type, the result object of the addition expression is a type that includes details of the operator and of each operand. For example, op_add_t<left_type, right_type>, where the template arguments are EDSL operator types that recursively specify their operand types. The type of the result object for the full expression contains the information of a parse tree over that same expression input. The expression template may be traversed (as if one were traversing a parse tree) and some calculation performed at each node.

Expression templates are extremely difficult to write, error messages are opaque (mostly due to the hierarchical nature of the involved types) and build times are long. Most critically, expression-template EDSLs don't allow very complex compile-time transformations on the parse tree content. Once the expression template is built, the user remains limited by C++'s lack of compile-time imperative programming support. The user cannot lower the expression template to a rational IR, or build tree data structures, or run the content through optimizers or analyzers.

The Apex vision for libraries

Circle's integrated interpreter and code reflection mechanisms establish a larger design space for libraries. What is a library with Circle? Any code that provides a service.

As demonstrated with the Tensor Compiler example, a Circle program can dynamically link to a shared object library at compile time, use that library to solve an intricate problem (tensor contraction scheduling), then lower the resultin solution (the intermediate representation) to code using Circle's macro system.

Apex is a collection of services to help programmers develop this new form of library. Currently it includes a tokenizer and parser for a C++ subset (called the Apex grammar), as well as a reverse-mode automatic differentation package that serves as both an example for building new libraries and an ingredient for additional numerical computing development.

Functionality built using Apex presents itself to the application developer as an embedded domain-specific language. But the design of Apex EDSLs is very different from the design of expression templates: there is no operator overloading; there is no template metaprogramming; we don't try to co-opt C++ tokens into carrying DSL information.

The client communicates with the library by way of compile-time strings. The contents may be provided as literals or assembled from code and data using Circle's usual compile-time mechanisms. The library transforms the input text into code:

  1. The Apex tokenizer lexes the text into tokens. The tokenizer code is in the shared object library
  2. The Apex parser processes the tokens into a parse tree. The parser code is also in The parse tree is an light-weight class hierarchy. There are node types for unary and binary operators, function calls, terminals, and so on. It is not a template library.
    Parse errors are formed by the parser--you don't get C++ frontend errors when the input is malformed, but Apex backend errors, which are cleaner and more relevant to the problem of parsing.
  3. The EDSL library traverses the parse tree and performs semantic analysis. This is where the library intelligence resides. All usual programming tools are available to the library. It can interact with the file system, host interpreters to execute scripts, and so on. The library intelligence should be compiled into a shared object; Apex's autodiff package is compiled into
    The output of the intelligence layer is the problem solution in an intermediate representation. This IR may be expressed using any C++ data structure. Because the library's shared object is loaded into the compiler's process during source translation, objects created by the intelligence layer occupy the same address space as the Circle interpreter, allowing those objects to be consumed and modified by meta code.
  4. A codegen header supplies the interface between the client, the Apex tokenizer and parser, and the library intelligence. This header provides Circle macros to lower the EDSL operation from IR to expression code.

Although this seems like an involved four-step pipeline, the first two components are reusable and provided by libraries. Even if you choose a different tokenizer or parser, you can use them from libraries. The intelligence layer establishes a nice separation of concerns, as you can develop it independently of the code generator. Finally, the codegen layer is very small, as all difficult work was pushed down and handled by shared object libraries.

A strength of this approach is that it requires very little Circle code, only a thin layer of macros for code generation. All the intelligence can be written with Standard C++ for portability and to ease migration into this new language.

Autodiff for Circle


#include <apex/autodiff_codegen.hxx>

struct terms_t {
  double x;
  double y;

terms_t my_grad(terms_t input) {
  return apex::autodiff_grad("sq(x / y) * sin(x * y)", input);

int main() {
  terms_t grad = my_grad( { .3, .5 } );
  printf("%f %f\n", grad.x, grad.y);
  return 0;
$ circle grad1.cxx -I ../include -M ../Debug/ 
$ ./grad1
1.053170 0.425549

To use Apex's autodiff, pass the formula to differentiate as a string, followed by a class object with the values of each primary input. The names referenced in the formula must correspond to the member names in the class object. We'll use Circle's type introspection to help relate the type information to the variable names.

The result object is another instance of the class type, this time holding the partial derivatives rather than the values of the independent variables.

After just two days of programming, this package supports these expressions and elementary functions:

  • Binary + - * and /.
  • Unary -.
  • sq, sqrt, exp, log, sin, cos, tan, sinh, cosh, tanh, pow and norm functions.

The call to autodiff_grad has distinct compile-time and runtime phases. At compile time, the formula is tokenized and parsed; the parse tree is lowered by make_autodiff to an IR called a "tape," and that tape is lowered by autodiff_codegen.hxx to code using Circle macros. At runtime, the independent variables are evaluated and the tape-generated code is executed, yielding the gradient. All scheduling is performed at compile time, and there is no runtime dependency on any part of the library.

Reverse-mode differentation is essentially a sparse matrix problem. Each dependent variable/subexpression is a row in the sparse matrix (an item in the tape) with a non-zero column for each partial derivative we'll compute to complete the chain rule. When considered as a DAG traversal, the chain rule calculation involves propagating partials from the root down each edge, and incrementing a component of the gradient vector by the concatenated chain rule coefficient. When viewed as linear algebra, the entire gradient pass is a sparse back-propagation operation.

The Apex autodiff example adopts the DAG view of the problem. The implementation performs best when the size of the DAG is small enough so that the gains of explicit scheduling of each individual back-propagation term more than offset the parallelism left on the table by not using an optimized sparse matrix back-propagation code.

However, the separation of autodiff intelligence and code generation permits selection of a back-propagation treatment most suitable for the particular primary inputs and expression graph. Calls into the autodiff library with different expressions may generate implementations utilizing different strategies, without the vexations of template metaprogramming.

Writing an embedded DSL for Circle programs

How do we implement the autodiff DSL? We basically write our own small compiler frontend--it takes text input, performs syntax and semantic analysis, and emits IR, just like a general-purpose compiler.

This would be too much work if written from scratch for each DSL library. We'll use the language components available in Apex to tokenize and parse the input text into a parse tree for the Apex grammar (a C++-inspired expression grammar), and consume the parse tree as configuration for the library.

The tokenizer

Apex includes a tokenizer that breaks an input text into operators (all the ones recognized by C++) and identifiers.


struct token_t {
  tk_kind_t kind : 8;
  int store : 24;
  const char* begin, *end;

  operator tk_kind_t() const { return kind; }
typedef const token_t* token_it;

The token structure holds an enumeration defining the kind of token (eg '+' token, integer token or identifier token) and an index into a store to retrieve a resource, like a string, integer or floating-point value.


struct tokenizer_t {
  std::vector<std::string> strings;
  std::vector<uint64_t> ints;
  std::vector<double> floats;

  // Byte offset for each line start.
  std::vector<int> line_offsets;

  // Original text we tokenized.
  std::string text;

  // The text divided into tokens.
  std::vector<token_t> tokens;

  parse::range_t token_range() const;

  int reg_string(range_t range);
  int find_string(range_t range) const;

  // Return 0-indexed line and column offsets for the token at
  // the specified byte offset. This performs UCS decoding to support
  // multibyte characters.
  int token_offset(source_loc_t loc) const;
  int token_line(int offset) const;
  int token_col(int offset, int line) const;
  std::pair<int, int> token_linecol(int offset) const;
  std::pair<int, int> token_linecol(source_loc_t loc) const;
  void tokenize();

The tokenizer_t class holds the input text, the array of tokens, the resources, and an array of line offsets to ease mapping between tokens and line/column positions within the input text. The tokenizer expects UTF-8 input, so characters may consume between one and four bytes; the token_linecol functions map token indices and byte offsets within the text to the correct line/column positions, accounting for these multi-byte characters.

To use the tokenizer, set the text data member and call the tokenize member function.

The parser

The parser consumes tokens from left-to-right and constructs a parse tree from bottom-to-top. Apex includes a hand-written recursive descent (RD) parser, which is the most practical and flexible approach to parsing.


struct node_t {
  enum kind_t {

  kind_t kind;
  source_loc_t loc;

  node_t(kind_t kind, source_loc_t loc) : kind(kind), loc(loc) { }
  virtual ~node_t() { }

  template<typename derived_t>
  derived_t* as() {
    return derived_t::classof(this) ? 
      static_cast<derived_t*>(this) : 

  template<typename derived_t>
  const derived_t* as() const {
    return derived_t::classof(this) ? 
      static_cast<const derived_t*>(this) : 
typedef std::unique_ptr<node_t> node_ptr_t;
typedef std::vector<node_ptr_t> node_list_t;

Each parse tree node derives apex::parse::node_t. source_loc_t is the integer index of the token from which the parse node was constructed, and we include one in each parse node. The tokenizer object can map source_loc_t objects back to line/column numbers for error reporting.

The full implementation of the parser is in grammar.cxx. We'll run the parser at compile time from a meta context in the Circle program. But unlike a template, which is C++ generic programming offering, we don't need to see the source of the parser from the source code of the client. The parser is compiled into, and the Circle interpreter will make a foreign function call to run the parser and retrieve the parse tree. We don't even need access to at runtime--the IR from the DSL library is lowered to Circle code during compile time, and the resulting binary retains no evidence of's role in its generation.


struct parse_t {
  tok::tokenizer_t tokenizer;
  node_ptr_t root;

parse_t parse_expression(const char* str);

Calling apex::parse::parse_expression tokenizes and parses an input text and returns both the tokenizer (which has line-mapping content) and the root node of the parse tree. node_ptr_t is an std::unique_ptr; when the user destroys the root object from meta code, the entire tree is recursively destroyed from the smart pointers' destructors.

The autodiff IR

The autodiff library traverses the parse tree and builds a data structure called a tape or Wengert list, which includes instructions for evaluating the value and partial derivatives for each subexpression.


struct autodiff_t {
  struct item_t {
    // The dimension of the tape item. 
    // 0 == dim for scalar. dim > 0 for vector.
    int dim;

    // The expression to execute to compute this dependent variable's value.
    // This is evaluated during the upsweep when creating the tape from the 
    // independent variables and moving through all subexpressions.
    ad_ptr_t val;

    // When updating the gradient of the parent, this tape item loops over each
    // of its dependent variables and performs a chain rule increment.
    // It calls grad(index, coef) on each index. This recurses, down to the
    // independent vars, multiplying in the coef at each recurse. 

    // When we hit an independent var, the grads array is empty (although it
    // may be empty otherwise) and we simply perform += coef into the slot
    // corresponding to the independent variable in the gradient array.
    struct grad_t {
      int index;
      ad_ptr_t coef;
    std::vector<grad_t> grads;

  // The first var_names.size() items encode independent variables.
  std::vector<autodiff_var_t> vars;
  std::vector<item_t> tape;

autodiff_t make_autodiff(const std::string& formula, 
  const std::vector<autodiff_var_t>& vars);

The result object of make_autodiff is an object of type autodiff_t. This holds the tape, and each tape item holds expressions to evaluating the tape's subexpression and that subexpression's gradient. The index in each gradient component refers to a position within the tape corresponding to the variable (dependent or independent) that the partial derivative is computed with respect to. When traversing the tape DAG, we concatenate partial derivatives; when we hit a terminal node (an independent variable), we increment the output gradient by the total derivative--this is the chain rule in action.

Although the DSL doesn't yet support it, the tape is designed to accomodate vector types in addition to scalar types.

The autodiff IR needs to be comprehensive enough to encode any operations found in the expression to differentiate. We chose the design for easy lowering using intrinsics like @op and @expression to generate code from strings.


struct ad_t {
  enum kind_t {
  kind_t kind;
  ad_t(kind_t kind) : kind(kind) { }

  template<typename derived_t>
  derived_t* as() {
    return derived_t::classof(this) ? 
      static_cast<derived_t*>(this) : 

  template<typename derived_t>
  const derived_t* as() const {
    return derived_t::classof(this) ? 
      static_cast<const derived_t*>(this) : 
typedef std::unique_ptr<ad_t> ad_ptr_t;

struct ad_tape_t : ad_t {
  ad_tape_t(int index) : ad_t(kind_tape), index(index) { }
  static bool classof(const ad_t* ad) { return kind_tape == ad->kind; }

  int index;

struct ad_literal_t : ad_t {
  ad_literal_t(double x) : ad_t(kind_literal), x(x) { }
  static bool classof(const ad_t* ad) { return kind_literal == ad->kind; }

  double x;

struct ad_unary_t : ad_t {
  ad_unary_t(const char* op, ad_ptr_t a) :
    ad_t(kind_unary), op(op), a(std::move(a)) { }
  static bool classof(const ad_t* ad) { return kind_unary == ad->kind; }

  const char* op;
  ad_ptr_t a;

struct ad_binary_t : ad_t {
  ad_binary_t(const char* op, ad_ptr_t a, ad_ptr_t b) : 
    ad_t(kind_binary), op(op), a(std::move(a)), b(std::move(b)) { }
  static bool classof(const ad_t* ad) { return kind_binary == ad->kind; }

  const char* op;
  ad_ptr_t a, b;

struct ad_func_t : ad_t {
  ad_func_t(std::string f) : ad_t(kind_func), f(std::move(f)) { }
  static bool classof(const ad_t* ad) { return kind_func == ad->kind; }

  std::string f;
  std::vector<ad_ptr_t> args;

The autodiff code in generates ad_t trees into the tape data structure. Each tree node is allocated on the heap and stored in an std::unique_ptr. Because the shared object is loaded into the address space of the compiler, the result object of the foreign-function library call is fully accessible to meta code in the translation unit by way of the Circle interpreter.

As with the tokenizer and parser, the implementation of the autodiff library is totally abstracted from the library's caller.

The tape-building class ad_builder_t in autodiff.cxx has member functions for each operation and elementary function supported by the DSL. For example, to support multiplication we implement the product rule of calculus:

int ad_builder_t::mul(int a, int b) {
  // The sq operator is memoized, so prefer that.
  if(a == b)
    return sq(a);

  // grad (a * b) = a grad b + b grad a.
  item_t item { };
  item.val = mul(val(a), val(b));
    b,      // a * grad b
    a,      // b * grad a
  return push_item(std::move(item));

The operands are indices to lower nodes in the tape. We use function overloads like mul and val to create ad_t nodes, which are assembled recursively into expression trees.

int ad_builder_t::sin(int a) {
  item_t item { };
  item.val = func("std::sin", val(a));
    func("std::cos", val(a))
  return push_item(std::move(item));

The sine function is supported with a similar member function. We generate ad_func_t nodes which identify the functions to call by string name. When the IR is lowered to code in autodiff_codegen.hxx, we'll use @expression to perform name lookup and convert these qualified names to function lvalues.

Note that we can deliver a rich calculus package without having to define a type system to interact with the rest of the C++ application. We don't have to require that sin and cos implement any particular concept or interface to participate in differentiation, because these are first-class functions supported by the DSL.

To allow user-extension to the autodiff library, such as user-defined functions, any convention may be used to communicate between the library and the client. A participating function and its derivative could adopt a particular naming convention (e.g., the function ends with _f and the derivative ends with _grad); the function and derivative could be member functions of a class that is named in the input string (e.g., naming "sinc" in the formula string performs name lookup for class sinc_t and calls member functions f and grad)

The strength of this design is that you aren't relying on C++'s overload resolution and type systems to coordinate between the library's implementation and its users; the library can introduce its own conventions for interoperability.

The autodiff code generator


template<typename type_t>
@meta type_t autodiff_grad(@meta const char* formula, type_t input) {

  // Parse out the names from the inputs.
    "argument to autodiff_eval must be a class object");

  // Collect the name of each primary input.
  @meta std::vector<autodiff_var_t> vars;
  @meta size_t num_vars = @member_count(type_t);

  // Copy the values of the independent variables into the tape.
  double tape_values[num_vars];
  @meta for(int i = 0; i < num_vars; ++i) {
    // Confirm that we have a scalar double-precision term.
    static_assert(std::is_same<@member_type(type_t, i), double>::value,
      std::string("member ") + @member_name(type_t, i) + " must be type double");

    // Push the primary input name.
    @meta vars.push_back({
      @member_name(type_t, i),

    // Set the primary input's value.
    tape_values[i] = @member_ref(input, i);

  // Construct the tape. This makes a foreign function call into
  @meta apex::autodiff_t autodiff = apex::make_autodiff(formula, vars);
  @meta size_t count = autodiff.tape.size();

  // Compute the values for the whole tape. This is the forward-mode pass. 
  // It propagates values from the terminals (independent variables) through
  // the subexpressions and up to the root of the function.

  // Evaluate the subexpressions.
  @meta for(size_t i = num_vars; i < count; ++i)
    tape_values[i] = autodiff_expr(autodiff.tape[i].val.get());

  // Evaluate the gradients. This is a top-down reverse-mode traversal of 
  // the autodiff DAG. The partial derivatives are parsed along edges, starting
  // from the root and towards each terminal. When a terminal is visited, the
  // corresponding component of the gradient is incremented by the product of
  // all the partial derivatives from the root of the DAG down to that 
  // terminal.
  double coef[num_vars];
  type_t grad { };

  // Visit each child of the root node.
  @meta int root = count - 1;
  @meta for(const auto& g : autodiff.tape[root].grads) {
    // Evaluate the coefficient into the stack.
    coef[root] = autodiff_expr(g.coef.get());

    // Recurse on the child.
    @macro autodiff_tape(g.index, root);

  return std::move(grad);

autodiff_grad is implemented as a metafunction. Recall that these are like super function templates: some parameters are marked @meta, requiring compile-time arguments be provided. By making the input formula string @meta, we can pass the string to apex::make_autodiff at compile time to generate an IR. This function is implemented in, so -M must be specified as a circle argument to load this shared object library as a dependency.

After the tape has been computed and returned, we first initialize the tape values in forward order (that is, from the leaves of the expression tree up to the root). Then, we may the reverse pass, propagating partial derivatives from the root of the expression tree down to the terminals, where the gradient is finally updated.

Although the values in the tape will be used again during the top-down gradient pass, their storage may be a performance limiter in problems with a very large number of temporary nodes. Because the library defines its own IR and scheduling intelligence, it's feasible to extend the IR and emit instructions to rematerialize temporary values to alleviate storage pressure.


@macro auto autodiff_expr(const ad_t* ad) {
  @meta+ if(const auto* tape = ad->as<ad_tape_t>()) {
    @emit return tape_values[tape->index];

  } else if(const auto* literal = ad->as<ad_literal_t>()) {
    @emit return literal->x;

  } else if(const auto* unary = ad->as<ad_unary_t>()) {
    @emit return @op(

  } else if(const auto* binary = ad->as<ad_binary_t>()) {
    @emit return @op(

  } else if(const auto* func = ad->as<ad_func_t>()) {
  	// Support 1- and 2-parameter function calls.
    if(1 == func->args.size()) {
      @emit return @expression(func->f)(autodiff_expr(func->args[0].get()));

    } else if(2 == func->args.size()) {
      @emit return @expression(func->f)(autodiff_expr(func->args[0].get()),

The expression macro autodiff_expr recurses an ad_t tree and switches on each node kind.

  • The macro is expanded in the scope of the caller, so the tape_values object is visible; this provides access to the value of each subexpression.
  • The unary and binary nodes hold strings with the operator names, such as "+" or "/". We can pass these strings to @op along with the expression arguments to synthesize the corresponding kind of expression.
  • Function call nodes have the name of the function stored as a string. When evaluated with @expression, name lookup is performed on the qualified name (eg, "std::cos") and returns a function lvalue or overload set.

Each tape item (corresponding to sparse matrix row) includes one ad_t tree that renders the value of the subexpression, and one ad_t per child node in the DAG to compute partial derivatives. The values are computed in bottom-up order (forward through the tape), and the partial derivatives are computed in top-down order (reverse mode through the tape). An optimization potential may be exposed by evaluating all partial derivatives in parallel (there are no data dependencies between them), and using a parallelized sparse back-propagation code to concatenate the partial derivatives. Again, these choices should be made by the intelligence of the library, which is well-separated from the metaprogramming concerns of the code generator.

DSL error reporting

Circle has been adding capability for better integration of compile-time exceptions with compiler errors. If an exception is thrown either from the Circle interpreter or from a foreign function call to a dynamically-loaded shared object, a backtrace for the unhandled exception is printed along with the exception's error. This helps us understand exactly where and why the error was generated.

Consider breaking the syntax of our Apex autodiff function:

terms_t my_grad(terms_t input) {
  return apex::autodiff_grad("sq(x / y) * 2 sin(x)", input);

The Apex grammar doesn't support this kind of implicit multiplication. The sin call is simply an unexpected token, so our parser throws an exception indicating such:


void grammar_t::unexpected_token(token_it pos, const char* rule) {
  const char* begin = pos->begin;
  const char* end = pos->end;
  int len = end - begin;

  std::string msg = format("unexpected token '%.*s' in %s", len, begin, rule);

  throw parse_exception_t(msg);

This exception originates in compiled code, and unrolls the stack through the foreign function call to apex::make_autodiff, through the entire Circle compiler, until it is finally caught from Circle's main function and printed as an uncaught exception. Wonderfully, Circle constructs backtrace information as this exception unrolls through function calls and macro expansions, and presents the backtrace as compiler errors:

grad1.cxx errors

The text saved in the parse_exception_t class (which derives std::runtime_error, which is how we access its error message) is printed in the most-indented part of the error. Around that, we're shown that it's thrown from apex::make_autodiff from In turn, that function call is made from inside the instantiation of the metafunction apex::autodiff_grad. The offending string is shown here, and we can see the call to sin which corresponds to the parse error thrown from inside Apex.

The backtrace will print the full path of functions called, regardless of if they are called from interpreted or compiled code. Additionally, the throw-expression generating the exception is flagged:


#include <stdexcept>

void func1() {
  throw std::runtime_error("A bad bad thing");  

void func2() {

void func3() {

@meta func3();

test.cxx errors

Circle as a build system

Circle integrates with the host environment and provides functionality of scripting languages and build systems. We can use this compile-time capability and drive program generation from resources. To extend the gradient example let's specify functions to differentiate in JSON files in the working directory:


  "F1" : "sin(x / y + z) / sq(x + y + z)",
  "F2" : "tanh(sin(x) * exp(y / z))"

The translation unit will open this file using the header-only json.hpp library, read the contents, inject functions for evaluating both functions, and call into the Apex library for code to compute the derivatives for each library. Finally, we'll generate a driver program that takes command-line arguments for evaluating one of the functions and its gradient.


#include <json.hpp>
#include <fstream>
#include <iostream>
#include <map>
#include <apex/autodiff_codegen.hxx>

// Parse the JSON file and keep it open in j.
using nlohmann::json;
using apex::sq;

struct vec3_t {
  double x, y, z;

// Record the function names encountered in here!
@meta std::vector<std::string> func_names;

@macro void gen_functions(const char* filename) {
  // Open this file at compile time and parse as JSON.
  @meta std::ifstream json_file(filename);
  @meta json j;
  @meta json_file>> j;
  @meta for(auto& item : j.items()) {
    // For each item in the json...
    @meta std::string name = item.key();
    @meta std::string f = item.value();
    @meta std::cout<< "Injecting '"<< name<< "' : '"<< f<< "' from "<< 
      filename<< "\n";
    // Generate a function from the expression.
    extern "C" double @("f_" + name)(vec3_t v) {
      double x = v.x, y = v.y, z = v.z;
      return @expression(f);
    // Generate a function to return the gradient.
    extern "C" vec3_t @("grad_" + name)(vec3_t v) {
      return apex::autodiff_grad(f.c_str(), v);

    @meta func_names.push_back(name);

First, we'll write a statement macro gen_functions which takes a filename (since this is a macro, the value of the argument must be known at compile time). Expanding the macro creates a new meta scope but reuses the real scope of the call site. This way, we can create new meta objects like the file handle and JSON object without polluting the declarative region we're expanding the macro into.

We open the JSON file at compile time and range-for through its contents. We'll echo back to the user the functions being generated, then define functions "f_" + name and "grad_" + name. The Circle operator @() transforms strings to tokens. If we expand this statement macro into a namespace scope, the real declarations are interpreted as function definitions.

Circle macros allow you to programmatically inject code into any statement- or expression-accepting scope. Unlike preprocessor macros, these macros follow argument deduction and overload resolution rules. They establish a new meta scope to allow you to use your own tooling. Only the real statements fall through the macro and are inserted into the target scope.

After the value and gradient functions are defined, we simply push the name of the function to the func_names array which sits in the global namespace.

@macro gen_functions("formula.json");

std::pair<double, vec3_t> eval(const char* name, vec3_t v) {
  @meta for(const std::string& f : func_names) {
    if(!strcmp(name, @string(f))) {
      return {
        @("f_" + f)(v),
        @("grad_" + f)(v)

  printf("Unknown function %s\n", name);

void print_usage() {
  printf("  Usage: grad2 name x y z\n");

int main(int argc, char** argv) {
  if(5 != argc)
  const char* f = argv[1];
  double x = atof(argv[2]);
  double y = atof(argv[3]);
  double z = atof(argv[4]);
  vec3_t v { x, y, z };

  auto result = eval(f, v);
  double val = result.first;
  vec3_t grad = result.second;

  printf("  f: %f\n", val);
  printf("  grad: { %f, %f, %f }\n", grad.x, grad.y, grad.z);
  return 0;

In the second half of the program we generate a driver capability. eval is a normal runtime function that takes the name of the function as a string and the primary inputs as a vec3_t. We then loop over all the function names pushed in gen_functions. The Circle extension @string converts a compile-time std::string back into a string literal, so we can strcmp it against our runtime function name. If we have a match, we'll evaluate the function and its gradient and return the results in an std::pair.

Note that we have to expand gen_functions over the input file formula.json prior to entering the definition for eval, because gen_functions populates the func_names array. If we were to change the order of operations here, func_names would be empty when eval is translated, resulting in a broken driver program.

$ circle grad2.cxx -I ../include/ -M ../Debug/ 
Injecting 'F1' : 'sin(x / y + z) / sq(x + y + z)' from formula.json
Injecting 'F2' : 'tanh(sin(x) * exp(y / z))' from formula.json

$ ./grad2 F1 .5 .6. .7
  f: 0.308425
  grad: { 0.361961, 0.358750, 0.354255 }

More build options

We've built both diagnostics and a client program into our translation unit, driven by a simple portable JSON resource file sitting in the source directory.

With a one-line change we can turn grad2 into grad3, a command-line tool that gets pointed at a JSON configuration file and generates code from that. How do we pass arguments through the Circle compiler to the translation unit? Preprocessor macros! We can still get some use from them:


@macro gen_functions(INPUT_FILENAME);

Instead of expanding the gen_functions statement macro on "formula.json", let's expand it on a macro defined to a string literal. How do we build? Just use the -D command-line argument to bind macros. Keep in mind the escaped quotes " to satisfy the requirements of the Linux terminal:

$ circle grad3.cxx -I ../include/ -M ../Debug/ -DINPUT_FILENAME=\"formula2.json\"
Injecting 'F3' : 'exp(x * x) + y / z' from formula2.json
Injecting 'F4' : 'sqrt(sq(x) + sq(y) + sq(z))' from formula2.json

$ ./grad3 F4 .4 .5 .6
  f: 0.877496
  grad: { 0.455842, 0.569803, 0.683763 }

This is a cute change. We've just separated the resource from the source code of the program.

But can we really exploit the build-system qualities of the compiler? What about having the translation unit scrape all the JSON files in the working directory, and generating code from all the functions found in those?


#include <json.hpp>
#include <fstream>
#include <iostream>
#include <map>
#include <apex/autodiff_codegen.hxx>
#include <dirent.h>

inline std::string get_extension(const std::string& filename) {  
  return filename.substr(filename.find_last_of(".") + 1);

inline bool match_extension(const char* filename, const char* ext) {
  return ext == get_extension(filename);

// ... Snipped out the same gen_functions macro as grad2.cxx.

// Use Circle like a build system:

// Open the current directory.
@meta DIR* dir = opendir(".");

// Loop over all files in the current directory.
@meta while(dirent* ent = readdir(dir)) {

  // Match .json files.
  @meta if(match_extension(ent->d_name, "json")) {

    // Generate functions for all entries in this json file.
    @macro gen_functions(ent->d_name);

@meta closedir(dir);
$ circle grad4.cxx -I ../include -M ../Debug/ 
Injecting 'F1' : 'sin(x / y + z) / sq(x + y + z)' from formula.json
Injecting 'F2' : 'tanh(sin(x) * exp(y / z))' from formula.json
Injecting 'F3' : 'exp(x * x) + y / z' from formula2.json
Injecting 'F4' : 'sqrt(sq(x) + sq(y) + sq(z))' from formula2.json

$ ./grad4 F3 .2 .3 .4
  f: 1.790811
  grad: { 0.416324, 2.500000, 1.875000 }

POSIX systems put their filesystem APIs in dirent.h. Let's include this and write our own function to test the extension of a filename, a simple operation curiously missing from the POSIX API.

Now, at compile time, use opendir to open the current directory. Keep hitting readdir to return a descriptor for the next file in the directory. If the filename ends with ".json", expand the gen_functions macro on the JSON's filename. When we've exhausted the directory, be a good citizen and closedir the directory handle.

Even though we're nested in a while and an if statement, the real scope of the program at the site of the macro expansion is the global namespace, so the macro will inject its generated functions in the global namespace.

One of the biggest strengths of using Circle as a build system over trying to express equivalent operations in CMake or Make or any other tool is familiarity. Even if you didn't know the dirent.h API, you can look it up and in a minute or two know exactly how to use it. Perusing the CMake reference for a directory enumeration command isn't the same help, because you're still stuck trying to express yourself in a language you probably don't know very well (and a language that is much less expressive than C++).

The design goal of Circle is to extend the compiler to let you do much more with the same language.

You can’t perform that action at this time.