Avoiding generic programming pitfalls

jckarter edited this page Jan 8, 2011 · 5 revisions

Code bloat

Excessive generic function specialization can lead to large executables, and the added instruction cache pressure can eliminate any performance benefit. By default, a Clay function whose arguments carry no type constraints is specialized for every distinct combination of argument types it is called with:

  muladd(a, b, c) = a * b + c; // A separate muladd function will be compiled for every set of input types

For small functions that will likely be inlined, the cost will be minimal, but for a large function, this is problematic.
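
As a rough sketch of how the instances multiply, assume the unconstrained muladd above and a hypothetical main that calls it with two different kinds of literals. Each call site with a new combination of argument types yields its own compiled copy of the body:

  main() {
      // Integer literals: one instance of muladd for the integer argument types
      println(muladd(1, 2, 3));
      // Floating-point literals: a second, separate instance of the same body
      println(muladd(1.0, 2.0, 3.0));
  }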

Funneling types

An obvious way to curtail the number of instances is to explicitly limit the set of allowed types:

  // Only allow Int inputs
  muladd(a: Int, b: Int, c: Int) = a * b + c;
  // Only allow Int or Float inputs
  [A, B, C | allValues?(T => inValues?(T, Int, Float), A, B, C)]
  muladd(a: A, b: B, c: C) = a * b + c;

Limiting to a small set of types may be undesirable, especially when dealing with various different-sized integer or floating-point types. If a function operates on a related group of types, it can convert those types to a canonical "funnel" type:

  // Generic wrapper accepts any set of Integer? types and funnels them down to Int64
  [A, B, C | allValues?(Integer?, A, B, C)]
  muladd(a: A, b: B, c: C) = muladd(Int64(a), Int64(b), Int64(c));
  
  // Principal overload implements the actual operation
  overload muladd(a: Int64, b: Int64, c: Int64) = a * b + c;

The generic stub then stays small: each instance contains only the conversions and the call to the principal overload, while the actual body of the function is generated only once.
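
A usage sketch, assuming the two muladd definitions above are in scope and that the fixed-size integer types can be constructed from literals in the same way Int64(a) is used in the stub (the main function here is illustrative only):

  main() {
      // Instantiates a small (Int8, Int8, Int8) stub that converts and forwards
      println(muladd(Int8(2), Int8(3), Int8(4)));
      // Instantiates a second small stub; both stubs call the same Int64 body
      println(muladd(Int16(2), Int32(3), Int64(4)));
  }

One consequence of this design is that all arithmetic happens at Int64 width and the result is always an Int64, regardless of the argument types at the call site.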

Dynamic dispatch

Often, a function contains a large body of generic code with only a small amount of behavior that needs to be specialized on the type of its arguments. For example, the following function doesn't need to care about the types of its lastname, firstname, and address arguments beyond whether bindStatement accepts them:

  saveClient(db: DB, lastname, firstname, address) {
      var stmt = DBStatement(db, "insert into clients (lastname, firstname, address) values (?, ?, ?)");
      bindStatement(stmt, 0, lastname);
      bindStatement(stmt, 1, firstname);
      bindStatement(stmt, 2, address);
      execStatement(stmt);
  }

Nonetheless, saveClient will be instantiated for every set of input types. However, we can funnel the desired input types into a variant type, and use the * dynamic dispatch operator to dispatch to specialized bindStatement calls:

  variant ClientField = String | StringConstant | UTF8String;

  [L, F, A | allValues?(T => VariantMember?(ClientField, T), L, F, A)]
  saveClient(db: DB, lastname: L, firstname: F, address: A) {
      saveClient(db, ClientField(lastname), ClientField(firstname), ClientField(address));
  }

  overload saveClient(db: DB, lastname: ClientField, firstname: ClientField, address: ClientField) {
      var stmt = DBStatement(db, "insert into clients (lastname, firstname, address) values (?, ?, ?)");
      bindStatement(stmt, 0, *lastname);
      bindStatement(stmt, 1, *firstname);
      bindStatement(stmt, 2, *address);
      execStatement(stmt);
  }

With this code, only the conversion stub and bindStatement need to be specialized; the main body of saveClient requires only one instance for all of the member types of ClientField.

Bad error messages