
Clay for C++ programmers

jckarter edited this page Jan 3, 2012 · 33 revisions

This document is intended to introduce Clay to programmers familiar with C++. It isn't intended as a "Clay vs. C++" argument but as a purely informative comparison.

Clay vs C++: Executive summary

Clay's primary goal is to be a systems programming language, with a focus on performance and generic programming. Clay thus shares many features with C++:

  • Value semantics, and Resource-Acquisition-is-Initialization (RAII)
  • Raw memory access, including pointers and pointer arithmetic
  • Minimal runtime dependencies—no required VM or garbage collector
  • ABI compatibility with C
  • Templates (called generics in Clay)
  • ...

However, Clay also discards many C++ features:

  • Source compatibility with C
  • Language-level support for object-oriented programming (although Clay is flexible enough that a custom object system can be implemented in it, as can bindings to foreign object systems such as Cocoa, glib, or COM)
  • ...

On the other hand, it adds many new features over C++98:

  • Pervasive type propagation—nearly all types can be inferred
  • Generic specialization based on type predicates
  • Multiple dispatch through variant types
  • ...

and has many features also added by C++11:

  • Variadic generics and types
  • Lambdas
  • Perfect forwarding
  • ...

Comparing "Hello World"

In C++:

  #include <iostream>
  int main() { std::cout << "Hello world!\n"; }

In Clay:

  main() { println("Hello world!"); }

This simple example introduces some basic differences between C++ and Clay:

  • Clay's io library is part of its prelude, a module implicitly available to every Clay module, so there is no need to include a header file to access the println function.
  • main does not need to declare its return type, because Clay will infer the type from the body of the function. Since there is no return statement, Clay infers that it returns no values. Clay also allows main to return an integer value back to the OS, as in C++.
  • Clay's println is a variadic function that prints zero or more arguments to stdout, followed by a newline.
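For comparison, a variadic println can be approximated in C++11 with a variadic template. This is only a sketch: the names println_to and println below are hypothetical helpers written for this page, not part of the C++ standard library or of Clay.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Base case: no arguments left, so only the newline is written.
inline void println_to(std::ostream &out) { out << '\n'; }

// Recursive case: write the first argument, then handle the rest.
template <typename First, typename... Rest>
void println_to(std::ostream &out, const First &first, const Rest &... rest) {
    out << first;
    println_to(out, rest...);
}

// Hypothetical stand-in for Clay's println: zero or more arguments to stdout.
template <typename... Args>
void println(const Args &... args) { println_to(std::cout, args...); }

// Usage: write to a string stream so the result can be checked.
inline void example() {
    std::ostringstream out;
    println_to(out, "x = ", 42, '!');
    assert(out.str() == "x = 42!\n");
    println("Hello world!"); // writes the line to stdout
}
```

Taking the stream as a parameter keeps the sketch testable; the stdout wrapper is a one-line forwarder.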

Type systems

Clay's set of fundamental types is similar to C and C++'s:

  • Sized integers: Int8, Int16, Int32, Int64 (with the last three aliased as Short, Int, and Long, respectively)
  • The corresponding unsigned integers: UInt8 etc. (with corresponding UShort etc. aliases)
  • Floating-point numbers: Float32, Float64 (aliased as Float and Double)
  • Pointers: Pointer[T] for any type T. Pointer[UInt8] is aliased as RawPointer
  • Arrays: Array[T,n] for any type T and integer size n

To ease communication with C libraries, Clay's standard library defines aliases CLong and CULong that match the standard sizes for the corresponding C types.

Clay type names are also constructor/cast functions for their respective types:

  var x = Int(5.0); // x will equal the Int 5
  var y = Float(7); // y will equal the Float 7.0f
  var z = Pointer[UInt](&x); // z will be a UInt pointer, containing the address of x
  var w = Array[Int, 3](9, 18, 27); // w will be an array of the 3 Ints 9, 18, and 27

These types behave mostly like their C++ counterparts; however, there are some minor differences:

  • Clay never implicitly converts between types, unlike C++. When calling functions, the parameter types must match exactly, even among integer or float types:
  foo(x: Int, y: Int) = x + y; // Function takes two Ints, returns an Int
  main() { println(foo(2u, 3u)); } // ERROR: 2u and 3u are UInt but foo expects Ints

However, functions can be made generic on their inputs and perform conversions on behalf of their callers. See the Generics section for details.

  • Clay's Arrays have value semantics and do not degenerate into pointers when used in expressions or passed to functions, so this is valid and will copy x into y:
  var x = array(1, 2, 3);
  var y = array(4, 5, 6);
  y = x;

The library function begin returns a pointer to the first element of an array. Using the & operator on an array will give a pointer to the entire array, which will have the same address value as begin but will be of a different type: &array gives a Pointer[Array[T, n]], while begin(array) gives a Pointer[T], for array element type T.

  • Clay's cast functions also perform range checking, so the following will trigger an assertion failure because -1 is not representable as a UInt:
  var x = UInt(-1);

Use wrapCast instead:

  var x = wrapCast(UInt, -1);
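The array and cast behavior described above has rough C++11 counterparts, sketched here for readers who want a familiar reference point. std::array is the standard value-semantic array; checked_to_uint and wrapping_to_uint are hypothetical helpers invented for this illustration, and the checked version throws an exception rather than asserting so that the failure can be observed.

```cpp
#include <array>
#include <cassert>
#include <limits>
#include <stdexcept>

// Range-checked conversion, loosely analogous to Clay's UInt(x).
inline unsigned checked_to_uint(int value) {
    if (value < 0)
        throw std::out_of_range("negative value does not fit in unsigned");
    return static_cast<unsigned>(value);
}

// Wrapping conversion, loosely analogous to wrapCast(UInt, x).
// Signed-to-unsigned conversion in C++ is defined to wrap modulo 2^N.
inline unsigned wrapping_to_uint(int value) {
    return static_cast<unsigned>(value);
}

inline void example() {
    std::array<int, 3> x = {{1, 2, 3}};
    std::array<int, 3> y = {{4, 5, 6}};
    y = x;      // copies all three elements, like assigning a Clay Array
    x[0] = 99;  // y holds its own copy and is unaffected
    assert(y[0] == 1);

    // Two pointers with different types, as with & and begin in Clay.
    std::array<int, 3> *whole = &x; // pointer to the entire array
    int *first = x.data();          // pointer to the first element
    assert(whole->size() == 3 && *first == 99);

    assert(wrapping_to_uint(-1) == std::numeric_limits<unsigned>::max());
}
```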

Clay also provides some primitive data structure types:

  • Tuples: Tuple[..T] for any set of types ..T. These provide anonymous structure types, similar to boost::tuple or std::tuple introduced in C++11.
  • Unions: Union[..T] for any set of types ..T. Unlike C or C++'s unions, Clay Unions are anonymous like Tuples.

Clay has a special generic type Static[x] without a clear analog in C++. When used as a runtime value, a Clay function or type name x manifests itself as a value of the type Static[x]; for example, Int is of type Static[Int] and main is of type Static[main]. Integers and floats can also be introduced as static values using the static keyword: static 0 gives a value of the type Static[0]. A value of any of these types is empty and indistinguishable from any other value of the same type. Static values with their unique types allow compile-time constructs to be available as runtime values, and allow functions to be overloaded and specialized on numeric parameters using the unique types.
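Although C++ has no direct equivalent, the idea of overloading on a number through a unique empty type can be loosely approximated with std::integral_constant. The sketch below is an analogy only; describe is a hypothetical function name, and the mechanism differs from Clay's Static in that the constant must be spelled out as a type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Each std::integral_constant<int, N> is a distinct, stateless type,
// so ordinary overloading can select on the number N.
inline std::string describe(std::integral_constant<int, 0>) { return "zero"; }
inline std::string describe(std::integral_constant<int, 1>) { return "one"; }

// Fallback for every other N; the non-template overloads above win
// when they match exactly.
template <int N>
std::string describe(std::integral_constant<int, N>) { return "many"; }

static_assert(std::is_empty<std::integral_constant<int, 0>>::value,
              "the tag type carries no runtime state");

inline void example() {
    assert(describe(std::integral_constant<int, 0>{}) == "zero");
    assert(describe(std::integral_constant<int, 7>{}) == "many");
}
```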

New types can be introduced with Clay using records and variants. A record, like a C++ struct, aggregates a set of member types:

  record Foo (x: Int, y: Float);

A variant is a type-safe union, similar to boost::variant in C++:

  variant Foo (Int, Float);

Unlike Clay Unions, C++ unions, or even boost::variant, Clay variants are open and can have new types introduced as instances, even from different modules. For example, in Clay any object that can be thrown must be a member of the Exception variant:

  record NoSuchFileError (filename: String);
  instance Exception (NoSuchFileError);

Records are constructed by field order:

  record Foo (x: Int, y: Float);

  var a = Foo(2, 3.0f); // a.x = 2, a.y = 3.0f

Variants are constructed from a member type:

  variant Foo (Int, Float);
  var a = Foo(5); // a contains an Int 5
  var b = Foo(7.0f); // b contains a Float 7.0f
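For readers with a newer compiler, the closest standard C++ counterpart is std::variant, which arrived in C++17 (after this page was written). The sketch below assumes a C++17 compiler; member_name is a hypothetical helper. Unlike a Clay variant, the set of member types is closed: it is fixed where the alias is declared.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// A closed variant over int and float.
using Foo = std::variant<int, float>;

// Report which member type the variant currently holds.
inline std::string member_name(const Foo &value) {
    return std::visit([](auto member) -> std::string {
        if constexpr (std::is_same_v<decltype(member), int>)
            return "Int";
        else
            return "Float";
    }, value);
}

inline void example() {
    Foo a = 5;     // a contains an int 5
    Foo b = 7.0f;  // b contains a float 7.0f
    assert(std::holds_alternative<int>(a));
    assert(std::holds_alternative<float>(b));
    assert(member_name(a) == "Int" && member_name(b) == "Float");
}
```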

As in C++, Clay types can be parameterized on type or numeric arguments. The primitive types Pointer[], Array[], Tuple[], Union[], and Static[] are parameterized in this way. New record or variant types can be defined with parameters too using the same [] syntax:

  record Foo[T] (x: Int, y: Array[T, 12]);
  variant Foo[T] (Int, Array[T, 12]);

Functions

Function definitions in Clay look similar to C or C++ function definitions. However, argument types are suffixed to argument names with :, and return types are placed after the argument signature.

  // C++
  int triangular_number(int x) { return (x * (x + 1)) / 2; }
  // Clay
  triangular_number(x:Int) : Int { return (x * (x + 1)) / 2; }

Clay also allows for multiple return values, separated by commas in the return type declaration and return statement:

  // Clay
  div_mod(num:Int, denom:Int) : Int, Int { return num / denom, num % denom; }
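C++ has no multiple return values, so the usual C++11 approximation packs the results into a std::pair or std::tuple and unpacks them with std::tie. A minimal sketch of the same div_mod, for comparison:

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Return quotient and remainder together in one pair.
inline std::pair<int, int> div_mod(int num, int denom) {
    return std::make_pair(num / denom, num % denom);
}

inline void example() {
    int quotient = 0, remainder = 0;
    std::tie(quotient, remainder) = div_mod(17, 5); // unpack both results
    assert(quotient == 3 && remainder == 2);
}
```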

If a function definition consists of a single return statement, the definition can instead be expressed with = followed by the returned expression(s):

  // Clay
  triangular_number(x:Int) : Int = (x * (x + 1)) / 2;
  div_mod(num:Int, denom:Int) : Int, Int = num / denom, num % denom;

The return type declaration is optional. If not present, the return types of the function will be inferred from the types of the returned expressions:

  // Clay
  triangular_number(x:Int) = (x * (x + 1)) / 2;
  div_mod(num:Int, denom:Int) = num / denom, num % denom;

Clay functions may also be overloaded using the overload keyword:

  // Clay
  triangular_number(x:Int) = (x * (x + 1)) / 2;
  overload triangular_number(x:UInt) = (x * (x + 1)) / 2;

Clay also supports generic functions. If an argument is not declared with a type, the function is generic for all types for that argument:

  // C++
  template<typename T>
  T triangular_number(T x) { return (x * (x + 1)) / 2; }
  // Clay
  triangular_number(x) = (x * (x + 1)) / 2;

In Clay, the set of allowed input types of a generic function may also be constrained with a guard pattern. A guard pattern precedes a function or overload definition, and consists of an opening square bracket [, one or more pattern variable declarations (which stand in for arbitrary types), a pipe |, a boolean expression involving the pattern variables, and a final closing bracket ]. The function or overload will only accept types for which the expression gives a true value. Arguments and return types are bound to pattern variables in the same way they are declared with concrete types.

  // Clay, with declared return type
  [T | Number?(T)]
  triangular_number(x:T) : T = (x * (x + 1)) / 2;

  // With inferred return type
  [T | Number?(T)]
  triangular_number(x:T) = (x * (x + 1)) / 2;
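The nearest C++11 idiom to a guard pattern is probably std::enable_if. The sketch below uses std::is_arithmetic as a stand-in for Number?, which is an assumption: the two predicates need not accept exactly the same types. The accepts trait is a hypothetical helper that only checks whether a call would compile.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Participates in overload resolution only when T is an arithmetic type,
// roughly like the Clay guard [T | Number?(T)].
template <typename T,
          typename = typename std::enable_if<std::is_arithmetic<T>::value>::type>
T triangular_number(T x) { return (x * (x + 1)) / 2; }

// Detection helper: true when triangular_number(T) is a valid call.
template <typename T, typename = void>
struct accepts : std::false_type {};
template <typename T>
struct accepts<T, decltype(void(triangular_number(std::declval<T>())))>
    : std::true_type {};

static_assert(accepts<int>::value, "int satisfies the constraint");
static_assert(accepts<double>::value, "double satisfies the constraint");
static_assert(!accepts<std::string>::value, "std::string is rejected");

inline void example() {
    assert(triangular_number(4) == 10);
    assert(triangular_number(4.0) == 10.0);
}
```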

A function may also be overloaded for different guard patterns. Overloads are checked in reverse definition order, and the first overload that matches a call site is invoked.

  [T | Integer?(T)]
  abs(x:T) {
      if (x < 0)
          return -x;
      else
          return x;
  }

  [T | Float?(T)]
  overload abs(x:T) {
      if (x < 0.0)
          return -x;
      else if (x == 0.0) // abs(-0) => +0
          return 0.0;
      else
          return x;
  }
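A comparable pair of constrained overloads can be sketched in C++11 with std::enable_if in the return type. The name my_abs is hypothetical, chosen to avoid clashing with std::abs. One difference worth noting: C++ does not pick overloads by definition order, so the two constraints have to be mutually exclusive or the call is ambiguous.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Selected only for integral types.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, T>::type
my_abs(T x) { return x < 0 ? -x : x; }

// Selected only for floating-point types.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type
my_abs(T x) {
    if (x == T(0)) return T(0); // maps -0.0 to +0.0
    return x < T(0) ? -x : x;
}

inline void example() {
    assert(my_abs(-3) == 3);
    assert(my_abs(-2.5) == 2.5);
    assert(!std::signbit(my_abs(-0.0))); // result is positive zero
}
```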

Pattern variables may also be used as arguments of parameterized types, to constrain an argument to instances of that type. The pipe and guard expression may be left out of the pattern guard if no additional constraint is necessary.

  // Define incr_pointee for all Pointer[T]
  [T]
  incr_pointee(p:Pointer[T]) { p^ += 1; }
  // Define incr_pointee only for Pointers to "Number?" types
  [T | Number?(T)]
  incr_pointee(p:Pointer[T]) { p^ += 1; }

References

Like C++, Clay supports references, which act as implicit aliases for other values. In Clay, a reference variable can be bound with the ref keyword.

  // C++
  int x = 1;
  int &y = x;

  y = 2;
  std::cout << x << "\n";
  // Clay
  var x = 1;
  ref y = x;

  y = 2;
  println(x);

The ref keyword may also be used in a return statement (or = shorthand function definition) to return by reference. The result of a return-by-reference function can then be assigned to like a variable. Mutable accessors such as index (which implements the [] operator) can be defined in this fashion.

  // C++
  class int4 {
      int values[4];
  public: // class members are private by default; foo needs access to operator[]
      int &operator[](int i) { return values[i]; }
  };

  void foo(int4 &a) {
      a[0] = a[3];
  }
  // Clay
  record Int4 (values: Array[Int, 4]);
  overload index(a:Int4, i:Int) = ref a.values[i];

  foo(a:Int4) {
      a[0] = a[3];
  }

In Clay, all arguments are passed by reference.

  incr(x:Int) { x += 1; }
  foo() {
      var x = 2;
      incr(x);
      println(x); // prints 3
  }
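In C++ the same behavior has to be requested explicitly with a reference parameter; a plain parameter receives a copy. A small sketch of the contrast, where incr_copy is a hypothetical name for the by-value variant:

```cpp
#include <cassert>

// Reference parameter: mutates the caller's variable, like Clay's incr.
inline void incr(int &x) { x += 1; }

// Value parameter (the C++ default): mutates only a local copy.
inline void incr_copy(int x) { x += 1; }

inline void example() {
    int x = 2;
    incr(x);
    assert(x == 3);
    incr_copy(x);
    assert(x == 3); // unchanged
}
```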

Note that, although Clay automatically propagates the type of an expression, the value/reference-ness of an expression is not automatically propagated. Assigning a reference expression to a var binding will copy the referenced value, and returning a reference expression without the ref keyword will return a copy of the referenced value. To pass a reference upward through multiple functions as a reference thus requires consistent use of the ref keyword.

  foo(a:Int4) {
      // Although index(Int4, Int) returns by reference, x will be a copy of the referenced return value 
      var x = a[0];

      // This returns a copy of the referenced value
      return a[0];
  }

  bar(a:Int4) {
      // y will be a reference to the same value as returned
      ref y = a[0];
      // This returns the same reference
      return ref a[0];
  }
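C++11 programmers will recognize the same rule from auto: a plain auto variable copies, and a reference must be asked for with auto &. The sketch below mirrors foo and bar above; first_copy and first_ref are hypothetical names.

```cpp
#include <cassert>

struct Int4 {
    int values[4];
    int &operator[](int i) { return values[i]; }
};

// Declared return type int: returns a copy of the referenced element.
inline int first_copy(Int4 &a) { return a[0]; }

// Declared return type int &: returns the same reference.
inline int &first_ref(Int4 &a) { return a[0]; }

inline void example() {
    Int4 a = {{1, 2, 3, 4}};

    auto x = a[0];   // copy, like Clay's var
    x = 10;
    assert(a[0] == 1);

    auto &y = a[0];  // alias, like Clay's ref
    y = 20;
    assert(a[0] == 20);

    first_ref(a) = 30; // assigns through the returned reference
    assert(first_copy(a) == 30);
}
```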

However, passing a reference as a function argument will pass along the same reference:

  incr(x:Int) { x += 1; }

  baz(a:Int4) { incr(a[0]); } // a[0] will be directly incremented by incr

In generic code, you may not know whether a function returns by reference or by value given certain arguments. For example, many sequences are immutable and their index operator returns a value rather than a reference. To forward the reference/value-ness of an expression, use the forward keyword in a return expression:

  base1_index(seq, index) = forward seq[index - 1];

  foo(x) {
      var a = [1, 2, 3, 4, 5];
      // assign 6 to a[4] using base1_index
      base1_index(a, 5) = 6;

      // ERROR: range(n) returns an immutable sequence of integers
      // index(Range) returns by value and cannot be assigned
      base1_index(range(5), 5) = 6;
  }
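The closest C++ counterpart is probably a decltype(auto) return type, which was added in C++14 (so it postdates this page). The sketch below assumes a C++14 compiler; IntRange is a hypothetical immutable sequence invented to stand in for Clay's range.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical immutable sequence: operator[] returns by value.
struct IntRange {
    int size;
    int operator[](std::size_t i) const { return static_cast<int>(i); }
};

// decltype(auto) preserves whatever operator[] returns: a reference for
// std::vector, a plain value for IntRange.
template <typename Seq>
decltype(auto) base1_index(Seq &&seq, std::size_t index) {
    return seq[index - 1];
}

static_assert(std::is_same<
    decltype(base1_index(std::declval<std::vector<int> &>(), 1)), int &>::value,
    "vector elements come back by reference");
static_assert(std::is_same<
    decltype(base1_index(std::declval<IntRange>(), 1)), int>::value,
    "IntRange elements come back by value");

inline void example() {
    std::vector<int> a = {1, 2, 3, 4, 5};
    base1_index(a, 5) = 6; // assigns to a[4] through the forwarded reference
    assert(a[4] == 6);
    assert(base1_index(IntRange{5}, 5) == 4);
}
```

As in the Clay version, writing base1_index(IntRange{5}, 5) = 6 is rejected at compile time, because the result is a value and not an assignable reference.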

Modules and namespaces

Operators

Both C++ and Clay allow operator overloading. In C++, operators are represented with special names of the form operator <op>:

  struct vec2 { double x, y; };

  vec2 operator+(vec2 a, vec2 b) { vec2 r = { a.x + b.x, a.y + b.y }; return r; }

  vec2 a = { 1.0, 2.0 };
  vec2 b = { 3.0, 4.0 };
  vec2 c = a + b;

In Clay, operators are treated as special syntax for a set of standard library functions. For example, the + operator invokes the add function (implicitly imported from the prelude module). Overloading the add function extends the + operator:

  record Vec2 (x: Double, y: Double);

  overload add(a: Vec2, b: Vec2) = Vec2(a.x + b.x, a.y + b.y);

  var a = Vec2(1.0, 2.0);
  var b = Vec2(3.0, 4.0);

  var c = a + b;

Some Clay operators have different syntax from the equivalent C++ operators:

  • In Clay, pointer dereferencing is represented by postfix ^ rather than prefix *. Because dereference and field access are then both postfix, they chain left to right without parentheses, so Clay does not need a separate -> operator:
    • C++: *foo Clay: foo^
    • C++: *foo.bar Clay: foo.bar^
    • C++: foo->bar Clay: foo^.bar
    • C++: *foo->bar Clay: foo^.bar^
  • Clay's logical boolean operators are the keywords not, and, and or. and and or short-circuit like && and || in C++.
  • Clay does not yet provide operators for bitwise operations. Function syntax is used for bitwise operations:
    • C++: x << y Clay: bitshl(x, y)
    • C++: x >> y Clay: bitshr(x, y)
    • C++: ~x Clay: bitnot(x)
    • C++: x & y Clay: bitand(x, y)
    • C++: x | y Clay: bitor(x, y)
    • C++: x ^ y Clay: bitxor(x, y)
      • bitand, bitor, and bitxor are variadic and can take more than two arguments:
        • C++: x | y | z | w Clay: bitor(x, y, z, w)
  • Clay provides additional bitwise functions not directly provided by C++:
    • C++: x & ~y Clay: bitandc(x, y)
    • C++: x << y | x >> (8*sizeof(x) - y) Clay: bitrol(x, y)
    • C++: x >> y | x << (8*sizeof(x) - y) Clay: bitror(x, y)
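The C++ expressions listed for the rotations have undefined behavior when y is 0, because they then shift by the full width of the type. A more careful C++ sketch masks the shift counts. The functions below reuse the Clay names for readability but are hypothetical C++ helpers fixed to 32-bit operands, not Clay's implementations.

```cpp
#include <cassert>
#include <cstdint>

// x with every bit that is set in y cleared.
inline std::uint32_t bitandc(std::uint32_t x, std::uint32_t y) { return x & ~y; }

// Rotate left; masking keeps both shift counts in the range 0..31.
inline std::uint32_t bitrol(std::uint32_t x, unsigned y) {
    y &= 31u;
    return (x << y) | (x >> ((32u - y) & 31u));
}

// Rotate right, with the same masking.
inline std::uint32_t bitror(std::uint32_t x, unsigned y) {
    y &= 31u;
    return (x >> y) | (x << ((32u - y) & 31u));
}

inline void example() {
    assert(bitandc(0xFFu, 0x0Fu) == 0xF0u);
    assert(bitrol(0x80000001u, 1) == 0x00000003u);
    assert(bitror(0x00000003u, 1) == 0x80000001u);
    assert(bitrol(0x12345678u, 0) == 0x12345678u); // no shift by 32
}
```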

Object lifecycle

Clay and C++ both allow custom types to implement custom value semantics and to control object resources by overloading copy construction, destruction, and assignment. A classic example is a string type that allocates a dynamic buffer in which to store the string contents. To safely manage its resources, the string type must allocate a new buffer when it is copied or assigned, and free the buffer when it is destroyed. In C++, this is implemented using constructors, destructors, and operator= within a class:

  #include <cstdlib>  // std::free
  #include <string.h> // strdup (POSIX; not declared in namespace std)
  using std::free;

  class silly_string {
  private:
      char *str;
  public:
      silly_string(char const *s) : str(strdup(s)) {}

      silly_string(silly_string const &s)
          : str(strdup(s.str)) {}

      ~silly_string() { free(str); }

      silly_string &operator=(silly_string const &s) {
          char *copy = strdup(s.str); // copy first so self-assignment is safe
          free(str);
          str = copy;
          return *this;
      }
  };

In Clay, constructors, destructors, and assignment are free functions. A type name doubles as the constructor function for that type. Destruction is performed by the destroy function, and assignment by the assign function:

  import libc.(strdup, free);

  record SillyString (str: Pointer[CChar]);

  overload SillyString(s: Pointer[CChar]) --> returned: SillyString {
      returned.str <-- strdup(s);
  }
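  // The Pointer[CChar] constructor above already calls strdup, so pass the
  // source pointer directly to avoid duplicating (and leaking) the buffer twice
  overload SillyString(s: SillyString) = SillyString(s.str);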

  overload destroy(s: SillyString) { free(s.str); }

  overload assign(to: SillyString, from: SillyString) {
      var copy = strdup(from.str); // copy first so self-assignment is safe
      free(to.str);
      to.str = copy;
  }

Strings

In C++, literal strings evaluate to char const * pointers to immutable constant string data (or to arrays of char if used to initialize a character array), and the standard C++ library provides a std::string class for holding dynamically mutable, growable strings. Clay behaves similarly: Literal strings evaluate to StringConstant values, which are handles to immutable constant string data, and a String container supports dynamic strings.

Since Clay's StringConstants are a distinct type from Pointer[Char], they support high-level string and sequence operations directly without being cast to String:

  var x = "foo" + "bar"; // x will be String("foobar")
  for (c in "antidisestablishmentarianism") // iterate all the Chars in a string constant
      println(c, ": ", Int(c));

Char is also a distinct type from Int8 or UInt8, and must be explicitly converted. This allows the above example to correctly print c as a character and as an integer ASCII code. Like C++'s char, Clay Chars only represent 8-bit code units; multi-byte and Unicode support is left to the library.
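For contrast, here is roughly what the same two operations look like in C++11. Adding two string literals does not compile in C++, so one operand has to be converted to std::string first, and a char has to be cast to print its numeric code. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// "foo" + "bar" is ill-formed in C++ (it would add two pointers), so one
// side is converted to std::string explicitly.
inline std::string concat_literals() { return std::string("foo") + "bar"; }

// Collect the integer code of every character in a string.
inline std::vector<int> char_codes(const std::string &text) {
    std::vector<int> codes;
    for (char c : text)
        codes.push_back(static_cast<unsigned char>(c)); // explicit char-to-int
    return codes;
}

inline void example() {
    assert(concat_literals() == "foobar");
    assert(char_codes("abc").size() == 3);
}
```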

...

Compile-time computation

C++ indirectly supports compile-time computation through template classes:

  #include <iostream>

  template<int n>
  struct factorial { // struct so that value is publicly accessible
      static const int value = n * factorial<n - 1>::value;
  };

  template<>
  struct factorial<0> {
      static const int value = 1;
  };

  int main() { std::cout << factorial<5>::value << "\n"; }

Clay directly supports compile-time computation through static values. The static keyword evaluates its parameter at compile time. The result n becomes a runtime-stateless value of type Static[n], which can be bound to a static function argument. Type parameters are also evaluated at compile time.

  // range(n) starts at 0, which would zero the product; multiply 1..n instead
  factorial(n) = reduce(multiply, 1, range(1, n + 1));

  define showStatic;
  overload showStatic(static 1) { println("one"); }
  overload showStatic(static 2) { println("two"); }

  main() {
      showStatic(static factorial(0));
      showStatic(static factorial(2));
      var a = Array[Int, factorial(3)](1,2,3,4,5,6);
      println(a);
  }
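C++11 narrows the gap somewhat with constexpr functions, which let an ordinary-looking function be evaluated at compile time. A minimal sketch for comparison with the template version; array_length is a hypothetical helper that only demonstrates using the result as an array bound.

```cpp
#include <cassert>

// A C++11 constexpr function: a single return statement, usable in
// constant expressions.
constexpr int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }

static_assert(factorial(5) == 120, "evaluated by the compiler");

// The result can size an array, much like Array[Int, factorial(3)] in Clay.
inline int array_length() {
    int a[factorial(3)] = {1, 2, 3, 4, 5, 6};
    return static_cast<int>(sizeof(a) / sizeof(a[0]));
}

inline void example() {
    assert(array_length() == 6);
    assert(factorial(0) == 1);
}
```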