-
Notifications
You must be signed in to change notification settings - Fork 0
LanguageIntroduction
This introduction is under development. Beware incomplete thoughts and poor writing.
At its lowest level, Pants is simply Lambda Calculus in continuation-passing-style with some syntax sugar translated to C.
A common feeling within the software engineering community is that language expressiveness can come at the cost of performance, and Pants is setting out to disprove that hypothesis. Pants is an attempt to create a powerful scripting-like language with full support for fancy custom control structures and DSLs without any performance compromises or any messy scope bloat.
Of course, in its current very alpha state, there is an incredibly unfortunate amount of performance compromise. However, a primary design goal is to create a language that is easy to analyze statically, and therefore provide future deep optimizations.
The basic grammar (which will be explained in more detail) is as follows:
<program> := <expression>*
<expression> := <assignment>
| <application>
<assignment> := <term> (`=` | `:=`) <expression>
<appliciation> := <term>+
<term> := <header>* <value> <trailer>*
<header> := <left-open-call-indicator>
<trailer> := <right-open-call-indicator>
| <index>
| <field>
| <closed-call>
<index> := `[` <expression>+ `]`
<field> := `.` <identifier>
<closed-call> := `(` [ <out-argument>+ `;` ] <out-argument>* `)`
<out-argument> := <application>
| <identifier> `:` <application>
| `:(` <expression>+ `)`
| `::(` <expression>+ `)`
<value> := `(` <expression>+ `)`
| <function>
| <bytestring>
| <charstring>
| <identifier>
| <integer>
| <float>
| <dictionary>
| <array>
<function> := `{` [ `|` [ <in-argument>+ `;` ] <in-argument>+ `|` ]
<expression>+ `}`
<in-argument> := <identifier>
| <identifier> `:` <application>
| `:(` <identifier> `)`
| `::(` <identifier> `)`
<array> := `[` <application>* `]`
<dictionary> := `{` (<application> `:` <application>)* `}`
<left-open-call-indicator> := `@` | `.`
<right-open-call-indicator> := `.`
Notably, an identifier is any set of unicode characters, excluding the following set:
' ', '\n', '\r', '\t', ';', ',', '(', ')', '[', ']', '{', '}', '|', "'", '"', '.', ':', '@', and '#'
Further, an identifier can't start with a digit, and cannot be simply "=". This means that the only keyword in Pants is "=".
If I haven't gotten sick of trying to figure out Boost::Spirit::Qi (or C++ parser combinators in general) by the time you read this, you can find the full grammer in a strange C++ syntax in parser.cpp.
Update: I did get sick of it, here's the new parser.
Pants is simply a series of expressions. An expression can be either an assignment or an application. Expressions are separated by newlines or semicolons. An assignment is simply a variable name, a specific assignment operator (there's two, more on that in a minute), and a subexpression.
An application is where things get interesting.
An application, in the context of Pants, is a space-separated list of one or more terms. A term is a piece of syntax you're probably used to. All of the following examples are terms:
1
variable
object.field
array[index]
"hello world"
myfunction(1, 2)
{"key1": "value1"}
{|arg1, arg2| print(arg1, arg2)}
(some sub expression)
Notice that a term can be a value or something more complicated. Pants has a simple list of value types, but those value types include user-created objects, anonymous functions, strings, etc. Note that a dictionary and an anonymous function are similar in syntax. The following are dictionaries:
{}
{"key1": "value1"}
{1: 2, 3: "4"}
while the following are anonymous functions:
{"value"}
{print("value")}
{|x| print(x)}
{|x,y| print(y)}
If an application is made up of simply one term, the value of the application is just that term.
However, if an application is made up of two or more terms, then the application turns into an "open" function call.
There's two types of function calls in Pants - open calls, and closed calls. Closed calls are in the traditional syntax you're used to. Closed calls have a function value followed by a parenthetical list of arguments. Closed function calls are parsed as a single term, and examples of closed function calls are:
factorial(10)
ackermann(4,3)
print("hello")
socket.write(data, len)
An open call, in contrast, has no parenthetical argument list. In an application with two or more terms, in the absence of a call indicator (more on call indicators in a second), the first term is assumed to be a function, and the remaining terms are assumed to be arguments. As a result, the following is a list of examples of open calls:
factorial 10
ackermann 4 3
print "hello"
socket.write data len
This list of examples is semantically identical to the previous list of closed call examples.
An interesting caveat is that
do_stuff()
and
do_stuff
are not semantically identical. The first evaluates to the return value of the
do_stuff
function call, whereas the second evaluates to the do_stuff
function itself. Since the second do_stuff
example is not a two-or-more term
application, the value of the application is the single term it consists of.
This open-call behavior gives you the ability to write powerful and expressive
domain specific languages and language extensions. Recall how easy it is to
define an anonymous function. Pants implements the traditional if
keyword as
a regular library function that takes two (optionally 3) arguments. The
following two if
function calls are identical:
if (some_predicate(val1, val2)) {
print "true!"
}
if((some_predicate(val1, val2)), {print "true!"})
So far, we've only discussed functions that take arguments on the right side of the function call. In fact, most languages only support arguments on the right-hand-side of the function identifier. Pants, however, supports left arguments as well.
Recall that in the absence of a call indicator, a two-or-more term application assumes that the first term is the function to call. However, an application allows for one of the terms to have a call indicator specified. If a call indicator is specified, then the term with the call indicator is the function to call. A call indicator is either a trailing period, a preceding period, or a preceding at-symbol.
Suppose we wanted to support something like Ruby's unless
trailing keyword.
We want to use it in the following example:
{ print "we did it!" } @unless we_didnt
In this example, we are going to run the anonymous function that outputs
"we did it!" as long as the variable we_didnt
is false. Here, the at-sign is
a call-indicator, saying that in this 3-term application, unless
is the
function we want to run. We're going to provide the anonymous function as the
first left-argument and the value of we_didnt
as the first
right-argument.
So, we need unless
to be defined as a function that takes one left and one
right argument. Turns out, we can do that as follows:
unless = {|code_thunk; predicate|
if (not predicate) {
code_thunk()
}
}
Here, the semicolon in the argument definition list separates left-arguments and right-arguments.
Observant readers will note that we can rewrite unless
a little more simply as:
unless = {|code_thunk; predicate| if (not predicate) code_thunk}
Pants supports a whole slew of argument passing styles, such as arbitrary argument lists (both passing and receiving), default values in function definitions, keyword arguments, and a number of other things.
TODO
To explain the semantics of Pants, we'll start off with a simple program. Here,
we'll implement the standard while
loop, then add features as we need them,
such as break
and continue
.
Let's take a first stab at it.
while = {|test, body|
if test() {
body()
while(test, body)
}
}
myval = true
while {myval} {
print "hi"
myval := false
}
The definition of while
here is recursive, and works because Pants supports
tail-recursion. In other words, while
will loop because the while
is
calling itself, and due to tail-recursion, the stack doesn't grow and waste
memory.
You'll notice that in this example, test
is an anonymous function that
returns myval
. What happens if we wrote while
like this?:
# Broken implementation of while
while = {|test, body|
if test {
body()
while(test, body)
}
}
myval = true
while (myval) {
print "hi"
myval := false
}
Here, we've simply changed the definition of while
to not call test
like a
function, and we've changed the usage of while
to take myval
surrounded by
parenthesis instead of brackets.
So what's wrong? We set myval
to true
, and then call while
with it, so
while
gets true
for test
. We never actually re-evaluate myval
at any
point, so while
will run forever.
In the correct definition, we're passing an anonymous function as the first
argument to while
that returns whatever the current value of myval
is, so
at each loop iteration of while
, myval
gets re-evaluated when we call
test
inside of while
.
So, let's look at our usage of while
again:
myval = true
while {myval} {
print "hi"
myval := false
}
You may be wondering what's up with the two different kinds of assignment
operators being used with myval
. You also may recall that I briefly mentioned
before that there are two forms of assignment in Pants.
The first style of assignment is definition. When a variable gets defined,
it is introduced into the scope as a new variable, regardless of any previous
variables that have been defined. To specify a definition, the programmer uses
a single equals "=". If we only used a single equals to assign values to
myval
, we would get not one, but two distinct, different variables named
myval
. But we don't want to make a new myval
every loop iteration; we want
to mutate the previously defined myval
.
The second style of assignment is mutation and it is specified by a colon-equals, like ":=". When a variable is mutated, it must already be defined and in scope, and the value of that variable changes to whatever new value it is getting mutated to.
You may have noticed that there are a great deal of possible identifiers, with only one reserved identifier or keyword ("="). This means that the following are all completely valid identifiers, and thus, variables:
+, -, /, *, ^, !, ==, <, <=, >, >=, !=
Because Pants supports left-arguments, all of the operators you are used to are defined simply as functions and require call-indicators for infix notation. E.g.:
1 .+ 2
4 .- 2
1 .== 3
"hey" .!= "there"
All of the provided operators are defined to either accept one left and one right argument, two left arguments, or two right arguments, such that infix, prefix, and postfix notation are all supported.
With this in mind, the following codeblock is equivalent:
+(1, 2)
4 2 @-
== 1 3
"hey" !=. "there"
We've now covered enough semantics to implement something slightly more useful,
such as something like Ruby's each
method. Let's assume an iterable object
has a size method and supports index operations.
each = {|iterable; func|
i = 0
while {i .< iterables.size()} {
func(iterables[i])
i := i .+ 1
}
}
We can now use each
as follows:
[1, 2, 3] .each {|item|
print item
}
Okay, let's say we want to write an include?
function that checks an iterable
for some item, and returns true
if it exists, false
otherwise. We want to
use it like:
[1, 2, 3] .include? 2 # should be true
[1, 2, 3] .include? 4 # should be false
Let's give it a shot.
include? = {|iterable; item|
iterable .each {|possible_item|
if (possible_item .== item) {
return true
}
}
return false
}
Hold up cowboy, return
out of what function? There's three anonymous
functions there.
Because of this problem, Pants does not define a return
anywhere. However,
Pants defines a new variable called cont
(which stands for continuation,
more on those later) at the beginning of every function scope. You can save
this value in the function scope it makes sense to have and use it later.
So, let's try again:
include? = {|iterable; item|
return = cont
iterable .each {|possible_item|
if (possible_item .== item) {
return true
}
}
return false
}
Now this include?
function works, because return
returns out of the right
function.
You may be thinking, this cont
thing is magical! You're right.
Continuations are, as we just saw, essentially save-able wrappers of return statements.
Notably, if you have saved a continuation, you can call it at any point in time afterwards, and as many times as you want. Check out this mind-bending example:
saved_cont = null
myfunction = {
return = cont
saved_cont := return
return 1
}
val = myfunction()
print val
if (val .== 1) {
saved_cont 2
}
The output of this code example is:
1
2
What is going on here? When myfunction
is called, it saves its continuation,
and then returns 1. We store the 1 in val
, print val
, and then check the
value of val
. If it's a 1, we call the continuation. So far so good.
What happens when we call the saved continuation is that we fall out of the
original call to myfunction
with a return value of 2.
So, in this alternate universe we just found ourselves in, where myfunction
returned 2, we store the 2 in val
, print val
, and check if val
is 1.
It's not, so we're done.
You can read more about continuations at Wikipedia's continuation entry.
You may be aware of the distinction between dynamic and lexical scoping. If not, suffice it to say that every mainstream programming language uses lexical scope. Let's write a motivating example:
x = 1
f1 = { print x }
f2 = {
x = 2
f1()
}
In lexical scoping, when a variable is encountered, its value is determined
based on where it was last defined, lexically, in the source file. In the
above example, with lexical scoping, f2
will output 1.
In dynamic scoping on the other hand, when a variable is encountered, its
value is determined based on the call stack. In the above example, with dynamic
scoping, f2
will output 2.
There are pros and cons to both types of scoping, but it's clear that over decades of programming language research, lexical scoping is far better, clearer to understand, and easier to reason about. If you're interested in learning more about various types of language scoping rules, check out Wikipedia's scope entry.
Pants is lexically scoped.
However, Pants also implements dynamic scoping; it just doesn't give you
direct access to it. The only way to access dynamically scoped variables is
through lexically-scoped proxy objects that you construct with the
DynamicVar
object constructor. And in case you're wondering, we'll talk more
about objects later.
Whenever you call the DynamicVar
object constructor, a completely brand new
dynamic variable is created for you that has not been used in your program
before. Here's an example usage of DynamicVar
which we'll walk through in a
second:
my_dv = DynamicVar()
my_dv.call "hello, dynamic scope" {
print my_dv.get()
}
First, DynamicVar
is in fact a zero-argument function that returns an object.
We save this dynamic variable proxy object in a lexical variable, my_dv
.
Second, dynamic variables are set using the call
member function. The call
member function takes two arguments. First, it takes the value to set the
dynamic variable to, and second, it takes a block of code to run in which the
dynamic variable is set to that value.
Last, in any scope, you can retrieve the value of a dynamic variable by calling
the get
member function on the proxy object. If the dynamic variable isn't
defined in your current call stack, an exception will be raised (more on
exceptions in a minute).
This allows you, along with continuations, to do some really interesting things.
Let's rewrite while
so that we can have break
and continue
control
structures.
First, the code, and then an explanation.
while_dv = DynamicVar()
while = {|test, body|
if test() {
should_continue = {
fall_out_of_should_continue = cont
while_dv.call fall_out_of_should_continue {
body()
true
}
}
if should_continue() {
while(test, body)
}
}
}
break = { while_dv.get()(false) }
continue = { while_dv.get()(true) }
The overall strategy here is that within a while
loop, we will define a
dynamic variable that will allow for popping out of the loop using a
continuation, which will take a value as to whether or not the loop should
continue with the next iteration. We'll then define break
and continue
to
use that dynamic variable appropriately.
So, first, we instantiate a new dynamic variable proxy object. It's going to
be used by while
, break
, and continue
, so we define it first.
Next, we define while
. As before, while
still works recursively, and, as
before, the base case is to do nothing and just return if the test
evaluates
to false
.
In the case that test
is true, though, we're going to run one loop iteration.
We want some function that will return true
if we should continue and loop
again, and will return false
if we need to stop. That function is
should_continue
.
should_continue
is very first going to save its own continuation. The
continuation of should_continue
is then stored in the dynamic variable, and
the body
of the loop is called. If body
returns without incident, we want
to loop again, so the last statement of the block is true
. The last statement
is then propagated up as should_continue
's return value.
However, if for some reason the continuation of should_continue
is called
from within body
, the rest of the body
will not run, and that value will
fall out of should_continue
as the return value.
So, we have all of the pieces in place. We'll see if test
is good. If it is,
we'll run body
. If body
completes successfully, we'll loop again. If body
does not complete, it is due to a continuation being called, which will be due
to break
or continue
. break
will return false
, meaning we should not
loop again, whereas continue
will return true
.
We've just implemented a fancy control structure completely within the language. Let's see how it works:
while {true} {
print "still running"
if (all_done()) { break() } # could also be: if all_done() break
}
At this point, you may be wondering why, if dynamic scope can be so useful, why no currently popular language uses or supports it. The truth is nearly all modern languages support a very specialized use-case of dynamic scope: exceptions.
If you think about it, exceptions are simply an exception handler data
structure stored in dynamic scope. Whenever you start a try
-block, a new
exception handler is stored in dynamic scope for the exception handler dynamic
variable, and all throw
or raise
does is call it.
With the addition of continuations, we can completely implement exceptions within Pants as well.
throw_dv = DynamicVar()
try = {|body, handler|
fall_out_of_try = cont
throw_dv.call {|error| fall_out_of_try handler(error)} {
body()
}
}
throw = {|error| throw_dv.get()(error)}
If you understood while
, break
, and continue
, try
and throw
should
actually look pretty straightforward to you.
Pants does a little bit of extra work to make exceptions nice and useful -
specifically, Pants keeps track of throw_dv
and uses it at runtime for a
number of runtime exceptions, such as array-out-of-bounds, invalid function
calls, missing object fields, etc, etc.
Let's see how exceptions work:
try {
{ throw "can't divide by zero!" } @unless (a .> 0)
print (b ./ a)
} {|e|
print "exception!" e
}