Thinking Functionally: Combinators

Paul Louth edited this page May 17, 2018 · 13 revisions

Combinators

The word "combinator" is used to describe functions whose result depends only on their parameters. That means there is no dependency on the outside world, and in particular no other functions or global value can be accessed at all.

In practice, this means that a combinator function is limited to combining its parameters in various ways.

We have seen some combinators already: the compose function. On the other hand, a function like Console.WriteLine, although primitive, is not a combinator, because it has a dependency on the outside world (I/O).

Combinator birds

Combinators are the basis of a whole branch of logic (naturally called "combinatory logic") that was invented many years before computers and programming languages. Combinatory logic has had a very large influence on functional programming.

To read more about combinators and combinatory logic, I recommend the book "To Mock a Mockingbird" by Raymond Smullyan. In it, he describes many other combinators and whimsically gives them names of birds. Here are some examples of some standard combinators and their bird names:

/// Identity function, or the Idiot bird
static A I<A>(A x) => x;

/// The Kestrel
static Func<B, A> K<A, B>(A x) => (B y) => x;

/// The Mockingbird
static Func<A, A> M<A>(Func<A, A> x) =>
    a => x(x(a));

/// The Thrush
static Func<Func<A, B>, B> T<A, B>(A x) => (Func<A, B> y) =>
    y(x);

/// The Queer bird
static Func<Func<B, C>, Func<A, C>> Q<A, B, C>(Func<A, B> x) => (Func<B, C> y) => (A z) =>
        y(x(z));

/// The Starling
static Func<Func<A, B>, Func<A, C>> S<A, B, C>(Func<A, B, C> x) => (Func<A, B> y) => (A z) =>
    x(z, y(z));

/// The infamous Y-combinator, or Sage bird
static Func<A, B> Y<A, B, C>(Func<Func<A, B>, A, B> f) => (A x) =>
    f(Y<A, B, C>(f), x);

The letter names are quite standard, so if you refer to "the K combinator", everyone will be familiar with that terminology.

It turns out that many common programming patterns can be represented using these standard combinators. For example, the Kestrel is a common pattern in fluent interfaces where you do something but then return the original object. The Thrush is the pipe operation, the Queer bird is forward composition, and the Y-combinator is famously used to make functions recursive.

Indeed, there is a well-known theorem that states that any computable function whatsoever can be built from just two basic combinators, the Kestrel and the Starling. If you have time to spare I highly recommend adding this to your reading material.

Combinator libraries

A combinator library is a code library that exports a set of combinator functions that are designed to work together. The user of the library can then easily combine simple functions together to make bigger and more complex functions, like building with Lego.

A well designed combinator library allows you to focus on the high level operations, and push the low level "noise" to the background. We've already seen some examples of this power in the examples in "why use language-ext" series, and the List module is full of them -- the fold and map functions are also combinators, if you think about it.

Another advantage of combinators is that they are the safest type of function. As they have no dependency on the outside world they cannot change if the global environment changes. A function that reads a global value or uses a library function can break or alter between calls if the context is different. This can never happen with combinators.

In language-ext there is a massive combinator library are available for parsing: the LanguageExt.Parsec library. The Parsec library is almost an exact replica of the Haskell Parsec library, it can be used to parse very simple blocks of text up to entire language parsers. The type of the combinator is Parser<A> and it is also a monad, which we'll discuss later, just accept for now that it means we can use it with LINQ.

The key thing to note from this is that the difference between a parser that parses a single character, and one that parses an entire source-file and creates an abstract-syntax-tree (AST) for a programming language is its generic argument: Parser<char> and Parser<SourceFile>. To get to the stage of parsing a SourceFile you must combine smaller parsers until you have one big parser that does a ton of stuff.

Parsec has a lot of functionality spread throughout a number of namespaces and static classes. So upfront just include these:

using LanguageExt;
using LanguageExt.Parsec;
using static LanguageExt.Prelude;
using static LanguageExt.Parsec.Prim;
using static LanguageExt.Parsec.Char;
using static LanguageExt.Parsec.Expr;
using static LanguageExt.Parsec.Token;

You won't need all of that, but it's useful to know they're there:

    var parser = ch('a');   // Create a Parser<char>
    var result = parse(parser, "abcde");

    // result.IsFaulted == false
    // result.Reply.Result == "a"
    // result.Reply.State.ToString() == "bcde"

This parses the first letter 'a' from the string. Simple enough. But what if we wanted to parse any letter?

    var parser = satisfy(Char.IsLetter);   // Create a Parser<char>
    var result1 = parse(parser, "abcde");
    var result2 = parse(parser, "fghij");
    var result3 = parse(parser, "klmno");

    // result1.Reply.Result == "a"
    // result2.Reply.Result == "f"
    // result3.Reply.Result == "k"

So far so good. satisfy(Char.IsLetter) already has a primitive parser in the LanguageExt.Parsec.Char class called letter.

    var parser = letter; 
    var result = parse(parser, "abcde");

How do we keep parsing all of the letters?

    var parser = many1(letter);          // Create a Parser<Seq<char>>
    var result = parse(parser, "abcde");

    // result.IsFaulted == false
    // result.Reply.Result == ['a', 'b', 'c', 'd', 'e' ]
    // result.Reply.State.ToString() == ""

This is our first composed combinator. The many1 parser takes an argument of Parser<A> and combines it to make a Parser<Seq<char>>. It encapsulates the rules of many1 (which is to read as many A as possible, but at least 1).

What about if we have a non-letter in the string?

    var parser = many1(letter);          // Create a Parser<Seq<char>>
    var result = parse(parser, "two words");

    // result.IsFaulted == false
    // result.Reply.Result == ['t', 'w', 'o']
    // result.Reply.State.ToString() == " words"

It still works, which is good. But we'd now like to parse the words, which means we need to deal with the spaces.

First lets create a parser for white-spaces:

    var spaces = many(satisfy(System.Char.IsWhiteSpace));

We use many rather than many1. The idea is that this parser always succeeds, even if we can't parse anything.

Now let's create a word parser that parses the letters of a word and consumes any additional white-space:

     var word = from w in many1(letter)
                from s in spaces
                select w;

This is where the monad part of the parser-combinator comes in, we can use them in LINQ expressions and they'll deal with all failure cases automatically. Notice how s contains the additional spaces after the word, but it ignores that and only returns the w (which is a Seq<char>). That means each time we parse a word we drop the spaces at the end of the word, which should then mean the next character to consume is a non-space character.

Now we combine word parser with the many1 parser to parse 'many words'

    var parser = many1(word);          // Create a Parser<Seq<Seq<char>>>
    var result = parse(parser, "two words");

Remember these are just combinations of smaller parsers into bigger parsers, but the signature is always: Parser<_>.

So the last parser Parser<Seq<Seq<char>>> is a little unwieldy, so lets change the inner Seq<char> to a string. We need to update the word parser to do that:

     var word = from w in asString(many1(letter))
                from s in spaces
                select w;

That gives us a result of Seq<string> which is what we want.

What about leading spaces?

     var parser = from sp in spaces
                  from ws in many1(word)
                  select ws;

Notice how we're reusing our spaces parser, and everything just glues together perfectly. This is the power of combinators, everything 'just works' because it's parser-in, parser-out.

Hmm, can we go further. What about parsing words and numbers? (but not words with numbers in, or numbers followed by a letter). Let's build the parser for a number first:

     var number = from d in asString(many1(digit))
                  from s in spaces
                  select Int32.Parse(d);

Here we're using the built-in digit parser (which is simply: satisfy(Char.IsDigit)), converting to a string and then calling Int32.Parse(d) if we successfully parse a series of digits.

So now we need to be able to parse either a word or and number. Combinators to the rescue again!

    var term = either(word, digit);

either is a parser that will try the first parser, and if it can't parse then it will try the second. The problem is that word and digit are Parser<string> and Parser<int> respectively. So we need a common type to represent them:

    abstract class Term { }

    class Word : Term
    {
        public readonly string Value;
        public Word(string value) => Value = value;
    }

    class Number : Term
    {
        public readonly int Value;
        public Number(int value) => Value = value;
    }

Now let's update the word and number parsers:

    var word = from w in asString(many1(letter))
               from s in spaces
               select new Word(w) as Term;

    var number = from d in asString(many1(digit))
                 from s in spaces
                 select new Number(Int32.Parse(d)) as Term;

Now the either will work:

    var term = either(word, digit);

And we can update our parser to use the term parser:

     var parser = from sp in spaces
                  from ws in many1(term)
                  select ws;

That gives us a parser that can parse a string of text and categorise the values. We've come a long way from the ch('a') parser that only parses a single character, but are still working with the Parser type. In this case: Parser<Seq<Term>>

Let's continue. The from s in spaces is repeated for word and number. If we added more terms then we're in a non-DRY place. So let's make it dry by declaring a local function:

    Parser<A> token<A>(Parser<A> parserA) =>
        from res in parserA
        from spc in spaces
        select res;

This may start to give you an insight into what Parser<A> is. Yes, that's right, it's just a function. This is its definition:

    public delegate ParserResult<T> Parser<T>(PString input);

We're just composing functions as in the previous articles.

So with the token parser we can now rewrite the word and number parsers:

    var word = from w in token(asString(many1(letter)))
               select new Word(w) as Term;

    var number = from d in token(asString(many1(digit)))
                 select new Number(Int32.Parse(d)) as Term;

Let's test it:

    var result = parse(parser, "   4 words are here");

    // result.Reply.Result = [Number, Word, Word, Word]

Perfect, that's what we want. What about if we remove the space after the 4:

    var result = parse(parser, "   4words are here");

    // result.Reply.Result = [Number, Word, Word, Word]

Gah! What's happening here is our spaces parser which can parse 0 or more spaces is succeeding. But really we want to have at least one space. We could change spaces to:

    var spaces = many1(satisfy(System.Char.IsWhiteSpace));

But that would mean the token parser would fail on the last word "here" because there isn't a space after it. So what can we do?

    var spaces = either(
        eof,
        many1(satisfy(System.Char.IsWhiteSpace)).Map(_ => unit)
    );

Lots of things going on here. We've changed the spaces parser to parse either:

  • An end of file
  • Or, many but at least one space. We then map the Seq<char> which are the spaces to a Unit to match the eof parser signature of Parser<Unit>.

You may remember that Unit is a type that can have exactly one value: unit. This is a perfect example where it comes in useful: we care that we've parsed spaces, but we don't need to see them. So Parser<Unit> is a little bit like a void parser. It does some work, but you don't get to see the results.

Anyway, the spaces parser now will accept a series of white-space characters or the end-of-file token.

If we run it with our bad text from before we get:

    var result = parse(parser, "   4words are here");

    // result.IsFaulted = true
    // result.Reply.Error = "error at (line 1, column 5): unexpected w, expecting digit or end of input"

Wow! Right?! We're getting not only the error, but what was expected and where it occurred. This is powerful stuff!

If we restore our text then we get:

    var result = parse(parser, "   4 words are here");

    // result.IsFaulted = false
    // result.Reply.Result = [Number, Word, Word, Word]

Everything is good in the world!

NEXT: Function signatures

You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.