Lab for experimenting with polymorphism mechanisms in Clojure.
More (better) docs on algebras in doc.
homonymy
Mechanisms:
-
delegation: roll-your-own
-
multimethods
-
Clojure Protocols
-
multimodels
What’s the difference between Protocols and multimodels? Multimodels support dynamic rebinding. Protocols are fixed; once you extend a type to support a protocol the binding is fixed; to swap an implementation, you have to dispatch on a different type. You can mimic true multimodeling with the right technique (core.matrix does this), but the technique requires mutable global variables. Below we demonstrate a method that naturally supports multimodeling without mutable globals.
This assumes you know how to use Cider in emacs. If you use some
other development environment you should be able to figure out what to
do. The main example is src/clojure/multimodels/examples.clj
.
-
Fork/clone
-
lein repl
, then dorefresh-all
(see dev/user.clj for config stuff)
Fire up a cider
session, open multimodels/examples.clj, and load it
(C-c C-k). At the bottom of that file are examples of computing with
algebraic models.
Then explore. Several algebras are provided; using them as a pattern it should be easy to implement your own.
-
monoid N0: (N, 0, +), underlying set is Natural numbers with zero, identity element is 0, operation is addition
-
monoid N1: (N, 1, *), underlying set is positive Natural numbers, identity element is 1, operation is multiplication
-
Q3+: quotient group mod 3 over integers, identity is 0, op is addition
user => (register-struct (algebra.struct.N0.))
user => (register-struct (algebra.struct.N1.))
user => (register-struct (algebra.struct.Q3+.))
user => (set-model :n0)
user => (** 3 5) ;; we use ** as the abstract Group multiplication operator
8
user => (set-model :n1)
user => (** 3 5)
15
user => (set-model :quotient-3)
user => (** 3 5)
2
You use defprotocol
to declare a protocol. Note that this "defines"
a protocol object but only "declares" its operators. Furthermore
defprotocol
is one of those pseudo-functions, like def
, whose main
purpose is a side-effect; def
has the side-effect of interning a var
(thus altering the environment), and defprotocol
does something
similar: it constructs a map representing the protocol and interns a
var constructed using the current namespace and the name parameter of
the declaration. Instead of returning this map as the "value",
defprotocol interns it and returns the var.
You can inspect the (side-) result of defprotocol
:
user=> (require '(clojure [pprint :as pp]))
user=> (def p (defprotocol P "test protocol" (f [t])))
#'user/p
user=> (type p)
clojure.lang.Symbol
user=> (pp/pprint p)
P
nil
user=> (pp/pprint P)
{:on user.P,
:on-interface user.P,
:doc "test protocol",
:sigs
{:f1 {:name f1, :arglists ([t]), :doc nil},
:f2 {:name f2, :arglists ([t a]), :doc nil}},
:var #'user/P,
:method-map {:f1 :f1, :f2 :f2},
:method-builders
{#'user/f2
#object[user$fn__14240 0x39f583af "user$fn__14240@39f583af"],
#'user/f1
#object[user$fn__14253 0x19811f83 "user$fn__14253@19811f83"]}}
nil
user=>
First off, notice that we’re looking at a map that is the value of the
var #'user/P
: user⇒ (= (var P) (:var P))
. If you deref (:var
P)
you’ll see the same map you get by printing P.
Notice the :method-builders
key: its value is a map from vars to
objects. From the names we can infer they are function objects, but
let’s confirm that:
user=> (clojure.test/function? (get-in P [:method-builders #'user/f1]))
true
The form of the method-builder vars tells use that in fact there is
another side-effect of executing defprotocol
: it also interns the
functions of the protocol in the current environment. To see the
evidence, run (dir user)
in a fresh repl to see the list of interned
vars; then execute defprotocol
, then run (dir user)
again; you
should see the protocol symbol and the protocol functions listed as
interned vars.
Yet another side-effect: defprotocol
generates a Java Interface. From the docstring:
defprotocol will automatically generate a corresponding interface, with the same name as the protocol, i.e. given a protocol: my.ns/Protocol, an interface: my.ns.Protocol. The interface will have methods corresponding to the protocol functions, and the protocol will automatically work with instances of the interface.
NB: more precisely: defprotocol will call gen-interface, which will define the corresponding class (using java.lang.ClassLoader.defineClass). Under aot compilation, it will also write the bytecode to disk.
Let’s have a look:
user=> (type user/P)
clojure.lang.PersistentArrayMap
user=> (require '(clojure [reflect :as r]))
user=> (type user.P)
java.lang.Class
user=> (pp/pprint (r/reflect user.P))
{:bases nil,
:flags #{:interface :public :abstract},
:members
#{{:name f2,
:return-type java.lang.Object,
:declaring-class user.P,
:parameter-types [java.lang.Object],
:exception-types [],
:flags #{:public :abstract}}
{:name f1,
:return-type java.lang.Object,
:declaring-class user.P,
:parameter-types [],
:exception-types [],
:flags #{:public :abstract}}}}
Make a note of this example using pprint and clojure.reflect to inspect the side-effect products of defprotocol - the Clojure protocol map and the Java Interface.! It will come in very handy the first time you run in to the dreaded "No single method" exception (see below, troubleshooting.)
Contrast this with deftype
, which "[d]ynamically generates compiled
bytecode for class with the given name, in a package with the same
name as the current namespace…" but does not intern a var for the
type in the current namespace.
Algebra is where logic meets mathematics. Loosely speaking, an algebra is the marriage of a signature (which is a formal logical calculus) and a structure (which is an informal mathematical "object'); what ties them together is a model, which uses the mathematical structure to interpret the linguistic expressions formable using the signature.
What this implies is that many different structures may be used to model a given signature or language. The classic example, which is implemented in this project, is the algebraic *Group*. Groups are extremely simple; they have an underlying set, one distinguished element of that set that acts as an identity element, and one binary operation; additionally, some basic rules about how the operation works (e.g. a*b=b*a). Infinitely many mathematical structures may behave as Groups. The textbook examples are: (N,0,+), where the set is Nat (with zero), the identity element is 0, and the operation is addition, and (N+,1,*), where the set is the positive Natural numbers, the identity element is 1, and the operation is multiplication. The structures are obviously not the same, but as Groups they behave in exactly the same way.
The relevance of such an algebraic perspective to programming is pretty obvious, even though it is not often explicitly noted. The distinction between signature and structure is analogous to the one between interface and implementation. If you design well, you can swap implementations of an interface without changing the behavior of the system, e.g. going from a hashmap to an arraymap.
One of the beauties of Clojure’s Protocols is that they make it relatively easy to work in this manner.
Clojure’s Protocols only include functions; algebras will always or at least usually include some constant symbols (like the digits 0-9), just as the underlying structures will contains some constant "values" like the natural numbers. So strictly speaking we should think of Protocols as analogous to the "operator signature" of an algebra, i.e. the subset of a signature consisting of all the function symbols.
Once you have a signature, (a Clojure Protocol definition), you need to relate it to a structure in order to use it to express anything meaningful. Mathematically this involves specifying an "interpretation", which is just a mapping from symbols in the signature to values in the structure; I’m calling this a model. Technically it’s a little more complicated than that but the basic concepts of signature, structure, and model seem to be pretty straightforward, and they match actual mathematical practice and terminology pretty well.
So we think of a Clojure Protocol as a Signature (an "OpSig" or
operater signature, to be more precise), and we think of the code we
write to implement the operators in the signature as a structure.
To bind the two together, we use Clojure’s extend
macro, which does
precisely what we need: expresses a mapping from signature to
structure, or, in more programming-oriented language, from interface
to implementation.
However, to really make this work - to make it possible to switch from
one model to another (swap implementations) - you need more than just
defprotocol
and friends.
One implementation trick, which I learned from core.matrix, is to exploit the fact that Clojure’s Protocol mechanism dispatches function calls on the first arg. Knowing this, we just parameterize operation calls with a first arg whose sole purpose is to determine dispatch - the "content" of the arg is irrelevant. Of course to do this you have to intercept the call in the first place, and then decide which type to use for dispatch. For that you keep a var; changing the var effectively switches from one model (interpretation of the signature) to another, by changing the dispatch parameter.
One reason I wrote this little app is to have a clean and simple expression of the technique used by core.matrix. I had to study that code pretty hard before the technique stood out from implementation details. I don’t mean the code is bad or hard to read, I mean it’s mixed up with the details of implementing core.matrix, so I wanted something purely focussed on demonstrating the technique with minimal extra stuff. So that I’ll be able to return to it in six months, after I’ve forgotten everything about core.matrix, and so that others can learn the technique independent of matrix stuff. Also, I wanted to highlight the algebraic structure of the technique, which I’ve tried to do by using the algebraic terminology of signature, structure, and model, and organizing the code accordingly.
The way I do it here is slightly different than the way core.matrix does it. I use a default implementation (defined on java.lang.Object) to manage dispatch. So calls to Protocol functions are always sent to the default Object implementation, which checks to see what the current model (implementation) is, and rewrites and forwards the call as required. core.matrix uses a separate API "wrapper" namespace to do this, before calls reach the Protocol interface. That approach has the virtue of separating the user interface from the Protocol interface, but that is also a vice, since it means you have to keep them in sync. I decided to use default Object as dispatcher in order to ensure that the user API always matches the Protocol signature. And also just to experiment. I don’t know which technique is preferable.
from map to foo-map
This is a map: {:a 1}
This is a foo-map: {:foo 0, :a 1, :b 2}
This is a foo-vector: [:foo 1 2 3]
A foo-list: '(:foo 1 2 3)
Clojure’s Protocol mechanism
(together with, say,
Prismatic Schema and/or
core.typed) allow us to
treat these as distinct types. Since these types depend on a
particular data value - :foo
- they thereby emulated dependent
types.
Another example: type VecInt4 - integer vectors of length 4. We
start with a function f
that operates on vectors:
(defn f [^PersistentVector v] ...do something with v...)
We want a function that only operates on integer vectors or length 4.
We can easily do this by writing f
as a dispatch function that
inspects its argument at runtime and then forwards the call to an
appropriate implementation function. If f
receives an argument that
is not a vi4 datum, it will throw an exception; otherwise, it will
pass it to the implementation function, call it vi4-f
.
A better way would be to use a multimethod. The same thing happens, but in this case Clojure’s built-in dispatching mechanism for multimethods will take responsibility for routing the call to the appropriate implementation function. Using a multimethod means we don’t have to give our implementation function a distinct name - it will have the same name as the dispatch function, and Clojure will take care of the housekeeping.
In both these approaches, there is only one type involve:
PersistentVector
. Protocols allow us to treat VecInt4 as a distinct
type.
(deftype VecInt4 )
(defprotocol PVecInt4
(f ...))
(extend VecInt4
PVecInt4
...)
"No single method: M of interface: I found for function: F of protocol: P"
Note the reference to by interface and protocol: it’s going from function-in-protocol (Clojure) to method-in-interface (Java). The interface is generated at runtime by defprotcol.
"IllegalArgumentException No implementation of method: M of protocol: #'P found for class: K"
Self-explanatory.
Useful tools:
-
tools.namespace
-
clojure.reflect
-
clojure.pprint
-
iroh? "a library to inspect, manipulate and game the jvm."