# Rankle

_N.B. See the [Jupyter Notebook](https://jupyter.org/) documentation for information on using this document interactively._

Rankle is a Clojure library with experiments involving array programming and specifically the [J](http://jsoftware.com) language's implementation of _rank_.

If you just want to read this document, skip straight to [Rank in J](#Rank-in-J).

If you want to run the cells in this document locally, you need:

* [Clojure](https://www.clojure.org/guides/getting_started)
* [J](http://jsoftware.com/start.htm)

This Jupyter notebook requires the [clojupyter kernel](https://github.com/semperos/clojupyter) to run (note I use a fork that updates the version of Clojure used by the kernel), and expects you to have defined a `J_INSTALL_DIR` environment variable where it can find your local J installation.

## Table of Contents

1. [Helpers](#Helpers)
1. [Rank in J](#Rank-in-J)
1. [Rank in Rankle](#Rank-in-Rankle)

## Helpers

In [1]:
(ns rankle.ipynb
  (:require [cemerick.pomegranate :as pg]
            [clojure.string :as str]
            [clojure.java.shell :as sh]))

(pg/add-classpath "./src")

(require '[com.semperos.rankle :as r])

### J Execution

J code can be executed by passing it over STDIN to the `$J_INSTALL_DIR/bin/jconsole` program. This document provides Clojure functions that make it trivial to execute J code from these Clojure notebook cells, assuming you first set up the `$J_INSTALL_DIR` environment variable and run `jupyter-notebook` such that it sees that definition.
    
The `j` function takes a Clojure string representing a J program, executes it with J, and prints the results.
    
The `jdebug` function first prints the Clojure string of J code, then (when called with `{:parse? true}`) runs J's own `;:` ("words") verb, which shows how J parses the input, and finally runs `j` on the input. This is especially useful when the encoding of the J program in the Clojure string requires escape characters.

In [2]:
(def J_INSTALL_DIR "J_INSTALL_DIR")
(def j-install-dir (System/getenv J_INSTALL_DIR))

(when (str/blank? j-install-dir)
  (throw (ex-info (str "You must set a " J_INSTALL_DIR " environment variable with the full path "
                       "to the directory of your J installation.")
                  {:env-var-not-found J_INSTALL_DIR})))

;; java.io.File joins the path correctly whether or not
;; J_INSTALL_DIR ends with a trailing slash.
(def j-cmd (str (java.io.File. j-install-dir "bin/jconsole")))

(defn j
  "Invoke a J program and print its output."
  [jcode]
  (letfn [(wrap [s] (str s "\n exit''"))
          (deindent [s] (subs s 3))]
    (println (deindent (:out (sh/sh j-cmd :in (wrap jcode)))))))

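(defn jdebug
  "Print the J source, optionally show how J tokenizes it (`:parse? true`),
  then run it with `j`."
  ([jcode] (jdebug jcode nil))
  ([jcode {:keys [parse?]
           :or {parse? false}}]
   (println "⌁ J INPUT ⌁")
   (println jcode)
   (println)
   ;; J string literals escape a single quote by doubling it.
   (when parse? (j (str ";: '" (str/replace jcode "'" "''") "'")))
   (println "⌁ J OUTPUT ⌁")
   (j jcode)))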

(jdebug "<\"0 i. 3 3")

⌁ J INPUT ⌁
<"0 i. 3 3

⌁ J OUTPUT ⌁
┌─┬─┬─┐
│0│1│2│
├─┼─┼─┤
│3│4│5│
├─┼─┼─┤
│6│7│8│
└─┴─┴─┘
   


## Rank in J

J features the notion of [rank](https://en.wikipedia.org/wiki/Rank_(J_programming_language)) to describe the dimensionality of J data and how its operations behave along those dimensions. Rank is fundamental to J's power and expressivity. Since a complete primer of the J programming language is outside the scope of this document, please refer to J's extensive online documentation as well as the labs and studio examples provided as part of its installation on your system.

The rank of a J noun (data) is the number of its dimensions. Scalar values have rank 0; a flat array (a list) has rank 1; a two-dimensional array (a table) has rank 2; and so on.

J verbs (functions that only return nouns) also have rank. To quote J's wiki:

> When you write a verb, it has a _verb rank_ which tells the maximum rank of operand(s) the verb can handle. Any operand of higher rank is automatically chopped up into _cells_ whose rank does not exceed the rank of the verb. The results of applying the verb on the cells are collected into the final result of the verb.

To understand what benefits a first-class concept of rank could provide to a Clojure program, this section demonstrates rank in terms of J's implementation. See the [Rank in Rankle](#Rank-in-Rankle) section for Clojure equivalents provided by Rankle.
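
Before turning to J itself, the "chopping into cells" described in the quote above can be approximated in plain Clojure. This is only a sketch: `rank-of` and `cells` are hypothetical helpers (not part of Rankle or J), and they assume regular, non-ragged nested vectors.

```clojure
;; Hypothetical helpers for illustration only; assumes regular nested vectors.
(defn rank-of
  "Number of nesting levels of x, judged by its first item."
  [x]
  (if (sequential? x)
    (inc (rank-of (first x)))
    0))

(defn cells
  "Returns a flat sequence of the rank-k cells of x.
  If x already has rank k or lower, x is its own single cell."
  [k x]
  (if (<= (rank-of x) k)
    [x]
    (mapcat #(cells k %) x)))

(cells 0 [[0 1 2] [3 4 5]])   ;=> (0 1 2 3 4 5)
(cells 1 [[0 1 2] [3 4 5]])   ;=> ([0 1 2] [3 4 5])
(cells 2 [[0 1 2] [3 4 5]])   ;=> [[[0 1 2] [3 4 5]]]
```

A verb of rank `k` is then applied to each of these cells in turn. Unlike this sketch, J also keeps track of how the cells were arranged so that it can assemble the individual results into a single array.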

### Rank of Data (Nouns)

Programmatically, the rank of data is the count of its shape, or expressed differently, the number of its dimensions. In J, the verb [`#` ("tally")](https://code.jsoftware.com/wiki/Vocabulary/number#monadic) returns the count of top-level [items](https://code.jsoftware.com/wiki/Vocabulary/AET#Item) in its operand, while [`$` ("shape of")](https://code.jsoftware.com/wiki/Vocabulary/dollar#monadic) returns the full shape of its operand, i.e. it returns an array of the counts of its operand's items, starting at the top level and traversing nested arrays exhaustively.
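
For readers more at home in Clojure, here is a rough analogy to "tally", "shape of", and rank over nested vectors. The names `shape-of` and `rank-of` are hypothetical, chosen for this sketch, and the code assumes regular (non-ragged) data because it only inspects the first item at each level.

```clojure
;; Hypothetical helpers, not part of Rankle or J.
;; Assumes regular (non-ragged) nested vectors.
(defn shape-of
  "Returns the vector of axis lengths of x, outermost axis first."
  [x]
  (loop [x x, dims []]
    (if (sequential? x)
      ;; Record this level's count, then descend into the first item.
      (recur (first x) (conj dims (count x)))
      dims)))

(defn rank-of
  "The rank is the number of axes, i.e. the count of the shape."
  [x]
  (count (shape-of x)))

(count [1 2 3 4])              ;=> 4      (like J's # "tally")
(shape-of [1 2 3 4])           ;=> [4]    (like J's $ "shape of")
(shape-of [[1 2 3] [4 5 6]])   ;=> [2 3]
(rank-of [[1 2 3] [4 5 6]])    ;=> 2      (like J's # $)
(shape-of 1)                   ;=> []     (a scalar has an empty shape)
```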

Follow the code examples below to understand how `#` and `$` work:

In [3]:
;; Count of array of 4 numbers,
;; returning the scalar value 4
(j "# 1 2 3 4")

4
   


In [4]:
;; Shape of array of 4 numbers,
;; returning an _array_ of 1 item, the number 4.
;;
;; J's printed representation of a single-item array,
;; unfortunately, is indistinguishable from a single
;; scalar value.
(j "$ 1 2 3 4")

4
   


In [5]:
;; A 2-by-3 table:
(j "1 2 3 ,: 4 5 6")

1 2 3
4 5 6
   


In [6]:
;; Count of 2-by-3 table,
;; returning the scalar value 2,
;; since the rows are the table's top-level items.
(j "# 1 2 3 ,: 4 5 6")

2
   


In [7]:
;; Shape of 2-by-3 table,
;; returning an array of counts
;; starting at the top level of the data.
;; It has 2 rows, each with 3 items.
(j "$ 1 2 3 ,: 4 5 6")

2 3
   


In [8]:
;; Programmatic definition of the rank of this 2-by-3 table,
;; indicating this table is of rank 2 because its shape
;; consists of two items.
(j "# $ 1 2 3 ,: 4 5 6")

2
   


In [9]:
;; The rank of a simple array is 1,
;; because its shape is a list with
;; a single item.
(j "# $ 1 2 3")

1
   


In [10]:
;; The rank of a scalar is 0, because
;; its shape is an empty list.
(j "# $ 1")

0
   


### Rank of Functions (Verbs)

J verbs have default ranks for their monadic (one argument/operand) and dyadic (two arguments/operands) cases, chosen to provide maximally useful semantics. To quote again from J's wiki:

> Most verbs have as small a rank as possible to take maximum advantage of \[the fact that J automatically chops up operands into cells whose rank does not exceed the rank of the verb\]. For example, the arithmetic verbs, like +, have rank 0 because they can operate on atoms individually.

Since `+` in J has rank `0`, there is no need to write explicit loops or use a function like Clojure's `map`: by default J will apply `+` to every atom of its arguments, whether an argument is a scalar, a 1-dimensional array, or an N-dimensional array. For example:

In [11]:
(j "1 + 10 11 12")

11 12 13
   


In [12]:
(j "10 100 1000 + 3 4 5")

13 104 1005
   


In [13]:
;; Array of rank 2:
(j "100 * i. 4 5")

   0  100  200  300  400
 500  600  700  800  900
1000 1100 1200 1300 1400
1500 1600 1700 1800 1900
   


In [14]:
;; Array of rank 3:
(j "i. 4 5 2")

 0  1
 2  3
 4  5
 6  7
 8  9

10 11
12 13
14 15
16 17
18 19

20 21
22 23
24 25
26 27
28 29

30 31
32 33
34 35
36 37
38 39
   


In [15]:
;; Adding the rank 2 and rank 3 arrays.
;; This is possible because both share a common frame,
;; i.e. their leading axes (4 5) match.
(j "(100 * i. 4 5) + i.4 5 2")

   0    1
 102  103
 204  205
 306  307
 408  409

 510  511
 612  613
 714  715
 816  817
 918  919

1020 1021
1122 1123
1224 1225
1326 1327
1428 1429

1530 1531
1632 1633
1734 1735
1836 1837
1938 1939
   


See [J's wiki page on agreement](https://code.jsoftware.com/wiki/Vocabulary/Agreement) for a detailed explanation of the more complex examples presented above.
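
To make the pairing of atoms in the examples above more concrete, here is a minimal Clojure sketch of rank-0 addition over nested vectors. The function `add` is hypothetical and covers only this simple case. In particular, `mapv` silently truncates to the shorter collection, whereas J reports a length error when the leading axes disagree.

```clojure
;; Hypothetical sketch of rank-0 agreement for addition; not part of Rankle.
(defn add
  "Adds x and y atom by atom. When one argument runs out of axes first,
  its atom is paired with every atom in the rest of the other argument."
  [x y]
  (cond
    (and (sequential? x) (sequential? y)) (mapv add x y)
    (sequential? x)                       (mapv #(add % y) x)
    (sequential? y)                       (mapv #(add x %) y)
    :else                                 (+ x y)))

(add 1 [10 11 12])            ;=> [11 12 13]
(add [10 100 1000] [3 4 5])   ;=> [13 104 1005]

;; A 2-by-2 table plus a 2-by-2-by-2 array: each atom of the table
;; is added to a whole row of the higher-rank array.
(add [[0 100] [200 300]]
     [[[0 1] [2 3]] [[4 5] [6 7]]])
;=> [[[0 1] [102 103]] [[204 205] [306 307]]]
```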

J exposes the ability to change a verb's rank when invoked. The [`"` (rank conjunction)](https://code.jsoftware.com/wiki/Vocabulary/quote) expects a verb as its left operand and a rank number as its right, returning a new verb that behaves like the original but along the specified rank.

To investigate the ranks of verbs, we will use the [`b.` ("verb info")](https://code.jsoftware.com/wiki/Vocabulary/bdotu) adverb, which shows diagnostic information about verbs. Applied to a verb and given `0` as its right-hand argument, it returns a 3-integer list of:

1. The rank of the verb when used monadically (i.e., the rank of the y-argument)
1. The rank of the x-argument when used dyadically
1. The rank of the y-argument when used dyadically

In J's parlance, monadic means "with one argument" and dyadic means "with two arguments." J verbs only support monadic and dyadic arities.

In [16]:
;; The default rank of + is 0 0 0, i.e.:
;; operates on individual atoms of x and y,
;; producing a result of the same shape
(j "+ b.0")

0 0 0
   


In [17]:
;; The / ("over") adverb causes its left operand (a verb)
;; to be inserted in-between items of its right operand
;; (here, an array). It results in a verb which operates
;; on its x and y arguments as a whole, i.e. rank infinity.
(j "+/ b.0")

_ _ _
   


In [18]:
;; So by default, this sums the items of a simple list:
(j "+/ 1 2 3 4 5")

15
   


In [19]:
;; A 2-by-5 table will demonstrate what happens
;; when we use +/ with different explicit ranks.
(j "i. 2 5")

0 1 2 3 4
5 6 7 8 9
   


In [20]:
;; The default rank is infinite, which means the verb created
;; by the combination of +/ works on its whole argument by default.
;; With this in mind, it will put + in between top-level items of the 2-by-5
;; table, i.e., it puts + between the sub-arrays that are one rank lower
;; than the overall data, here the rows of the table which are simple lists
;; and therefore rank 1:
(j "+/ i. 2 5")

5 7 9 11 13
   


In [21]:
;; The above is equivalent to the following:
(j "0 1 2 3 4 + 5 6 7 8 9")

5 7 9 11 13
   


In [22]:
;; Now if we want to sum differently, we can assign our verb +/
;; an explicitly different rank using " (rank conjunction).
;; To target the rank-1 cells of this noun, we will assign
;; +/ a rank of 1 and thereby sum the rows of the table:
(jdebug "+/\"1 i. 2 5")                 ; jdebug because we have to escape the " character

⌁ J INPUT ⌁
+/"1 i. 2 5

⌁ J OUTPUT ⌁
10 35
   


In [23]:
;; Specifying a verb rank of 2 gives us the same behavior
;; as the default (infinite) rank in this case, because
;; our argument is of rank 2:
(jdebug "+/\"2 i. 2 5")

⌁ J INPUT ⌁
+/"2 i. 2 5

⌁ J OUTPUT ⌁
5 7 9 11 13
   


In [24]:
;; Using data of a higher rank can demonstrate further:
(jdebug "i. 2 5 3")

⌁ J INPUT ⌁
i. 2 5 3

⌁ J OUTPUT ⌁
 0  1  2
 3  4  5
 6  7  8
 9 10 11
12 13 14

15 16 17
18 19 20
21 22 23
24 25 26
27 28 29
   


In [25]:
;; Default rank of +/ again is infinite, so
;; it operates on top-level items of the data,
;; which in this case are 5-by-3 tables:
(jdebug "+/ i. 2 5 3")

⌁ J INPUT ⌁
+/ i. 2 5 3

⌁ J OUTPUT ⌁
15 17 19
21 23 25
27 29 31
33 35 37
39 41 43
   


In [26]:
;; Which is equivalent to adding:
(j "i. 5 3")

 0  1  2
 3  4  5
 6  7  8
 9 10 11
12 13 14
   


In [27]:
;; to:
(j "15 + i. 5 3")

15 16 17
18 19 20
21 22 23
24 25 26
27 28 29
   


In [28]:
;; written:
(j "(i. 5 3) + (15 + i. 5 3)")

15 17 19
21 23 25
27 29 31
33 35 37
39 41 43
   


In [29]:
;; Use +/ with rank 1,
;; which sums the rows of each constituent table,
;; because they are the only rank-1 cells in this overall data,
;; returning a 2-by-5 shaped result:
(jdebug "+/\"1 i. 2 5 3")

⌁ J INPUT ⌁
+/"1 i. 2 5 3

⌁ J OUTPUT ⌁
 3 12 21 30 39
48 57 66 75 84
   


In [30]:
;; Using +/ with rank 2 targets the rank-2 cells
;; of this data. The rank-2 cells are the two 5-by-3
;; tables.
;; This results in summing the columns of each
;; constituent table:
(jdebug "+/\"2 i. 2 5 3")

⌁ J INPUT ⌁
+/"2 i. 2 5 3

⌁ J OUTPUT ⌁
 30  35  40
105 110 115
   


In [31]:
;; This is equivalent to putting + between
;; the rows of each constituent table. The first row
;; in the above output `30 35 40` is obtained by
;; this equivalent expression, which shows us manually
;; adding the rows of the first constituent table:
(j "0 1 2 + 3 4 5 + 6 7 8 + 9 10 11 + 12 13 14")

30 35 40
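
For comparison, the same sums can be written by hand in plain Clojure. This is only a sketch over nested vectors (the names `table` and `cube` are made up here), but it reproduces the J results shown above.

```clojure
;; Plain Clojure equivalents of the J sums above; names are illustrative.
(def table [[0 1 2 3 4]
            [5 6 7 8 9]])                        ; like i. 2 5

(def cube (->> (range 30)
               (partition 3) (map vec)
               (partition 5) (mapv vec)))        ; like i. 2 5 3

;; +/ at its default rank: put + between the top-level items (rows).
(apply mapv + table)                             ;=> [5 7 9 11 13]

;; +/"1 on the table: sum within each rank-1 cell (row).
(mapv #(reduce + %) table)                       ;=> [10 35]

;; +/"2 on the cube: put + between the rows of each 5-by-3 table.
(mapv #(apply mapv + %) cube)                    ;=> [[30 35 40] [105 110 115]]

;; +/"1 on the cube: sum each row of each table.
(mapv (fn [tbl] (mapv #(reduce + %) tbl)) cube)  ;=> [[3 12 21 30 39] [48 57 66 75 84]]
```

Note that the nesting of `mapv` has to be chosen by hand to match the depth of the data, whereas the J versions differ only in the rank number.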
   


## Rank in Rankle

Rankle contains many table-related experiments and utilities, but this section focuses on those related to rank.

The namespace alias `r` is used for vars in the `com.semperos.rankle` namespace.

### Rank of Data

As in J, the rank of data is determined by the length of its shape. The shape of data is calculated using the `r/shape` function and behaves as follows:

In [32]:
(r/shape 4)

[]

In [33]:
(r/shape [4])

[1]

In [34]:
(r/shape [4 5 6])

[3]

In [35]:
(r/shape [[1 2 3] [4 5 6]])

[2 3]

_N.B.: Using `r/shape` with maps is not currently defined._

The ranks of the above examples are therefore:

In [36]:
(r/rank 4)

0

In [37]:
(r/rank [4])

1

In [38]:
(r/rank [4 5 6])

1

In [39]:
(r/rank [[1 2 3] [4 5 6]])

2

_N.B.: The shape and rank of strings, maps, and ragged collections are not currently defined._

### Rank of Functions

Currently, the rank of a function is indicated by adding a `:rank` entry to the function var's metadata:

In [40]:
(:rank (meta #'r/count))

##Inf
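
As a minimal sketch of how such metadata can be attached to a var, `defn` accepts an attribute map after the docstring. The function `tally` below is hypothetical and not part of Rankle, and Rankle's own definitions may set `:rank` differently.

```clojure
;; Hypothetical function carrying a :rank entry in its var metadata.
(defn tally
  "Counts the top-level items of coll."
  {:rank ##Inf}
  [coll]
  (count coll))

;; The metadata lives on the var, so look it up with #' (var quote):
(:rank (meta #'tally))   ;=> ##Inf
```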

The definition of `r/count` defers completely to `clojure.core/count`, which behaves as you would expect:

In [41]:
(mapv count [[], [1 2 3], [[4 5 6] [7 8 9]]])

[0 3 2]

With a single argument, the `r/rank` function simply returns the rank of its argument. With two arguments, `r/rank` expects the first to be a function and the second to be a number for the new rank, returning a function that applies the given function with respect to the given rank.
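
To give a feel for what the two-argument form does, here is a simplified sketch in plain Clojure. It is not Rankle's implementation: `depth` and `at-rank` are hypothetical names, and the code assumes regular nested collections.

```clojure
;; Hypothetical sketch of applying a function at a given rank.
;; Not Rankle's implementation; assumes regular (non-ragged) data.
(defn depth
  "Number of nesting levels of x, judged by its first item."
  [x]
  (if (sequential? x)
    (inc (depth (first x)))
    0))

(defn at-rank
  "Returns a function that applies f to the rank-k cells of its argument,
  keeping the nesting of the axes above those cells."
  [f k]
  (fn apply-cells [x]
    (if (<= (depth x) k)
      (f x)                   ; x is itself a cell of rank k or lower
      (map apply-cells x))))  ; otherwise descend one axis

((at-rank count 1) [[0 1 2] [3 4 5]])       ;=> (3 3)
((at-rank count 2) [[0 1 2] [3 4 5]])       ;=> 2
((at-rank count ##Inf) [[0 1 2] [3 4 5]])   ;=> 2
```

Because the function descends until it reaches cells of the requested rank, the same call works regardless of how many axes sit above those cells.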

We will use `r/count` to demonstrate in Clojure what we demonstrated above in J.

In [42]:
(r/count [])

0

In [43]:
(r/count [1 2 3])

3

In [44]:
;; The `r/print-aligned` function gives us a J-like
;; print-out of multidimensional data.
(r/print-aligned [[0 1 2] [3 4 5]])

  0  1  2
  3  4  5


In [45]:
;; We also have an `r/reshape` which behaves like J's dyadic `$`
(r/print-aligned (r/reshape [2 3] (range 6)))

  0  1  2
  3  4  5


In [46]:
(def tbl-2x3 (r/reshape [2 3] (range 6)))
(r/count tbl-2x3)

2

In [47]:
((r/rank r/count 1) tbl-2x3)

(3 3)

In [48]:
((r/rank r/count 2) tbl-2x3)

2

In [49]:
;; Higher rank lets us do more:
(def tbl-2x3x4 (r/reshape [2 3 4] (range 24)))
(r/print-aligned tbl-2x3x4)

  0  1  2  3
  4  5  6  7
  8  9 10 11

 12 13 14 15
 16 17 18 19
 20 21 22 23



In [50]:
;; Rank infinity means that r/count works
;; on the data as a whole, counting its top-level
;; items:
(r/count tbl-2x3x4)

2

In [51]:
;; By changing the rank of r/count to 1, it will
;; count those cells (sub-parts of the overall data)
;; that are of rank 1. In this case, it will count the items in the
;; flat lists (which are rank 1 by definition) that make up the rows
;; of each constituent 3x4 table:
((r/rank r/count 1) tbl-2x3x4)

((4 4 4) (4 4 4))

In [52]:
;; With rank 2, the r/count function will search out the
;; cells that are rank 2 and count those. The two tables
;; in this data are, by definition, rank 2, and thus count
;; returns the number of top-level items (rows) in those tables:
((r/rank r/count 2) tbl-2x3x4)

(3 3)

In [53]:
;; The entire tbl-2x3x4 data structure is of rank 3,
;; so changing r/count from rank ##Inf to rank 3 is
;; equivalent in this case:
((r/rank r/count 3) tbl-2x3x4)

2

To hammer the point home, with normal Clojure `count`, we would use `map` to traverse the 2-by-3-by-4 table as follows to get counts of the lower-ranked collections within:

In [54]:
(count tbl-2x3x4)

2

In [55]:
;; Count the rank-2 cells:
(map count tbl-2x3x4)

(3 3)

In [56]:
;; Count the rank-1 cells:
(map (partial map count) tbl-2x3x4)

((4 4 4) (4 4 4))

Let's closely compare this to working with the smaller 2-by-3 table from earlier:

In [57]:
(count tbl-2x3)

2

In [58]:
(map count tbl-2x3)

(3 3)

In the case of `tbl-2x3x4`, doing `(map count ___)` results in counting rank-2 cells (tables) within the overall data, returning the number of rows in each table. In the case of `tbl-2x3`, doing the same `(map count ___)` results in counting rank-1 cells (lists) within the overall data, returning the number of items in each row of the data.

So even though a first-class concept of rank allows us to perform `map`-like operations, this comparison of `(map count ___)` underscores a fundamental difference in the nature of their abstractions:

> Clojure sequence operations allow us to apply functions to data by navigating the data structurally, whereas rank allows us to apply functions to data by targeting specific dimensions of the data, regardless of its overall dimensionality.

The abstractions that rank affords, then, provide value only when working with data that has been arranged into a homogeneous, non-ragged representation like a multidimensional array. While this sounds like a severe limitation, in practice it allows us to use the many powerful array-programming approaches developed by the APL and J communities over the last 50 years. Although these approaches rest more heavily on properties of arithmetic and algebra than those of other languages, they provide concise solutions to both analytical and "enterprise" problems.

To continue, try reading the [Practical Rankle Examples](PracticalRankleExamples.ipynb) notebook.