Gradual Flattening

The idea of gradual flattening is, instead of retaining the quad rope's tree structure, to flatten it to a simple array at each application of a bulk-modifying function. The following pseudocode signature illustrates the general idea:

val map : (f : 'a -> 'b) -> (qr : 'a Node) -> 'b Leaf

The supposed benefits are that (1) removing the tree structure allows for best-case O(n) map, which currently is O(n log n), and (2) indexing into the data structure is best-case O(1) even for large arrays, which potentially has a positive effect on dynamic programming algorithms.
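
A minimal sketch of the idea, using a hypothetical, simplified one-dimensional rope type (the real quad rope is two-dimensional, and all names below are made up for illustration): map writes every result into a single freshly allocated array and returns a flat leaf instead of rebuilding the tree.

/// A simplified one-dimensional rope; the real quad rope splits in two dimensions.
type 'a Rope =
    | Leaf of 'a []
    | Node of int * 'a Rope * 'a Rope    // Cached total length, left and right branch.

let length = function
    | Leaf arr -> Array.length arr
    | Node (n, _, _) -> n

/// Gradually flattening map: all results are written into one target array,
/// so the result is a single flat leaf.
let map (f : 'a -> 'b) (qr : 'a Rope) : 'b Rope =
    let tgt = Array.zeroCreate (length qr)
    let rec fill offset = function
        | Leaf arr ->
            Array.iteri (fun i v -> tgt.[offset + i] <- f v) arr
            offset + Array.length arr
        | Node (_, l, r) ->
            fill (fill offset l) r
    fill 0 qr |> ignore
    Leaf tgt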

Also, it seems that it may be much easier to implement a faithful lazy tree scheduler when quad ropes are flat, as we can just allocate a new tiling of the rectangle at each split.

Nevertheless, one immediate problem is the performance of set, which suddenly becomes O(n) for large arrays, as the entire array has to be allocated anew. A simple solution could be to forbid writing to index pairs altogether, forcing programmers to use an even more high-level style of programming.

Another possibility may be to split the argument quad rope to build a small enough leaf that is not as costly to reallocate. Thinking briefly about it, this may take O(n log n) time and O(log n) space.
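
A rough sketch of this idea in one dimension, with made-up names: a leaf is a view (array, offset, length) into a shared backing array, so set only copies a bounded chunk around the updated index instead of the whole array.

/// Sketch of a set that re-allocates only a bounded chunk; all names are hypothetical.
type 'a View =
    | Slice of 'a [] * int * int            // Backing array, offset, length.
    | Cat of int * 'a View * 'a View        // Cached total length, left, right.

let viewLength = function
    | Slice (_, _, len) -> len
    | Cat (len, _, _) -> len

let rec set (qr : 'a View) (i : int) (v : 'a) : 'a View =
    let chunk = 32    // Made-up maximum size of the copied middle chunk.
    match qr with
    | Slice (arr, off, len) when len <= chunk ->
        // Small leaf: copy it entirely.
        let arr' = Array.sub arr off len
        arr'.[i] <- v
        Slice (arr', 0, len)
    | Slice (arr, off, len) ->
        // Large leaf: copy only a chunk around i, share the rest of the backing array.
        let lo = max 0 (i - chunk / 2)
        let hi = min len (lo + chunk)
        let mid = Array.sub arr (off + lo) (hi - lo)
        mid.[i - lo] <- v
        let left = Slice (arr, off, lo)
        let right = Slice (arr, off + hi, len - hi)
        Cat (len, Cat (hi, left, Slice (mid, 0, hi - lo)), right)
    | Cat (len, l, r) ->
        let nl = viewLength l
        if i < nl then Cat (len, set l i v, r) else Cat (len, l, set r (i - nl) v)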

Lazy Tree Splitting

A simple approach to lazy tree splitting may be to assume that all quad ropes are balanced. Then, we can always split four-way at the root (two-way for flat and thin nodes) as soon as the splitting condition becomes true and launch a new task for each branch. The balancing may not be perfect, but it is probably much better than eagerly spawning tasks.
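
A rough sketch of that strategy on a toy quad rope type; the names, the size-based splitting condition and the use of async instead of the actual task machinery are all assumptions of this sketch.

/// Toy quad rope shape, just enough to sketch the splitting strategy.
type 'a Quad =
    | Tile of 'a [,]
    | Branch of 'a Quad * 'a Quad * 'a Quad * 'a Quad    // ne, nw, sw, se

let rec size = function
    | Tile arr -> Array2D.length1 arr * Array2D.length2 arr
    | Branch (ne, nw, sw, se) -> size ne + size nw + size sw + size se

/// Hypothetical splitting condition; a real lazy tree splitting scheduler would
/// check whether other workers are idle rather than a fixed size cut-off.
let shouldSplit qr = 1024 < size qr

/// Assuming balanced ropes, split four-way at the root and process each branch
/// in parallel; below the threshold, fall back to sequential traversal.
let rec traverse (work : 'a [,] -> unit) (qr : 'a Quad) : unit =
    match qr with
    | Branch (ne, nw, sw, se) when shouldSplit qr ->
        [| ne; nw; sw; se |]
        |> Array.map (fun q -> async { traverse work q })
        |> Async.Parallel
        |> Async.RunSynchronously
        |> ignore
    | Branch (ne, nw, sw, se) ->
        Array.iter (traverse work) [| ne; nw; sw; se |]
    | Tile arr -> work arr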

If we also implement gradual flattening, we can just use tiling to split the array.

A less naive splitting approach may be to ignore the rectangular shape of quad ropes and just split them as we would do with normal 4-ary trees. If we leave the leaves alone, this would probably also directly result in correct reconstruction.

Sparseness

When using a single large array to back the entire quad rope, we need to know whether any quad rope contains sparse parts. Otherwise, we would allocate memory for the sparse portions of a quad rope during e.g. map. Hence, we should have two variants of all higher-order functions: a simple one that does not use a single backing array and a "clever" one that takes sparseness into account.

Sparseness and gradual flattening may go very well together.
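
As a small sketch of the tree-based ("simple") variant, assuming a toy rope type with an explicit sparse constructor: map applies the function once to the replicated value of a sparse sub-rope and never allocates the full rectangle.

/// Toy rope with an explicit sparse constructor (rows, cols, replicated value).
type 'a SRope =
    | SLeaf of 'a [,]
    | SNode of 'a SRope * 'a SRope * 'a SRope * 'a SRope    // ne, nw, sw, se
    | Sparse of int * int * 'a

/// Map keeps sparse sub-ropes sparse instead of materializing them.
let rec map f = function
    | SLeaf arr -> SLeaf (Array2D.map f arr)
    | SNode (ne, nw, sw, se) -> SNode (map f ne, map f nw, map f sw, map f se)
    | Sparse (h, w, v) -> Sparse (h, w, f v)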

Generalized Two-Dimensional Scan

Computing the prefix sum of a two-dimensional array can be achieved using a summed area table:

let prefixSum (vals : double [,]) =
    let sums = Array2D.copy vals
    // Init (i, 0).
    for i in 1 .. Array2D.length1 sums - 1 do
        sums.[i, 0] <- sums.[i, 0] + sums.[i - 1, 0]
    // Init (0, j).
    for j in 1 .. Array2D.length2 sums - 1 do
        sums.[0, j] <- sums.[0, j] + sums.[0, j - 1]
    // Init remaining.
    for i in 1 .. Array2D.length1 sums - 1 do
        for j in 1 .. Array2D.length2 sums - 1 do
            sums.[i, j] <- sums.[i, j] + sums.[i - 1, j] + sums.[i, j - 1] - sums.[i - 1, j - 1]
    sums

This may be straightforward to implement on quad ropes as well, but it seems too specific to implement for arbitrary types unless, for each type, a pair of dual operators is passed to the function call:

val scan : ('a -> 'a -> 'a) -> ('a -> 'a -> 'a) -> 'a QuadRope -> 'a QuadRope

let prefixSum : 'a QuadRope -> 'a QuadRope = scan (+) (-)

This seems unintuitive, but we may have to be more explicit either way when we add sparseness.

Here is complete code for two-dimensional arrays that generalizes summed area tables:

/// Specialized prefix sum for any type that has static members + and -.
let inline prefixSum (vals : ^a [,] when ^a : (static member (+) : ^a * ^a -> ^a) and ^a : (static member (-) : ^a * ^a -> ^a)) =
    let sums = Array2D.zeroCreate (Array2D.length1 vals) (Array2D.length2 vals)
    // Initial value must be copied
    sums.[0, 0] <- vals.[0, 0]
    // Init (i, 0)
    for i in 1 .. Array2D.length1 sums - 1 do
        sums.[i, 0] <- vals.[i, 0] + sums.[i - 1, 0]
    // Init (0, j).
    for j in 1 .. Array2D.length2 sums - 1 do
        sums.[0, j] <- vals.[0, j] + sums.[0, j - 1]
    // Init remaining.
    for i in 1 .. Array2D.length1 sums - 1 do
        for j in 1 .. Array2D.length2 sums - 1 do
            sums.[i, j] <- vals.[i, j] + sums.[i - 1, j] + sums.[i, j - 1] - sums.[i - 1, j - 1]
    sums

/// Generalization to any type:
let scan (f : 'a -> 'a -> 'a) (g : 'a -> 'a -> 'a) (vals : 'a [,]) : 'a [,] =
    let sums = Array2D.zeroCreate (Array2D.length1 vals) (Array2D.length2 vals)
    // Initial value must be copied
    sums.[0, 0] <- vals.[0, 0]
    // Init (i, 0)
    for i in 1 .. Array2D.length1 sums - 1 do
        sums.[i, 0] <- f vals.[i, 0] sums.[i - 1, 0]
    // Init (0, j).
    for j in 1 .. Array2D.length2 sums - 1 do
        sums.[0, j] <- f vals.[0, j] sums.[0, j - 1]
    // Init remaining.
    for i in 1 .. Array2D.length1 sums - 1 do
        for j in 1 .. Array2D.length2 sums - 1 do
            sums.[i, j] <- g (f vals.[i, j] (f sums.[i - 1, j] sums.[i, j - 1])) sums.[i - 1, j - 1]
    sums

let prefixSum2 vals =
    scan (+) (-) vals

> prefixSum (Array2D.create 10 10 1) = prefixSum2 (Array2D.create 10 10 1);;
val it : bool = true

We can implement a similar pattern using target descriptors on quad ropes, where we need to extend the target by one row and column containing the initial state. Then, we can propagate state by accessing the target array uniformly. See commit c238fcc.
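
The following is a sketch of that trick on plain 2D arrays rather than on the actual target descriptors: the target gets one extra row and column holding the initial state, so the update rule can read its three neighbours uniformly without special-casing the first row and column.

/// Sketch of the bordered-target idea on plain 2D arrays: row 0 and column 0
/// of the target hold the initial state epsilon, the payload is shifted by one.
let scanWithBorder (f : 'a -> 'a -> 'a) (g : 'a -> 'a -> 'a) (epsilon : 'a) (vals : 'a [,]) : 'a [,] =
    let h, w = Array2D.length1 vals, Array2D.length2 vals
    let tgt = Array2D.create (h + 1) (w + 1) epsilon
    for i in 1 .. h do
        for j in 1 .. w do
            // Same rule as in scan above, but valid at every index.
            tgt.[i, j] <- g (f vals.[i - 1, j - 1] (f tgt.[i - 1, j] tgt.[i, j - 1])) tgt.[i - 1, j - 1]
    // Drop the border before returning.
    Array2D.init h w (fun i j -> tgt.[i + 1, j + 1])

With f = (+), g = (-) and epsilon = 0, this should compute the same table as prefixSum above.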

Furthermore, it is not obvious that we can use standard parallel scan algorithms. We can, however, exploit the quad rope structure. There is one case in which we can compute the prefix sums of the NE and SW children in parallel, namely when rows ne <= rows nw && cols sw <= cols nw. In all other cases on full nodes, we must make sure to scan ne and sw in the correct order, as there are dependencies between the two. See commit 2aa4ace.
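
A sketch of just that scheduling decision; state is assumed to be propagated through a shared target array (as with the target descriptors above), so only the order of the recursive scans matters here, and scan', rows and cols are hypothetical helpers.

// Sketch of the node case only: scan' scans a sub-rope and writes its results
// into a shared target array (not shown), so the call order encodes the dependencies.
let scanNode (scan' : 'q -> unit) (rows : 'q -> int) (cols : 'q -> int) (nw, ne, sw, se) =
    scan' nw
    if rows ne <= rows nw && cols sw <= cols nw then
        // NE is no taller than NW and SW is no wider than NW, so neither
        // depends on the other, only on NW; scan them in parallel.
        [| async { scan' ne }; async { scan' sw } |]
        |> Async.Parallel
        |> Async.RunSynchronously
        |> ignore
    else
        // Otherwise there is a dependency between NE and SW; scan them sequentially.
        scan' ne
        scan' sw
    scan' se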

Performance evaluation next.

1D-scan seems to be a special case of 2D-scan:

I(0, j) = E(0, j) + I(0, j-1) + I(-1, j) - I(-1, j-1)

where I(-1, j) = I(-1, j-1) = e, hence:

I(0, j) = E(0, j) + I(0, j-1)

It is unclear to me whether we can use this for anything.

Slicing is inherently sequential!

I have tried to find an expression that calculates this in advance, but I cannot seem to find one. A more sophisticated argument, however, would be nice.

Sparseness, Revisited

Often, the reduce function is defined as follows:

let rec reduce (f : 'a -> 'a -> 'a) (epsilon : 'a) : 'a list -> 'a = function
    | [] -> epsilon
    | x :: xs -> reduce f (f epsilon x) xs

The variable epsilon is initially the neutral element for f, such that f epsilon x = f x epsilon = x. It can be used as an initial state as well. However, if we assume it to be the neutral element for f, then we can use epsilon to detect sparse sub-trees in a quad rope.

First, we can skip combining the results of reducing the children of a quad rope node whenever one of them is epsilon, because we already know that f epsilon x = x. Second, we can now always return the neutral element for an empty sub-rope, which is much nicer than throwing an error. Third, we can handle the case Sparse (h, w, v) when v = epsilon by returning epsilon immediately.

This should be general enough to avoid materialization of sparse sub-trees for neutral elements. In the case of mapreduce f g epsilon qr, we need to check Sparse (h, w, v) when f v = epsilon -> epsilon.
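
A sketch on the toy sparse rope from above, assuming epsilon is the neutral element of f:

/// Reduce that exploits the neutral element: sparse sub-ropes whose value
/// equals epsilon reduce to epsilon without being materialized.
let rec reduce (f : 'a -> 'a -> 'a) (epsilon : 'a) (qr : 'a SRope) : 'a =
    match qr with
    | Sparse (_, _, v) when v = epsilon -> epsilon
    | Sparse (h, w, v) ->
        // Still no materialization: fold over the replicated value h * w times.
        Seq.fold f epsilon (Seq.replicate (h * w) v)
    | SLeaf arr -> Seq.fold f epsilon (Seq.cast<'a> arr)
    | SNode (ne, nw, sw, se) ->
        f (f (reduce f epsilon nw) (reduce f epsilon ne))
          (f (reduce f epsilon sw) (reduce f epsilon se))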

We could of course require some kind of proof that f epsilon x = f x epsilon = x, but from a pragmatic point of view, this seems excessive. We can simply rely on the user knowing what they are doing. After all, this is part of the domain and still a declarative approach.

Generalized N-ZIP

The main limitation of this implementation is that all quad ropes must contain elements of the same type 'a. Also, for n = 2 we need to wrap the second argument to f into an array, because f has type 'a -> 'a[] -> 'b.

/// Generalized n-zip. Allows for zipping arbitrary many quad ropes,
/// which however are restricted to containing the same
/// type. Specializes internally for n = 1 and n = 2, but the latter
/// requires wrapping a single argument in an array, which is
/// inefficient.
let nzip f qr qrs =
    let rec nzip qr qrs tgt =
        match qr with
            | Empty -> Empty

            // Call f.
            | Leaf slc ->
                let qrs' = Array.map materialize qrs
                leaf (Target.mapi (fun i j v -> f v (Array.map (fun qr' -> fastGet qr' i j) qrs')) slc tgt)

            | Node (_, _, _, _, ne, nw, sw, se) ->
                let qrs' = Array.map (sliceToMatch qr) qrs
                let nw0 = nzip nw (Array.map (fun (a, _, _, _) -> a) qrs') tgt
                let ne0 = nzip ne (Array.map (fun (_, b, _, _) -> b) qrs') (Target.ne tgt qr)
                let sw0 = nzip sw (Array.map (fun (_, _, c, _) -> c) qrs') (Target.sw tgt qr)
                let se0 = nzip se (Array.map (fun (_, _, _, d) -> d) qrs') (Target.se tgt qr)
                node ne0 nw0 sw0 se0

            | Slice _ ->
                nzip (materialize qr) qrs tgt

            // If only the main argument is sparse, make the first array
            // element the new main argument and move the original main
            // argument to the end of the array.
            | Sparse _ ->
                let qr' = Array.head qrs
                let qrs' = Array.append qrs.[1..] [|qr|]
                nzip qr' qrs' tgt

    match qr with
        // When every argument is sparse, no need to recurse.
        | Sparse (h, w, v) when Array.forall isSparse qrs ->
            Sparse (h, w, f v (Array.map (fun (Sparse (_, _, v)) -> v) qrs))
        // Otherwise, specialize on array length.
        | _ ->  match qrs with
                | [||] -> map (fun a -> f a [||]) qr
                | [|qr0|] -> zip (fun a b -> f a [|b|]) qr qr0
                | _ -> nzip qr qrs (Target.make (rows qr) (cols qr))
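
A hypothetical usage example, assuming three equally shaped integer quad ropes qr0, qr1 and qr2: computing their element-wise sum.

// Hypothetical usage: element-wise sum of three equally shaped quad ropes.
// The first rope is the main argument; the remaining ones are passed as an array.
let sum3 qr0 qr1 qr2 =
    nzip (fun x ys -> x + Array.sum ys) qr0 [| qr1; qr2 |]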