Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Newer
Older
100644 319 lines (252 sloc) 13.89 kb
36532fb @spl Import file for first blog entry with code:
authored
1 <!--
2 This is the source for the splonderzoek blog entry dated 2008-02-15:
3
4 http://splonderzoek.blogspot.com/2009/02/incremental-fold-design-pattern.html
5 -->
6
7 <p>
8 I recently read the article "How to Refold a Map" by David F. Place in <a href="http://www.haskell.org/haskellwiki/The_Monad.Reader">The Monad.Reader</a> <a href="http://www.haskell.org/sitewiki/images/6/6a/TMR-Issue11.pdf">Issue 11</a>. I've been thinking about incremental algorithms in Haskell for some time, and I realized that Place has written a specific instance (and optimization) of a more general concept: the <em>incremental fold</em>.
9 </p>
10 <p>
11 In this article, I demonstrate a design pattern for converting a datatype and related functions into an incremental fold. The pattern is not difficult to comprehend, but it would be nice to improve upon it. I explore a few improvements and issues with those improvements. Ultimately, I'd like to see this functionality in a program instead of a design pattern.
12 </p>
13 <p>
14 <em>Note: This is a <a href="http://www.haskell.org/haskellwiki/Literate_programming">literate Haskell</a> article. You can copy the text of the entire article, paste it into a new file called <code>IncrementalTreeFold.lhs</code>, and load it into <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/ghci.html">GHCi</a>.</em>
15 </p>
16 <pre>
17
18 > {-# LANGUAGE MultiParamTypeClasses #-}
19 > {-# LANGUAGE FlexibleInstances #-}
20 > {-# LANGUAGE ScopedTypeVariables #-}
21
22 > module IncrementalTreeFold where
23 > import Prelude hiding (elem)
24 > import qualified Data.Char as Char (ord)
25
26 </pre>
27
28 <h2>Introducing a Typical Binary Tree</h2>
29
30 <p>
31 Before we get to the conversion, let's choose an appropriate datatype. Place adapted the <a href="http://hackage.haskell.org/packages/archive/containers/0.2.0.0/doc/html/src/Data-Map.html#Map">Map type</a> used in <a href="http://hackage.haskell.org/packages/archive/containers/0.2.0.0/doc/html/Data-Map.html">Data.Map</a> (or <a href="http://hackage.haskell.org/packages/archive/containers/0.2.0.0/doc/html/src/Data-Set.html#Set">Set</a> in <a href="http://hackage.haskell.org/packages/archive/containers/0.2.0.0/doc/html/Data-Set.html">Data.Set</a>). To simplify my presentation, I will use an ordered binary tree with labeled nodes.
32 </p>
33 <pre>
34
35 > data Tree a
36 > = Tip
37 > | Bin a (Tree a) (Tree a)
38 > deriving Show
39
40 </pre>
41 <p>
42 Next, let's introduce some useful functions. An incremental fold is not necessarily like applying a <a href="http://www.citeulike.org/user/spl/tag/fold">fold function</a> (a.k.a. a <a href="http://knol.google.com/k/edward-kmett/catamorphisms/3qi7x2qrdushx/2">catamorphism</a>, not a <a href="http://www.citeulike.org/user/spl/tag/crush">crush function</a> that has become known as a fold) to a value directly. Instead, as I will later show, it integrates into existing functions that manipulate values. That said, we should have some functions for building <code>Tree</code>s. Here is the beginning of a <code>Tree</code> API. (There are a number of other operations, e.g. <code>delete</code> and <code>lookup</code>, that can easily be added but do not contribute much to the discussion.)
43 </p>
44 <p>
45 <code>empty</code> builds a tree with no elements.
46 </p>
47 <pre>
48
49 > empty :: (Ord a) => Tree a
50 > empty = Tip
51
52 </pre>
53 <p>
54 <code>singleton</code> builds a tree with a single element.
55 </p>
56 <pre>
57
58 > singleton :: (Ord a) => a -> Tree a
59 > singleton x = Bin x Tip Tip
60
61 </pre>
62 <p>
63 <code>insert</code> puts a value in the appropriate place given a left-to-right ordering of values in the tree.
64 </p>
65 <pre>
66
67 > insert :: (Ord a) => a -> Tree a -> Tree a
68 > insert x t =
69 > case t of
70 > Tip ->
71 > singleton x
72 > Bin y lt rt ->
73 > case compare x y of
74 > LT -> Bin y (insert x lt) rt
75 > GT -> Bin y lt (insert x rt)
76 > EQ -> Bin x lt rt
77
78 </pre>
79 <p>
80 <code>fromList</code> creates a tree from a list of values.
81 </p>
82 <pre>
83
84 > fromList :: (Ord a) => [a] -> Tree a
85 > fromList = foldr insert empty
86
87 </pre>
88 <p>
89 <code>elem</code> determines if a value is an element of a tree.
90 </p>
91 <pre>
92
93 > elem :: (Ord a) => a -> Tree a -> Bool
94 > elem x t =
95 > case t of
96 > Tip ->
97 > False
98 > Bin y lt rt ->
99 > case compare x y of
100 > LT -> elem x lt
101 > GT -> elem x rt
102 > EQ -> True
103
104 </pre>
105 <p>
106 Now, using our library of sorts, we can create binary search tree and check if a value is in the tree.
107 </p>
108 <pre>
109
110 > test1 = 37 `elem` fromList [8,23,37,82,3]
111
112 </pre>
113
114 <h2>Tree Folds</h2>
115
116 <p>
117 Suppose that we now want the size of the tree. For good abstraction and high reuse, we create a <code>fold</code> function.
118 </p>
119 <pre>
120
121 > data Alg a b = Alg { ftip :: b, fbin :: a -> b -> b -> b }
122
123 > fold :: Alg a b -> Tree a -> b
124 > fold alg = go
125 > where
126 > go Tip = ftip alg
127 > go (Bin x lt rt) = fbin alg x (go lt) (go rt)
128
129 </pre>
130 <p>
131 <code>fold</code> allows us to write a simple <code>size</code> function.
132 </p>
133 <pre>
134
135 > size :: Tree a -> Int
136 > size = fold (Alg 0 (\_ lr rr -> 1 + lr + rr))
137
138 </pre>
139 <p>
140 I use the datatype <code>Alg</code> here to contain the <a href="http://www.alpheccar.org/en/posts/show/77">algebra of the fold</a>. In <code>size</code>, we simply replace each constructor in the algebra of <code>Tree</code> with a corresponding element from the algebra of integer addition. Since you're reading this article, you're probably a Haskell programmer and already familiar with the sorts of functions that can be written with folds. Here are a few others.
141 </p>
142 <pre>
143
144 > filter :: (a -> Bool) -> Tree a -> [a]
145 > filter f = fold (Alg [] (\x lr rr -> if f x then [x] else [] ++ lr ++ rr))
146
147 > ord :: Tree Char -> Tree Int
148 > ord = fold (Alg Tip (\x lt rt -> Bin (Char.ord x) lt rt))
149
150 </pre>
151
152 <h2>Incremental Change</h2>
153
154 <p>
155 Now that we have a grasp on using a fold on a datatype, I would like to show how to extend my binary tree "library" defined above to support an incremental fold. The incremental fold can (I believe) do everything a traditional fold can do, but it does it during <code>Tree</code> construction instead of externally in a separate function. This means that every time we produce a new <code>Tree</code> (via <code>singleton</code>, <code>insert</code>, or <code>fromList</code> for example), we get a new result of the incremental fold.
156 </p>
157 <p>
158 Transforming our library into an incremental calculating machine involves several steps. The first step is extending the datatype to hold the incremental result. Since we want to be polymorphic in the result type, we add a type parameter <code>r</code> to the <code>Tree</code> type constructor. And since each constructor may possibly have an incremental result, it must also be extended with a place holder for <code>r</code>.
159 </p>
160 <pre>
161
162 > data Tree' a r
163 > = Tip' r
164 > | Bin' a (Tree' a r) (Tree' a r) r
165 > deriving Show
166
167 </pre>
168 <p>
169 For convenience and possibly to hide the modified constructors from the outside world, we add a function for retrieving the increment result.
170 </p>
171 <pre>
172
173 > result' :: Tree' a r -> r
174 > result' (Tip' r) = r
175 > result' (Bin' _ _ _ r) = r
176
177 </pre>
178 <p>
179 As I mentioned earlier, the machinery of the fold is now in the construction. To implement this second step, we use <a href="http://splonderzoek.blogspot.com/2008/01/smart-constructors.html">smart constructors</a>.
180 </p>
181 <pre>
182
183 > tip' :: Alg a r -> Tree' a r
184 > tip' alg = Tip' (ftip alg)
185
186 > bin' :: Alg a r -> a -> Tree' a r -> Tree' a r -> Tree' a r
187 > bin' alg x lt rt = Bin' x lt rt (fbin alg x (result' lt) (result' rt))
188
189 </pre>
190 <p>
191 Both <code>tip'</code> and <code>bin'</code> construct new values of <code>Tree' a r</code> and using the algebra, calculate the incremental result to be stored in each value. Thus, the actual fold operation is "hidden" in the construction of values.
192 </p>
193 <p>
194 Now, in order to put the incremental fold to work in a function, we simply (1) add the algebra to the function's arguments, (2) add an wildcard pattern for the result field in constructor patterns, and (3) replace applications of the constructors with that of their incremental cousins. Here's an example of the <code>singleton</code> and <code>insert</code> functions modified for incremental folding.
195 </p>
196 <pre>
197
198 > singleton' :: (Ord a) => Alg a r -> a -> Tree' a r
199 > singleton' alg x = bin' alg x (tip' alg) (tip' alg)
200
201 > insert' :: (Ord a) => Alg a r -> a -> Tree' a r -> Tree' a r
202 > insert' alg x t =
203 > case t of
204 > Tip' _ ->
205 > singleton' alg x
206 > Bin' y lt rt _ ->
207 > case compare x y of
208 > LT -> bin' alg y (insert' alg x lt) rt
209 > GT -> bin' alg y lt (insert' alg x rt)
210 > EQ -> bin' alg x lt rt
211
212 </pre>
213 <p>
214 Comparing these functions with the initial versions, we see that the changes are readily apparent. Modify every other <code>Tree'</code>-hugging function in the same manner, and you have a design pattern for an incremental fold!
215 </p>
216
217 <h2>Improving the Incremental Implementation</h2>
218
219 <p>
220 Of course, you may complain that there's some amount of boilerplate work involved. For example, we have to add this <code>alg</code> argument everywhere. Let's try to replace that with a type class.
221 </p>
222 <pre>
223
224 < class Alg'' a r where
225 < ftip'' :: r
226 < fbin'' :: a -> r -> r -> r
227
228 </pre>
229 <p>
230 And we redefine our smart constructors.
231 </p>
232 <pre>
233
234 < tip'' :: (Alg' a r) => Tree' a r
235 < tip'' = Tip' ftip''
236
237 </pre>
238 <p>
239 But there's a problem here! GHC reports that it <code>Could not deduce (Alg'' a r) from the context (Alg'' a1 r)</code>. The poor compiler cannot infer the type of the parameter <code>a</code> since <code>ftip''</code> has only type <code>r</code>.
240 </p>
241 <p>
242 Let's try another version of the class. In this one, we add a dummy argument to <code>ftip'</code> in order to force GHC to correctly infer the full type.
243 </p>
244 <pre>
245
246 > class Alg'' a r where
247 > ftip'' :: a -> r
248 > fbin'' :: a -> r -> r -> r
249
250 > tip'' :: forall a r . (Alg'' a r) => Tree' a r
251 > tip'' = Tip' (ftip'' (undefined :: a))
252
253 > bin'' :: (Alg'' a r) => a -> Tree' a r -> Tree' a r -> Tree' a r
254 > bin'' x lt rt = Bin' x lt rt (fbin'' x (result' lt) (result' rt))
255
256 </pre>
257 <p>
258 This provides one (not very pretty) solution to the problem. I'm able to get around the need to require an argument for <code>tip''</code> by using <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/other-type-extensions.html#scoped-type-variables">lexically scoped type variables</a>. But it doesn't remove the ugly type from <code>ftip''</code>, and the user is forced to ignore it when writing an instance.
259 </p>
260 <p>
261 The functions can now be rewritten with the <code>Alg''</code> constraint.
262 </p>
263 <pre>
264
265 > empty'' :: (Ord a, Alg'' a r) => Tree' a r
266 > empty'' = tip''
267
268 > singleton'' :: (Ord a, Alg'' a r) => a -> Tree' a r
269 > singleton'' x = bin'' x tip'' tip''
270
271 > insert'' :: (Ord a, Alg'' a r) => a -> Tree' a r -> Tree' a r
272 > insert'' x t =
273 > case t of
274 > Tip' _ ->
275 > singleton'' x
276 > Bin' y lt rt _ ->
277 > case compare x y of
278 > LT -> bin'' y (insert'' x lt) rt
279 > GT -> bin'' y lt (insert'' x rt)
280 > EQ -> bin'' x lt rt
281
282 > fromList'' :: (Ord a, Alg'' a r) => [a] -> Tree' a r
283 > fromList'' = foldr insert'' empty''
284
285 </pre>
286 <p>
287 These versions look more like the non-incremental implementations above. To use them, we need to declare an instance of <code>Alg''</code> with an appropriate algebra for our desired incremental result. Here's how we would rewrite <code>size</code>.
288 </p>
289 <pre>
290
291 > newtype Size = Size { unSize :: Int }
292
293 > instance Alg'' a Size where
294 > ftip'' _ = Size 0
295 > fbin'' _ lr rr = Size (1 + unSize lr + unSize rr)
296
297 > size'' :: Tree' a Size -> Int
298 > size'' = unSize . result'
299
300 > test2 = size'' $ insert'' 's' $ insert'' 'p' $ insert'' 'l' $ fromList'' "onderzoek"
301
302 </pre>
303
304 <p>
305 <code>Size</code> is still defined as a fold, but the result is incrementally built with each application of a library function. This can have a nice performance boost as Place also found in his article.
306 </p>
307
308 <h2>Generic Thoughts</h2>
309
310 <p>
311 On reflecting over my implementation, I really don't like the dummy arguments required by constructors like <code>Tip</code>. There are other approaches to dealing with this, but I haven't yet found a better one. If you use a <a href="http://www.haskell.org/haskellwiki/Functional_dependencies">functional dependency</a> such as <code>r -> a</code> in the definition of <code>Alg''</code>, then <code>a</code> would be uniquely determined by <code>r</code>. In the case of <code>size''</code>, we would have to specify a concrete element type for <code>Tree'</code> instead of the parameter <code>a</code> (or use <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/type-class-extensions.html#undecidable-instances">undecidable instances</a>). Perhaps, dear reader, you might have a better solution?
312 </p>
313 <p>
314 The incremental fold pattern is great for documenting an idea, but it has several downsides: (1) The obvious one is that it requires modifying a datatype and code. This is not always desirable and often not practical. (2) Implementing an incremental fold can involve a lot of boilerplate code and many, small changes that are monotonous and boring. It's very easy to make mistakes. In fact, I made several copy-paste-and-forget-to-change errors while writing this article.
315 </p>
316 <p>
317 As <a href="http://www.comlab.ox.ac.uk/jeremy.gibbons/">Jeremy Gibbons</a> and others have shown us, <a href="http://www.comlab.ox.ac.uk/publications/publication1452-abstract.html">design patterns are better as programs</a>. Since the code is so regular, it seems very receptive to some <a href="http://www.cs.uu.nl/wiki/GenericProgramming">generic programming</a>. I plan to explore this further, possibly using one of the <a href="http://hackage.haskell.org/packages/archive/pkg-list.html#cat:generics">many generics libraries</a> available for Haskell or designing a new one. Suggestions and feedback are welcome.
318 </p>
Something went wrong with that request. Please try again.