Conduits are an approach to the streaming data problem. They are meant as an
alternative to enumerators/iteratees, hoping to address the same issues with
different trade-offs based on real-world experience with enumerators.
General Goal
===========================
Let's start by defining the goal of enumerators, iteratees, and conduits. We
want a standard interface to represent streaming data from one point to
another, possibly modifying the data along the way.
This goal is also achieved by lazy I/O; the problem with lazy I/O, however, is
that of deterministic resource cleanup. That is to say, with lazy I/O, you
cannot be guaranteed that your file handles will be closed as soon as you have
finished reading data from them.
We want to keep the same properties of constant memory usage from lazy I/O, yet
have guarantees that scarce resources will be freed as early as possible.
Enumerator
===========================
__Note__: This is biased towards John Millikin's enumerator package, as that is
the package with which I have the most familiarity.
The concept of an enumerator is fairly simple. We have an `Iteratee` which
"consumes" data. It keeps its state while being fed data by an `Enumerator`.
The `Enumerator` will feed data a few chunks at a time to an `Iteratee`,
transforming the `Iteratee`'s state at each call. Additionally, there is an
`Enumeratee` that acts as both an `Enumerator` and an `Iteratee`.
As a result, there are a few changes to code structure that need to take place
in order to fully leverage enumerators:
* The `Enumerator`s control code flow. This is an Inversion of Control (IoC)
  technique.
    __Practical ramification__: `Iteratee` code can be more difficult to
    structure. Note that this is a subjective opinion, noted by many newcomers
    to the enumerator paradigm.
    __Requirement__: Nothing specific; likely, addressing the requirements
    below will automatically solve this.
* `Iteratee`s are not able to allocate scarce resources. Since they do not
  have any control of the flow of the program, they cannot guarantee that
  the resources will be released, especially in the presence of exceptions.
    __Practical ramification__: There is no way to create an `iterFile`, which
    will stream data into a file. Instead, you must allocate a file handle
    before entering the `Iteratee` and pass that in. In some cases, such an
    approach would mean file handles are kept open too long.
    __Clarification__: It is certainly *possible* to write `iterFile`, but
    there are no guarantees that it will close the allocated `Handle`, since
    the calling `Enumerator` may throw an exception before sending an `EOF` to
    the `Iteratee`.
    __Requirement__: We need a solution which would allow code something like
    the following to correctly open and close file handles, even in the
    presence of exceptions.
        run $ enumFile "input.txt" $$ iterFile "output.txt"
* None of this plays nicely with monad transformers, though this does not
  seem to be an inherent problem with enumerators, but rather with the
  current library.
    __Practical ramification__: You cannot enumerate a file when running in a
    `ReaderT IO`.
    __Requirement__: The following pseudo-code should work:
        runReaderT (run $ enumFile "input" $$ iterFile "output") ()
* Instead of passing around a `Handle` to pull data from, your code must
  live inside an `Iteratee`. This makes it difficult and/or impossible to
  interleave two different sources.
    __Practical ramification__: Even with libraries designed to interoperate
    (like http-enumerator and warp), it is not possible to create a proper
    streaming HTTP proxy.
    __Requirement__: It should be possible to pass around some type of producer
    which will be called piecemeal. For example, the request body in Warp
    should be expressible as:
        data Request = Request
            { ...
            , requestBody :: Enumerator ByteString IO ()
            }
    Applications should be able to do something like:
        bs <- requestBody req $$ takeBytes 10
        someAction bs
        rest <- requestBody req $$ takeRest
        finalAction rest
    Note that there may be other approaches to solving the same problem; this
    is just one possibility.
* While the concepts are simple, actually writing low-level `Iteratee` code is
  very complex. This in turn intimidates users from adopting the approach.
  Again, this is a subjective measurement.
    __Requirement__: Newcomers should be able to easily understand how to use
    the package, and with a little more training feel comfortable writing their
    own producers/consumers.
Conduits
===========================
Conduits attempt to provide a similar high-level API to enumerators, while
providing a drastically different low-level implementation. The first question
to visit is: why does the enumerator need to control the flow of the program?
The main purpose is to ensure that resources are released properly. But this in
fact solves only *half* the problem; iteratees still cannot release resources.
ResourceT
---------------------------
So our first issue to address is to create a new way to deal with resource
allocation. We represent this as a monad transformer, `ResourceT`. It works as
follows:
* You can register a cleanup action, which will return a `ReleaseKey`.

* If you pass your `ReleaseKey` to the `release` function, your action will be
  called automatically, and your action will be unregistered.

* When the monad is exited (via `runRelease`), all remaining registered actions
  will be called.

* All of this is provided in an exception-safe manner.
For example, you would be able to open a file handle, and then register an
action to close the file handle. In your code, you would call `release` on your
`ReleaseKey` as soon as you reach the end of the contents you are streaming. If
that code is never reached, the file handle will be released when the monad
terminates.
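To make the mechanics concrete, here is a minimal, simplified sketch of the
idea in plain `IO`: cleanup actions live in a mutable map keyed by
`ReleaseKey`, and the state reference is passed around explicitly rather than
hidden inside a transformer. The names follow the description above, but the
real `ResourceT` API may differ.

```haskell
import Control.Exception (finally)
import Data.IORef
import qualified Data.IntMap as IntMap

newtype ReleaseKey = ReleaseKey Int

type ResourceState = (Int, IntMap.IntMap (IO ()))

-- Register a cleanup action, getting back a key for early release.
register :: IORef ResourceState -> IO () -> IO ReleaseKey
register ref cleanup = atomicModifyIORef ref $ \(next, m) ->
    ((next + 1, IntMap.insert next cleanup m), ReleaseKey next)

-- Run (and unregister) the action for a key, if it is still registered.
release :: IORef ResourceState -> ReleaseKey -> IO ()
release ref (ReleaseKey k) = do
    mcleanup <- atomicModifyIORef ref $ \(next, m) ->
        ((next, IntMap.delete k m), IntMap.lookup k m)
    maybe (return ()) id mcleanup

-- Run a block, then run all remaining cleanups, even on exception.
runRelease :: (IORef ResourceState -> IO a) -> IO a
runRelease f = do
    ref <- newIORef (0, IntMap.empty)
    f ref `finally` (readIORef ref >>= sequence_ . IntMap.elems . snd)
```

With this shape, opening a file would pair `openFile` with `register (hClose
h)`: calling `release` closes the handle as early as possible, and
`runRelease` guarantees closure otherwise.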
Source
---------------------------
Now that we have a way to deal with resources, we can take a radically
different approach to production of data streams. Instead of a push system,
where the enumerator sends data down the pipeline, we have a pull system,
where data is requested from the source. Additionally, a source allows
buffering of input data, so data can be "pushed back" onto the source to be
available for a later call.
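One way to picture such a pull-based, buffering source (the record and field
names here are illustrative, not the package's actual types):

```haskell
import Data.IORef

-- Illustrative only: Nothing signals end of input.
data Source m a = Source
    { sourcePull :: m (Maybe a)  -- request the next chunk from the source
    , sourcePush :: a -> m ()    -- push unconsumed data back into the buffer
    }

-- A source over a list; its position lives in an IORef, matching the
-- mutable-state approach discussed under Trade-offs.
sourceList :: [a] -> IO (Source IO a)
sourceList xs0 = do
    ref <- newIORef xs0
    return Source
        { sourcePull = atomicModifyIORef ref $ \xs ->
            case xs of
                []     -> ([], Nothing)
                y : ys -> (ys, Just y)
        , sourcePush = \y -> modifyIORef ref (y :)
        }
```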
Sink
---------------------------
A `Sink` is the counterpart to an `Iteratee`. It takes a stream of data, and can
return a result, consisting of leftover input and an output. Like an
`Iteratee`, a `Sink` provides a `Monad` instance, which allows easy chaining
together of `Sink`s.
However, a big difference is that your code needn't live in the `Sink` monad.
You can easily pass around your sources and connect them to different `Sink`s.
As a practical example, when the Web Application Interface (WAI) is translated
to conduits, the application lives in the `ResourceT IO` monad, and the
`Request` value contains a `requestBody` record, which is a `Source IO
ByteString`.
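One possible shape for such a type (illustrative; the constructor and field
names are made up here, the real definition is more involved, and the `Monad`
instance is omitted):

```haskell
import Data.IORef

-- A sink is fed chunks one at a time and either keeps processing or
-- finishes with leftover input and an output.
data SinkResult input output
    = Processing
    | Done [input] output

newtype Sink input m output = Sink
    { sinkPush :: input -> m (SinkResult input output) }

-- Example: a sink that sums the first two numbers it is fed, keeping
-- its intermediate state in an IORef between calls.
sumTwo :: IO (Sink Int IO Int)
sumTwo = do
    ref <- newIORef Nothing
    return $ Sink $ \i -> do
        mfirst <- readIORef ref
        case mfirst of
            Nothing -> do
                writeIORef ref (Just i)
                return Processing
            Just j -> return (Done [] (i + j))
```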
Conduit
---------------------------
Conduits are simply functions that take a stream of input data and return
leftover input as well as a stream of output data. Conduits are far simpler to
implement than their counterpart, `Enumeratee`s.
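In the same illustrative style (not the package's real definition), a conduit
can be pictured as:

```haskell
-- Per call, a conduit is fed one input chunk and yields leftover
-- input plus zero or more output chunks.
data ConduitResult input output = ConduitResult [input] [output]

newtype Conduit input m output = Conduit
    { conduitPush :: input -> m (ConduitResult input output) }

-- Example: a stateless conduit that maps a function over the stream,
-- never holding any leftovers.
mapC :: Monad m => (a -> b) -> Conduit a m b
mapC f = Conduit $ \a -> return (ConduitResult [] [f a])
```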
Connecting
---------------------------
While you can directly pull data from a `Source`, or directly push to a
`Sink`, the easiest approach is to use the built-in connect operators. These
follow the naming convention from the enumerator package, e.g.:
    sourceFile "myfile.txt" $$ sinkFile "mycopy.txt"
    sourceFile "myfile.txt" $= uppercase {- a conduit -} $$ sinkFile "mycopy.txt"
    fromList [1..10] $$ map (+ 1) =$ fold (+) 0
Trade-offs
===========================
Overall, the approach achieves the goals I had hoped for. The main downside in
its current form is its reliance on mutable data. Instead of having an
`Iteratee` return a new `Iteratee`, thereby providing an illusion of
mutability, in conduit the sources and sinks must maintain their state
internally. As a result, code must live in `IO` and usually use something like
an `IORef` to keep track of the current state.
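The contrast between the two styles can be seen in a toy running counter
(names invented for this example):

```haskell
import Data.IORef

-- Enumerator style: feeding a value returns a *new* consumer holding
-- the updated, immutable state.
newtype Counter = Counter (Int, Int -> Counter)

pureCounter :: Int -> Counter
pureCounter total = Counter (total, \i -> pureCounter (total + i))

-- Conduit style, as described above: the state lives in an IORef, so
-- the consumer itself never changes, but it must run in IO.
ioCounter :: IO (Int -> IO Int)
ioCounter = do
    ref <- newIORef 0
    return $ \i -> atomicModifyIORef ref $ \total ->
        (total + i, total + i)
```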
I believe this to be an acceptable trade-off, since:
1. Virtually all conduit code will be performing I/O, so staying in the `IO`
   monad is reasonable.
2. By using `monad-control`, conduit can work with any monad *based* on `IO`,
   meaning all standard transformers (except `ContT`) can be used.
3. Enumerator experience has shown that the majority of the time, you construct
   `Iteratee`s by using built-in functions, such as `fold` and `map`.
   Therefore, the complication of tracking mutable state will usually be
   abstracted from users.
Another minor point is that, in order to provide an efficient `Monad` instance,
the `Sink` type is complicated by tracking two cases: a `Sink` which expects
data and one which does not. As expressed in point (3) above, this should not
have a major impact on users.
Finally, since most `Source`s and `Sink`s begin their life by allocating some
mutable variable, both types allow some arbitrary monadic action to be run
before actual processing begins. The monad (et al) instances and connect
functions are all built to run this action once and then continue operation.
Status
===========================
This is currently no more than a proof-of-concept, to see the differences
between enumerators and conduits for practical problems. This may serve as a
basis for WAI and Yesod in the future, but that will only be after careful
vetting of the idea. Your input is greatly appreciated!
Notes
===========================
This is just a collection of my personal notes, completely unorganized.
* In enumerator, it's relatively easy to combine multiple `Iteratee`s into
  an `Enumeratee`. The equivalent (turning `Sink`s into a `Conduit`) is
  harder. See, for example, chunking in http-conduit. Perhaps this can be
  improved with a better `sequence`.
* Names and operators are very long right now. Is that a feature or a bug?
* Should we use `Vector` in place of lists?
* It might be worth transitioning to `RegionT`. Will the extra type parameter
  scare people away?
* Perhaps the whole `BSource`/`BConduit` concept doesn't need to be exposed to
  the user. Advantage of exposing: it makes it obvious at the type level that
  a source/conduit can be reused, and possibly allows more efficient
  implementations (no double buffering). Disadvantage: more functions to
  implement and for the user to keep track of, so harder to use.
* I dislike the travesty which is `type FilePath = [Char]`, so I'm using the
  system-filepath package. I've used it for a lot of internal code at work,
  and it performs wonderfully. If anyone is concerned about this approach,
  let me know.