Conduits are an approach to the streaming data problem. They are meant as an
alternative to enumerators/iteratees, aiming to address the same issues with
different trade-offs based on real-world experience with enumerators.

Current Documentation
===========================

The most up-to-date documentation is available as an appendix of the Yesod
book, at:
[http://www.yesodweb.com/book/conduit](http://www.yesodweb.com/book/conduit).
The rest of this page is kept for historical reasons, to give an idea of the
original driving factors behind conduit. Note that many of the descriptions
of the package's current state are now inaccurate.

General Goal
===========================

Let's start by defining the goal of enumerators, iteratees, and conduits. We
want a standard interface to represent streaming data from one point to
another, possibly modifying the data along the way.

This goal is also achieved by lazy I/O; the problem with lazy I/O, however, is
that of deterministic resource cleanup. That is to say, with lazy I/O, you
have no guarantee that your file handles will be closed as soon as you have
finished reading data from them.

We want to keep the same properties of constant memory usage from lazy I/O, yet
have guarantees that scarce resources will be freed as early as possible.

Enumerator
===========================

__Note__: This is biased towards John Millikin's enumerator package, as that is
the package with which I have the most familiarity.

The concept of an enumerator is fairly simple. We have an `Iteratee` which
"consumes" data. It keeps its state while being fed data by an `Enumerator`.
The `Enumerator` will feed data a few chunks at a time to an `Iteratee`,
transforming the `Iteratee`'s state at each call. Additionally, there is an
`Enumeratee` that acts as both an `Enumerator` and an `Iteratee`.
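
To make the three roles concrete, here is a minimal sketch written as a pure toy model. It is not the enumerator package's real definitions (those are monadic and operate on chunks); `Step`, `Enumerator`, `sumIter`, `enumList`, `mapE`, and `run` are hypothetical names chosen for illustration.

```haskell
-- Toy model of the enumerator vocabulary (hypothetical names; the real
-- enumerator package is monadic and works over chunks of input).

-- An Iteratee's state: it either wants more input or has a result.
data Step a b
  = Continue (Maybe a -> Step a b) -- Nothing signals end of stream (EOF)
  | Yield b

-- An Enumerator feeds data to an Iteratee, returning the updated state.
type Enumerator a b = Step a b -> Step a b

-- An Iteratee that sums every element it is fed.
sumIter :: Step Int Int
sumIter = go 0
  where
    go acc = Continue $ \mx -> case mx of
      Nothing -> Yield acc
      Just x  -> go (acc + x)

-- An Enumerator that pushes each list element in turn. It does not send
-- EOF, so several Enumerators can feed the same Iteratee in sequence.
enumList :: [a] -> Enumerator a b
enumList (x:xs) (Continue k) = enumList xs (k (Just x))
enumList _ step = step

-- An Enumeratee: it consumes `a`s like an Iteratee and feeds `b`s to an
-- inner Iteratee like an Enumerator.
mapE :: (a -> b) -> Step b r -> Step a r
mapE _ (Yield r) = Yield r
mapE f (Continue k) = Continue $ \ma -> case ma of
  Nothing -> mapE f (k Nothing)
  Just a  -> mapE f (k (Just (f a)))

-- Send EOF and extract the result, if the Iteratee produced one.
run :: Step a b -> Maybe b
run (Yield b) = Just b
run (Continue k) = case k Nothing of
  Yield b -> Just b
  Continue _ -> Nothing
```

In GHCi, `run (enumList [1..10] sumIter)` gives `Just 55`, and `run (enumList [1,2,3] (mapE (*2) sumIter))` gives `Just 12`. Note that the enumerator drives the loop: the iteratee only reacts to whatever it is handed.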

As a result, a few changes to code structure are needed to fully leverage
enumerators:

* The `Enumerator`s control code flow. This is an Inversion of Control (IoC)
  technique.

  __Practical ramification__: `Iteratee` code can be more difficult to
  structure. Note that this is a subjective opinion, noted by many newcomers to
  the enumerator paradigm.

  __Requirement__: Nothing specific; addressing the requirements below will
  likely solve this automatically.

* `Iteratee`s are not able to allocate scarce resources. Since they do not
  have any control over the flow of the program, they cannot guarantee that
  the resources will be released, especially in the presence of exceptions.

  __Practical ramification__: There is no way to create an `iterFile` that
  will stream data into a file. Instead, you must allocate a file handle
  before entering the `Iteratee` and pass that in. In some cases, such an
  approach would mean file handles are kept open too long.

  __Clarification__: It is certainly *possible* to write `iterFile`, but there
  is no guarantee that it will close the allocated `Handle`, since the calling
  `Enumerator` may throw an exception before sending an `EOF` to the `Iteratee`.

  __Requirement__: We need a solution that would allow code like
  the following to correctly open and close file handles, even in the presence
  of exceptions.

      run $ enumFile "input.txt" $$ iterFile "output.txt"

* None of this plays nicely with monad transformers, though this does not
  seem to be an inherent problem with enumerators, but rather with the current
  library.

  __Practical ramification__: You cannot enumerate a file when running in a
  `ReaderT IO`.

  __Requirement__: The following pseudo-code should work:

      runReaderT (run $ enumFile "input" $$ iterFile "output") ()

* Instead of passing around a `Handle` to pull data from, your code should
  live inside an `Iteratee`. This makes it difficult or even impossible to
  interleave two different sources.

  __Practical ramification__: Even with libraries designed to interoperate
  (like http-enumerator and warp), it is not possible to create a proper
  streaming HTTP proxy.

  __Note__: This might actually be possible using the "nested iteratee"
  technique. I would still posit that this is far too complicated a
  solution to the problem.

  __Requirement__: It should be possible to pass around some type of producer
  that can be called piecemeal. For example, the request body in Warp should
  be expressible as:

      data Request = Request
          { ...
          , requestBody :: Enumerator ByteString IO ()
          }

  Applications should be able to do something like:

      bs <- requestBody req $$ takeBytes 10
      someAction bs
      rest <- requestBody req $$ takeRest
      finalAction rest

  Note that there may be other approaches to solving the same problem; this
  is just one possibility.

* While the concepts are simple, actually writing low-level `Iteratee` code is
  very complex, which in turn discourages users from adopting the approach.
  Again, this is a subjective measurement.

  __Requirement__: Newcomers should be able to easily understand how to use
  the package and, with a little more training, feel comfortable writing their
  own producers/consumers.

Conduits
===========================

Conduits attempt to provide a high-level API similar to that of enumerators,
while providing a drastically different low-level implementation. The first
question to visit is: why does the enumerator need to control the flow of the
program? The main purpose is to ensure that resources are released properly.
But this in fact solves only *half* the problem; iteratees still cannot release
resources.

ResourceT
---------------------------

So the first issue to address is creating a new way to deal with resource
allocation. We represent this as a monad transformer, `ResourceT`. It works as
follows:

* You can register a cleanup action, which will return a `ReleaseKey`.

* If you pass your `ReleaseKey` to the `release` function, your action will be
  called immediately and then unregistered.

* When the monad is exited (via `runRelease`), all remaining registered actions
  will be called.

* All of this is provided in an exception-safe manner.

For example, you would be able to open a file handle, and then register an
action to close the file handle. In your code, you would call `release` on your
`ReleaseKey` as soon as you reach the end of the contents you are streaming. If
that code is never reached, the file handle will be released when the monad
terminates.
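
The register/release/exit behaviour described above can be sketched with plain `IO` and an `IORef`. This is a simplified model, not the resourcet implementation: instead of a monad transformer it passes an explicit `Registry`, and `Registry`, `register`, `release`, and `runResource` are hypothetical names. It also omits the masking a real implementation needs between acquiring a resource and registering its cleanup.

```haskell
import Control.Exception (finally, mask_)
import Data.IORef

-- Key returned by `register`; used to release that one action early.
newtype ReleaseKey = ReleaseKey Int

-- Hypothetical registry of pending cleanup actions.
data Registry = Registry
  { nextKey  :: IORef Int
  , cleanups :: IORef [(Int, IO ())] -- most recently registered first
  }

-- Register a cleanup action and get back a key for early release.
register :: Registry -> IO () -> IO ReleaseKey
register reg action = do
  k <- atomicModifyIORef' (nextKey reg) (\n -> (n + 1, n))
  modifyIORef' (cleanups reg) ((k, action) :)
  return (ReleaseKey k)

-- Unregister the action and run it now; releasing twice is a no-op.
release :: Registry -> ReleaseKey -> IO ()
release reg (ReleaseKey k) = mask_ $ do
  acts <- atomicModifyIORef' (cleanups reg) $ \xs ->
    (filter ((/= k) . fst) xs, [a | (k', a) <- xs, k' == k])
  sequence_ acts

-- Run a block; on exit (normal or via an exception) every action that is
-- still registered runs, newest first.
runResource :: (Registry -> IO a) -> IO a
runResource body = do
  reg <- Registry <$> newIORef 0 <*> newIORef []
  body reg `finally` do
    acts <- atomicModifyIORef' (cleanups reg) (\xs -> ([], map snd xs))
    sequence_ acts
```

Here `runResource` plays the role of exiting the monad: an action released early runs exactly once at that point, and anything never released runs in the `finally` handler, even if the body throws.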

Source
---------------------------

Now that we have a way to deal with resources, we can take a radically
different approach to the production of data streams. Instead of a push system,
where the enumerator sends data down the pipeline, we have a pull system,
where data is requested from the source. Additionally, a source allows
buffering of input data, so data can be "pushed back" onto the source to be
available for a later call.
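
A minimal sketch of a pull-based source with pushback, assuming the source keeps its position and its pushback buffer in `IORef`s. The `Source` record, `sourceList`, `takeN`, and `takeRest` are hypothetical names, not conduit's API. Because the consumer pulls, the same `Source` value can be handed to one consumer and then to another, which is what the `requestBody` requirement above asks for.

```haskell
import Data.IORef

-- Hypothetical pull-based source with a pushback buffer.
data Source a = Source
  { pull   :: IO (Maybe a) -- Nothing means the stream is exhausted
  , unpull :: a -> IO ()   -- push an item back for a later pull
  }

sourceList :: [a] -> IO (Source a)
sourceList xs0 = do
  buffer <- newIORef []  -- pushed-back items, served first
  rest   <- newIORef xs0 -- items not yet produced
  let pull' = do
        buf <- readIORef buffer
        case buf of
          (b:bs) -> writeIORef buffer bs >> return (Just b)
          [] -> do
            xs <- readIORef rest
            case xs of
              []     -> return Nothing
              (y:ys) -> writeIORef rest ys >> return (Just y)
  return Source { pull = pull', unpull = \a -> modifyIORef buffer (a :) }

-- Consumer that takes up to n items, leaving the rest in the source.
takeN :: Int -> Source a -> IO [a]
takeN n src
  | n <= 0 = return []
  | otherwise = do
      mx <- pull src
      case mx of
        Nothing -> return []
        Just x  -> (x :) <$> takeN (n - 1) src

-- Consumer that drains everything that remains.
takeRest :: Source a -> IO [a]
takeRest src = do
  mx <- pull src
  case mx of
    Nothing -> return []
    Just x  -> (x :) <$> takeRest src
```

For example, after `src <- sourceList [1 .. 10]`, `takeN 3 src` returns `[1,2,3]`; calling `unpull src 99` and then `takeRest src` returns `[99,4,5,6,7,8,9,10]`.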

Sink
---------------------------

A `Sink` is the counterpart to an `Iteratee`. It takes a stream of data, and can
return a result, consisting of leftover input and an output. Like an
`Iteratee`, a `Sink` provides a `Monad` instance, which allows `Sink`s to be
chained together easily.

However, a big difference is that your code needn't live in the `Sink` monad.
You can easily pass around your sources and connect them to different `Sink`s.
As a practical example, when the Web Application Interface (WAI) is translated
to conduits, the application lives in the `ResourceT IO` monad, and the
`Request` value contains a `requestBody` record field, which is a `Source IO
ByteString`.
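
A sketch of a `Sink` as a pure data type, to show how a `Monad` instance can chain sinks while handing leftover input from one to the next. It is a toy model under stated assumptions (input supplied as a plain list; hypothetical names `Sink`, `feedList`, `runSink`, `takeS`, `peek`, `consume`), not conduit's definition.

```haskell
-- Toy model of a Sink (hypothetical names, not conduit's actual type).
data Sink i r
  = Processing (Maybe i -> Sink i r) -- wants input; Nothing means EOF
  | Done [i] r                       -- finished: leftover input, result

instance Functor (Sink i) where
  fmap f (Done ls r)    = Done ls (f r)
  fmap f (Processing k) = Processing (fmap f . k)

instance Applicative (Sink i) where
  pure = Done []
  sf <*> sx = sf >>= \f -> fmap f sx

instance Monad (Sink i) where
  Processing k >>= f = Processing (\mi -> k mi >>= f)
  Done ls r >>= f    = feedList ls (f r) -- leftovers go to the next Sink

-- Push a list of items into a Sink; unconsumed items become leftovers.
feedList :: [i] -> Sink i r -> Sink i r
feedList (x:xs) (Processing k) = feedList xs (k (Just x))
feedList xs (Done ls r)        = Done (ls ++ xs) r
feedList [] s                  = s

-- Feed all input, then signal EOF until the Sink finishes. (A Sink that
-- keeps asking for input after EOF would loop forever here.)
runSink :: [i] -> Sink i r -> ([i], r)
runSink xs = close . feedList xs
  where
    close (Done ls r)    = (ls, r)
    close (Processing k) = close (k Nothing)

-- Take up to n items.
takeS :: Int -> Sink i [i]
takeS n
  | n <= 0 = Done [] []
  | otherwise = Processing $ \mi -> case mi of
      Nothing -> Done [] []
      Just i  -> (i :) <$> takeS (n - 1)

-- Look at the next item without consuming it, by returning it as leftover.
peek :: Sink i (Maybe i)
peek = Processing $ \mi -> case mi of
  Nothing -> Done [] Nothing
  Just i  -> Done [i] (Just i)

-- Consume everything that remains.
consume :: Sink i [i]
consume = Processing $ \mi -> case mi of
  Nothing -> Done [] []
  Just i  -> (i :) <$> consume
```

Running `do { a <- takeS 2; p <- peek; b <- consume; return (a, p, b) }` over `[1 .. 5]` with `runSink` yields `([], ([1,2], Just 3, [3,4,5]))`: the item `peek` returns as leftover is handed on to `consume`.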

Conduit
---------------------------

Conduits are simply functions that take a stream of input data and return
leftover input as well as a stream of output data. Conduits are far simpler to
implement than their counterpart, `Enumeratee`s.
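
Taking that description literally, a conduit can be sketched as a function from a chunk of input to leftover input plus output. This is a simplified model under that assumption; `Conduit`, `mapC`, `pairsC`, and `runChunks` are hypothetical names, and a real conduit also carries state and runs in a monad.

```haskell
-- Hypothetical model: a conduit takes a chunk of input and returns the
-- input it could not use yet (leftover) plus the output it produced.
type Conduit i o = [i] -> ([i], [o])

-- Transform every element; nothing is ever left over.
mapC :: (i -> o) -> Conduit i o
mapC f xs = ([], map f xs)

-- Pair up adjacent elements; an unpaired final element is left over.
pairsC :: Conduit i (i, i)
pairsC (a:b:rest) = let (l, out) = pairsC rest in (l, (a, b) : out)
pairsC leftover   = (leftover, [])

-- Drive a conduit over a series of chunks, prepending each step's
-- leftover to the next chunk. Returns the final leftover and all output.
runChunks :: Conduit i o -> [[i]] -> ([i], [o])
runChunks c = go []
  where
    go carry [] = (carry, [])
    go carry (chunk:chunks) =
      let (l, out)          = c (carry ++ chunk)
          (lFinal, outRest) = go l chunks
      in (lFinal, out ++ outRest)
```

With chunks `[[1,2,3],[4,5],[6]]`, `runChunks pairsC` yields `([], [(1,2),(3,4),(5,6)])`: the unpaired `3` and `5` are carried as leftover into the following chunk.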

Connecting
---------------------------

While you can directly pull data from a `Source`, or directly push to a `Sink`,
the easiest approach is to use the built-in connect operators. These follow the
naming convention from the enumerator package, e.g.:

    sourceFile "myfile.txt" $$ sinkFile "mycopy.txt"
    sourceFile "myfile.txt" $= uppercase {- a conduit -} $$ sinkFile "mycopy.txt"
    fromList [1..10] $$ Data.Conduit.List.map (+ 1) =$ fold (+) 0
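
To show how the three operators fit together, here is a list-based sketch. Everything in it is hypothetical: `Source`, `Conduit`, and `Sink` are modelled as plain list wrappers, `mapC` stands in for `Data.Conduit.List.map`, and the fixities are chosen so that the expressions above group as intended, i.e. `src $= c $$ snk` as `(src $= c) $$ snk` and `src $$ c =$ snk` as `src $$ (c =$ snk)`.

```haskell
-- Hypothetical list-based stand-ins for the three kinds of component.
newtype Source a    = Source [a]
newtype Conduit a b = Conduit ([a] -> [b])
newtype Sink a r    = Sink ([a] -> r)

-- Fixities chosen so `src $= c $$ snk` means `(src $= c) $$ snk`
-- and `src $$ c =$ snk` means `src $$ (c =$ snk)`.
infixr 0 $$
infixl 1 $=
infixr 2 =$

-- Connect a Source to a Sink, producing the Sink's result.
($$) :: Source a -> Sink a r -> r
Source xs $$ Sink k = k xs

-- Fuse a Conduit onto a Source, giving a new Source.
($=) :: Source a -> Conduit a b -> Source b
Source xs $= Conduit f = Source (f xs)

-- Fuse a Conduit onto a Sink, giving a new Sink.
(=$) :: Conduit a b -> Sink b r -> Sink a r
Conduit f =$ Sink k = Sink (k . f)

fromList :: [a] -> Source a
fromList = Source

-- Stand-in for Data.Conduit.List.map.
mapC :: (a -> b) -> Conduit a b
mapC f = Conduit (map f)

-- Left fold over the whole stream.
fold :: (r -> a -> r) -> r -> Sink a r
fold f z = Sink (foldl f z)
```

With these definitions, `fromList [1 .. 10] $$ mapC (+ 1) =$ fold (+) 0` evaluates to `65`. Fusing a conduit to either side gives back a `Source` or a `Sink`, so pipelines compose without any component needing to know what it is connected to.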

Trade-offs
===========================

Overall, the approach achieves the goals I had hoped for. The main downside in
its current form is its reliance on mutable data. Instead of having an
`Iteratee` return a new `Iteratee`, thereby providing an illusion of mutability,
in conduit the sources and sinks must maintain their state internally. As a
result, code must live in `IO` and usually use something like an `IORef` to
keep track of the current state.

I believe this to be an acceptable trade-off, since:

1. Virtually all conduit code will be performing I/O, so staying in the `IO`
   monad is reasonable.
2. By using `monad-control`, conduit can work with any monad *based* on `IO`,
   meaning all standard transformers (except `ContT`) can be used.
3. Enumerator experience has shown that the majority of the time, you construct
   `Iteratee`s by using built-in functions, such as fold and map. Therefore,
   the complication of tracking mutable state will usually be hidden from
   users.

Another minor point is that, in order to provide an efficient `Monad` instance,
the `Sink` type is complicated by having to track two cases: a `Sink` that
expects data and one that does not. As noted in point (3) above, this should
not have a major impact on users.

Finally, since most `Source`s and `Sink`s begin their life by allocating some
mutable variable, both types allow an arbitrary monadic action to be run
before actual processing begins. The monad (et al.) instances and connect
functions are all built to run this action once and then continue operation.
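
A sketch of that last point, assuming a `Source` is represented as a setup action that allocates its `IORef` and returns a pull function closing over it. `Source`, `countTo`, and `sumSource` are hypothetical names. Connecting runs the setup exactly once per connection, so the same `Source` value can be connected twice, each time with fresh state.

```haskell
import Data.IORef

-- Hypothetical: a Source is a setup action that allocates mutable state and
-- returns a pull function that closes over it.
newtype Source a = Source { prepare :: IO (IO (Maybe a)) }

-- Yields 1..n, keeping the current position in an IORef.
countTo :: Int -> Source Int
countTo n = Source $ do
  ref <- newIORef 1 -- allocated once per connection, during setup
  return $ do
    i <- readIORef ref
    if i > n
      then return Nothing
      else writeIORef ref (i + 1) >> return (Just i)

-- A connect-style function: run setup once, then pull until exhausted.
sumSource :: Source Int -> IO Int
sumSource src = prepare src >>= go 0
  where
    go acc pull = do
      mx <- pull
      case mx of
        Nothing -> return acc
        Just x  -> go (acc + x) pull
```

Calling `sumSource (countTo 10)` twice on the same value returns `55` both times, because each connection gets its own `IORef`.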

Status
===========================

This is currently no more than a proof-of-concept, to see the differences
between enumerators and conduits for practical problems. This may serve as a
basis for WAI and Yesod in the future, but that will only be after careful
vetting of the idea. Your input is greatly appreciated!

Notes
===========================

This is just a collection of my personal notes, completely unorganized.

* In enumerator, it's relatively easy to combine multiple `Iteratee`s into
  an `Enumeratee`. The equivalent (turning `Sink`s into a `Conduit`) is
  harder. See, for example, chunking in http-conduit. Perhaps this can be
  improved with a better `sequence`.

* Names and operators are very long right now. Is that a feature or a bug?

* Should we use `Vector` in place of lists?

* It might be worth transitioning to `RegionT`. Will the extra type parameter
  scare people away?

* Perhaps the whole `BSource`/`BConduit` concept doesn't need to be exposed to
  the user. Advantage of exposing: it makes it obvious at the type level that
  a source/conduit can be reused, and it may allow more efficient
  implementations (no double buffering). Disadvantage: more functions for users
  to implement and keep track of, so it is harder to use.

* I dislike the travesty which is `type FilePath = [Char]`, so I'm using the
  system-filepath package. I've used it for a lot of internal code at work,
  and it performs wonderfully. If anyone is concerned about this approach,
  let me know.

* Should we rename `ConduitM` to `Conduit` (et al.), and then give `Conduit` a
  name like `ConduitRaw`? After all, users interact with the current "M"
  versions more often than anything else.