Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Newer
Older
100644 435 lines (312 sloc) 16.663 kb
e2c4f69 import of work in progress website
Noah Slater authored
1 <title>View Cookbook for SQL Jockeys</title>
2
3 <meta charset="utf-8">
4
5 <link rel="stylesheet" href="../style.css">
6
7 <link rel="prev" href="notifications.html">
8
9 <link rel="next" href="security.html">
10
7ee8c0f @janl move script tag up
janl authored
11 <script src="../script.js"></script>
12
e2c4f69 import of work in progress website
Noah Slater authored
13 <h2 id="cookbook">View Cookbook for SQL Jockeys</h2>
14
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
15 <p>This is a collection of some common SQL queries and how to get the same result in CouchDB. The key to remember here is that CouchDB does not work like an SQL database at all and that best practices from the SQL world do not translate well or at all to CouchDB. This chapter’s “cookbook” assumes that you are familiar with the CouchDB basics such as creating and updating databases and documents.
e2c4f69 import of work in progress website
Noah Slater authored
16
17 <h3 id="using">Using Views</h3>
18
7052021 @janl clarify cookbook
janl authored
19 <p>How you would do this in SQL:
e2c4f69 import of work in progress website
Noah Slater authored
20
21 <pre>
22 CREATE TABLE
23 </pre>
24
25 <p>or:
26
27 <pre>
28 ALTER TABLE
29 </pre>
30
7052021 @janl clarify cookbook
janl authored
31 <p>How you can do this in CouchDB:
32
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
33 <p>Using views is a two-step process. First you <em>define</em> a view; then you <em>query</em> it. This is analogous to defining a table structure (with indexes) using <code>CREATE TABLE</code> or <code>ALTER TABLE</code> and querying it using an SQL query.
e2c4f69 import of work in progress website
Noah Slater authored
34
35 <h4 id="defining">Defining a View</h4>
36
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
37 <p>Defining a view is done by creating a special document in a CouchDB database. The only real specialness is the <code>_id</code> of the document, which starts with <code>_design/</code>—for example, <code>_design/application</code>. Other than that, it is just a regular CouchDB document. To make sure CouchDB understands that you are defining a view, you need to prepare the contents of that design document in a special format. Here is an example:
e2c4f69 import of work in progress website
Noah Slater authored
38
39 <pre>
40 {
41 "_id": "_design/application",
42 "_rev": "1-C1687D17",
43 "views": {
44 "viewname": {
45 "map": "function(doc) { ... }",
46 "reduce": "function(keys, values) { ... }"
47 }
48 }
49 }
50 </pre>
51
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
52 <p>We are defining a view <code>viewname</code>. The definition of the view consists of two functions: the <em>map</em> function and the <em>reduce</em> function. Specifying a reduce function is optional. We’ll look at the nature of the functions later. Note that <code>viewname</code> can be whatever you like: <code>users</code>, <code>by-name</code>, or <code>by-date</code> are just some examples.
e2c4f69 import of work in progress website
Noah Slater authored
53
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
54 <p>A single design document can also include multiple view definitions, each identified by a unique name:
e2c4f69 import of work in progress website
Noah Slater authored
55
56 <pre>
57 {
58 "_id": "_design/application",
59 "_rev": "1-C1687D17",
60 "views": {
61 "viewname": {
62 "map": "function(doc) { ... }",
63 "reduce": "function(keys, values) { ... }"
64 },
65 "anotherview": {
66 "map": "function(doc) { ... }",
67 "reduce": "function(keys, values) { ... }"
68 }
69 }
70 }
71 </pre>
72
73 <h4 id="querying">Querying a View</h4>
74
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
75 <p>The name of the design document and the name of the view are significant for querying the view. To query the view <code>viewname</code>, you perform an HTTP <code>GET</code> request to the following URI:
e2c4f69 import of work in progress website
Noah Slater authored
76
77 <pre>
78 /database/_design/application/_view/viewname
79 </pre>
80
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
81 <p><code>database</code> is the name of the database you created your design document in. Next up is the design document name, and then the view name prefixed with <code>_view/</code>. To query <code>anotherview</code>, replace <code>viewname</code> in that URI with <code>anotherview</code>. If you want to query a view in a different design document, adjust the design document name.
e2c4f69 import of work in progress website
Noah Slater authored
82
83 <h4 id="mapreduce">MapReduce Functions</h4>
84
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
85 <p>MapReduce is a concept that solves problems by applying a two-step process, aptly named the <em>map</em> phase and the <em>reduce</em> phase. The map phase looks at all documents in CouchDB separately one after the other and creates a <em>map result</em>. The map result is an ordered list of key/value pairs. Both <code>key</code> and <code>value</code> can be specified by the user writing the map function. A map function may call the built-in <code>emit(key, value)</code> function 0 to <em>N</em> times per document, creating a row in the map result per invocation.
e2c4f69 import of work in progress website
Noah Slater authored
86
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
87 <p>CouchDB is smart enough to run a map function only once for every document, even on subsequent queries on a view. Only changes to documents or new documents need to be processed anew.
e2c4f69 import of work in progress website
Noah Slater authored
88
89 <h5 id="map">Map functions</h5>
90
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
91 <p>Map functions run in isolation for every document. They can’t modify the document, and they can’t talk to the outside world—they can’t have <em>side effects</em>. This is required so that CouchDB can guarantee correct results without having to recalculate a complete result when only one document gets changed.
e2c4f69 import of work in progress website
Noah Slater authored
92
93 <p>The map result looks like this:
94
95 <pre>
96 {"total_rows":3,"offset":0,"rows":[
97 {"id":"fc2636bf50556346f1ce46b4bc01fe30","key":"Lena","value":5},
98 {"id":"1fb2449f9b9d4e466dbfa47ebe675063","key":"Lisa","value":4},
99 {"id":"8ede09f6f6aeb35d948485624b28f149","key":"Sarah","value":6}
100 ]}
101 </pre>
102
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
103 <p>It is a list of rows sorted by the value of <code>key</code>. The <code>id</code> is added automatically and refers back to the document that created this row. The <code>value</code> is the data you’re looking for. For example purposes, it’s the girl’s age.
e2c4f69 import of work in progress website
Noah Slater authored
104
105 <p>The map function that produces this result is:
106
107 <pre>
108 function(doc) {
109 if(doc.name &amp;&amp; doc.age) {
110 emit(doc.name, doc.age);
111 }
112 }
113 </pre>
114
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
115 <p>It includes the <code>if</code> statement as a sanity check to ensure that we’re operating on the right fields and calls the <code>emit</code> function with the name and age as the key and value.
e2c4f69 import of work in progress website
Noah Slater authored
116
117 <h5 id="reduce">Reduce functions</h5>
118
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
119 <p>Reduce functions are explained in <a href="#aggregate">the section called “Aggregate Functions”</a>.
e2c4f69 import of work in progress website
Noah Slater authored
120
121 <h3 id="key">Look Up by Key</h3>
122
7052021 @janl clarify cookbook
janl authored
123 <p>How you would do this in SQL:
e2c4f69 import of work in progress website
Noah Slater authored
124
125 <pre>
126 SELECT <em>field</em> FROM <em>table</em> WHERE value="<em>searchterm</em>"
127 </pre>
a3df50c html improvements
Noah Slater authored
128
7052021 @janl clarify cookbook
janl authored
129 <p>How you can do this in CouchDB:
130
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
131 <p>Use case: get a <em>result</em> (which can be a record or set of records) associated with a <em>key</em> (<code>"searchterm"</code>).
e2c4f69 import of work in progress website
Noah Slater authored
132
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
133 <p>To look something up quickly, regardless of the storage mechanism, an index is needed. An index is a data structure optimized for quick search and retrieval. CouchDB’s map result is stored in such an index, which happens to be a B+ tree.
e2c4f69 import of work in progress website
Noah Slater authored
134
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
135 <p>To look up a value by <code>"searchterm"</code>, we need to put all values into the key of a view. All we need is a simple map function:
e2c4f69 import of work in progress website
Noah Slater authored
136
137 <pre>
138 function(doc) {
139 if(doc.value) {
140 emit(doc.value, null);
141 }
142 }
143 </pre>
144
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
145 <p>This creates a list of documents that have a <code>value</code> field sorted by the data in the <code>value</code> field. To find all the records that match <code>"searchterm"</code>, we query the view and specify the search term as a query parameter:
e2c4f69 import of work in progress website
Noah Slater authored
146
147 <pre>
148 /database/_design/application/_view/viewname?key="searchterm"
149 </pre>
150
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
151 <p>Consider the documents from the previous section, and say we’re indexing on the <code>age</code> field of the documents to find all the five-year-olds:
e2c4f69 import of work in progress website
Noah Slater authored
152
153 <pre>
154 function(doc) {
155 if(doc.age &amp;&amp; doc.name) {
156 emit(doc.age, doc.name);
157 }
158 }
159 </pre>
160
161 <p>Query:
162
163 <pre>
164 /ladies/_design/ladies/_view/age?key=5
165 </pre>
166
167 <p>Result:
168
169 <pre>
170 {"total_rows":3,"offset":1,"rows":[
171 {"id":"fc2636bf50556346f1ce46b4bc01fe30","key":5,"value":"Lena"}
172 ]}
173 </pre>
174
175 <p>Easy.
176
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
177 <p>Note that you have to emit a value. The view result includes the associated document ID in every row. We can use it to look up more data from the document itself. We can also use the <code>?include_docs=true</code> parameter to have CouchDB fetch the documents individually for us.
e2c4f69 import of work in progress website
Noah Slater authored
178
179 <h3 id="prefix">Look Up by Prefix</h3>
180
7052021 @janl clarify cookbook
janl authored
181 <p>How you would do this in SQL:
e2c4f69 import of work in progress website
Noah Slater authored
182
183 <pre>
184 SELECT <em>field</em> FROM <em>table</em> WHERE <em>value</em> LIKE "<em>searchterm</em>%"
185 </pre>
186
7052021 @janl clarify cookbook
janl authored
187 <p>How you can do this in CouchDB:
188
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
189 <p>Use case: find all documents that have a field value that starts with <code>searchterm</code>. For example, say you stored a MIME type (like <code>text/html</code> or <code>image/jpg</code>) for each document and now you want to find all documents that are images according to the MIME type.
e2c4f69 import of work in progress website
Noah Slater authored
190
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
191 <p>The solution is very similar to the previous example: all we need is a map function that is a little more clever than the first one. But first, an example document:
e2c4f69 import of work in progress website
Noah Slater authored
192
193 <pre>
194 {
195 "_id": "Hugh Laurie",
196 "_rev": "1-9fded7deef52ac373119d05435581edf",
197 "mime-type": "image/jpg",
198 "description": "some dude"
199 }
200 </pre>
201
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
202 <p>The clue lies in extracting the prefix that we want to search for from our document and putting it into our view index. We use a regular expression to match our prefix:
e2c4f69 import of work in progress website
Noah Slater authored
203
204 <pre>
205 function(doc) {
206 if(doc["mime-type"]) {
207 // from the start (^) match everything that is not a slash ([^\/]+) until
208 // we find a slash (\/). Slashes needs to be escaped with a backslash (\/)
209 var prefix = doc["mime-type"].match(/^[^\/]+\//);
210 if(prefix) {
211 emit(prefix, null);
212 }
213 }
214 }
215 </pre>
216
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
217 <p>We can now query this view with our desired MIME type prefix and not only find all images, but also text, video, and all other formats:
e2c4f69 import of work in progress website
Noah Slater authored
218
219 <pre>
220 /files/_design/finder/_view/by-mime-type?key="image/"
221 </pre>
222
223 <h3 id="aggregate">Aggregate Functions</h3>
224
7052021 @janl clarify cookbook
janl authored
225 <p>How you would do this in SQL:
e2c4f69 import of work in progress website
Noah Slater authored
226
227 <pre>
228 SELECT COUNT(<em>field</em>) FROM <em>table</em>
229 </pre>
230
7052021 @janl clarify cookbook
janl authored
231 <p>How you can do this in CouchDB:
232
e2c4f69 import of work in progress website
Noah Slater authored
233 <p>Use case: calculate a derived value from your data.
234
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
235 <p>We haven’t explained reduce functions yet. Reduce functions are similar to aggregate functions in SQL. They compute a value over multiple documents.
e2c4f69 import of work in progress website
Noah Slater authored
236
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
237 <p>To explain the mechanics of reduce functions, we’ll create one that doesn’t make a whole lot of sense. But this example is easy to understand. We’ll explore more useful reductions later.
e2c4f69 import of work in progress website
Noah Slater authored
238
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
239 <p>Reduce functions operate on the output of the map function (also called the <em>map re⁠sult</em> or <em>intermediate result</em>). The reduce function’s job, unsurprisingly, is to reduce the list that the map function produces.
e2c4f69 import of work in progress website
Noah Slater authored
240
241 <p>Here’s what our summing reduce function looks like:
242
243 <pre>
244 function(keys, values) {
245 var sum = 0;
246 for(var idx in values) {
247 sum = sum + values[idx];
248 }
249 return sum;
250 }
251 </pre>
252
253 <p>Here’s an alternate, more idiomatic JavaScript version:
254
255 <pre>
256 function(keys, values) {
257 var sum = 0;
258 values.forEach(function(element) {
259 sum = sum + element;
260 });
261 return sum;
262 }
263 </pre>
264
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
265 <p>This reduce function takes two arguments: a list of <code>keys</code> and a list of <code>values</code>. For our summing purposes we can ignore the <code>keys</code>-list and consider only the <code>value</code> list. We’re looping over the list and add each item to a running total that we’re returning at the end of the function.
e2c4f69 import of work in progress website
Noah Slater authored
266
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
267 <p>You’ll see one difference between the map and the reduce function. The map function uses <code>emit()</code> to create its result, whereas the reduce function returns a value.
e2c4f69 import of work in progress website
Noah Slater authored
268
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
269 <p>For example, from a list of integer values that specify the age, calculate the sum of all years of life for the news headline, “786 life years present at event.” A little contrived, but very simple and thus good for demonstration purposes. Consider the documents and the map view we used earlier in this chapter.
e2c4f69 import of work in progress website
Noah Slater authored
270
271 <p>The reduce function to calculate the total age of all girls is:
272
273 <pre>
274 function(keys, values) {
275 return sum(values);
276 }
277 </pre>
278
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
279 <p>Note that, instead of the two earlier versions, we use CouchDB’s predefined <code>sum()</code> function. It does the same thing as the other two, but it is such a common piece of code that CouchDB has it included.
e2c4f69 import of work in progress website
Noah Slater authored
280
281 <p>The result for our reduce view now looks like this:
282
283 <pre>
284 {"rows":[
285 {"key":null,"value":15}
286 ]}
287 </pre>
288
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
289 <p>The total sum of all <code>age</code> fields in all our documents is <code>15</code>. Just what we wanted. The <code>key</code> member of the result object is <code>null</code>, as we can’t know anymore which documents took part in the creation of the reduced result. We’ll cover more advanced reduce cases later on.
e2c4f69 import of work in progress website
Noah Slater authored
290
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
291 <p>As a rule of thumb, the reduce function should reduce a single scalar value. That is, an integer; a string; or a small, fixed-size list or object that includes an aggregated value (or values) from the <code>values</code> argument. It should never just return <code>values</code> or similar. CouchDB will give you a warning if you try to use reduce “the wrong way”:
e2c4f69 import of work in progress website
Noah Slater authored
292
293 <pre>
294 {"error":"reduce_overflow_error","message":"Reduce output must shrink more rapidly: Current output: ..."}
295 </pre>
296
297 <h3 id="unique">Get Unique Values</h3>
298
7052021 @janl clarify cookbook
janl authored
299 <p>How you would do this in SQL:
e2c4f69 import of work in progress website
Noah Slater authored
300
301 <pre>
302 SELECT DISTINCT <em>field</em> FROM <em>table</em>
303 </pre>
304
7052021 @janl clarify cookbook
janl authored
305 <p>How you can do this in CouchDB:
306
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
307 <p>Getting unique values is not as easy as adding a keyword. But a reduce view and a special query parameter give us the same result. Let’s say you want a list of tags that your users have tagged themselves with and no duplicates.
e2c4f69 import of work in progress website
Noah Slater authored
308
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
309 <p>First, let’s look at the source documents. We punt on <code>_id</code> and <code>_rev</code> attributes here:
e2c4f69 import of work in progress website
Noah Slater authored
310
311 <pre>
312 {
313 "name":"Chris",
314 "tags":["mustache", "music", "couchdb"]
315 }
316
317 {
318 "name":"Noah",
319 "tags":["hypertext", "philosophy", "couchdb"]
320 }
321
322 {
323 "name":"Jan",
324 "tags":["drums", "bike", "couchdb"]
325 }
326 </pre>
327
328 <p>Next, we need a list of all tags. A map function will do the trick:
329
330 <pre>
331 function(dude) {
332 if(dude.name &amp;&amp; dude.tags) {
333 dude.tags.forEach(function(tag) {
334 emit(tag, null);
335 });
336 }
337 }
338 </pre>
339
340 <p>The result will look like this:
341
342 <pre>
343 {"total_rows":9,"offset":0,"rows":[
344 {"id":"3525ab874bc4965fa3cda7c549e92d30","key":"bike","value":null},
345 {"id":"3525ab874bc4965fa3cda7c549e92d30","key":"couchdb","value":null},
346 {"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"couchdb","value":null},
347 {"id":"da5ea89448a4506925823f4d985aabbd","key":"couchdb","value":null},
348 {"id":"3525ab874bc4965fa3cda7c549e92d30","key":"drums","value":null},
349 {"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"hypertext","value":null},
350 {"id":"da5ea89448a4506925823f4d985aabbd","key":"music","value":null},
351 {"id":"da5ea89448a4506925823f4d985aabbd","key":"mustache","value":null},
352 {"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"philosophy","value":null}
353 ]}
354 </pre>
355
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
356 <p>As promised, these are all the tags, including duplicates. Since each document gets run through the map function in isolation, it cannot know if the same key has been emitted already. At this stage, we need to live with that. To achieve uniqueness, we need a reduce:
e2c4f69 import of work in progress website
Noah Slater authored
357
358 <pre>
359 function(keys, values) {
360 return true;
361 }
362 </pre>
363
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
364 <p>This reduce doesn’t do anything, but it allows us to specify a special query parameter when querying the view:
e2c4f69 import of work in progress website
Noah Slater authored
365
366 <pre>
367 /dudes/_design/dude-data/_view/tags?group=true
368 </pre>
369
370 <p>CouchDB replies:
371
372 <pre>
373 {"rows":[
374 {"key":"bike","value":true},
375 {"key":"couchdb","value":true},
376 {"key":"drums","value":true},
377 {"key":"hypertext","value":true},
378 {"key":"music","value":true},
379 {"key":"mustache","value":true},
380 {"key":"philosophy","value":true}
381 ]}
382 </pre>
383
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
384 <p>In this case, we can ignore the value part because it is always true, but the result includes a list of all our tags and no duplicates!
e2c4f69 import of work in progress website
Noah Slater authored
385
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
386 <p>With a small change we can put the reduce to good use, too. Let’s see how many of the non-unique tags are there for each tag. To calculate the tag frequency, we just use the summing up we already learned about. In the map function, we emit a <code>1</code> instead of <code>null</code>:
e2c4f69 import of work in progress website
Noah Slater authored
387
388 <pre>
389 function(dude) {
390 if(dude.name &amp;&amp; dude.tags) {
391 dude.tags.forEach(function(tag) {
392 emit(tag, 1);
393 });
394 }
395 }
396 </pre>
397
398 <p>In the reduce function, we return the sum of all values:
399
400 <pre>
401 function(keys, values) {
402 return sum(values);
403 }
404 </pre>
405
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
406 <p>Now, if we query the view with the <code>?group=true</code> parameter, we get back the count for each tag:
e2c4f69 import of work in progress website
Noah Slater authored
407
408 <pre>
409 {"rows":[
410 {"key":"bike","value":1},
411 {"key":"couchdb","value":3},
412 {"key":"drums","value":1},
413 {"key":"hypertext","value":1},
414 {"key":"music","value":1},
415 {"key":"mustache","value":1},
416 {"key":"philosophy","value":1}
417 ]}
418 </pre>
419
420 <h3 id="uniqueness">Enforcing Uniqueness</h3>
421
7052021 @janl clarify cookbook
janl authored
422 <p>How you would do this in SQL:
e2c4f69 import of work in progress website
Noah Slater authored
423
424 <pre>
425 UNIQUE KEY(<em>column</em>)
426 </pre>
427
7052021 @janl clarify cookbook
janl authored
428 <p>How you can do this in CouchDB:
429
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
430 <p>Use case: your applications require that a certain value exists only once in a database.
e2c4f69 import of work in progress website
Noah Slater authored
431
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
432 <p>This is an easy one: within a CouchDB database, each document must have a unique <code>_id</code> field. If you require unique values in a database, just assign them to a document’s <code>_id</code> field and CouchDB will enforce uniqueness for you.
e2c4f69 import of work in progress website
Noah Slater authored
433
b90afaf reverted 203b6254c3aaf206d52f
Noah Slater authored
434 <p>There’s one caveat, though: in the distributed case, when you are running more than one CouchDB node that accepts write requests, uniqueness can be guaranteed only per node or outside of CouchDB. CouchDB will allow two identical IDs to be written to two different nodes. On replication, CouchDB will detect a conflict and flag the document accordingly.
Something went wrong with that request. Please try again.