<h1>Link Indexing</h1>

This module allows you to create simple secondary indexes
in Riak based on Riak's link model. The basic idea is this:

Assume we model persons and companies as separate buckets:

    /riak/person/Name
    /riak/company/Name

When you store a `/riak/person/Kresten` object, you describe the
employment relation by including this link in the Kresten object:

    Link: </riak/company/Trifork>; riaktag="idx@employs"

The magic is that `riak_link_index` will then automatically add (and
maintain) a link in the opposite direction, from `Trifork` to
`Kresten`, and that link will have the tag `employs`. The tag needs to
start with `idx@` for `riak_link_index` to recognize it.

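The following small JavaScript sketch makes the mapping concrete. It is not code from `riak_link_index`, which is written in Erlang. The helpers `parseLinks` and `reverseLinks` are hypothetical, and the `Link` header parsing is deliberately simplified: it only looks at `riaktag` parameters and assumes well-formed input. Given the `idx@employs` link above, it reports that `/riak/company/Trifork` should carry a link back to `/riak/person/Kresten` tagged `employs`.

```javascript
// Hypothetical sketch: derive the reverse links that riak_link_index maintains
// for the idx@-tagged links on a source object. Header parsing is simplified.
const IDX_PREFIX = "idx@";

// Parse a Link header value such as
//   </riak/company/Trifork>; riaktag="idx@employs"
// into [{ bucket, key, tag }, ...]. Links without a riaktag are ignored,
// and URL-encoded bucket and key names are decoded.
function parseLinks(header) {
  const links = [];
  const re = /<\/riak\/([^/>]+)\/([^>]+)>;\s*riaktag="([^"]*)"/g;
  let m;
  while ((m = re.exec(header)) !== null) {
    links.push({
      bucket: decodeURIComponent(m[1]),
      key: decodeURIComponent(m[2]),
      tag: m[3],
    });
  }
  return links;
}

// For each idx@ link on /riak/<srcBucket>/<srcKey>, return the link that
// should appear on the target object. That link points back at the source
// and is tagged with whatever follows the idx@ prefix.
function reverseLinks(srcBucket, srcKey, header) {
  return parseLinks(header)
    .filter((l) => l.tag.startsWith(IDX_PREFIX))
    .map((l) => ({
      target: { bucket: l.bucket, key: l.key },
      link: { bucket: srcBucket, key: srcKey, tag: l.tag.slice(IDX_PREFIX.length) },
    }));
}

const rev = reverseLinks("person", "Kresten",
  '</riak/company/Trifork>; riaktag="idx@employs"');
console.log(JSON.stringify(rev));
```
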
Whenever you update or delete a person object, you can pass in new (or
multiple) such links, and the old reverse links will automatically be
deleted or updated as appropriate. Deleting a company object has no
effect in the other direction.

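You can picture the update/delete behaviour as a diff between the `idx@` links on the old version of the person object and those on the new version. Targets that appear get a reverse link added, and targets that disappear get theirs removed. A delete is simply an update whose new link list is empty. The JavaScript below is a hypothetical model of that diff, not the module's actual implementation. Links are plain `{ bucket, key, tag }` objects, and `Acme` is a made-up company.

```javascript
// Hypothetical sketch: which reverse links must be added or removed when
// a source object's idx@ links change. A delete passes newLinks = [].
function linkId(l) {
  return `${l.bucket}/${l.key}/${l.tag}`;
}

function reverseLinkChanges(oldLinks, newLinks) {
  const isIdx = (l) => l.tag.startsWith("idx@");
  const oldMap = new Map(oldLinks.filter(isIdx).map((l) => [linkId(l), l]));
  const newMap = new Map(newLinks.filter(isIdx).map((l) => [linkId(l), l]));

  // Links present only in the new version need a reverse link added.
  // Links present only in the old version need their reverse link removed.
  const add = [...newMap].filter(([id]) => !oldMap.has(id)).map(([, l]) => l);
  const remove = [...oldMap].filter(([id]) => !newMap.has(id)).map(([, l]) => l);
  return { add, remove };
}

// A person's employer changes from Trifork to (the made-up) Acme:
const changes = reverseLinkChanges(
  [{ bucket: "company", key: "Trifork", tag: "idx@employs" }],
  [{ bucket: "company", key: "Acme", tag: "idx@employs" }]
);
console.log(changes.add.map(linkId), changes.remove.map(linkId));
```
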
> The objects that contain the reverse links (in this case
> `/riak/company/Trifork`) will have special content used to manage
> the links, so you cannot use them for anything else!

This module also allows you to install a `link_index` hook, which can be
used to extract links from your objects. Index hooks can be written in
either JavaScript or Erlang.

Installation
------------

> Notice! This only works on the `master` branch of Riak; it
> does not work on `riak-0.14.*` releases, because it depends on the
> pre- and post-commit hooks both running in the same internal process
> (the `riak_kv_put_fsm`, if you must know).

To install, you need to make the `ebin` directory containing
`riak_link_index.beam` accessible to your Riak install. You can do that
by adding a line like this to Riak's `etc/vm.args`:

<pre>-pz /Path/to/riak_function_contrib/other/riak_link_index/ebin</pre>

If you're an Erlang wiz there are other ways, but that should work.

Next, you configure a bucket to support indexing. This involves two things:

1. Install a set of commit hooks (indexing needs both a pre- and a
   post-commit hook).

2. (Optionally) configure a function to extract index information
   from your bucket data. We'll do that later, and start out with
   the easy version.

If your bucket is named `person`, it could be done thus:

    prompt$ cat > bucket_props.json
    { "props" : {
        "precommit" : [{"mod": "riak_link_index", "fun": "precommit"}],
        "postcommit" : [{"mod": "riak_link_index", "fun": "postcommit"}]
    }}
    ^D
    prompt$ curl -X PUT --data @bucket_props.json \
        -H 'Content-Type: application/json' \
        http://127.0.0.1:8091/riak/person

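If you would rather script this step, the sketch below builds the same request for the global `fetch` available in Node.js 18 and later. The helper name `linkIndexHookRequest` is hypothetical. The JSON body mirrors `bucket_props.json`, and the host and port are the ones used throughout this README.

```javascript
// Hypothetical helper: build the request that installs the riak_link_index
// pre- and post-commit hooks on a bucket (same JSON as bucket_props.json).
function linkIndexHookRequest(baseUrl, bucket) {
  const body = {
    props: {
      precommit: [{ mod: "riak_link_index", fun: "precommit" }],
      postcommit: [{ mod: "riak_link_index", fun: "postcommit" }],
    },
  };
  return {
    url: `${baseUrl}/riak/${encodeURIComponent(bucket)}`,
    init: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage (Node 18+ provides a global fetch):
//   const { url, init } = linkIndexHookRequest("http://127.0.0.1:8091", "person");
//   const res = await fetch(url, init);
```
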
There you go: you're ready for some action.

Explicit Indexing
-----------------

The simple indexer now works for the `person` bucket, by interpreting
links on `/riak/person/XXX` objects that have tags starting with
`idx@`. The special `idx@` prefix is recognized by the indexer, and
it will create and maintain a link in the opposite direction, tagged
with whatever comes after the `idx@` prefix.

Let's say we add me:

    curl -X PUT \
        -H 'Link: </riak/company/Trifork>; riaktag="idx@employs"' \
        -H 'Content-Type: application/json' \
        --data '{ "name": "Kresten Krab Thorup", "employer":"Trifork" }' \
        http://127.0.0.1:8091/riak/person/Kresten

As this gets written to Riak, the indexer will then
create an object by the name of `/riak/company/Trifork`,
which has a link pointing back to me:

    curl -v -X GET http://127.0.0.1:8091/riak/company/Trifork
    < 200 OK
    < Link: </riak/person/Kresten>; riaktag="employs"
    < Content-Length: 0

If there was already an object at `/riak/company/Trifork`, then the indexer
would leave the contents alone, but still add the reverse link. If no
such object existed, then it would be created with empty contents.

Link Walking
------------

The beauty of this is that you can now do link-walk queries to find
your stuff. For instance, this link query should give you a list of
the people employed at Trifork. Lucky them :-)

    curl http://127.0.0.1:8091/riak/company/Trifork/_,employs,_

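Riak's HTTP link-walking syntax appends one `bucket,tag,keep` step per hop to an object URL. An `_` means "any", and `keep` controls whether that step's results are returned. The helper below, `linkWalkUrl`, is a hypothetical JavaScript sketch for building such URLs, for example to walk the `employs` reverse links or to chain further hops from them.

```javascript
// Hypothetical helper: build a Riak HTTP link-walk URL.
// Each step is { bucket, tag, keep }. Omitted fields become the "_" wildcard.
function linkWalkUrl(baseUrl, bucket, key, steps) {
  const part = (v) => (v === undefined ? "_" : encodeURIComponent(String(v)));
  const walk = steps
    .map((s) => [part(s.bucket), part(s.tag), part(s.keep)].join(","))
    .join("/");
  return `${baseUrl}/riak/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}/${walk}`;
}

// Everyone linked from Trifork with tag "employs", in any bucket:
console.log(linkWalkUrl("http://127.0.0.1:8091", "company", "Trifork", [{ tag: "employs" }]));
```
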
Using a `link_index` hook
-------------------------

You can also install an index hook as a bucket property. It designates
a function that decides which index links to create. This way
you can keep index creation on the server side, and also more easily
generate additional indexes.

You install the index hook the same way you install a pre-commit hook, and the
hook can be written in either Erlang or JavaScript, just like precommits.

    // Return a list of [Bucket, Key] pairs that should link back to this object
    function employmentIndexing(metaData, contents) {
      var personData = JSON.parse(contents);
      if (personData.employer) {
        return [ ['company', personData.employer] ];
      } else {
        return [];
      }
    }

Assume you have that code in `/tmp/js_source/my_indexer.js`, and have
configured `{js_source_dir, "/tmp/js_source"}` in the `riak_kv`
section of your `etc/app.config`.

Then, to use it as an indexer, you need to install it as a
bucket property on the `person` bucket. You can have several indexes, so
the property is a list of functions. Link-index hooks can also be Erlang
functions.

    prompt$ cat > bucket_props.json
    { "props" : {
        "link_index" : [{"name": "employmentIndexing",
                         "tag" : "employs"}]
    }}
    ^D
    prompt$ curl -X PUT --data @bucket_props.json \
        -H 'Content-Type: application/json' \
        http://127.0.0.1:8091/riak/person

Notice that the link index also needs a `tag` property. You can
install multiple index functions, but they should all have separate
tags. Any `idx@...` tagged links that do not correspond to a
registered link index are processed as "explicit indexing". In fact,
the `link_index` hook is just a convenient way to have code insert the
`idx@`-links for you.

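In other words, a hook's `[Bucket, Key]` results act like explicit `idx@<tag>` links on the stored object. The JavaScript sketch below models that translation. The helper name `hookLinks` and the exact `Link` header formatting are assumptions for illustration, not the module's internals. An equivalent copy of `employmentIndexing` is included so the snippet runs on its own.

```javascript
// Equivalent copy of the example hook, so this snippet runs on its own.
function employmentIndexing(metaData, contents) {
  const personData = JSON.parse(contents);
  return personData.employer ? [["company", personData.employer]] : [];
}

// Hypothetical helper: turn a hook's [bucket, key] pairs into the
// idx@<tag> Link header value that explicit indexing would have used.
function hookLinks(hook, tag, metaData, contents) {
  return hook(metaData, contents)
    .map(([bucket, key]) =>
      `</riak/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}>; riaktag="idx@${tag}"`)
    .join(", ");
}

const header = hookLinks(employmentIndexing, "employs", {},
  '{ "name": "Justin Sheehy", "employer": "Basho" }');
console.log(header);
```
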
Now, we can add objects to the person bucket *without* having to put
the `idx@employs` link on the object. The index hook will do it for
you. Happy you!

    curl -X POST \
        -H 'Content-Type: application/json' \
        --data '{ "name": "Justin Sheehy", "employer":"Basho" }' \
        http://127.0.0.1:8091/riak/person

> While you can have multiple `link_index` hooks, it is important that
> each `link_index` has its own distinct tag, because
> `riak_link_index` processes each link index hook by first deleting
> any links with that tag, and then recomputing them based on the new
> content.

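The delete-then-recompute step in this note can be modelled in a few lines of JavaScript. This is an illustrative model, not `riak_link_index`'s Erlang code. Links are plain `{ bucket, key, tag }` objects, and `recomputeTag` and the `Bob`/`idx@knows` link are hypothetical. The model also shows why two hooks sharing a tag would clobber each other's links: whichever runs second would drop the first one's results.

```javascript
// Hypothetical model of applying one link_index hook: drop every link
// carrying the hook's idx@ tag, then add the links the hook computes now.
function recomputeTag(links, tag, computed) {
  const idxTag = `idx@${tag}`;
  const kept = links.filter((l) => l.tag !== idxTag);
  const fresh = computed.map(([bucket, key]) => ({ bucket, key, tag: idxTag }));
  return kept.concat(fresh);
}

let links = [
  { bucket: "company", key: "Trifork", tag: "idx@employs" },
  { bucket: "person", key: "Bob", tag: "idx@knows" },
];
// The "employs" hook now yields Basho instead of Trifork; "knows" is untouched.
links = recomputeTag(links, "employs", [["company", "Basho"]]);
console.log(links);
```
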
Consistency
-----------

The indexer will handle delete/update of your records as appropriate,
and should work fine with `allow_mult` buckets too. In fact, it is
recommended to enable `allow_mult=true` on the buckets containing
the company objects (`company` in the example above); otherwise
conflicting updates may be lost.

The indexer also manages conflicting updates to the link objects,
which is pretty cool. Say that, at the same time, someone deletes a
person object and another process creates a new person object. In
that case, the index object (in the company bucket) may end up with
conflicting updates (i.e. get siblings), which would normally mean that
someone has to take action to resolve the conflict. To manage this
situation, `riak_link_index` stores a [vclock-backed
set](src/vset.erl) in the content part of the index object (the
company object). This set abstraction allows automatic merging,
because each element in the set has its own vector clock.
So, if someone adds a link, and someone else deletes a different link,
then the result is easy to merge.

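The following toy vclock-backed set in JavaScript illustrates the idea. It is *not* a port of `src/vset.erl`. The representation and the conflict rule are assumptions made for this sketch: each element maps to its own vector clock plus a present/removed flag, and when the same element is added and removed truly concurrently, the add wins. Actor ids such as `n1` and `n2` are made up. In the usage below, one sibling removes Kresten while another adds Justin, and the merge keeps both changes.

```javascript
// Toy vclock-backed set: element -> { clock: {actor: counter}, present: bool }.
// Removed elements stay as tombstones so that a merge can see the removal.

// Returns -1 if a is older than b, 1 if newer, 0 if equal, null if concurrent.
function compareClocks(a, b) {
  let aBigger = false, bBigger = false;
  for (const actor of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[actor] || 0, y = b[actor] || 0;
    if (x > y) aBigger = true;
    if (y > x) bBigger = true;
  }
  if (aBigger && bBigger) return null;
  return aBigger ? 1 : bBigger ? -1 : 0;
}

// Pointwise maximum of two vector clocks.
function mergeClocks(a, b) {
  const out = { ...a };
  for (const [actor, n] of Object.entries(b)) out[actor] = Math.max(out[actor] || 0, n);
  return out;
}

// Record an add (present = true) or remove (present = false) of elem by actor,
// bumping that actor's counter in the element's own clock. Returns a new Map.
function update(set, elem, actor, present) {
  const old = set.get(elem);
  const clock = { ...(old ? old.clock : {}) };
  clock[actor] = (clock[actor] || 0) + 1;
  return new Map(set).set(elem, { clock, present });
}

// Merge two siblings element by element.
function merge(s1, s2) {
  const out = new Map(s1);
  for (const [elem, e2] of s2) {
    const e1 = out.get(elem);
    if (!e1) { out.set(elem, e2); continue; }
    const c = compareClocks(e1.clock, e2.clock);
    if (c === -1) {
      out.set(elem, e2); // s2's entry is newer
    } else if (c === null) {
      // Concurrent edits of the same element: assume the add wins.
      out.set(elem, { clock: mergeClocks(e1.clock, e2.clock), present: e1.present || e2.present });
    } // c === 1 or 0: keep s1's entry
  }
  return out;
}

// Elements currently in the set, sorted for stable output.
const members = (s) => [...s].filter(([, e]) => e.present).map(([k]) => k).sort();

const base = update(new Map(), "person/Kresten", "n1", true);
const removed = update(base, "person/Kresten", "n1", false); // one client deletes
const added = update(base, "person/Justin", "n2", true);     // another client adds
console.log(members(merge(removed, added)));
```
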