First version that actually works. Happy me.

commit ae23c0711813fff24025411049b5fa7c95d7b94e 1 parent 55daf57
@krestenkrab authored
Showing with 109 additions and 65 deletions.
  1. +58 −31 README.md
  2. +51 −34 src/riak_link_index.erl
89 README.md
@@ -5,7 +5,7 @@
This module allows you to create simple secondary indexes
in Riak based on Riak's link model. The basic idea is thus:
-Assume we model people and companies as separate buckets:
+Assume we model persons and companies as separate buckets:
/riak/person/Name
/riak/company/Name
@@ -13,7 +13,7 @@ Assume we model people and companies as separate buckets:
When you store a `/riak/person/Kresten` object, you describe the
employment relation by including this link in the Kresten object:
- X-Riak-Link: </riak/company/Trifork>; riaktag="idx@employs"
+ Link: </riak/company/Trifork>; riaktag="idx@employs"
The magic is that `riak_link_index` will then automatically add a link
in the opposite direction; from `Trifork` to `Kresten`, and that link
@@ -24,13 +24,22 @@ Whenever you update a person object, you can pass in new (or multiple)
such links, and the old reverse links will automatically be
deleted/updated as appropriate.
+> The objects that contain the reverse links (in this case
+e.g. `/riak/company/Trifork`) will have empty contents. So you cannot
+use those bucket/keys for storing data!
+
This module also allows you to install an `index_hook`, which can be
used to extract links from your objects. Index hooks can be written in
either JavaScript or Erlang.
Installation
-============
+------------
+
+> Notice! This only works on the `master` branch of Riak; this
+> does not work on `riak-0.14.*` releases, because it depends on the
+> pre- and post-commit hooks to both run in the same internal process
+> (the `riak_kv_put_fsm`, if you must know).
To install, you need to make the `ebin` directory containing
`riak_link_index.beam` accessible to your Riak install. You can do that
@@ -52,22 +61,20 @@ Next, you configure a bucket to support indexing. This involves two things:
If your bucket is named `person`, it could be done thus:
-<pre>
-prompt$ cat > bucket_props.json
-{
- "precommit" : [{"mod": "riak_link_index", "fun": "precommit"}],
- "postcommit" : [{"mod": "riak_link_index", "fun": "postcommit"}]
-}
-^D
-prompt$ curl -X PUT --data @bucket_props.json \
- -H 'Content-Type: application/json' \
- http://127.0.0.1:8091/riak/person
-</pre>
+ prompt$ cat > bucket_props.json
+ { "props" : {
+ "precommit" : [{"mod": "riak_link_index", "fun": "precommit"}],
+ "postcommit" : [{"mod": "riak_link_index", "fun": "postcommit"}]
+ }}
+ ^D
+ prompt$ curl -X PUT --data @bucket_props.json \
+ -H 'Content-Type: application/json' \
+ http://127.0.0.1:8091/riak/person
There you go: you're ready for some action.
Explicit Indexing
-=================
+-----------------
The simple indexer now works for the `person` bucket, by interpreting
@@ -79,10 +86,10 @@ with whatever comes after the `idx@` prefix.
Let's say we add me:
curl -X PUT \
- -H 'X-Riak-Link: </riak/company/Trifork>; riaktag="idx@employs"' \
+ -H 'Link: </riak/company/Trifork>; riaktag="idx@employs"' \
-H 'Content-Type: application/json' \
--data '{ "name": "Kresten Krab Thorup", "employer":"Trifork" }' \
- http://127.0.0.1:8091/riak/people/kresten
+ http://127.0.0.1:8091/riak/person/kresten
As this gets written to Riak, the indexer will then
create an object by the name of `/riak/company/Trifork`,
@@ -90,7 +97,7 @@ which has a link pointing back to me:
curl -v -X GET http://127.0.0.1:8091/riak/company/Trifork
< 200 OK
- < X-Riak-Links: </riak/people/Kresten>; riaktag="employs"
+ < Link: </riak/person/Kresten>; riaktag="employs"
< Content-Length: 0
If there was already an object at `/company/Trifork`, then the indexer
@@ -98,16 +105,16 @@ would leave the contents alone, but still add the reverse link. If no
such object existed, then it would be created with empty contents.
Link Walking
-============
+------------
The beauty of this is that you can now do link-walk queries to find
your stuff. For instance, this link query should give you a list of
-people employed at Trifork. Lucky them :-)
+persons employed at Trifork. Lucky them :-)
curl http://localhost:8091/riak/company/Trifork/_,_,employs
-Using a link_index hook
-=======================
+Using a `link_index` hook
+-------------------------
You can also install an index hook as a bucket property, which designates
a function that can be used to decide which index records to create. This way
@@ -117,35 +124,51 @@ generate some more indexes.
You install the index hook the same way you install a pre-commit hook; and the
hook can be written in either Erlang or JavaScript, just like precommits.
-Assume you have this installed inside `priv/` in your riak setup
-
// Return list of [Bucket,Key] that will link to me
function employmentIndexing(metaData, contents) {
personData = JSON.parse(contents);
if(personData.employer) {
return [ ['company', personData.employer] ];
} else {
- reuturn [];
+ return [];
}
}
-To install it as an indexer, you need to get install it as a bucket
-property in the person bucket. You can have more indexes, so it's a
-list of functions. Link-Index hooks can also be erlang functions.
+Assume you have that code in `/tmp/js_source/my_indexer.s`, and
+configured `{js_source_dir, "/tmp/js_source"}` in the `riak_kv`
+section of your `etc/app.config`.
+
+Then, to use it as an indexer, you need to install it as a
+bucket property in the person bucket. You can have more indexes, so
+it's a list of functions. Link-Index hooks can also be Erlang
+functions.
prompt$ cat > bucket_props.json
- {
- "link_index" : [{"name": "employmentIndexing", "tag": "employs"}],
- }
+ { "props" : {
+ "link_index" : [{"name": "employmentIndexing",
+ "tag" : "employs"}]
+ }}
^D
prompt$ curl -X PUT --data @bucket_props.json \
-H 'Content-Type: application/json' \
http://127.0.0.1:8091/riak/person
+Note that the link index also needs a `tag` property. You can
+install multiple index functions, but they should all have separate
+tags. Any `idx@...` tagged links that do not correspond to a
+registered link index are processed as "explicit indexing". In fact,
+the `link_index` hook is just a convenient way to have code insert the
+`idx@`-links on your behalf.
+
Now, we can add objects to the person bucket *without* having to put
the `idx@employs` link on the object. The index hook will do it for
you. Happy you!
+ curl -X POST \
+ -H 'Content-Type: application/json' \
+ --data '{ "name": "Justin Sheehy", "employer":"Basho" }' \
+ http://127.0.0.1:8091/riak/person
+
Maintenance
===========
@@ -153,6 +176,10 @@ Maintenance
The indexer will also handle delete/update of your records as
appropriate, and should work fine with `allow_mult` buckets too.
+In case of a net-split, the current implementation may lose deletes;
+i.e. your index may have some stale links. That'll be fixed one day,
+but involves storing a replica of the links in the body of the indexes
+encoded as a vector map.
Status
======
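
The `employmentIndexing` hook in the README diff above throws if the stored value is not valid JSON. The following is a minimal sketch of a more defensive variant, not part of this commit. It assumes, as the README's hook does, that `contents` arrives as a JSON string; the name `safeEmploymentIndexing` is hypothetical.

```javascript
// Hypothetical defensive variant of the README's employmentIndexing hook.
// Returns a list of [Bucket, Key] pairs that should link back to the object.
function safeEmploymentIndexing(metaData, contents) {
  var personData;
  try {
    personData = JSON.parse(contents);
  } catch (e) {
    // Not JSON: create no index entries rather than failing.
    return [];
  }
  if (personData &&
      typeof personData.employer === 'string' &&
      personData.employer.length > 0) {
    return [['company', personData.employer]];
  }
  return [];
}
```

Returning an empty list for unparseable or employer-less objects means such objects simply get no reverse links. Whether that is preferable to surfacing the error is a design choice this sketch does not settle.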
85 src/riak_link_index.erl
@@ -26,9 +26,17 @@
-define(IDX_PREFIX,"idx@").
-define(JSPOOL_HOOK, riak_kv_js_hook).
+-ifdef(DEBUG).
+-define(debug(A,B),error_logger:info_msg(A,B)).
+-else.
+-define(debug(A,B),ok).
+-endif.
+
+
precommit(Object) ->
- {ok, StorageMod} = riak:local_client(),
+ ?debug("precommit in ~p", [Object]),
+
Bucket = riak_object:bucket(Object),
Key = riak_object:key(Object),
@@ -42,24 +50,23 @@ precommit(Object) ->
%% <riak/Bucket/Key>; riaktag="Tag"
%%
- case riak_object:is_updated(Object) of
+ case is_updated(Object) of
true ->
OldLinksToMe = get_index_links(riak_object:get_metadatas(Object)),
- [{MD,_Value}] = index_contents(StorageMod,
- Bucket,
+ [{MD,_Value}] = index_contents(Bucket,
[{ riak_object:get_update_metadata(Object),
riak_object:get_update_value(Object) }]),
IndexedObject = riak_object:update_metadata(Object, MD);
false ->
+ {ok, StorageMod} = riak:local_client(),
case StorageMod:get(Bucket, Key) of
{ok, OldRO} ->
OldLinksToMe = get_index_links(riak_object:get_metadatas(OldRO));
_ ->
OldLinksToMe = []
end,
- MDVs = index_contents(StorageMod,
- Bucket,
+ MDVs = index_contents(Bucket,
riak_object:get_contents(Object)),
IndexedObject = riak_object:set_contents(Object, MDVs)
end,
@@ -72,6 +79,8 @@ precommit(Object) ->
%% this only works in recent riak_kv master branch
put(?MODULE, {LinksToAdd, LinksToRemove}),
+ ?debug("precommit out ~p", [IndexedObject]),
+
IndexedObject.
postcommit(Object) ->
@@ -150,7 +159,7 @@ get_index_links(MDList) ->
get_all_links(Object) when element(1,Object) =:= r_object ->
get_all_links
- (case riak_object:is_updated(Object) of
+ (case is_updated(Object) of
true ->
[riak_object:get_update_metadata(Object)]
++ riak_object:get_metadatas(Object);
@@ -159,23 +168,23 @@ get_all_links(Object) when element(1,Object) =:= r_object ->
end);
get_all_links(MetaDatas) when is_list(MetaDatas) ->
- Links = lists:fold(fun(MetaData, Acc) ->
- case dict:find(?MD_LINKS, MetaData) of
- error ->
- Acc;
- {ok, LinksList} ->
- LinksList ++ Acc
- end
- end,
- [],
- MetaDatas),
+ Links = lists:foldl(fun(MetaData, Acc) ->
+ case dict:find(?MD_LINKS, MetaData) of
+ error ->
+ Acc;
+ {ok, LinksList} ->
+ LinksList ++ Acc
+ end
+ end,
+ [],
+ MetaDatas),
ordsets:from_list(Links).
-index_contents(StorageMod, Bucket, Contents) ->
+index_contents(Bucket, Contents) ->
%% grab indexes from bucket properties
- {ok, IndexHooks} = get_index_hooks(StorageMod, Bucket),
+ {ok, IndexHooks} = get_index_hooks(Bucket),
lists:map
(fun({MD,Value}) ->
@@ -210,18 +219,14 @@ remove_idx_links(MD) ->
compute_indexed_md(MD, Value, IndexHooks) ->
lists:foldl
(fun({struct, PropList}=IndexHook, MDAcc) ->
- {<<"tag">>, Tag} = proplists:lookup(PropList, <<"tag">>),
+ {<<"tag">>, Tag} = proplists:lookup(<<"tag">>, PropList),
Links = case dict:find(?MD_LINKS, MDAcc) of
error -> [];
{ok, MDLinks} -> MDLinks
end,
IdxTag = <<?IDX_PREFIX,Tag/binary>>,
KeepLinks =
- lists:filter(fun({{_,_}, TagValue}) when TagValue =:= IdxTag ->
- false;
- (_) ->
- true
- end,
+ lists:filter(fun({{_,_}, TagValue}) -> TagValue =/= IdxTag end,
Links),
NewLinksSansTag =
try apply_index_hook(IndexHook, MD, Value) of
@@ -242,9 +247,9 @@ compute_indexed_md(MD, Value, IndexHooks) ->
ResultLinks =
lists:map(fun({Bucket,Key}) when is_binary(Bucket), is_binary(Key) ->
- {{Bucket, Key}, Tag};
+ {{Bucket, Key}, IdxTag};
([Bucket, Key]) when is_binary(Bucket), is_binary(Key) ->
- {{Bucket, Key}, Tag}
+ {{Bucket, Key}, IdxTag}
end,
NewLinksSansTag)
++
@@ -259,7 +264,7 @@ compute_indexed_md(MD, Value, IndexHooks) ->
%%%%%% code from riak_kv_put_fsm %%%%%%
-get_index_hooks(_StorageMod, Bucket) ->
+get_index_hooks(Bucket) ->
{ok,Ring} = riak_core_ring_manager:get_my_ring(),
BucketProps = riak_core_bucket:get_bucket(Bucket, Ring),
@@ -267,17 +272,17 @@ get_index_hooks(_StorageMod, Bucket) ->
IndexHooks = proplists:get_value(link_index, BucketProps, []),
case IndexHooks of
<<"none">> ->
- [];
+ {ok, []};
{struct, Hook} ->
- [{struct, Hook}];
+ {ok, [{struct, Hook}]};
IndexHooks when is_list(IndexHooks) ->
- IndexHooks
+ {ok, IndexHooks};
+ V ->
+ error_logger:error_msg("bad value in bucket_prop ~p:link_index: ~p", [Bucket,V]),
+ {ok, []}
end.
--spec apply_index_hook(_,MD::dict(),Value::term()) ->
- [{{binary(),binary()},binary()}].
-
apply_index_hook({struct, Hook}, MD, Value) ->
Mod = proplists:get_value(<<"mod">>, Hook),
Fun = proplists:get_value(<<"fun">>, Hook),
@@ -344,3 +349,15 @@ jsonify_metadata_list(List) ->
string -> list_to_binary(List);
array -> List
end.
+
+is_updated(O) ->
+ M = riak_object:get_update_metadata(O),
+ V = riak_object:get_update_value(O),
+ case dict:find(clean, M) of
+ error -> true;
+ {ok,_} ->
+ case V of
+ undefined -> false;
+ _ -> true
+ end
+ end.
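
The tagging fix in `compute_indexed_md` (using `IdxTag` instead of `Tag`) can be illustrated outside Erlang. The sketch below is an illustrative re-expression in JavaScript, not the module's code. The `{ bucket, key, tag }` link shape and the name `applyHookLinks` are assumptions. Appending the kept links is also an assumption, because the lines after `++` fall outside the hunks shown.

```javascript
// Illustrative sketch only: models the tagging step of compute_indexed_md.
// The link shape { bucket, key, tag } is an assumption made for this example.
var IDX_PREFIX = 'idx@';

function applyHookLinks(existingLinks, hookResult, tag) {
  var idxTag = IDX_PREFIX + tag;
  // Drop links that already carry this hook's idx@ tag ...
  var kept = existingLinks.filter(function (link) {
    return link.tag !== idxTag;
  });
  // ... then add one idx@-tagged link per [bucket, key] pair from the hook.
  var added = hookResult.map(function (pair) {
    return { bucket: pair[0], key: pair[1], tag: idxTag };
  });
  return added.concat(kept);
}
```

Tagging the new links with the prefixed tag is what lets the filter on the next write recognise and replace them. With the bare tag, as before this commit, stale links for the same hook would not match the filter.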