Permalink
Find file
Fetching contributors…
Cannot retrieve contributors at this time
202 lines (147 sloc) 6.47 KB
=======
Notes
=======
- Brazenly using 'couchdb' as the application name and module prefix. This
might be off-putting. CouchDB uses 'couch' and we may cause confusion. The
thinking, however, is that Erlang developers (the community that would likely
care about namespace issues, aside from the CouchDB dev team) won't care
about close collisions as they'll use the `couchdb` interface.
- couch_app starts a bunch of apps. This is typically handled elsewhere --
usually when starting an Erlang release (dependecies are automatically
started using dependencies in .app file).
- We want to keep the dependencies to a mininum, so not requiring oauth, ssl,
ibrowse, and mochiweb (maybe others).
- Using application config to locate ini files.
- Not possible to start couch_server without config -- breaks in init/1 looking
for "max_db_open" (no default vals). We'll cover for this using a hack in
couchdb_sup. The general idea is that we should work with sensible defaults
without any external config.
- couch_uuids:new/0 looks crufty - it's a gen_server call and not used. The
server is never started by couch, so I'm guessing couch_uuids:random/0 is the
correct function (but there are calls to new/0, possibly also not used).
- This has nothing to do with the project, but it's surprising that the content
type for a json response over http is text/plain and not application/json.
(Update - need to provide an accepts header to get the expected mime type.)
- delayed_commit is really a big deal - no lie. Docs/sec around 20 with it off,
up to 280 with it on. Bulk writes bumped up to over 2000. These are all
consistent with http://books.couchdb.org/relax/reference/high-performance.
- couch_db:delete_doc/3 doesn't work, probably not used anywhere.
- couch_icu_driver comes up when couch_util:collate/3 is called (e.g. new keys
are added to a btree index).
- Borrowed heavily from hovercraft for the select support.
- A number of couchdb functions don't work unless the db is re-opened. I'm
currently doing this implicitly by pulling the dbname off the db record. I
think this is the right approach as the user doesn't have to worry about what
to pass in as Db -- it's always what's returned by open. It also lets us
preserve/reuse options specified in the origial open.
- How do we want to provude "docs" to Erlang functions? Name/value list is one
way, but we could also provide them as #doc records and let users work with
the couchdoc and couchterm modules.
Services
========
The "core" services we have are as small as possible to enable basic embedded
functions: db ops, doc ops, view queries. Everything else is moved into
"optional" services. This is currently completely trial and error.
We don't get *anywhere* without config and server. The db_update_event
service is needed for various operations and is non-optional.
Logging so far looks optional -- all of the events go into the ether if it's
not started.
httpd is needed as soon as you want to access CouchDB over REST calls or use
Futon. This *really* requires a correctly setup `ini` file -- nothing really
works without the right setup. (TODO: provide a minimal `ini.in` for using
Futon.)
uuids is used by Futon to generate IDs for new documents. We use the non proc
calls.
I'm not sure how far we can get without view or task_status. task_status is
started as a core service by CouchDB itself, though we've gotten this far
without it.
query_servers is used to manage support for map/reduce and is needed for views.
Services not addressed yet:
- couch_replication_server
- couch_external_manager
- couch_db_update_notifier_sup
- couch_stats_aggregator
- couch_stats_collector
- couch_auth_cache
HTTP Views
==========
Working:
| /
| _all_dbs
| _config
| _session
| _utils
| _uuids
| /_active_tasks
Not tested:
| _status
| _log
| _replicate (almost certainly doesn't work)
| _oauth (surely not working)
ICU Driver and Ports
====================
couch_util:drv_port/0 uses the process dictionary to cache the driver
port. This is a big problem for anyone using couchdb:select from multiple
processes - each process gets its own port.
The driver port should be registered under a well known name.
See drv_port.patch
If for whatever reason we can't get this in, a more elaborate work around would
be to move our API into a gen_server, which would give us control over the
process dictionary. This might have some other benefits as well (rate control,
pooling, etc.) for concurrent access - all TDB.
Erlang Views
============
I'd like to see a more natural use of Erlang for building views.
The module couch_native_process provides the road map.
When a view is updated, the process is sent:
[<<"reset">>,{[{<<"reduce_limit">>,true}]}]
[<<"add_fun">>,
<<"fun({Doc}) ->\n Emit(proplists:get_value(<<\"msg\">>, Doc), null)\nend.">>]
[<<"map_doc">>,
{[{<<"_id">>,<<"2eb38be2a696d1772651d02965ce67b1">>},
{<<"_rev">>,<<"1-5a6392aec3909c19d62f40aa01f416b4">>},
{msg,"hi"}]}]
I'm not sure how many processes are created at this point. One per view, or one
per database? Maybe one overall, where reset/add_fun is called for each map
"job" that's run (whenever a view is read that needs updating).
Rather than store an Erlang function as a string in the document, I'd like to
see something else.
Here's a map example as you'd expect to see in Erlang:
7> F = fun(Doc, Acc) ->
7> case proplists:get_value(msg, Doc) of
7> undefined -> Acc;
7> Val -> [Val|Acc]
7> end
7> end.
#Fun<erl_eval.12.113037538>
8> D1 = [{msg, "hi"}].
[{msg,"hi"}]
9> D2 = [].
[]
10> D3 = [{msg, "there"}].
[{msg,"there"}]
11> lists:foldl(F, [], [D1, D2, D3]).
["there","hi"]
I'd life to be able to do this:
1> couchdb:put("_design/default", couchview:new(F)).
All done!
This would be stored as a binary encoded term.
There's no reason for Emit or Send, though some of the informational functions
like GetRow (surprisingly complex - not sure what it is yet) could be provided
as an optional context arg to the map fun,
1> F = fun(Doc, Acc, C) ->
1> case couchdb_c:get_row(C) of
1> SomeRow -> Acc;
1> _ -> [1|Acc]
1> end
1> end.
This is admitedly a perfect case for using a parameterized module:
case C:get_row() of
...
end
(I might have to eat my words about the One True Way.)
Selecting from Views
====================
Currently the API only supports *keys* and it looks like it could support doc
IDs as well, though I'm not sure how that works. The HTTP interface accepts
startkey_docid and endkey_docid, but I'm not sure how it works.