Commit 5501b13 (parent e5d95b7), modified: doc/datastore.rst

Changed datastore.rst to detail most recent datastore api changes

danieljohnlewis authored and rufuspollock committed Aug 13, 2012.
1 changed file with 17 additions and 60 deletions.

The DataStore Data API
======================

The DataStore's Data API, which derives from the underlying data-table, is
RESTful and JSON-based with extensive query capabilities.

Each resource in a CKAN instance has an associated DataStore 'table'. This
table will be accessible via a web interface at::

    /api/data/{resource-id}

The interface to this data is *exactly* the same as that provided by
ElasticSearch to documents of a specific type in one of its indices.

For a detailed tutorial on using this API see :doc:`using-data-api`.
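
As a rough illustration of the ``/api/data/{resource-id}`` address pattern, the
Python sketch below builds the URL of a resource's DataStore table. The base URL
``http://127.0.0.1:5000`` (the development-server address used in the
``datastore_create`` test command) and the helper name ``datastore_table_url``
are illustrative assumptions, not part of CKAN.

```python
from urllib.parse import quote


def datastore_table_url(base_url: str, resource_id: str) -> str:
    """Return the /api/data/{resource-id} address for a resource (hypothetical helper)."""
    # Strip a trailing slash from the base URL and percent-encode the id
    # (safe="" also encodes "/") so it always forms a single path segment.
    return f"{base_url.rstrip('/')}/api/data/{quote(resource_id, safe='')}"


if __name__ == "__main__":
    print(datastore_table_url("http://127.0.0.1:5000/", "abc-123"))
    # → http://127.0.0.1:5000/api/data/abc-123
```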

Installation and Configuration
==============================

The DataStore previously required a custom setup of ElasticSearch and Nginx.
This is no longer the case: it can now use a relational database management
system (PostgreSQL, for example).

To enable datastore features in CKAN
------------------------------------

Make the following changes in your CKAN config file.

Enable the CKAN DataStore::

    ckan.datastore.enabled = 1

Ensure that the ``datastore`` extension is enabled (if ``ckan.plugins`` already
lists other plugins, add ``datastore`` to that space-separated list)::

    ckan.plugins = datastore

Ensure that the ``ckan.datastore_write_url`` variable is set, replacing the
example user, password, host and database name with your own::

    ckan.datastore_write_url = postgresql://ckanuser:pass@localhost/ckantest
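
The three settings above can be sanity-checked before restarting CKAN. The
Python sketch below reads config text with the standard-library
``configparser``. It assumes the settings sit in the ``[app:main]`` section of
the config file and that ``ckan.plugins`` is a space-separated list; the
function name ``check_datastore_config`` is hypothetical, not part of CKAN.

```python
import configparser
from urllib.parse import urlparse


def check_datastore_config(ini_text: str, section: str = "app:main") -> list[str]:
    """Return problems with the DataStore settings (hypothetical helper; empty list if none)."""
    # interpolation=None so that '%' characters in values are read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return [f"missing [{section}] section"]
    cfg = parser[section]
    problems = []

    if cfg.get("ckan.datastore.enabled", "").strip() != "1":
        problems.append("ckan.datastore.enabled is not set to 1")

    # ckan.plugins is assumed to be a space-separated list of plugin names
    if "datastore" not in cfg.get("ckan.plugins", "").split():
        problems.append("datastore is missing from ckan.plugins")

    # Expect a postgresql://user:pass@host/database style URL
    url = urlparse(cfg.get("ckan.datastore_write_url", ""))
    if url.scheme != "postgresql" or not url.hostname or url.path in ("", "/"):
        problems.append("ckan.datastore_write_url is not a postgresql://user:pass@host/db URL")

    return problems


EXAMPLE = """
[app:main]
ckan.datastore.enabled = 1
ckan.plugins = datastore
ckan.datastore_write_url = postgresql://ckanuser:pass@localhost/ckantest
"""

if __name__ == "__main__":
    print(check_datastore_config(EXAMPLE))  # → []
```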

To test the setup, create a new DataStore table for an existing resource by
running the following on the Linux command line, replacing ``{YOUR-API-KEY}``
with your API key and ``{PRE-EXISTING-RESOURCE-ID}`` with the id of a resource
that already exists::

    curl -X POST http://127.0.0.1:5000/api/3/action/datastore_create \
         -H "Authorization: {YOUR-API-KEY}" \
         -d '{"resource_id": "{PRE-EXISTING-RESOURCE-ID}", "fields": [{"id": "a"}, {"id": "b"}], "records": [{"a": 1, "b": "xyz"}, {"a": 2, "b": "zzz"}]}'
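
For scripting, the same ``datastore_create`` call can be made from Python. This
minimal standard-library sketch mirrors the curl command: it builds the same
JSON body and ``Authorization`` header, and additionally sets a JSON
``Content-Type`` header. The helper names are hypothetical, and the placeholder
API key and resource id must be replaced with real values before ``send`` is
called against a running CKAN instance.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/api/3/action/datastore_create"


def build_datastore_create_request(api_key: str, resource_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request made by the curl example."""
    payload = {
        "resource_id": resource_id,
        "fields": [{"id": "a"}, {"id": "b"}],
        "records": [{"a": 1, "b": "xyz"}, {"a": 2, "b": "zzz"}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running CKAN)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_datastore_create_request("{YOUR-API-KEY}", "{PRE-EXISTING-RESOURCE-ID}")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # With a running CKAN instance and real values: print(send(req))
```

Separating request construction from sending keeps the payload easy to inspect
before anything is written to the DataStore.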

.. _datastorer:

[...]

How It Works (Technically)
==========================

1. Request arrives at e.g. /dataset/{id}/resource/{resource-id}/data
2. CKAN checks authentication and authorization.
3. (Assuming OK) CKAN hands the request (internally) to the database querying
   system, which handles it.
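
The three steps above can be sketched in Python as follows. This is an
illustration only, not CKAN's code: an in-memory ``sqlite3`` database stands in
for the PostgreSQL DataStore, the caller is assumed to be already authenticated,
and the ``AUTHORIZED`` permission map, the one-table-per-resource naming and
``handle_data_request`` are all hypothetical.

```python
import sqlite3

# Hypothetical permission map: which users may read which resource's data.
AUTHORIZED = {"alice": {"res1"}}


class NotAuthorized(Exception):
    """Raised when step 2 (the authorization check) fails."""


def handle_data_request(db: sqlite3.Connection, user: str, resource_id: str) -> list:
    """Step 1: a request for a resource's data arrives as (user, resource_id)."""
    # Step 2: check authorization before any data is touched
    # (the caller is assumed to be already authenticated as `user`).
    if resource_id not in AUTHORIZED.get(user, set()):
        raise NotAuthorized(f"{user!r} may not read {resource_id!r}")
    # Step 3: hand the query to the database; each resource has its own table.
    # The id comes from the URL, so it is quoted as an SQL identifier.
    table = '"' + resource_id.replace('"', '""') + '"'
    return db.execute(f"SELECT a, b FROM {table} ORDER BY rowid").fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    # Hypothetical table for resource "res1", using the fields from the curl example.
    db.execute('CREATE TABLE "res1" (a INTEGER, b TEXT)')
    db.executemany('INSERT INTO "res1" VALUES (?, ?)', [(1, "xyz"), (2, "zzz")])
    print(handle_data_request(db, "alice", "res1"))  # → [(1, 'xyz'), (2, 'zzz')]
```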
