From 5d96df7c3fe069dea00ba0e396ad5bde358113be Mon Sep 17 00:00:00 2001
From: Will Kahn-Greene
Date: Fri, 8 Jun 2012 17:38:48 -0400
Subject: [PATCH] Overhaul docs

This adds contributor docs and cleans up some of the elasticutils
usage docs, too. It adds a lot of examples and expands on the existing
text.

This also adds a requirements-dev.txt which makes it a little easier
to install for hacking.

Still a lot of docs-work to do, but I think this is a nice first pass.
---
 docs/configuration.rst     |   61 +++++++
 docs/dev_conventions.rst   |   12 ++
 docs/dev_documentation.rst |   26 +++
 docs/dev_testing.rst       |   29 ++++
 docs/django.rst            |   13 +-
 docs/hacking_howto.rst     |   35 ++++
 docs/index.rst             |   58 ++++---
 docs/installation.rst      |   65 ++------
 docs/join.rst              |   32 ++++
 docs/queries.rst           |  317 +++++++++++++++++++++++++++----------
 docs/testing.rst           |   27 ----
 requirements-dev.txt       |    8 +
 requirements.txt           |    1 -
 13 files changed, 490 insertions(+), 194 deletions(-)
 create mode 100644 docs/configuration.rst
 create mode 100644 docs/dev_conventions.rst
 create mode 100644 docs/dev_documentation.rst
 create mode 100644 docs/dev_testing.rst
 create mode 100644 docs/hacking_howto.rst
 create mode 100644 docs/join.rst
 create mode 100644 requirements-dev.txt

diff --git a/docs/configuration.rst b/docs/configuration.rst
new file mode 100644
index 0000000..a646199
--- /dev/null
+++ b/docs/configuration.rst
@@ -0,0 +1,61 @@
+=============
+Configuration
+=============
+
+ElasticUtils depends on the following settings:
+
+.. module:: django.conf.settings
+
+.. data:: ES_DISABLED
+
+   Disables talking to ElasticSearch from your app. Any method
+   wrapped with `es_required` will return and log a warning. This is
+   useful while developing, so you don't have to have ElasticSearch
+   running.
+
+.. data:: ES_DUMP_CURL
+
+   If set to a path all the requests that `ElasticUtils` makes will
+   be dumped into the designated file.
+
+   .. note:: Python does not write this file until the process is
+      finished.
+
+
+.. data:: ES_HOSTS
+
+   This is a list of hosts. In development this will look like::
+
+       ES_HOSTS = ['127.0.0.1:9200']
+
+.. data:: ES_INDEXES
+
+   This is a mapping of doctypes to indexes. A `default` mapping is
+   required for types that don't have a specific index.
+
+   When ElasticUtils queries the index for a model, it derives the
+   doctype from `Model._meta.db_table`. When you build your indexes
+   and doctypes, make sure to name them after your model db_table.
+
+   Example 1::
+
+       ES_INDEXES = {'default': 'main_index'}
+
+   This only has a default, so ElasticUtils queries will look in
+   `main_index` for all doctypes.
+
+   Example 2::
+
+       ES_INDEXES = {'default': 'main_index',
+                     'splugs': 'splugs_index'}
+
+   Assuming you have a `Splug` model which has a
+   `Splug._meta.db_table` value of `splugs`, then ElasticUtils will
+   run queries for `Splug` in the `splugs_index`. ElasticUtils will
+   run queries for other models in `main_index` because that's the
+   default.
+
+.. data:: ES_TIMEOUT
+
+   Defines the timeout for the `ES` connection. This defaults to 1
+   second.
diff --git a/docs/dev_conventions.rst b/docs/dev_conventions.rst
new file mode 100644
index 0000000..b16cb4a
--- /dev/null
+++ b/docs/dev_conventions.rst
@@ -0,0 +1,12 @@
+===========
+Conventions
+===========
+
+We follow the code conventions listed in the `coding conventions page
+of the webdev bootcamp guide
+`_. This covers
+all the Python code.
+
+We use git and follow the conventions listed in the `git and github
+conventions page of the webdev bootcamp guide
+`_.
diff --git a/docs/dev_documentation.rst b/docs/dev_documentation.rst
new file mode 100644
index 0000000..0e71744
--- /dev/null
+++ b/docs/dev_documentation.rst
@@ -0,0 +1,26 @@
+=============
+Documentation
+=============
+
+Conventions
+===========
+
+See the `documentation page in the webdev bootcamp guide
+`_ for
+documentation conventions.
+
+The documentation is available in HTML and PDF forms at
+``_. This tracks documentation
+in the master branch of the git repository. Because of this, it is
+always up to date.
+
+
+Building the docs
+=================
+
+The documentation in `docs/` is built with `Sphinx
+`_. To build the HTML version of the
+documentation, do::
+
+    $ cd docs/
+    $ make html
diff --git a/docs/dev_testing.rst b/docs/dev_testing.rst
new file mode 100644
index 0000000..149adc7
--- /dev/null
+++ b/docs/dev_testing.rst
@@ -0,0 +1,29 @@
+=========================
+Running and writing tests
+=========================
+
+Running the tests
+=================
+
+To run the tests, do::
+
+    DJANGO_SETTINGS_MODULE=es_settings nosetests -w tests
+
+
+.. Note::
+
+    If you need to adjust the settings, copy ``es_settings.py`` to a
+    new file (like ``es_settings_local.py``), edit the file, and pass
+    that in as the value for ``DJANGO_SETTINGS_MODULE``.
+
+    This is helpful if you need to change the value of ``ES_HOSTS`` to
+    match the IP address or port that elasticsearch is listening on.
+
+
+Writing tests
+=============
+
+Tests are located in `tests/`.
+
+We use `nose `_ for test utilities
+and running tests.
diff --git a/docs/django.rst b/docs/django.rst
index 7662892..28906f5 100644
--- a/docs/django.rst
+++ b/docs/django.rst
@@ -2,13 +2,14 @@
 Django Model Integration
 ========================
 
-Django Models and ElasticSearch indices make a natural fit.
-It would be terribly useful if a Django Model knew how to add and remove itself
-from ElasticSearch.
-This is where the :class:`elasticutils.models.SearchMixin` comes in.
+Django Models and ElasticSearch indices make a natural fit. It would
+be terribly useful if a Django Model knew how to add and remove itself
+from ElasticSearch. This is where the
+:class:`elasticutils.models.SearchMixin` comes in.
 
-You can then utilize things such as :func:`~elasticutils.tasks.index_objects` to
-automatically index all new items.
+You can then utilize things such as
+:func:`~elasticutils.tasks.index_objects` to automatically index all
+new items.
 
 .. autoclass:: elasticutils.models.SearchMixin
    :members:
diff --git a/docs/hacking_howto.rst b/docs/hacking_howto.rst
new file mode 100644
index 0000000..018e2e0
--- /dev/null
+++ b/docs/hacking_howto.rst
@@ -0,0 +1,35 @@
+.. _hacking-howto-chapter:
+
+=============
+Hacking HOWTO
+=============
+
+This covers setting up a development environment for developing on
+ElasticUtils. If you're interested in using ElasticUtils, then you
+should check out :ref:`users-guide`.
+
+
+External requirements
+=====================
+
+You should have `elasticsearch `_ installed
+and running.
+
+
+Get dependencies
+================
+
+Run::
+
+    $ virtualenv ./venv/
+    $ . ./venv/bin/activate
+    $ pip install -r requirements-dev.txt
+
+
+This sets up all the required dependencies for development of
+ElasticUtils.
+
+.. Note::
+
+    You don't have to put your virtual environment in ``./venv/``. Feel
+    free to put it anywhere.
diff --git a/docs/index.rst b/docs/index.rst
index 30c47ed..c0c0d12 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,43 +2,49 @@
 ElasticUtils
 ============
 
-ElasticUtils provides tools to:
+.. _project-details:
 
-* Query ElasticSearch within python
-* Maintain a single `pyes.ES` object
-* Test code that is dependent on ElasticSearch
+About ElasticUtils
+==================
 
+ElasticUtils is a Python library that gives you a Django queryset-like
+API for `elasticsearch `_ as well as some
+other tools for making it easier to integrate elasticsearch into your
+application.
 
-Project details
-===============
+:Code: https://github.com/mozilla/elasticutils
+:License: BSD; see LICENSE file
+:Issues: https://github.com/mozilla/elasticutils/issues
+:Documentation: http://elasticutils.readthedocs.org/
+:IRC: #elasticutils on irc.mozilla.org
 
-Code:
-    http://github.com/mozilla/elasticutils
 
-Documentation:
-    http://elasticutils.rtfd.org
+.. _users-guide:
 
-Issue tracker:
-    https://github.com/mozilla/elasticutils/issues
+User's Guide
+============
 
-IRC:
-    ``#elasticutils`` on irc.mozilla.org
+.. toctree::
+   :maxdepth: 1
 
-License:
-    BSD 3-clause; see LICENSE file
+   installation
+   configuration
+   es
+   queries
+   django
+   testing
+   debugging
 
 
-Contents
-========
+Contributor's Guide
+===================
 
 .. toctree::
-   :maxdepth: 2
-
-   installation
-   es
-   django
-   queries
-   testing
-   debugging
+   :maxdepth: 1
 
+   join
+   hacking_howto
+   dev_conventions
+   dev_documentation
+   dev_testing
 
diff --git a/docs/installation.rst b/docs/installation.rst
index 27f96f4..1c16a8e 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -4,66 +4,23 @@
 Installation
 ============
 
-Download
---------
+There are a few ways to install ElasticUtils.
 
-Clone it from https://github.com/mozilla/elasticutils .
 
+From PyPI
+=========
 
-Configure
----------
+Do::
 
-`elasticutils` depends on the following settings:
+    $ pip install elasticutils
 
-.. module:: django.conf.settings
 
-.. data:: ES_DISABLED
+From git
+========
 
-   Disables talking to ElasticSearch from your app. Any method
-   wrapped with `es_required` will return and log a warning. This is
-   useful while developing, so you don't have to have ElasticSearch
-   running.
+Do::
 
-.. data:: ES_TIMEOUT
+    $ git clone git://github.com/mozilla/elasticutils.git
 
-   Defines the timeout for the `ES` connection. This defaults to 1 second.
-
-.. data:: ES_DUMP_CURL
-
-   If set to a path all the requests that `ElasticUtils` makes will be dumped
-   into the designated file.
-
-   .. note:: Python does not write this file until the process is finished.
-
-
-.. data:: ES_HOSTS
-
-   This is a list of hosts. In development this will look like::
-
-       ES_HOSTS = ['127.0.0.1:9200']
-
-.. data:: ES_INDEXES
-
-   This is a mapping of doctypes to indexes. A `default` mapping is required
-   for types that don't have a specific index.
-
-   When ElasticUtils queries the index for a model, it derives the doctype
-   from `Model._meta.db_table`. When you build your indexes and doctypes,
-   make sure to name them after your model db_table.
-
-   Example 1::
-
-       ES_INDEXES = {'default': 'main_index'}
-
-   This only has a default, so ElasticUtils queries will look in `main_index`
-   for all doctypes.
-
-   Example 2::
-
-       ES_INDEXES = {'default': 'main_index',
-                     'splugs': 'splugs_index'}
-
-   Assuming you have a `Splug` model which has a `Splug._meta.db_table`
-   value of `splugs`, then ElasticUtils will run queries for `Splug` in
-   the `splugs_index`. ElasticUtils will run queries for other models in
-   `main_index` because that's the default.
+For other ways to clone, see
+``_.
diff --git a/docs/join.rst b/docs/join.rst
new file mode 100644
index 0000000..32c1b77
--- /dev/null
+++ b/docs/join.rst
@@ -0,0 +1,32 @@
+==================
+Join this project!
+==================
+
+Interested in working on a Python library for using elasticsearch?
+Interested in using it? Then you should be interested in this project!
+
+
+Want to help?
+=============
+
+Here are things we need help with:
+
+* **fixing bugs listed in the issue tracker**
+
+* **writing tests**
+
+* **writing documentation**: We could use help writing better
+  documentation for ElasticUtils.
+
+* **spreading the word**: Do you know other people who would like this
+  software? If so, tell them about ElasticUtils!
+
+* **project infrastructure**: Is there infrastructure that's missing
+  in this project that would make it easier for you to collaborate? If
+  so, what?
+
+
+Are you thinking, "That list makes me want to go shopping for bumper
+stickers!" That's ok! Hop on IRC, say hi and we can go from there!
+
+For project details, see :ref:`project-details`.
diff --git a/docs/queries.rst b/docs/queries.rst
index ed3d2c7..e646cc1 100644
--- a/docs/queries.rst
+++ b/docs/queries.rst
@@ -3,106 +3,260 @@ Querying with ElasticUtils
 ==========================
 
 ElasticUtils makes querying and filtering and collecting facets from
-ElasticSearch simple ::
+ElasticSearch simple.
 
+For example:
 
-    q = (S(model).query(title='Example')
-                 .filter(product='firefox')
-                 .filter(version='4.0', platform='all')
-                 .facet(products={'field':'product', 'global': True})
-                 .facet(versions={'field': 'version'})
-                 .facet(platforms={'field': 'platform'})
-                 .facet(types={'field': 'type'}))
+.. code-block:: python
 
+    q = (S(model).query(title='Example')
+                 .filter(product='firefox')
+                 .filter(version='4.0', platform='all')
+                 .facet(products={'field':'product', 'global': True})
+                 .facet(versions={'field': 'version'})
+                 .facet(platforms={'field': 'platform'})
+                 .facet(types={'field': 'type'}))
 
-Where ``model`` is a Django-model class.
 
-.. note::
+Where ``model`` is a Django ORM model class.
 
-    If you're not using Django, you can create stub-models. See the tests for
-    more details.
+Each call to ``query``, ``filter``, ``facet``, ``sort_by``, etc will
+create a new S object with the accumulated search criteria.
 
-Search All
-----------
+.. Note::
+
+    If you're not using Django, you can create stub-models. See the
+    tests for more details.
+
+
+Match All
+=========
 
 By default ``S(Model)`` will do a ``match_all`` query in ElasticSearch.
 
 Search Query
-------------
+============
+
+The query is specified by keyword arguments to the ``query()``
+method. The key of the keyword argument is parsed splitting on ``__``
+(that's two underscores) with the first part as the "field" and the
+second part as the "field action".
+
+For example:
+
+.. code-block:: python
+
+    q = S(Model).query(title='taco trucks')
+
+
+will do an elasticsearch term query for "taco trucks" in the title field.
 
-``S(Model).query(title='taco trucks')`` will do a term query for "taco trucks"
-in the title.
+And:
 
-The query parameters can define different kinds of queries to do, for
-example: ``S(Model).query(title__text='taco trucks')`` will do a text
-query instead of a term query.
+.. code-block:: python
 
-* ``title__text``: a Text_ query
-* ``title__startswith``: a Prefix_ query
-* ``title__gt``: a Range_ query (includes ``gt``, ``gte``, ``lt``, ``lte``)
-* ``title__fuzzy``: a Fuzzy_ query
-* ``title``: or no query type, will do a Term_ query
+    q = S(Model).query(title__text='taco trucks')
+
+
+will do a text query instead of a term query.
+
+There are many different field actions to choose from:
+
+================  ===================
+field action      elasticsearch query
+================  ===================
+text              Text_ query
+startswith        Prefix_ query
+gt, gte, lt, lte  Range_ query
+fuzzy             Fuzzy_ query
+(no action)       Term_ query
+================  ===================
 
 
 Filters
--------
+=======
+
+.. code-block:: python
+
+    q = (S(Model).query(title='taco trucks')
+                 .filter(style='korean'))
+
+
+will do a query for "taco trucks" in the title field and filter on the
+style field for 'korean'. This is how we find Korean Taco Trucks.
+
+As with ``query()``, ``filter()`` allows you to specify field
+actions for the filters:
+
+================  ====================
+field action      elasticsearch filter
+================  ====================
+in                Terms_ filter
+gt, gte, lt, lte  Range_ filter
+(no action)       Term_ filter
+================  ====================
+
+See the `elasticsearch docs on queries and filters
+`_.
+
+
+Advanced filters
+================
+
+Calling filter multiple times is equivalent to an "and"ing of the
+filters.
+
+For example:
+
+.. code-block:: python
+
+    q = (S(Model).filter(style='korean')
+                 .filter(price='FREE'))
+
+will do a query for style 'korean' AND price 'FREE'. Anything that has
+a style other than 'korean' or a price other than 'FREE' is removed
+from the result set.
+
+This translates to:
+
+.. code-block:: javascript
+
+    {'filter': {
+        'and': [
+            {'term': {'style': 'korean'}},
+            {'term': {'price': 'FREE'}}
+            ]},
+        'fields': ['id']}
+
+
+in elasticutils JSON.
+
+You can do the same thing by putting both filters in the same
+``.filter()`` call.
 
-``S(Model).query(title='taco trucks').filter(style='korean')`` will do a query
-for "taco trucks" filtering on the attribute ``style``. This is how we find
-Korean Taco Trucks.
+For example:
 
-.. note::
+.. code-block:: python
 
-    Each call to ``query``, ``filter``, ``facet``, or ``sort_by`` will
-    create new S objects, with the results combined.
+    q = S(Model).filter(style='korean', price='FREE')
 
-As with Queries, Filters allow for you to specify the kind of filter to
-do.
 
-* ``style__in=['korean', 'mexican']``: a Terms_ filter
-* ``style__gt``: a Range_ filter ((includes ``gt``, ``gte``, ``lt``, ``lte``)
-* ``style``: or no filter type, will do a Term_ filter
+that also translates to:
 
+.. code-block:: javascript
 
-Multiple Filters
-~~~~~~~~~~~~~~~~
-::
+    {'filter': {
+        'and': [
+            {'term': {'style': 'korean'}},
+            {'term': {'price': 'FREE'}}
+            ]},
+        'fields': ['id']}
 
-    S(Model).query(title='taco trucks').filter(style='korean', price='FREE')
 
-will do a query for "taco trucks" that are "korean" style and have a price of
-"FREE".
+in elasticutils JSON.
 
+Suppose you want either Korean or Mexican food. For that, you need an
+"or".
 
-Complicated Filtering
-~~~~~~~~~~~~~~~~~~~~~
+You can do something like this:
 
-Sometimes you want something complicated. For that we have the ``F`` (filter)
-object::
+.. code-block:: python
 
-    S(Model).query(title='taco trucks').filter(F(style='korean') |
-                                               F(style='thai'))
+    q = S(Model).filter(or_={'style': 'korean', 'style': 'mexican'})
 
-will find you "thai" or "korean" style taco trucks.
 
-Let's say you only want "korean" tacos if you can get it for "FREE" or "thai"
-tacos at any price::
+That translates to:
 
-    S('taco trucks').filter(F(style='korean', price='FREE') | F(style='thai'))
+.. code-block:: javascript
 
-.. note::
+    {'filter': {
+        'or': [
+            {'term': {'style': 'korean'}},
+            {'term': {'style': 'mexican'}}
+            ]},
+        'fields': ['id']}
 
-    ``F`` objects support AND, OR, and NOT operators.
+
+But, that's kind of icky looking.
+
+So, we've also got an ``F`` class that makes this sort of thing
+easier.
+
+You can do the previous example with ``F`` like this:
+
+.. code-block:: python
+
+    q = S(Model).filter(F(style='korean') | F(style='mexican'))
+
+
+will get you all the search results that are either "korean" or
+"mexican" style.
+
+That translates to:
+
+.. code-block:: javascript
+
+    {'filter': {
+        'or': [
+            {'term': {'style': 'korean'}},
+            {'term': {'style': 'mexican'}}
+            ]},
+        'fields': ['id']}
+
+
+What if you want Mexican food, but only if it's FREE, otherwise you
+want Korean?
+
+.. code-block:: python
+
+    q = S(Model).filter(F(style='mexican', price='FREE') | F(style='korean'))
+
+
+That translates to:
+
+.. code-block:: javascript
+
+    {'filter': {
+        'or': [
+            {'and': [
+                {'term': {'price': 'FREE'}},
+                {'term': {'style': 'mexican'}}
+                ]},
+            {'term': {'style': 'korean'}}
+            ]},
+        'fields': ['id']}
+
+
+``F`` supports AND, OR, and NOT operators.
 
 
 Facets
-------
+======
+
+.. code-block:: python
+
+    q = (S(Model).query(title='taco trucks')
+                 .facet(styles={'field': 'style'},
+                        locations={'field':'location'}))
+
+
+will do a query for "taco trucks" and return facets for the ``style``
+and ``location`` fields. The facets are available from the ``facets``
+property.
+
+That translates to:
+
+.. code-block:: javascript
+
+    {'query': {
+        'term': {'title': 'taco trucks'}},
+        'facets': {
+            'styles': {'field': 'style'},
+            'locations': {'field': 'location'}
+            },
+        'fields': ['id']}
 
-``S(Model).query(title='taco trucks').facet(styles={'field': 'style'},
-locations={'field':'location'})`` will do a query for "taco trucks" and return
-facets for the ``style`` and ``location`` fields. The facets are
-available from the ``facets`` properties.
 
 Facets can also be scripted_::
 
@@ -111,40 +265,43 @@ Facets can also be scripted_::
             'script': 'term == korean ? true : false'
         })
 
-.. note::
 
-    Unless the ``facet_filter`` property is specified on each facet,
-    all the filters will be used for the facet_filter by default.
+.. Note::
+
+    Unless the ``facet_filter`` property is specified on each facet,
+    all the filters will be used for the facet_filter by default.
 
-Results
--------
 
-Results are lazy-loaded, so the query will not be made until you try to
-access an item or some other attribute requiring the data.
+Counts
+======
 
-Total hits can be found by doing::
+Total hits can be found by doing:
+
+.. code-block:: python
 
     r = S(Model).query(title='taco trucks')
     r.count()
-
     # or
     len(r)
 
-Results-types
--------------
+
+Results
+=======
+
+Results are lazy-loaded, so the query will not be made until you try
+to access an item or some other attribute requiring the data.
 
 By default, results will be returned as instances of the Model class
-provided in the constructor. However, you can get the results back as a
-list or dictionaries or tuples, if you'd rather::
+provided in the constructor. However, you can get the results back as
+a list of dictionaries or tuples, if you'd rather:
 
-    S(Model).query(type='taco trucks').values('title')
-    > [(1, 'De La Tacos',), (2, 'Oriental Tacos',),]
+>>> S(Model).query(type='taco trucks').values('title')
+[(1, 'De La Tacos',), (2, 'Oriental Tacos',),]
+>>> S(Model).query(type='taco trucks').values_dict('title')
+[{'id': 1, 'title': 'De La Tacos'}, {'id': 2, 'title': 'Oriental Tacos'}]
 
-    S(Model).query(type='taco trucks').values_dict('title')
-    > [{'id': 1, 'title': 'De La Tacos'}, {'id': 2, 'title': 'Oriental
-      Tacos'}]
 
-Arguments passed to ``values`` or ``values_dict`` will select the fields
-that are returned, including the ``id``.
+Arguments passed to ``values`` or ``values_dict`` will select the
+fields that are returned, including the ``id``.
 
 .. _Text: http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html
diff --git a/docs/testing.rst b/docs/testing.rst
index 8535053..86bfdbc 100644
--- a/docs/testing.rst
+++ b/docs/testing.rst
@@ -12,30 +12,3 @@ It does the following:
 * If `ES_HOSTS` is empty it raises a `SkipTest`.
 * `self.es` is available from the `ESTestCase` class and any subclasses.
 * At the end of the Test Case the index is destroyed.
-
-
-Testing elasticutils itself
-===========================
-
-Testing elasticutils requires pyes_ and nose_. The easiest way to test is
-to set up a new virtualenv with those packages installed::
-
-    mkvirtualenv elasticutils
-    workon elasticutils
-    pip install -r requirements-extra.txt
-
-Then from the elasticutils base directory run::
-
-    DJANGO_SETTINGS_MODULE=es_settings nosetests -w tests
-
-.. Note::
-
-   If you need to adjust the settings, copy ``es_settings.py`` to a
-   new file (like ``es_settings_local.py``), edit the file, and pass
-   that in as the value for ``DJANGO_SETTINGS_MODULE``.
-
-   This is helpful if you need to change the value of ``ES_HOSTS`` to
-   match the ip address or port that elasticsearch is listening on.
-
-.. _pyes: http://pypi.python.org/pypi/pyes/
-.. _nose: http://somethingaboutorange.com/mrl/projects/nose/
diff --git a/requirements-dev.txt b/requirements-dev.txt
new file mode 100644
index 0000000..047546c
--- /dev/null
+++ b/requirements-dev.txt
@@ -0,0 +1,8 @@
+-r requirements-extra.txt
+# This includes everything you need to develop on ElasticUtils.
+
+# nose for tests
+nose
+
+# Sphinx for documentation
+Sphinx
diff --git a/requirements.txt b/requirements.txt
index dd9db17..c6bcc43 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,2 @@
-nose
 # pyes 0.15, other versions... not so great.
 -e git://github.com/aparo/pyes.git@27d00eac9030cc9c4dfce9231ad1094f1470a3ca#egg=pyes