Merge pull request #3109 from hypothesis/architecture-decisions
Architecture decision records!
robertknight committed Mar 18, 2016
2 parents 1aa41da + 22adebd commit d67fb8b
Showing 2 changed files with 134 additions and 0 deletions.
80 changes: 80 additions & 0 deletions docs/arch/adr-001.rst
@@ -0,0 +1,80 @@
ADR 1: PostgreSQL persistence for annotations
=============================================

Context
-------

The annotations stored by the Hypothesis web service are arguably its most
critical data. Until now they have been stored in an Elasticsearch index,
primarily as a result of historical accident (this is how `annotator-store`_,
which was originally intended as a demonstrator application, stored
annotations). Alongside, we store "document metadata" which describes
relationships between different URIs, as scraped from metadata within annotated
pages.

While storing annotation data directly in Elasticsearch makes for a very simple
JSON API (data is passed essentially unaltered by the web application straight
to Elasticsearch) it has a number of disadvantages, including:

1. The persistence guarantees made by Elasticsearch are weak relative to most
   databases, and while many `data loss bugs`_ have been fixed, it is not
   unreasonable to have ongoing concerns about durability of data in
   Elasticsearch.

2. The lack of database-enforced schema validation means that maintaining data
   validity becomes an application-layer concern. The fact that Elasticsearch
   also lacks transactional write capabilities makes certain kinds of validation
   checks nearly impossible to implement correctly.

3. Serving as both primary persistence store and search index causes tension
   between the desire to keep data normalised (to simplify the process of
   ensuring data consistency), and to keep data in a format suitable for
   efficient search, which usually implies denormalisation.

4. As requirements for search and query change, it is desirable to be able to
   iterate on the format of the search index. When the search index is also the
   primary data store, this introduces additional risks which typically deter or
   at least increase the cost of such iteration.

5. Lastly, making changes to the internal schema of annotation data in
   Elasticsearch requires the creation of custom in-house data migration tools.
   In contrast, most relational database systems have established schema and
   data migration libraries available.
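
Point 2 above can be illustrated with a minimal sketch of database-enforced
validation combined with a transactional write. It is an illustration only:
the standard-library ``sqlite3`` module stands in for PostgreSQL so that the
example is self-contained, and the table names, column names, and the
``add_document_with_annotation`` helper are hypothetical rather than part of
the Hypothesis schema.

```python
import sqlite3

# sqlite3 is a stand-in for PostgreSQL here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE document ("
    " id INTEGER PRIMARY KEY,"
    " uri TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE annotation ("
    " id INTEGER PRIMARY KEY,"
    " document_id INTEGER NOT NULL REFERENCES document(id),"
    " body TEXT NOT NULL CHECK (length(body) > 0))"
)


def add_document_with_annotation(conn, uri, body):
    """Insert a document and its first annotation as one atomic unit."""
    with conn:  # commits on success, rolls back if any statement fails
        cursor = conn.execute("INSERT INTO document (uri) VALUES (?)", (uri,))
        conn.execute(
            "INSERT INTO annotation (document_id, body) VALUES (?, ?)",
            (cursor.lastrowid, body),
        )


add_document_with_annotation(conn, "http://example.com/", "A valid note")

# An empty body violates the CHECK constraint, so the database rejects the
# write and the document row inserted in the same transaction is rolled back.
try:
    add_document_with_annotation(conn, "http://example.org/", "")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

document_count = conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]
```

In this sketch the invalid write leaves no partial data behind: the constraint
is checked by the database itself, and the transaction ensures that the two
inserts either both succeed or both fail.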

.. _annotator-store: https://github.com/openannotation/annotator-store
.. _data loss bugs: https://aphyr.com/posts/317-jepsen-elasticsearch

Decision
--------

We will migrate all annotation data, and all associated document metadata, into
a PostgreSQL database, which will serve as the primary data store for such data.

We will continue to use Elasticsearch as a search index, but the data stored
within will be "ephemeral" -- that is, we will always be able to regenerate it
from data stored in PostgreSQL.

The internal schemas of the data stored in PostgreSQL will be designed to
simplify data manipulation while ensuring self-consistency.

We will build appropriate tools to ensure that the Elasticsearch index is kept
up-to-date as data in the PostgreSQL database changes.
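
The relationship between the two stores can be sketched as follows. This is a
rough illustration under stated assumptions, not the tooling this decision
commits us to building: ``sqlite3`` stands in for PostgreSQL, a plain ``dict``
stands in for the Elasticsearch index, and every function and column name is
hypothetical.

```python
import sqlite3

# Stand-ins: sqlite3 plays the role of PostgreSQL (the primary store) and a
# plain dict plays the role of the Elasticsearch index. Names are hypothetical.


def create_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        " id TEXT PRIMARY KEY,"
        " target_uri TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def to_search_document(row):
    """Build the denormalised document that would be sent to the search index."""
    annotation_id, target_uri, body = row
    return {"id": annotation_id, "uri": target_uri, "text": body}


def save_annotation(conn, index, annotation_id, target_uri, body):
    """Write to the primary store first, then update the search index."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO annotation (id, target_uri, body)"
            " VALUES (?, ?, ?)",
            (annotation_id, target_uri, body),
        )
    index[annotation_id] = to_search_document((annotation_id, target_uri, body))


def reindex_all(conn):
    """Regenerate the whole search index from the primary store."""
    rows = conn.execute("SELECT id, target_uri, body FROM annotation ORDER BY id")
    return {row[0]: to_search_document(row) for row in rows}


conn = create_store()
index = {}
save_annotation(conn, index, "a1", "http://example.com/", "First note")
save_annotation(conn, index, "a2", "http://example.com/page", "Second note")

# Losing the index is harmless: it can be rebuilt from the primary store.
rebuilt = reindex_all(conn)
```

Because ``reindex_all`` reads only from the primary store, the format produced
by ``to_search_document`` can change freely: rebuilding the index afterwards
brings every document into the new format.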

Status
------

Accepted.

Consequences
------------

These changes will make it easier and safer to iterate on the internal schemas
of annotation storage, thanks to improved migration tooling for PostgreSQL and
the presence of transactional updates.

They will also make it easier and safer to iterate on the format of the search
index used to search annotations, thanks to the ephemeral nature of the data in
the search index.

Any future program that reuses the code serving our "annotation API" will now
need PostgreSQL as part of its minimal requirements.
54 changes: 54 additions & 0 deletions docs/arch/index.rst
@@ -0,0 +1,54 @@
Architecture decision records
=============================

Here you will find documents which describe significant architectural decisions
made or proposed when developing the Hypothesis software. We record these in
order to provide a reference for the history, motivation, and rationale for past
decisions.

ADRs
----

.. toctree::
   :maxdepth: 1
   :glob:

   adr-*

What are ADRs?
--------------

Quoting from the `blog post which inspired this repository`_, an architecture
decision record, or ADR, is:

    ...a short text file in a [specific] format...[which] describes a set of
    forces and a single decision in response to those forces. Note that the
    decision is the central piece here, so specific forces may appear in
    multiple ADRs.

The standard sections of an ADR are:

    **Title** These documents have names that are short noun phrases. For
    example, "ADR 1: Deployment on Ruby on Rails 3.0.10" or "ADR 9: LDAP for
    Multitenant Integration"

    **Context** This section describes the forces at play, including
    technological, political, social, and project local. These forces are
    probably in tension, and should be called out as such. The language in this
    section is value-neutral. It is simply describing facts.

    **Decision** This section describes our response to these forces. It is
    stated in full sentences, with active voice. "We will ..."

    **Status** A decision may be "proposed" if the project stakeholders haven't
    agreed with it yet, or "accepted" once it is agreed. If a later ADR changes
    or reverses a decision, it may be marked as "deprecated" or "superseded"
    with a reference to its replacement.

    **Consequences** This section describes the resulting context, after
    applying the decision. All consequences should be listed here, not just the
    "positive" ones. A particular decision may have positive, negative, and
    neutral consequences, but all of them affect the team and project in the
    future.

.. _blog post which inspired this repository: http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions
