@@ -0,0 +1,80 @@
ADR 1: PostgreSQL persistence for annotations
=============================================
Context
-------
The annotations stored by the Hypothesis web service are arguably its most
critical data. Until now they have been stored in an Elasticsearch index,
primarily as a result of historical accident (this is how `annotator-store`_,
which was originally intended as a demonstrator application, stored
annotations). Alongside the annotations, we store "document metadata" which describes
relationships between different URIs, as scraped from metadata within annotated
pages.
While storing annotation data directly in Elasticsearch makes for a very simple
JSON API (data is passed essentially unaltered by the web application straight
to Elasticsearch), it has a number of disadvantages, including:
1. The persistence guarantees made by Elasticsearch are weak relative to most
databases, and while many `data loss bugs`_ have been fixed, it is not
unreasonable to have ongoing concerns about durability of data in
Elasticsearch.
2. The lack of database-enforced schema validation means that maintaining data
validity becomes an application-layer concern. The fact that Elasticsearch
also lacks transactional write capabilities makes certain kinds of validation
checks nearly impossible to implement correctly.
3. Serving as both primary persistence store and search index causes tension
between the desire to keep data normalised (to simplify the process of
ensuring data consistency), and to keep data in a format suitable for
efficient search, which usually implies denormalisation.
4. As requirements for search and query change, it is desirable to be able to
iterate on the format of the search index. When the search index is also the
primary data store, this introduces additional risks which typically deter
such iteration, or at least increase its cost.
5. Lastly, making changes to the internal schema of annotation data in
Elasticsearch requires the creation of custom in-house data migration tools.
In contrast, most relational database systems have established schema and
data migration libraries available.
.. _annotator-store: https://github.com/openannotation/annotator-store
.. _data loss bugs: https://aphyr.com/posts/317-jepsen-elasticsearch
Decision
--------
We will migrate all annotation data, and all associated document metadata, into
a PostgreSQL database, which will serve as the primary data store for such data.
We will continue to use Elasticsearch as a search index, but the data stored
within will be "ephemeral" -- that is, we will always be able to regenerate it
from data stored in PostgreSQL.
The internal schemas of the data stored in PostgreSQL will be designed to
simplify data manipulation while ensuring self-consistency.
We will build appropriate tools to ensure that the Elasticsearch index is kept
up-to-date as data in the PostgreSQL database changes.
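In outline, the regeneration tooling could look something like the sketch below. The row fields, document shape, and batch size are assumptions for illustration, not the actual implementation:

```python
# Sketch: regenerating the ephemeral Elasticsearch index from rows
# held in PostgreSQL. Field names and the document mapping here are
# hypothetical.

def annotation_to_index_doc(row):
    """Denormalise a PostgreSQL annotation row into a search document."""
    return {
        "id": row["id"],
        "text": row["text"],
        "uri": row["target_uri"],
        # Tags stored relationally in PostgreSQL are flattened into the
        # document, since search favours denormalised data.
        "tags": list(row.get("tags", [])),
    }


def reindex(rows, bulk_index, batch_size=500):
    """Rebuild the search index from the primary data store.

    ``rows`` is an iterable of annotation rows; ``bulk_index`` is any
    callable that writes a batch of documents to Elasticsearch.
    """
    batch = []
    for row in rows:
        batch.append(annotation_to_index_doc(row))
        if len(batch) >= batch_size:
            bulk_index(batch)
            batch = []
    if batch:
        bulk_index(batch)
```

Because the index is always derivable this way, dropping and rebuilding it becomes a routine operation rather than a data-loss risk.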
Status
------
Accepted.
Consequences
------------
These changes will make it easier and safer to iterate on the internal schemas
of annotation storage, thanks to improved migration tooling for PostgreSQL and
the presence of transactional updates.
They will also make it easier and safer to iterate on the format of the search
index used to search annotations, thanks to the ephemeral nature of the data in
the search index.
The minimal requirements for any future program that reuses the code serving
our "annotation API" now include PostgreSQL.
@@ -0,0 +1,88 @@
ADR 2: Service layer for testable business logic
================================================
Context
-------
As we are currently using it, Pyramid is a model-view-template (MVT) web
application framework. Models describe domain objects and manage their
persistence, views handle HTTP requests, and templates define the user
interface.
"Business logic" is a shorthand for the heart of what the application actually
does. It is the code that manages the interactions of our domain objects, rather
than code that handles generic concerns such as HTTP request handling or SQL
generation.
It is not always clear where to put "business logic" in an MVT application:
- Some logic can live with its associated domain object(s) in the models layer,
but this quickly gets complicated when dealing with multiple models from
different parts of the system. It is easy to create circular import
dependencies.
- Putting logic in the views typically makes them extremely hard to test, as
this makes a single component responsible for receiving and validating data
from the client, performing business logic operations, and preparing response
data.
There are other problems associated with encapsulating business logic in views.
Business logic typically interacts directly with the model layer. This means
that either a) all view tests (including those which don't test business logic)
need a database, or b) we stub out the models layer for some or all view tests.
Stubbing out the database layer in a way that doesn't couple tests to the view
implementation is exceedingly difficult, in part due to the large interface of
SQLAlchemy.
One way to resolve this problem is to introduce a "services layer" between views
and the rest of the application, which is intended to encapsulate the bulk of
application business logic and hide persistence concerns from the views.
`This blog post`_ by Nando Florestan may help provide additional background
on the motivation for a "services layer."
.. _This blog post: http://dev.nando.audio/2014/04/01/large_apps_with_sqlalchemy__architecture.html
Decision
--------
We will employ a "services layer" to encapsulate business logic that satisfies
one or both of the following conditions:
1. The logic is of "non-trivial" complexity. This is clearly open to
interpretation. As a rule of thumb: if you have to ask yourself the question
"is this trivial?" then it is probably not.
2. The business logic handles more than one type of domain object.
The services layer will be tested independently of views, and used from both
views and other parts of the application which have access to a request object.
Services will take the form of instances with some defined interface which are
associated with a request and can be retrieved from the request object.
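To make this concrete, here is a minimal sketch of what such a service and its factory might look like. The names (``AnnotationService``, ``annotation_service_factory``) and the ``request.db`` attribute are hypothetical illustrations, not actual h code, and the sketch is deliberately framework-free:

```python
# A hypothetical service encapsulating annotation business logic. It
# takes its persistence dependency (a database session) as a
# constructor argument, so unit tests can pass in a small stub instead
# of stubbing SQLAlchemy's large interface.

class AnnotationService:
    def __init__(self, session):
        self._session = session

    def delete(self, annotation):
        """Soft-delete an annotation so the view never touches SQL."""
        annotation["deleted"] = True
        self._session.add(annotation)


def annotation_service_factory(context, request):
    """Build the service from the per-request database session."""
    return AnnotationService(request.db)
```

A view would retrieve the service from the request and call ``delete``, while unit tests construct ``AnnotationService`` directly with a fake session.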
Status
------
Accepted.
Consequences
------------
We hope that adding a services layer will substantially simplify the process of
writing and, in particular, testing view code.
View tests will likely run faster, as views can be unit tested against a
stubbed service rather than having to hit the database.
We will no longer need to stub or mock SQLAlchemy interfaces for testing, thus
reducing the extent to which tests are coupled to the implementation of the
system under test.
To achieve these things we are introducing additional concepts ("service",
"service factory") the purpose of which may not be immediately apparent,
especially to programmers new to the codebase.
There will likely be non-service-based views code in the codebase for some time,
thus we are potentially introducing inconsistency between different parts of the
code.
@@ -0,0 +1,88 @@
ADR 3: Resource-oriented, hierarchical API
==========================================
Context
-------
The problem
~~~~~~~~~~~
We don't currently have any explicitly agreed conventions or patterns for how we
structure our API. As different parts of our API are implemented in different
ways, people looking to integrate with our API are having a harder time than
necessary learning how it works.
The lack of agreed conventions also slows us down when designing new
functionality, as we don't have a standard set of patterns to draw from.
Extra context
~~~~~~~~~~~~~
The part of our API that deals with creating, retrieving and modifying
annotations is currently resource-based, with ``GET``, ``POST``, ``PATCH`` and
``DELETE`` requests to suitable URLs. This is also the style used in the W3C
`Web Annotation Protocol`_.
.. _Web Annotation Protocol: https://www.w3.org/TR/annotation-protocol/
Some examples of other APIs that could provide inspiration:
- The `GitHub API <https://developer.github.com/v3/>`_ (resource-based,
hierarchical style)
- The `Slack Web API <https://api.slack.com/web>`_ (RPC style)
Decision
--------
We're going to build our API in a resource-oriented, broadly RESTful style (with
standard HTTP verbs operating on resources at URLs).
We'll nest resources liberally. For instance, when flagging an annotation for a
moderator's attention, we would use a ``PUT`` request to a sub-resource of the
annotation::
PUT /api/annotations/<annid>/flag
Content-Type: application/json
{"reason": "spam"}
One advantage of this method is that parameters can often be made mandatory by
construction: in the example above, it becomes impossible to flag an annotation
without providing the annotation ID. This is similar to the approach GitHub
takes for `locking issues`_.
.. _locking issues: https://developer.github.com/v3/issues/#lock-an-issue
The operation to remove such a flag would then be expressed with a ``DELETE``
request to the same URL::
DELETE /api/annotations/<annid>/flag
An example of a pattern we are *not* choosing to follow is to post these flags
as top-level entities::
POST /api/flags
Content-Type: application/json
{"annotation": "<annid>"}
While this is a reasonable approach to take, we think that the more hierarchical
approach will make the relationships between different entities (in this case,
annotations and their associated flags) easier to understand.
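The "mandatory by construction" point can be illustrated with a tiny hypothetical helper (not actual h code) that builds these URLs:

```python
# Hypothetical helpers mirroring the hierarchical URL scheme in the
# examples above: the flag is a sub-resource of its annotation.

def annotation_url(annid):
    return "/api/annotations/{}".format(annid)


def flag_url(annid):
    # The annotation ID is mandatory by construction: there is simply
    # no way to build the flag URL without one.
    return annotation_url(annid) + "/flag"
```

A client would then ``PUT`` to ``flag_url(annid)`` to flag an annotation and ``DELETE`` the same URL to remove the flag.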
Status
------
Accepted.
Consequences
------------
Having a set of default patterns for how we implement new functionality (or
restructure existing functionality) should result in a more consistent API
and an easier time for those trying to integrate with our systems.
Another benefit should be that this speeds up development, by giving us a base
set of patterns from which to design new functionality. This should also make it
easier for reviewers, as they can more confidently review a proposal if it
follows these conventions, or look for more context if it doesn't.
@@ -0,0 +1,56 @@
:orphan:
Architecture decision records
=============================
Here you will find documents which describe significant architectural decisions
made or proposed when developing the Hypothesis software. We record these in
order to provide a reference for the history, motivation, and rationale for past
decisions.
ADRs
----
.. toctree::
:maxdepth: 1
:glob:
adr-*
What are ADRs?
--------------
Quoting from the `blog post which inspired this repository`_, an architecture
decision record, or ADR, is:
...a short text file in a [specific] format...[which] describes a set of
forces and a single decision in response to those forces. Note that the
decision is the central piece here, so specific forces may appear in
multiple ADRs.
The standard sections of an ADR are:
**Title** These documents have names that are short noun phrases. For
example, "ADR 1: Deployment on Ruby on Rails 3.0.10" or "ADR 9: LDAP for
Multitenant Integration"
**Context** This section describes the forces at play, including
technological, political, social, and project local. These forces are
probably in tension, and should be called out as such. The language in this
section is value-neutral. It is simply describing facts.
**Decision** This section describes our response to these forces. It is
stated in full sentences, with active voice. "We will ..."
**Status** A decision may be "proposed" if the project stakeholders haven't
agreed with it yet, or "accepted" once it is agreed. If a later ADR changes
or reverses a decision, it may be marked as "deprecated" or "superseded"
with a reference to its replacement.
**Consequences** This section describes the resulting context, after
applying the decision. All consequences should be listed here, not just the
"positive" ones. A particular decision may have positive, negative, and
neutral consequences, but all of them affect the team and project in the
future.
.. _blog post which inspired this repository: http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions
@@ -0,0 +1,20 @@
The Hypothesis community
========================
Please be courteous and respectful in your communication on Slack (`request an invite`_ or `log in once you've created an account`_), IRC
(`#hypothes.is`_ on `freenode.net`_), the mailing list (`subscribe`_,
`archive`_), and `GitHub`_. Humor is appreciated, but remember that some nuance
may be lost in the medium and plan accordingly.
.. _`request an invite`: https://slack.hypothes.is
.. _`log in once you've created an account`: https://hypothesis-open.slack.com/
.. _#hypothes.is: http://webchat.freenode.net/?channels=hypothes.is
.. _freenode.net: http://freenode.net/
.. _subscribe: mailto:dev+subscribe@list.hypothes.is
.. _archive: https://groups.google.com/a/list.hypothes.is/forum/#!forum/dev
.. _GitHub: http://github.com/hypothesis/h
If you plan to be an active contributor, please join our mailing list
to coordinate development effort. This coordination helps us avoid
duplicating efforts and raises the level of collaboration. For small
fixes, feel free to open a pull request without any prior discussion.
@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-
# pylint: disable=invalid-name
#
# The Hypothesis Annotation Framework documentation build configuration file, created by
# The h documentation build configuration file, created by
# sphinx-quickstart on Fri Oct 12 19:21:42 2012.
#
# This file is execfile()d with the current directory set to its containing dir.
@@ -13,6 +13,9 @@
# serve to show the default.
import sys, os
from datetime import datetime
CURRENT_YEAR = datetime.now().year
on_rtd = os.environ.get('READTHEDOCS') == 'True'
@@ -33,7 +36,6 @@
'sphinx.ext.intersphinx',
'sphinx.ext.viewcode',
'sphinx.ext.todo',
'sphinxcontrib.httpdomain',
]
# Render .. todo:: directives in the output.
@@ -52,8 +54,8 @@
master_doc = 'index'
# General information about the project.
project = u'The Hypothesis Annotation Framework'
copyright = u'2012, Hypothes.is Project and contributors'
project = u'h'
copyright = u'2012-{}, Hypothes.is Project and contributors'.format(CURRENT_YEAR)
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@@ -76,7 +78,9 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
exclude_patterns = [
'_build',
]
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
@@ -141,6 +145,7 @@ def setup(app):
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
#html_static_path = ['_static']
html_extra_path = ['_extra']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
@@ -184,7 +189,7 @@ def setup(app):
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'TheHypothesisAnnotationFrameworkdoc'
htmlhelp_basename = 'h'
# -- Options for LaTeX output --------------------------------------------------
@@ -203,7 +208,7 @@ def setup(app):
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
('index', 'TheHypothesisAnnotationFramework.tex', u'The Hypothesis Annotation Framework Documentation',
('index', 'h.tex', u'The h Documentation',
u'Hypothes.is Project and contributors', 'manual'),
]
@@ -233,7 +238,7 @@ def setup(app):
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'thehypothesisannotationframework', u'The Hypothesis Annotation Framework Documentation',
('index', 'h', u'The h Documentation',
[u'Hypothes.is Project and contributors'], 1)
]
@@ -247,8 +252,8 @@ def setup(app):
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'TheHypothesisAnnotationFramework', u'The Hypothesis Annotation Framework Documentation',
u'Hypothes.is Project and contributors', 'TheHypothesisAnnotationFramework', 'One line description of project.',
('index', 'h', u'The h Documentation',
u'Hypothes.is Project and contributors', 'h', 'One line description of project.',
'Miscellaneous'),
]
@@ -261,6 +266,6 @@ def setup(app):
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'http://docs.python.org/': None}
intersphinx_mapping = {
'client': ('https://h.readthedocs.io/projects/client/en/latest/', None),
}
@@ -0,0 +1,19 @@
Accessing the admin interface
-----------------------------
To access the admin interface, a user must be logged in and have admin
permissions. To grant admin permissions to a user, run the following command:
.. code-block:: bash
hypothesis user admin <username>
For example, to make the user 'joe' an admin in the development environment:
.. code-block:: bash
hypothesis --dev user admin joe
When this user signs in, they can access the administration panel at
``/admin``. The administration panel has options for managing users and optional
features.
@@ -0,0 +1,5 @@
:orphan:
The Hypothesis browser extensions now live in `their own repository`_.
.. _their own repository: https://github.com/hypothesis/browser-extension
@@ -0,0 +1,30 @@
Contributor License Agreement
=============================
Before submitting significant contributions, we ask that you sign one of
our Contributor License Agreements. This practice ensures that
contributors retain the rights to their contributions, and protects both
the ongoing availability of the project and our commitment to make it
available for anyone to use with as few restrictions as possible.
If contributing as an individual please sign the CLA for individuals:
- `CLA for individuals, HTML <http://hypothes.is/contribute/individual-cla>`_
- `CLA for individuals, PDF <https://d242fdlp0qlcia.cloudfront.net/uploads/2015/11/03161955/Hypothes.is-Project-Individual.pdf>`_
If making contributions on behalf of an employer, please sign the CLA for
employees:
- `CLA for employers, HTML <http://hypothes.is/contribute/entity-cla>`_
- `CLA for employers, PDF <https://d242fdlp0qlcia.cloudfront.net/uploads/2015/11/03161955/Hypothes.is-Project-Entity.pdf>`_
A completed form can be sent either by electronic mail to
license@hypothes.is or by conventional mail to the address below. If
you have any questions, please contact us.
::
Hypothes.is Project
2261 Market St #632
SF, CA 94114
@@ -0,0 +1,78 @@
Code style
==========
This section contains some code style guidelines for the different programming
languages used in the project.
Python
------
Follow `PEP 8 <https://www.python.org/dev/peps/pep-0008/>`_; the linting tools
described below can find PEP 8 problems for you automatically.
Docstrings
``````````
All public modules, functions, classes, and methods should normally have
docstrings. See `PEP 257 <https://www.python.org/dev/peps/pep-0257/>`_ for
general advice on how to write docstrings (although we don't write module
docstrings that describe every object exported by the module).
The ``pep257`` tool (which is run by ``prospector``, see below) can point out
PEP 257 violations for you.
It's good to use Sphinx references in docstrings because they can be syntax
highlighted and hyperlinked when the docstrings are extracted by Sphinx into
HTML documentation, and because Sphinx can print warnings for references that
are no longer correct:
* Use `Sphinx Python cross-references <http://www.sphinx-doc.org/en/stable/domains.html#cross-referencing-python-objects>`_
to reference other Python modules, functions etc. from docstrings (there are
also Sphinx domains for referencing
objects from other programming languages, such as
`JavaScript <http://www.sphinx-doc.org/en/stable/domains.html#the-javascript-domain>`_).
* Use `Sphinx info field lists <http://www.sphinx-doc.org/en/stable/domains.html#info-field-lists>`_
to document parameters, return values and exceptions that might be raised.
* You can also use `reStructuredText <http://www.sphinx-doc.org/en/stable/rest.html>`_
to add markup (bold, code samples, lists, etc) to docstrings.
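Put together, a docstring following these conventions might look like this (the function itself is purely illustrative):

```python
def fetch_annotation(storage, annotation_id):
    """Return the annotation with the given ID.

    :param storage: mapping of annotation IDs to annotations
    :type storage: dict

    :param annotation_id: the ID of the annotation to fetch
    :type annotation_id: str

    :returns: the matching annotation
    :rtype: dict

    :raises KeyError: if no annotation with ``annotation_id`` exists
    """
    return storage[annotation_id]
```

When extracted by Sphinx, the info fields render as a formatted parameter table, and references such as ``:raises KeyError:`` can hyperlink to the Python documentation via intersphinx.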
Linting
```````
We use `Flake8 <https://pypi.python.org/pypi/flake8>`_ for linting Python code.
Lint checks are run as part of our continuous integration builds and can be run
locally using ``make lint``. You may find it helpful to use a flake8 plugin for
your editor to get live feedback as you make changes.
Automated code formatting
`````````````````````````
You can use `YAPF <https://github.com/google/yapf>`_ (along with the YAPF
configuration in this git repo) to automatically reformat Python code.
We don't strictly adhere to YAPF-generated formatting but it can be a useful
convenience.
Additional reading
``````````````````
* Although we don't strictly follow all of it, the
`Google Python Style Guide <https://google.github.io/styleguide/pyguide.html>`_
contains a lot of good advice.
Front-end Development
---------------------
See the `Hypothesis Front-end Toolkit`_ repository for documentation on code
style and tooling for JavaScript, CSS and HTML.
We use `ESLint <https://eslint.org>`_ for linting front-end code. Use ``gulp
lint`` to run ESLint locally. You may find it helpful to install an ESLint
plugin for your editor to get live feedback as you make changes.
.. _Hypothesis Front-end Toolkit: https://github.com/hypothesis/frontend-toolkit
@@ -0,0 +1,34 @@
Writing documentation
=====================
To build the documentation, run the ``make dirhtml`` command from the ``docs``
directory:
.. code-block:: bash
cd docs
make dirhtml
When the build finishes you can view the documentation by running a static
web server in the newly generated ``_build/dirhtml`` directory. For example:
.. code-block:: bash
cd _build/dirhtml; python -m SimpleHTTPServer; cd -
API Documentation
-----------------
The Hypothesis API documentation is rendered using `ReDoc <https://github.com/Rebilly/ReDoc>`_,
a JavaScript tool for generating OpenAPI/Swagger reference documentation.
The documentation-building process above will regenerate API documentation output without intervention,
but if you are making changes to the API specification (``hypothesis.yaml``), you may find it
convenient to use the `ReDoc CLI tool <https://github.com/Rebilly/ReDoc/blob/master/cli/README.md>`_,
which can watch the spec file for changes:
.. code-block:: bash
npm install -g redoc-cli
redoc-cli serve [path-to-spec] --watch
@@ -0,0 +1,21 @@
Environment Variables
=====================
This section documents the environment variables supported by h.
.. envvar:: CLIENT_URL
The URL at which the Hypothesis client code is hosted.
This is the URL of the client's entry point script (by default
https://cdn.hypothes.is/hypothesis).
.. envvar:: CLIENT_OAUTH_ID
The OAuth client ID for the Hypothesis client on pages that embed it using
the service's /embed.js script.
.. envvar:: CLIENT_RPC_ALLOWED_ORIGINS
The list of origins that the client will respond to cross-origin RPC
requests from. A space-separated list of origins. For example:
``https://lti.hypothes.is https://example.com http://localhost.com:8001``.
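A sketch of how such a space-separated variable might be parsed (illustrative only; this is not necessarily how h's settings code handles it):

```python
import os

def allowed_origins(environ=os.environ):
    """Parse CLIENT_RPC_ALLOWED_ORIGINS into a list of origins."""
    raw = environ.get("CLIENT_RPC_ALLOWED_ORIGINS", "")
    # str.split() with no argument splits on any run of whitespace and
    # returns an empty list for an empty string, so an unset variable
    # yields no allowed origins.
    return raw.split()
```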
@@ -0,0 +1,20 @@
Developing Hypothesis
=====================
The following sections document how to set up a development environment for h
and how to contribute code or documentation to the project.
.. toctree::
:maxdepth: 1
cla
install
administration
introduction
submitting-a-pr
code-style
testing
documentation
ssl
making-changes-to-model-code
envvars
@@ -0,0 +1,334 @@
Website dev install
===================
The code for the https://hypothes.is/ website and API lives in a
`Git repo named h`_. To get this code running in a local development
environment the first thing you need to do is install h's system dependencies.
.. seealso::
This page documents how to set up a development install of h.
For installing the Hypothesis client for development see
https://github.com/hypothesis/client/, and for the browser extension
see https://github.com/hypothesis/browser-extension.
Follow either the
`Installing the system dependencies on Ubuntu`_ or the
`Installing the system dependencies on macOS`_ section below, depending on which
operating system you're using, then move on to `Getting the h source code from GitHub`_ and
the sections that follow it.
Installing the system dependencies on Ubuntu
--------------------------------------------
This section describes how to install h's system dependencies on Ubuntu.
These steps will also probably work with few or no changes on other versions
of Ubuntu, Debian, or other Debian-based GNU/Linux distributions.
Install the following packages:
.. code-block:: bash
sudo apt-get install -y --no-install-recommends \
build-essential \
git \
libevent-dev \
libffi-dev \
libfontconfig \
libpq-dev \
libssl-dev \
python-dev \
python-pip \
python-virtualenv
Install node by following the
`instructions on nodejs.org <https://nodejs.org/en/download/package-manager/>`_
(the version of the nodejs package in the standard Ubuntu repositories is too
old).
Upgrade pip, virtualenv and npm:
.. code-block:: bash
sudo pip install -U pip virtualenv
sudo npm install -g npm
Installing the system dependencies on macOS
-------------------------------------------
This section describes how to install h's system dependencies on macOS.
The instructions that follow assume you have previously installed Homebrew_.
.. _Homebrew: http://brew.sh/
Install the following packages:
.. code-block:: bash
brew install \
libevent \
libffi \
node \
postgresql \
python
.. note:: Unfortunately you need to install the ``postgresql`` package, because
Homebrew does not currently provide a standalone ``libpq`` package.
Upgrade pip and virtualenv:
.. code-block:: bash
pip install -U pip virtualenv
Getting the h source code from GitHub
-------------------------------------
Use ``git`` to download the h source code:
.. code-block:: bash
git clone https://github.com/hypothesis/h.git
This will download the code into an ``h`` directory in your current working
directory.
Change into the ``h`` directory for the remainder of the installation
process:
.. code-block:: bash
cd h
Installing the services
-----------------------
h requires the following external services:
- PostgreSQL_ 9.4+
- Elasticsearch_ v6, with the `Elasticsearch ICU Analysis`_ plugin
- RabbitMQ_ v3.5+
.. _PostgreSQL: http://www.postgresql.org/
.. _Elasticsearch: https://www.elastic.co/
.. _Elasticsearch ICU Analysis: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html
.. _RabbitMQ: https://rabbitmq.com/
You can install these services however you want, but the easiest way is by using
Docker and Docker Compose. This should work on any operating system that Docker
can be installed on:
1. Install Docker and Docker Compose by following the instructions on the
`Docker website`_.
2. Run Docker Compose:
.. code-block:: bash
docker-compose up
You'll now have some Docker containers running the PostgreSQL, RabbitMQ, and
Elasticsearch services. You should be able to see them by running ``docker
ps``. You should also be able to visit your Elasticsearch service by opening
http://localhost:9200/ in a browser, and to connect to your PostgreSQL
database by running ``psql postgresql://postgres@localhost/postgres`` (if you
have psql installed).
.. note::
If at any point you want to shut the containers down, you can
interrupt the ``docker-compose`` command. If you want to run the
containers in the background, you can run ``docker-compose up -d``.
3. Create the ``htest`` database in the ``postgres`` container. This is needed
to run the h tests:
.. code-block:: bash
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE htest;"
.. tip::
You can use Docker Compose to open a psql shell in your Dockerized
database container without having to install psql on your host machine:
.. code-block:: bash
docker-compose exec postgres psql -U postgres
.. tip::
Use the ``docker-compose logs`` command to see what's going on inside your
Docker containers, for example:
.. code-block:: bash
docker-compose logs rabbit
For more on how to use Docker and Docker Compose see the `Docker website`_.
.. _Docker website: https://docs.docker.com/compose/install/
Installing the gulp command
---------------------------
Install ``gulp-cli`` to get the ``gulp`` command:
.. code-block:: bash
sudo npm install -g gulp-cli
Creating a Python virtual environment
-------------------------------------
Create a Python virtual environment to install and run the h Python code and
Python dependencies in:
.. code-block:: bash
virtualenv .venv
.. _activating_your_virtual_environment:
Activating your virtual environment
-----------------------------------
Activate the virtual environment that you've created:
.. code-block:: bash
source .venv/bin/activate
.. tip::
You'll need to re-activate this virtualenv with the
``source .venv/bin/activate`` command each time you open a new terminal,
before running h.
See the `Virtual Environments`_ section in the Hitchhiker's guide to
Python for an introduction to Python virtual environments.
.. _Virtual Environments: http://docs.python-guide.org/en/latest/dev/virtualenvs/
Running h
---------
Start a development server:
.. code-block:: bash
make dev
The first time you run ``make dev`` it might take a while to start because
it'll need to install the application dependencies and build the client assets.
This will start the server on port 5000 (http://localhost:5000), reload the
application whenever changes are made to the source code, and restart it should
it crash for some reason.
.. _running-the-tests:
Running h's tests
-----------------
There are test suites for both the frontend and backend code. To run the
complete set of tests, run:
.. code-block:: bash
make test
To run the frontend test suite only, run the appropriate test task with gulp.
For example:
.. code-block:: bash
gulp test
When working on the front-end code, you can run the Karma test runner in
auto-watch mode which will re-run the tests whenever a change is made to the
source code. To start the test runner in auto-watch mode, run:
.. code-block:: bash
gulp test-watch
To run only a subset of tests for front-end code, use the ``--grep``
argument or mocha's `.only()`_ modifier.
.. code-block:: bash
gulp test-watch --grep <pattern>
.. _.only(): http://jaketrent.com/post/run-single-mocha-test/
Debugging h
-----------
The `pyramid_debugtoolbar`_ package is loaded by default in the development
environment. This will provide stack traces for exceptions and allow basic
debugging. A more advanced profiler can also be accessed at the
``/_debug_toolbar`` path::

    http://localhost:5000/_debug_toolbar/
Check out the `pyramid_debugtoolbar documentation`_ for information on how to
use and configure it.
.. _pyramid_debugtoolbar: https://github.com/Pylons/pyramid_debugtoolbar
.. _pyramid_debugtoolbar documentation: http://docs.pylonsproject.org/projects/pyramid-debugtoolbar/en/latest/
You can turn on SQL query logging by setting the ``DEBUG_QUERY``
environment variable (to any value). Set it to the special value ``trace`` to
turn on result set logging as well.
Feature flags
-------------
Feature flags allow admins to enable or disable features for certain groups
of users. You can enable or disable them from the Administration Dashboard.
To access the Administration Dashboard, you will need to first create a
user account in your local instance of h and then give that account
admin access rights using h's command-line tools.
See the :doc:`/developing/administration` documentation for information
on how to give the initial user admin rights and access the Administration
Dashboard.
Troubleshooting
---------------
Cannot connect to the Docker daemon
```````````````````````````````````
If you get an error that looks like this when trying to run ``docker``
commands::
    Cannot connect to the Docker daemon. Is the docker daemon running on this host?
    Error: failed to start containers: postgres
it could be because you don't have permission to access the Unix socket that
the docker daemon is bound to. On some operating systems (e.g. Linux) you need
to either:
* Take additional steps during Docker installation to give your Unix user
  access to the Docker daemon's port (consult the installation
  instructions for your operating system on the `Docker website`_), or
* Prefix all ``docker`` commands with ``sudo``.
.. _Docker website: https://www.docker.com/
.. _Git repo named h: https://github.com/hypothesis/h/
=================================
An introduction to the h codebase
=================================
If you're new to the team, or to the Hypothesis project, you probably want to
get up to speed as quickly as possible so you can make meaningful improvements
to ``h``. This document is intended to serve as a brief "orientation guide" to
help you find your way around the codebase.
This document is a living guide, and is at risk of becoming outdated as we
continually improve the software. If you spot things that are out of date,
please submit a pull request to update this document.
**This guide was last updated on 11 Apr 2017.**
----------------------------
A lightning guide to Pyramid
----------------------------
The ``h`` codebase is principally a Pyramid_ web application. Pyramid is more of
a library of utilities than a "framework" in the sense of Django or Rails. As
such, the structure (or lack thereof) in our application is provided by our own
conventions, and not the framework itself.
Important things to know about Pyramid that may differ from other web
application frameworks you've used:
- Application setup is handled explicitly by a distinct configuration step at
  boot. You'll note ``includeme`` functions in some modules -- these are part
  of that configuration system.
- The ``request`` object is passed into views explicitly rather than through
  a threadlocal (AKA "global variable"), and is often passed around
  explicitly to provide request context to other parts of the application.
  This has a number of advantages but can get a bit messy if not managed
  appropriately.
You can read more about the distinguishing features of Pyramid in the `excellent
Pyramid documentation`_.
.. _Pyramid: https://trypyramid.com
.. _excellent Pyramid documentation: http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/introduction.html
----------------------
Application components
----------------------
The important parts of the ``h`` application can be broken down into:
Models
    SQLAlchemy_ models representing the data objects that live in our
    database. These live in ``h.models``.

Views (and templates)
    Views are code that is called in response to a particular request.
    Templates can be used to render the output of a particular view,
    typically as HTML. With a few exceptions, views live in ``h.views``,
    and templates live in the ``h/templates/`` directory.

Services
    Putting business logic in views can quickly lead to views that are
    difficult to test. Putting business logic in models can lead to model
    objects with a large number of responsibilities.

    As such, we put most business logic into so-called "services". These are
    objects with behaviour and (optionally) state, which can be retrieved
    from the ``request`` object.

    Services live in ``h.services``.

Tasks
    Tasks are bits of code that run in background workers and which can be
    easily triggered from within the context of a request.

    We use Celery_ for background tasks, and task definitions can be found
    in ``h.tasks``.
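The shape of the services pattern can be sketched in plain Python. All names
below are hypothetical, and the ``FakeRequest`` class merely stands in for
the real Pyramid request (which h extends with service lookup); the point is
that the view stays thin and delegates its business logic to a service
object it retrieves from the request.

```python
# Illustrative sketch of the services pattern; not h's actual code.

class AnnotationService:
    """Hypothetical service holding business logic and a data source."""

    def __init__(self, rows):
        self._rows = rows  # stands in for a database session

    def texts_for_user(self, userid):
        # Business logic lives here, where it is easy to test directly.
        return [r["text"] for r in self._rows if r["userid"] == userid]


class FakeRequest:
    """Stands in for a request object that can look up services."""

    def __init__(self, services):
        self._services = services

    def find_service(self, name):
        return self._services[name]


def annotations_view(request):
    # The view delegates to the service rather than querying directly.
    svc = request.find_service(name="annotation")
    return svc.texts_for_user("acct:alice")
```

Because the service is an ordinary object, tests can construct it with
in-memory data and never touch a real request or database.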
There are a number of other modules and packages in the ``h`` repository. Some
of these (e.g. ``h.auth``, ``h.settings``) do one-off setup for a
booting application. Others may be business logic that dates from before we
introduced the `services pattern`_, and thus might be more appropriately moved
into a service in the future.
.. _SQLAlchemy: http://www.sqlalchemy.org/
.. _Celery: http://www.celeryproject.org/
.. _services pattern: https://h.readthedocs.io/en/latest/arch/adr-002/
============================
Making changes to model code
============================
---------------------------------
Guidelines for writing model code
---------------------------------
No length limits on database columns
====================================
Don't put any length limits on your database columns (for example
``sqlalchemy.Column(sqlalchemy.Unicode(30), ...)``). These can cause painful
database migrations.
Always use ``sqlalchemy.UnicodeText()`` with no length limit as the type for
text columns in the database (you can also use ``sqlalchemy.Text()`` if you're
sure the column will never receive non-ASCII characters).
When necessary, validate the lengths of strings in Python code instead.
This can be done using `SQLAlchemy validators <http://docs.sqlalchemy.org/en/rel_1_0/orm/mapped_attributes.html>`_
in model code.
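For example, here is a sketch of enforcing a length limit with a SQLAlchemy
validator while keeping the column type unbounded. The ``Page`` model and
the limit of 200 characters are invented for illustration and are not part
of the h schema.

```python
# Length validation at the Python layer, not in the database schema.
# The model and the limit here are illustrative, not h's actual code.
import sqlalchemy
from sqlalchemy.orm import declarative_base, validates

Base = declarative_base()

TITLE_MAX_LENGTH = 200  # hypothetical limit, enforced in Python, not SQL


class Page(Base):
    __tablename__ = "page"

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    # Unbounded column type: no painful migration if the limit changes.
    title = sqlalchemy.Column(sqlalchemy.UnicodeText())

    @validates("title")
    def validate_title(self, key, value):
        # Fires whenever the attribute is set, including in __init__.
        if value is not None and len(value) > TITLE_MAX_LENGTH:
            raise ValueError(
                "title must be at most %d characters" % TITLE_MAX_LENGTH)
        return value
```

Raising the limit later is then a one-line code change rather than an
``ALTER TABLE`` migration.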
View callables for HTML forms should also use Colander schemas to validate user
input, in addition to any validation done in the model code, because Colander
supports returning per-field errors to the user.
------------------------------------
Creating a database migration script
------------------------------------
If you've made any changes to the database schema (for example: added or
removed a SQLAlchemy ORM class, or added, removed or modified a
``sqlalchemy.Column`` on an ORM class) then you need to create a database
migration script that can be used to upgrade the production database from the
previous to your new schema.
We use `Alembic <https://alembic.readthedocs.io/en/latest/>`_ to create and run
migration scripts. See the Alembic docs (and look at existing scripts in
`h/migrations/versions <https://github.com/hypothesis/h/tree/master/h/migrations/versions>`_)
for details. The ``hypothesis migrate`` command is a wrapper around Alembic. The
steps to create a new migration script for h are:
1. Create the revision script by running ``bin/hypothesis migrate revision``,
   for example:

   .. code-block:: bash

      bin/hypothesis migrate revision -m "Add the foobar table"

   This will create a new script in ``h/migrations/versions/``.
2. Edit the generated script, filling in the ``upgrade()`` and
   ``downgrade()`` methods. See
   https://alembic.readthedocs.io/en/latest/ops.html#ops for details.
   .. note::

      Not every migration should have a ``downgrade()`` method. For example,
      if the upgrade removes a max length constraint on a text field, so
      that values longer than the previous max length can now be entered,
      then a downgrade that adds the constraint back may not work with data
      created using the updated schema.
3. Stamp your database.

   Before running any upgrades or downgrades you need to stamp the database
   with its current revision, so Alembic knows which migration scripts to
   run:

   .. code-block:: bash

      bin/hypothesis migrate stamp <revision_id>

   ``<revision_id>`` should be the revision corresponding to the version of
   the code that was present when the current database was created. This
   will usually be the ``down_revision`` from the migration script that
   you've just generated.
4. Test your ``upgrade()`` function by upgrading your database to the most
   recent revision. This will run all migration scripts newer than the
   revision that your db is currently stamped with, which usually means
   just your new revision script:

   .. code-block:: bash

      bin/hypothesis migrate upgrade head

   After running this command inspect your database's schema to check that
   it's as expected, and run h to check that everything is working.
   .. note::

      You should make sure that there's some representative data in the
      relevant columns of the database before testing upgrading and
      downgrading it. Some migration script crashes will only happen when
      there's data present.
5. Test your ``downgrade()`` function:

   .. code-block:: bash

      bin/hypothesis migrate downgrade -1

   After running this command inspect your database's schema to check that
   it's as expected. You can then upgrade it again:

   .. code-block:: bash

      bin/hypothesis migrate upgrade +1
Batch deletes and updates in migration scripts
==============================================
It's important that migration scripts don't lock database tables for too long,
so that when the script is run on the production database concurrent database
transactions from web requests aren't held up.
An SQL ``DELETE`` command acquires a ``FOR UPDATE`` row-level lock on the
rows that it selects to delete. An ``UPDATE`` acquires a ``FOR UPDATE`` lock on
the selected rows *if the update modifies any columns that have a unique index
on them that can be used in a foreign key*. While held this ``FOR UPDATE`` lock
prevents any concurrent transactions from modifying or deleting the selected
rows.
So if your migration script is going to ``DELETE`` or ``UPDATE`` a large number
of rows at once and committing that transaction is going to take a long time
(longer than 100ms) then you should instead do multiple ``DELETE``\s or
``UPDATE``\s of smaller numbers of rows, committing each as a separate
transaction. This will allow concurrent transactions to be sequenced in-between
your migration script's transactions.
For example, here's some Python code that deletes all the rows that match a
query in batches of 25:
.. code-block:: python

    query = <some sqlalchemy query>
    query = query.limit(25)

    while True:
        if query.count() == 0:
            break
        for row in query:
            session.delete(row)
        session.commit()
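The same pattern, runnable end to end against a throwaway in-memory SQLite
database. The ``Item`` model, the ``stale`` flag, and the batch size of 25
are all illustrative, not part of the h schema.

```python
# Batched deletes, each batch committed as its own transaction, so that
# concurrent transactions can be sequenced in between batches.
# Model names and sizes here are illustrative.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()


class Item(Base):
    __tablename__ = "item"
    id = sa.Column(sa.Integer, primary_key=True)
    stale = sa.Column(sa.Boolean, default=False)


engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all(Item(stale=(i % 2 == 0)) for i in range(100))
session.commit()

BATCH = 25
query = session.query(Item).filter_by(stale=True).limit(BATCH)
while query.count() > 0:
    for row in query:
        session.delete(row)
    session.commit()  # each batch is a separate transaction

remaining = session.query(Item).count()
```

With 50 matching rows and a batch size of 25 this commits two small delete
transactions instead of one large, long-held one.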
Separate data and schema migrations
===================================
It's easier for deployment if you do *data migrations* (code that creates,
updates or deletes rows) and *schema migrations* (code that modifies the
database *schema*, for example adding a new column to a table) in separate
migration scripts instead of combining them into one script. If you have a
single migration that needs to modify some data and then make a schema change,
implement it as two consecutive migration scripts instead.
Don't import model classes into migration scripts
=================================================
Don't import model classes, for example ``from h.models import Annotation``,
in migration scripts. Instead copy and paste the ``Annotation`` class into your
migration script.
This is because the script needs the schema of the ``Annotation`` class
as it was at a particular point in time, which may be different from the
schema in ``h.models.Annotation`` when the script is run in the future.
The script's copy of the class usually only needs to contain the definitions of
the primary key column(s) and any other columns that the script uses, and only
needs the name and type attributes of these columns. Other attributes of the
columns, columns that the script doesn't use, and methods can usually be left
out of the script's copy of the model class.
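A sketch of what such a frozen copy might look like inside a migration
script. The column names and the integer primary key are invented for
illustration (they do not match h's real ``Annotation`` schema), and in a
real Alembic script the bind would come from ``op.get_bind()``.

```python
# A migration script carrying its own minimal, frozen copy of the model.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()


class Annotation(Base):
    """Frozen, minimal copy of the schema as of this revision.

    Only the primary key and the one column this migration touches are
    declared; the real model's other columns and all of its methods are
    deliberately omitted.
    """
    __tablename__ = "annotation"

    id = sa.Column(sa.Integer, primary_key=True)
    text = sa.Column(sa.UnicodeText())


def upgrade(bind):
    # In a real Alembic script, ``bind`` would be ``op.get_bind()``.
    session = Session(bind=bind)
    for annotation in session.query(Annotation):
        annotation.text = (annotation.text or "").strip()
    session.commit()
```

Because the class is local to the script, later changes to
``h.models.Annotation`` cannot silently change what this migration does.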
Troubleshooting migration scripts
=================================
(sqlite3.OperationalError) near "ALTER"
---------------------------------------
SQLite doesn't support ``ALTER TABLE``. To get around this, use
`Alembic's batch mode <https://alembic.readthedocs.io/en/latest/batch.html>`_.
Cannot add a NOT NULL column with default value NULL
----------------------------------------------------
If you're adding a column to the model with ``nullable=False`` then when the
database is upgraded it needs to insert values into this column for each of
the already existing rows in the table, and it can't just insert ``NULL`` as it
normally would. So you need to tell the database what default value to insert
here.
``default=`` isn't enough (that's only used when the application is creating
data, not when migration scripts are running); you need to add a
``server_default=`` argument to your ``add_column()`` call. See the existing
migration scripts for examples.
=================================
Serving h over SSL in development
=================================
If you want to annotate a site that's served over HTTPS then you'll need to
serve h over HTTPS as well, since the browser will refuse to load external
scripts (e.g. h's bookmarklet) via HTTP on a page served via HTTPS.
To serve your local dev instance of h over HTTPS:
1. Generate a private key and certificate signing request::

      openssl req -newkey rsa:2048 -nodes -keyout .tlskey.pem -out .tlscsr.pem

2. Generate a self-signed certificate::

      openssl x509 -req -in .tlscsr.pem -signkey .tlskey.pem -out .tlscert.pem

3. Run ``hypothesis devserver`` with the ``--https`` option::

      hypothesis devserver --https

4. Since the certificate is self-signed, you will need to instruct your
   browser to trust it explicitly by visiting https://localhost:5000 and
   selecting the option to bypass the validation error.
---------------
Troubleshooting
---------------
Insecure Response errors in the console
=======================================
The sidebar fails to load and you see ``net::ERR_INSECURE_RESPONSE`` errors in
the console. You need to open https://localhost:5000 and tell the browser to allow
access to the site even though the certificate isn't known.
Server not found, the connection was reset
==========================================
When you're serving h over SSL in development, making non-SSL requests to h
won't work.
If you get an error like **Server not found** or **The connection was reset**
in your browser (it varies from browser to browser), possibly accompanied by a
gunicorn crash with
``AttributeError: 'NoneType' object has no attribute 'uri'``, make sure that
you're loading https://localhost:5000 in your browser, not ``http://``.
WebSocket closed abnormally, code: 1006
=======================================
If you see the error message
**Error: WebSocket closed abnormally, code: 1006** in your browser,
possibly accompanied by another error message like
**Firefox can't establish a connection to the server at wss://localhost:5001/ws**,
this can be because you need to add a security exception to allow your browser
to connect to the websocket. Visit https://localhost:5001 in a browser tab and
add a security exception then try again.
403 response when connecting to WebSocket
=========================================
If your browser is getting a 403 response when trying to connect to the
WebSocket along with error messages like these:
* WebSocket connection to 'wss://localhost:5001/ws' failed: Error during WebSocket handshake: Unexpected response code: 403
* Check that your H service is configured to allow WebSocket connections from https://127.0.0.1:5000
* WebSocket closed abnormally, code: 1006
* WebSocket closed abnormally, code: 1001
* Firefox can't establish a connection to the server at wss://localhost:5001/ws
make sure that you're opening https://localhost:5000 in your browser and
*not* https://127.0.0.1:5000.
Submitting a Pull Request
=========================
To submit code or documentation to h you should submit a pull request.
For trivial changes, such as documentation changes or minor errors,
PRs may be submitted directly to master. This also applies to changes
made through the GitHub editing interface. Authors do not need to
sign the CLA for these, or follow fork or branch naming guidelines.
For any non-trivial changes, please create a branch for review. Fork
the main repository and create a local branch. Later, when the branch
is ready for review, push it to a fork and submit a pull request.
Discussion and review in the pull request is normal and expected. By
using a separate branch, it is possible to push new commits to the
pull request branch without mixing new commits from other features or
mainline development.
Some things to remember when submitting or reviewing a pull request:
- Your pull request should contain one logically separate piece of work, and
  not any unrelated changes.

- When writing commit messages, please bear the following in mind:

  * http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
  * https://github.com/blog/831-issues-2-0-the-next-generation

  Please minimize issue gardening by using the GitHub syntax for closing
  issues with commit messages.

- We recommend giving your branch a relatively short, descriptive,
  hyphen-delimited name. ``fix-editor-lists`` and ``tabbed-sidebar`` are
  good examples of this convention.

- Don't make merge commits on feature branches. Feature branches should be
  merged into upstream branches, but should never contain merge commits in
  the other direction. Consider using ``--rebase`` when pulling if you must
  keep a long-running branch up to date. It's better to start a new branch
  and, if applicable, a new pull request when performing this action on
  branches you have published.

- Code should follow our :doc:`coding standards <code-style>`.

- All pull requests should come with code comments. For Python code these
  should be in the form of Python `docstrings`_. For AngularJS code please
  use `ngdoc`_. Other documentation can be put into the ``docs/``
  subdirectory, but is not required for acceptance.

- All pull requests should come with unit tests. For the time being,
  functional and integration tests should be considered optional if the
  project does not have any harness set up yet.
For how to run the tests, see :ref:`running-the-tests`.
.. _docstrings: http://legacy.python.org/dev/peps/pep-0257/
.. _ngdoc: https://github.com/angular/angular.js/wiki/Writing-AngularJS-Documentation
Testing
=======
This section covers writing tests for the ``h`` codebase.
Getting started
---------------
Sean Hammond has written up a `guide to getting started`_ running and writing
our tests, which covers some of the tools we use (``tox`` and ``pytest``) and
some of the testing techniques they provide (factories and parametrization).
.. _guide to getting started: https://www.seanh.cc/posts/running-the-h-tests
Unit and functional tests
-------------------------
We keep our functional tests separate from our unit tests, in the
``tests/functional`` directory. Because these are slow to run, we will usually
write one or two functional tests to check a new feature works in the common
case, and unit tests for all the other cases.
Using mock objects
------------------
The ``mock`` library lets us construct fake versions of our objects to help with
testing. While this can make it easier to write fast, isolated tests, it also
makes it easier to write tests that don't reflect reality.
In an ideal world, we would always be able to use real objects instead of stubs
or mocks, but sometimes this can result in:
- complicated test setup code
- slow tests
- coupling of test assertions to non-interface implementation details
For new code, it's usually a good idea to design the code so that it's easy to
test with "real" objects, rather than stubs or mocks. It can help to make
extensive use of `value objects`_ in tested interfaces (using
``collections.namedtuple`` from the standard library, for example) and apply
the `functional core, imperative shell`_ pattern.
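For instance, here is a tiny sketch (all names invented) of a value object
plus a pure "functional core" function, which can be tested directly with
real objects and no mocks at all:

```python
# A value object and a pure function over it: trivially testable.
# Names are illustrative, not from the h codebase.
from collections import namedtuple

# An immutable value object: equality is by value, so test assertions
# can compare whole objects directly.
Annotation = namedtuple("Annotation", ["userid", "shared", "text"])


def visible_to(annotation, userid):
    """Pure "functional core" logic: no I/O, no database, no mocks."""
    return annotation.shared or annotation.userid == userid
```

Tests construct real ``Annotation`` values and assert on the return value,
with no setup code beyond the data itself.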
For older code which doesn't make testing so easy, or for code that is part of
the "imperative shell" (see link in previous paragraph) it can sometimes be
hard to test what you need without resorting to stubs or mock objects, and
that's fine.
.. _value objects: https://martinfowler.com/bliki/ValueObject.html
.. _functional core, imperative shell: https://www.destroyallsoftware.com/talks/boundaries
Welcome to the h Documentation!
===============================

`h <https://github.com/hypothesis/h>`_ is the web app that serves most of the
https://hypothes.is/ website, including the web annotations API at
https://hypothes.is/api/.

The `Hypothesis client <https://github.com/hypothesis/client>`_ is a
browser-based annotator that is a client for h's API. See
`the client's own documentation site <https://h.readthedocs.io/projects/client/>`_
for docs about the client.

This documentation is for:

* Developers working with data stored in h
* Contributors to h

Contents
--------

.. toctree::
   :maxdepth: 1

   community
   publishers/index
   api/index
   developing/index