
Postgres convergent evolution + Annotation presentation layer #3011

Merged
nickstenning merged 22 commits into master from postgres-read-write on Mar 3, 2016

Conversation

chdorner
Contributor

I didn't anticipate this many changes, so the branch name doesn't really reflect the code changes anymore.

Notable changes:

Recreate Postgres model interface in the Elasticsearch models

To be able to use models regardless of whether the data originally came from Elasticsearch or Postgres, I've re-created the same class structure in the Elasticsearch models. This means we have in-memory-only representations of the Document / DocumentMeta / DocumentURI classes for Elasticsearch.

An Elasticsearch annotation, whose base structure is a dict, gets automatically parsed into the aforementioned objects as soon as these properties are accessed.

The normalisation code that used to live in the Postgres migration script (scripts/migrate.py) has also moved into the Elasticsearch model, now with added tests.

API Presenter layer

There is a new module h.api.presenters with JSON presenters for the four model classes (Annotation, Document, DocumentMeta, DocumentURI).
These are built to return the same JSON structure as we do now (minus the few occasions where we normalise the data, see above).

Important: This means that regardless of feature flag, as soon as these code changes get deployed, the /api/annotations/:id endpoint will go through the new presentation layer.

Feature flag for reading from Postgres

There is a feature flag for reading from Postgres. This currently only has an effect on the h.api.storage.fetch_annotation function; all other places (e.g. searching, or loading the created annotation before returning it) will still go to Elasticsearch.



def _postgres_enabled():
    return pyramid.threadlocal.get_current_request().feature('postgres')
Contributor


It's a shame that you have to do the threadlocal.get_current_request() thing (I think of that as an advanced/internal Pyramid detail, better to avoid if you can), but I guess that is the right way, unless we want to add request as a param to each function in storage

Contributor Author


I agree. According to @nickstenning, it's bad but okay in this case as it will only be in the code base for a short amount of time. As soon as we've migrated and switched over the production service, this will be removed (along with a lot of other code).

Contributor Author


@seanh we now pass the request into the storage functions because we also want to query database specifically from the passed-in DB session 👍
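Roughly, passing the request into the storage functions could look like the sketch below. FakeRequest, the function body, and the fallback dict are illustrative assumptions, not the actual h.api.storage code; only the fetch_annotation name and the 'postgres' feature flag come from the discussion above.

```python
class FakeRequest:
    """Minimal stand-in for a Pyramid request (illustrative only)."""

    def __init__(self, db, flags):
        self.db = db            # per-request DB session (here: a plain dict)
        self._flags = flags

    def feature(self, name):
        return self._flags.get(name, False)


def fetch_annotation(request, id_):
    """Fetch an annotation, choosing the backend via the feature flag."""
    if request.feature('postgres'):
        return request.db.get(id_)
    # With the flag off, reads still go to Elasticsearch.
    return {'id': id_, 'source': 'elasticsearch'}
```

Passing the request explicitly also sidesteps the pyramid.threadlocal.get_current_request() call discussed earlier, and gives the function access to the per-request DB session.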

@seanh seanh added the WIP label Feb 24, 2016
@chdorner chdorner force-pushed the postgres-read-write branch 3 times, most recently from ad26749 to 178486e Compare February 25, 2016 13:55
@classmethod
def find_or_create_by_uris(cls, claimant_uri, uris,
                           created=None, updated=None):

def find_by_uris(cls, claimant_uri, uris=[]):
Contributor


Is there any point making a distinction between claimant_uri and uris in this method? It's needed in the find_or_create version, but I don't think it's necessary here.

Contributor Author


Good point. Fixed the method to only accept a list of uris.
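The simplified signature might look like the sketch below. The registry list and matching logic are invented for illustration; note the immutable tuple default rather than the mutable `uris=[]` shown in the diff above.

```python
class Document:
    """Toy stand-in for the SQLAlchemy Document model (illustrative)."""

    registry = []  # pretend table of persisted documents

    def __init__(self, uris):
        self.uris = uris
        Document.registry.append(self)

    @classmethod
    def find_by_uris(cls, uris=()):
        # Accept a flat list of URIs only; no separate claimant_uri.
        # The immutable default avoids Python's mutable-default pitfall.
        wanted = set(uris)
        return [d for d in cls.registry if wanted & set(d.uris)]
```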

def __init__(self, annotation):
    self.annotation = annotation

def dict(self):
Contributor


asdict would be the conventional/idiomatic name for this method.

Contributor Author


Fixed
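After the rename, the presenter might look like this minimal sketch. The fields rendered here are heavily shortened assumptions; the real presenter builds the full annotation JSON.

```python
class AnnotationJSONPresenter:
    """Hypothetical minimal presenter, showing the rename from
    dict() to the conventional/idiomatic asdict()."""

    def __init__(self, annotation):
        self.annotation = annotation

    def asdict(self):
        # Render the annotation as a plain dict ready for JSON output.
        return {'id': self.annotation['id'],
                'text': self.annotation.get('text', '')}
```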

@chdorner chdorner force-pushed the postgres-read-write branch 4 times, most recently from 59d9ba1 to 58b8bac Compare February 26, 2016 11:14
@codecov-io

Current coverage is 69.63%

Merging #3011 into master will increase coverage by +1.91% as of 121a7cf

@@            master   #3011   diff @@
======================================
  Files          113     114     +1
  Stmts         4059    4354   +295
  Branches       442     501    +59
  Methods          0       0       
======================================
+ Hit           2749    3032   +283
- Partial         68      81    +13
+ Missed        1242    1241     -1

Review entire Coverage Diff as of 121a7cf

Powered by Codecov. Updated on successful CI builds.

@chdorner chdorner force-pushed the postgres-read-write branch 8 times, most recently from beae598 to e70018e Compare February 29, 2016 17:20
@property
def target(self):
    return [{'source': self.annotation.target_uri,
             'selector': self.annotation.target_selectors or {}}]
Contributor


Should default to empty list rather than empty dict.

Contributor Author


Fixed.
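The corrected property can be sketched as follows; the class and attribute names here are assumptions for a self-contained example.

```python
class TargetPresenter:
    """Sketch of the corrected target property (names assumed)."""

    def __init__(self, target_uri, target_selectors):
        self.target_uri = target_uri
        self.target_selectors = target_selectors

    @property
    def target(self):
        # Selectors now default to an empty list, not an empty dict.
        return [{'source': self.target_uri,
                 'selector': self.target_selectors or []}]
```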

@chdorner chdorner force-pushed the postgres-read-write branch 4 times, most recently from d7ff101 to cc437d3 Compare February 29, 2016 18:42
Which will make the `h.api.storage` module switch between using
Elasticsearch and PostgreSQL as the primary annotation storage.

Split it up into two methods, since we need the finding functionality in
other places, but without automatically creating a document when none
could be found.

For getting either the annotation's document or None when it can't be
found.

This also moves the normalisation code that was previously in the
Postgres migration script into the model, with added tests.

And create new elastic models for DocumentMeta and DocumentURI.

Since timestamps are stored in UTC without timezone information, we
don't try to convert the datetimes and just append UTC (`+00:00`) to the
end of the string.
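That timestamp serialisation can be sketched like this; the helper name is an assumption, only the `+00:00` behaviour comes from the commit message above.

```python
import datetime


def serialize_timestamp(dt):
    """Render a naive UTC datetime as ISO 8601 with an explicit offset.

    Timestamps are stored in UTC without tzinfo, so rather than
    converting, we simply append '+00:00' to the ISO string.
    """
    return dt.isoformat() + '+00:00'


ts = datetime.datetime(2016, 3, 3, 11, 17, 0)
print(serialize_timestamp(ts))  # 2016-03-03T11:17:00+00:00
```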
Which merges two dictionaries recursively.
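A recursive dictionary merge along those lines might look like this; the helper name and in-place behaviour are assumptions, the real implementation may differ.

```python
def deep_merge_dict(a, b):
    """Recursively merge dict b into dict a, in place.

    Nested dicts are merged key by key; any other value in b
    simply overwrites the corresponding value in a.
    """
    for key, value in b.items():
        if (key in a and isinstance(a[key], dict)
                and isinstance(value, dict)):
            deep_merge_dict(a[key], value)
        else:
            a[key] = value


d = {'doc': {'title': 'old', 'lang': 'en'}}
deep_merge_dict(d, {'doc': {'title': 'new'}, 'extra': 1})
print(d)  # {'doc': {'title': 'new', 'lang': 'en'}, 'extra': 1}
```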
SQLAlchemy seems to figure out on its own when to flush to the database.
We're keeping this one flush in just to make sure that when we handle a
new annotation, the previous document objects have definitely been
flushed to the database and are ready to be queried.
@nickstenning
Contributor

LGTM. So many things we have coming up will be made easier by the presenter layer work you've done here. Awesome stuff!

nickstenning added a commit that referenced this pull request Mar 3, 2016
Postgres convergent evolution + Annotation presentation layer
@nickstenning nickstenning merged commit 70463f7 into master Mar 3, 2016
@nickstenning nickstenning deleted the postgres-read-write branch March 3, 2016 11:17
chdorner added a commit that referenced this pull request Mar 3, 2016
It got introduced in 3845f5a.
Specifically, `transform.set_group_if_reply` should not get the request,
that's why the functools.partial is there.
chdorner added a commit that referenced this pull request Mar 11, 2016
Due to Elasticsearch's dynamic mapping creation, we can't normalise the
type string of an elastic DocumentMeta. Creating an annotation would
still work since it doesn't run this code, but updating an existing
annotation fails when it tries to create or update the Document ES
document, because it uses the normalised version, which then clashes with
the mapping that Elasticsearch automatically created based on old data.

I've moved the normalisation part into its own method
(`normalized_type`) in order to still be able to use it in the
migration, and thus automatically clean up our document metadata when we
migrate it from Elasticsearch to Postgres.
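The shape of that refactoring might look like the sketch below. Only the `normalized_type` name comes from the commit message; the class layout and the normalisation rule shown here are invented for illustration.

```python
class DocumentMeta:
    """Sketch of moving normalisation out of the setter and into its
    own property, so callers can opt in to the normalised form."""

    def __init__(self, type_):
        # Store the raw type untouched so it matches the existing
        # Elasticsearch mapping.
        self.type = type_

    @property
    def normalized_type(self):
        # Hypothetical normalisation rule: lowercase and replace
        # dots with underscores, e.g. 'DC.Title' -> 'dc_title'.
        return self.type.replace('.', '_').lower()
```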

This bug was introduced when we merged #3011 and used the new
presentation layer for rendering Elasticsearch annotations.

Fixes #3062