
Postgres convergent evolution + Annotation presentation layer #3011

Merged
nickstenning merged 22 commits into master from postgres-read-write on Mar 3, 2016

Conversation

chdorner
Contributor

I didn't anticipate this many changes, so the branch name doesn't really reflect the code changes anymore.

Notable changes:

Recreate Postgres model interface in the Elasticsearch models

To be able to use models regardless of whether the data originally came from Elasticsearch or Postgres, I've re-created the same class structure in the Elasticsearch models. This means we have in-memory-only representations of the Document / DocumentMeta / DocumentURI classes for Elasticsearch.

An Elasticsearch annotation, whose base structure is a dict, gets automatically parsed into the aforementioned objects as soon as these properties are accessed.

The normalisation code that used to live in the Postgres migration script (scripts/migrate.py) has also moved into the Elasticsearch model, now with added tests.

API Presenter layer

There is a new module h.api.presenters with JSON presenters for the four model classes (Annotation, Document, DocumentMeta, DocumentURI).
These are built to return the same JSON structure as we do now (minus the few occasions where we normalise the data, see above).

Important: This means that regardless of feature flag, as soon as these code changes get deployed, the /api/annotations/:id endpoint will go through the new presentation layer.

Feature flag for reading from Postgres

There is a feature flag for reading from Postgres. This currently only has an effect on the h.api.storage.fetch_annotation function; all other places (e.g. searching, or loading the created annotation before returning it) will still go to Elasticsearch.



def _postgres_enabled():
    return pyramid.threadlocal.get_current_request().feature('postgres')
Contributor


It's a shame that you have to do the threadlocal.get_current_request() thing (I think of that as an advanced/internal Pyramid detail, better to avoid if you can), but I guess that is the right way, unless we want to add request as a param to each function in storage

Contributor Author


I agree. According to @nickstenning, it's bad but okay in this case as it will only be in the code base for a short amount of time. As soon as we've migrated and switched over the production service, this will be removed (along with a lot of other code).

Contributor Author


@seanh we now pass the request into the storage functions because we also want to query database specifically from the passed-in DB session 👍
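Roughly, passing the request into the storage functions could look like the sketch below. FakeRequest, the function body, and the fallback dict are illustrative assumptions, not the actual h.api.storage code; only the fetch_annotation name and the 'postgres' feature flag come from the discussion above.

```python
class FakeRequest:
    """Minimal stand-in for a Pyramid request (illustrative only)."""

    def __init__(self, db, flags):
        self.db = db            # per-request DB session (here: a plain dict)
        self._flags = flags

    def feature(self, name):
        return self._flags.get(name, False)


def fetch_annotation(request, id_):
    """Fetch an annotation, choosing the backend via the feature flag."""
    if request.feature('postgres'):
        return request.db.get(id_)
    # With the flag off, reads still go to Elasticsearch.
    return {'id': id_, 'source': 'elasticsearch'}
```

Passing the request explicitly also sidesteps the pyramid.threadlocal.get_current_request() call discussed earlier, and gives the function access to the per-request DB session.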

@seanh seanh added the WIP label Feb 24, 2016
@chdorner chdorner force-pushed the postgres-read-write branch 3 times, most recently from ad26749 to 178486e Compare February 25, 2016 13:55
@classmethod
def find_or_create_by_uris(cls, claimant_uri, uris,
                           created=None, updated=None):

def find_by_uris(cls, claimant_uri, uris=[]):
Contributor


Is there any point making a distinction between claimant_uri and uris in this method? It's needed in the find_or_create version, but I don't think it's necessary here.

Contributor Author


Good point. Fixed the method to only accept a list of uris.
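The simplified signature might look like the sketch below. The registry list and matching logic are invented for illustration; note the immutable tuple default rather than the mutable `uris=[]` shown in the diff above.

```python
class Document:
    """Toy stand-in for the SQLAlchemy Document model (illustrative)."""

    registry = []  # pretend table of persisted documents

    def __init__(self, uris):
        self.uris = uris
        Document.registry.append(self)

    @classmethod
    def find_by_uris(cls, uris=()):
        # Accept a flat list of URIs only; no separate claimant_uri.
        # The immutable default avoids Python's mutable-default pitfall.
        wanted = set(uris)
        return [d for d in cls.registry if wanted & set(d.uris)]
```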

def __init__(self, annotation):
    self.annotation = annotation

def dict(self):
Contributor


asdict would be the conventional/idiomatic name for this method.

Contributor Author


Fixed
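After the rename, the presenter might look like this minimal sketch. The fields rendered here are heavily shortened assumptions; the real presenter builds the full annotation JSON.

```python
class AnnotationJSONPresenter:
    """Hypothetical minimal presenter, showing the rename from
    dict() to the conventional/idiomatic asdict()."""

    def __init__(self, annotation):
        self.annotation = annotation

    def asdict(self):
        # Render the annotation as a plain dict ready for JSON output.
        return {'id': self.annotation['id'],
                'text': self.annotation.get('text', '')}
```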

@chdorner chdorner force-pushed the postgres-read-write branch 4 times, most recently from 59d9ba1 to 58b8bac Compare February 26, 2016 11:14
@codecov-io

Current coverage is 69.63%

Merging #3011 into master will increase coverage by +1.91% as of 121a7cf

@@            master   #3011   diff @@
======================================
  Files          113     114     +1
  Stmts         4059    4354   +295
  Branches       442     501    +59
  Methods          0       0       
======================================
+ Hit           2749    3032   +283
- Partial         68      81    +13
+ Missed        1242    1241     -1

Review entire Coverage Diff as of 121a7cf

Powered by Codecov. Updated on successful CI builds.

@chdorner chdorner force-pushed the postgres-read-write branch 8 times, most recently from beae598 to e70018e Compare February 29, 2016 17:20
@property
def target(self):
    return [{'source': self.annotation.target_uri,
             'selector': self.annotation.target_selectors or {}}]
Contributor


Should default to empty list rather than empty dict.

Contributor Author


Fixed.
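The corrected property can be sketched as follows; the class and attribute names here are assumptions for a self-contained example.

```python
class TargetPresenter:
    """Sketch of the corrected target property (names assumed)."""

    def __init__(self, target_uri, target_selectors):
        self.target_uri = target_uri
        self.target_selectors = target_selectors

    @property
    def target(self):
        # Selectors now default to an empty list, not an empty dict.
        return [{'source': self.target_uri,
                 'selector': self.target_selectors or []}]
```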

@chdorner chdorner force-pushed the postgres-read-write branch 4 times, most recently from d7ff101 to cc437d3 Compare February 29, 2016 18:42
Which will make the `h.api.storage` module switch between using
Elasticsearch and PostgreSQL as the primary annotation storage.

Split it up into two methods, since we need the finding functionality in
other places, but without automatically creating a document when none
could be found.

For getting either the annotation's document or None when it can't be
found.

This also moves the normalisation code that was previously in the
Postgres migration script into the model, with added tests.

And create new elastic models for DocumentMeta and DocumentURI.

Since timestamps are stored in UTC without timezone information, we
don't try to convert the datetimes and just append UTC (`+00:00`) to the
end of the string.
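That timestamp serialisation can be sketched like this; the helper name is an assumption, only the `+00:00` behaviour comes from the commit message above.

```python
import datetime


def serialize_timestamp(dt):
    """Render a naive UTC datetime as ISO 8601 with an explicit offset.

    Timestamps are stored in UTC without tzinfo, so rather than
    converting, we simply append '+00:00' to the ISO string.
    """
    return dt.isoformat() + '+00:00'


ts = datetime.datetime(2016, 3, 3, 11, 17, 0)
print(serialize_timestamp(ts))  # 2016-03-03T11:17:00+00:00
```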
Which merges two dictionaries recursively.
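A recursive dictionary merge along those lines might look like this; the helper name and in-place behaviour are assumptions, the real implementation may differ.

```python
def deep_merge_dict(a, b):
    """Recursively merge dict b into dict a, in place.

    Nested dicts are merged key by key; any other value in b
    simply overwrites the corresponding value in a.
    """
    for key, value in b.items():
        if (key in a and isinstance(a[key], dict)
                and isinstance(value, dict)):
            deep_merge_dict(a[key], value)
        else:
            a[key] = value


d = {'doc': {'title': 'old', 'lang': 'en'}}
deep_merge_dict(d, {'doc': {'title': 'new'}, 'extra': 1})
print(d)  # {'doc': {'title': 'new', 'lang': 'en'}, 'extra': 1}
```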
SQLAlchemy seems to figure out on its own when to flush to the database.
We're keeping this one flush in just to make sure that when we handle a
new annotation, the previous document objects have definitely been
flushed to the database and are ready to be queried.
@nickstenning
Contributor

LGTM. So many things we have coming up will be made easier by the presenter layer work you've done here. Awesome stuff!

nickstenning added a commit that referenced this pull request Mar 3, 2016
Postgres convergent evolution + Annotation presentation layer
@nickstenning nickstenning merged commit 70463f7 into master Mar 3, 2016
@nickstenning nickstenning deleted the postgres-read-write branch March 3, 2016 11:17
chdorner added a commit that referenced this pull request Mar 3, 2016
It got introduced in 3845f5a.
Specifically, `transform.set_group_if_reply` should not get the request,
that's why the functools.partial is there.
chdorner added a commit that referenced this pull request Mar 11, 2016
Due to Elasticsearch's dynamic mapping creation, we can't normalise the
type string of an elastic DocumentMeta. Creating an annotation would
still work since it doesn't run this code, but updating an existing
annotation fails when it tries to create or update the Document ES
document, because it uses the normalised version, which then clashes with
the mapping that Elasticsearch automatically created based on old data.

I've moved the normalisation part into its own method
(`normalized_type`) in order to still be able to use it in the
migration, and thus automatically clean up our document metadata when we
migrate it from Elasticsearch to Postgres.
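The shape of that refactoring might look like the sketch below. Only the `normalized_type` name comes from the commit message; the class layout and the normalisation rule shown here are invented for illustration.

```python
class DocumentMeta:
    """Sketch of moving normalisation out of the setter and into its
    own property, so callers can opt in to the normalised form."""

    def __init__(self, type_):
        # Store the raw type untouched so it matches the existing
        # Elasticsearch mapping.
        self.type = type_

    @property
    def normalized_type(self):
        # Hypothetical normalisation rule: lowercase and replace
        # dots with underscores, e.g. 'DC.Title' -> 'dc_title'.
        return self.type.replace('.', '_').lower()
```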

This bug was introduced when we merged #3011 and used the new
presentation layer for rendering Elasticsearch annotations.

Fixes #3062