Postgres convergent evolution + Annotation presentation layer #3011
Conversation
```python
def _postgres_enabled():
    return pyramid.threadlocal.get_current_request().feature('postgres')
```
It's a shame that you have to do the `threadlocal.get_current_request()` thing (I think of that as an advanced/internal Pyramid detail, better to avoid if you can), but I guess that is the right way, unless we want to add `request` as a param to each function in `storage`.
I agree. According to @nickstenning, it's bad but okay in this case, as it will only be in the code base for a short amount of time. As soon as we've migrated and switched over the production service, this will be removed (along with a lot of other code).
@seanh we now pass the `request` into the storage functions, because we also want to query the database specifically from the passed-in DB session 👍
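A minimal sketch of the approach described here, with stand-in names (`FakeRequest` and `fetch_document` are illustrative, not the project's real classes): the storage function receives the request explicitly, so it can query via the request-bound DB session instead of a `pyramid.threadlocal` lookup.

```python
class FakeRequest:
    """Minimal stand-in for a Pyramid request carrying a DB session."""
    def __init__(self, db):
        # In the real app this would be the request-bound SQLAlchemy
        # session; a dict is enough to illustrate the pattern.
        self.db = db


def fetch_document(request, uri):
    # Instead of pyramid.threadlocal.get_current_request(), the request
    # (and with it the DB session) is passed in explicitly.
    return request.db.get(uri)
```

The explicit parameter also makes the functions easier to test, since no threadlocal state needs to be set up.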
Force-pushed from ad26749 to 178486e
```python
@classmethod
def find_or_create_by_uris(cls, claimant_uri, uris,
                           created=None, updated=None):

def find_by_uris(cls, claimant_uri, uris=[]):
```
Is there any point making a distinction between `claimant_uri` and `uris` in this method? It's needed in the `find_or_create` version, but I don't think it's necessary here.
Good point. Fixed the method to only accept a list of `uris`.
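A toy sketch of what the simplified classmethod might look like after this review (the in-memory registry stands in for the real database query): it takes one flat list of URIs, with no separate `claimant_uri`. It also uses `None` rather than a mutable `[]` as the default, avoiding Python's shared-default-argument trap.

```python
class Document:
    """Toy stand-in; the real method queries the database."""
    _by_uri = {}  # uri -> Document, stand-in for a DB table

    def __init__(self, uri):
        self.uri = uri
        Document._by_uri[uri] = self

    @classmethod
    def find_by_uris(cls, uris=None):
        # One flat list of URIs; default is None rather than a mutable
        # [] so repeated calls can't share (and mutate) one list object.
        return [cls._by_uri[u] for u in (uris or []) if u in cls._by_uri]
```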
Force-pushed from 178486e to 6bcfc40
```python
def __init__(self, annotation):
    self.annotation = annotation

def dict(self):
```
`asdict` would be the conventional/idiomatic name for this method.
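The convention the reviewer suggests might look like this (a toy presenter with illustrative field names, not the project's actual presenter API): the method is named `asdict`, which also avoids shadowing the `dict` builtin name.

```python
class AnnotationJSONPresenter:
    """Toy presenter; the fields rendered here are illustrative."""
    def __init__(self, annotation):
        self.annotation = annotation

    def asdict(self):
        # `asdict` rather than `dict`, the conventional name for a
        # method returning a dictionary representation of an object
        return {'id': self.annotation.get('id'),
                'text': self.annotation.get('text')}
```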
Fixed
Force-pushed from 59d9ba1 to 58b8bac
Force-pushed from beae598 to e70018e
```python
@property
def target(self):
    return [{'source': self.annotation.target_uri,
             'selector': self.annotation.target_selectors or {}}]
```
Should default to empty list rather than empty dict.
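The corrected property might look like the sketch below (`FakeAnnotation` is a stand-in for the real model): since the selectors are a collection of selector dicts, the empty default should be a list, not a dict.

```python
class FakeAnnotation:
    """Stand-in for the annotation model used by the presenter."""
    def __init__(self, target_uri, target_selectors=None):
        self.target_uri = target_uri
        self.target_selectors = target_selectors


class AnnotationPresenter:
    def __init__(self, annotation):
        self.annotation = annotation

    @property
    def target(self):
        return [{'source': self.annotation.target_uri,
                 # selectors are a list of selector dicts, so default
                 # to an empty list rather than an empty dict
                 'selector': self.annotation.target_selectors or []}]
```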
Fixed.
Force-pushed from d7ff101 to cc437d3
Which will make the `h.api.storage` module switch between using Elasticsearch and PostgreSQL as the primary annotation storage.
Split it up into two methods, since we need the finding functionality in other places, but without automatically creating a document when none could be found.
For getting either the annotation's document or None when it can't be found.
This also moves the normalisation code that was previously in the Postgres migration script into the model, with added tests.
And create new elastic models for DocumentMeta and DocumentURI.
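The elastic-model idea described in these commits can be sketched roughly as follows (class and field names are illustrative, not the project's real elastic models): the annotation wraps the raw Elasticsearch `_source` dict, and in-memory document objects are only built when the corresponding property is accessed.

```python
class ElasticDocumentURI:
    """In-memory-only representation of a document URI from ES data."""
    def __init__(self, uri):
        self.uri = uri


class ElasticAnnotation:
    """Wraps the raw Elasticsearch `_source` dict; document objects
    are constructed lazily on property access."""
    def __init__(self, data):
        self._data = data

    @property
    def document_uris(self):
        links = self._data.get('document', {}).get('link', [])
        return [ElasticDocumentURI(link['href']) for link in links]
```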
Since timestamps are stored in UTC without timezone information, we don't try to convert the datetimes and just append UTC (`+00:00`) to the end of the string.
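The timestamp handling described here amounts to a one-liner, sketched below (the helper name is illustrative): the stored datetimes are naive UTC, so instead of doing timezone conversion we serialise with `isoformat()` and append the fixed UTC offset.

```python
from datetime import datetime


def utc_iso8601(dt):
    """Render a naive UTC datetime as ISO 8601 by appending +00:00,
    without attempting any timezone conversion."""
    return dt.isoformat() + '+00:00'
```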
Which merges two dictionaries recursively.
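A recursive dictionary merge in the spirit of this commit might look like the following non-mutating sketch (the name and copy-vs-in-place behaviour are assumptions for the example): values from the second dict win, except where both sides hold nested dicts, which are merged key by key instead of replaced wholesale.

```python
def deep_merge_dict(a, b):
    """Return a copy of `a` updated with `b`, merging nested dicts
    recursively rather than overwriting them."""
    merged = dict(a)
    for key, value in b.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # both sides are dicts: recurse instead of replacing
            merged[key] = deep_merge_dict(merged[key], value)
        else:
            merged[key] = value
    return merged
```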
Force-pushed from e20b283 to 94b0932
SQLAlchemy seems to figure out on its own when to flush to the database. I'm keeping this one flush in just to make sure that, when we handle a new annotation, the previous document objects have definitely been flushed to the database and are ready to be queried.
LGTM. So many things we have coming up will be made easier by the presenter layer work you've done here. Awesome stuff!
It got introduced in 3845f5a. Specifically, `transform.set_group_if_reply` should not get the request; that's why the `functools.partial` is there.
Due to Elasticsearch's dynamic mapping creation, we can't normalise the type string of an elastic DocumentMeta. Creating an annotation would still work, since it doesn't run this code, but updating an existing annotation fails when it tries to create or update the Document ES document, because it uses the normalised version, which then clashes with the mapping that Elasticsearch automatically created based on old data. I've moved the normalisation part into its own method (`normalized_type`) in order to still be able to use it in the migration, and thus automatically clean up our document metadata when we migrate it from Elasticsearch to Postgres. This bug was introduced when we merged #3011 and used the new presentation layer for rendering Elasticsearch annotations. Fixes #3062
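The split described here might be sketched as follows (the class is a toy, and the exact normalisation rule shown is an assumption for the example, not the project's actual rule): the elastic model keeps the raw type string, so it stays compatible with the dynamically created ES mapping, while a separate `normalized_type` property is available for the migration.

```python
class ElasticDocumentMeta:
    """Toy elastic document-metadata model."""
    def __init__(self, type_):
        # Keep the raw type string as-is: updating the ES document with
        # a normalised value would clash with the dynamic mapping that
        # Elasticsearch created from old data.
        self.type = type_

    @property
    def normalized_type(self):
        # Only the Postgres migration uses the normalised form. The
        # rule here (strip, lowercase, underscores) is illustrative.
        return self.type.strip().lower().replace(' ', '_')
```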
I didn't anticipate this many changes, so the branch name doesn't really reflect the code changes anymore.
Notable changes:
Recreate Postgres model interface in the Elasticsearch models
To be able to use models regardless of whether the data originally came from Elasticsearch or Postgres, I've re-created the same class structure in the Elasticsearch models. This means we have in-memory-only representations of the `Document`/`DocumentMeta`/`DocumentURI` classes for Elasticsearch. An Elasticsearch annotation, whose base structure is a `dict`, gets automatically parsed into the aforementioned objects as soon as these properties are accessed. The normalisation code that used to live in the Postgres migration script (`scripts/migrate.py`) has also moved into the Elasticsearch model, with added tests.
API Presenter layer
There is a new module, `h.api.presenters`, with JSON presenters for the four model classes (`Annotation`, `Document`, `DocumentMeta`, `DocumentURI`). These are built to return the same JSON structure as we do now (minus the few occasions where we normalise the data, see above). Important: this means that regardless of the feature flag, as soon as these code changes get deployed, the `/api/annotations/:id` endpoint will go through the new presentation layer.
Feature flag for reading from Postgres
There is a feature flag for reading from Postgres. This currently only has an effect on the `h.api.storage.fetch_annotation` function; all other places (e.g. searching, or loading the created annotation before returning it) will still go through Elasticsearch.
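The flag-guarded read path can be sketched like this (the request class and both backend helpers are stand-ins, not the project's real code): only this one function consults the flag, while every other read still goes to Elasticsearch unconditionally.

```python
class FakeRequest:
    """Stand-in for a Pyramid request with feature flags and backends."""
    def __init__(self, flags, db, es):
        self._flags = flags
        self.db = db  # stand-in for the SQLAlchemy session
        self.es = es  # stand-in for the Elasticsearch client

    def feature(self, name):
        return self._flags.get(name, False)


def fetch_annotation(request, id_):
    # Only this read path consults the 'postgres' flag; search and
    # other reads still go through Elasticsearch unconditionally.
    if request.feature('postgres'):
        return request.db.get(id_)  # stand-in for a SQLAlchemy query
    return request.es.get(id_)      # stand-in for an ES client call
```

Keeping the switch inside a single storage function means flipping the flag (or removing it after the migration) touches one place only.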