New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync PG annotations to ES #3242

Merged
merged 7 commits into from May 4, 2016

Conversation

Projects
None yet
4 participants
@chdorner
Contributor

chdorner commented Apr 22, 2016

This adds a new celery worker for syncing annotations created in Postgres to the new Elasticsearch index.

This PR depends on the merge of #3238, at which point the first commit needs to be removed (I just made sure that it doesn't raise an error, it isn't really fixing all the issues).
Update: #3238 is now merged, so this is ready to get merged as well.

This is now waiting for #3250 to be merged.
Update: #3250 is now merged.

@chdorner chdorner added the WIP label Apr 22, 2016

@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Apr 22, 2016

Current coverage is 74.60%

Merging #3242 into master will decrease coverage by -0.22%

  1. 4 files (not in diff) in h were modified. more
    • Misses +8
    • Partials +3
    • Hits +17
  2. File h/app.py was modified. more
    • Misses +9
    • Partials 0
    • Hits 0
@@             master      #3242   diff @@
==========================================
  Files           125        125          
  Lines          4644       4693    +49   
  Methods           0          0          
  Messages          0          0          
  Branches        550        554     +4   
==========================================
+ Hits           3475       3501    +26   
- Misses         1089       1108    +19   
- Partials         80         84     +4   

Powered by Codecov. Last updated by a7e521f...3988171

codecov-io commented Apr 22, 2016

Current coverage is 74.60%

Merging #3242 into master will decrease coverage by -0.22%

  1. 4 files (not in diff) in h were modified. more
    • Misses +8
    • Partials +3
    • Hits +17
  2. File h/app.py was modified. more
    • Misses +9
    • Partials 0
    • Hits 0
@@             master      #3242   diff @@
==========================================
  Files           125        125          
  Lines          4644       4693    +49   
  Methods           0          0          
  Messages          0          0          
  Branches        550        554     +4   
==========================================
+ Hits           3475       3501    +26   
- Misses         1089       1108    +19   
- Partials         80         84     +4   

Powered by Codecov. Last updated by a7e521f...3988171

@chdorner chdorner removed the WIP label Apr 25, 2016

@celery.task
def add_annotation(id_):
annotation = storage.fetch_annotation(celery.request, id_)

This comment has been minimized.

@nickstenning

nickstenning Apr 25, 2016

Contributor

Ah, ok. This is a good idea, but you're going to need to be a bit careful if you do this. Because the view function is called inside a transaction, that means that you're going to be calling add_annotation.delay(id) while the transaction is still open.

While not likely in practice, that means that in theory that message could race to rabbit, get dispatched to a worker, and then do a SELECT against the database before the transaction is committed. Which would be Bad (™️).

So, while this is absolutely fine, you'll need to change either the subscriber to trigger the job in a response callback.

In fact, we should probably trigger all AnnotationEvents inside response callbacks (being sure not to do so if the response is an error), so perhaps it would be better to do it once in h.api.views. This would ensure that:

  • AnnotationEvents can't modify annotations, as they don't have access to the session
  • they only get fired once the data is successfully written to the database
  • errors in AnnotationEvent subscribers don't cause the transaction to abort (which, given we've already got code to ensure this in one case, seems like the right thing to do)
@nickstenning

nickstenning Apr 25, 2016

Contributor

Ah, ok. This is a good idea, but you're going to need to be a bit careful if you do this. Because the view function is called inside a transaction, that means that you're going to be calling add_annotation.delay(id) while the transaction is still open.

While not likely in practice, that means that in theory that message could race to rabbit, get dispatched to a worker, and then do a SELECT against the database before the transaction is committed. Which would be Bad (™️).

So, while this is absolutely fine, you'll need to change either the subscriber to trigger the job in a response callback.

In fact, we should probably trigger all AnnotationEvents inside response callbacks (being sure not to do so if the response is an error), so perhaps it would be better to do it once in h.api.views. This would ensure that:

  • AnnotationEvents can't modify annotations, as they don't have access to the session
  • they only get fired once the data is successfully written to the database
  • errors in AnnotationEvent subscribers don't cause the transaction to abort (which, given we've already got code to ensure this in one case, seems like the right thing to do)

This comment has been minimized.

@nickstenning

nickstenning Apr 25, 2016

Contributor

And yes, response callbacks are called after the transaction is committed (pyramid_tm is implemented as a tween).

@nickstenning

nickstenning Apr 25, 2016

Contributor

And yes, response callbacks are called after the transaction is committed (pyramid_tm is implemented as a tween).

This comment has been minimized.

@chdorner

chdorner Apr 26, 2016

Contributor

Will work on this in a separate PR (#3250), and marking this PR as WIP until the other one is merged.

@chdorner

chdorner Apr 26, 2016

Contributor

Will work on this in a separate PR (#3250), and marking this PR as WIP until the other one is merged.

This comment has been minimized.

@chdorner

chdorner May 4, 2016

Contributor

#3250 is now merged. I've rebased this branch and fixed a small issue due to the AnnotationEvent not containing the h.api.model.Annotation object anymore but the serialized version of an annotation. This is now ready for review again.

@chdorner

chdorner May 4, 2016

Contributor

#3250 is now merged. I've rebased this branch and fixed a small issue due to the AnnotationEvent not containing the h.api.model.Annotation object anymore but the serialized version of an annotation. This is now ready for review again.

@chdorner

This comment has been minimized.

Show comment
Hide comment
Contributor

chdorner commented Apr 28, 2016

@seanh

This comment has been minimized.

Show comment
Hide comment
@seanh

seanh Apr 29, 2016

Contributor

I've had a look over this as well, all looks good to me

Contributor

seanh commented Apr 29, 2016

I've had a look over this as well, all looks good to me

chdorner added some commits Apr 22, 2016

Add elastic.Annotation.target_uri_normalized property
To further unify the API of elastic.Annotation and
annotation.Annotation.
Remove in-flight annotation search indexing
We should remove this for now until we have a proper solution for
configuring the in-flight indexing to be on/off. In general, updating
the search index in-flight is a bad practice, it's good for local
development, but shouldn't never be on for a production deployment.

@chdorner chdorner removed the WIP label May 4, 2016

Show outdated Hide outdated h/indexer.py
@nickstenning

This comment has been minimized.

Show comment
Hide comment
@nickstenning

nickstenning May 4, 2016

Contributor

Ok, all good except we should be reindexing on update, too. I know the postgres code for update isn't done yet, but I think we should have the indexer responding to those events already otherwise we're going to forget it.

Contributor

nickstenning commented May 4, 2016

Ok, all good except we should be reindexing on update, too. I know the postgres code for update isn't done yet, but I think we should have the indexer responding to those events already otherwise we're going to forget it.

@chdorner

This comment has been minimized.

Show comment
Hide comment
@chdorner

chdorner May 4, 2016

Contributor

Just addressed the last comments and also defined the celery and indexer queues in h.celery.

Contributor

chdorner commented May 4, 2016

Just addressed the last comments and also defined the celery and indexer queues in h.celery.

@nickstenning

This comment has been minimized.

Show comment
Hide comment
@nickstenning

nickstenning May 4, 2016

Contributor

LGTM.

Contributor

nickstenning commented May 4, 2016

LGTM.

@nickstenning nickstenning merged commit 71708a0 into master May 4, 2016

4 of 6 checks passed

codecov/changes 5 files have unexpected coverage changes not visible in diff.
Details
codecov/project 74.60% (-0.22%) compared to a7e521f
Details
codecov/patch Coverage not affected when comparing a7e521f...77bdd40
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
hound No violations found. Woof!

@nickstenning nickstenning deleted the sync-pg-annotations-to-es branch May 4, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment