Check for uncommitted DB changes when requests finish #4728

robertknight · 2018-01-03T16:42:36Z

Extend the logging added in 844a60d to check whether a request ends with uncommitted DB changes to also check for:

Additions and removals of objects (present in session.new and session.deleted but not session.dirty) which have not been flushed or committed.
Changes which have been flushed to the DB but not yet committed

The purpose of this is to check whether the missing DB annotations reported in #4704 are missing because the DB session was not committed or missing because the DB session was rolled back. I strongly suspect the former, but want to rule out the latter.

In addition to logging via Sentry, the issue is logged via our normal logging so that we can see details of events which have been filtered out via Sentry's rate limiting.

Testing

The easiest way to test this is to comment out the pyramid_tm include in app.py which will have the effect of preventing sessions from being committed at the end of a request or rolled back on error. If you then create annotations or other entities you'll see log entries output at the end of the request with details of the uncommitted changes. If you re-enabled that line, the log messages should stop appearing.

robertknight · 2018-01-03T16:43:26Z

Marked as WIP because I want to do some more local testing of this before merging.

codecov · 2018-01-03T16:52:41Z

Codecov Report

Merging #4728 into master will decrease coverage by <.01%.
The diff coverage is 91.34%.

@@            Coverage Diff            @@
##           master   #4728      +/-   ##
=========================================
- Coverage   96.51%   96.5%   -0.01%     
=========================================
  Files         373     373              
  Lines       20576   20493      -83     
  Branches     1166    1160       -6     
=========================================
- Hits        19858   19777      -81     
+ Misses        608     607       -1     
+ Partials      110     109       -1

Impacted Files	Coverage Δ
h/config.py	`64.7% <ø> (+5.88%)`	⬆️
h/db/__init__.py	`55.93% <10%> (-10.31%)`	⬇️
h/util/session_tracker.py	`100% <100%> (ø)`	⬆️
tests/h/util/session_tracker_test.py	`100% <100%> (ø)`	⬆️
tests/h/config_test.py	`100% <100%> (ø)`	⬆️
h/emails/reply_notification.py	`100% <0%> (ø)`	⬆️
tests/h/emails/reply_notification_test.py	`100% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9662122...a611c6a. Read the comment docs.

dhwthompson

As far as I can tell from looking through the internals of SQLAlchemy’s ORM, I’m reasonably confident this will do what we want it to do.

One thing I’m wondering: is it worth feature-flagging this change, so that we can easily flip back to the old dirty behaviour if there turns out to be some obscure problem, or if it uses up our Sentry quota even faster than before?

I’ve also got a few comments and questions, mostly about how the tests work and ways to make it more comprehensible to future developers (or future versions of existing developers).

dhwthompson · 2018-01-04T11:54:45Z

h/util/session_tracker.py

+    This is intended to be useful in eg. log output.
+    """
+    primary_key = inspect(db_obj).identity
+    return (type(db_obj), primary_key)


SQLAlchemy also provides the identity_key property, which seems like it could simplify this code while returning the same value. Would that work?

dhwthompson · 2018-01-04T12:23:44Z

tests/h/util/session_tracker_test.py

+    def expected_changes(self, changed_ann_id, deleted_ann_id):
+        added_entry = (Annotation, None, ObjectState.ADDED)
+
+        # FIXME - Figure out why the PK is a tuple rather than a primitive


If I had to guess, I’d say it’s to let SQLAlchemy support composite primary keys without needing to have separate types for the single-column and the multi-column cases.

dhwthompson · 2018-01-04T12:35:24Z

h/util/session_tracker.py

+        Return a list of changes to the `Session` which have not yet been
+        _committed_ to the DB.
+
+        The result is a list of (model class, primary key, ObjectState) tuples.


The fact that these values are getting passed around together suggests to me that it might be worth making that data type explicit as, say, a namedtuple, rather than remembering which fields they have and in what order.

dhwthompson · 2018-01-04T12:37:22Z

tests/h/util/session_tracker_test.py

+from h.util.session_tracker import Tracker, ObjectState
+
+
+def generate_ann_id():


What’s the reason here for explicitly generating the annotation IDs, rather than letting the ORM handle it?

I'm reusing the same setup and expectations for each test, and this makes it easier to get the IDs in the expected_changes fixture.

To recap, at the end of a request, a DB session could be in four possible states:

Changes not flushed or committed (indicates a bug)

Changes flushed but not committed (indicates a bug)

Changes committed

Changes rolled back

For each of those four states, there are 3 types of possible change to consider: Added, changed and deleted entries relative to the previous committed state. To avoid having to write 12 separate tests I've reused the same setup and expectations for each.

dhwthompson · 2018-01-04T13:05:29Z

tests/h/util/session_tracker_test.py

+        return generate_ann_id()
+
+    @pytest.fixture
+    def session(self, db_session, changed_ann_id, deleted_ann_id):


I’m finding the use of the single mono-fixture tricky to follow: it makes it more difficult for me to see the intent of any individual test. Do you think there would be a way to break out individual test fixtures so that they’re a little more self-contained, or would that result in an unreasonable amount of duplication?

dhwthompson · 2018-01-04T13:12:38Z

h/util/session_tracker.py

+
+        return changes
+
+    def _after_rollback(self, session):


Might it be worth underscoring out the session arguments here and in the method below, to make it clearer that they’re not used in the methods themselves? You could even go a step further and *args them, to make it even more apparent that you don’t care what’s being passed into these handlers.

robertknight · 2018-01-04T13:25:13Z

One thing I’m wondering: is it worth feature-flagging this change, so that we can easily flip back to the old dirty behaviour if there turns out to be some obscure problem, or if it uses up our Sentry quota even faster than before?

Feature flagging does sound like a good idea, but our existing feature flagging mechanism relies on DB accesses to look up flags, which seems unwise in the context of the "finished" callback. Either we would need to preload the flag state when creating the session or perhaps use some other flagging mechanism, eg. a setting key from an env var - that will mean an EB env configuration update to change the flag. Thoughts?

judell · 2018-01-05T06:06:28Z

our existing feature flagging mechanism relies on DB accesses to look up flags, which seems unwise in the context of the "finished" callback

Heisenberg and Schrödinger would probably agree.

Thoughts?

Just ship the probe? We really need to find out what's going on. Retracting it won't be hard.

dhwthompson · 2018-01-05T12:03:22Z

Either we would need to preload the flag state when creating the session or perhaps use some other flagging mechanism, eg. a setting key from an env var - that will mean an EB env configuration update to change the flag. Thoughts?

Hmm. Good point. I think we’ve got two good options here:

Use a feature flag at the beginning of the request to decide whether to go with the old-style dirty tracking or the new-style Tracker, using some kind of common abstraction for the shared functionality. I think by loading in the feature flag at the very beginning to decide which strategy we’re going to use, we can avoid getting into any particularly horrendous tangles. I can see how that abstraction would take some amount of work to figure out, though.
As @judell says, we could decide that’s not worth the extra effort, ship it, and trust in our ability to roll the change back if there’s a problem with it.

I don’t have a particularly strong opinion on this, so I’d say it’s your call, @robertknight.

dhwthompson · 2018-01-05T14:53:43Z

h/util/session_tracker.py

+        Return a list of changes to the `Session` which have not yet been
+        _committed_ to the DB.
+
+        The result is a list of (model class, primary key, ObjectState) tuples.


Am I right in thinking this comment is now out of date, since 855daa8 changed the return type to an (identifier, state) tuple?

(Tangential note: I see in the Pyramid docs that the identity_key tuple gets an extra element in version 1.2, so let’s make sure we don’t end up relying on the structure of the key if we can avoid it.)

Yes indeed. The result of identity_key is treated as an opaque value here and in the tests so I think we should be OK wrt. changes in future SQLAlchemy releases.

dhwthompson

Cool. I’m happy with the code here, particularly given we’ve got a switch to be able to turn it off if need be.

I think all this now needs before it gets merged is a thorough rebasing into a more consistent set of commits, for the sake of future Hypothesis folk. Then we can get it merged and deployed.

Nice work, @robertknight. 👍 🍰

Following the discussion in https://groups.google.com/forum/#!topic/sqlalchemy/eGxpQBChXQw implement a utility class which can be used to check whether any changes have been made in a `Session` but not yet committed to the DB. This is part of an investigation into #4704

…hanges The previous `session.dirty` check only tested for _changes_ to existing DB objects which have not been flushed. It did not check for added or deleted model objects which had not yet been flushed to the DB, or for changes which have been flushed inside a transaction but not yet _committed_. When debugging #4704 we want to specifically catch additions (of annotations) to the DB which have been flushed (by `storage.create_annotation`) but not yet committed.

…ends In case the DB session checks cause unexpected problems in production, make it possible to disable them via an EB configuration update by setting `DB_SESSION_CHECKS=False`.

robertknight changed the title ~~Improve dirty session tracking~~ Check for uncommitted DB changes when requests finish Jan 3, 2018

robertknight added the WIP label Jan 3, 2018

dhwthompson reviewed Jan 4, 2018

View reviewed changes

robertknight removed the WIP label Jan 5, 2018

dhwthompson reviewed Jan 5, 2018

View reviewed changes

robertknight added 3 commits January 8, 2018 10:48

Add app-level feature flag to disable DB session checks when request …

a611c6a

…ends In case the DB session checks cause unexpected problems in production, make it possible to disable them via an EB configuration update by setting `DB_SESSION_CHECKS=False`.

robertknight force-pushed the improve-dirty-session-tracking branch from 34962a5 to a611c6a Compare January 8, 2018 10:48

dhwthompson approved these changes Jan 8, 2018

View reviewed changes

robertknight merged commit 7366614 into master Jan 8, 2018

robertknight deleted the improve-dirty-session-tracking branch January 8, 2018 11:56

robertknight mentioned this pull request Jan 9, 2018

SPIKE: Research Annotations not created in DB despite "successful" POST /api/annotations request #4704

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check for uncommitted DB changes when requests finish #4728

Check for uncommitted DB changes when requests finish #4728

robertknight commented Jan 3, 2018 •

edited

robertknight commented Jan 3, 2018

codecov bot commented Jan 3, 2018 •

edited

dhwthompson left a comment

dhwthompson Jan 4, 2018

dhwthompson Jan 4, 2018

dhwthompson Jan 4, 2018

dhwthompson Jan 4, 2018

robertknight Jan 5, 2018

dhwthompson Jan 4, 2018

dhwthompson Jan 4, 2018

robertknight commented Jan 4, 2018

judell commented Jan 5, 2018 •

edited

dhwthompson commented Jan 5, 2018

dhwthompson Jan 5, 2018

robertknight Jan 5, 2018

dhwthompson left a comment

		from h.util.session_tracker import Tracker, ObjectState


		def generate_ann_id():

Check for uncommitted DB changes when requests finish #4728

Check for uncommitted DB changes when requests finish #4728

Conversation

robertknight commented Jan 3, 2018 • edited

robertknight commented Jan 3, 2018

codecov bot commented Jan 3, 2018 • edited

Codecov Report

dhwthompson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robertknight commented Jan 4, 2018

judell commented Jan 5, 2018 • edited

dhwthompson commented Jan 5, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhwthompson left a comment

Choose a reason for hiding this comment

robertknight commented Jan 3, 2018 •

edited

codecov bot commented Jan 3, 2018 •

edited

judell commented Jan 5, 2018 •

edited