New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for uncommitted DB changes when requests finish #4728
Conversation
Marked as WIP because I want to do some more local testing of this before merging. |
Codecov Report
@@ Coverage Diff @@
## master #4728 +/- ##
=========================================
- Coverage 96.51% 96.5% -0.01%
=========================================
Files 373 373
Lines 20576 20493 -83
Branches 1166 1160 -6
=========================================
- Hits 19858 19777 -81
+ Misses 608 607 -1
+ Partials 110 109 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell from looking through the internals of SQLAlchemy’s ORM, I’m reasonably confident this will do what we want it to do.
One thing I’m wondering: is it worth feature-flagging this change, so that we can easily flip back to the old dirty
behaviour if there turns out to be some obscure problem, or if it uses up our Sentry quota even faster than before?
I’ve also got a few comments and questions, mostly about how the tests work and ways to make it more comprehensible to future developers (or future versions of existing developers).
h/util/session_tracker.py
Outdated
This is intended to be useful in eg. log output. | ||
""" | ||
primary_key = inspect(db_obj).identity | ||
return (type(db_obj), primary_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQLAlchemy also provides the identity_key
property, which seems like it could simplify this code while returning the same value. Would that work?
tests/h/util/session_tracker_test.py
Outdated
def expected_changes(self, changed_ann_id, deleted_ann_id): | ||
added_entry = (Annotation, None, ObjectState.ADDED) | ||
|
||
# FIXME - Figure out why the PK is a tuple rather than a primitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I had to guess, I’d say it’s to let SQLAlchemy support composite primary keys without needing to have separate types for the single-column and the multi-column cases.
h/util/session_tracker.py
Outdated
Return a list of changes to the `Session` which have not yet been | ||
_committed_ to the DB. | ||
|
||
The result is a list of (model class, primary key, ObjectState) tuples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that these values are getting passed around together suggests to me that it might be worth making that data type explicit as, say, a namedtuple
, rather than remembering which fields they have and in what order.
from h.util.session_tracker import Tracker, ObjectState | ||
|
||
|
||
def generate_ann_id(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the reason here for explicitly generating the annotation IDs, rather than letting the ORM handle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reusing the same setup and expectations for each test, and this makes it easier to get the IDs in the expected_changes
fixture.
To recap, at the end of a request, a DB session could be in four possible states:
- Changes not flushed or committed (indicates a bug)
- Changes flushed but not committed (indicates a bug)
- Changes committed
- Changes rolled back
For each of those four states, there are 3 types of possible change to consider: Added, changed and deleted entries relative to the previous committed state. To avoid having to write 12 separate tests I've reused the same setup and expectations for each.
tests/h/util/session_tracker_test.py
Outdated
return generate_ann_id() | ||
|
||
@pytest.fixture | ||
def session(self, db_session, changed_ann_id, deleted_ann_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m finding the use of the single mono-fixture tricky to follow: it makes it more difficult for me to see the intent of any individual test. Do you think there would be a way to break out individual test fixtures so that they’re a little more self-contained, or would that result in an unreasonable amount of duplication?
h/util/session_tracker.py
Outdated
|
||
return changes | ||
|
||
def _after_rollback(self, session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it be worth underscoring out the session
arguments here and in the method below, to make it clearer that they’re not used in the methods themselves? You could even go a step further and *args
them, to make it even more apparent that you don’t care what’s being passed into these handlers.
Feature flagging does sound like a good idea, but our existing feature flagging mechanism relies on DB accesses to look up flags, which seems unwise in the context of the "finished" callback. Either we would need to preload the flag state when creating the session or perhaps use some other flagging mechanism, eg. a setting key from an env var - that will mean an EB env configuration update to change the flag. Thoughts? |
Heisenberg and Schrödinger would probably agree.
Just ship the probe? We really need to find out what's going on. Retracting it won't be hard. |
Hmm. Good point. I think we’ve got two good options here:
I don’t have a particularly strong opinion on this, so I’d say it’s your call, @robertknight. |
h/util/session_tracker.py
Outdated
Return a list of changes to the `Session` which have not yet been | ||
_committed_ to the DB. | ||
|
||
The result is a list of (model class, primary key, ObjectState) tuples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in thinking this comment is now out of date, since 855daa8 changed the return type to an (identifier, state)
tuple?
(Tangential note: I see in the Pyramid docs that the identity_key
tuple gets an extra element in version 1.2, so let’s make sure we don’t end up relying on the structure of the key if we can avoid it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed. The result of identity_key
is treated as an opaque value here and in the tests so I think we should be OK wrt. changes in future SQLAlchemy releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I’m happy with the code here, particularly given we’ve got a switch to be able to turn it off if need be.
I think all this now needs before it gets merged is a thorough rebasing into a more consistent set of commits, for the sake of future Hypothesis folk. Then we can get it merged and deployed.
Nice work, @robertknight. 👍 🍰
Following the discussion in https://groups.google.com/forum/#!topic/sqlalchemy/eGxpQBChXQw implement a utility class which can be used to check whether any changes have been made in a `Session` but not yet committed to the DB. This is part of an investigation into #4704
…hanges The previous `session.dirty` check only tested for _changes_ to existing DB objects which have not been flushed. It did not check for added or deleted model objects which had not yet been flushed to the DB, or for changes which have been flushed inside a transaction but not yet _committed_. When debugging #4704 we want to specifically catch additions (of annotations) to the DB which have been flushed (by `storage.create_annotation`) but not yet committed.
…ends In case the DB session checks cause unexpected problems in production, make it possible to disable them via an EB configuration update by setting `DB_SESSION_CHECKS=False`.
34962a5
to
a611c6a
Compare
Extend the logging added in 844a60d to check whether a request ends with uncommitted DB changes to also check for:
session.new
andsession.deleted
but notsession.dirty
) which have not been flushed or committed.The purpose of this is to check whether the missing DB annotations reported in #4704 are missing because the DB session was not committed or missing because the DB session was rolled back. I strongly suspect the former, but want to rule out the latter.
In addition to logging via Sentry, the issue is logged via our normal logging so that we can see details of events which have been filtered out via Sentry's rate limiting.
Testing
The easiest way to test this is to comment out the pyramid_tm include in app.py which will have the effect of preventing sessions from being committed at the end of a request or rolled back on error. If you then create annotations or other entities you'll see log entries output at the end of the request with details of the uncommitted changes. If you re-enabled that line, the log messages should stop appearing.