Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for uncommitted DB changes when requests finish #4728

Merged
merged 3 commits into from Jan 8, 2018

Conversation

robertknight
Copy link
Member

@robertknight robertknight commented Jan 3, 2018

Extend the logging added in 844a60d to check whether a request ends with uncommitted DB changes to also check for:

  • Additions and removals of objects (present in session.new and session.deleted but not session.dirty) which have not been flushed or committed.
  • Changes which have been flushed to the DB but not yet committed

The purpose of this is to check whether the missing DB annotations reported in #4704 are missing because the DB session was not committed or missing because the DB session was rolled back. I strongly suspect the former, but want to rule out the latter.

In addition to logging via Sentry, the issue is logged via our normal logging so that we can see details of events which have been filtered out via Sentry's rate limiting.


Testing

The easiest way to test this is to comment out the pyramid_tm include in app.py which will have the effect of preventing sessions from being committed at the end of a request or rolled back on error. If you then create annotations or other entities you'll see log entries output at the end of the request with details of the uncommitted changes. If you re-enabled that line, the log messages should stop appearing.

@robertknight robertknight changed the title Improve dirty session tracking Check for uncommitted DB changes when requests finish Jan 3, 2018
@robertknight
Copy link
Member Author

Marked as WIP because I want to do some more local testing of this before merging.

@codecov
Copy link

codecov bot commented Jan 3, 2018

Codecov Report

Merging #4728 into master will decrease coverage by <.01%.
The diff coverage is 91.34%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #4728      +/-   ##
=========================================
- Coverage   96.51%   96.5%   -0.01%     
=========================================
  Files         373     373              
  Lines       20576   20493      -83     
  Branches     1166    1160       -6     
=========================================
- Hits        19858   19777      -81     
+ Misses        608     607       -1     
+ Partials      110     109       -1
Impacted Files Coverage Δ
h/config.py 64.7% <ø> (+5.88%) ⬆️
h/db/__init__.py 55.93% <10%> (-10.31%) ⬇️
h/util/session_tracker.py 100% <100%> (ø) ⬆️
tests/h/util/session_tracker_test.py 100% <100%> (ø) ⬆️
tests/h/config_test.py 100% <100%> (ø) ⬆️
h/emails/reply_notification.py 100% <0%> (ø) ⬆️
tests/h/emails/reply_notification_test.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9662122...a611c6a. Read the comment docs.

Copy link
Contributor

@dhwthompson dhwthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell from looking through the internals of SQLAlchemy’s ORM, I’m reasonably confident this will do what we want it to do.

One thing I’m wondering: is it worth feature-flagging this change, so that we can easily flip back to the old dirty behaviour if there turns out to be some obscure problem, or if it uses up our Sentry quota even faster than before?

I’ve also got a few comments and questions, mostly about how the tests work and ways to make it more comprehensible to future developers (or future versions of existing developers).

This is intended to be useful in eg. log output.
"""
primary_key = inspect(db_obj).identity
return (type(db_obj), primary_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLAlchemy also provides the identity_key property, which seems like it could simplify this code while returning the same value. Would that work?

def expected_changes(self, changed_ann_id, deleted_ann_id):
added_entry = (Annotation, None, ObjectState.ADDED)

# FIXME - Figure out why the PK is a tuple rather than a primitive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had to guess, I’d say it’s to let SQLAlchemy support composite primary keys without needing to have separate types for the single-column and the multi-column cases.

Return a list of changes to the `Session` which have not yet been
_committed_ to the DB.

The result is a list of (model class, primary key, ObjectState) tuples.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that these values are getting passed around together suggests to me that it might be worth making that data type explicit as, say, a namedtuple, rather than remembering which fields they have and in what order.

from h.util.session_tracker import Tracker, ObjectState


def generate_ann_id():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s the reason here for explicitly generating the annotation IDs, rather than letting the ORM handle it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reusing the same setup and expectations for each test, and this makes it easier to get the IDs in the expected_changes fixture.

To recap, at the end of a request, a DB session could be in four possible states:

  • Changes not flushed or committed (indicates a bug)
  • Changes flushed but not committed (indicates a bug)
  • Changes committed
  • Changes rolled back

For each of those four states, there are 3 types of possible change to consider: Added, changed and deleted entries relative to the previous committed state. To avoid having to write 12 separate tests I've reused the same setup and expectations for each.

return generate_ann_id()

@pytest.fixture
def session(self, db_session, changed_ann_id, deleted_ann_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m finding the use of the single mono-fixture tricky to follow: it makes it more difficult for me to see the intent of any individual test. Do you think there would be a way to break out individual test fixtures so that they’re a little more self-contained, or would that result in an unreasonable amount of duplication?


return changes

def _after_rollback(self, session):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be worth underscoring out the session arguments here and in the method below, to make it clearer that they’re not used in the methods themselves? You could even go a step further and *args them, to make it even more apparent that you don’t care what’s being passed into these handlers.

@robertknight
Copy link
Member Author

One thing I’m wondering: is it worth feature-flagging this change, so that we can easily flip back to the old dirty behaviour if there turns out to be some obscure problem, or if it uses up our Sentry quota even faster than before?

Feature flagging does sound like a good idea, but our existing feature flagging mechanism relies on DB accesses to look up flags, which seems unwise in the context of the "finished" callback. Either we would need to preload the flag state when creating the session or perhaps use some other flagging mechanism, eg. a setting key from an env var - that will mean an EB env configuration update to change the flag. Thoughts?

@judell
Copy link
Contributor

judell commented Jan 5, 2018

our existing feature flagging mechanism relies on DB accesses to look up flags, which seems unwise in the context of the "finished" callback

Heisenberg and Schrödinger would probably agree.

Thoughts?

Just ship the probe? We really need to find out what's going on. Retracting it won't be hard.

@dhwthompson
Copy link
Contributor

Either we would need to preload the flag state when creating the session or perhaps use some other flagging mechanism, eg. a setting key from an env var - that will mean an EB env configuration update to change the flag. Thoughts?

Hmm. Good point. I think we’ve got two good options here:

  • Use a feature flag at the beginning of the request to decide whether to go with the old-style dirty tracking or the new-style Tracker, using some kind of common abstraction for the shared functionality. I think by loading in the feature flag at the very beginning to decide which strategy we’re going to use, we can avoid getting into any particularly horrendous tangles. I can see how that abstraction would take some amount of work to figure out, though.
  • As @judell says, we could decide that’s not worth the extra effort, ship it, and trust in our ability to roll the change back if there’s a problem with it.

I don’t have a particularly strong opinion on this, so I’d say it’s your call, @robertknight.

@robertknight robertknight removed the WIP label Jan 5, 2018
Return a list of changes to the `Session` which have not yet been
_committed_ to the DB.

The result is a list of (model class, primary key, ObjectState) tuples.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right in thinking this comment is now out of date, since 855daa8 changed the return type to an (identifier, state) tuple?

(Tangential note: I see in the Pyramid docs that the identity_key tuple gets an extra element in version 1.2, so let’s make sure we don’t end up relying on the structure of the key if we can avoid it.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed. The result of identity_key is treated as an opaque value here and in the tests so I think we should be OK wrt. changes in future SQLAlchemy releases.

Copy link
Contributor

@dhwthompson dhwthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I’m happy with the code here, particularly given we’ve got a switch to be able to turn it off if need be.

I think all this now needs before it gets merged is a thorough rebasing into a more consistent set of commits, for the sake of future Hypothesis folk. Then we can get it merged and deployed.

Nice work, @robertknight. 👍 🍰

Following the discussion in https://groups.google.com/forum/#!topic/sqlalchemy/eGxpQBChXQw
implement a utility class which can be used to check whether any changes
have been made in a `Session` but not yet committed to the DB.

This is part of an investigation into #4704
…hanges

The previous `session.dirty` check only tested for _changes_ to existing
DB objects which have not been flushed. It did not check for added or
deleted model objects which had not yet been flushed to the DB, or for
changes which have been flushed inside a transaction but not yet _committed_.

When debugging #4704 we want to
specifically catch additions (of annotations) to the DB which have been
flushed (by `storage.create_annotation`) but not yet committed.
…ends

In case the DB session checks cause unexpected problems in production,
make it possible to disable them via an EB configuration update by
setting `DB_SESSION_CHECKS=False`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants