The problem
I recently discovered a tricky little bug in our system.
We've set things up so that every request, Celery task or manage.py invocation starts off a pghistory.context, which that particular request/task/manage.py session then uses to add context to pghistory-tracked objects.
Unfortunately, what I found in a few places was that we were running particular code multiple times in a single request, in a way where we'd be overriding a key that was previously set.
Examples
A trivial (and kinda nonsensical) example: an update_payout() function that calls pghistory.context(old_payout=...) with the job's current payout right before changing it.
This is nonsensical in the sense that pghistory is most likely going to track the old value of the payout in a JobEvent table anyway. It's really unnecessary to use pghistory.context() to store the old_payout. But that's not the point.
The point is more that update_payout() might be using pghistory.context() in some more useful way, and might have been written at a time when it was only called once per request/task. But then someone might have started to call it in a for-loop (for example).
This is unfortunate, since whatever value is passed in the last call to pghistory.context(old_payout=...) is the value that ends up on the context for the whole request, so the same value will be attached to all jobs.
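A minimal sketch of this failure mode. It does not use pghistory itself: fake_context, request_context, events and the Job class are hypothetical stand-ins, assuming only that all tracked changes in a request point at one shared set of context metadata.

```python
# Stand-in for the per-request context: one shared metadata dict that
# every tracked change in the request points at. Hypothetical, for
# illustration only; this is not how pghistory stores its context.
request_context = {}
events = []  # each recorded event keeps a reference to the shared context


def fake_context(**metadata):
    """Mimic calling pghistory.context(...) inside an already active context."""
    request_context.update(metadata)


class Job:
    def __init__(self, name, payout):
        self.name = name
        self.payout = payout


def update_payout(job, new_payout):
    fake_context(old_payout=job.payout)
    job.payout = new_payout
    # A tracked save would record an event linked to the request context.
    events.append({"job": job.name, "context": request_context})


jobs = [Job("a", 10), Job("b", 20), Job("c", 30)]
for job in jobs:
    update_payout(job, job.payout + 5)

for event in events:
    print(event["job"], event["context"]["old_payout"])
# Every event reports old_payout == 30, the value set by the last call.
```

Because the three events share one context, the values 10 and 20 are silently lost; only the last write survives.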
Another example where this can happen:
```python
import pghistory


def some_api_endpoint(request):
    play_musical_note(request.POST.get('musical_note'))
    send_a_small_note(request.POST.get('text_note'))


def play_musical_note(note):
    pghistory.context(note=f"Playing note {note}")
    play(note)  # This might change Model instances tracked by pghistory


def send_a_small_note(note):
    pghistory.context(note=f"Sending small note: {note}")
    send_note(note)  # This might change Model instances tracked by pghistory
```
Possible solution
To track down the problematic cases, I monkey-patched pghistory.context with something like the following, and then ran our unit tests:
```python
import pghistory
import pghistory.tracking

orig_context = pghistory.context


class strict_context(orig_context):
    def __init__(self, **metadata):
        _tracker = pghistory.tracking._tracker
        # Only check for clashes when a context is already active.
        if hasattr(_tracker, 'value'):
            keys_already_set = set(metadata.keys()) & set(
                _tracker.value.metadata.keys()
            )
            if keys_already_set:
                raise RuntimeError(
                    'The following keys have already been set in the '
                    f'pghistory.context: {keys_already_set}'
                )
        super().__init__(**metadata)


pghistory.context = strict_context
```
While pretty crude, it did break a bunch of unit tests in my codebase. Most of them were fairly trivial to fix, though some were harder: eager execution of Celery tasks meant that many of those broke. I fixed that by hacking in something that renews the context for those tasks; not quite an elegant fix, but enough to get the unit tests to pass. I also needed to adjust the HistoryMiddleware class a bit so that it doesn't re-insert the user again (somewhat trivial).
If we were to introduce something that disallows re-insertion of the same key into the context, we could also introduce it gently, either by having a settings.py config to control its default behavior, and/or by adding a special flag on the context, i.e. an extra argument in the signature of __init__() that switches the check on or off.
I like the ability to have a PGHISTORY_STRICT_CONTEXT setting that doesn't allow duplicate keys. I plan on adding this as a configurable setting in a subsequent version, or at least warning when it happens. Thanks for the suggestion! I'll think about its impact some more, and I won't make it the default, though.
Any thoughts @wesleykendall ?