
Reduce likelihood of unexpected behavior by avoiding overriding of context keys #11

Open
tomage opened this issue May 14, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@tomage
Contributor

tomage commented May 14, 2021

The problem

I recently discovered a tricky little bug in our system.

We've set things up so that every request, Celery task, or manage.py invocation starts off a pghistory.context, which that particular request/task/manage.py session then uses to add context to pghistory-tracked objects.

Unfortunately, what I found in a few places was that we were running particular code multiple times in a single request, in a way where we'd be overriding a key that was previously set.

Examples

A trivial (and kinda nonsensical) example:

def increase_payout_api_endpoint(request):
    for job in Job.objects.all():
        update_payout(job)

def update_payout(job):
    pghistory.context(old_payout=job.payout)
    job.payout += 10
    job.save()

This is nonsensical in the sense that pghistory is most likely already going to track the old value of the payout in a JobEvent table, so it's really unnecessary to use pghistory.context() to store the old_payout. But that's not the point.

The point is more that update_payout() might be using pghistory.context() in some more useful way, and might have been written at a time when it was really only called once per request/task. But then someone might have started to call it in a for-loop (for example).

This is unfortunate, since the last time pghistory.context(old_payout=...) is called, that's the value that ends up on the context for the whole request, so the same value will be attached to all jobs.
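The overwrite behavior can be shown in isolation with a plain dict standing in for pghistory's thread-local metadata (purely a sketch, not pghistory code):

```python
# Stand-in for the request-wide context metadata: new keys are merged
# into one shared dict, so repeated calls with the same key silently
# overwrite earlier values.
request_context = {}

def set_context(**metadata):
    request_context.update(metadata)

for payout in [100, 110, 120]:
    set_context(old_payout=payout)  # one call per job in the loop

print(request_context)  # only the final value survives
```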

Another example where this can happen:

def some_api_endpoint(request):
    
    play_musical_note(request.POST.get('musical_note'))
    send_a_small_note(request.POST.get('text_note'))

def play_musical_note(note):
    pghistory.context(note=f"Playing note {note}")
    play(note)  # This might change Model instances tracked by pghistory

def send_a_small_note(note):
    pghistory.context(note=f"Sending small note: {note}")
    send_note(note)  # This might change Model instances tracked by pghistory

Possible solution

To track down the problematic cases, I monkey-patched pghistory.context with something like the following, and then ran our unit-tests:

orig_context = pghistory.context

class strict_context(orig_context):
    def __init__(self, **metadata):
        # Fail loudly if any of the new keys is already set in the
        # active context.
        _tracker = pghistory.tracking._tracker
        if hasattr(_tracker, 'value'):
            keys_already_set = set(metadata.keys()) & set(
                _tracker.value.metadata.keys()
            )
            if keys_already_set:
                raise RuntimeError(
                    f'The following keys have already been set in the '
                    f'pghistory.context: {keys_already_set}'
                )
        super().__init__(**metadata)

pghistory.context = strict_context

While pretty crude, it did break a bunch of unit tests in my codebase. Most of them were fairly trivial to fix, though some were harder (eager execution of Celery tasks meant that many of those broke; I fixed that by hacking in something that renews the context for those tasks, not quite an elegant fix, but enough to get the tests to pass). I also needed to adjust the HistoryMiddleware class a bit so it doesn't re-insert the user again (somewhat trivial).

If we were to introduce something that disallows re-insertion of the same key into the context, we could also introduce it gently, either by having a settings.py config that controls its default behavior, and/or by adding a special flag on the context. I.e. the __init__() signature could read something like:

def __init__(self, allow_overrides=False, **metadata):
    ...

or

def __init__(self, allow_overrides=None, **metadata):
    if allow_overrides is None:
        allow_overrides = getattr(settings, 'PGHISTORY_ALLOW_OVERRIDES_DEFAULT', False)
    ...
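Putting the pieces together, the settings-backed variant could look roughly like this self-contained sketch (the `settings` dict simulates Django settings, and `PGHISTORY_ALLOW_OVERRIDES_DEFAULT`, `Context`, and its class-level `metadata` dict are all stand-ins from this proposal, not existing pghistory API):

```python
# Simulated Django settings; in real code this would be
# getattr(settings, 'PGHISTORY_ALLOW_OVERRIDES_DEFAULT', False).
settings = {"PGHISTORY_ALLOW_OVERRIDES_DEFAULT": False}

class Context:
    metadata = {}  # stand-in for pghistory's thread-local metadata

    def __init__(self, allow_overrides=None, **metadata):
        if allow_overrides is None:
            allow_overrides = settings.get(
                "PGHISTORY_ALLOW_OVERRIDES_DEFAULT", False
            )
        already_set = set(metadata) & set(Context.metadata)
        if already_set and not allow_overrides:
            raise RuntimeError(
                f"Keys already set in the pghistory context: {already_set}"
            )
        Context.metadata.update(metadata)

Context(user="alice")
try:
    Context(user="bob")  # same key again: raises by default
except RuntimeError as exc:
    print(exc)
Context(user="bob", allow_overrides=True)  # explicit opt-in overrides
```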

Any thoughts @wesleykendall ?

@wesleykendall
Collaborator

I like the ability to have a PGHISTORY_STRICT_CONTEXT setting that doesn't allow duplicate keys. I plan on adding this as a configurable setting in a subsequent version, or at least warning when it happens. Thanks for the suggestion! Will think more about its impacts, though, and won't make it the default.

@wesleykendall wesleykendall added the enhancement New feature or request label Sep 6, 2022