
Save args, kwargs with JSON encoding #58

Open
fredley opened this issue Jan 31, 2018 · 8 comments

Comments

@fredley

fredley commented Jan 31, 2018

Currently, calling a task with positional and keyword arguments results in something like this:

>>> task.args
[True]
>>> task.kwargs
{'arg1': 'some_string', 'arg2': False}

This is a bit of a pain when parsing this information, in particular to send it to a JavaScript frontend, since it's almost JSON but not quite.

Could there be a setting to enable saving this information as properly encoded JSON? e.g.

>>> task.args
[true]
>>> task.kwargs
{"arg1": "some_string", "arg2": false}
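The "almost JSON" problem can be reproduced with the standard library alone. A minimal sketch, assuming the stored values are the Python `repr` of the original arguments; `is_valid_json` is a hypothetical helper used only for illustration:

```python
import json

args = [True]
kwargs = {'arg1': 'some_string', 'arg2': False}

# What the monitor currently shows: the Python repr of each value.
stored_args = repr(args)      # "[True]"
stored_kwargs = repr(kwargs)  # "{'arg1': 'some_string', 'arg2': False}"

# What a JavaScript frontend would expect: JSON text.
json_args = json.dumps(args)      # '[true]'
json_kwargs = json.dumps(kwargs)  # '{"arg1": "some_string", "arg2": false}'


def is_valid_json(text):
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(stored_kwargs))  # False: single quotes, capitalised False
print(is_valid_json(json_kwargs))    # True
```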

@fredley changed the title from "Save args, kwargs as json encoded" to "Save args, kwargs with JSON endcoding" on Jan 31, 2018
@ShaheedHaque

I just came across this, and I think args, kwargs and result should all be saved as JSON-encoded strings. I accept that some data cannot be properly handled; for that, just stringify anything that causes json.JSONEncoder.default() to raise a TypeError.

I'd be happy to submit a PR for this if deemed acceptable.
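A minimal sketch of the proposed fallback, using only the standard library `json` module. `StringifyingEncoder` and `encode` are hypothetical names, not part of django-celery-monitor. Note that the fallback is lossy: a stringified value can't be decoded back to its original type.

```python
import json
from datetime import date
from decimal import Decimal


class StringifyingEncoder(json.JSONEncoder):
    """JSON encoder that falls back to str() for unsupported types."""

    def default(self, o):
        try:
            # The base implementation raises TypeError for any object
            # it doesn't know how to serialize.
            return super().default(o)
        except TypeError:
            # Lossy fallback: keep a readable string instead of failing.
            return str(o)


def encode(value):
    return json.dumps(value, cls=StringifyingEncoder)


print(encode({'next_T': date(2018, 10, 7), 'amount': Decimal('1.50')}))
# {"next_T": "2018-10-07", "amount": "1.50"}
```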

@fredley
Author

fredley commented Feb 14, 2018

@ShaheedHaque That sounds fantastic to me if you're happy to do that. I'm not a project maintainer, though, so it would be good to hear from @jezdez whether this is something he would merge. It might be good to add a flag so that anyone's existing parsing of these values isn't broken.
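The flag idea can be sketched as an opt-in switch whose default preserves today's output. Both `STORE_ARGS_AS_JSON` and `format_for_storage` are hypothetical; django-celery-monitor has no such setting, and in a Django project the flag would presumably be read from settings rather than a module constant.

```python
import json

# Hypothetical opt-in flag; django-celery-monitor has no such setting.
# Defaulting to False keeps today's repr-style output for existing parsers.
STORE_ARGS_AS_JSON = False


def format_for_storage(value, as_json=None):
    """Format args/kwargs for storage, honouring the opt-in flag."""
    if as_json is None:
        as_json = STORE_ARGS_AS_JSON
    if as_json:
        # default=str stringifies values json can't encode natively.
        return json.dumps(value, default=str)
    # Existing behaviour: the Python repr of the value.
    return repr(value)
```

With the flag off, `format_for_storage([True])` gives `[True]` as before; passing `as_json=True` gives `[true]`.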

@ShaheedHaque

Indeed. (Actually, I guess there is a conversation to be had about results too, since AFAIK a result is allowed to be a non-JSON value, e.g. a bare int.)

@jezdez
Member

jezdez commented Feb 14, 2018

Task arguments can be any valid Python type and will only be serialized with the configured task serializer when sent between Celery clients and workers.

The goal of this package is to use the task and worker event state to conduct monitoring, and that state provides its values verbatim, without serialization, by design. In other words, we're erring on the side of correctness instead of convenience. Converting the arguments to JSON while storing them also carries an additional operational risk: if, for example, the conversion to JSON fails and prevents the task state from being updated in the database, that could lead to monitoring race conditions.

There are a few options to get what you want nevertheless (with the caveat that you'd be on your own):

  • subclass django_celery_monitor.camera.Camera, override the update_task method (calling the parent update_task method first to keep the usual functionality) and store the arguments as JSON (or whatever form is convenient for you) in a separate data store (e.g. a separate data model)
  • we add a Django signal to this package (e.g. celery_task_monitored) so you can do option 1 without subclassing; the rest stays the same
  • post-process task state updates using Django's post_save signal, convert the arguments to the format you require, and store them in a separate table
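The conversion step of option 3 could look roughly like this. It is a framework-free sketch, assuming the handler receives the task id plus the original `args` and `kwargs` objects. The Django wiring (for example a `post_save` receiver registered for `django_celery_monitor.models.TaskState`) and the separate model are left out, with a plain dict standing in for the table. `store_json_arguments` and `json_args_table` are hypothetical names. The `try`/`except` reflects the operational concern above: a failed conversion is swallowed, so it can't interfere with the task-state update.

```python
import json

# Stand-in for "a separate table": maps task_id -> JSON-encoded arguments.
json_args_table = {}


def store_json_arguments(task_id, args, kwargs, table=json_args_table):
    """Store a JSON copy of one task's arguments; never raise on bad data.

    Returns True if the JSON copy was stored, False if conversion failed.
    The primary task-state row is assumed to be saved already, so a failure
    here only loses the convenience copy, not the monitoring data.
    """
    try:
        payload = json.dumps({'args': args, 'kwargs': kwargs}, default=str)
    except (TypeError, ValueError):
        # e.g. dict keys of unsupported types (TypeError)
        # or circular references (ValueError).
        return False
    table[task_id] = payload
    return True
```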

@ShaheedHaque

Thanks for the quick response. Is there a way to know, for a given event, what serializer was used? I don't see a content_type field in the model, for example?

@ShaheedHaque

ShaheedHaque commented Feb 14, 2018

@jezdez I just added this debug logging to camera.py:

 @@ -85,6 +85,7 @@
                  (task.worker.hostname, task.worker),
              )
  
 +        logger.warning('type(task.kwargs)={}: {}'.format(type(task.kwargs), task.kwargs))
          defaults = {
              'name': task.name,
              'args': task.args,

The resulting output indicates that kwargs has, IIUC, already been coerced into a string even before being written to the TextField in the database:

2018-02-14 20:05:12,823 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'str'>: {'client': 8, 'company': 3, 'frequency': 'w1', 'next_T': '2018-10-07'}
2018-02-14 20:05:14,850 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'NoneType'>: None
2018-02-14 20:05:14,854 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'str'>: {'client': 8, 'company': 3, 'frequency': 'w1', 'next_T': '2018-10-07'}
2018-02-14 20:05:14,881 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'NoneType'>: None

Given that kwargs definitely started life as a dict, and what you confirmed about the intent being to use a lossless on-the-wire format, this suggests that some unexpected string coercion is going on, right?

Also, given that the value is being stored in a database TextField, are we certain that the stored value would not be reduced to a string by virtue of being stored like this?
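The logged value is character-for-character the Python repr of the original dict, which supports the idea that a repr string, not the dict itself, reaches the camera. This is consistent with Celery 4.x putting textual representations of the arguments into its task events (see the `argsrepr`/`kwargsrepr` options of `apply_async`). On the TextField question: as far as I know, Django's `TextField` converts a non-string value with `str()` on save, so a dict stored there would end up as the same text anyway. A quick check, assuming `next_T` was passed as a string (a `datetime.date` would have appeared as `datetime.date(2018, 10, 7)`):

```python
# The original kwargs, reconstructed by hand from the first log line.
kwargs = {'client': 8, 'company': 3, 'frequency': 'w1', 'next_T': '2018-10-07'}
logged = "{'client': 8, 'company': 3, 'frequency': 'w1', 'next_T': '2018-10-07'}"

# str() of a dict is its repr, so both match the logged text exactly.
print(repr(kwargs) == logged)  # True
print(str(kwargs) == logged)   # True
```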

@fredley
Author

fredley commented Feb 15, 2018

I've come across this before, I think. The camera receives args and kwargs already coerced into strings, so your options are to parse the JSON-ish Python repr string into actual JSON (what I'm doing at the moment), or to change Celery, presumably quite fundamentally, somewhere else so that the values arrive in the camera as JSON-encoded strings to begin with.
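A minimal version of the repr-to-JSON workaround, assuming the stored strings are plain Python reprs like the ones in the log above. `repr_to_json` is a hypothetical helper, not part of the package. It uses `ast.literal_eval` because, unlike `eval()`, it only accepts Python literals and never runs arbitrary code.

```python
import ast
import json


def repr_to_json(text):
    """Convert a stored repr string such as "{'a': True}" to JSON text.

    Returns None when there is nothing to convert, or when the repr isn't a
    plain literal (e.g. "datetime.date(2018, 10, 7)"), which literal_eval rejects.
    """
    if text is None:
        return None
    try:
        value = ast.literal_eval(text)
        # default=str covers literals json can't encode natively, e.g. sets.
        return json.dumps(value, default=str)
    except (ValueError, SyntaxError, TypeError):
        return None
```

For the first log line, this yields `{"client": 8, "company": 3, "frequency": "w1", "next_T": "2018-10-07"}`. The `None` kwargs seen in the other log lines pass through unchanged.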

@ShaheedHaque

Yes, that's what I've concluded/done too. Maybe close the issue?
