-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge TaskResult
into JobResult
objects
#3164
Conversation
nautobot/extras/models/jobs.py
Outdated
content_type = models.CharField( | ||
default="application/x-nautobot-json", | ||
max_length=128, | ||
verbose_name=('Result Content Type'), | ||
help_text='Content type of the result data') | ||
content_encoding = models.CharField( | ||
default="utf-8", | ||
max_length=64, | ||
verbose_name=('Result Encoding'), | ||
help_text='The encoding used to save the task result data') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are currently required by Celery, but since we're always using JSON this is just a passthrough right now. Need to review the base DatabaseBackend
from django-celery-results
and see if we can just always use JSON and eliminate or ignore these fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want to see if I can eliminate these fields since they are 100% no longer necessary. This would be done in JobResultManager.store_result()
.
This is the current issue with encoding, specifically the args/kwargs seem to not being Kombu encoded:
|
d48efd8
to
601ed2d
Compare
JobResult fields renamed: - `job_id` => `task_id` - `created` => `date_created` - `completed` => `date_done` - `name` => `task_name` JobResult fields changed - `status` choices changed to mirror that of `TaskResult.status.choices`
- `JobResult.task_name` now mirrors `TaskResult.task_name` and is used distinctly - There's an issue where `date_done` is being nullified that needs to be addressed - Most of UI is working except for tracking success/done state refresh of results - Unit tests omega broken again because of the status enums
ripped out. - All of it is being handled by the database backend. - Customized the DB backend to manage the `date_done` field for us. - Encoding/decoding task args/kwargs is still funky. Have not figured that out yet. - Tests still broken.
- Refactored `JobResultStatusChoices` to align with Celery task state names - Squashed migrations down into a single file. - Made flake8 and black happy ^_^
ac2b4fc
to
7d95957
Compare
- `JobResult.obj_type` is now nullable, but we might want to revisit this to make it conditional around "system jobs", but this was necessary to make non-Job Celery tasks run at all without an `IntegrityError` - Fixed an encoding issue with the database backend, but still not the double-encoding one. - The `JobResultManager` now sets `name` to match `task_name` if `name` isn't set. - Fixed the table column for `JobResultTable.date_created` to be linkified in the UI. - Caught some field names I missed in the Job Result templates.
def _get_extended_properties(self, request, traceback): | ||
""" | ||
Overload default so that `argsrepr` and `kwargsrepr` aren't used to construct `args` and | ||
`kwargs`. Our need to properly (de)serialize `NautobotFakeRequest` facilitates this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we get rid of NautobotFakeRequest
, will that get rid of the need to override this private method? Not saying that has to be done as part of this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. The other problem are the nullable CharField
fields we forklifted. If we want "correctness" I would rather those default to ""
vs. None
but that's actually more invasive than what I had to do here.
@@ -46,7 +46,7 @@ def refresh_datasource_content(model_name, record, request, job_result, delete=F | |||
job_result.log( | |||
f"Error while refreshing {entry.name}: {exc}", level_choice=LogLevelChoices.LOG_FAILURE | |||
) | |||
job_result.set_status(JobResultStatusChoices.STATUS_ERRORED) | |||
job_result.set_status(JobResultStatusChoices.STATUS_FAILURE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this file also need a TODO about removing the set_status
calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I meant to remove all of those. Anywhere explicit status changes are being made must go away. We want Celery to make these decisions, along with any explicit calls to JobResult.save()
, moving those to the NautobotTask
class or the backend or a mix of both, depending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jathanism Do you intend to do this in this PR or open a follow-on story?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-on as a part of the "make run_job
a thing of the past" work.
|
||
except AbortTransaction: | ||
if not job_model.read_only: | ||
job.log_info(message="Database changes have been reverted automatically.") | ||
|
||
except Exception as exc: | ||
stacktrace = traceback.format_exc() | ||
job.log_failure(message=f"An exception occurred: `{type(exc).__name__}: {exc}`\n```\n{stacktrace}\n```") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do exceptions thrown during job execution get recorded now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so I guess we may want to consider enhancing the UI so that (maybe only if DEBUG=True?) we display the .traceback
value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any traceback will result in a failure unless it's defined as a known exception type such as using autoretry_for
Also traceback is currently part of the "Additional Data" tab:
Unless you're thinking of showing it in the job log as well?
finally: | ||
if output: | ||
job.results["output"] += "\n" + str(output) | ||
output = job.post_run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try/except was added specifically for #1807 - are exceptions captured/logged differently now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We should allow all exceptions to bubble up and they are handled by Celery and then the traceback gets captured.
- Celery testing has now been updated to run tasks with `ALWAYS_EAGER=True` so that instead of requiring a single-threaded worker process using the `solo` pool strategy, we can just run them "synchronously" and have them correctly update the state of the `JobResult`. - `nautobot.core.testing.run_job_for_testing` now stores the Celery result from a job on the `JobResult` object it returns as `JobResult.celery_result` - This was necessary due to the fact that `nautobot.extras.jobs.run_job` no longer handles exceptions internally and allows them to bubble up. `run_job` is called synchronously within `run_job_for_testing`, so a need to catch exceptions was introduced as a result. - Now instead of using a custom context manager, `run_job_for_testing`, a dummy `Request` object is created (if one isn't passed) and wrappped using `copy_safe_request` - Added the `traceback` field to the "Additional Data" tab on `JobResult` detail view template - Tests for extras API and views had to be revised for new expectations in that there will ALWAYS be `JobResult` objects now and to filter them by the name of the Job model class. - Revised `nautobot.extras.tests.test_jobs.JobFileUploadTest.test_run_job_fail` to check that an exception was raised by its job run and that the exception matches the expected type. - Placed `TODO` comments across various places in `run_job()` where I had to restore calls to `job_result.save()` to preserve the functionality of how the job model is mutating `JobResult.data` as it goes. This pattern needs to be refactored for this new paradigm as we work to eliminate `run_job` entirely.
user = data.pop("user") | ||
data.pop("_cached_user", None) | ||
if "_user_pk" not in data: | ||
# We have a user but haven't stored its PK yet, so look it up and store it | ||
data["_user_pk"] = data.pop("user").pk | ||
else: | ||
# We already have stored the PK, and we need to AVOID actually resolving the lazy user reference. | ||
data.pop("user") | ||
data["_user_pk"] = user.pk | ||
return data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if this should be back-ported to 1.5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be necessary, seeing as we don't have the recurring serialize/deserialize issue we have here with the database backend. The _cached_user
only becomes a part of the payload when it is re-serialized from a deserialized payload a 2nd time.
<strong>Traceback</strong> | ||
</div> | ||
<div class="panel-body"> | ||
{% include 'extras/inc/json_data.html' with data=result.traceback format="python" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we should do a check for settings.DEBUG here - similar to how we only present full tracebacks for other exceptions (500 errors) if it's True?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO if there is a traceback we should show it. I was thinking that since it's stored on the object, presenting it if it exists is easy. Is there a concern about data exposure or something tangible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was my thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's valid. IMO let's leave that as a v2 UI thing. I've tracked it to add to the backlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great step forward. Not going to hold up the absence of the change/migration documentation (status value changes, created
=> date_created
) but so long as we ensure that gets added to the top of the backlog to do, the longer it sits the more conflicts/blockings we might run into.
# TODO(jathan): Remove this once this body of work is done. This is just useful for debugging but it | ||
# results int a lot of noise and slows things down. | ||
# from celery import signals | ||
# @signals.task_prerun.connect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have the TODO
to remove but you disable the connect. I think if it's valuable to have this around for debugging why not just comment the whole thing out or document how to use this, instead of having to go back into the git history to restore it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just uncomment it locally when I need it for now. Leaving it in for testing generates a stupendous amount of noise.
I think documenting how to use it would be valuable.
@@ -46,7 +46,7 @@ def refresh_datasource_content(model_name, record, request, job_result, delete=F | |||
job_result.log( | |||
f"Error while refreshing {entry.name}: {exc}", level_choice=LogLevelChoices.LOG_FAILURE | |||
) | |||
job_result.set_status(JobResultStatusChoices.STATUS_ERRORED) | |||
job_result.set_status(JobResultStatusChoices.STATUS_FAILURE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jathanism Do you intend to do this in this PR or open a follow-on story?
status = django_filters.MultipleChoiceFilter(choices=JobResultStatusChoices, null_value=None) | ||
|
||
class Meta: | ||
model = JobResult | ||
fields = ["id", "created", "completed", "status", "user", "obj_type", "name"] | ||
fields = ["id", "date_created", "date_done", "name", "status", "user", "obj_type"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be sure to add either as a requirement to this PR or a follow-on to create the migration docs for all the model changes here.
Probably would be a good pre-release checklist item is to review every migration in next
and ensure we have a migration guide time around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's on my list of items I'll be adding to the epic for wrapping up the database backend work in #1622
|
||
if not commit: | ||
raise AbortTransaction() | ||
raise AbortTransaction("Database changes have been reverted automatically.") | ||
|
||
except AbortTransaction: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help if we completed #2726 shortly after merging this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that'll just be another one for ripping out run_job
. Leave out the atomic block while we're at it.
- Had to effectively duplicate the base `store_result()` manager method instead of overloading it and calling the base in order to accomplish this, but it should be more efficient overall since it won't result in double saving of an object - Added more verbose commenting in areas of the code that were overloaded where some motives may not have been clear (case in point above). - Addressed some PR feedback for clarity in some other code commennts.
Closes: #3085
What's Changed
WARNING: This is ugly. Getting this out there to show a checkpoint.
JobResult fields renamed:
job_id
=>task_id
created
=>date_created
completed
=>date_done
job_kwargs
=>task_kwargs
JobResult fields changed
status
choices changed to mirror that ofTaskResult.status.choices
TaskResult
toJobResult
-
JobResult.task_name
now mirrorsTaskResult.task_name
and is used distinctlyobj_type
is now nullable so that system jobs likenautobot.extras.tasks.provision_field
can be run and stored in DB, but we might want to revisit this to make it conditional around "system jobs", but this was necessary to make non-Job Celery tasks run at all without anIntegrityError
JobResultManager
now setsname
to matchtask_name
ifname
isn't set.Other changes
ALWAYS_EAGER=True
so that instead of requiring a single-threaded worker process using thesolo
pool strategy, we can just run them "synchronously" and have them correctly update the state of theJobResult
.nautobot.core.testing.run_job_for_testing
now stores the Celery result from a job on theJobResult
object it returns asJobResult.celery_result
nautobot.extras.jobs.run_job
no longer handles exceptions internally and allows them to bubble up.run_job
is called synchronously withinrun_job_for_testing
, so a need to catch exceptions was introduced as a result.run_job_for_testing
, a dummyRequest
object is created (if one isn't passed) and wrappped usingcopy_safe_request
traceback
field to the "Additional Data" tab onJobResult
detail view templateJobResult
objects now and to filter them by the name of the Job model class.nautobot.extras.tests.test_jobs.JobFileUploadTest.test_run_job_fail
to check that an exception was raised by its job run and that the exception matches the expected type.TODO
comments across various places inrun_job()
where I had to restore calls tojob_result.save()
to preserve the functionality of how the job model is mutatingJobResult.data
as it goes. This pattern needs to be refactored for this new paradigm as we work to eliminaterun_job
entirely.content_type
andcontent_encoding
fields fromJobResult
. Had to effectively duplicate the baseJobResultManager.store_result()
manager method instead of overloading it and calling the base in order to accomplish this, but it should be more efficient overall since it won't result in double saving of an object.Outstanding issues
run_job
into db backend/task classesdate_done
is being nullified that needs to be addressed likely because of some race condition betweenrun_job
internals and database backend state engine (see previous bullet)TaskStateChoices
andcelery_states
need to be folded into a revampedJobResultStatusChoices
choice set, so please wait on nitpicking that.TODO