Merge `TaskResult` into `JobResult` objects #3164

jathanism · 2023-01-25T18:39:10Z

Closes: #3085

What's Changed

WARNING: This is ugly. Getting this out there to show a checkpoint.

JobResult fields renamed:

job_id => task_id
created => date_created
completed => date_done
job_kwargs => task_kwargs

JobResult fields changed

status choices changed to mirror that of TaskResult.status.choices
Added all the missing fields found on TaskResult to JobResult
-JobResult.task_name now mirrors TaskResult.task_name and is used distinctly
obj_type is now nullable so that system jobs like nautobot.extras.tasks.provision_field can be run and stored in DB, but we might want to revisit this to make it conditional around "system jobs", but this was necessary to make non-Job Celery tasks run at all without an IntegrityError
The JobResultManager now sets name to match task_name if name isn't set.
Caught some field names I missed in the Job Result templates.

Other changes

Celery testing has now been updated to run tasks with ALWAYS_EAGER=True so that instead of requiring a single-threaded worker process using the solo pool strategy, we can just run them "synchronously" and have them correctly update the state of the JobResult.
nautobot.core.testing.run_job_for_testing now stores the Celery result from a job on the JobResult object it returns as JobResult.celery_result
This was necessary due to the fact that nautobot.extras.jobs.run_job no longer handles exceptions internally and allows them to bubble up. run_job is called synchronously within run_job_for_testing, so a need to catch exceptions was introduced as a result.
Now instead of using a custom context manager, run_job_for_testing, a dummy Request object is created (if one isn't passed) and wrappped using copy_safe_request
Added the traceback field to the "Additional Data" tab on JobResult detail view template
Tests for extras API and views had to be revised for new expectations in that there will ALWAYS be JobResult objects now and to filter them by the name of the Job model class.
Revised nautobot.extras.tests.test_jobs.JobFileUploadTest.test_run_job_fail to check that an exception was raised by its job run and that the exception matches the expected type.
Placed TODO comments across various places in run_job() where I had to restore calls to job_result.save() to preserve the functionality of how the job model is mutating JobResult.data as it goes. This pattern needs to be refactored for this new paradigm as we work to eliminate run_job entirely.
Excluded content_type and content_encoding fields from JobResult. Had to effectively duplicate the base JobResultManager.store_result() manager method instead of overloading it and calling the base in order to accomplish this, but it should be more efficient overall since it won't result in double saving of an object.

Outstanding issues

Still need to migrate business logic from run_job into db backend/task classes
There's an issue where date_done is being nullified that needs to be addressed likely because of some race condition between run_job internals and database backend state engine (see previous bullet)
TaskStateChoices and celery_states need to be folded into a revamped JobResultStatusChoices choice set, so please wait on nitpicking that.
Most of UI is working except for tracking success/done state refresh of results
Unit tests omega broken again because of the status enums

TODO

Explanation of Change(s)
Added change log fragment(s) (for more information see the documentation)
Attached Screenshots, Payload Example
Unit, Integration Tests
Documentation Updates (when adding/changing features)
Example Plugin Updates (when adding/changing features)
Outline Remaining Work, Constraints from Design

nautobot/extras/models/jobs.py

jathanism · 2023-01-25T18:47:38Z

nautobot/extras/models/jobs.py

+    content_type = models.CharField(
+        default="application/x-nautobot-json",
+        max_length=128,
+        verbose_name=('Result Content Type'),
+        help_text='Content type of the result data')
+    content_encoding = models.CharField(
+        default="utf-8",
+        max_length=64,
+        verbose_name=('Result Encoding'),
+        help_text='The encoding used to save the task result data')


These are currently required by Celery, but since we're always using JSON this is just a passthrough right now. Need to review the base DatabaseBackend from django-celery-results and see if we can just always use JSON and eliminate or ignore these fields.

I still want to see if I can eliminate these fields since they are 100% no longer necessary. This would be done in JobResultManager.store_result().

nautobot/extras/models/jobs.py

jathanism · 2023-01-27T20:17:19Z

This is the current issue with encoding, specifically the args/kwargs seem to not being Kombu encoded:

{'_state': <django.db.models.base.ModelState at 0x11843b850>,
 'id': UUID('f9a98b40-77cf-4e50-bf7c-66b0a5175395'),
 '_custom_field_data': {},
 'job_model_id': UUID('64b239ab-71aa-4a59-a99f-002665f642bb'),
 'name': 'plugins/example_plugin.jobs/ExampleLoggingJob',
 'task_name': 'nautobot.extras.jobs.run_job',
 'obj_type_id': 16,
 'date_created': datetime.datetime(2023, 1, 27, 19, 48, 31, 714111, tzinfo=<UTC>),
 'date_done': datetime.datetime(2023, 1, 27, 19, 48, 36, 222092, tzinfo=<UTC>),
 'user_id': UUID('2b12f865-52db-4405-8fc8-1c4d4375c2c5'),
 'status': 'SUCCESS',
 'data': None,
 'periodic_task_name': None,
 'worker': 'celery@jathy-studio.lan',
 'task_args': '"()"',
 'task_kwargs': '"{\'data\': {\'interval\': 4, \'_task_queue\': \'default\'}, \'request\': <nautobot.utilities.utils.NautobotFakeRequest object at 0x1149db280>, \'commit\': True, \'task_queue\': \'default\', \'job_result_pk\': UUID(\'f9a98b40-77cf-4e50-bf7c-66b0a5175395\')}"',
 'content_type': 'application/x-nautobot-json',
 'content_encoding': 'utf-8',
 'result': 'null',
 'traceback': None,
 'meta': '{"children": []}',
 'schedule_id': None,
 'task_id': UUID('7ded996a-e588-4fef-b214-7df7d6e699db'),
 'use_job_logs_db': True}

JobResult fields renamed: - `job_id` => `task_id` - `created` => `date_created` - `completed` => `date_done` - `name` => `task_name` JobResult fields changed - `status` choices changed to mirror that of `TaskResult.status.choices`

- `JobResult.task_name` now mirrors `TaskResult.task_name` and is used distinctly - There's an issue where `date_done` is being nullified that needs to be addressed - Most of UI is working except for tracking success/done state refresh of results - Unit tests omega broken again because of the status enums

ripped out. - All of it is being handled by the database backend. - Customized the DB backend to manage the `date_done` field for us. - Encoding/decoding task args/kwargs is still funky. Have not figured that out yet. - Tests still broken.

- Refactored `JobResultStatusChoices` to align with Celery task state names - Squashed migrations down into a single file. - Made flake8 and black happy ^_^

- `JobResult.obj_type` is now nullable, but we might want to revisit this to make it conditional around "system jobs", but this was necessary to make non-Job Celery tasks run at all without an `IntegrityError` - Fixed an encoding issue with the database backend, but still not the double-encoding one. - The `JobResultManager` now sets `name` to match `task_name` if `name` isn't set. - Fixed the table column for `JobResultTable.date_created` to be linkified in the UI. - Caught some field names I missed in the Job Result templates.

glennmatthews · 2023-02-01T14:06:19Z

nautobot/core/celery/backends.py

+    def _get_extended_properties(self, request, traceback):
+        """
+        Overload default so that `argsrepr` and `kwargsrepr` aren't used to construct `args` and
+        `kwargs`. Our need to properly (de)serialize `NautobotFakeRequest` facilitates this.


If we get rid of NautobotFakeRequest, will that get rid of the need to override this private method? Not saying that has to be done as part of this PR though.

Yes, I think so. The other problem are the nullable CharField fields we forklifted. If we want "correctness" I would rather those default to "" vs. None but that's actually more invasive than what I had to do here.

nautobot/extras/api/views.py

glennmatthews · 2023-02-01T14:11:27Z

nautobot/extras/datasources/registry.py

@@ -46,7 +46,7 @@ def refresh_datasource_content(model_name, record, request, job_result, delete=F
                    job_result.log(
                        f"Error while refreshing {entry.name}: {exc}", level_choice=LogLevelChoices.LOG_FAILURE
                    )
-                    job_result.set_status(JobResultStatusChoices.STATUS_ERRORED)
+                    job_result.set_status(JobResultStatusChoices.STATUS_FAILURE)


Does this file also need a TODO about removing the set_status calls?

Actually I meant to remove all of those. Anywhere explicit status changes are being made must go away. We want Celery to make these decisions, along with any explicit calls to JobResult.save(), moving those to the NautobotTask class or the backend or a mix of both, depending.

@jathanism Do you intend to do this in this PR or open a follow-on story?

Follow-on as a part of the "make run_job a thing of the past" work.

glennmatthews · 2023-02-01T14:13:42Z

nautobot/extras/jobs.py


        except AbortTransaction:
            if not job_model.read_only:
                job.log_info(message="Database changes have been reverted automatically.")

-        except Exception as exc:
-            stacktrace = traceback.format_exc()
-            job.log_failure(message=f"An exception occurred: `{type(exc).__name__}: {exc}`\n```\n{stacktrace}\n```")


How do exceptions thrown during job execution get recorded now?

They get stored on the JobResult.traceback field and unless otherwise handled by decorating the task with a throws= argument, they result in the job status of FAILURE.

OK, so I guess we may want to consider enhancing the UI so that (maybe only if DEBUG=True?) we display the .traceback value?

Any traceback will result in a failure unless it's defined as a known exception type such as using autoretry_for

Also traceback is currently part of the "Additional Data" tab:

Unless you're thinking of showing it in the job log as well?

glennmatthews · 2023-02-01T14:14:31Z

nautobot/extras/jobs.py

-        finally:
-            if output:
-                job.results["output"] += "\n" + str(output)
+        output = job.post_run()


The try/except was added specifically for #1807 - are exceptions captured/logged differently now?

Yes. We should allow all exceptions to bubble up and they are handled by Celery and then the traceback gets captured.

nautobot/extras/managers.py

nautobot/extras/models/jobs.py

- Celery testing has now been updated to run tasks with `ALWAYS_EAGER=True` so that instead of requiring a single-threaded worker process using the `solo` pool strategy, we can just run them "synchronously" and have them correctly update the state of the `JobResult`. - `nautobot.core.testing.run_job_for_testing` now stores the Celery result from a job on the `JobResult` object it returns as `JobResult.celery_result` - This was necessary due to the fact that `nautobot.extras.jobs.run_job` no longer handles exceptions internally and allows them to bubble up. `run_job` is called synchronously within `run_job_for_testing`, so a need to catch exceptions was introduced as a result. - Now instead of using a custom context manager, `run_job_for_testing`, a dummy `Request` object is created (if one isn't passed) and wrappped using `copy_safe_request` - Added the `traceback` field to the "Additional Data" tab on `JobResult` detail view template - Tests for extras API and views had to be revised for new expectations in that there will ALWAYS be `JobResult` objects now and to filter them by the name of the Job model class. - Revised `nautobot.extras.tests.test_jobs.JobFileUploadTest.test_run_job_fail` to check that an exception was raised by its job run and that the exception matches the expected type. - Placed `TODO` comments across various places in `run_job()` where I had to restore calls to `job_result.save()` to preserve the functionality of how the job model is mutating `JobResult.data` as it goes. This pattern needs to be refactored for this new paradigm as we work to eliminate `run_job` entirely.

nautobot/core/testing/__init__.py

glennmatthews · 2023-02-02T17:40:40Z

nautobot/core/utils/requests.py

+        user = data.pop("user")
+        data.pop("_cached_user", None)
        if "_user_pk" not in data:
            # We have a user but haven't stored its PK yet, so look it up and store it
-            data["_user_pk"] = data.pop("user").pk
-        else:
-            # We already have stored the PK, and we need to AVOID actually resolving the lazy user reference.
-            data.pop("user")
+            data["_user_pk"] = user.pk
        return data


Wonder if this should be back-ported to 1.5.

It might not be necessary, seeing as we don't have the recurring serialize/deserialize issue we have here with the database backend. The _cached_user only becomes a part of the payload when it is re-serialized from a deserialized payload a 2nd time.

glennmatthews · 2023-02-02T17:45:30Z

nautobot/extras/templates/extras/jobresult.html

+                        <strong>Traceback</strong>
+                    </div>
+                    <div class="panel-body">
+                        {% include 'extras/inc/json_data.html' with data=result.traceback format="python" %}


Wondering if we should do a check for settings.DEBUG here - similar to how we only present full tracebacks for other exceptions (500 errors) if it's True?

IMO if there is a traceback we should show it. I was thinking that since it's stored on the object, presenting it if it exists is easy. Is there a concern about data exposure or something tangible?

Yeah, that was my thinking.

That's valid. IMO let's leave that as a v2 UI thing. I've tracked it to add to the backlog.

bryanculver

I think this is a great step forward. Not going to hold up the absence of the change/migration documentation (status value changes, created => date_created) but so long as we ensure that gets added to the top of the backlog to do, the longer it sits the more conflicts/blockings we might run into.

bryanculver · 2023-02-02T18:19:10Z

nautobot/core/celery/task.py

+# TODO(jathan): Remove this once this body of work is done. This is just useful for debugging but it
+# results int a lot of noise and slows things down.
+# from celery import signals
+# @signals.task_prerun.connect


You have the TODO to remove but you disable the connect. I think if it's valuable to have this around for debugging why not just comment the whole thing out or document how to use this, instead of having to go back into the git history to restore it?

I just uncomment it locally when I need it for now. Leaving it in for testing generates a stupendous amount of noise.

I think documenting how to use it would be valuable.

nautobot/extras/managers.py

bryanculver · 2023-02-02T18:24:15Z

nautobot/extras/datasources/registry.py

@@ -46,7 +46,7 @@ def refresh_datasource_content(model_name, record, request, job_result, delete=F
                    job_result.log(
                        f"Error while refreshing {entry.name}: {exc}", level_choice=LogLevelChoices.LOG_FAILURE
                    )
-                    job_result.set_status(JobResultStatusChoices.STATUS_ERRORED)
+                    job_result.set_status(JobResultStatusChoices.STATUS_FAILURE)


@jathanism Do you intend to do this in this PR or open a follow-on story?

bryanculver · 2023-02-02T18:25:50Z

nautobot/extras/filters/__init__.py

    status = django_filters.MultipleChoiceFilter(choices=JobResultStatusChoices, null_value=None)

    class Meta:
        model = JobResult
-        fields = ["id", "created", "completed", "status", "user", "obj_type", "name"]
+        fields = ["id", "date_created", "date_done", "name", "status", "user", "obj_type"]


Let's be sure to add either as a requirement to this PR or a follow-on to create the migration docs for all the model changes here.

Probably would be a good pre-release checklist item is to review every migration in next and ensure we have a migration guide time around it.

Yes, it's on my list of items I'll be adding to the epic for wrapping up the database backend work in #1622

bryanculver · 2023-02-02T18:27:51Z

nautobot/extras/jobs.py


                if not commit:
-                    raise AbortTransaction()
+                    raise AbortTransaction("Database changes have been reverted automatically.")

        except AbortTransaction:


I think it would help if we completed #2726 shortly after merging this.

I'm thinking that'll just be another one for ripping out run_job. Leave out the atomic block while we're at it.

- Had to effectively duplicate the base `store_result()` manager method instead of overloading it and calling the base in order to accomplish this, but it should be more efficient overall since it won't result in double saving of an object - Added more verbose commenting in areas of the code that were overloaded where some motives may not have been clear (case in point above). - Addressed some PR feedback for clarity in some other code commennts.

jathanism marked this pull request as draft January 25, 2023 18:39

jathanism commented Jan 25, 2023

View reviewed changes

nautobot/extras/models/jobs.py Outdated Show resolved Hide resolved

jathanism commented Jan 25, 2023

View reviewed changes

nautobot/extras/models/jobs.py Outdated Show resolved Hide resolved

jathanism commented Jan 25, 2023

View reviewed changes

nautobot/extras/models/jobs.py Outdated Show resolved Hide resolved

bryanculver linked an issue Jan 26, 2023 that may be closed by this pull request

Jobs Overhaul: Merge TaskResult and JobResult data models #3085

Closed

10 tasks

jathanism force-pushed the jathanism-celery-merge-models branch from d48efd8 to 601ed2d Compare January 30, 2023 23:46

jathanism added 13 commits January 31, 2023 12:52

Start merging of JobResult/TaskResult models.

b314f89

JobResult fields renamed: - `job_id` => `task_id` - `created` => `date_created` - `completed` => `date_done` - `name` => `task_name` JobResult fields changed - `status` choices changed to mirror that of `TaskResult.status.choices`

Some debug stuff that should be removed before the purge.

afac720

Merge working kind of. Pushing up to change computers.

3e5b821

Remove Sentry secrets. Whoops. Don't worry, they are rotated already.

24adcd0

Try to use Kombu, but it doesn't work. This is on the right track tho.

dbd73b9

Rebase merge in next and address new import path.

5c5455a

Got the tests working.

fa04294

- Refactored `JobResultStatusChoices` to align with Celery task state names - Squashed migrations down into a single file. - Made flake8 and black happy ^_^

Added changelog file.

acb4f54

Ok the tests are really passing now, right?

f6f8680

No for real this time, you guys. I'm super cereal.

2f355bb

Renumber migrations. Ugh.

7d95957

jathanism force-pushed the jathanism-celery-merge-models branch from ac2b4fc to 7d95957 Compare January 31, 2023 20:56

jathanism added 2 commits January 31, 2023 14:47

Fixed issue with content-encoding and everything is all JSON.

75483b1

glennmatthews reviewed Feb 1, 2023

View reviewed changes

bryanculver mentioned this pull request Feb 1, 2023

Jobs Overhaul: Merge TaskResult and JobResult data models #3085

Closed

10 tasks

jathanism marked this pull request as ready for review February 1, 2023 16:47

jathanism added 2 commits February 1, 2023 17:29

Make linting happy and address some PR review comments.

be76b28

glennmatthews reviewed Feb 2, 2023

View reviewed changes

nautobot/core/testing/__init__.py Show resolved Hide resolved

bryanculver self-requested a review February 2, 2023 16:04

glennmatthews reviewed Feb 2, 2023

View reviewed changes

bryanculver approved these changes Feb 2, 2023

View reviewed changes

jathanism merged commit 1b5d62d into next Feb 3, 2023

jathanism deleted the jathanism-celery-merge-models branch February 3, 2023 01:30

bryanculver mentioned this pull request Feb 3, 2023

Jobs Overhaul: Remove existing job result state transition code for database backend #3084

Closed

This was referenced Feb 4, 2023

Jobs Overhaul: Refactor JobResult.result to JSONField from TextField #3229

Closed

Jobs Overhaul: Establish JobResultManager as true subclass #3231

Closed

gsnider2195 mentioned this pull request Jun 26, 2023

Job Results API Returning 500 Error #3978

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge `TaskResult` into `JobResult` objects #3164

Merge `TaskResult` into `JobResult` objects #3164

jathanism commented Jan 25, 2023 •

edited

Loading

jathanism Jan 25, 2023

jathanism Feb 2, 2023

jathanism commented Jan 27, 2023

glennmatthews Feb 1, 2023

jathanism Feb 2, 2023

glennmatthews Feb 1, 2023

jathanism Feb 2, 2023

bryanculver Feb 2, 2023

jathanism Feb 2, 2023

glennmatthews Feb 1, 2023

jathanism Feb 1, 2023

glennmatthews Feb 2, 2023

jathanism Feb 2, 2023

glennmatthews Feb 1, 2023

jathanism Feb 1, 2023

glennmatthews Feb 2, 2023

jathanism Feb 2, 2023

glennmatthews Feb 2, 2023

jathanism Feb 2, 2023

glennmatthews Feb 2, 2023

jathanism Feb 2, 2023

bryanculver left a comment

bryanculver Feb 2, 2023

jathanism Feb 2, 2023

bryanculver Feb 2, 2023

bryanculver Feb 2, 2023

jathanism Feb 2, 2023

bryanculver Feb 2, 2023

jathanism Feb 3, 2023

Merge TaskResult into JobResult objects #3164

Merge TaskResult into JobResult objects #3164

Conversation

jathanism commented Jan 25, 2023 • edited Loading

Closes: #3085

What's Changed

TODO

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jathanism commented Jan 27, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bryanculver left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Merge `TaskResult` into `JobResult` objects #3164

Merge `TaskResult` into `JobResult` objects #3164

jathanism commented Jan 25, 2023 •

edited

Loading