PostgreSQL notification payload limit should not restrict job payload size #42

Closed
mrputty opened this issue Sep 18, 2020 · 3 comments · Fixed by #43
mrputty commented Sep 18, 2020

I have a use case for which Odd Jobs is very appealing, but it requires the ability to handle somewhat larger job payloads. Right now, creating a job with a large payload fails in the database with this error:

ERROR:  payload string too long
CONTEXT:  SQL statement "SELECT pg_notify('job_created_jobs', row_to_json(new)::text)"
PL/pgSQL function notify_job_monitor_for_jobs() line 2 at PERFORM

Out of the box, PostgreSQL limits the size of the notification payload to 8000 bytes (see https://www.postgresql.org/docs/12/sql-notify.html). Because the trigger created by createNotificationTrigger passes the full job table row (serialized to JSON) to pg_notify, the maximum size of the job payload is significantly less than 8000 bytes.

Fortunately, the de-serialization of the notification payload in jobEventListener only cares about the id, run_at, and locked_at columns. It seems that it should be easy to accommodate larger job payloads by changing the trigger to only pass these values, suitably JSON-serialized, to pg_notify, instead of the entire row.

Where the trigger function currently has row_to_json(new)::text, simply substituting json_build_object('id', new.id, 'run_at', new.run_at, 'locked_at', new.locked_at)::text seems to do the trick.
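To make this concrete, here is a rough sketch of what such a trigger function could look like. The function and channel names are taken from the error message above; the table and trigger names are placeholders, and the SQL that createNotificationTrigger actually generates may differ:

-- Sketch only; names other than the function and channel are illustrative.
CREATE OR REPLACE FUNCTION notify_job_monitor_for_jobs() RETURNS trigger AS $$
BEGIN
  -- Notify with only the columns that jobEventListener reads, so the
  -- notification stays well under the 8000-byte limit regardless of the
  -- size of the job payload itself.
  PERFORM pg_notify(
    'job_created_jobs',
    json_build_object('id', new.id, 'run_at', new.run_at, 'locked_at', new.locked_at)::text
  );
  RETURN new;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER notify_job_monitor_for_jobs_trigger
  AFTER INSERT ON jobs
  FOR EACH ROW EXECUTE PROCEDURE notify_job_monitor_for_jobs();

(EXECUTE PROCEDURE is used here for compatibility with older PostgreSQL versions; on 11 and later, EXECUTE FUNCTION is equivalent.)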

Is there a problem that I'm missing with this approach?

@saurabhnanda
Owner

Thank you for the PR, @mrputty. It seems absolutely fine. In fact, we have run into this very issue ourselves in another place (unrelated to odd-jobs) where we are using LISTEN/NOTIFY with larger payloads.

Just curious, what kind of large payloads are you putting in the job queue? Usually, we end up putting the PK of another row in the DB as the job payload (where the target row may be very large, not the job payload itself).

Nevertheless, this optimization is good to have. I will merge it after compiling it locally once.

mrputty commented Sep 18, 2020

Great, glad to hear this will be merged.

My use case involves using odd-jobs to orchestrate the retrieval of individual items from a large data set. Within each item there are different subsets of data that may or may not need to be retrieved. The job payload consists primarily of a list of item ids, tagged with the subsets that need to be retrieved. Some of these requests may include thousands of items, so we would quickly run into the 8000-byte limit, but we don't expect the payloads to ever be terribly large.

Because the set of items to be retrieved does not have any meaning outside of the request, it seems to me that maintaining them in a secondary table adds unnecessary overhead and is potentially difficult. For example, managing the lifetime of rows in the secondary table seems complicated. If one of these jobs fails, we would have to remember to delete the corresponding row (whose id would be embedded in the JSON job payload) when cleaning up the failed job records.

As I thought through managing this, I realized that there's also no way for me to enforce guarantees about record lifetimes even during normal processing. For example, if I were to delete the secondary record at the end of job execution, I have no way to guarantee that some other exception doesn't occur after the delete but before odd-jobs actually finalizes the job. Since neither the job runner nor any onJobStart callback has access to the underlying database connection, this cannot be done in a transaction.

Thinking about that was enough to encourage me to look for a simpler solution, which led me to how pg_notify was being used.

@saurabhnanda
Owner

@mrputty would you like to close this and open a fresh MR, if you care about Hacktoberfest? Otherwise I'll merge this one.
