HARMONY-2338: Fix performance issue with work scheduler queueing work items #910

Merged
chris-durbin merged 4 commits into main from harmony-2338 on May 21, 2026

@chris-durbin
Contributor

Jira Issue ID

HARMONY-2338

Description

I found that the poor performance of the work scheduler in production was not caused by locked tables. We just needed an index to speed up selecting the work item to queue as part of the fair queueing algorithm. After adding the index in production, I saw the query go from taking over 7 seconds to just milliseconds.
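
For illustration, a migration of this kind could look roughly like the sketch below. The index name work_items_ready_lookup_index and the table work_items come from this PR; the column list is an assumption made up for the example and is not taken from the actual migration, and KnexLike is a hypothetical stand-in for the one Knex method used here (knex.raw).

```typescript
// Hypothetical stand-in for the small part of a Knex instance this sketch uses.
interface KnexLike {
  raw(sql: string): PromiseLike<unknown>;
}

const TABLE = 'work_items';
const INDEX = 'work_items_ready_lookup_index';

// ASSUMPTION: these columns are illustrative only; the real migration may
// index different columns or use a partial index.
const ASSUMED_COLUMNS = ['serviceID', 'status', 'jobID'];

// Builds the statement that creates the index (idempotent on PostgreSQL).
function createIndexSql(): string {
  const cols = ASSUMED_COLUMNS.map((c) => `"${c}"`).join(', ');
  return `CREATE INDEX IF NOT EXISTS ${INDEX} ON ${TABLE} (${cols})`;
}

// Builds the statement that removes the index again.
function dropIndexSql(): string {
  return `DROP INDEX IF EXISTS ${INDEX}`;
}

// Applied by `knex migrate:up`.
export async function up(knex: KnexLike): Promise<void> {
  await knex.raw(createIndexSql());
}

// Applied by `knex migrate:down`.
export async function down(knex: KnexLike): Promise<void> {
  await knex.raw(dropIndexSql());
}
```

Keeping both statements idempotent (IF NOT EXISTS / IF EXISTS) means the migration does not fail in an environment where the index was already created by hand, as described above for production.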

In addition to adding the index, I removed some unused code and an obsolete feature flag, USE_SERVICE_QUEUES.

Local Test Steps

Test the migration up and down.

NODE_ENV=production DATABASE_URL=postgresql://postgres:<password>@localhost:5432/postgres knex --cwd db migrate:up
NODE_ENV=production DATABASE_URL=postgresql://postgres:<password>@localhost:5432/postgres knex --cwd db migrate:down
NODE_ENV=production DATABASE_URL=postgresql://postgres:<password>@localhost:5432/postgres knex --cwd db migrate:up

Verify the index work_items_ready_lookup_index is added to the work_items table.
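
One way to script that check is sketched below. It queries PostgreSQL's pg_indexes view for the index name. The QueryRunner type and the helper names are hypothetical, and the $1/$2 placeholders assume a node-postgres-style client; adapt them to whatever client is at hand.

```typescript
// Shape of the rows returned by the lookup query.
interface IndexRow {
  indexname: string;
}

// Hypothetical query runner: any function that executes SQL with positional
// parameters and resolves to the result rows.
type QueryRunner = (sql: string, params: string[]) => Promise<IndexRow[]>;

// pg_indexes is a built-in PostgreSQL view listing the indexes on each table.
const INDEX_LOOKUP_SQL =
  'SELECT indexname FROM pg_indexes WHERE tablename = $1 AND indexname = $2';

// Pure check so the result handling can be tested without a database.
function rowsContainIndex(rows: IndexRow[], index: string): boolean {
  return rows.some((row) => row.indexname === index);
}

// Resolves to true when the named index exists on the named table.
async function indexExists(
  run: QueryRunner,
  table: string,
  index: string,
): Promise<boolean> {
  const rows = await run(INDEX_LOOKUP_SQL, [table, index]);
  return rowsContainIndex(rows, index);
}
```

For a quick manual check, running \d work_items in psql lists the table's indexes as well.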

If testing with Harmony in a Box, build all the images, run bin/bootstrap-harmony, and make sure the migrations run. Verify that requests continue to work as expected.

I tested with Harmony in a Box and with a deployment to sandbox as well.

PR Acceptance Checklist

  • Acceptance criteria met
  • Tests added/updated (if needed) and passing
  • Documentation updated (if needed)
  • Harmony in a Box tested (if changes made to microservices or new dependencies added)

… next work item to queue - from over 7 seconds to just ms in production with 25 million rows in the work_items table.
Member

@flamingbear flamingbear left a comment

This looks straightforward in terms of the changes and migration.

Removal of the USE_SERVICE_QUEUES flag and associated function relocation for the tests also makes sense.

Migrations work locally, and the index shows up when running in Harmony-In-A-Box.

I don't even have any nits.

* @param reqLogger - a logger instance
* @returns A work item from the database for the given service ID
*/
export async function getWorkFromDatabase(serviceID: string, reqLogger: Logger): Promise<WorkItemData | null> {
Contributor

Why did you move this function into test code? Isn't it used in non-test code, e.g., services/harmony/app/backends/workflow-orchestration.ts?

Contributor

Oh, wait, never mind, I see now.

@chris-durbin chris-durbin merged commit be7c4d1 into main May 21, 2026
5 checks passed
@chris-durbin chris-durbin deleted the harmony-2338 branch May 21, 2026 20:13

3 participants