perf(core): Accelerate common execution queries #9817

Open: wants to merge 12 commits into master

@ivov (Contributor) commented Jun 20, 2024

@ivov ivov changed the title perf(core): Speed up common execution queries perf(core): Accelerate common execution queries Jun 20, 2024
@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Jun 20, 2024
@ivov ivov marked this pull request as ready for review June 20, 2024 13:23
@ivov ivov requested a review from a team as a code owner June 20, 2024 13:23
@krynble (Contributor) left a comment

The way I see it, we just need 4 indices, and the image below outlines the changes I'm proposing:

[Screenshot 2024-06-24 at 10:22:50: proposed index changes]

The reason for each index is described in the ticket, but in the end we want all 4 DB systems to have only those 4 indices. WDYT?

@ivov (Contributor Author) commented Jun 24, 2024

Thanks. From the story, I was under the impression that we only wanted to touch these, rather than all of them.

@netroy netroy self-requested a review July 3, 2024 08:14
@@ -19,9 +19,18 @@ abstract class IndexOperation extends LazyPromise<void> {
protected tablePrefix: string,
queryRunner: QueryRunner,
protected customIndexName?: string,
protected skipIfMissing = false,
Member

this should only be on the DropIndex class

export class DropIndex extends IndexOperation {
	constructor(
		tableName: string,
		columnNames: string[],
		tablePrefix: string,
		queryRunner: QueryRunner,
		customIndexName?: string,
		protected skipIfMissing = false,
	) {
		super(tableName, columnNames, tablePrefix, queryRunner, customIndexName);
	}

	async execute(queryRunner: QueryRunner) {
		return await queryRunner
			.dropIndex(this.fullTableName, this.customIndexName ?? this.fullIndexName)
			.catch((error) => {
				// Swallow the error only for a missing index, and only when the caller opted in.
				if (error instanceof Error && error.message.includes('not found') && this.skipIfMissing) {
					return;
				}
				throw error;
			});
	}
}

Member

could we somehow use the IF EXISTS parameter for DBs that support it?

@ivov (Contributor Author) Jul 11, 2024

> this should only be on the DropIndex class

Addressed in dd17796.

> could we somehow use the IF EXISTS parameter for DBs that support it?

We could call queryRunner.query with handcrafted SQL and then fall back to ignoring the error in a catch clause if IF EXISTS is unsupported, but I think the current approach is simpler. Let me know otherwise!
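For comparison, the handcrafted-SQL alternative could look roughly like the sketch below. It is not n8n code: `buildDropIndex`, `Dialect`, and `DropIndexStatement` are hypothetical names, and schema qualification and table prefixes are ignored. PostgreSQL and SQLite accept `DROP INDEX IF EXISTS`, while MySQL needs `ON <table>` and has no `IF EXISTS`, so that branch would still need the catch-and-ignore fallback, which is the second code path the reply above prefers to avoid.

```typescript
// Hypothetical dialect labels; n8n's own DB type identifiers may differ.
type Dialect = 'postgres' | 'sqlite' | 'mysql';

interface DropIndexStatement {
	sql: string;
	// True when the SQL cannot express "skip if missing" by itself, so the
	// caller still has to ignore a "not found" error (the current approach).
	needsErrorFallback: boolean;
}

function buildDropIndex(dialect: Dialect, table: string, index: string): DropIndexStatement {
	switch (dialect) {
		case 'postgres':
		case 'sqlite':
			// Both accept IF EXISTS, and DROP INDEX takes only the index name.
			return { sql: `DROP INDEX IF EXISTS "${index}"`, needsErrorFallback: false };
		case 'mysql':
			// No IF EXISTS for DROP INDEX, and the table must be named.
			return { sql: `DROP INDEX \`${index}\` ON \`${table}\``, needsErrorFallback: true };
		default: {
			// Compile-time exhaustiveness check over the Dialect union.
			const unsupported: never = dialect;
			throw new Error(`Unsupported dialect: ${String(unsupported)}`);
		}
	}
}
```

The resulting `sql` would be passed to `queryRunner.query(...)`, wrapped in the existing catch only when `needsErrorFallback` is true.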

@tomi (Contributor) left a comment

A couple of comments, mainly about the structure of the indexes.

return await queryRunner
.dropIndex(this.fullTableName, this.customIndexName ?? this.fullIndexName)
.catch((error) => {
if (error instanceof Error && error.message.includes('not found') && this.skipIfMissing) {
Contributor

Are we certain that this is the message returned by all DBs? It's unfortunate that we need to rely on this kind of hack just because MySQL doesn't support IF EXISTS for DROP INDEX. We could also use IF EXISTS for the DBs that support it (SQLite, PG) and rely on this only for MySQL. WDYT?

Contributor Author

They all throw the same error: `TypeORMError: Supplied index {indexId} was not found in table {table}`.

Re: handling this differently per DB, I replied about this above: my reasoning is that it's simpler to have one way of handling it. Once we remove MySQL, this will no longer be an issue.

Contributor

Alright, makes sense 👍 Maybe we can be more specific and expect a TypeORMError instead of any generic Error?
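A minimal sketch of the narrower check, assuming the message format quoted earlier in this thread. To keep it runnable without the ORM installed, `TypeORMError` is a local stand-in class here; real migration code would import the actual class instead. `isMissingIndexError` and `dropIndexSafely` are hypothetical helper names.

```typescript
// Local stand-in for TypeORM's TypeORMError so this sketch runs without the
// library; real migration code would import the class from the ORM package.
class TypeORMError extends Error {
	constructor(message: string) {
		super(message);
		this.name = 'TypeORMError';
		// Keep instanceof working even when compiled to ES5.
		Object.setPrototypeOf(this, new.target.prototype);
	}
}

// Matches the wording quoted in the thread:
// "Supplied index {indexId} was not found in table {table}".
const MISSING_INDEX = /^Supplied index .+ was not found in table .+$/;

// Ignorable only if it is a TypeORMError AND has the "not found" wording;
// a plain Error or a driver error with similar text still propagates.
function isMissingIndexError(error: unknown): boolean {
	return error instanceof TypeORMError && MISSING_INDEX.test(error.message);
}

async function dropIndexSafely(
	drop: () => Promise<void>, // e.g. () => queryRunner.dropIndex(table, index)
	skipIfMissing: boolean,
): Promise<void> {
	try {
		await drop();
	} catch (error) {
		if (skipIfMissing && isMissingIndexError(error)) return;
		throw error;
	}
}
```

Keeping the message check alongside the type check means other `TypeORMError`s raised by `dropIndex` are still surfaced.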

Comment on lines +6 to +9
* - `status, startedAt` for `ExecutionRepository.findManyByRangeQuery` (default query)
* - `workflowId, status, startedAt` for `ExecutionRepository.findManyByRangeQuery` (filter query)
* - `waitTill, status` for `ExecutionRepository.getWaitingExecutions`
* - `stoppedAt, deletedAt, status` for `ExecutionRepository.softDeletePrunableExecutions`
Contributor

Have we actually investigated how much these indexes help and how much they impact write performance? Multiple indexes (especially composite ones) are not cheap, since each one has to be updated on every insert, update, and delete.

Looking at the fields, the (status, startedAt) index can probably also serve the filter query of ExecutionRepository.findManyByRangeQuery, since the only column it lacks is workflowId. We could verify this.
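A rough way to reason about this: a B-tree composite index can bound a scan through its leftmost columns that the query constrains by equality, plus at most one range column after them. The toy helper below (hypothetical, not a query planner) applies that rule. It assumes the filter query compares `workflowId` and `status` by equality and `startedAt` by range; under that assumption `(status, startedAt)` can still bound the scan, and `workflowId` is applied as a filter on the rows it returns. `EXPLAIN ANALYZE` on real data remains the actual test.

```typescript
interface QueryShape {
	equality: Set<string>; // columns compared with =
	range?: string; // at most one column compared with <, >, BETWEEN
}

// Returns the leading index columns usable to bound an index scan: consume
// equality columns left to right, then at most one range column, and stop
// at the first column the query does not constrain.
function seekableColumns(indexColumns: string[], query: QueryShape): string[] {
	const usable: string[] = [];
	for (const column of indexColumns) {
		if (query.equality.has(column)) {
			usable.push(column);
			continue;
		}
		if (query.range === column) usable.push(column);
		break;
	}
	return usable;
}

// Assumed shape of the filter query (not taken from the n8n source).
const filterQuery: QueryShape = { equality: new Set(['workflowId', 'status']), range: 'startedAt' };
```

With this shape, both `(status, startedAt)` and `(workflowId, status, startedAt)` are fully usable; the three-column index just narrows the scan further by also seeking on `workflowId`.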

For the last two indexes, we could create partial indexes with a WHERE clause that includes only the rows where waitTill and stoppedAt are NOT NULL and deletedAt IS NULL. That keeps the indexes smaller and faster, and it matches how we use them in the queries. Again, MySQL doesn't support partial indexes, so there we could create the indexes without the WHERE clause.
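A sketch of how one index definition could yield a partial index on PostgreSQL and SQLite and a full index on MySQL. `createIndexSql`, `IndexSpec`, the index names, and the exact predicates are hypothetical; the predicates are written with double-quoted identifiers, which both PostgreSQL and SQLite accept.

```typescript
type Dialect = 'postgres' | 'sqlite' | 'mysql';

interface IndexSpec {
	name: string; // hypothetical explicit index name
	table: string;
	columns: string[];
	where?: string; // partial-index predicate, dropped where unsupported
}

function createIndexSql(dialect: Dialect, spec: IndexSpec): string {
	// MySQL quotes identifiers with backticks; PostgreSQL and SQLite use double quotes.
	const q = (id: string) => (dialect === 'mysql' ? `\`${id}\`` : `"${id}"`);
	const columns = spec.columns.map(q).join(', ');
	const base = `CREATE INDEX ${q(spec.name)} ON ${q(spec.table)} (${columns})`;
	// PostgreSQL and SQLite support partial indexes; MySQL does not, so the
	// WHERE clause is omitted and the index covers every row there.
	return spec.where && dialect !== 'mysql' ? `${base} WHERE ${spec.where}` : base;
}

const waitingIndex: IndexSpec = {
	name: 'IDX_execution_entity_waitTill_status',
	table: 'execution_entity',
	columns: ['waitTill', 'status'],
	where: '"waitTill" IS NOT NULL',
};

const prunableIndex: IndexSpec = {
	name: 'IDX_execution_entity_stoppedAt_deletedAt_status',
	table: 'execution_entity',
	columns: ['stoppedAt', 'deletedAt', 'status'],
	where: '"stoppedAt" IS NOT NULL AND "deletedAt" IS NULL',
};
```

A query only benefits from a partial index when its own WHERE clause implies the index predicate, which is why the predicates mirror the query filters described above.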

Contributor Author

> Have we actually done any investigation how much these indexes help and how much they impact write performance? Having multiple indexes (especially composite) is not cheap as they need to be updated with every insert, update and delete. Looking at the fields, the (status, startedAt) index can probably also serve the ExecutionRepository.findManyByRangeQuery query as the only column missing is workflowId. We could verify this.

I think Omar did, but that work is now lost. If you have time tomorrow, could we pair on this so you can show me how?

> For the last two indexes we could create a partial index using WHERE clause to include only rows that have waitTill and stoppedAt as NOT NULL and deletedAt as IS NULL. That way the index size is more limited and performs better as that's how we use them in the queries. Again, MySQL doesn't support partial indexes so in that case we could create the indexes without the WHERE clause.

Will do tomorrow 👍🏻

@tomi (Contributor) Jul 31, 2024

Sure, let's do it 👍 Basically the steps are:

  1. Have a DB with a realistic amount of data
  2. Create the indexes
  3. Run the relevant queries to see how they perform, using EXPLAIN ANALYZE to check whether the DBMS actually uses the indexes
  4. Compare the results
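For steps 3 and 4 on PostgreSQL, a small helper can pull the interesting bits out of the default text output of `EXPLAIN ANALYZE`: which indexes the plan scanned and the reported execution time. This is a hypothetical sketch for PostgreSQL only; SQLite (`EXPLAIN QUERY PLAN`) and MySQL format their plans differently.

```typescript
interface PlanSummary {
	indexesUsed: string[];
	executionTimeMs?: number;
}

// Parses PostgreSQL's text-format EXPLAIN ANALYZE output and extracts index
// names from index scan nodes plus the total execution time.
function summarizePlan(planText: string): PlanSummary {
	const indexesUsed = new Set<string>();
	for (const line of planText.split('\n')) {
		// Matches "Index Scan using X", "Index Only Scan using X",
		// "Index Scan Backward using X", and "Bitmap Index Scan on X".
		// Mixed-case names appear double-quoted in the plan, so strip quotes.
		const scan = line.match(/(?:Index (?:Only )?Scan(?: Backward)? using|Bitmap Index Scan on) ("[^"]+"|\S+)/);
		if (scan) indexesUsed.add(scan[1].replace(/"/g, ''));
	}
	const time = planText.match(/Execution Time: ([\d.]+) ms/);
	return {
		indexesUsed: Array.from(indexesUsed),
		executionTimeMs: time ? Number(time[1]) : undefined,
	};
}
```

Running the same query before and after creating the indexes and comparing the two summaries covers step 4; a `Seq Scan` with no index listed means the planner ignored the new index for that query.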

Comment on lines +36 to +39
await schemaBuilder.createIndex('execution_entity', ['status', 'startedAt']);
await schemaBuilder.createIndex('execution_entity', ['workflowId', 'status', 'startedAt']);
await schemaBuilder.createIndex('execution_entity', ['waitTill', 'status']);
await schemaBuilder.createIndex('execution_entity', ['stoppedAt', 'deletedAt', 'status']);
Contributor

Can we give these explicit names, so they are easier to identify and to modify later on?

Contributor Author

These are the names; I find them clear enough. What would you suggest?

[Screenshot 2024-07-30 at 17:11:44: generated index names]

Contributor

I meant naming them explicitly, so it's obvious from the code what the index names will be, instead of letting the ORM generate them. But those names are also fine, as long as they are stable across all DBs.
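One way to make the names explicit and check that they stay stable is to declare them once in the migration. The sketch below uses hypothetical names and a hypothetical `tooLongIndexNames` helper. PostgreSQL truncates identifiers longer than 63 bytes and MySQL rejects index names longer than 64 characters, so checking the stricter limit up front avoids a name that silently differs on one DB.

```typescript
// Hypothetical explicit names for the four proposed indexes, mapped to their
// columns, so the same identifiers are used on every DB.
const EXECUTION_INDEXES = {
	IDX_execution_entity_status_started_at: ['status', 'startedAt'],
	IDX_execution_entity_workflow_id_status_started_at: ['workflowId', 'status', 'startedAt'],
	IDX_execution_entity_wait_till_status: ['waitTill', 'status'],
	IDX_execution_entity_stopped_at_deleted_at_status: ['stoppedAt', 'deletedAt', 'status'],
} as const;

// PostgreSQL's limit (63 bytes) is the strictest among the supported DBs.
const MAX_INDEX_NAME_LENGTH = 63;

// Returns the names that would exceed the limit once an optional prefix
// (e.g. a configured table prefix, if it ends up in the name) is prepended.
function tooLongIndexNames(names: string[], prefix = ''): string[] {
	return names
		.map((name) => `${prefix}${name}`)
		.filter((name) => name.length > MAX_INDEX_NAME_LENGTH);
}
```

Calling `tooLongIndexNames(Object.keys(EXECUTION_INDEXES), tablePrefix)` in a test would flag a name that a long prefix pushes past the limit before the migration runs.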

4 participants