Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Messenger] Bad performances with the Doctrine/MySQL transport #47828

Closed
sylfabre opened this issue Oct 11, 2022 · 12 comments · May be fixed by #51034
Closed

[Messenger] Bad performances with the Doctrine/MySQL transport #47828

sylfabre opened this issue Oct 11, 2022 · 12 comments · May be fixed by #51034

Comments

@sylfabre
Copy link
Contributor

sylfabre commented Oct 11, 2022

Symfony version(s) affected

5.4.13 (but I guess it happens with 6.* as well as the code is the same)

Description

We have the following setup

  • MySQL 8.0.15 with an InnoDb engine and READ-COMMITTED as transaction isolation level
  • 4 Ubuntu-based workers running supervisor which runs 2 processes
  • Command to consume: php bin/console messenger:consume async_email --limit=100 --time-limit=300 --memory-limit=480M

We experienced a lag of up to ~80k messages in the queue because workers spend most of their time waiting to lock a message to process.
A colleague tried to increase the number of processes from 8 to 80 but it made things worst.
Our processing tops at 3 or 4 consumed messages per seconds on this specific queue

Here are my findings so far:

The MySQL command SHOW FULL PROCESSLIST shows a lot of the SELECT ... FOR UPDATE requests are in the Sending data state.

The SELECT ... FOR UPDATE uses the index to find and lock a message:

EXPLAIN SELECT m.id FROM messenger_email m WHERE (m.delivered_at is null OR m.delivered_at < '2022-10-10 20:16:29') AND (m.available_at <= '2022-10-10 21:16:29') AND (m.queue_name = 'email') ORDER BY available_at ASC LIMIT 1 FOR UPDATE
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE m NULL range IDX_DC4C2A4016BA31DB,IDX_DC4C2A40E3BD61CE,IDX_DC4C2A40FB7336F0 IDX_DC4C2A40E3BD61CE 5 NULL 37451 25.00 Using index condition; Using where

IDX_DC4C2A40E3BD61CE is on the available_at column

Because it uses an index and we are using READ-COMMITTED as the transaction isolation level, then the following happens:

If the WHERE condition includes an indexed column, and InnoDB uses the index, only the indexed column is considered when taking and retaining record locks. In the following example, the first UPDATE takes and retains an x-lock on each row where b = 2. The second UPDATE blocks when it tries to acquire x-locks on the same records, as it also uses the index defined on column b.

CREATE TABLE t (a INT NOT NULL, b INT, c INT, INDEX (b)) ENGINE = InnoDB;
INSERT INTO t VALUES (1,2,3),(2,2,4);
COMMIT;

# Session A
START TRANSACTION;
UPDATE t SET b = 3 WHERE b = 2 AND c = 3;

# Session B
UPDATE t SET b = 4 WHERE b = 2 AND c = 4;

Source: https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html

So the bottleneck in my case happens when workers search for a message to lock because the query is blocking.
Once locked, the message is very very quick to handle.

How to reproduce

I guess it's quite difficult to reproduce as I think there is a balance between:

  • The number of processes consuming the queue
  • The time spend to consume one message
  • The load on the MySQL server

Possible Solution

Solution 1 - Remove the indexes

As suggested by #42868 removing the indexes actually removes the blockings issue as per MySQL documentation because without indexes:

If READ COMMITTED is used instead, the first UPDATE acquires an x-lock on each row that it reads and releases those for rows that it does not modify:

x-lock(1,2); unlock(1,2)
x-lock(2,3); update(2,3) to (2,5); retain x-lock
x-lock(3,2); unlock(3,2)
x-lock(4,3); update(4,3) to (4,5); retain x-lock
x-lock(5,2); unlock(5,2)

For the second UPDATE, InnoDB does a “semi-consistent” read, returning the latest committed version of each row that it reads to MySQL so that MySQL can determine whether the row matches the WHERE condition of the UPDATE:

x-lock(1,2); update(1,2) to (1,4); retain x-lock
x-lock(2,3); unlock(2,3)
x-lock(3,2); update(3,2) to (3,4); retain x-lock
x-lock(4,3); unlock(4,3)
x-lock(5,2); update(5,2) to (5,4); retain x-lock

However, when there are a lot of messages, it leads to slow queries because MySQL uses a "filesort" because of the ORDER BY available_at clause.
I think this clause could be removed if losing the FIFO-like strategy is acceptable

Solution 2 - Use SKIP LOCKED

MySQL has a SKIP LOCKED clause that could be used here: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html

A locking read that uses SKIP LOCKED never waits to acquire a row lock. The query executes immediately, removing locked rows from the result set.

I measured these performances for the SELECT query:

  • Without FOR UPDATE & SKIP LOCKED: about 0.002 seconds to return a row
  • With FOR UPDATE & without SKIP LOCKED: about 2 seconds to return a row
  • With FOR UPDATE& with SKIP LOCKED: about 0.2 seconds to return a row

=> x10 faster with the SKIP LOCKED option

As far as I know, doctrine/dbal doesn't support the SKIP LOCKED option.
An issue has been around for years about it with doctrine/orm: doctrine/orm#7746

I suggest here that the Doctrine transport "manually" adds the SKIP LOCKED option when the database platform is MySQL 8.0 or later (maybe others that support it)

This is my preferred solution

Additional Context

No response

@stof
Copy link
Member

stof commented Oct 11, 2022

Non-support for SKIP LOCKED in doctrine/orm is a non-issue, because the Doctrine transport is a DBAL transport, not an ORM one. And DBAL allows you to pass any SQL you want (as long as it is supported by the server of course).

@sylfabre
Copy link
Contributor Author

@stof I meant that there is no method in doctrine/dbal to do it automatically like there is $this->driverConnection->getDatabasePlatform()->getWriteLockSQL().
And you're right, it can be done with pure SQL like this

$stmt = $this->executeQuery(
	$sql.' '.$this->driverConnection->getDatabasePlatform()->getWriteLockSQL() . ' SKIP LOCKED',
	$query->getParameters(),
	$query->getParameterTypes()
);

@Wouter0100
Copy link

Wouter0100 commented Oct 19, 2022

We have this issue as well. I added the SKIP LOCKED and reverted this Merge Request.

The delete statement introduced in the Merge Request (this one) is causing the discussed SELECT to wait regardless of SKIP LOCKED. We still have the indexes applied on the table.

The combination of these 2 changes makes it so we're able to process a large queue w/o any issues with ~10 of consumers.

@sylfabre
Copy link
Contributor Author

sylfabre commented Oct 19, 2022

@Wouter0100 Did you publicly publish on GitHub a fork of the package with the setup that works for you?
I would greatly appreciate it 🤩

If anyone is interested, I made this fork https://github.com/sylfabre/doctrine-messenger/tree/5.4 for Symfony 5.4 which adds the SKIP LOCKED to the MySQL query but doesn't revert any change. You can use it a temporary replacement of the official component

@Wouter0100
Copy link

@Wouter0100 Did you publicly publish on GitHub a fork of the package with the setup that works for you? I would greatly appreciate it star_struck

Unfortunately, I don't have that available. Sorry. :( We hot patched it for now, as we needed to get it working on the very short term - mainly because we're migrating to an alternative queuing backend as we speak.

@tmarly
Copy link

tmarly commented Oct 26, 2022

Skipping lock isn't dangerous when there are multiple consumers ?

@carsonbot
Copy link

Hey, thanks for your report!
There has not been a lot of activity here for a while. Is this bug still relevant? Have you managed to find a workaround?

@carsonbot
Copy link

Friendly ping? Should this still be open? I will close if I don't hear anything.

@davidroth
Copy link

davidroth commented Jul 6, 2023

This issue appears every now and then in one of our projects..
IMHO the Skip locked aproach should really be integrated here as a default option. @sylfabre Are you still using your forked version? Any plans to create an MR here?

@carsonbot carsonbot removed the Stalled label Jul 6, 2023
@sylfabre
Copy link
Contributor Author

sylfabre commented Jul 6, 2023

Hello @davidroth
We are still using my version in production at my company.

We are planning to use RabbitMQ as a backend as it is more appropriate
MySQL is better suited for small loads and testing which the official component is correctly supporting.

So I don't plan to create a PR here: feel free to use my fork and submit one if you want to!

@davidroth
Copy link

@sylfabre Thx for the update. Yes we we also switched to RabbitMQ for the high volume message workloads. But for a transactional outbox the table based queue is still needed.

@hgraca
Copy link

hgraca commented Nov 24, 2023

@davidroth @sylfabre @Wouter0100
FYI SKIP LOCKED has been added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment