Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Messenger][SQS] Uncaught SQS Exception when worker stopped. #50342

Open
armetiz opened this issue May 16, 2023 · 16 comments
Open

[Messenger][SQS] Uncaught SQS Exception when worker stopped. #50342

armetiz opened this issue May 16, 2023 · 16 comments

Comments

@armetiz
Copy link
Contributor

armetiz commented May 16, 2023

Symfony version(s) affected

5.4.23

Description

It's not a problem that happens every time, the frequency is about 1 in 3.

SupervisorD is running and monitoring : bin/console messenger:consume mailer --no-tracking --time-limit=3600.
Every ~3600 sec, messenger worker will be stopped to be restarted automatically by supervisord.

When worker is stopping, SQS throw an uncaught exception (see additional context).

How to reproduce

I don't know how to reproduce.
This configuration has been in production for several months, and the problem appeared less than 2 months ago.

I spent many hours understanding the situation. So far, I only have thoughts.

Note that the problem is occurring only when the worker is stopping after "time-limit".

Possible Solution

poll_timeout < wait_time

As a reminder, default SQS Messenger configuration:

  • wait_time: 20
  • poll_timeout: 0.1
  • buffer_size: 9

From AWS SQS Documentation.

Important: To avoid HTTP errors, ensure that the HTTP response timeout for ReceiveMessage requests is longer than the WaitTimeSeconds parameter.

AmazonSqs\Transport\Connection__destruct() is calling already destructed useful objects.

aws-async is a library that is not easy to debug..

Long polling is not working

Additional Context

messenger.yaml

framework:
    messenger:
        reset_on_message: true # reset services after consuming messages
        transports:
            mailer:
                dsn: "sqs://sqs.%amazon_region%.amazonaws.com"
                options:
                    region: "%amazon_region%"
                    access_key: "%amazon_key%"
                    secret_key: "%amazon_secret%"
                    queue_name: "%queuing_queue_namespace%-emails.fifo"
                retry_strategy:
                    max_retries: 3
                    delay: 1000
                    multiplier: 2
                    max_delay: 0

Two kind of exceptions:

ReceiveMessageResult.php on line 33

thrown in /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php on line 33
#7 {main}
#6 [internal function]: Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->__destruct()
#5 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(82): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->reset()
#4 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(371): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->fetchMessage()
#3 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(236): AsyncAws\Sqs\Result\ReceiveMessageResult->getMessages()
#2 /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(26): AsyncAws\Core\Result->initialize()
#1 /home/ec2-user/acme/vendor/async-aws/core/src/Result.php(133): AsyncAws\Sqs\Result\ReceiveMessageResult->populateResult(Object(AsyncAws\Core\Response))
#0 /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(33): SimpleXMLElement->__construct('')
Stack trace:
PHP Fatal error:  Uncaught Exception: String could not be parsed as XML in /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php:33

AsyncResponse.php on line 287

thrown in /home/ec2-user/acme/vendor/symfony/http-client/Response/AsyncResponse.php on line 287
#8 {main}
#7 [internal function]: Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->__destruct()
#6 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(82): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->reset()
#5 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(371): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->fetchMessage()
#4 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(232): AsyncAws\Core\Result->resolve(0.1)
#3 /home/ec2-user/acme/vendor/async-aws/core/src/Result.php(63): AsyncAws\Core\Response->resolve(0.1)
#2 /home/ec2-user/acme/vendor/async-aws/core/src/Response.php(143): Symfony\Component\HttpClient\Response\ResponseStream->rewind()
#1 /home/ec2-user/acme/vendor/symfony/http-client/Response/ResponseStream.php(47): Generator->rewind()
#0 [internal function]: Symfony\Component\HttpClient\Response\AsyncResponse::stream(Array, 0.1, '...')
Stack trace:
PHP Fatal error:  Uncaught LogicException: Instance of "Symfony\Component\HttpClient\Response\CurlResponse" is already consumed and cannot be managed by "Symfony\Component\HttpClient\RetryableHttpClient". A decorat
ed client should not call any of the response's methods in its "request()" method. in /home/ec2-user/acme/vendor/symfony/http-client/Response/AsyncResponse.php:287

supervisord output

CLI : bin/console messenger:consume mailer --no-tracking --time-limit=3600 -vv

02:25:35 INFO      [messenger] Received message Symfony\Component\Mailer\Messenger\SendEmailMessage ["class" => "Symfony\Component\Mailer\Messenger\SendEmailMessage"]

 // Quit the worker with CONTROL-C.

 // received a stop signal via the messenger:stop-workers command.
 // The worker will automatically exit once it has been running for 3600s or

 [OK] Consuming messages from transport "mailer".

  thrown in /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php on line 33
#7 {main}
#6 [internal function]: Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->__destruct()
#5 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(82): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->reset()
#4 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(371): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->fetchMessage()
#3 /home/ec2-user/acme/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(236): AsyncAws\Sqs\Result\ReceiveMessageResult->getMessages()
#2 /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(26): AsyncAws\Core\Result->initialize()
#1 /home/ec2-user/acme/vendor/async-aws/core/src/Result.php(133): AsyncAws\Sqs\Result\ReceiveMessageResult->populateResult(Object(AsyncAws\Core\Response))
#0 /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(33): SimpleXMLElement->__construct('')
Stack trace:
PHP Fatal error:  Uncaught Exception: String could not be parsed as XML in /home/ec2-user/acme/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php:33
02:22:16 INFO      [messenger] Worker stopped due to time limit of 3600s exceeded ["timeLimit" => 3600]
02:22:16 INFO      [messenger] Stopping worker. ["transport_names" => ["mailer"]]
02:16:48 INFO      [messenger] Symfony\Component\Mailer\Messenger\SendEmailMessage was handled successfully (acknowledging to transport). ["class" => "Symfony\Component\Mailer\Messenger\SendEmailMessage"]
02:16:48 INFO      [messenger] Message Symfony\Component\Mailer\Messenger\SendEmailMessage handled by Symfony\Component\Mailer\Messenger\MessageHandler::__invoke ["class" => "Symfony\Component\Mailer\Messenger\Send
EmailMessage","handler" => "Symfony\Component\Mailer\Messenger\MessageHandler::__invoke"]
@armetiz
Copy link
Contributor Author

armetiz commented Jun 1, 2023

Note that polling_timeout have been increased from 0.1 to 0.5: no impact on error frequency.

@jgmullor
Copy link

Same issue here:

 // The worker will automatically exit once it has processed 9 messages, been running for 15s or received a stop signal 
 // via the messenger:stop-workers command.                                                                             

 // Quit the worker with CONTROL-C.                                                                                     

 // Re-run the command with a -vv option to see logs about consumed messages.                                           

PHP Fatal error:  Uncaught LogicException: Instance of "Symfony\Component\HttpClient\Response\CurlResponse" is already consumed and cannot be managed by "Symfony\Component\HttpClient\Response\AsyncResponse". A decorated client should not call any of the response's methods in its "request()" method. in /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php:284
Stack trace:
#0 /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php(66): Symfony\Component\HttpClient\Response\AsyncResponse::stream(Array, -0.0)
#1 /var/www/html/vendor/symfony/http-client/Response/CommonResponseTrait.php(142): Symfony\Component\HttpClient\Response\AsyncResponse::Symfony\Component\HttpClient\Response\{closure}(Object(Symfony\Component\HttpClient\Response\AsyncResponse), -0.0)
#2 /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php(96): Symfony\Component\HttpClient\Response\AsyncResponse::initialize(Object(Symfony\Component\HttpClient\Response\AsyncResponse))
#3 /var/www/html/vendor/async-aws/core/src/Response.php(141): Symfony\Component\HttpClient\Response\AsyncResponse->getStatusCode()
#4 /var/www/html/vendor/async-aws/core/src/Response.php(118): AsyncAws\Core\Response->resolve()
#5 [internal function]: AsyncAws\Core\Response->__destruct()
#6 {main}
  thrown in /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php on line 284

Fatal error: Uncaught LogicException: Instance of "Symfony\Component\HttpClient\Response\CurlResponse" is already consumed and cannot be managed by "Symfony\Component\HttpClient\Response\AsyncResponse". A decorated client should not call any of the response's methods in its "request()" method. in /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php:284
Stack trace:
#0 /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php(66): Symfony\Component\HttpClient\Response\AsyncResponse::stream(Array, -0.0)
#1 /var/www/html/vendor/symfony/http-client/Response/CommonResponseTrait.php(142): Symfony\Component\HttpClient\Response\AsyncResponse::Symfony\Component\HttpClient\Response\{closure}(Object(Symfony\Component\HttpClient\Response\AsyncResponse), -0.0)
#2 /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php(96): Symfony\Component\HttpClient\Response\AsyncResponse::initialize(Object(Symfony\Component\HttpClient\Response\AsyncResponse))
#3 /var/www/html/vendor/async-aws/core/src/Response.php(141): Symfony\Component\HttpClient\Response\AsyncResponse->getStatusCode()
#4 /var/www/html/vendor/async-aws/core/src/Response.php(118): AsyncAws\Core\Response->resolve()
#5 [internal function]: AsyncAws\Core\Response->__destruct()
#6 {main}
  thrown in /var/www/html/vendor/symfony/http-client/Response/AsyncResponse.php on line 284

Using symfony/amazon-sqs-messenger 6.3.0 and async-aws/sqs 1.7.0

@jgsamblas
Copy link

Hi,

I am getting the same issues, has anyone found any solution for this?

Thanks

@jgmullor
Copy link

In my case only happens when using localstack, works perfect while using the real SQS service.

@armetiz
Copy link
Contributor Author

armetiz commented Jul 27, 2023

I switched from a FIFO queue to a Standard queue. FIFO were not required for our use case; as far as I remember, Symfony Messenger does not suit well with AWS SQS FIFO restriction.

@jgsamblas
Copy link

Thanks for the replies!

I am using Standard Queue in prod and still getting the issue. It is hard to debug because it does not happen every time...

@jdelaune
Copy link
Contributor

jdelaune commented Aug 9, 2023

We have the same issue. Using standard queues.

@stepozer
Copy link

stepozer commented Dec 14, 2023

Hello all. I found this issue and I know how to reproduce it on local:

  1. create standard queue on amazon SQS
  2. run consume command with timeout option, for example 10s:
php /app/bin/console messenger:consume async --time-limit=10
  1. wait 10-20 seconds and get an error from task description (CurlError or XLM parsing error).

This is 100% reproducible on my local!

I can give more info about Uncaught Exception: String could not be parsed as XML. I found if I comment this if https://github.com/symfony/amazon-sqs-messenger/blob/fcce37602b1b83263ce27c0fd7ea85f988d67c53/Transport/Connection.php#L373:

            // fetch current response in order to requeue in transit messages
            if (!$this->fetchMessage()) {
                $this->currentResponse->cancel();
                $this->currentResponse = null;
            }

everything working well. So issue is when queue is empty and worker tries to stop on timeout it tries fetch events from queue and fail with an error. This is a very unpleasant error because messages end up being resent to the queue because of this

Please let me know if you need any details here. @nicolas-grekas can you please help with this?

@drekinov
Copy link

We started to receive same error after 6.4.0 update.
No SQS changes. It is standard SQS queue worked for years as is.
Here is stack trace from Sentry which is similar to provided in issue description.
image

Our worker is running with following arguments:

bin/console messenger:consume async --memory-limit=128M --time-limit=300 --limit=50 --failure-limit=10 -vv --env=prod

@stepozer
Copy link

I found an issue that code "stuck" when worker exit. This is profiler callgraph for this case. It stuck for 10-15 seconds.
Screenshot from 2024-01-24 15-25-51
Screenshot from 2024-01-24 15-26-16

@mtanasiewicz
Copy link

mtanasiewicz commented Feb 20, 2024

This is actually an issue with this release of Symfony HTTP Client: https://github.com/symfony/http-client/releases/tag/v5.4.35

We have downgraded to 5.4.34 and the issue is gone.

[EDIT]
We thought so, but the issue is back. I will proceed with downgrading.

@mtanasiewicz
Copy link

mtanasiewicz commented Feb 22, 2024

OK so I think I found the solution. I tried downgrading all the libraries, but nothing worked. I decided to downgrade php from 8.3.3 to 8.2.16 and upgrade all the libraries that result in throwing this exception to the latest versions. The issue is gone. There is probably some bug or unhandled BC break in php8.3.3 that needs to be addressed.

@armetiz
Copy link
Contributor Author

armetiz commented Mar 1, 2024

@mtanasiewicz I'm running on PHP 8.1.23

@armetiz
Copy link
Contributor Author

armetiz commented Mar 1, 2024

Is someone using Symfony 6 or 7 ?

@iamyukihiro
Copy link

iamyukihiro commented Mar 14, 2024

The error Uncaught Exception: String could not be parsed as XML no longer occurs after upgrading from async-aws/sqs 1.9.0 to 2.0.0.

async-aws/sqs@cef9f10#diff-d2ab9925cad7eac58e0ff4cc0d251a937ecf49e4b6bf57f8b95aab76648a9d34L15

It seems there have been changes related to AWS JSON-1.0 Protocol, and JSON Protocol seems to be prioritized over XML.
(There might also be a possibility that related packages changed priorities due to composer update symfony/amazon-sqs-messenger -W.)

Instead, around the same time, the error Uncaught Exception: Could not parse response as array started occurring.
(The frequency of its occurrence is occasional.)

StackTrace after the upgrade

Exception

# AsyncAws\Core\Exception\UnparsableResponse

Uncaught Exception: Could not parse response as array

"file": "/var/www/app/vendor/async-aws/core/src/Response.php:353",

Previous Exception

# Symfony\Component\HttpClient\Exception\JsonException

Response body is empty.

"file": "/var/www/app/vendor/symfony/http-client/Response/CommonResponseTrait.php:80"

Environment

  • Symfony 6.4.5
  • PHP 8.3
  • async-aws/sqs 2.0.0
  • symfony/amazon-sqs-messenger 6.4.4

@jetrodn
Copy link

jetrodn commented May 3, 2024

The exceptions String could not be parsed as XML or Could not parse response as array seems like depend on both wait_time and poll_timeout values, for some values it can be reproduced for other values everything works like intended.

As example: for values wait_time=2 and poll_timeout=0.2 I am getting {} response that is valid JSON and no exception:

 [OK] Consuming messages from transport "async_sqs".                                                                    
                                                                                                                        

 // The worker will automatically exit once it has been running for 3s or received a stop signal via the                
 // messenger:stop-workers command.                                                                                     

 // Quit the worker with CONTROL-C.                                                                                     

^ "{}"

For values wait_time=4 and poll_timeout=0.6 I am getting '' response with exception:

[OK] Consuming messages from transport "async_sqs".                                                                    
                                                                                                                        

 // The worker will automatically exit once it has been running for 3s or received a stop signal via the                
 // messenger:stop-workers command.                                                                                     

 // Quit the worker with CONTROL-C.                                                                                     

^ ""
PHP Fatal error:  Uncaught Symfony\Component\HttpClient\Exception\JsonException: Response body is empty. in /var/www/api/vendor/symfony/http-client/Response/CommonResponseTrait.php:78
Stack trace:
#0 /var/www/api/vendor/async-aws/core/src/Response.php(353): Symfony\Component\HttpClient\Response\AsyncResponse->toArray()
#1 /var/www/api/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(37): AsyncAws\Core\Response->toArray()
#2 /var/www/api/vendor/async-aws/core/src/Result.php(139): AsyncAws\Sqs\Result\ReceiveMessageResult->populateResult()
#3 /var/www/api/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(28): AsyncAws\Core\Result->initialize()
#4 /var/www/api/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(238): AsyncAws\Sqs\Result\ReceiveMessageResult->getMessages()
#5 /var/www/api/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(374): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->fetchMessage()
#6 /var/www/api/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(80): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->reset()
#7 [internal function]: Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->__destruct()
#8 {main}

Next AsyncAws\Core\Exception\UnparsableResponse: Could not parse response as array in /var/www/api/vendor/async-aws/core/src/Response.php:355
Stack trace:
#0 /var/www/api/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(37): AsyncAws\Core\Response->toArray()
#1 /var/www/api/vendor/async-aws/core/src/Result.php(139): AsyncAws\Sqs\Result\ReceiveMessageResult->populateResult()
#2 /var/www/api/vendor/async-aws/sqs/src/Result/ReceiveMessageResult.php(28): AsyncAws\Core\Result->initialize()
#3 /var/www/api/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(238): AsyncAws\Sqs\Result\ReceiveMessageResult->getMessages()
#4 /var/www/api/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(374): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->fetchMessage()
#5 /var/www/api/vendor/symfony/amazon-sqs-messenger/Transport/Connection.php(80): Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->reset()
#6 [internal function]: Symfony\Component\Messenger\Bridge\AmazonSqs\Transport\Connection->__destruct()
#7 {main}
  thrown in /var/www/api/vendor/async-aws/core/src/Response.php on line 355

I assume it is related to Symfony HttpClient and stream method and how it works with chunks, especially I am getting this ErrorChunk and there are multiple places where chunk content is prefilled with '' as content, like this for example. Maybe it happens because connection is getting cancelled by the client and then the content is '' empty string instead of valid JSON or XML.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants