bug: can't receive message from fifo queue with the same MessageGroupId #10107

Closed
1 task done
Nenavsegda opened this issue Jan 23, 2024 · 22 comments · Fixed by #10223 or #10859
Assignees
Labels
aws:sqs (Amazon Simple Queue Service), status: backlog (Triaged but not yet being worked on), type: bug (Bug report)

Comments

@Nenavsegda

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Using the Python boto3 client:

  • send a message to fifo queue with MessageGroupId='1'
  • receive the message
  • delete the message
  • send another message to the queue with MessageGroupId='1'
  • try to receive the message, get none
  • check message availability with get_queue_attributes, get {'ApproximateNumberOfMessages': '1', 'ApproximateNumberOfMessagesNotVisible': '0', 'ApproximateNumberOfMessagesDelayed': '0', ...
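
The steps above can be sketched with boto3 roughly as follows. This is a minimal sketch, not the reporter's actual code: the helper names `reproduce` and `main`, the endpoint `http://localhost:4566`, the queue name `repro.fifo`, and the dummy credentials are all assumptions for illustration.

```python
import uuid


def reproduce(sqs, queue_url, group_id="1"):
    """Run the send/receive/delete/send/receive sequence from the report.

    `sqs` is expected to be a boto3 SQS client (or any object exposing the
    same methods). Returns the messages from the second receive together
    with the queue's approximate counters.
    """
    def send(body):
        # FIFO queues need a deduplication ID unless content-based
        # deduplication is enabled, so generate a fresh one per message.
        sqs.send_message(
            QueueUrl=queue_url,
            MessageBody=body,
            MessageGroupId=group_id,
            MessageDeduplicationId=str(uuid.uuid4()),
        )

    send("first")
    first = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    for msg in first.get("Messages", []):
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])

    send("second")
    second = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    return second.get("Messages", []), attrs["Attributes"]


def main():
    # Hypothetical entry point; boto3 is imported here so that `reproduce`
    # can be exercised with a stand-in client when boto3 is not installed.
    import boto3

    client = boto3.client(
        "sqs",
        endpoint_url="http://localhost:4566",  # assumed LocalStack endpoint
        region_name="us-east-1",
        aws_access_key_id="test",
        aws_secret_access_key="test",
    )
    url = client.create_queue(
        QueueName="repro.fifo", Attributes={"FifoQueue": "true"}
    )["QueueUrl"]
    messages, counters = reproduce(client, url)
    print("second receive:", messages)
    print("counters:", counters)
```

Calling `main()` against a running LocalStack container should show whether the second receive comes back empty while `ApproximateNumberOfMessages` still reports a pending message.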

Expected Behavior

Expected to receive the second message from the queue.

How are you starting LocalStack?

With a docker-compose file

Steps To Reproduce

docker-compose up

Environment

- OS: macOS Sonoma 14.2.1
- LocalStack: latest

Anything else?

I tried to run with Rosetta, no result

Nenavsegda added the status: triage needed (Requires evaluation by maintainers) and type: bug (Bug report) labels Jan 23, 2024
@localstack-bot
Collaborator

Welcome to LocalStack! Thanks for reporting your first issue and our team will be working towards fixing the issue for you or reach out for more background information. We recommend joining our Slack Community for real-time help and drop a message to LocalStack Pro Support if you are a Pro user! If you are willing to contribute towards fixing this issue, please have a look at our contributing guidelines and our contributing guide.

@schraitle-pomelo

Saw an issue similar to this in our builds a few weeks ago. The symptom was that our tests would fail apparently because only one queue message was ever delivered. The localstack-pro version was 3.0.3.dev20240105221043 in the docker logs of the builds that failed. We switched to use the stable tag as a quick fix at the time, and the issue went away. The issue resurfaced today (we are still using the stable tag) and the version number in the docker logs is being reported as 3.1.0. Changing the image version from stable down to 3.0.2 brings us back to the expected behavior and our tests pass again.

I can confirm with testing that using a randomized string for the MessageGroupId in each message will allow more than one queue message to flow through a given FIFO queue, while using a static value makes it so that only the first message will be delivered.

@classicPintus

I don't know if it will be helpful, but I have created a minimal Spring Boot project with the described scenarios in this repository

@jbauers

jbauers commented Jan 31, 2024

Just echoing what @schraitle-pomelo mentioned - our tests failed as only one message gets delivered. In those, the MessageGroupId was the same for the messages sent. Using a unique MessageGroupId for each message allows more than one message to be delivered through the FIFO queue.

For now sticking to the 3.0.2 version.

Edit: The issue seems to have been introduced in commit 367ff33 - removing the added check by using this (naive) Dockerfile makes our tests pass again:

FROM localstack/localstack:stable
RUN sed -i '1019,1020d' /opt/code/localstack/localstack/services/sqs/models.py

Then use this image in an updated docker-compose.yaml:

diff --git a/docker-compose.yaml b/docker-compose.yaml
index 6d14eca..3375093 100644
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -2,7 +2,8 @@ version: '3.3'
 services:
   localstack:
     # REMEMBER TO CHANGE docker-compose.yml AND .github/workflows/build.yml AT THE SAME TIME
-    image: localstack/localstack:stable
+    # image: localstack/localstack:stable
+    build: .
     environment:
       SERVICES: dynamodb,sqs
       DEFAULT_REGION: eu-central-1

Not sure about the internals going on here, but hopefully this helps :)

@Nenavsegda
Author

okay, version 3.0.2 works just fine, thanks a lot, I'm going to close the issue now

@jbauers

jbauers commented Feb 1, 2024

@Nenavsegda it's still a valid bug, so I'd suggest reopening it, please. Otherwise I'll create a new issue. I can also repro it:

root@5bdd86475de1:/opt/code/localstack# awslocal sqs create-queue --queue-name test.fifo --attributes FifoQueue=true
{
    "QueueUrl": "http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/test.fifo"
}
root@5bdd86475de1:/opt/code/localstack# awslocal sqs send-message --queue-url http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/test.fifo --message-body '{"hello": "world"}' --message-group-id foo --message-deduplication-id 1
{
    "MD5OfMessageBody": "49dfdd54b01cbcd2d2ab5e9e5ee6b9b9",
    "MessageId": "f00e4e5c-e98d-42a9-b8a3-b14d4f690a4e",
    "SequenceNumber": "14661116923572387840"
}
root@5bdd86475de1:/opt/code/localstack# awslocal sqs receive-message --queue-url http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/test.fifo
{
    "Messages": [
        {
            "MessageId": "f00e4e5c-e98d-42a9-b8a3-b14d4f690a4e",
            "ReceiptHandle": "MzU1NzMyZmQtNTkzMC00MWY5LWE4MGUtNTNiNmFlZGJjZDIzIGFybjphd3M6c3FzOnVzLWVhc3QtMTowMDAwMDAwMDAwMDA6dGVzdC5maWZvIGYwMGU0ZTVjLWU5OGQtNDJhOS1iOGEzLWIxNGQ0ZjY5MGE0ZSAxNzA2Nzc4NjQ5LjIxOTQ3",
            "MD5OfBody": "49dfdd54b01cbcd2d2ab5e9e5ee6b9b9",
            "Body": "{\"hello\": \"world\"}"
        }
    ]
}
root@5bdd86475de1:/opt/code/localstack# awslocal sqs send-message --queue-url http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/test.fifo --message-body '{"foo": "bar"}' --message-group-id foo --message-deduplication-id 2
{
    "MD5OfMessageBody": "94232c5b8fc9272f6f73a1e36eb68fcf",
    "MessageId": "71e31a0a-17d9-4e83-ac90-fd42382669b1",
    "SequenceNumber": "14661116923572387841"
}
root@5bdd86475de1:/opt/code/localstack# awslocal sqs receive-message --queue-url http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/test.fifo
root@5bdd86475de1:/opt/code/localstack# awslocal sqs send-message --queue-url http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/test.fifo --message-body '{"foo": "bar"}' --message-group-id bar --message-deduplication-id 3
{
    "MD5OfMessageBody": "94232c5b8fc9272f6f73a1e36eb68fcf",
    "MessageId": "caefe7cc-4355-4779-9b64-c1249d9bd4b9",
    "SequenceNumber": "14661116923572387842"
}
root@5bdd86475de1:/opt/code/localstack# awslocal sqs receive-message --queue-url http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/test.fifo
{
    "Messages": [
        {
            "MessageId": "caefe7cc-4355-4779-9b64-c1249d9bd4b9",
            "ReceiptHandle": "ODg0NTI0NTQtOWM5MS00OTlmLWIzMWItODFjNTZlMThhZTBiIGFybjphd3M6c3FzOnVzLWVhc3QtMTowMDAwMDAwMDAwMDA6dGVzdC5maWZvIGNhZWZlN2NjLTQzNTUtNDc3OS05YjY0LWMxMjQ5ZDliZDRiOSAxNzA2Nzc4NjYzLjc5MjI3MDI=",
            "MD5OfBody": "94232c5b8fc9272f6f73a1e36eb68fcf",
            "Body": "{\"foo\": \"bar\"}"
        }
    ]
}

Nenavsegda reopened this Feb 1, 2024
@bentsku
Contributor

bentsku commented Feb 1, 2024

Hello and thanks for your reports! Could you please indicate if your test cases are working against AWS?

@jbauers, it looks like this should be the intended behavior per the AWS documentation:

When you receive a message with a message group ID, no more messages for the same message group ID are returned unless you delete the message or it becomes visible.

https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues-understanding-logic.html
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html

If you could confirm that LocalStack is behaving differently than AWS here, that would be great. Thanks!

@jbauers

jbauers commented Feb 1, 2024

Hi @bentsku, thanks for the quick reply. Can confirm that this is in fact expected behaviour:

➜  ~ aws sqs create-queue --queue-name test.fifo --attributes FifoQueue=true
➜  ~ aws sqs send-message --queue-url https://sqs.eu-central-1.amazonaws.com/0123456789/test.fifo --message-body '{"hello": "world"}' --message-group-id foo --message-deduplication-id 1
➜  ~ aws sqs receive-message --queue-url https://sqs.eu-central-1.amazonaws.com/0123456789/test.fifo
➜  ~ aws sqs send-message --queue-url https://sqs.eu-central-1.amazonaws.com/0123456789/test.fifo --message-body '{"foo": "bar"}' --message-group-id foo --message-deduplication-id 2
➜  ~ aws sqs receive-message --queue-url https://sqs.eu-central-1.amazonaws.com/0123456789/test.fifo
➜  ~ aws sqs send-message --queue-url https://sqs.eu-central-1.amazonaws.com/0123456789/test.fifo --message-body '{"foo": "bar"}' --message-group-id bar --message-deduplication-id 3
➜  ~ aws sqs receive-message --queue-url https://sqs.eu-central-1.amazonaws.com/0123456789/test.fifo

☝️ yields the same result as awslocal + localstack. The caveat is that as per [1], Lambda deletes the message from the SQS queue after successful processing - so if you're using the SQS client directly in your handler, without LocalStack Lambda, you'll get different results depending on whether you're running tests with LocalStack or in an AWS + Lambda environment.

We'll update our tests - sorry for the noise, hope this helps others :)

[1] https://docs.aws.amazon.com/en_gb/lambda/latest/dg/with-sqs.html
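
To illustrate the caveat: when a test reads from the queue with the SQS client directly, nothing deletes the message automatically, so the test has to do what the Lambda integration would otherwise do. Below is a minimal sketch of such a receive, process, delete loop. The helper name `drain` and its parameters are hypothetical; `sqs` is assumed to be a boto3 SQS client.

```python
def drain(sqs, queue_url, handler, max_empty_polls=1):
    """Receive messages, pass each body to `handler`, and delete on success.

    Stops after `max_empty_polls` consecutive empty receives and returns the
    number of messages processed. If `handler` raises, the exception
    propagates and the message is not deleted.
    """
    processed = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        response = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=1
        )
        messages = response.get("Messages", [])
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        for message in messages:
            handler(message["Body"])
            # Explicit delete: without it the message stays in flight and its
            # message group stays locked until the visibility timeout expires.
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            processed += 1
    return processed
```

For example, `drain(client, queue_url, print)` would print and delete everything currently receivable, which should let later messages in the same group come through.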

@jamescarter-le

jamescarter-le commented Feb 2, 2024

I'm not sure this is expected behavior. I'm sending unique messages with the same MessageGroupId.
Only the first request to the FIFO queue returns pending messages.

When the messages are deleted (Acknowledged) localstack SQS does not return the next batch of messages for that MessageGroupId, instead they appear as Available Messages but are never pulled.

When targeting a real SQS FIFO queue, it will return new messages for this MessageGroupId when the previous messages are deleted.

Version: 3.1.1.dev

@fkohl04

fkohl04 commented Feb 5, 2024

Hey all,

When you receive a message with a message group ID, no more messages for the same message group ID are returned unless you delete the message or it becomes visible.

According to this documentation, which @bentsku already posted, if there are two messages with the same group ID and the first one is deleted (i.e., acknowledged), the second one should be returned. This is not the case in LocalStack 3.1.0, but it is in LocalStack 3.0.2 and also in AWS. As @jamescarter-le already pointed out, in LocalStack 3.1.0 further messages with the same group ID are just not returned at all. This is not expected and is a bug. @jbauers's first example demonstrates it quite well.
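
The rule quoted above can be made concrete with a toy in-memory model. This is only an illustration of the documented semantics, not LocalStack's or AWS's implementation: the class `ToyFifoQueue` and its methods are hypothetical, and visibility timeouts are deliberately left out.

```python
class ToyFifoQueue:
    """Toy model of FIFO message-group locking (illustration only)."""

    def __init__(self):
        self._messages = []    # (group_id, body) tuples in send order
        self._inflight = {}    # receipt handle -> (group_id, body)
        self._next_handle = 0

    def send(self, group_id, body):
        self._messages.append((group_id, body))

    def receive(self):
        """Return (handle, body) for the oldest message whose group is unlocked."""
        locked = {group for group, _ in self._inflight.values()}
        for index, (group, body) in enumerate(self._messages):
            if group not in locked:
                del self._messages[index]
                handle = self._next_handle
                self._next_handle += 1
                # The message is now in flight, which locks its group.
                self._inflight[handle] = (group, body)
                return handle, body
        return None  # every pending message belongs to a locked group

    def delete(self, handle):
        # Deleting the in-flight message unlocks its group again.
        self._inflight.pop(handle, None)
```

In this model, after sending two messages for group `foo`, the second `receive()` returns `None` while the first message is in flight, and returns the second message once `delete()` has been called with the first handle. That is the behavior reported for AWS and LocalStack 3.0.2 in this thread.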

@thrau
Member

thrau commented Feb 5, 2024

just as additional context, here are changes we made to FIFO SQS queue handling

It seems the behavior we have right now is still not 100% correct. i don't have time to look into it right now, but will see whether someone can work on this in the team.

i think we need to do more testing of AWS behavior, but automated testing of ordering for FIFO queues and visibility timeout is very tricky. so a more manual testing approach would probably be better. perhaps it's also just a silly little thing missing somewhere

cc @steffyP

@kairsas

kairsas commented Feb 6, 2024

I also get messages stuck from the same MessageGroupId, only the first one is delivered.
Reproduced it today on the latest version.
And can confirm that a custom image with this change removed (like mentioned here) is working fine, subsequent MessageGroupId messages are delivered.

komarkovich added the aws:sqs (Amazon Simple Queue Service) and status: backlog (Triaged but not yet being worked on) labels and removed the status: triage needed (Requires evaluation by maintainers) label Feb 6, 2024
@thrau
Member

thrau commented Feb 7, 2024

☝️ yields the same result as awslocal + localstack. The caveat is that as per [1], Lambda deletes the message from the SQS queue after successful processing - so if you're using the SQS client directly in your handler, without LocalStack Lambda, you'll get different results depending on whether you're running tests with LocalStack or in an AWS + Lambda environment.

i think we do that, but should double check. cc @joe4dev

@joe4dev
Member

joe4dev commented Feb 8, 2024

☝️ yields the same result as awslocal + localstack. The caveat is that as per [1], Lambda deletes the message from the SQS queue after successful processing - so if you're using the SQS client directly in your handler, without LocalStack Lambda, you'll get different results depending on whether you're running tests with LocalStack or in an AWS + Lambda environment.

i think we do that, but should double check. cc @joe4dev

Our SQS event source listener does delete the SQS messages here upon successful processing using the delete_messages callback.

@jbauers I guess you are referring to the following Lambda event source mapping behavior, right?

When Lambda reads a batch, the messages stay in the queue but are hidden for the length of the queue's visibility timeout. If your function successfully processes the batch, Lambda deletes the messages from the queue. By default, if your function encounters an error while processing a batch, all messages in that batch become visible in the queue again. For this reason, your function code must be able to process the same message multiple times without unintended side effects.

Source: https://docs.aws.amazon.com/en_gb/lambda/latest/dg/with-sqs.html

@jbauers Related to the LocalStack Lambda integration, did you observe any difference between AWS and LocalStack?

baermat self-assigned this Feb 9, 2024
@jbauers

jbauers commented Feb 9, 2024

@joe4dev Yes, exactly. We're only using LocalStack SQS, not the Lambda integration, in our tests, therefore have an inconsistent setup compared to running on AWS, where we do use Lambda + SQS, and then obviously need to account for the Lambda integration handling message deletion on AWS vs. that not happening automatically in our tests. I'm therefore unable to comment on this:

Related to the LocalStack Lambda integration, did you observe any difference between AWS and LocalStack?

What I can say is that when we purge the queue "manually" in our tests, as we do now, subsequent messages do get delivered. Long-term we'll probably start using LocalStack SQS + Lambda, but I didn't dig into this yet - for the time being our setup is working fine and as expected.

I'd refer to @jamescarter-le, as he appears to have a different experience, but probably also a different setup :)

@jakkubu

jakkubu commented Feb 9, 2024

I'm also experiencing the same issue. When using a LocalStack SQS FIFO queue, I am not able to receive messages with the same MessageGroupId after deleting the first one (or, when processing in batches, after deleting the ones in the first batch).

I tested it with aws-sdk-go-v2 and localstack version 3.1.1.dev.

I did not experience this issue when running the same code against AWS SQS fifo.

@vpavic

vpavic commented Feb 9, 2024

We're also affected by this.

Downgraded to 3.0.2 until this is fixed, but that means we have to live with #9832 which is what prompted us to pick up 3.1.0 in the first place.

@classicPintus

classicPintus commented Mar 14, 2024

I have tried now against the "latest" and "3" tags of the docker image "localstack/localstack" and the problem is still there.

I have a repo that maybe can help

@esirK

esirK commented Mar 15, 2024

I have tried now against the "latest" and "3" tags of the docker image "localstack/localstack" and the problem is still there.

I have a repo that maybe can help

+1

Same here. I have upgraded to 3.2.0 but I'm still facing the same problem.

I think for me it's just that the message wasn't visible yet. I can confirm I am able to send multiple messages with the same MessageGroupId.

One thing I'm not sure about: shouldn't the messages automatically trigger, e.g., a Lambda function once they become available? Right now I have to resend the message after x amount of seconds for the trigger to fire again.

@baermat
Member

baermat commented Mar 26, 2024

I have tried now against the "latest" and "3" tags of the docker image "localstack/localstack" and the problem is still there.

I have a repo that maybe can help

For that particular repo, it seems that the messages are not deleted once they are received in the tests. Therefore the message is in flight, which locks the message group. That behavior should be the same on AWS. I used scenario01 as an example, and once I added deletion it worked on my end on LocalStack. Please let us know if you can make these tests run against AWS but not against LocalStack!

@sebastianistoblame

sebastianistoblame commented May 2, 2024

@baermat I am still experiencing this issue using Localstack 3.4.0 with a FIFO queue + dead letter queue.
Steps to reproduce:

  1. Set up a FIFO queue with Max-Receive-Count 1 and a dead letter queue
  2. Send message to FIFO queue
  3. Receive the message without acknowledging, therefore it ends up on the DLQ
  4. Send message with the same message group ID to FIFO queue
    -> New message is not visible

The invisible message can be seen using the Localstack API:

curl -H "Accept: application/json"  "http://localhost:4566/_aws/sqs/messages?ShowInvisible=true&ShowDelayed=true&QueueUrl=http://localhost:4566/queue/eu-central-1/000000000000/your-queue-name.fifo" | jq
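
For anyone scripting this scenario, here is a minimal Python sketch of the two pieces involved: the queue attributes for step 1 and the debug URL from the curl command above. The helper names are hypothetical, the default endpoint `http://localhost:4566` is an assumption, and the attribute dictionary is meant to be passed as `Attributes=` to a boto3 `create_queue` call. Note that the dead letter queue of a FIFO queue must itself be a FIFO queue.

```python
import json
from urllib.parse import urlencode


def fifo_with_dlq_attributes(dlq_arn, max_receive_count=1):
    """Attributes for a FIFO source queue that redrives to `dlq_arn`."""
    return {
        "FifoQueue": "true",
        # RedrivePolicy is passed to SQS as a JSON-encoded string.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }


def debug_messages_url(queue_url, endpoint="http://localhost:4566"):
    """Build the LocalStack message-inspection URL used in the curl command."""
    query = urlencode(
        {"ShowInvisible": "true", "ShowDelayed": "true", "QueueUrl": queue_url}
    )
    return f"{endpoint}/_aws/sqs/messages?{query}"
```

Unlike the raw curl command, `urlencode` percent-encodes the `QueueUrl` value, which keeps the query string unambiguous when the queue URL itself contains reserved characters.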

baermat reopened this May 21, 2024
@baermat
Member

baermat commented May 21, 2024

Apologies for the delay, the initial attempt at reproducing this after your message failed due to a misunderstanding. We think we identified the underlying issue, and are working on a solution.
