New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[crmp] Check IsAckPendingBefore send ACK #7740
[crmp] Check IsAckPendingBefore send ACK #7740
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test that exercises this? Specifically:
- A test that exercises the actual situation that's at issue here, which is one side sending a message and the other side taking long enough to process that it sends a standalone ack and then a response. The response should not get dropped.
- Ideally a test that would exercise the brokenness around
HasPeerRequestedAck
.
But also, it's not clear to me why we are completely stopping processing of a message that carries an ack to a message we don't think we want an ack for. That logic is not in the spec anywhere I can see (@turon am I just missing it there?) and we could end up in this situation simply by virtue of running out of CRMP retries for a message and then getting a response to it, with an ack, no?
@@ -136,6 +136,7 @@ CHIP_ERROR ReliableMessageContext::FlushAcks() | |||
#if !defined(NDEBUG) | |||
ChipLogDetail(ExchangeManager, "Flushed pending ack for MsgId:%08" PRIX32, mPendingPeerAckId); | |||
#endif | |||
SetAckPending(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just have SendStandaloneAckMessage
do the SetAckPending(false)
bit. That's what other callers of it assume anyway, except for ReliableMessageMgr::ExecuteActions
, which could remove its manual call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if piggy back message does not do the SetAckPending(false), then a standalong ack message will be sent in the following MRP tick, right?
We have another issue since those flags might get manipulated from multiple threads, but we already have a separate thread to discuss the general racing issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if piggy back message does not do the SetAckPending(false),
But it does, right? That call is in ExchangeMessageDispatch::SendMessage
where we piggyback.
So to be clearer: we should have SetAckPending(false)
in piggyback where we have it now and in SendStandaloneAckMessage
, which are the two places that can send acks. And we should not need it anywhere else. That means not in FlushAcks
and not in ReliableMessageMgr::ExecuteActions
.
// If there is a pending acknowledgment piggyback it on this message. | ||
if (reliableMessageContext->HasPeerRequestedAck()) | ||
// If there is a pending acknowledgment piggyback it on this message, and we have not send ACK before. | ||
if (reliableMessageContext->HasPeerRequestedAck() && reliableMessageContext->IsAckPending()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just be testing reliableMessageContext->IsAckPending()
. See #7339
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just use one flag to indicate there is a pending message needs to be acked to prevent confusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isAckPending
should be the right thing, I agree.
Is it true that in SendMessage
, we could just get rid of SetPeerRequestedAck
and just call SetAckPending()
? I think so. Then we could replace all calls to Set/HasPeerRequestedAck()
completely, and just always use IsAckPending()
.
See snippet below:
CHIP_ERROR ExchangeMessageDispatch::OnMessageReceived(const PayloadHeader & payloadHeader, uint32_t messageId,
const Transport::PeerAddress & peerAddress,
ReliableMessageContext * reliableMessageContext)
{
ReturnErrorCodeIf(!MessagePermitted(payloadHeader.GetProtocolID().GetProtocolId(), payloadHeader.GetMessageType()),
CHIP_ERROR_INVALID_ARGUMENT);
if (IsReliableTransmissionAllowed())
{
if (payloadHeader.IsAckMsg() && payloadHeader.GetAckId().HasValue())
{
ReturnErrorOnFailure(reliableMessageContext->HandleRcvdAck(payloadHeader.GetAckId().Value()));
}
if (payloadHeader.NeedsAck())
{
MessageFlags msgFlags;
// An acknowledgment needs to be sent back to the peer for this message on this exchange,
// Set the flag in message header indicating an ack requested by peer;
msgFlags.Set(MessageFlagValues::kPeerRequestedAck);
// Also set the flag in the exchange context indicating an ack requested;
//////////// HERE REPLACE
// reliableMessageContext->SetPeerRequestedAck(true);
reliableMessageContext->SetAckPending(true);
///// END OF SUGGESTION
ReturnErrorOnFailure(reliableMessageContext->HandleNeedsAck(messageId, msgFlags));
}
}
return CHIP_NO_ERROR;
}
I found this logic is a bit hard to test, since it is a race condition infact. I also raised an issue in the spec repo about this logic, I'm not sure if this case is a possible replay attack, need some discussions for this. I will go through the CRMP tests to see where can be used to add some hooks for the test. @yufengwangca Do you have some idea about this? |
Size increase report for "gn_qpg6100-example-build" from 01ee80e
Full report output
|
Size increase report for "nrfconnect-example-build" from 01ee80e
Full report output
|
Size increase report for "esp32-example-build" from 01ee80e
Full report output
|
@@ -49,8 +49,8 @@ CHIP_ERROR ExchangeMessageDispatch::SendMessage(SecureSessionHandle session, uin | |||
PayloadHeader payloadHeader; | |||
payloadHeader.SetExchangeID(exchangeId).SetMessageType(protocol, type).SetInitiator(isInitiator); | |||
|
|||
// If there is a pending acknowledgment piggyback it on this message. | |||
if (reliableMessageContext->HasPeerRequestedAck()) | |||
// If there is a pending acknowledgment piggyback it on this message, and we have not send ACK before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is the MRP tick fire before the response message from the app, and it only clear the isAckPending flag, but not HasPeerRequestedAck flag. So the following response message still piggy back the ack for the same message which has already been acked by a standalong ack message
Regarding "the server sent two ACKs to the same message id", each should have a unique MessageCounter and the same AckMessageCounter. In other words, the message ids should not be the same unless the message is in fact a retransmission of the same exact message. Regarding "the client think this was a replay attack", why would the client register these two separate messages as a replay attack? They are from the same source, but each has a unique MessageCounter. Ack processing and AckMessageCounter should never factor into replay protection. In fact, all the header fields used for Ack processing are in the encrypted portion of the message, so must not retroactively trigger rejection of an otherwise verified and accepted message. |
To test this broken logic, we need to delay the echo response and allow the standalong ack kick first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we have no new tests, or updated tests that confirm this new logic. Can we have tests added to confirm this works, and doesn't break in future?
I agree that it's a pain... That's why I haven't put up the (simple) fix for #7339 yet: I have not had time to write the tests. That said,
Yes, agreed. We should fix that (separate pr?) in addition to whatever else we do here. |
Problem
In #7634, the test is failling
Note that:
and
The server sent two ACKs to the same message id, thus the client think this was a replay attack (?), and drop the second message.
Change overview
Adds a check in CRMP not to send ack flag if the ack was already sent.
Testing
https://github.com/erjiaqing/connectedhomeip/actions/runs/949064565