TestDiagnosticLogsDownloadCommand fails during darwin CI runs quite often #32636

nivi-apple · 2024-03-20T00:18:31Z

The diagnostic log tests seem to fail quite frequently. Logs - https://github.com/project-chip/connectedhomeip/actions/runs/8347127817/job/22854062448?pr=32631 for an instance

Disabled in #34260

Please re-enable in https://github.com/project-chip/connectedhomeip/blob/master/scripts/tests/chiptest/__init__.py#L170, once this is addressed.

bzbarsky-apple · 2024-03-26T16:03:01Z

This is failing on:

ERROR   21:13:51.686 - TEST OUT  : 		***** Test Step 12 : Read End User Support log intent after a failure to make sure that everything still works

Is it possible that the test is getting confused about which test the "failure" belongs with or something?

bzbarsky-apple · 2024-03-26T16:03:11Z

@vivien-apple Could you take a look, please?

harimau-qirex · 2024-05-10T16:55:49Z

I looked into this a bit, and here's a brief summary of what I believe is happening:

The previous step attempts to generate an error. It requests the network diagnostics logs with a very short timeout, under the assumption that the timeout will be hit and generate an error.
Some of the time, a CHIP_ERROR_INTERNAL error occurs during that step. I haven't looked into the source of that error.
That error is treated as the error the step was expecting and the test moves onto the next step, which requests the end user support logs.
The all clusters app receives the BDX StatusReport indicating the timeout and sends the appropriate command response indicating the diagnostic logs transfer failed.
The test associates this command response with the current step and considers the test to have failed.

There are a couple things here. The immediate cause is this mystery internal error, as that causes the test to move on to the next step so the expected failure is inadvertently associated with the wrong step.

A broader issue is that the tests aren't quite detailed enough in what they're expecting. Some steps expect that an error occurs but doesn't specify what kinds of error are acceptable, so the test treats any error as the step passing rather than stop the test when the wrong errors occur.

I ran into a case where this broader issue caused the tests to erroneously succeed. One of the steps right before these transfers the network diagnostics logs normally. However, the test moved on before the all clusters app sent the final ACK for the BlockAckEof message. This meant the all clusters app didn't consider the BDX transfer completed. When the network diagnostics logs were requested again, the all clusters app responded with a busy status since the previous transfer was still ongoing. The test saw the error from the busy status and treated it as the expected timeout-related error, instead of stopping.

nivi-apple · 2024-06-06T23:16:35Z

Uploaded PR #33788 but that does not fix the issue. It looks like we might not be aborting the transfer for timeout or internal errors on the Initiator side. This needs more analysis to fix it properly. I am working on it.

nivi-apple · 2024-06-07T22:14:43Z

So I figured out why the test is failing intermittently. I am using logs from here as reference - https://github.com/project-chip/connectedhomeip/actions/runs/9372056858/job/25802710559?pr=33672.

It looks like for the test -***** Test Step 11 : Read Network Diagnostic log intent with a very short timeout and a very long log

We send a request to initiate BDX -

ERROR 19:23:46.404 - APP OUT : [1717529026404] [67942:190636] [EM] <<< [E:60809i S:61167 M:18707577] (S) Msg TX to 1:233CE82C33836403 [6D7B] [UDP:192.168.64.25:59528] --- Type 0002:01 (BDX:SendInit)

For some reason the Init fails and instead of a SendAccept we get a status report and that is not expected and hence the next test fails.

ERROR 19:23:47.411 - TEST OUT : ***** Test Step 12 : Read End User Support log intent after a failure to make sure that everything still works
ERROR 19:23:47.450 - APP OUT : [1717529027450] [67942:190664] [BDX] OutputEvent type: StatusReceived
ERROR 19:23:47.451 - APP OUT : [1717529027450] [67942:190664] [BDX] Diagnostic logs transfer: StatusReport Error 5f

To alleviate this, I guess we can make sure we get a sendaccept before we proceed with success tests. I will also corroborate with couple other logs to see if its the same issue

nivi-apple · 2024-06-08T04:26:14Z

also it looks like sometimes the status report gets through before reset happens. Other times the status report is not sent. I had observed this when initially writing the code for downloading logs. I need to look into the code and make sure we ensure the status report is sent out before calling reset.

nivi-apple added darwin tests labels Mar 20, 2024

This was referenced Apr 5, 2024

Darwin Build Failure #32836

Closed

Darwin Build Failure #32837

Closed

bzbarsky-apple added the build issue label Apr 5, 2024

This was referenced Apr 5, 2024

Darwin Build Failure #32838

Closed

Darwin Build Failure #32839

Closed

Darwin Build Failure #32841

Closed

nivi-apple linked a pull request Jun 6, 2024 that will close this issue

Fix download log test #33788

Open

cecille mentioned this issue Jul 9, 2024

Turn off flaky MRP test in darwin #34260

Merged

github-actions bot removed the darwin label Jul 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TestDiagnosticLogsDownloadCommand fails during darwin CI runs quite often #32636

TestDiagnosticLogsDownloadCommand fails during darwin CI runs quite often #32636

nivi-apple commented Mar 20, 2024 •

edited by cecille

Loading

bzbarsky-apple commented Mar 26, 2024

bzbarsky-apple commented Mar 26, 2024

harimau-qirex commented May 10, 2024

nivi-apple commented Jun 6, 2024 •

edited

Loading

nivi-apple commented Jun 7, 2024 •

edited

Loading

nivi-apple commented Jun 8, 2024

TestDiagnosticLogsDownloadCommand fails during darwin CI runs quite often #32636

TestDiagnosticLogsDownloadCommand fails during darwin CI runs quite often #32636

Comments

nivi-apple commented Mar 20, 2024 • edited by cecille Loading

bzbarsky-apple commented Mar 26, 2024

bzbarsky-apple commented Mar 26, 2024

harimau-qirex commented May 10, 2024

nivi-apple commented Jun 6, 2024 • edited Loading

nivi-apple commented Jun 7, 2024 • edited Loading

nivi-apple commented Jun 8, 2024

nivi-apple commented Mar 20, 2024 •

edited by cecille

Loading

nivi-apple commented Jun 6, 2024 •

edited

Loading

nivi-apple commented Jun 7, 2024 •

edited

Loading