Deflake LoopTest.WaitOneBlocking by increasing timeout. #17857

Merged (1 commit) on Jul 11, 2024

@ScottTodd ScottTodd commented Jul 10, 2024

This test case has failed a few times on macos_x86, and possibly on other platforms too. I suspect the OS isn't scheduling the test threads soon enough - there's only a 150 millisecond window between the event set on a worker thread and iree_loop_wait_one's timeout on the main thread:

main thread                worker thread
-----------                -------------
test start
spin up thread   ------>   wait 50ms
wait 200ms                 ...
...              <------   set event
timeout if event not set

Sample logs:

[ RUN      ] LoopTest.WaitOneBlocking
iree/runtime/src/iree/base/loop_test.h:612: Failure
Value of: status
Expected: error code OK
  Actual: 0x4, whose error code is DEADLINE_EXCEEDED: DEADLINE_EXCEEDED

[  FAILED  ] LoopTest.WaitOneBlocking (200 ms)

We could also use an infinite timeout.
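
The timing in the diagram above can be sketched with standard C++ primitives. This is only an illustration of the pattern, not the code in loop_test.h: `Event` and `WaitOneBlockingSketch` are hypothetical names, and `std::condition_variable` stands in for the IREE event and `iree_loop_wait_one`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using std::chrono::milliseconds;

// Hypothetical stand-in for the event used by the test; not an IREE type.
struct Event {
  std::mutex mutex;
  std::condition_variable cv;
  bool is_set = false;

  void Set() {
    {
      std::lock_guard<std::mutex> lock(mutex);
      is_set = true;
    }
    cv.notify_all();
  }

  // Returns true if the event was set before `timeout` elapsed.
  bool WaitFor(milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex);
    return cv.wait_for(lock, timeout, [this] { return is_set; });
  }
};

// Mirrors the diagram: a worker sets the event after `worker_delay` while the
// calling thread waits for up to `timeout`. Returns true if the wait succeeded.
bool WaitOneBlockingSketch(milliseconds worker_delay, milliseconds timeout) {
  // Scheduling slack is timeout - worker_delay:
  // 200ms - 50ms = 150ms, versus 2000ms - 50ms = 1950ms.
  assert(timeout > worker_delay);
  Event event;
  std::thread worker([&] {
    std::this_thread::sleep_for(worker_delay);
    event.Set();
  });
  const bool signaled = event.WaitFor(timeout);
  worker.join();
  return signaled;
}
```

Calling `WaitOneBlockingSketch(milliseconds(50), milliseconds(2000))` returns as soon as the worker sets the event, so the larger timeout only costs time when the worker is late. An infinite timeout would correspond to `cv.wait(lock, predicate)` in this sketch, at the price of hanging if the event is never set.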

@ScottTodd ScottTodd added the runtime Relating to the IREE runtime library label Jul 10, 2024
@ScottTodd ScottTodd requested a review from benvanik as a code owner July 10, 2024 22:51
@benvanik
nice!

ScottTodd added a commit that referenced this pull request Jul 11, 2024
This test has flaked a few times:

* https://github.com/iree-org/iree/actions/runs/9653016869/job/26624436852#step:5:1882
* https://github.com/iree-org/iree/actions/runs/9716312176/job/26819673196#step:4:18786

```
[ RUN      ] ScopeTest.WaitIdleFailure
/work/runtime/src/iree/task/scope_test.cc:225: Failure
Value of: iree_task_scope_is_idle(&scope)
  Actual: true
Expected: false

[  FAILED  ] ScopeTest.WaitIdleFailure (175 ms)
[----------] 9 tests from ScopeTest (326 ms total)

[----------] Global test environment tear-down
[==========] 9 tests from 1 test suite ran. (326 ms total)
[  PASSED  ] 8 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ScopeTest.WaitIdleFailure

 1 FAILED TEST
```

We could instead drop the
`EXPECT_FALSE(iree_task_scope_is_idle(&scope));` check entirely.
Increasing the sleep time will increase test time regardless of OS
scheduler behavior (unlike #17857
which increased a _timeout_ duration).
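
The difference between raising a sleep and raising a timeout can be sketched as follows. This is a minimal illustration using the standard library, not IREE code: `ElapsedWithTimeout` and `ElapsedWithSleep` are hypothetical helpers, and a `std::future` stands in for whatever the test actually waits on.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// Wall time spent when waiting, with an upper bound of `timeout`, for work
// that finishes after `work_time`.
milliseconds ElapsedWithTimeout(milliseconds work_time, milliseconds timeout) {
  const auto start = Clock::now();
  std::promise<void> done;
  std::future<void> done_future = done.get_future();
  std::thread worker([&] {
    std::this_thread::sleep_for(work_time);
    done.set_value();
  });
  // Returns as soon as the worker signals; `timeout` is only an upper bound.
  done_future.wait_for(timeout);
  worker.join();
  return std::chrono::duration_cast<milliseconds>(Clock::now() - start);
}

// Wall time spent when sleeping for a fixed duration: the full duration is
// always paid, however quickly the OS schedules other threads.
milliseconds ElapsedWithSleep(milliseconds sleep_time) {
  const auto start = Clock::now();
  std::this_thread::sleep_for(sleep_time);
  return std::chrono::duration_cast<milliseconds>(Clock::now() - start);
}
```

Under these assumptions, `ElapsedWithTimeout(milliseconds(10), milliseconds(2000))` normally finishes far below 2000 ms, while `ElapsedWithSleep(milliseconds(100))` always takes at least 100 ms.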
@ScottTodd ScottTodd merged commit f7f930d into iree-org:main Jul 11, 2024
55 checks passed
@ScottTodd ScottTodd deleted the loop-test-flake branch July 11, 2024 00:59
- loop, wait_source, iree_make_timeout_ms(200),
+ loop, wait_source, iree_make_timeout_ms(2000),

Oh, this should also change WaitAllBlocking (and maybe other test cases? I'll check): https://github.com/iree-org/iree/actions/runs/9883115135/job/27297254957#step:9:95

[ RUN      ] LoopTest.WaitAllBlocking
iree/runtime/src/iree/base/loop_test.h:962: Failure
Value of: status
Expected: error code OK
  Actual: 0x4, whose error code is DEADLINE_EXCEEDED: DEADLINE_EXCEEDED

[  FAILED  ] LoopTest.WaitAllBlocking (202 ms)
[----------] 25 tests from LoopTest (893 ms total)

[----------] 1 test from LoopInlineTest
[ RUN      ] LoopInlineTest.ExternalStorage
[       OK ] LoopInlineTest.ExternalStorage (0 ms)
[----------] 1 test from LoopInlineTest (0 ms total)

[----------] Global test environment tear-down
[==========] 26 tests from 2 test suites ran. (893 ms total)
[  PASSED  ] 25 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] LoopTest.WaitAllBlocking

 1 FAILED TEST

ScottTodd added a commit that referenced this pull request Jul 11, 2024
Missed these in #17857 since I only
saw `WaitOneBlocking` flake on CI. The other test cases can flake too.
saienduri pushed a commit to saienduri/iree that referenced this pull request Jul 12, 2024
This test case has failed a few times on macos_x86, and possibly on
other platforms too. I suspect the OS isn't scheduling the test threads
soon enough - there's only a 150 millisecond window between the event
set on a worker thread and `iree_loop_wait_one`'s timeout on the main
thread:

```
main thread                worker thread
-----------                -------------
test start
spin up thread   ------>   wait 50ms
wait 200ms                 ...
...              <------   set event
timeout if event not set
```

Sample logs:
* https://github.com/iree-org/iree/actions/runs/9214985535/job/25352380335#step:10:1578
* https://github.com/iree-org/iree/actions/runs/9882364677/job/27295096340?pr=17856#step:9:43

```
[ RUN      ] LoopTest.WaitOneBlocking
iree/runtime/src/iree/base/loop_test.h:612: Failure
Value of: status
Expected: error code OK
  Actual: 0x4, whose error code is DEADLINE_EXCEEDED: DEADLINE_EXCEEDED

[  FAILED  ] LoopTest.WaitOneBlocking (200 ms)
```

We could also use an infinite timeout.