Consistent codec-switching test failure on Edge #6458

Closed
joeyparrish opened this issue Apr 19, 2024 · 10 comments · Fixed by #6466 or #5987

Comments

@joeyparrish
Member

Have you read the FAQ and checked for duplicate open issues?
Yes

If the problem is related to FairPlay, have you read the tutorial?

N/A

What version of Shaka Player are you using?

main

Can you reproduce the issue with our latest release version?
N/A

Can you reproduce the issue with the latest code from main?
yes

Are you using the demo app or your own custom app?
N/A

If custom app, can you reproduce the issue using our demo app?
N/A

What browser and OS are you using?
Microsoft Edge 123 on Windows

For embedded devices (smart TVs, etc.), what model and firmware version are you using?
N/A

What are the manifest and license server URIs?

N/A

What configuration are you using? What is the output of player.getConfiguration()?

N/A

What did you do?

Run standard CI tests on a PR.

What did you expect to happen?
All tests pass on all platforms

What actually happened?

Test runs on PRs have been failing lately, with a fairly consistent failure in codec-switching tests. For example:

Edge 123.0.0.0 (Windows 10): Executed 2618 of 2640 (1 FAILED) (skipped 22) (11 mins 20.341 secs / 11 mins 14.271 secs)
TOTAL: 1 FAILED, 2617 SUCCESS


1) can switch codecs RELOAD
     Codec Switching for audio
     Error: Failed: (video:1) did not expect update to be scheduled
    at <Jasmine>
    at Function.jasmineAssert (test/test/boot.js:35:7 <- test/test/boot.js:41:7)
    at _class.scheduleUpdate_ (lib/media/streaming_engine.js:2594:18 <- lib/media/streaming_engine.js:27322:20)
    at _class._callee9$ (lib/media/streaming_engine.js:1664:12 <- lib/media/streaming_engine.js:25900:22)
    at tryCatch (node_modules/@babel/polyfill/dist/polyfill.js:6473:40)
     Error: Failed: (video:1) unexpected call to onUpdate_()
    at <Jasmine>
    at Function.jasmineAssert (test/test/boot.js:35:7 <- test/test/boot.js:41:7)
    at _class._callee7$ (lib/media/streaming_engine.js:1115:18 <- lib/media/streaming_engine.js:25013:30)
    at tryCatch (node_modules/@babel/polyfill/dist/polyfill.js:6473:40)
    at Generator.invoke [as _invoke] (node_modules/@babel/polyfill/dist/polyfill.js:6702:22)

(From https://github.com/shaka-project/shaka-player/actions/runs/8740237552/job/23990010975)

joeyparrish added the labels type: bug, priority: P2, browser: Edge, and component: tests on Apr 19, 2024
joeyparrish self-assigned this on Apr 19, 2024
@joeyparrish
Member Author

Consistent failures are seen in GitHub CI. Running tests in the lab with --browsers Edge --filter 'Codec Switching' executed 2 tests, both of which passed. I'm now re-running tests without the filter to see if the failure can be reproduced only in a full run. If they pass without the filter, the issue may be specific to GitHub's VMs (or slow execution in general).

shaka-bot added this to the v4.8 milestone on Apr 19, 2024
@joeyparrish
Member Author

A full test run in the lab passed.

The first example of this failure I can spot in the GitHub Actions history is this run testing #6387:

https://github.com/shaka-project/shaka-player/actions/runs/8520107563/job/23394569800

There might be an older one, but I haven't found it. The PR isn't obviously related to the failure.

@joeyparrish
Member Author

A slightly older run of the same PR also shows this failure:

https://github.com/shaka-project/shaka-player/actions/runs/8519264955/job/23333780164

The overall run was cancelled, but the logs for Edge show the same failure before the cancelation.

@joeyparrish
Member Author

The same failure shows up in some nightly test runs, but not all.

Last night it passed after 2 retries:

https://github.com/shaka-project/shaka-player/actions/runs/8751808406

The night before it failed every time:

https://github.com/shaka-project/shaka-player/actions/runs/8736013350/job/23971587081

Before the pass, it failed 3 nights in a row. There's another long streak of failures before that.

I may have been unlucky to have the test pass this morning in my manual run.

I will try more repeated runs with a filter, more repeated runs without a filter, and runs with all Windows browsers running at once to slow down the system (as happens in our nightly runs).

@joeyparrish
Member Author

Repeated test runs produced additional failures that may be unrelated flakiness.


With --browsers Edge --filter 'Codec Switching' --runs 20:

Running tests on: Edge
Edge 123.0.0.0 (Windows 10) Codec Switching for audio can switch codecs RELOAD FAILED
	Error: Timeout waiting for movement from 10.012688 to 10
	current time: 10.012839
	duration: 60
	ready state: 0
	playback rate: 1
	paused: true
	ended: false
	buffered: {"total":[],"audio":[],"video":[],"text":[]}
	
	Error: Timeout waiting for movement from 10.012688 to 10
	current time: 10.012839
	duration: 60
	ready state: 0
	playback rate: 1
	paused: true
	ended: false
	buffered: {"total":[],"audio":[],"video":[],"text":[]}
	    at _class.waitUntilGeneric_ (test/test/util/waiter.js:339:19 <- test/test/util/waiter.js:367:19)
	    at _class.waitUntilPlayheadReaches (test/test/util/waiter.js:133:17 <- test/test/util/waiter.js:152:19)
	    at _class.waitUntilPlayheadReachesOrFailOnTimeout (test/test/util/waiter.js:148:17 <- test/test/util/waiter.js:169:19)
	    at _callee6$ (test/codec_switching/codec_switching_integration.js:139:20 <- test/codec_switching/codec_switching_integration.js:213:29)
	    at tryCatch (node_modules/@babel/polyfill/dist/polyfill.js:6473:40)
	    at Generator.invoke [as _invoke] (node_modules/@babel/polyfill/dist/polyfill.js:6702:22)
	    at prototype.<computed> [as next] (node_modules/@babel/polyfill/dist/polyfill.js:6525:21)
	    at asyncGeneratorStep (test/ui/ui_integration.js:17:103)
	    at _next (test/codec_switching/codec_switching_integration.js:4:194)
Edge 123.0.0.0 (Windows 10): Executed 2 of 2640 (1 FAILED) (skipped 2638) (47.928 secs / 47.347 secs)
TOTAL: 1 FAILED, 1 SUCCESS
[INFO] Running test (11 / 20, 1 failed so far)...

With --browsers ChromeWindows Edge FirefoxWindows --filter 'Codec Switching' --runs 20:

Running tests on: ChromeWindows, Edge, FirefoxWindows
Edge 123.0.0.0 (Windows 10) Codec Switching for audio and only-audio content can switch codecs RELOAD FAILED
        Uncaught shaka.util.Error {
          "severity": 2,
	  "category": 7,
	  "code": 7003,
	  "data": [
	    null
	  ],
	  "handled": false,
	  "message": "Shaka Error 7003",
	  "stack": "Error: Shaka Error 7003\n    at new R (dist/shaka-player.ui.js:79:282)\n    at yg (dist/shaka-player.ui.js:212:770)\n    at eval (dist/shaka-player.ui.js:381:74)\n    at HTMLVideoElement.e (dist/shaka-player.ui.js:124:1369)"
	} thrown
	Failed: Unhandled error: shaka.util.Error {
	  "severity": 2,
	  "category": 7,
	  "code": 7003,
	  "data": [
	    null
	  ],
	  "handled": false,
	  "message": "Shaka Error 7003",
	  "stack": "Error: Shaka Error 7003\n    at new R (dist/shaka-player.ui.js:79:282)\n    at yg (dist/shaka-player.ui.js:212:770)\n    at eval (dist/shaka-player.ui.js:381:74)\n    at HTMLVideoElement.e (dist/shaka-player.ui.js:124:1369)"
	}
	Error: Shaka Error 7003
	    at new R (dist/shaka-player.ui.js:79:282)
	    at yg (dist/shaka-player.ui.js:212:770)
	    at eval (dist/shaka-player.ui.js:381:74)
	    at HTMLVideoElement.e (dist/shaka-player.ui.js:124:1369)
	Error: Failed: Unhandled error: shaka.util.Error {
	  "severity": 2,
	  "category": 7,
	  "code": 7003,
	  "data": [
	    null
	  ],
	  "handled": false,
	  "message": "Shaka Error 7003",
	  "stack": "Error: Shaka Error 7003\n    at new R (dist/shaka-player.ui.js:79:282)\n    at yg (dist/shaka-player.ui.js:212:770)\n    at eval (dist/shaka-player.ui.js:381:74)\n    at HTMLVideoElement.e (dist/shaka-player.ui.js:124:1369)"
	}
	Error: Shaka Error 7003
	    at new R (dist/shaka-player.ui.js:79:282)
	    at yg (dist/shaka-player.ui.js:212:770)
	    at eval (dist/shaka-player.ui.js:381:74)
	    at HTMLVideoElement.e (dist/shaka-player.ui.js:124:1369)
	    at <Jasmine>
	    at failOnError (test/test/boot.js:82:3 <- test/test/boot.js:86:3)
	    at test/test/boot.js:102:5 <- test/test/boot.js:106:5
Edge 123.0.0.0 (Windows 10): Executed 2 of 2640 (1 FAILED) (skipped 2638) (3.478 secs / 2.428 secs)
Chrome 123.0.0.0 (Windows 10): Executed 4 of 2640 (skipped 2636) SUCCESS (5.341 secs / 4.754 secs)
Firefox 122.0 (Windows 10): Executed 4 of 2640 (skipped 2636) SUCCESS (5.727 secs / 5.254 secs)
TOTAL: 1 FAILED, 9 SUCCESS

@joeyparrish
Member Author

Running with --filter, all Windows browsers, and --uncompiled, I see the codec-switching test failure 20/20 times on Edge. And due to --filter, the tests complete in about 6 seconds, so it should be easy enough to rapidly iterate and debug.

I see now that this test failure is actually an internal assertion failure in StreamingEngine. The reason it fails on Edge specifically in our test runs is that we use --uncompiled on Edge to get code coverage stats. Assertions are removed in the compiled bundle, so integration tests that run against the compiled bundle will never trigger the assertion.
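
The difference between the two modes can be sketched roughly as follows. This is only an illustration, not Shaka Player's actual assertion code: `makeAssert`, `devAssert`, and the `enabled` flag are hypothetical stand-ins for a build-time switch that removes assertions from the compiled bundle.

```javascript
// Hypothetical sketch: `enabled` stands in for a build-time flag.
// In a real compiled bundle the assertion calls would be removed entirely.
function makeAssert(enabled) {
  return function devAssert(condition, message) {
    if (enabled && !condition) {
      throw new Error('Assertion failed: ' + message);
    }
  };
}

// A toy operation with an internal invariant, loosely modeled on the
// "did not expect update to be scheduled" message in the log above.
function scheduleUpdate(state, devAssert) {
  devAssert(!state.updateScheduled, 'did not expect update to be scheduled');
  state.updateScheduled = true;
  return state;
}

function runScenario(assertionsEnabled) {
  const devAssert = makeAssert(assertionsEnabled);
  const state = {updateScheduled: false};
  try {
    scheduleUpdate(state, devAssert);
    scheduleUpdate(state, devAssert);  // Violates the invariant.
    return 'passed silently';
  } catch (error) {
    return error.message;
  }
}

console.log(runScenario(true));   // Uncompiled-style run: the assertion fires.
console.log(runScenario(false));  // Compiled-style run: the bug goes unnoticed.
```

The same broken sequence is reported in one mode and invisible in the other, which would be consistent with the failure appearing only where --uncompiled is used.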

The error 7003 from above seems related. In an uncompiled run, error 7003 appears some of the time (4/20), but only after the assertion fails.

@joeyparrish
Member Author

I discovered that compiled mode masked another problem with these tests. They used the wrong config path, streaming.mediaSource.codecSwitchingStrategy, instead of mediaSource.codecSwitchingStrategy, but the error log for this only shows up in uncompiled mode. (The invalid config is also not treated as a test failure when it happens during a test.)

So these tests did not test the actual strategies they intended to.
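
A minimal sketch of how a misplaced key can be ignored without failing anything. `makeDefaultConfig` and `applyConfig` are hypothetical and far simpler than Shaka Player's real configuration handling. Only the two config paths come from the comment above; the default values, the warning text, and the string strategy names are assumptions for illustration.

```javascript
// Hypothetical defaults; the real player has many more fields, and the
// default strategy shown here is an assumption.
function makeDefaultConfig() {
  return {
    streaming: {bufferingGoal: 10},
    mediaSource: {codecSwitchingStrategy: 'smooth'},
  };
}

function hasOwn(object, key) {
  return typeof object === 'object' && object !== null &&
      Object.prototype.hasOwnProperty.call(object, key);
}

// Apply a dotted path such as 'mediaSource.codecSwitchingStrategy'.
// Unknown paths are recorded as warnings and ignored rather than thrown,
// so a caller that uses a bad path keeps running with the default value.
function applyConfig(config, path, value, warnings) {
  const keys = path.split('.');
  const leaf = keys.pop();
  let node = config;
  for (const key of keys) {
    if (!hasOwn(node, key)) {
      warnings.push('Invalid config, unrecognized key ' + path);
      return false;
    }
    node = node[key];
  }
  if (!hasOwn(node, leaf)) {
    warnings.push('Invalid config, unrecognized key ' + path);
    return false;
  }
  node[leaf] = value;
  return true;
}

const config = makeDefaultConfig();
const warnings = [];

// Wrong path: nothing changes, only a warning is recorded.
applyConfig(config, 'streaming.mediaSource.codecSwitchingStrategy', 'reload', warnings);
console.log(config.mediaSource.codecSwitchingStrategy);  // still 'smooth'

// Correct path: the strategy is actually applied.
applyConfig(config, 'mediaSource.codecSwitchingStrategy', 'reload', warnings);
console.log(config.mediaSource.codecSwitchingStrategy);  // now 'reload'
```

Under these assumptions, a test that sets the strategy through the wrong path would exercise the default strategy while appearing to pass.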

@joeyparrish
Member Author

With the configs fixed, I get assertion failures on three browsers on Windows during the RELOAD test. This shows that the RELOAD test was actually using the SMOOTH strategy before the fix.

joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 19, 2024
@joeyparrish
Member Author

After enabling stack traces for Shaka error objects in the test run, I see the 7003 error (OBJECT_DESTROYED) derives from MediaSourceEngine, where destroyer_.ensureNotDestroyed() is called from an event listener instead of an async function. This results in an uncaught exception if MSE is destroyed before canplaythrough fires.
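
The distinction between the two call sites can be illustrated with a small sketch. `ToyDestroyer`, `appendSegment`, and `onCanPlayThrough` are hypothetical names, not Shaka Player's classes, and the early return shown in the listener is only one possible guard; the actual fix may differ.

```javascript
// Hypothetical stand-in for a "destroyer" helper; not Shaka Player's class.
class ToyDestroyer {
  constructor() {
    this.destroyed = false;
  }
  destroy() {
    this.destroyed = true;
  }
  ensureNotDestroyed() {
    if (this.destroyed) {
      throw new Error('OBJECT_DESTROYED');
    }
  }
}

// In an async function, the throw becomes a rejected promise that the
// caller can catch and handle.
async function appendSegment(destroyer) {
  destroyer.ensureNotDestroyed();
  return 'appended';
}

// In an event listener there is no awaiting caller, so a throw would
// surface as an uncaught exception. Returning early avoids that.
function onCanPlayThrough(destroyer, log) {
  if (destroyer.destroyed) {
    return;
  }
  log.push('canplaythrough handled');
}

async function demo() {
  const destroyer = new ToyDestroyer();
  const log = [];
  destroyer.destroy();  // Destroyed before the event fires.

  onCanPlayThrough(destroyer, log);  // Safe: does nothing.
  try {
    await appendSegment(destroyer);
  } catch (error) {
    log.push('caught ' + error.message);
  }
  return log;
}

demo().then((log) => console.log(log));
```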

joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 19, 2024
joeyparrish added a commit that referenced this issue Apr 19, 2024
The tests were not testing what they were supposed to because their
configs were invalid and being ignored.

Related to #6458
@joeyparrish
Member Author

I found some code in StreamingEngine around reloading MediaSource that appears to be responsible for breaking state tracking in StreamingEngine. Testing a fix now.

joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 19, 2024
joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 19, 2024
The tests for MediaSourceEngine codec switching were written to ignore types
and suppress access controls.  They were unreadable, too, with very little
whitespace, confusing one-letter variable names, and difficult-to-follow event
mocking.  This made it more difficult to debug test failures in PR shaka-project#6460.

This rewrites the tests in a more readable manner with compiler enforcement of
types in the tests.  Two helper functions are used to isolate the necessary
access-control suppressions.

This exposed a bug in the tests, in which one test case (preserve SourceBuffer
attributes) only passed because the original version failed to await on an
async process.  I am not sure that the functionality in that test exists at
that level.  For now, the test is disabled.  I'll follow up with removal after
more investigation.

Related to shaka-project#6458, shaka-project#6460
avelad pushed a commit that referenced this issue Apr 22, 2024
The tests for MediaSourceEngine codec switching were written to ignore
types and suppress access controls. They were unreadable, too, with very
little whitespace, confusing one-letter variable names, and
difficult-to-follow event mocking. This made it more difficult to debug
test failures in PR #6460.

This rewrites the tests in a more readable manner with compiler
enforcement of types in the tests. Two helper functions are used to
isolate the necessary access-control suppressions.

This exposed a bug in the tests, in which one test case (preserve
SourceBuffer attributes) only passed because the original version failed
to await on an async process. I am not sure that the functionality in
that test exists at that level. For now, the test is disabled. I'll
follow up with removal after more investigation.

Related to #6458, #6460
joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 22, 2024
MediaSourceEngine does not, in fact, preserve SourceBuffer properties
when we reset it to switch codecs.  This is handled by StreamingEngine
instead, through a follow-up call to setStreamProperties.

The test only ever passed as originally written because there was a
missing `await` on an async reset process.

This removes the bogus test.

Related to shaka-project#6458, shaka-project#6462
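
How a missing `await` can make a test pass for the wrong reason, as a minimal sketch. `ToyBuffer` and both test functions are hypothetical and do not reflect the real MediaSourceEngine tests; the property name and values are assumptions for illustration.

```javascript
// Hypothetical object under test: reset() asynchronously clears a property.
class ToyBuffer {
  constructor() {
    this.mode = 'sequence';
  }
  async reset() {
    await Promise.resolve();  // Simulate asynchronous work.
    this.mode = '';
  }
}

// Buggy test: reset() is not awaited, so the check runs before the
// reset has taken effect and "passes" for the wrong reason.
function buggyTestPreservesMode() {
  const buffer = new ToyBuffer();
  buffer.mode = 'segments';
  buffer.reset();  // Missing `await`.
  return buffer.mode === 'segments';
}

// Corrected test: awaiting reveals that the property is not preserved.
async function fixedTestPreservesMode() {
  const buffer = new ToyBuffer();
  buffer.mode = 'segments';
  await buffer.reset();
  return buffer.mode === 'segments';
}

console.log(buggyTestPreservesMode());  // true, but only by accident
fixedTestPreservesMode().then((passed) => console.log(passed));  // false
```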
joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 22, 2024
joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 22, 2024
The tests for MediaSourceEngine codec switching were written to ignore types
and suppress access controls.  They were unreadable, too, with very little
whitespace, confusing one-letter variable names, and difficult-to-follow event
mocking.

This rewrites the tests in a more readable manner with compiler enforcement of
types in the tests.  Two helper functions are used to isolate the necessary
access-control suppressions.

This exposed a bug in the tests, in which one test case (preserve SourceBuffer
attributes) only passed because the original version failed to await on an
async process.  I am not sure that the functionality in that test exists at
that level.  For now, the test is disabled.  I'll follow up with removal after
more investigation.

Related to shaka-project#6458
joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Apr 22, 2024
When reloading MediaSourceEngine via StreamingEngine, some logic executed
before the reload cancels operations and aligns StreamingEngine state with
MediaSourceEngine.

However, additional logic was executed after setStreamProperties, which caused
a race condition that broke StreamingEngine state by manipulating it out of the
usual sequence and outside the usual methods, leading to exceptions and failed
assertions.  This removes that unnecessary logic and leaves subtle state
management to the functions that are meant to do it.

Closes shaka-project#6458
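
The general hazard described in this commit message can be shown with a toy model. It does not reproduce the real bug or the real StreamingEngine code: `ToyMediaState` and its single `updateScheduled` flag are hypothetical, and the failure string only echoes the log quoted earlier in the thread.

```javascript
// Toy model only: not the actual StreamingEngine state or methods.
class ToyMediaState {
  constructor() {
    this.updateScheduled = false;
    this.failures = [];
  }
  scheduleUpdate() {
    if (this.updateScheduled) {
      this.failures.push('did not expect update to be scheduled');
      return;
    }
    this.updateScheduled = true;
  }
  onUpdate() {
    if (!this.updateScheduled) {
      this.failures.push('unexpected call to onUpdate');
      return;
    }
    this.updateScheduled = false;
  }
}

// Usual sequence: every scheduled update is consumed before the next one.
function usualSequence() {
  const state = new ToyMediaState();
  state.scheduleUpdate();
  state.onUpdate();
  state.scheduleUpdate();
  state.onUpdate();
  return state.failures;
}

// Out of sequence: outside code resets the flag directly while an update
// is still pending, so two update callbacks end up in flight.
function outOfSequence() {
  const state = new ToyMediaState();
  state.scheduleUpdate();         // The usual path schedules an update.
  state.updateScheduled = false;  // Outside code edits state directly.
  state.scheduleUpdate();         // A second update is scheduled.
  state.onUpdate();               // First callback looks fine.
  state.onUpdate();               // Second callback trips the invariant.
  return state.failures;
}

console.log(usualSequence());
console.log(outOfSequence());
```

Keeping every state change inside the methods that own the invariant is the design choice the commit message argues for.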
joeyparrish added a commit that referenced this issue Apr 22, 2024
MediaSourceEngine does not, in fact, preserve SourceBuffer properties
when we reset it to switch codecs. This is handled by StreamingEngine
instead, through a follow-up call to setStreamProperties.

The test only ever passed as originally written because there was a
missing `await` on an async reset process.

This removes the bogus test.

Related to #6458, #6462
joeyparrish added a commit that referenced this issue Apr 22, 2024
When reloading MediaSourceEngine via StreamingEngine, some logic executed before the reload cancels operations and aligns StreamingEngine state with MediaSourceEngine.

However, additional logic was executed after setStreamProperties, which caused a race condition that broke StreamingEngine state by manipulating it out of the usual sequence and outside the usual methods, leading to exceptions and failed assertions.  This removes that unnecessary logic and leaves subtle state management to the functions that are meant to do it.

Closes #6458
joeyparrish added a commit that referenced this issue May 7, 2024
The tests were not testing what they were supposed to because their
configs were invalid and being ignored.

Related to #6458
joeyparrish added a commit that referenced this issue May 7, 2024
The tests for MediaSourceEngine codec switching were written to ignore
types and suppress access controls. They were unreadable, too, with very
little whitespace, confusing one-letter variable names, and
difficult-to-follow event mocking. This made it more difficult to debug
test failures in PR #6460.

This rewrites the tests in a more readable manner with compiler
enforcement of types in the tests. Two helper functions are used to
isolate the necessary access-control suppressions.

This exposed a bug in the tests, in which one test case (preserve
SourceBuffer attributes) only passed because the original version failed
to await on an async process. I am not sure that the functionality in
that test exists at that level. For now, the test is disabled. I'll
follow up with removal after more investigation.

Related to #6458, #6460
joeyparrish added a commit that referenced this issue May 7, 2024
MediaSourceEngine does not, in fact, preserve SourceBuffer properties
when we reset it to switch codecs. This is handled by StreamingEngine
instead, through a follow-up call to setStreamProperties.

The test only ever passed as originally written because there was a
missing `await` on an async reset process.

This removes the bogus test.

Related to #6458, #6462
joeyparrish added a commit that referenced this issue May 7, 2024
When reloading MediaSourceEngine via StreamingEngine, some logic executed before the reload cancels operations and aligns StreamingEngine state with MediaSourceEngine.

However, additional logic was executed after setStreamProperties, which caused a race condition that broke StreamingEngine state by manipulating it out of the usual sequence and outside the usual methods, leading to exceptions and failed assertions.  This removes that unnecessary logic and leaves subtle state management to the functions that are meant to do it.

Closes #6458

Backported to v4.6.x