Skip to content

Fix flaky test: no telemetry by default in test suite#1203

Merged
richardm-stripe merged 1 commit intomasterfrom
richardm-fix-gha-tests
May 6, 2021
Merged

Fix flaky test: no telemetry by default in test suite#1203
richardm-stripe merged 1 commit intomasterfrom
richardm-fix-gha-tests

Conversation

@richardm-stripe
Copy link
Contributor

@richardm-stripe richardm-stripe commented May 5, 2021

Our CI in github actions has ~consistently been failing tests like this. This hasn't been consistent across time, and it doesn't happen in Travis CI (which we are trying to deprecate).

This happens in RequestTelemetry.java, which in my opinion shouldn't be enabled when we are running tests.

This disables telemetry throughout tests, which causes CI to pass.

I do want to better understand the cause of why the NullPointerException is triggering in this environment but not locally, but have not yet identified it.

Copy link
Contributor

@remi-stripe remi-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine with the change but that means we could break telemetry and not notice right? Or do we have enough coverage in tests specifically for this?

@richardm-stripe
Copy link
Contributor Author

Good point, we have some tests that explicitly test enabled/disabled Telemetry. Still, I don't really want to merge this until I actually understand the error.

@richardm-stripe richardm-stripe marked this pull request as draft May 6, 2021 03:17
@richardm-stripe
Copy link
Contributor Author

richardm-stripe commented May 6, 2021

Ok, I now understand what is going on.

In RequestTelemetry.java there is a piece of global state:

private static ConcurrentLinkedQueue<RequestMetrics> prevRequestMetrics

which can affect the behavior of RequestMetrics.getHeaderValue. Sometimes it returns empty, sometimes not, depending on previous/concurrent requests that have been made throughout the test suite. (This is why I can never reproduce the failure running the test in isolation).

When it's not empty, then requestWithTelemetry adds a header to the request argument before passing it to this.request.

This prevents the Mockito matcher from applying, and so the mocked client just returns null instead of providing a response. Attempting to call a method on the null response triggers the null pointer exception.


Given the role of Mockito, this does not represent an issue that could arise production. Disabling Telemetry by default for requests is a reasonable solution, because that removes the global state/non-determinism here.

@richardm-stripe richardm-stripe force-pushed the richardm-fix-gha-tests branch from 0ee5600 to 924c262 Compare May 6, 2021 13:57
@richardm-stripe richardm-stripe marked this pull request as ready for review May 6, 2021 13:57
@richardm-stripe richardm-stripe changed the title [WIP] Disable telemetry in tests to fix github actions Fix flaky test: no telemetry by default in test suite May 6, 2021
@richardm-stripe richardm-stripe merged commit fd91f7f into master May 6, 2021
@richardm-stripe richardm-stripe deleted the richardm-fix-gha-tests branch May 6, 2021 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants