Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

response_classification tests are flaky in CI #887

Closed
olix0r opened this issue May 1, 2018 · 4 comments
Closed

response_classification tests are flaky in CI #887

olix0r opened this issue May 1, 2018 · 4 comments

Comments

@olix0r
Copy link
Member

olix0r commented May 1, 2018

I suspect this 500 is caused by a networking error, etc.

---- response_classification::inbound_http stdout ----
	Http2 server running; addr=127.0.0.1:34165
proxy running; control=127.0.0.1:44213, inbound=127.0.0.1:40891 (SO_ORIGINAL_DST=127.0.0.1:34165), outbound=127.0.0.1:37157, metrics=127.0.0.1:36213
thread 'response_classification::inbound_http' panicked at 'assertion failed after 335.760594ms (retried 5 times): metrics scrape:
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="tele.test.svc.cluster.local",direction="inbound"} 6
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500"} 1
# HELP response_latency_ms Elapsed times between a request's headers being received and its response stream completing
# TYPE response_latency_ms histogram
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="10"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="20"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="30"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="40"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="50"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="100"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="200"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="300"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="400"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="500"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="1000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="2000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="3000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="4000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="5000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="10000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="20000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="30000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="40000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="50000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="100000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="200000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="300000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="400000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="500000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="+Inf"} 1
response_latency_ms_count{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500"} 1
response_latency_ms_sum{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500"} 1
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1525201737
did not contain:
response_total{authority="tele.test.svc.cluster.local",direction="inbound",classification="success",status_code="200"} 1', proxy/tests/telemetry.rs:223:17
server Listening dropped; addr=127.0.0.1:34165
failures:
    response_classification::inbound_http
test result: FAILED. 9 passed; 1 failed; 21 ignored; 0 measured; 0 filtered out
error: test failed, to rerun pass '--test telemetry'
@seanmonstar
Copy link
Contributor

I'd assume the 500 response would mean an error! log was emitted. That should be visible in the test output.

@hawkw
Copy link
Member

hawkw commented May 2, 2018

huh, I just saw the same tests fail on my dev machine, so it's not only flaky on CI. interestingly, I didn't see any error! log lines while running the telemetry test executable (I did see some of the expected error! lines from the discovery tests, so I don't think they were being silenced somehow).

Test output:
     Running target/debug/deps/telemetry-47091b9088284619

running 31 tests
test metrics_endpoint_inbound_request_count ... ok
test metrics_has_start_time ... ok
test metrics_endpoint_outbound_request_count ... ok
test metrics_have_no_double_commas ... ok
test outbound_dst_labels::multiple_addr_labels ... ok
test outbound_dst_labels::labeled_addr_and_addrset ... ok
test outbound_dst_labels::multiple_addrset_labels ... ok
test support::assert_eventually ... ok
test outbound_dst_labels::controller_updates_set_labels ... ok
test outbound_dst_labels::controller_updates_addr_labels ... ok
test transport::inbound_http_accept ... ok
test transport::inbound_tcp_connect ... ok
test transport::inbound_http_connect ... ok
test transport::inbound_tcp_accept ... ok
test transport::inbound_tcp_read_bytes_total ... ok
test transport::inbound_tcp_write_bytes_total ... ok
test transport::inbound_tcp_duration ... ok
test transport::outbound_http_connect ... ok
test transport::outbound_http_accept ... ok
test metrics_compression ... ok
test transport::outbound_http_tcp_open_connections ... ok
test transport::outbound_tcp_connect ... ok
test transport::outbound_tcp_accept ... ok
test transport::outbound_tcp_duration ... ok
test transport::outbound_tcp_read_bytes_total ... ok
test transport::outbound_tcp_open_connections ... ok
test transport::outbound_tcp_write_bytes_total ... ok
test response_classification::inbound_http ... FAILED
test response_classification::outbound_http ... FAILED
test metrics_endpoint_inbound_response_latency ... ok
test metrics_endpoint_outbound_response_latency ... ok

failures:

---- response_classification::inbound_http stdout ----
        Http2 server running; addr=127.0.0.1:53361
proxy running; control=127.0.0.1:53364, inbound=127.0.0.1:53365 (SO_ORIGINAL_DST=127.0.0.1:53361), outbound=127.0.0.1:53366, metrics=127.0.0.1:53367
thread 'response_classification::inbound_http' panicked at 'assertion failed after 196.247139ms (retried 5 times): metrics scrape:
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="tele.test.svc.cluster.local",direction="inbound"} 6
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500"} 1
# HELP response_latency_ms Elapsed times between a request's headers being received and its response stream completing
# TYPE response_latency_ms histogram
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="10"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="20"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="30"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="40"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="50"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="100"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="200"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="300"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="400"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="500"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="1000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="2000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="3000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="4000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="5000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="10000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="20000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="30000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="40000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="50000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="100000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="200000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="300000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="400000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="500000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500",le="+Inf"} 1
response_latency_ms_count{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500"} 1
response_latency_ms_sum{authority="tele.test.svc.cluster.local",direction="inbound",classification="failure",status_code="500"} 1
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1525284238

did not contain:
response_total{authority="tele.test.svc.cluster.local",direction="inbound",classification="success",status_code="304"} 1', proxy/tests/telemetry.rs:223:17
server Listening dropped; addr=127.0.0.1:53361

---- response_classification::outbound_http stdout ----
        Http2 server running; addr=127.0.0.1:53370
proxy running; control=127.0.0.1:53373, inbound=127.0.0.1:53374, outbound=127.0.0.1:53375 (SO_ORIGINAL_DST=127.0.0.1:53370), metrics=127.0.0.1:53376
thread 'response_classification::outbound_http' panicked at 'assertion failed after 231.067226ms (retried 5 times): metrics scrape:
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="tele.test.svc.cluster.local",direction="outbound"} 6
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504"} 1
response_total{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500"} 1
# HELP response_latency_ms Elapsed times between a request's headers being received and its response stream completing
# TYPE response_latency_ms histogram
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="10"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="20"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="30"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="40"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="50"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="100"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="200"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="300"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="400"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="500"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="1000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="2000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="3000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="4000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="5000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="10000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="20000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="30000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="40000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="50000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="100000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="200000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="300000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="400000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="500000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504",le="+Inf"} 1
response_latency_ms_count{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504"} 1
response_latency_ms_sum{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="504"} 2
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="10"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="20"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="30"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="40"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="50"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="100"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="200"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="300"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="400"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="500"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="1000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="2000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="3000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="4000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="5000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="10000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="20000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="30000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="40000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="50000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="100000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="200000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="300000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="400000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="500000"} 1
response_latency_ms_bucket{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500",le="+Inf"} 1
response_latency_ms_count{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500"} 1
response_latency_ms_sum{authority="tele.test.svc.cluster.local",direction="outbound",classification="failure",status_code="500"} 2
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1525284238

did not contain:
response_total{authority="tele.test.svc.cluster.local",direction="outbound",classification="success",status_code="200"} 1', proxy/tests/telemetry.rs:245:17
server Listening dropped; addr=127.0.0.1:53370


failures:
    response_classification::inbound_http
    response_classification::outbound_http

@stale
Copy link

stale bot commented Oct 7, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 7, 2018
@hawkw hawkw removed the wontfix label Oct 9, 2018
@stale
Copy link

stale bot commented Jan 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 7, 2019
@stale stale bot closed this as completed Jan 22, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants