Response time for ping test on netprobe quantiles metrics increases as time goes by

When an agent with a netprobe tap runs a netprobe policy, the reported response time for the "ping" test keeps increasing over time, which is unexpected.

I have two agents running, and both show similar behavior:

![image.png](https://images.zenhubusercontent.com/61aa14a937730b887dff82b3/61226710-3e31-46cf-9177-dc32a9877330)

I have another agent on the same machine that was started after the two mentioned above. The pattern looks similar, but the response time values are different:

![image](https://user-images.githubusercontent.com/78241475/204901207-87ba2bf8-d5ee-4adf-85ea-24bb92502549.png)


`/api/v1/policies/policy-probe/metrics/prometheus` data:

[agent1_policy-probe.txt](https://app.zenhub.com/files/241899022/acb9f8a6-8bad-42c3-813c-1994ace116c3/download)


[agent0_policy-probe.txt](https://app.zenhub.com/files/241899022/d0269083-73fb-4f5f-8802-a126746fdf17/download)


`/api/v1/policies/__all/metrics/prometheus` data:


[agent1_all.txt](https://app.zenhub.com/files/241899022/2a15662e-f29c-475d-95fe-a82d8b76f7ed/download)

[agent0_all.txt](https://app.zenhub.com/files/241899022/acaf82b6-36bf-4e02-baad-3c2fd9fba234/download)
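To compare the attached dumps between agents, it may help to pull the median (`quantile="0.5"`) per target out of the Prometheus text output. The following is only a minimal sketch: the metric name and the `target`/`quantile` labels are taken from the Grafana query in this issue, the helper names `parse_samples` and `median_by_target` are hypothetical, and the parser handles only simple sample lines, not the full exposition format.

```python
import re

# One sample line of the Prometheus text format, e.g.
#   metric_name{label="value",other="x"} 123.4
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces; values may contain spaces ("orb live").
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        try:
            value = float(match.group("value"))
        except ValueError:
            continue
        samples.append((match.group("name"), labels, value))
    return samples


def median_by_target(text, metric="netprobe_response_quantiles_us"):
    """Map each target to its quantile="0.5" value for the given metric."""
    return {
        labels.get("target"): value
        for name, labels, value in parse_samples(text)
        if name == metric and labels.get("quantile") == "0.5"
    }
```

Calling `median_by_target(open("agent0_policy-probe.txt").read())` on each saved dump would give one dictionary per agent that can be compared side by side.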


To reproduce:

Agent tap:

```
  taps:
    default_netprobe:
      input_type: netprobe
      config:
        test_type: ping
      tags:
        virtual: true
        vhost: 1
```

Policy:

```
handlers:
  modules:
    default_netprobe:
      type: netprobe
      metric_groups:
        enable:
          - counters
          - quantiles
input:
  tap: default_netprobe
  input_type: netprobe
  config:
    test_type: ping
    packets_per_test: 5
    interval_msec: 2000
    timeout_msec: 1000
    packets_interval_msec: 10
    packet_payload_size: 56
    targets:
      www.google.com:
        target: www.google.com
      'orb live':
        target: orb.live
      orb_community:
        target: orb.community
kind: collection
```

Grafana query: `avg(netprobe_response_quantiles_us{quantile="0.5"}) by (agent,target, policy)`

Note: I'm running the query with the 'orb live' target filtered out, because that target has a separate problem: `avg(netprobe_response_quantiles_us{quantile="0.5", target!="orb live"}) by (agent,target, policy)`
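To put a number on "increases as time goes by", one option is to fit a straight line to the median values scraped over time and look at its slope. This is only a sketch under stated assumptions: `trend_slope` is a hypothetical helper, and the input is assumed to be a list of `(timestamp_seconds, median_us)` pairs collected by whoever is reproducing the issue. It is not part of the agent or of Grafana.

```python
def trend_slope(points):
    """Least-squares slope of (timestamp_sec, value_us) points, in us per second.

    A clearly positive slope over a long window supports the claim that the
    reported response time drifts upward; a slope near zero suggests noise.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    # Covariance of time and value, and variance of time (both unnormalised).
    num = sum((t - mean_t) * (v - mean_v) for t, v in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    if den == 0:
        raise ValueError("all timestamps are identical")
    return num / den
```

Running it separately per agent and per target, with 'orb live' excluded as in the query above, would show whether the later-started agent drifts at the same rate as the first two.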

