OWAMP bucket_width parameter with negative effect on the measurements #1388

Closed
igarny opened this issue Jun 17, 2024 · 3 comments
igarny commented Jun 17, 2024

Hi guys,

I am seeing something odd.
When I explicitly set the "bucket_width" parameter to the default value, I see a distortion in the results of a powstream measurement:

psmp-gn-mgmt-lon-uk:~$ pscheduler task latency --dest psmp-gn-owd-par-fr.geant.org --dest-node psmp-gn-mgmt-par-fr.geant.org --source psmp-gn-owd-lon-uk.geant.org --source-node psmp-gn-mgmt-lon-uk.geant.org --packet-count 100 --packet-interval 0.1 --packet-padding 0 --ip-version 4 --bucket-width 0.0001
Submitting task...
Task URL:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/a3089c1d-d3e4-409c-8021-c82491fd52e0
Running with tool 'owping'
Fetching first run...

Next scheduled run:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/a3089c1d-d3e4-409c-8021-c82491fd52e0/runs/54adacb5-aad7-4354-b97e-5bda68c4b38f
Starts 2024-06-12T08:56:59+00:00 (~3 seconds)
Ends   2024-06-12T08:57:21+00:00 (~21 seconds)
Waiting for result...

Packet Statistics
-----------------
Packets Sent ......... 100 packets
Packets Received ..... 100 packets
Packets Lost ......... 0 packets
Packets Duplicated ... 0 packets
Packets Reordered .... 0 packets

One-way Latency Statistics
--------------------------
Delay Median ......... 16.88 ms
Delay Minimum ........ 16.10 ms
Delay Maximum ........ 17.39 ms
Delay Mean ........... 16.88 ms
Delay Mode ........... 16.88 ms 16.89 ms
Delay 25th Percentile ... 16.75 ms
Delay 75th Percentile ... 17.16 ms
Delay 95th Percentile ... 17.32 ms
Max Clock Error ...... 3.04 ms
Common Jitter Measurements:
    P95 - P50 ........ 0.44 ms
    P75 - P25 ........ 0.41 ms
    Variance ......... 0.07 ms
    Std Deviation .... 0.27 ms
Histogram:
    16.10 ms: 1 packets
    16.39 ms: 2 packets
    16.44 ms: 2 packets
    16.45 ms: 1 packets
    16.46 ms: 1 packets
    16.48 ms: 1 packets
    16.49 ms: 1 packets
    16.51 ms: 1 packets
    16.52 ms: 2 packets
    16.53 ms: 4 packets
    16.54 ms: 2 packets
    16.55 ms: 1 packets
    16.56 ms: 2 packets
    16.58 ms: 1 packets
    16.68 ms: 1 packets
    16.72 ms: 1 packets
    16.75 ms: 2 packets
    16.76 ms: 2 packets
    16.77 ms: 4 packets
    16.78 ms: 1 packets
    16.81 ms: 2 packets
    16.82 ms: 1 packets
    16.83 ms: 1 packets
    16.85 ms: 1 packets
    16.86 ms: 4 packets
    16.87 ms: 3 packets
    16.88 ms: 9 packets
    16.89 ms: 9 packets
    16.91 ms: 3 packets
    16.92 ms: 1 packets
    16.93 ms: 1 packets
    16.97 ms: 2 packets
    17.06 ms: 1 packets
    17.09 ms: 1 packets
    17.12 ms: 1 packets
    17.14 ms: 2 packets
    17.17 ms: 1 packets
    17.18 ms: 3 packets
    17.19 ms: 1 packets
    17.20 ms: 5 packets
    17.21 ms: 1 packets
    17.22 ms: 5 packets
    17.24 ms: 3 packets
    17.27 ms: 1 packets
    17.32 ms: 2 packets
    17.35 ms: 1 packets
    17.37 ms: 1 packets
    17.39 ms: 1 packets

TTL Statistics
--------------
TTL Median ........... 252.00
TTL Minimum .......... 252.00
TTL Maximum .......... 252.00
TTL Mean ............. 252.00
TTL Mode ............. 252.00
TTL 25th Percentile ... 252.00
TTL 75th Percentile ... 252.00
TTL 95th Percentile ... 252.00
Histogram:
    252: 100 packets

If I remove the parameter, everything works just fine, even though the removed value is supposed to be the default:


Without bucket-width

psmp-gn-mgmt-lon-uk:~$ pscheduler task latency --dest psmp-gn-owd-par-fr.geant.org --dest-node psmp-gn-mgmt-par-fr.geant.org --source psmp-gn-owd-lon-uk.geant.org --source-node psmp-gn-mgmt-lon-uk.geant.org --packet-count 100 --packet-interval 0.1 --packet-padding 0 --ip-version 4
Submitting task...
Task URL:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/039cf1a6-8bdc-40b1-a728-2058d45ff925
Running with tool 'owping'
Fetching first run...

Next scheduled run:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/039cf1a6-8bdc-40b1-a728-2058d45ff925/runs/1905a933-85a3-4c80-843c-de505a0bb55c
Starts 2024-06-12T09:17:40+00:00 (~2 seconds)
Ends   2024-06-12T09:18:02+00:00 (~21 seconds)
Waiting for result...

Packet Statistics
-----------------
Packets Sent ......... 100 packets
Packets Received ..... 100 packets
Packets Lost ......... 0 packets
Packets Duplicated ... 0 packets
Packets Reordered .... 0 packets

One-way Latency Statistics
--------------------------
Delay Median ......... 1.66 ms
Delay Minimum ........ 1.60 ms
Delay Maximum ........ 1.69 ms
Delay Mean ........... 1.66 ms
Delay Mode ........... 1.68 ms
Delay 25th Percentile ... 1.65 ms
Delay 75th Percentile ... 1.68 ms
Delay 95th Percentile ... 1.68 ms
Max Clock Error ...... 3.04 ms
Common Jitter Measurements:
    P95 - P50 ........ 0.02 ms
    P75 - P25 ........ 0.03 ms
    Variance ......... 0.00 ms
    Std Deviation .... 0.02 ms
Histogram:
    1.60 ms: 1 packets
    1.61 ms: 2 packets
    1.62 ms: 11 packets
    1.63 ms: 1 packets
    1.64 ms: 9 packets
    1.65 ms: 2 packets
    1.66 ms: 28 packets
    1.67 ms: 11 packets
    1.68 ms: 33 packets
    1.69 ms: 2 packets

TTL Statistics
--------------
TTL Median ........... 252.00
TTL Minimum .......... 252.00
TTL Maximum .......... 252.00
TTL Mean ............. 252.00
TTL Mode ............. 252.00
TTL 25th Percentile ... 252.00
TTL 75th Percentile ... 252.00
TTL 95th Percentile ... 252.00
Histogram:
    252: 100 packets

No further runs scheduled.

Another odd observation is that MaDDash somehow compensates, but Grafana doesn't.
Meaning that, despite the skewed results, MaDDash somehow recognizes whether the parameter is present or absent and provides the correct results.
Below you'll see that, despite the change in the depth parameter, MaDDash does not show the issue:

image

@laeti-tia
Member

Isn't there some calculation happening in MaDDash? And/or a need to have the packet-interval and bucket-width parameters aligned somehow? The difference between the first and second results seems to be a 10-fold decrease.
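
The 10-fold figure can be checked against the summary statistics of the two runs posted above. A minimal sketch in Python: the numbers are copied from those outputs, and the dictionary and variable names are chosen here purely for illustration.

```python
# Summary statistics (as printed, labelled "ms") from the two runs above.
with_bucket_width = {"median": 16.88, "minimum": 16.10, "maximum": 17.39, "mean": 16.88}
without_bucket_width = {"median": 1.66, "minimum": 1.60, "maximum": 1.69, "mean": 1.66}

# Ratio of each statistic between the run with --bucket-width 0.0001
# and the run without the option.
ratios = {
    name: with_bucket_width[name] / without_bucket_width[name]
    for name in with_bucket_width
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Every statistic differs by roughly the same factor of about ten, which points to a uniform scaling of the reported values rather than a change in the path being measured.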

arlake228 self-assigned this Jun 26, 2024
Author

igarny commented Jun 26, 2024

Here are the final tests:
testBAD.txt
resBAD.json
testOK.txt
resultOK.json

arlake228 added a commit to perfsonar/grafana that referenced this issue Jun 26, 2024
@arlake228
Contributor

As suspected, this is a display issue. I was able to recreate it in my testbed. The problem was in both the CLI and Grafana, as neither was accounting for bucket-width when working with histogram or derived values. The histogram buckets are stored in OpenSearch with the bins sized according to the bucket width (default 0.001), which is what is desired. Grafana and the CLI were just slapping a ms label on whatever came out without consideration for bucket width. I have updated both to look for the bucket width and scale values accordingly so they are always normalized to milliseconds.
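
The scaling described above can be sketched roughly as follows. This is an illustration only, not the code from the actual fix: the function name `normalize_histogram_ms` is hypothetical, and it assumes that histogram keys are delays expressed in units of the bucket width (in seconds), which is consistent with the two runs in this thread.

```python
def normalize_histogram_ms(histogram, bucket_width=0.001):
    """Convert histogram keys from bucket-width units to milliseconds.

    histogram    -- mapping of bucket key (string or number) to packet count
    bucket_width -- bucket size in seconds (assumed default: 0.001)
    """
    scale = bucket_width * 1000.0  # milliseconds represented by one bucket unit
    normalized = {}
    for key, count in histogram.items():
        # Round to suppress floating-point noise from the multiplication.
        delay_ms = round(float(key) * scale, 6)
        normalized[delay_ms] = normalized.get(delay_ms, 0) + count
    return normalized


# A few buckets from the run that used --bucket-width 0.0001.
raw = {"16.10": 1, "16.88": 9, "17.39": 1}
for delay_ms, packets in sorted(normalize_histogram_ms(raw, bucket_width=0.0001).items()):
    print(f"{delay_ms:.3f} ms: {packets} packets")
```

With a bucket width of 0.001 the scale factor is 1, so the keys already read as milliseconds. That would explain why the run without the option looked correct while the run with 0.0001 appeared ten times too large.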
