No datapoint from old datasources #659

Closed
keyolk opened this issue Jun 27, 2024 · 2 comments · Fixed by #663

keyolk commented Jun 27, 2024

I'm trying to use Promxy as my single query endpoint while migrating Prometheus datasources.
It seems to work fine when my datapoints have simply switched over to the new datasource,
but it doesn't work well when some metrics are dual-written to both the old and new datasources.

Let's say I have a metric `a`.

For the migration, I dual-write it to both datasources, A and B.
So datasource B only has some recent data,
while A has the full history.

But when I query through Promxy, I can't get the old data that lives in A.
Any clue why this happens?

keyolk commented Jun 27, 2024

My configuration

      promxy:
        server_groups:
         - static_configs:
             - targets:
               - sigv4-proxy.victoria-metrics.svc.cluster.local.:8080
           path_prefix: '/workspaces/ws-224d07f9-9fc3-4b6f-9ab9-f1b5be37fbc1/'
           timeout: 1m
           absolute_time_range:
             end: '2024-06-01T00:00:00Z'
             truncate: true
         - static_configs:
             - targets:
               - o11y-victoria-metrics-cluster-vmselect.victoria-metrics.svc.cluster.local.:8481
           path_prefix: '/select/1/prometheus/'
           timeout: 1m
           query_params:
             nocache: 1
           absolute_time_range:
             start: '2024-06-01T00:00:00Z'
             truncate: true

I also tried with a single server group:

      promxy:
        server_groups:
         - static_configs:
             - targets:
               - sigv4-proxy.victoria-metrics.svc.cluster.local.:8080
               - o11y-victoria-metrics-cluster-vmselect.victoria-metrics.svc.cluster.local.:8481
           relabel_configs:
             - source_labels: [__address__]
               action: replace
               target_label: __path_prefix
               replacement: '/workspaces/ws-224d07f9-9fc3-4b6f-9ab9-f1b5be37fbc1/'
               regex: .*8080.*
             - source_labels: [__address__]
               action: replace
               target_label: __path_prefix
               regex: .*8481.*
               replacement: '/select/1/prometheus/'
           timeout: 1m

jacksontj added a commit that referenced this issue Jul 14, 2024
Query range is a bit of an odd beast. It requires a start, end, and a
step. PromQL's calculations assume that the datapoint times are start
plus a multiple of step; if they aren't, you are subject to LookbackDelta
constraints.

Specifically with TimeFilter -- where we set either an absolute or
relative time barrier -- we were causing issues because we previously
truncated the time to the barrier instead of finding the first multiple
of step within our time range.

To explain the issue, let's consider the following query:

```
Start: t10
End: t25
Step: 3
```

If this were to run normally, we'd get results with datapoints at
10, 13, 16, 19, 22, 25.

If at the same time we have an absoluteTimeFilter @ t20, we'd (prior
to this patch) get: 10, 13, 16, 19, 20, 23.

For shorter-range queries (i.e. within a day) this is generally not an
issue because the step usually isn't long enough to exceed the
LookbackDelta default of 5m. But as you look at *longer* time ranges
this problem becomes easier and easier to hit (in #659 I was able to
reproduce it easily with a ~3 day query, where the step was ~9m).

Fixes #659
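
To make the boundary math above concrete, here is a minimal Go sketch. It is illustrative only, not the actual promxy code: the `alignToStep` helper and the plain int64 timestamps are assumptions made for the example.

```go
package main

import "fmt"

// alignToStep returns the first query_range evaluation timestamp at or after t,
// where evaluations happen at start, start+step, start+2*step, ...
// This is the idea behind the fix described above: a TimeFilter boundary should
// land on a step multiple, not on the raw (truncated) filter time.
func alignToStep(start, step, t int64) int64 {
	if t <= start {
		return start
	}
	if rem := (t - start) % step; rem != 0 {
		return t + step - rem
	}
	return t
}

func main() {
	// The example from the commit message: start=t10, end=t25, step=3,
	// with an absolute time filter at t20.
	start, step, filter := int64(10), int64(3), int64(20)

	// Pre-fix: the second segment started at the truncated filter time,
	// so its datapoints fell on 20 and 23 instead of 22 and 25.
	fmt.Println("truncated boundary:", filter) // 20

	// Post-fix: align to the next evaluation timestamp within the range.
	fmt.Println("aligned boundary:  ", alignToStep(start, step, filter)) // 22
}
```
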
jacksontj (Owner) commented

First off, thanks for reaching out! I love to see people using the project -- and I especially love a good puzzle (foreshadowing!) :D

So I was actually able to reproduce this issue with relative ease -- but debugging it down to the root cause took some time. To spare you the details of my various theories and debugging (which took quite a few hours :D), I figured out that this is due to how the TimeFilters were handling their segmentation (more details in #663).

Thankfully the fix here is easy, but I'm honestly a bit surprised we never noticed this before. The TL;DR is that it is VERY EASY to reproduce as long as your QueryRange step is larger than your lookback delta -- which generally requires a multi-day query. So a HUGE thanks for catching and reporting this issue!
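
As a rough illustration of that TL;DR, the sketch below computes the step for a ~3 day range and compares it to the default lookback delta. The 500-point resolution is an assumption about how a ~9m step arises; it is not from the original report.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// A UI that targets ~500 datapoints over a ~3 day range picks a step of
	// roughly 8-9 minutes, which is larger than Prometheus' default lookback
	// delta of 5 minutes -- exactly the condition that exposes this bug.
	queryRange := 3 * 24 * time.Hour
	maxPoints := 500
	step := queryRange / time.Duration(maxPoints)
	lookbackDelta := 5 * time.Minute

	fmt.Println("step:", step)                                  // 8m38.4s
	fmt.Println("step > lookbackDelta:", step > lookbackDelta) // true
}
```
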
