
[new feature] add Remote read API #7324

Open · wants to merge 57 commits into main

Conversation

liguozhong
Contributor

@liguozhong liguozhong commented Oct 3, 2022

What this PR does / why we need it:


[new feature] add Remote read API.

Remote read reuses the existing client in pkg/logcli/client/client.go; the Loki server does not add a new HTTP handler.

type Client interface {
	Query(queryStr string, limit int, time time.Time, direction logproto.Direction, quiet bool) (*loghttp.QueryResponse, error)
	QueryRange(queryStr string, limit int, start, end time.Time, direction logproto.Direction, step, interval time.Duration, quiet bool) (*loghttp.QueryResponse, error)
	ListLabelNames(quiet bool, start, end time.Time) (*loghttp.LabelResponse, error)
	ListLabelValues(name string, quiet bool, start, end time.Time) (*loghttp.LabelResponse, error)
	Series(matchers []string, start, end time.Time, quiet bool) (*loghttp.SeriesResponse, error)
	LiveTailQueryConn(queryStr string, delayFor time.Duration, limit int, start time.Time, quiet bool) (*websocket.Conn, error)
	GetOrgID() string
}
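
For illustration only, here is a rough sketch (not necessarily the exact code in this PR) of how a remote read path could forward one query through this client and turn the response back into the iterator the query engine consumes; the helper name selectLogsRemote is an assumption.

package remoteread

import (
	"fmt"
	"time"

	"github.com/grafana/loki/pkg/iter"
	"github.com/grafana/loki/pkg/logcli/client"
	"github.com/grafana/loki/pkg/loghttp"
	"github.com/grafana/loki/pkg/logproto"
)

// selectLogsRemote forwards a single log query to a remote Loki through the
// logcli client and converts the HTTP response into an entry iterator.
// Illustrative sketch only; the real PR code may differ.
func selectLogsRemote(c client.Client, query string, limit int, start, end time.Time, direction logproto.Direction) (iter.EntryIterator, error) {
	resp, err := c.QueryRange(query, limit, start, end, direction, 0, 0, true)
	if err != nil {
		return nil, fmt.Errorf("remote read: query range: %w", err)
	}
	streams, ok := resp.Data.Result.(loghttp.Streams)
	if !ok {
		return nil, fmt.Errorf("remote read: unexpected result type %T", resp.Data.Result)
	}
	return iter.NewStreamsIterator(streams.ToProto(), direction), nil
}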

loki.yaml config example:

remote_read:
  - url: http://loki_us.svc:3100
    name: loki_us
    orgID: buy
  - url: http://loki_eu.svc:3100
    name: loki_eu
    orgID: carts

Prometheus remote read documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read
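
For reference, a hedged sketch of one way this YAML block could be modeled in Go; the struct and field names below are assumptions, not necessarily what the PR defines.

package remoteread

// RemoteReadConfig is an illustrative shape for the remote_read YAML shown
// above; the actual struct names and tags in this PR may differ.
type RemoteReadConfig struct {
	Remotes []RemoteConfig `yaml:"remote_read"`
}

type RemoteConfig struct {
	URL   string `yaml:"url"`   // base URL of the remote Loki, e.g. http://loki_us.svc:3100
	Name  string `yaml:"name"`  // identifier for the remote cluster
	OrgID string `yaml:"orgID"` // tenant (X-Scope-OrgID) to query on the remote
}
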
Which issue(s) this PR fixes:
Fixes #7306 and #1866

Special notes for your reviewer:
This PR is already quite large, so it contains only the implementation: the basic remote read framework, the remote read configuration, and a SelectLogs test for the read.go interface.

Checklist

  • Reviewed the CONTRIBUTING.md guide
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@liguozhong liguozhong requested a review from a team as a code owner October 3, 2022 17:58
@grafanabot
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.4%

@grafanabot
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
- querier/queryrange	-0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.4%

# Conflicts:
#	pkg/loki/loki.go
@grafanabot
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.4%

Contributor

@jeschkies jeschkies left a comment

We were planning a remote read API as well. I'll reject it just so we have time to review it, not because I think our solution would be better.

I'm super happy there's interest in it.

@jeschkies
Contributor

Is there an issue or something similar where this is discussed in more detail?

Contributor

@jeschkies jeschkies left a comment

Nice work. I meant to introduce a new remote read API that would stream the results so we would only transfer the data we need.

if !ok {
	return nil, errors.New("remote read Querier selectLogs failed: value cast to loghttp.Streams failed")
}
return iter.NewStreamsIterator(streams.ToProto(), params.Direction), nil
Contributor

I think we should inject the remote ID like we did with the tenant ID for multi-tenant queries.
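
For illustration, a minimal sketch of that idea using plain context values; the key type and helper names are hypothetical, and Loki's actual tenant-injection helpers may look different.

package remoteread

import "context"

// remoteIDKey is a hypothetical context key for carrying the remote cluster
// name, by analogy with how a tenant ID is injected for multi-tenant queries.
type remoteIDKey struct{}

// InjectRemoteID returns a child context carrying the remote cluster name.
func InjectRemoteID(ctx context.Context, remote string) context.Context {
	return context.WithValue(ctx, remoteIDKey{}, remote)
}

// RemoteIDFromContext extracts the remote cluster name, if present.
func RemoteIDFromContext(ctx context.Context) (string, bool) {
	remote, ok := ctx.Value(remoteIDKey{}).(string)
	return remote, ok
}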

Contributor Author

Agreed, you are right.

Contributor Author

done.

Contributor

Sorry. I cannot find it 🙈

@liguozhong
Contributor Author

Is there an issue or something similar where this is discussed in more detail?

Hi @jeschkies, as far as I know there has been no detailed discussion about remote read yet, but I have some prior experience with the remote read module in the Prometheus project.

This PR is only meant to provide a template for the community; we can discuss it in this PR.

I'm very sorry for submitting this PR directly without discussing it in detail first; trust me, I have no ill intentions. I just want to make Loki a more mature and complete logging solution faster.

@grafanabot
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.4%

@jeschkies
Contributor

I'm very sorry for submitting this PR directly without discussing it in detail first; trust me, I have no ill intentions. I just want to make Loki a more mature and complete logging solution faster.

@liguozhong no worries. I didn't mean to criticize. I was out for some time and thought I missed some info.

I'm super happy you took this initiative. I have a few ideas that I'll add later here.

Member

@owen-d owen-d left a comment

I'm worried about performance of this at scale, which often coincides with users that want this feature. Right now, no aggregation is done remotely, meaning we're going to ship potentially a lot of data back to the host cluster on each query. This can quickly become prohibitively expensive (data egress costs).

Some other thoughts

  • (mentioned above) How do we minimize bandwidth between clusters? Ideally we'd need to do more sophisticated query planning (aggregation push-down).
  • What happens when the ideal shard factor should differ on different clusters during query parallelization?

I'm inclined to say we don't want this feature because at scale, this will be both costly and slow :( .

@liguozhong
Contributor Author

I'm worried about performance of this at scale, which often coincides with users that want this feature. Right now, no aggregation is done remotely, meaning we're going to ship potentially a lot of data back to the host cluster on each query. This can quickly become prohibitively expensive (data egress costs).

Some other thoughts

  • (mentioned above) How do we minimize bandwidth between clusters? Ideally we'd need to do more sophisticated query planning (aggregation push-down).
  • What happens when the ideal shard factor should differ on different clusters during query parallelization?

I'm inclined to say we don't want this feature because at scale, this will be both costly and slow :( .

Agreed, introducing this feature will cause a lot of complaints.

@SuperQ

SuperQ commented Mar 28, 2023

@owen-d / @liguozhong This kind of remote-read federation is very much needed.

For example, we have multiple compute/cloud service providers in multiple geo regions. I would rather ship data per query than ship all logs over our regional links. Most log output goes un-queried, so keeping logs inside a regional or provider boundary is desirable.

We would also like a circuit-breaker pattern, so that if a single geo region/provider is unavailable we can continue to have access to the other regions/providers.

Without remote read / distributed query, Loki would be a SPoF in our observability architecture.

Having a remote read / federated model is why we went with Thanos over Mimir.

@liguozhong
Contributor Author

liguozhong commented Mar 28, 2023

@owen-d / @liguozhong This kind of remote-read federation is very much needed.

For example, we have multiple compute/cloud service providers in multiple geo regions. I would rather ship data per query than ship all logs over our regional links. Most log output goes un-queried, so keeping logs inside a regional or provider boundary is desirable.

We would also like a circuit-breaker pattern, so that if a single geo region/provider is unavailable we can continue to have access to the other regions/providers.

Without remote read / distributed query, Loki would be a SPoF in our observability architecture.

Having a remote read / federated model is why we went with Thanos over Mimir.

Hi, I understand your situation very well. At present I also operate 22 Loki clusters.
When I need to query a traceID log, I have to click through 22 clusters each time. This is very uncomfortable.
I am willing to complete this PR to eliminate this unproductive duplication of work,
but it needs to be discussed by more people.

# Conflicts:
#	docs/sources/configuration/_index.md
#	pkg/querier/querier.go
@SuperQ

SuperQ commented Mar 28, 2023

Having this as an option, and then improving on things like query push-down, is probably the best incremental approach. Documenting the downsides like fan-out and lack of push-down is fine IMO.

For example, in Thanos we had lots of issues with fan-out (we have 2000+ Prometheus instances), so we introduced a label enforcement proxy to make sure users selected which Prometheus instances they needed per query. This is slightly less intuitive, but a lot less cumbersome than maintaining 2000+ data sources. :)

@liguozhong
Contributor Author

I am really looking forward to launching an alpha version so that more developers can participate in optimizing the code.
If there are 2000 Loki clusters, each configured with a label, then filtering data sources based on that label would provide a very cool federated query. Great idea.

Waiting for more eyes.

@jeschkies
Contributor

What do you think about blocking queries that do not perform well without remote aggregation?

@liguozhong
Contributor Author

liguozhong commented Apr 27, 2023

What do you think about blocking queries that do not perform well without remote aggregation?

Sorry, I missed your message.
Today our dev team asked me for this feature again, and I found your message in this PR today.

sum(count_over_time({app="buy2"}[1m]))

As far as I know, when the querier queries the ingester, it executes this expression (count_over_time({app="buy2"}[1m])) in the ingester instead of pulling all of the original data back to the querier for execution. What the ingester returns to the querier is a time series result, and this amount of data is very small, so there will be no performance problems.

The remote read API is the same: the performance cost is mainly in S3 file downloads and | json computation, not in the sum.

Maybe my understanding is wrong.
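
To make the push-down point concrete, a hedged sketch: each remote evaluates the metric query itself through the logcli client, so only a small matrix crosses the network and the host cluster merges those results. The function name and the naive merge are illustrative assumptions, not code from this PR.

package remoteread

import (
	"fmt"
	"time"

	"github.com/grafana/loki/pkg/logcli/client"
	"github.com/grafana/loki/pkg/loghttp"
	"github.com/grafana/loki/pkg/logproto"
)

// queryRemotesRange runs a metric query such as
//   sum(count_over_time({app="buy2"}[1m]))
// on each remote and merges the small matrix results locally, instead of
// pulling raw log lines back. Illustrative sketch only.
func queryRemotesRange(remotes []client.Client, query string, start, end time.Time, step time.Duration) (loghttp.Matrix, error) {
	var merged loghttp.Matrix
	for _, c := range remotes {
		resp, err := c.QueryRange(query, 0, start, end, logproto.BACKWARD, step, 0, true)
		if err != nil {
			return nil, fmt.Errorf("remote read: %w", err)
		}
		m, ok := resp.Data.Result.(loghttp.Matrix)
		if !ok {
			return nil, fmt.Errorf("remote read: expected matrix, got %T", resp.Data.Result)
		}
		// Naive merge: concatenate series. A real implementation would
		// re-apply the outer aggregation (e.g. sum) across remotes.
		merged = append(merged, m...)
	}
	return merged, nil
}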

@jeschkies
Contributor

What the ingester returns to the querier is a time series result, and this amount of data is very small, so there will be no performance problems.

@liguozhong, that's because the ingester covers a small time range; if you query over days it's an issue. If I'm not mistaken, we want to use something like the shard_summer here that fans out to each remote. Say you have one thousand pods and use their IDs as labels: you'll have over a thousand streams returned. However, if you sum by, say, the namespace, the streams are reduced.
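
A hedged example of that cardinality difference; the label names below are assumptions, not ones used in this PR.

// Illustrative only: a per-pod grouping keeps ~1000 series crossing the
// network, while a coarser grouping collapses the result before it is shipped.
const (
	perPodQuery       = `sum by (pod) (count_over_time({cluster="us"}[5m]))`
	perNamespaceQuery = `sum by (namespace) (count_over_time({cluster="us"}[5m]))`
)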

@liguozhong
Contributor Author

What the ingester returns to the querier is a time series result, and this amount of data is very small, so there will be no performance problems.

@liguozhong, that's because the ingester covers a small time range; if you query over days it's an issue. If I'm not mistaken, we want to use something like the shard_summer here that fans out to each remote. Say you have one thousand pods and use their IDs as labels: you'll have over a thousand streams returned. However, if you sum by, say, the namespace, the streams are reduced.

Sorry, I missed this review tip. I will try to make it right. Thanks; "over a thousand streams returned" is really dangerous.

# Conflicts:
#	pkg/logcli/client/client.go
#	pkg/logcli/client/file.go
#	pkg/logcli/query/query_test.go
#	pkg/querier/querier_test.go
# Conflicts:
#	pkg/logcli/client/client.go
@JStickler
Contributor

@liguozhong This PR is over a year old. Can we close it? Or are you still hoping to get this merged?

@paulojmdias

Any updates on this PR?

Labels: size/XXL, type/docs

Successfully merging this pull request may close these issues:

Feature request: Remote read API

8 participants