[new feature] add Remote read API #7324
base: main
Conversation
# Conflicts: # pkg/loki/modules.go
```
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
```

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

```
+ ingester           0%
+ distributor        0%
+ querier            0%
+ querier/queryrange 0%
+ iter               0%
+ storage            0%
+ chunkenc           0%
+ logql              0%
- loki            -0.4%
```
```
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
```

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

```
+ ingester           0%
+ distributor        0%
+ querier            0%
- querier/queryrange -0.1%
+ iter               0%
+ storage            0%
+ chunkenc           0%
+ logql              0%
- loki            -0.4%
```
# Conflicts: # pkg/loki/loki.go
We were planning a remote read API as well. I'll reject it just so we have time to review it, not because I think our solution would be better.
I'm super happy there's interest in it.
Is there an issue or something similar where this is discussed in more detail?
Nice work. I meant to introduce a new remote read API that would stream the results so we would only transfer the data we need.
pkg/querier/remote/read.go

```go
if !ok {
	return nil, errors.New("remote read Querier selectLogs fail, value cast (loghttp.Streams) fail")
}
return iter.NewStreamsIterator(streams.ToProto(), params.Direction), nil
```
I think we should inject the remote ID like we did with the tenant ID for multi-tenant queries.
Agree, you are right.
done.
Sorry. I cannot find it 🙈
Hi @jeschkies, as far as I know there is no detailed discussion about remote read, but I have prior experience with the remote read module in the Prometheus project. This PR is only meant to provide a template for the community to discuss; we can discuss it in this PR. I'm very sorry to submit this PR directly without discussing it in detail first. Trust me, I have no ill intentions. I just want to make Loki a more mature and complete logging solution faster.
@liguozhong no worries. I didn't mean to criticize. I was out for some time and thought I had missed some info. I'm super happy you took this initiative. I have a few ideas that I'll add here later.
I'm worried about performance of this at scale, which often coincides with users that want this feature. Right now, no aggregation is done remotely, meaning we're going to ship potentially a lot of data back to the host cluster on each query. This can quickly become prohibitively expensive (data egress costs).
Some other thoughts
- (mentioned above) How do we minimize bandwidth between clusters? Ideally we'd need to do more sophisticated query planning (aggregation push-down).
- What happens when the ideal shard factor should differ on different clusters during query parallelization?
I'm inclined to say we don't want this feature because, at scale, this will be both costly and slow :(
Agreed, introducing this feature will cause a lot of complaints.
@owen-d / @liguozhong This kind of remote-read federation is very much needed. For example, we have multiple compute/cloud service providers in multiple geo regions. I would rather ship data per query than ship all logs over our regional links. Most log output goes un-queried, so keeping the logs inside a regional or provider boundary is desirable. We would also rather have a circuit-breaker pattern, such that if a single geo region/provider is unavailable we can continue to have access to the other regions/providers. Without remote read / distributed query, Loki would be a SPoF in our observability architecture. Having a remote read / federated model is why we went with Thanos over Mimir.
Hi, I understand your situation very well. At present I also operate 22 Loki clusters. This needs to be discussed by more people.
# Conflicts: # docs/sources/configuration/_index.md # pkg/querier/querier.go
Having this as an option, and then improving on things like query pushdown, is probably the best incremental approach. Documenting the downsides like fan-out and lack of push-down is fine IMO. For example in Thanos, we had lots of issues with fan-out (we have 2000+ Prometheus instances), so we introduced a label enforcement proxy to make sure users selected which Prometheus instances they needed per query. This is slightly less intuitive, but a lot less cumbersome than maintaining 2000+ datasources. :)
I am very much looking forward to launching an alpha version so that more developers can participate in optimizing the code. Waiting for more eyes.
What do you think about blocking queries that do not perform well without remote aggregation?
Sorry, I missed your message.
As far as I know, when the querier queries the ingester, it executes this statement (the remote read API works the same way); the performance problems are mainly concentrated in S3 file download and `| json` calculation. Maybe my understanding is wrong.
@liguozhong, that's because the ingester covers a small time range. If you query over days it's an issue. If I'm not mistaken, we want to use something like the shard_summer here that fans out to each remote. Say you have one thousand pods and use their IDs as labels: you'll have over a thousand streams returning. However, if you sum by, say, the namespace, the streams are reduced.
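To illustrate the point with a hypothetical LogQL example (not from the PR — the `job`, `pod`, and `namespace` labels are assumptions):

```
# Grouping by pod keeps one series per pod: with a thousand pods,
# each remote cluster returns over a thousand streams.
sum by (pod) (rate({job="api"} |= "error" [5m]))

# Grouping by namespace collapses the result to a handful of series,
# so far less data crosses the cluster boundary per query.
sum by (namespace) (rate({job="api"} |= "error" [5m]))
```

If the remote side could evaluate the `sum by (namespace)` before shipping results (aggregation push-down), the bandwidth cost of federation would drop accordingly.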
Sorry, I missed this review tip. I will try to make it right. Thanks — "over a thousand streams returning" is really dangerous.
# Conflicts: # pkg/logcli/client/client.go # pkg/logcli/client/file.go # pkg/logcli/query/query_test.go # pkg/querier/querier_test.go
# Conflicts: # pkg/logcli/client/client.go
@liguozhong This PR is over a year old. Can we close it? Or are you still hoping to get this merged?
Any updates on this PR?
What this PR does / why we need it:
[new feature] add Remote read API.
Remote read reuses `pkg/logcli/client/client.go`; the Loki server does not add a new HTTP handler.
loki.yaml config demo
Prometheus remote read doc: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read
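For readers unfamiliar with the Prometheus feature being mirrored, a config of this shape might look roughly like the following. This is a hypothetical sketch modeled on Prometheus's `remote_read` block; the field names and placement are assumptions, not this PR's actual schema.

```yaml
# loki.yaml — hypothetical remote read configuration sketch
remote_read:
  - name: cluster-b                              # label for the remote Loki cluster (assumed field)
    url: http://loki-cluster-b.example.com:3100  # remote query endpoint (assumed field)
    timeout: 30s                                 # per-remote query timeout (assumed field)
```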
Which issue(s) this PR fixes:
Fixes #7306 and #1866
Special notes for your reviewer:
This PR is already quite large, so it contains only the implementation: the basic framework of remote read, the remote read configuration, and a test for the `SelectLogs` interface in read.go.
Checklist
- CONTRIBUTING.md guide
- CHANGELOG.md updated
- docs/sources/upgrading/_index.md