[new feature] Index: Loki support for Apache Calcite Avatica index storage. #5692
Conversation
Thanks for the PR @liguozhong! Have you considered using boltdb-shipper?
Sorry, not sure what gave you an impression that …
hi @sandeepsukhani, thanks a lot for your suggestion. This advice taught me something new. I read the document carefully (https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/); the implementation of boltdb-shipper is really amazing, the idea is very good, a perfect replacement for Cassandra. I really want to operate Loki in this new boltdb-shipper mode; it will be a very interesting challenge. But I am a little worried. At present our Loki runs in k8s, and neither the ingester nor the querier uses a persistent volume; they only use memory and CPU. The reason we do this is that we think a disk is a database, and databases are hard: how to handle disk failure, how to expand when disk space runs out, how to migrate the data on the disk, mmap, and so on. The hardest problems in CS all live on the disk. Our team is a monitoring team, not a database team, so our development skills are mainly in monitoring. Because of these concerns, we chose hosted Cassandra and hosted S3 to hand the disk and database problems over to a professional team, which gives our Loki higher availability. We still hope to keep persistent storage in a database like Lindorm / Cassandra / Bigtable / DynamoDB, but we will try boltdb-shipper in some small Loki clusters. Once we have the ability, we will try to move all of our production Loki to boltdb-shipper. I think this will be difficult, since it needs the skills of a database team, but we will try.
I think you should give it a try, since adding disks nowadays is very easy, with most platforms providing k8s as a service.
Ok, I will try to use boltdb-shipper to store index data in a dev cluster soon |
@sandeepsukhani hi, I need your help.

ruler yaml:

```yaml
ruler:
  storage:
    type: s3
    s3:
      s3: s3://s3.aliyuncs.com:9053/loki-rule
      s3forcepathstyle: true
```

shipper yaml:

```yaml
storage_config:
  aws:
    bucketnames: loki-logs
    endpoint: s3.com
  boltdb_shipper:
    active_index_directory: /loki/index
    shared_store: s3
    cache_location: /loki/boltdb-cache

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: aws
```
```
panic: runtime error: index out of range [0] with length 0

goroutine 305779 [running]:
github.com/apache/calcite-avatica-go/v5.(*rows).Columns(0xc08910fa5eb1c1b8)
	/Users/fuling/go/pkg/mod/github.com/apache/calcite-avatica-go/v5@v5.0.0/rows.go:53 +0xfe
database/sql.(*Rows).nextLocked(0xc06b0a4080)
	/usr/local/go/src/database/sql/sql.go:2964 +0xb0
```
I think it's unlikely we'll accept another index client (sorry!) because we're trying to deprecate usage of alternative index stores and standardize on using object storage for the index via boltdb-shipper (or TSDB in the future). As for your question about independent buckets for the index, it's not currently a supported option like it is for the ruler.
Ok, I had to do this work because hosted Cassandra in China went offline.
@sandeepsukhani The write performance of boltdb-shipper is too poor for us: where Cassandra needs 3ms per write, boltdb needs 300ms. Our Cassandra is going offline soon; should I wait for the TSDB engine to land, or continue with this PR for a China-hosted NoSQL index client?
@liguozhong thanks for giving it a try and getting back with your findings!
I just started reading the boltdb code, trying to find the lock-related code and check whether any code blocks can be optimized.
Our Cassandra handles qps=7000 at duration=3ms. We are using a 50 GB SSD.
Not sure if you would need that much; you can check the disk usage. I would like to point out that there are tradeoffs in both systems: you either pay for running Cassandra, or you throw a little more resources at your Loki cluster when using boltdb-shipper.
On Alibaba Cloud, a cloud computing platform, an SSD-type PV needs to be at least 50 GB, otherwise the error `InvalidDiskSize.NotSupported` is reported (ErrorCode: InvalidDiskSize.NotSupported).
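For reference, a minimal PVC sketch that satisfies that 50 GiB minimum; the claim name and `storageClassName` are assumptions and may differ per cluster:

```yaml
# Hypothetical PVC for the boltdb-shipper index directory; the storage
# class name is an assumption, check the classes available in your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-index
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: alicloud-disk-ssd
  resources:
    requests:
      storage: 50Gi  # Alibaba Cloud SSD disks reject sizes below 50 GiB
```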
Splitting S3 across multiple buckets might help you with performance. It's currently possible with comma-separated bucket names in the config. Migration is difficult, however, as the split is not at the schema level.
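A minimal sketch of what that could look like in the storage config, building on the yaml shared above; the bucket names are illustrative, not from this thread:

```yaml
storage_config:
  aws:
    # Objects are distributed across the listed buckets; names are hypothetical.
    bucketnames: loki-logs-1,loki-logs-2,loki-logs-3
    endpoint: s3.com
```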
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry. We regularly sort for closed issues which have a `stale` label sorted by thumbs up.

We may also:

- Mark issues as `revivable` if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a `keepalive` label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.
What this PR does / why we need it:
There is no Google Bigtable, AWS DynamoDB, or hosted Cassandra service in China.
Operating Loki therefore also means operating a NoSQL distributed database, which puts a lot of pressure on our Loki operations team.
We wanted to find a hosted, managed NoSQL service in China.
There is a cloud product on Alibaba Cloud, Lindorm (https://lindorm.console.aliyun.com/), that solves this problem.
The Go client for Lindorm is Apache Calcite Avatica (https://calcite.apache.org/avatica/), so this PR also works with other servers speaking the Avatica protocol, such as Kylin and Phoenix.
This PR attempts to provide an index-storage option that lets Loki run at large scale in China.
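For context, a minimal sketch of how a Go client connects to an Avatica-compatible server with the calcite-avatica-go driver; the endpoint URL is an assumption, not taken from this PR:

```go
package main

import (
	"database/sql"
	"log"

	// The blank import registers the driver under the name "avatica".
	_ "github.com/apache/calcite-avatica-go/v5"
)

func main() {
	// Hypothetical endpoint; Lindorm, Phoenix Query Server, and Kylin all
	// speak the Avatica protocol over HTTP, so the same driver works.
	db, err := sql.Open("avatica", "http://localhost:8765")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	log.Println("connected to Avatica server")
}
```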
Our cluster config:
Which issue(s) this PR fixes:
Fixes #5667
Special notes for your reviewer:
The tests have all passed in my dev env.
Prometheus monitoring snapshot, write status code == 200:
https://github.com/dlmiddlecote/sqlstats
https://github.com/prometheus/client_golang/blob/main/prometheus/collectors/dbstats_collector.go
sql stats metrics
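A minimal sketch of how these SQL pool stats could be exported via client_golang's `DBStatsCollector` (linked above); the package name and the `avatica_index` db-name label are assumptions for illustration:

```go
package indexclient

import (
	"database/sql"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
)

// registerPoolStats exposes database/sql connection-pool statistics
// (open/idle/in-use connections, wait counts and durations) for the
// Avatica *sql.DB. The collector reads db.Stats() on every scrape.
func registerPoolStats(db *sql.DB, reg prometheus.Registerer) {
	reg.MustRegister(collectors.NewDBStatsCollector(db, "avatica_index"))
}
```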
Checklist
Add an entry in the CHANGELOG.md about the changes.