
[new feature] Index: Loki support for Apache Calcite Avatica index storage #5692

Closed
wants to merge 38 commits

Conversation

@liguozhong (Contributor) commented Mar 22, 2022

What this PR does / why we need it:
There is no Google Bigtable, AWS DynamoDB, or hosted Cassandra service in China.
Operating Loki therefore also means operating a distributed NoSQL database, which puts a lot of pressure on our Loki operations team.
We hoped to find a managed, hosted NoSQL service available in China.

There is a product on Alibaba Cloud, Lindorm (https://lindorm.console.aliyun.com/), that can solve this problem.

Lindorm's Go client is Apache Calcite Avatica (https://calcite.apache.org/avatica/), so this PR also applies to other servers that speak the Avatica protocol, such as Kylin and Phoenix.
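
For reference, the Avatica Go driver registers itself with database/sql, so an index client can open a connection roughly like this (a minimal sketch; the DSN is illustrative and credentials are omitted):

package main

import (
	"database/sql"
	"log"

	// Importing the driver registers it under the name "avatica".
	_ "github.com/apache/calcite-avatica-go/v5"
)

func main() {
	// Hypothetical Avatica HTTP endpoint, e.g. a Lindorm proxy address.
	db, err := sql.Open("avatica", "http://ld-xxx-proxy-lindorm.lindorm.rds.aliyuncs.com:30060")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Verify connectivity before wiring the pool into the index client.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}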

This PR attempts to provide an index storage option that lets Loki run at large scale in China.

Our cluster config:

schema_config:
  configs:
    - from: "2020-05-15"
      index:
        period: 168h
        prefix: loki_avatica_table2_index_
      object_store: s3
      schema: v11
      store: avatica
storage_config:
  max_chunk_batch_size: 500
  avatica:
    username: root
    password: xxxx
    addresses: http://ld-xxx-proxy-lindorm.lindorm.rds.aliyuncs.com:30060
    database: indexlokiavatica

Which issue(s) this PR fixes:
Fixes #5667

Special notes for your reviewer:
The tests have all passed in my dev environment.

[screenshot: Prometheus monitor snapshot, write status code == 200]

https://github.com/dlmiddlecote/sqlstats
https://github.com/prometheus/client_golang/blob/main/prometheus/collectors/dbstats_collector.go

SQL stats metrics:

| Name | Description | Labels | Go Version |
| --- | --- | --- | --- |
| go_sql_stats_connections_max_open | Maximum number of open connections to the database. | db_name | 1.11+ |
| go_sql_stats_connections_open | The number of established connections, both in use and idle. | db_name | 1.11+ |
| go_sql_stats_connections_in_use | The number of connections currently in use. | db_name | 1.11+ |
| go_sql_stats_connections_idle | The number of idle connections. | db_name | 1.11+ |
| go_sql_stats_connections_waited_for | The total number of connections waited for. | db_name | 1.11+ |
| go_sql_stats_connections_blocked_seconds | The total time blocked waiting for a new connection. | db_name | 1.11+ |
| go_sql_stats_connections_closed_max_idle | The total number of connections closed due to SetMaxIdleConns. | db_name | 1.11+ |
| go_sql_stats_connections_closed_max_lifetime | The total number of connections closed due to SetConnMaxLifetime. | db_name | 1.11+ |
| go_sql_stats_connections_closed_max_idle_time | The total number of connections closed due to SetConnMaxIdleTime. | db_name | 1.15+ |
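
For context, these metrics come from handing the connection pool's sql.DBStats to a Prometheus collector; a minimal sketch of wiring it up (the database name used for the db_name label is illustrative):

package main

import (
	"database/sql"

	"github.com/dlmiddlecote/sqlstats"
	"github.com/prometheus/client_golang/prometheus"
)

// registerPoolMetrics exposes the go_sql_stats_connections_* series
// for db, labeled with the given db_name value.
func registerPoolMetrics(db *sql.DB, name string) {
	prometheus.MustRegister(sqlstats.NewStatsCollector(name, db))
}

The linked client_golang file provides an equivalent built-in, collectors.NewDBStatsCollector(db, name), if you prefer not to add the sqlstats dependency.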

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

liguozhong requested a review from a team as a code owner March 22, 2022 07:52
@sandeepsukhani (Contributor)

Thanks for the PR @liguozhong! Have you considered using boltdb-shipper for index storage?

@liguozhong (Contributor, Author) commented Mar 22, 2022

As far as I know, boltdb-shipper can only be deployed as a single Loki instance, but we have deployed a very large distributed Loki.

@sandeepsukhani (Contributor)

Sorry, not sure what gave you the impression that boltdb-shipper can only be used with a single Loki. Actually, boltdb-shipper is meant to run Loki in distributed mode at large scale without having to use expensive and complicated NoSQL stores. We have been running all our Loki deployments with boltdb-shipper for more than a year now.

@liguozhong (Contributor, Author)

boltdb-shipper

Hi @sandeepsukhani, thanks a lot for your suggestion; it taught me something new.

I read the documentation carefully (https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/). The implementation of boltdb-shipper is really impressive, and the idea is a perfect replacement for Cassandra. I really want to use this new boltdb-shipper mode to operate Loki; it will be a very interesting challenge.

But I am a little worried. At present our Loki runs in Kubernetes, and neither the ingester nor the querier uses a persistent volume; they consume only memory and CPU. We do this because we regard a disk as a database, and databases are hard: handling disk failure, expanding when disk space runs out, migrating the data on a disk, mmap, and so on. Some of the hardest problems in computer science live on the disk.

Our team is a monitoring team, not a database team, so our development skills are mainly in monitoring. Because of the concerns above, we chose hosted Cassandra and hosted S3 to hand the disk and database problems to a professional team, which has given our Loki higher availability.

We still hope to keep persistent storage in a database like Lindorm / Cassandra / Bigtable / DynamoDB.

That said, we will try boltdb-shipper in some small Loki clusters, and once we have built up the skills we will try to move all of our production Loki deployments to it. I think this will be difficult, since it requires database-team skills, but we will try.

@sandeepsukhani (Contributor)

I think you should give it a try, since adding disks nowadays is very easy, with most platforms providing Kubernetes as a service.
If you are using jsonnet for deployment, you can even configure the size of the PV.

@liguozhong (Contributor, Author)

OK, I will try using boltdb-shipper to store index data in a dev cluster soon.

liguozhong closed this Mar 30, 2022
@liguozhong (Contributor, Author) commented Mar 30, 2022

I think you should give it a try, since adding disks nowadays is very easy, with most platforms providing Kubernetes as a service. If you are using jsonnet for deployment, you can even configure the size of the PV.

@sandeepsukhani hi, I need your help.
I see that boltdb-shipper's index data and the chunks live in the same bucket.
Can /index/ and chunks be separated into two different buckets? The ruler component already uses an independent S3 bucket.
The performance of our Chinese S3 storage is not stable when everything shares one bucket.
Thanks for your help.

Ruler YAML:

ruler:
  storage:
    type: s3
    s3:
      s3: s3://s3.aliyuncs.com:9053/loki-rule
      s3forcepathstyle: true

Shipper YAML:

storage_config:
  aws:
    bucketnames: loki-logs
    endpoint: s3.com
  boltdb_shipper:
    active_index_directory: /loki/index
    shared_store: s3
    cache_location: /loki/boltdb-cache
compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: aws

liguozhong reopened this Mar 30, 2022
panic: runtime error: index out of range [0] with length 0

goroutine 305779 [running]:
github.com/apache/calcite-avatica-go/v5.(*rows).Columns(0xc08910fa5eb1c1b8)
	/Users/fuling/go/pkg/mod/github.com/apache/calcite-avatica-go/v5@v5.0.0/rows.go:53 +0xfe
database/sql.(*Rows).nextLocked(0xc06b0a4080)
	/usr/local/go/src/database/sql/sql.go:2964 +0xb0
@owen-d (Member) commented Apr 4, 2022

I think it's unlikely we'll accept another index client (sorry!) because we're trying to deprecate usage of alternative index stores and standardize on using object-storage for the index via boltdb-shipper (or TSDB in the future).

As for your question about independent buckets for the index, it's not currently a supported option like it is for the ruler.

@liguozhong (Contributor, Author)

I think it's unlikely we'll accept another index client (sorry!) because we're trying to deprecate usage of alternative index stores and standardize on using object-storage for the index via boltdb-shipper (or TSDB in the future).

As for your question about independent buckets for the index, it's not currently a supported option like it is for the ruler.

OK. I had to do this work because hosted Cassandra in China went offline.
I'm already trying to use boltdb-shipper to replace the original Cassandra.
Will the Loki community consider splitting chunks and index into two separate buckets in the future?

@liguozhong (Contributor, Author) commented Apr 8, 2022

I think it's unlikely we'll accept another index client (sorry!) because we're trying to deprecate usage of alternative index stores and standardize on using object-storage for the index via boltdb-shipper (or TSDB in the future).

As for your question about independent buckets for the index, it's not currently a supported option like it is for the ruler.

@sandeepsukhani The write performance of boltdb-shipper is too poor for us: where Cassandra needs 3ms per write, boltdb needs 300ms.
At present the slow writes cause the S3 write queue to back up, which eventually causes the ingester to OOM.
Based on my judgment so far, boltdb is not usable in our production environment.
@owen-d, hi, is the TSDB engine (#5428) you are working on meant to solve boltdb's low write performance?

Since our Cassandra is going offline, should I wait for the TSDB engine to be finished, or continue with this PR for a NoSQL index client usable in China?

[screenshot: boltdb write duration]
[screenshot: cassandra write duration]

@sandeepsukhani (Contributor)

@liguozhong thanks for giving it a try and getting back with your findings!
What is your QPS on index writes? Are you using a node disk or a PV? If a PV, is it SSD-backed?

@liguozhong (Contributor, Author)

@liguozhong thanks for giving it a try and getting back with your findings! What is your QPS on index writes? Are you using a node disk or a PV? If a PV, is it SSD-backed?

I have just started reading the boltdb code, looking for lock-related code and checking whether any blocks can be optimized.
This PR is a result of what I saw during that reading.
I'm still trying to find optimization points for writes.
Switching from Cassandra to boltdb-shipper is a huge change for us, and we want our team to be more familiar with the boltdb-shipper code before we make the switch.

@liguozhong (Contributor, Author)

@liguozhong thanks for giving it a try and getting back with your findings! What is your QPS on index writes? Are you using a node disk or a PV? If a PV, is it SSD-backed?

Our Cassandra does qps=7000 with duration=3ms.
With boltdb-shipper it is qps=1000 with duration=300ms.

We are using a 50GB SSD.

@sandeepsukhani (Contributor) commented Apr 8, 2022

Our Cassandra does qps=7000 with duration=3ms. With boltdb-shipper it is qps=1000 with duration=300ms.

boltdb-shipper actually measures batch writes, while Cassandra measures individual index entry writes.
boltdb-shipper also does many more index writes than Cassandra because we disable index deduplication: ingesters write the index locally before uploading the index files to object storage, so deduplicating would risk losing index data if we lose an ingester.

We are using a 50GB SSD

Not sure you need that much; you can check the disk usage.

I would like to point out that there are tradeoffs in both systems: you either pay for running Cassandra, or you throw a little more resources at your Loki cluster when using boltdb-shipper, since it now has to do more work to manage the index as well.
We run all our clusters with boltdb-shipper, which will soon be replaced by TSDB. We will continue to invest our time in improving object-storage-only Loki, since it is cheaper and easier to run.

@liguozhong (Contributor, Author)

We are using a 50GB SSD

Not sure you need that much; you can check the disk usage.

On Alibaba Cloud, a PV of the SSD type must be at least 50GB; otherwise the error "InvalidDiskSize.NotSupported" is reported.
But a disk is relatively cheap compared to buying a hosted Cassandra cluster.

ErrorCode: InvalidDiskSize.NotSupported
Recommend: https://troubleshoot.api.aliyun.com?q=InvalidDiskSize.NotSupported&product=Ecs
RequestId: 9A5F223A-BB9C-5BD4-88BB-5C5AC0C51B57
Message: disk size is not supported.
recommand: Please adjust the size of the cloud disk;
reference:https://help.aliyun.com/document_detail/25412.html

@splitice (Contributor)

Splitting S3 across multiple buckets might help you with performance. It's currently possible with comma-separated buckets in the config; see the sketch below.

Migration is difficult, however, as it's not done at the schema level.
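
For illustration, the comma-separated form looks roughly like this (bucket names and endpoint are hypothetical); Loki then distributes objects across the listed buckets:

storage_config:
  aws:
    # Hypothetical bucket names; Loki spreads chunks over the list.
    bucketnames: loki-chunks-0,loki-chunks-1,loki-chunks-2
    endpoint: s3.aliyuncs.com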

stale bot commented Jul 10, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

stale bot added the stale label Jul 10, 2022
stale bot closed this Aug 13, 2022
Labels: size/XXL, stale