add compactor details and other boltdb-shipper doc improvments #2622

sandeepsukhani · 2020-09-14T12:50:37Z

What this PR does / why we need it:
Adds details about the compactor to the docs and updates some other details related to recent changes in BoltDB Shipper.

Checklist

Documentation added

achatterjee-grafana · 2020-09-14T12:59:59Z

docs/sources/operations/storage/boltdb-shipper.md

@@ -48,22 +48,26 @@ Loki can be configured to run as just a single vertically scaled instance or as
 When it comes to reads and writes, Ingesters are the ones which writes the index and chunks to stores and Queriers are the ones which reads index and chunks from the store for serving requests.

 Before we get into more details, it is important to understand how Loki manages index in stores. Loki shards index as per configured period which defaults to 7 days i.e when it comes to table based stores like Bigtable/Cassandra/DynamoDB there would be separate table per week containing index for that week.
-In case of BoltDB files there is no concept of tables, so it creates a BoltDB file per period(i.e day in case of boltdb-shipper store). Files/Tables created per day are identified by a configured `prefix_` + `<period-number-since-epoch>`.
+In case of BolDB Shipper a table is defined by a collection of many smaller BoltDB files, each file storing just 15 mins worth of index. Tables created per day are identified by a configured `prefix_` + `<period-number-since-epoch>`.


Needs an article and a comma:

In the case of BoltDB Shipper, a table is defined by a collection of many smaller BoltDB files, each file storing just 15 minutes' worth of index. Tables created per day are identified by a configured `prefix_` + `<period-number-since-epoch>`.
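To make the table/file relationship concrete, here is a minimal Python sketch. It assumes, for illustration only, that each small BoltDB file can be identified by the Unix timestamp at which its 15-minute window starts; `group_files_into_tables` is a hypothetical helper, not Loki code. Files are grouped into a daily table by integer-dividing that timestamp by the number of seconds in a day.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60
FILE_WINDOW_SECONDS = 15 * 60  # each small BoltDB file covers roughly 15 minutes of index


def group_files_into_tables(prefix: str, file_start_times: list[int]) -> dict[str, list[int]]:
    """Group file start timestamps (Unix seconds) by the daily table they belong to."""
    tables = defaultdict(list)
    for ts in sorted(file_start_times):
        day_number = ts // SECONDS_PER_DAY  # days since the Unix epoch, starting at 0
        tables[f"{prefix}{day_number}"].append(ts)
    return dict(tables)


# Three 15-minute files at the end of 2020-04-19 (UTC) and one at the start of 2020-04-20.
day_start = 18371 * SECONDS_PER_DAY  # 2020-04-19T00:00:00Z
files = [day_start + n * FILE_WINDOW_SECONDS for n in (93, 94, 95, 96)]
for table, members in group_files_into_tables("loki_index_", files).items():
    print(table, len(members))  # prints "loki_index_18371 3", then "loki_index_18372 1"
```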

codecov-commenter · 2020-09-14T13:02:20Z

Codecov Report

Merging #2622 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master    #2622      +/-   ##
==========================================
- Coverage   62.87%   62.85%   -0.02%     
==========================================
  Files         170      170              
  Lines       15049    15049              
==========================================
- Hits         9462     9459       -3     
- Misses       4826     4831       +5     
+ Partials      761      759       -2
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/logql/vector.go | `68.75% <0.00%> (-18.75%)` | ⬇️ |
| pkg/promtail/targets/file/filetarget.go | `66.28% <0.00%> (-0.58%)` | ⬇️ |
| pkg/logql/evaluator.go | `92.07% <0.00%> (-0.41%)` | ⬇️ |
| pkg/querier/queryrange/downstreamer.go | `97.93% <0.00%> (+2.06%)` | ⬆️ |
| pkg/promtail/targets/file/tailer.go | `75.00% <0.00%> (+4.16%)` | ⬆️ |

achatterjee-grafana · 2020-09-14T13:10:30Z

docs/sources/operations/storage/boltdb-shipper.md

 Here `<period-number-since-epoch>` in case of boltdb-shipper would be day number since epoch.
-For example, if you have prefix set to `loki_index_` and a write request comes in on 20th April 2020, it would be stored in table/file named `loki_index_18372` because it has been `18371` days since epoch, and we are in `18372`th day.
+For example, if you have prefix set to `loki_index_` and a write request comes in on 20th April 2020, it would be stored in table named `loki_index_18372` because it has been `18371` days since epoch, and we are in `18372`th day.


Copy-edit suggestion:

For example, if you have a prefix set to `loki_index_` and a write request comes in on 20th April 2020, it would be stored in a table named `loki_index_18372`, because 18372 full days have elapsed since the epoch by that date.
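The table-name arithmetic in the example can be checked with a few lines of Python. This sketch assumes timestamps are interpreted in UTC and that the period number is the Unix time in seconds integer-divided by the one-day period; `table_name` is a hypothetical helper for illustration. 20 April 2020 00:00 UTC is exactly 18372 × 86400 seconds after the epoch, which is why any write on that day lands in `loki_index_18372`.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60  # boltdb-shipper uses a one-day index period


def table_name(prefix: str, when: datetime) -> str:
    """Return prefix + day number since the Unix epoch for a timezone-aware datetime."""
    day_number = int(when.timestamp()) // SECONDS_PER_DAY
    return f"{prefix}{day_number}"


print(table_name("loki_index_", datetime(2020, 4, 20, 15, 30, tzinfo=timezone.utc)))  # loki_index_18372
```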

achatterjee-grafana · 2020-09-14T13:12:01Z

docs/sources/operations/storage/boltdb-shipper.md

 Since sharding of index creates multiple files when using BoltDB, BoltDB Shipper would create a folder per day and add files for that day in that folder and names those files after ingesters which created them.

+To reduce the size of files which helps with faster transfer speeds and reduced storage costs, they are stored after compressing them with gzip.


"help" not helps
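A minimal sketch of the compress-before-upload step, using Python's standard `gzip` module in place of Loki's own Go code. The helper names `compress_for_upload` and `decompress_after_download` are hypothetical, the `.gz` suffix follows the file names shown later in this thread, and the repeated dummy bytes only stand in for a real BoltDB file.

```python
import gzip


def compress_for_upload(name: str, db_bytes: bytes) -> tuple[str, bytes]:
    """Gzip a BoltDB file's contents and append a .gz suffix to its object name."""
    return f"{name}.gz", gzip.compress(db_bytes)


def decompress_after_download(name: str, payload: bytes) -> tuple[str, bytes]:
    """Reverse of compress_for_upload: strip the .gz suffix and gunzip the payload."""
    if not name.endswith(".gz"):
        raise ValueError(f"expected a .gz object, got {name!r}")
    return name[: -len(".gz")], gzip.decompress(payload)


# Index files contain many repeated keys, so repetitive dummy data stands in for one here.
original = b"fake-index-entry\x00" * 1000
obj_name, payload = compress_for_upload("ingester-0-1587254400", original)
print(obj_name, len(original), len(payload))
```

Compressing on the uploader side and decompressing on download keeps the object store and transfer sizes small while the local BoltDB files stay uncompressed for querying.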

achatterjee-grafana · 2020-09-14T14:06:59Z

docs/sources/operations/storage/boltdb-shipper.md

@@ -77,18 +81,21 @@ When running Loki in clustered mode there could be multiple ingesters serving wr
 *NOTE: To avoid any loss of index when Ingester crashes it is recommended to run Ingesters as statefulset(when using k8s) with a persistent storage for storing index files.*

 Another important detail to note is when chunks are flushed they are available for reads in object store instantly while index is not since we only upload them every 15 Minutes with BoltDB shipper.
-To avoid missing logs from queries which happen to be indexed in BoltDB files which are not shipped yet, while serving queries for in-memory logs, Ingesters would also do a store query for `now()` - (`max_chunk_age` + `30 Min`) to `<end-time-from-query-request>`.
+Ingesters expose a new RPC for letting Queriers query the Ingester's local index for chunks which were recently flushed but its index might not be available yet with Queriers.
+For all the queries which requires chunks to be read from the store, Queriers also query Ingesters over RPC for IDs of chunks which were recently flushed which is to avoid missing any logs from queries.


"require" not requires
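The read path described in this hunk can be sketched as merging two sources of chunk IDs: those found through the shipped index and those returned by Ingesters for recently flushed chunks whose index is not yet uploaded. In this hypothetical Python sketch, plain lists stand in for the RPC responses and the chunk IDs are made up; the de-duplication covers replicated Ingesters returning the same chunk.

```python
def chunk_ids_for_query(store_ids: list[str], ingester_responses: list[list[str]]) -> list[str]:
    """Merge chunk IDs from the shared store's index with IDs returned by ingesters.

    Recently flushed chunks may only be indexed in an ingester's local, not-yet-shipped
    BoltDB files, so they appear only in the ingester responses.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for ids in [store_ids, *ingester_responses]:
        for chunk_id in ids:
            if chunk_id not in seen:  # replicas can report the same chunk
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged


store = ["chunk-a", "chunk-b"]                     # visible through the shipped index
ingesters = [["chunk-b", "chunk-c"], ["chunk-c"]]  # recently flushed, index not shipped yet
print(chunk_ids_for_query(store, ingesters))       # ['chunk-a', 'chunk-b', 'chunk-c']
```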

achatterjee-grafana · 2020-09-14T14:07:57Z

docs/sources/operations/storage/boltdb-shipper.md


 ### Queriers

 Queriers lazily loads BoltDB files from shared object store to configured `cache_location`.
-When a querier receives a read request, query range from request is resolved to period numbers and all the files for those period numbers are downloaded to `cache_location` if not already.
+When a querier receives a read request, query range from request is resolved to period numbers and all the files for those period numbers are downloaded to `cache_location`, if not already.


Copy-edit:
When a querier receives a read request, the query range from the request is resolved to period numbers and all the files for those period numbers are downloaded to `cache_location`, if not already.
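The "resolve the query range to period numbers" step can be sketched as follows, assuming one-day periods and an inclusive `[start, end]` range in Unix seconds. The function names and the `cached` set (standing in for what is already in `cache_location`) are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def period_numbers(start_ts: int, end_ts: int) -> list[int]:
    """Day numbers (since the Unix epoch) touched by the inclusive range [start_ts, end_ts]."""
    if end_ts < start_ts:
        raise ValueError("end must not be before start")
    return list(range(start_ts // SECONDS_PER_DAY, end_ts // SECONDS_PER_DAY + 1))


def tables_to_download(prefix: str, start_ts: int, end_ts: int, cached: set[str]) -> list[str]:
    """Tables needed for the query that are not yet present in the local cache."""
    needed = [f"{prefix}{p}" for p in period_numbers(start_ts, end_ts)]
    return [t for t in needed if t not in cached]


# A query from 2020-04-19T22:00Z to 2020-04-20T02:00Z spans two daily tables.
start = 18371 * SECONDS_PER_DAY + 22 * 3600
end = 18372 * SECONDS_PER_DAY + 2 * 3600
print(tables_to_download("loki_index_", start, end, cached={"loki_index_18371"}))  # ['loki_index_18372']
```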

achatterjee-grafana · 2020-09-14T14:09:38Z

docs/sources/operations/storage/boltdb-shipper.md

 Once we have downloaded files for a period we keep looking for updates in shared object store and download them every 15 Minutes by default.
 Frequency for checking updates can be configured with `resync_interval` config.

 To avoid keeping downloaded index files forever there is a ttl for them which defaults to 24 hours, which means if index files for a period are not used for 24 hours they would be removed from cache location.
 ttl can be configured using `cache_ttl` config.

+*NOTE: For better read performance and to avoid using node disk it is recommended to run Queriers as statefulset(when using k8s) with a persistent storage for downloading and querying index files.*


Remove "a" from "a persistent storage..."
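The `cache_ttl` behaviour described in this hunk amounts to evicting tables that have not been read for longer than the TTL. Below is a hypothetical Python model of that rule only; it ignores `resync_interval` and the actual file handling, and the class and method names are not from Loki.

```python
from dataclasses import dataclass, field


@dataclass
class IndexCache:
    """Tracks when each downloaded table was last used and evicts idle ones."""

    ttl_seconds: int = 24 * 60 * 60  # mirrors the documented cache_ttl default of 24 hours
    last_used: dict[str, int] = field(default_factory=dict)

    def touch(self, table: str, now: int) -> None:
        """Record that a query read from this table at time `now` (Unix seconds)."""
        self.last_used[table] = now

    def evict_expired(self, now: int) -> list[str]:
        """Drop tables not used for longer than the TTL and return their names."""
        expired = [t for t, ts in self.last_used.items() if now - ts > self.ttl_seconds]
        for t in expired:
            del self.last_used[t]
        return expired


cache = IndexCache()
cache.touch("loki_index_18371", now=0)
cache.touch("loki_index_18372", now=20 * 3600)
print(cache.evict_expired(now=25 * 3600))  # ['loki_index_18371']
```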

achatterjee-grafana · 2020-09-14T14:10:42Z

docs/sources/operations/storage/boltdb-shipper.md

@@ -99,4 +106,25 @@ This problem would be faced even during rollouts which is quite common.
 To avoid this, Loki disables deduplication of index when the replication factor is greater than 1 and `boltdb-shipper` is an active or upcoming index type.
 While using `boltdb-shipper` please avoid configuring WriteDedupe cache since it is used purely for the index deduplication, so it would not be used anyways.

+### Compactor
+
+Compactor is a BoltDB Shipper specific service which reduces the index size by deduping the index and merging all the files to a single file per table.


Change to "that reduces ..."
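The dedupe rule quoted in this hunk is a simple condition. The hypothetical helper below encodes it in Python purely to make the condition explicit; it is not Loki's implementation, and `index_types` is assumed to list the index type of every active or upcoming schema period.

```python
def index_dedupe_enabled(replication_factor: int, index_types: list[str]) -> bool:
    """Whether write-path index deduplication stays on.

    Dedupe is disabled when the replication factor is greater than 1 and
    boltdb-shipper is an active or upcoming index type.
    """
    uses_shipper = "boltdb-shipper" in index_types
    return not (replication_factor > 1 and uses_shipper)


print(index_dedupe_enabled(3, ["bigtable", "boltdb-shipper"]))  # False
print(index_dedupe_enabled(1, ["boltdb-shipper"]))              # True
print(index_dedupe_enabled(3, ["bigtable"]))                    # True
```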

achatterjee-grafana · 2020-09-14T14:12:14Z

docs/sources/operations/storage/boltdb-shipper.md

+Compactor is a BoltDB Shipper specific service which reduces the index size by deduping the index and merging all the files to a single file per table.
+It is highly recommended running a Compactor since a single Ingester creates 96 files per day which includes a lot of duplicate index entries and querying multiple files per table adds up the overall query latency.
+
+*NOTE: There should be only 1 compactor instance running at a time which otherwise could create problems and may lead to data loss. *


Change to "that otherwise..."

achatterjee-grafana · 2020-09-14T14:14:43Z

docs/sources/operations/storage/boltdb-shipper.md

+### Compactor
+
+Compactor is a BoltDB Shipper specific service which reduces the index size by deduping the index and merging all the files to a single file per table.
+It is highly recommended running a Compactor since a single Ingester creates 96 files per day which includes a lot of duplicate index entries and querying multiple files per table adds up the overall query latency.


Copy-edit:
We recommend running a Compactor, since a single Ingester creates 96 files per day, which include a lot of duplicate index entries, and querying multiple files per table adds to the overall query latency.
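The 96 figure follows from one file per 15-minute window (24 × 60 / 15). The merge-and-dedupe idea behind the Compactor can be sketched in Python by modelling each file as a list of `(key, value)` index entries; real BoltDB buckets and entry formats differ, and `compact_table` is a hypothetical name.

```python
FILE_WINDOW_MINUTES = 15
FILES_PER_INGESTER_PER_DAY = 24 * 60 // FILE_WINDOW_MINUTES


def compact_table(files: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Merge the index entries of all files in one table into a single de-duplicated, sorted file."""
    merged: set[tuple[str, str]] = set()
    for entries in files:
        merged.update(entries)  # identical entries from replicated ingesters collapse here
    return sorted(merged)


# Two ingesters replicating the same stream write some of the same index entries.
ingester_0 = [("stream-1", "chunk-a"), ("stream-1", "chunk-b")]
ingester_1 = [("stream-1", "chunk-a"), ("stream-2", "chunk-c")]
print(FILES_PER_INGESTER_PER_DAY)               # 96
print(compact_table([ingester_0, ingester_1]))  # [('stream-1', 'chunk-a'), ('stream-1', 'chunk-b'), ('stream-2', 'chunk-c')]
```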

sandeepsukhani · 2020-09-15T06:04:17Z

@achatterjee-grafana Thanks for the review! I have taken care of all the suggested changes.

achatterjee-grafana

Thank you, Sandeep.

oddlittlebird

A few minor format suggestions.

oddlittlebird · 2020-09-15T16:27:37Z

docs/sources/operations/storage/boltdb-shipper.md

-        └── ingester-1
+        ├── ingester-0-1587254400.gz
+        └── ingester-1-1587254400.gz
+        ...
 ```
 *NOTE: We also add a timestamp to names of the files to randomize the names to avoid overwriting files when running Ingesters with same name and not have a persistent storage. Timestamps not shown here for simplification*


Suggested change

*NOTE: We also add a timestamp to names of the files to randomize the names to avoid overwriting files when running Ingesters with same name and not have a persistent storage. Timestamps not shown here for simplification*

**Note:** We also add a timestamp to the names of the files to randomize them, which avoids overwriting files when running Ingesters with the same name and without persistent storage. Timestamps are not shown here for simplicity.
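A small sketch of the naming scheme: one folder per daily table, with each file named after the Ingester plus a Unix timestamp and a `.gz` suffix. The exact key layout (`<table>/<ingester>-<timestamp>.gz`) is inferred from the directory tree in the diff above and is an assumption, as is the `object_key` helper.

```python
def object_key(table: str, ingester: str, created_at: int) -> str:
    """Build the object-store key for one uploaded index file (layout is an assumption)."""
    # The timestamp suffix keeps keys unique when an ingester restarts with the same
    # name and no persistent storage, so earlier uploads are not overwritten.
    return f"{table}/{ingester}-{created_at}.gz"


print(object_key("loki_index_18371", "ingester-0", 1587254400))  # loki_index_18371/ingester-0-1587254400.gz
```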

oddlittlebird · 2020-09-15T16:27:56Z

docs/sources/operations/storage/boltdb-shipper.md

@@ -77,18 +81,21 @@ When running Loki in clustered mode there could be multiple ingesters serving wr
 *NOTE: To avoid any loss of index when Ingester crashes it is recommended to run Ingesters as statefulset(when using k8s) with a persistent storage for storing index files.*


Suggested change

*NOTE: To avoid any loss of index when Ingester crashes it is recommended to run Ingesters as statefulset(when using k8s) with a persistent storage for storing index files.*

**Note:** To avoid any loss of index when an Ingester crashes, we recommend running Ingesters as a statefulset (when using k8s) with persistent storage for storing index files.

oddlittlebird · 2020-09-15T16:28:17Z

docs/sources/operations/storage/boltdb-shipper.md

 Once we have downloaded files for a period we keep looking for updates in shared object store and download them every 15 Minutes by default.
 Frequency for checking updates can be configured with `resync_interval` config.

 To avoid keeping downloaded index files forever there is a ttl for them which defaults to 24 hours, which means if index files for a period are not used for 24 hours they would be removed from cache location.
 ttl can be configured using `cache_ttl` config.

+*NOTE: For better read performance and to avoid using node disk it is recommended to run Queriers as statefulset(when using k8s) with persistent storage for downloading and querying index files.*


Suggested change

*NOTE: For better read performance and to avoid using node disk it is recommended to run Queriers as statefulset(when using k8s) with persistent storage for downloading and querying index files.*

**Note:** For better read performance and to avoid using node disk, we recommend running Queriers as a statefulset (when using k8s) with persistent storage for downloading and querying index files.

oddlittlebird · 2020-09-15T16:28:42Z

docs/sources/operations/storage/boltdb-shipper.md

+Compactor is a BoltDB Shipper specific service that reduces the index size by deduping the index and merging all the files to a single file per table.
+We recommend running a Compactor since a single Ingester creates 96 files per day which include a lot of duplicate index entries and querying multiple files per table adds up the overall query latency.
+
+*NOTE: There should be only 1 compactor instance running at a time that otherwise could create problems and may lead to data loss. *


Suggested change

*NOTE: There should be only 1 compactor instance running at a time that otherwise could create problems and may lead to data loss. *

**Note:** Run only one compactor instance at a time; running more than one could create problems and may lead to data loss.

oddlittlebird · 2020-09-15T16:29:01Z

docs/sources/operations/storage/boltdb-shipper.md

@@ -48,22 +48,26 @@ Loki can be configured to run as just a single vertically scaled instance or as
 When it comes to reads and writes, Ingesters are the ones which writes the index and chunks to stores and Queriers are the ones which reads index and chunks from the store for serving requests.

 Before we get into more details, it is important to understand how Loki manages index in stores. Loki shards index as per configured period which defaults to 7 days i.e when it comes to table based stores like Bigtable/Cassandra/DynamoDB there would be separate table per week containing index for that week.


Suggested change

Before we get into more details, it is important to understand how Loki manages index in stores. Loki shards index as per configured period which defaults to 7 days i.e when it comes to table based stores like Bigtable/Cassandra/DynamoDB there would be separate table per week containing index for that week.

Before we get into more details, it is important to understand how Loki manages index in stores. Loki shards index as per the configured period, which defaults to seven days; i.e., for table-based stores like Bigtable/Cassandra/DynamoDB, there would be a separate table per week containing the index for that week.

Write out numbers zero through ten unless it would cause confusion.
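The same period arithmetic covers both cases discussed in this thread: weekly tables for table-based stores (the seven-day default period) and daily tables for boltdb-shipper. This Python sketch assumes the period number is Unix seconds integer-divided by the period length; `period_number` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def period_number(unix_seconds: int, period_days: int) -> int:
    """Index of the fixed-length period (counted from the Unix epoch) containing a timestamp."""
    return unix_seconds // (period_days * SECONDS_PER_DAY)


ts = 1587340800  # 2020-04-20T00:00:00Z
print(period_number(ts, 7))  # 2624: weekly table with the seven-day default period
print(period_number(ts, 1))  # 18372: daily table, as used by boltdb-shipper
```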

add compactor details and other boltdb-shipper doc improvments

f7cfa6b

sandeepsukhani requested review from achatterjee-grafana and oddlittlebird as code owners September 14, 2020 12:50

pull-request-size bot added the size/M label Sep 14, 2020

achatterjee-grafana reviewed Sep 14, 2020


changes suggested from PR review

56df4bd

achatterjee-grafana approved these changes Sep 15, 2020


oddlittlebird approved these changes Sep 15, 2020


sandeepsukhani added 2 commits September 16, 2020 11:53

changes suggested from PR review

73de87c

fix broken logcli build

aba48a2

sandeepsukhani merged commit 65b539c into grafana:master Sep 16, 2020
