Operations

This page lists operational aspects of running Loki in alphabetical order.

Authentication

Loki does not have an authentication layer. You are expected to run an authenticating reverse proxy in front of your services, such as Nginx with basic auth or an OAuth2 proxy. See client options for more details about supported authentication methods.

Multi-tenancy

Loki is a multitenant system; requests and data for tenant A are isolated from tenant B. Requests to the Loki API should include an HTTP header (X-Scope-OrgID) identifying the tenant for the request. Tenant IDs can be any alphanumeric string; limiting them to 20 bytes is reasonable. To run in multitenant mode, Loki should be started with auth_enabled: true.

Loki can also be run in "single-tenant" mode, where the X-Scope-OrgID header is not required. In this mode, the tenant ID defaults to fake.
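As a minimal sketch, the setting mentioned above sits at the top level of the Loki configuration file; the rest of the file is omitted here:

```yaml
# Enable multi-tenant mode: API requests are then expected to carry
# an X-Scope-OrgID header identifying the tenant.
auth_enabled: true
```

Setting the value to false instead gives the single-tenant behaviour, with fake as the tenant ID.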

Observability

Metrics

Both Loki and promtail expose a /metrics endpoint for Prometheus metrics. You need a local Prometheus with Loki and promtail added as scrape targets; see Prometheus configuration. Once Prometheus can scrape Loki and promtail, you get the following metrics:

Loki metrics:

  • log_messages_total Total number of log messages.
  • loki_distributor_bytes_received_total The total number of uncompressed bytes received per tenant.
  • loki_distributor_lines_received_total The total number of lines received per tenant.
  • loki_ingester_streams_created_total The total number of streams created per tenant.
  • loki_request_duration_seconds_count Number of received HTTP requests.

Promtail metrics:

  • promtail_read_bytes_total Number of bytes read.
  • promtail_read_lines_total Number of lines read.
  • promtail_request_duration_seconds_count Number of send requests.
  • promtail_sent_bytes_total Number of bytes sent.

Most of these metrics are counters and should continuously increase during normal operations:

  1. Your app emits a log line to a file tracked by promtail.
  2. Promtail reads the new line and increases its counters.
  3. Promtail forwards the line to a Loki distributor, whose received counters should increase.
  4. The Loki distributor forwards it to a Loki ingester, whose request duration counter increases.

You can import the dashboard with ID 10004 to view these metrics in the Grafana UI.
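To add Loki and promtail as Prometheus targets, a scrape configuration along the following lines could be used. This is a sketch rather than a definitive setup: the job names are illustrative, and the host and ports are assumptions (3100 and 9080 are the HTTP ports used in Loki's example configs; adjust them to your deployment).

```yaml
# prometheus.yml fragment; Prometheus scrapes the /metrics path by default.
scrape_configs:
  - job_name: loki            # illustrative job name
    static_configs:
      - targets: ['localhost:3100']   # assumed Loki HTTP address
  - job_name: promtail        # illustrative job name
    static_configs:
      - targets: ['localhost:9080']   # assumed promtail HTTP address
```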

Monitoring Mixins

Check out our Loki mixin for a set of dashboards, recording rules, and alerts. These give you a comprehensive package on how to monitor Loki in production.

For more information about mixins, take a look at the mixins project docs.

Retention/Deleting old data

Retention in Loki is configured through the Table Manager. Set a retention period and enable deletes for retention, either in the YAML config as seen here or with the table-manager.retention-period and table-manager.retention-deletes-enabled command line arguments. The retention period must be a duration string that can be parsed as a Go time.Duration.

For chunk retention when using S3 or GCS, you need to set the expiry policy on the bucket that is configured for storing chunks. For more details, check this for S3 and this for GCS.

Currently, we support only a global retention policy. A per-user retention policy and an API to delete ingested logs are still under development. Feel free to add your use case to this GitHub issue.

A design goal of Loki is that storing logs should be cheap, hence a volume-based deletion API was deprioritized.

Until this feature is released, if you urgently need to delete ingested logs, you can delete old chunks in your object store. Note that this deletes only the log content and leaves the label index intact: you will still see the related labels, but retrieving the deleted log content will no longer work.
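A minimal sketch of the YAML form of these two settings follows. The 336h value is only an example; it is chosen as a multiple of the 168h index period used elsewhere on this page, on the assumption that this keeps whole tables aligned with the retention window.

```yaml
table_manager:
  # Allow the table manager to delete tables that fall outside the retention period.
  retention_deletes_enabled: true
  # Go duration string; 336h is 14 days (example value, not a recommendation).
  retention_period: 336h
```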

Scalability

See this blog post for a discussion of Loki's scalability.

When scaling Loki, consider running several Loki processes with their respective roles of ingester, distributor, and querier. Take a look at their respective .libsonnet files in our production setup to get an idea about resource usage.

We're happy to get feedback about your resource usage.

Storage

Loki needs two stores: an index store and a chunk store. Loki receives logs in separate streams, each identified by a set of labels. As log entries from a stream arrive, they are gzipped into chunks and saved in the chunk store. The index then stores each stream's label set and links it to the chunks. For details of the chunk format, refer to the doc.

Local storage

By default, Loki stores everything on disk. The index is stored in a BoltDB under /tmp/loki/index. The chunks are stored under /tmp/loki/chunks.
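For illustration, a local setup with the paths above might be expressed as follows. This sketch mirrors the structure of the other examples on this page; the from date, schema version, and index prefix are assumptions carried over for consistency, not required values.

```yaml
schema_config:
  configs:
  - from: 2018-04-15          # assumed start date, as in the Cassandra example
    store: boltdb             # index kept in BoltDB
    object_store: filesystem  # chunks kept on local disk
    schema: v9
    index:
      prefix: index_          # illustrative prefix
      period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks
```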

Google Cloud Storage

Loki has support for Google Cloud Storage. Take a look at our production setup for the relevant configuration fields.
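As a rough sketch of the chunk store side only, the storage section might look like the fragment below. The bucket name is hypothetical, and the matching schema_config entry would set object_store: gcs; the index store is not shown, so consult the production setup for a complete configuration.

```yaml
storage_config:
  gcs:
    bucket_name: my-loki-chunks   # hypothetical bucket name
```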

Cassandra

Loki can use Cassandra for the index storage. Please pull the latest Loki docker image or build from latest source code. Example config for using Cassandra:

schema_config:
  configs:
  - from: 2018-04-15
    store: cassandra
    object_store: filesystem
    schema: v9
    index:
      prefix: cassandra_table
      period: 168h

storage_config:
  cassandra:
    username: cassandra
    password: cassandra
    addresses: 127.0.0.1
    auth: true
    keyspace: lokiindex

  filesystem:
    directory: /tmp/loki/chunks

AWS S3 & DynamoDB

Example config for using S3 & DynamoDB:

schema_config:
  configs:
    - from: 0
      store: dynamo
      object_store: s3
      schema: v9
      index:
        prefix: dynamodb_table_name
        period: 0
storage_config:
  aws:
    s3: s3://access_key:secret_access_key@region/bucket_name
    dynamodbconfig:
      dynamodb: dynamodb://access_key:secret_access_key@region

You can also use an EC2 instance role instead of hard-coding credentials as in the example above. In that case, the storage_config looks like this:

storage_config:
  aws:
    s3: s3://region/bucket_name
    dynamodbconfig:
      dynamodb: dynamodb://region

S3

Loki uses S3 as object storage. It stores logs in directories based on OrgID. For example, logs from org faker are stored in s3://BUCKET_NAME/faker/.

The S3 configuration is set up with a URL of the form s3://access_key:secret_access_key@region/bucket_name.

For a custom S3 endpoint (such as Ceph Object Storage with an S3-compatible API) that uses path-style URLs rather than virtual-hosted bucket addressing, set the config as below:

storage_config:
  aws:
    s3: s3://access_key:secret_access_key@custom_endpoint/bucket_name
    s3forcepathstyle: true

DynamoDB

Loki uses DynamoDB for the index storage. The index is used when querying logs, so make sure you adjust your throughput to your usage.

DynamoDB access is very similar to S3; however, you do not need to specify a table name in the storage section, as Loki calculates it for you. You need to set the table name prefix in the schema config section and ensure the index.prefix table exists.

You can set up DynamoDB yourself or have the table-manager set it up for you. You can find out more about the table manager at the Cortex project. There is an example table manager deployment inside the ksonnet deployment method; you can find it here. The table-manager allows deleting old indices by rotating a number of different DynamoDB tables and deleting the oldest one. If you choose to create the table manually, you cannot easily erase old data and your index grows indefinitely.

If you create your DynamoDB table manually, set the primary index key to h (string) and use r (binary) as the sort key. Also set the "period" attribute in the YAML to zero. Make sure to adjust your throughput based on your usage.

The table manager's DynamoDB client defaults provisioned capacity to 300 read units and 3000 write units. If you wish to override these defaults, the config section should include:
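One way to create such a table by hand is an AWS CloudFormation template; this is a swapped-in illustration, not something Loki requires or ships. The resource name and the throughput values are assumptions, and the table name reuses the dynamodb_table_name prefix from the example above on the understanding that, with period set to zero, the prefix is used as the table name.

```yaml
Resources:
  LokiIndexTable:                      # hypothetical resource name
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: dynamodb_table_name   # must match index.prefix in schema_config
      AttributeDefinitions:
        - AttributeName: h
          AttributeType: S             # string hash key
        - AttributeName: r
          AttributeType: B             # binary sort key
      KeySchema:
        - AttributeName: h
          KeyType: HASH
        - AttributeName: r
          KeyType: RANGE
      ProvisionedThroughput:           # example values; size these to your usage
        ReadCapacityUnits: 10
        WriteCapacityUnits: 10
```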

table_manager:
  index_tables_provisioning:
    provisioned_write_throughput: 10
    provisioned_read_throughput: 10
  chunk_tables_provisioning:
    provisioned_write_throughput: 10
    provisioned_read_throughput: 10