Error 500 empty ring when referring to loki #7614

Open
Amfiass opened this issue Nov 7, 2022 · 11 comments

@Amfiass

Amfiass commented Nov 7, 2022

Hello!

I ran into a problem while installing a Loki cluster.
The cluster consists of 5 nodes:

  1. Nginx proxy
  2. Loki reader x2
  3. Loki writer x2

All Loki nodes use the same configuration file, differing only in the target setting:

auth_enabled: false
target: "read"

memberlist:
  node_name: "loki-reader2.example.com"
  join_members:
    - "loki-reader1.example.com"
    - "loki-reader2.example.com"
    - "loki-writer1.example.com"
    - "loki-writer2.example.com"
  dead_node_reclaim_time: "30s"
  gossip_to_dead_nodes_time: "15s"
  left_ingesters_timeout: "30s"
  bind_addr: 
    - "0.0.0.0"
  bind_port: 7946

server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  http_server_read_timeout: "1h"
  http_server_write_timeout: "5m"
  http_server_idle_timeout: "5m"
  grpc_server_max_recv_msg_size: 41943040
  grpc_server_max_send_msg_size: 41943040
  log_level: debug

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist

querier:
  max_concurrent: 10
  query_ingesters_within: 5h
  query_timeout: 1h
  engine:
    timeout: 5m

schema_config:
  configs:
    - from: 2022-10-01
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h

compactor:
  retention_enabled: true
  retention_delete_delay: "1h"
  delete_request_cancel_period: "1h"
  working_directory: /loki/compactor
  shared_store: s3
  retention_delete_worker_count: 10
  compactor_ring:
    kvstore:
      store: memberlist

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index-cache
    shared_store: s3
  aws:
    bucketnames: loki
    endpoint: https://s3.example.com
    access_key_id: -----------------
    secret_access_key: ----------------------
    insecure: true
    http_config:
      idle_conn_timeout: 5m
      response_header_timeout: 0s
      insecure_skip_verify: true
    s3forcepathstyle: true

limits_config:
  retention_period: "4320h"
  per_stream_rate_limit: "500MB"
  per_stream_rate_limit_burst: "1000MB"
  max_concurrent_tail_requests: 10000
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 100
  max_entries_limit_per_query: 100000
  
ingester:
  lifecycler:
    join_after: "60s"
    observe_period: "5s"
    ring:
      replication_factor: 1
      kvstore:
        store: memberlist
    final_sleep: "0s"
  max_chunk_age: "240h"

All health checks pass and the service starts.
According to the logs, Loki was able to locate all the members of the ring.
But when trying to connect the cluster to Grafana, I get a 500 error:

Loki: Internal Server Error. 500. empty ring

In the service logs I see the following:

ts=2022-11-07T13:12:13.926196155Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.927323183Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=35.563µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.927996495Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=35.563µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.931994093Z caller=grpc_logging.go:46 method=/frontendv2pb.FrontendForQuerier/QueryResult duration=105.52µs msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.932669943Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.934128211Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.934462835Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=13.737µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.935224122Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=13.737µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.936167945Z caller=grpc_logging.go:46 duration=42.182µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.936833055Z caller=retry.go:73 org_id=fake msg="error processing request" try=1 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.938052577Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.938284912Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=45.012µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.938793954Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=45.012µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.939859299Z caller=grpc_logging.go:46 duration=44.651µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.940325981Z caller=retry.go:73 org_id=fake msg="error processing request" try=2 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.941421855Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.941910948Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=7.473µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.942403329Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=7.473µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.943278467Z caller=grpc_logging.go:46 method=/frontendv2pb.FrontendForQuerier/QueryResult duration=28.428µs msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.943797077Z caller=retry.go:73 org_id=fake msg="error processing request" try=3 err="rpc error: code = Code(500) desc = empty ring\n"
ts=2022-11-07T13:12:13.944847003Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-11-07T13:12:13.945103343Z caller=spanlogger.go:80 user=fake method=query.Label level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=8.999µs Summary.QueueTime=0s
ts=2022-11-07T13:12:13.945650128Z caller=spanlogger.go:80 user=fake method=query.Label level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=8.999µs status=500 label= throughput=0B total_bytes=0B total_entries=0
level=debug ts=2022-11-07T13:12:13.946855527Z caller=grpc_logging.go:46 duration=50.382µs method=/frontendv2pb.FrontendForQuerier/QueryResult msg="gRPC (success)"
level=error ts=2022-11-07T13:12:13.947119279Z caller=retry.go:73 org_id=fake msg="error processing request" try=4 err="rpc error: code = Code(500) desc = empty ring\n"
level=error ts=2022-11-07T13:12:13.947827961Z caller=stats.go:57 component=frontend msg="failed to record query metrics" err="expected one of the *LokiRequest, *LokiInstantRequest, *LokiSeriesRequest, *LokiLabelNamesRequest, got "

At the same time, a single-server installation of Loki works correctly: logs are written and read.

Please tell me where to dig; I cannot find the problem.

Also, I tried connecting the distributor to Loki; logging does not happen either. A 404 error is returned without any explanation:

level=debug ts=2022-11-07T11:35:11.566374631Z caller=logging.go:76 traceID=7dd739ced5088cbe orgID=fake msg="POST /loki/api/v1/push (404) 87.265µs"
level=debug ts=2022-11-07T11:35:11.653902837Z caller=logging.go:76 traceID=766c347a31371e33 orgID=fake msg="POST /loki/api/v1/push (404) 76.161µs"
level=debug ts=2022-11-07T11:35:11.906294651Z caller=logging.go:76 traceID=58fe0b96a10b3d6b orgID=fake msg="POST /loki/api/v1/push (404) 177.982µs"
level=debug ts=2022-11-07T11:35:11.922494924Z caller=logging.go:76 traceID=0f26b052f9416264 orgID=fake msg="POST /loki/api/v1/push (404) 88.356µs"

At the same time, the S3 storage is available; another Loki instance, located on a separate server, writes to it in parallel.
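
For reference, here is a minimal sketch of the memberlist section with an explicit advertise address, in case binding to 0.0.0.0 makes the nodes gossip a non-routable address. The IP below is a placeholder and this is only an untested variant, not the configuration currently in use:

memberlist:
  join_members:
    - "loki-reader1.example.com"
    - "loki-reader2.example.com"
    - "loki-writer1.example.com"
    - "loki-writer2.example.com"
  bind_addr:
    - "0.0.0.0"
  bind_port: 7946
  # Hypothetical: advertise an explicit, routable address to the other
  # members instead of relying on the bind address. Replace 192.0.2.10
  # with this node's real IP.
  advertise_addr: "192.0.2.10"
  advertise_port: 7946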

@andrewkwok2022

Hello @Amfiass, I have a similar issue (https://community.grafana.com/t/post-loki-api-v1-push-500-response-empty-ring-and-context-canceled/75544) and still have no idea how to fix it. Please let me know if you find a solution.

@Schmoho

Schmoho commented Dec 6, 2022

I am having the same issue. Any hint or help would be appreciated.

@Amfiass

Amfiass commented Dec 6, 2022

@Schmoho @andrewkwok2022 The problem is probably in the key-value store. A possible solution is to switch the store to consul or etcd. We solved the problem by deploying the Loki cluster in Kubernetes, since ring membership there is resolved through the Kubernetes API.
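
A minimal sketch of what the Consul variant of the common ring section could look like (the Consul address is an assumption; adjust it to your environment):

common:
  path_prefix: /loki
  replication_factor: 1
  ring:
    kvstore:
      store: consul
      consul:
        # Hypothetical address of a local Consul agent.
        host: "localhost:8500"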

@Schmoho

Schmoho commented Dec 6, 2022

I'm using the inmemory KV store, and any kind of cloud storage is not an option here.

@javier-torres

javier-torres commented Jan 31, 2023

Hello @Schmoho @andrewkwok2022

The official documentation says this:

In this mode the component microservices of Loki are bundled into two targets: -target=read and -target=write. The BoltDB compactor service will run as part of the read target.

You only need to define the compactor section on the read node. The config file must be different for -target=read and -target=write.

I had the same problem, and after changing this it works fine.

This is my basic config for the two options:

### -target=read

# Custom Config
auth_enabled: false

server:
  http_listen_port: 3100

memberlist:
  join_members:
    - grafana-loki-read.poc.internal:7946
    - grafana-loki-write.poc.internal:7946

compactor:
  working_directory: /loki/compactor
  shared_store: s3

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist


storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3
  aws:
    s3: s3://eu-west-1
    bucketnames: bucket-name

schema_config:
  configs:
    - from: 2020-07-01
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

### -target=write

# Custom Config
auth_enabled: false

server:
  http_listen_port: 3100

memberlist:
  join_members:
    - grafana-loki-read.poc.internal:7946
    - grafana-loki-write.poc.internal:7946

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist


storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3
  aws:
    s3: s3://eu-west-1
    bucketnames: bucket-name

schema_config:
  configs:
    - from: 2020-07-01
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

@sazima

sazima commented Sep 9, 2023

same issue

@remram44

Also have this issue. I am running Loki as a single container with common: ring: kvstore: store: inmemory

@remram44

Is there documentation for how this "ring" thing is configured? What is the difference between memberlist and inmemory?

A search for inmemory across the entire documentation returns about 2 hits. Judging by this issue, this seems like a critical component that should probably be documented somewhere?

@HuyDo-95g

I also have the same issue.

@sojjan1337

sojjan1337 commented Jan 12, 2024

I am running loki-distributed. Promtail says: caller=client.go:419 component=client host=loki.test.com msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): empty ring"

This error message comes up as soon as Loki is down. Why can't Promtail reconnect correctly when Loki comes back up? If I reinstall Promtail after Loki is up, I have no issues. Or maybe it isn't a problem at all, because I can retrieve logs as usual.
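
For what it is worth, the retry behaviour on the Promtail side is governed by the client backoff settings; here is a minimal sketch of that block, with the URL taken from the log line above and example values (not a verified fix):

clients:
  - url: http://loki.test.com/loki/api/v1/push
    backoff_config:
      min_period: 500ms   # initial delay between retries
      max_period: 5m      # upper bound on the retry delay
      max_retries: 10     # give up on a batch after this many attempts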

@Vinaum8

Vinaum8 commented Jun 20, 2024

same error here
