
The Helm chart 5.2.0 breaks existing installations using S3 as backend for block storage #7196

Closed
tobernguyen opened this issue Jan 23, 2024 · 2 comments

@tobernguyen

Describe the bug

The removal of the default backend: s3 for blocks_storage in the Mimir configuration breaks existing installations that use S3 as the backend. This is the offending commit: 20b578a

To fix this for my installation, I had to add the following to values.yaml:

mimir:
  structuredConfig:
    blocks_storage:
      backend: s3
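
The same override can also be applied from the command line instead of editing values.yaml. This is only a sketch: the release name mimir and the chart reference grafana/mimir-distributed are assumptions, not taken from this issue.

# apply the blocks_storage backend override on top of the existing values file
helm upgrade mimir grafana/mimir-distributed \
  -f values.yaml \
  --set mimir.structuredConfig.blocks_storage.backend=s3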

To Reproduce

Steps to reproduce the behavior:

  1. Upgrade the Helm chart from 5.1.x to 5.2.0 without setting mimir.structuredConfig.blocks_storage.backend, while using AWS S3 for block storage (an example upgrade command is sketched after the log below).
  2. The upgrade broke my installation; most components report that uploading blocks to long-term storage fails. For example, in the ingester:
ts=2024-01-23T19:20:21.940476241Z caller=shipper.go:162 level=error user=anonymous msg="uploading new block to long-term storage failed" block=01HMVVAR4WPFRCFWEPSE39MCGT err="upload chunks: upload file /data/tsdb/anonymous/01HMVVAR4WPFRCFWEPSE39MCGT/chunks/000001 as 01HMVVAR4WPFRCFWEPSE39MCGT/chunks/000001: mkdir /blocks: read-only file system"
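
For reference, the upgrade in step 1 was performed roughly like this; the release name mimir and the chart reference grafana/mimir-distributed are assumptions for illustration:

# refresh the chart repo and upgrade to the broken chart version
helm repo update
helm upgrade mimir grafana/mimir-distributed --version 5.2.0 -f values.yaml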

Expected behavior

It shouldn't break existing installations that use S3 (rather than MinIO). Alternatively, the breaking change should be called out so users know they need to set mimir.structuredConfig.blocks_storage.backend to s3.

Environment

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Additional Context

Full mimir block in values.yaml

mimir:
  structuredConfig:
    blocks_storage:
      backend: s3
      s3:
        endpoint: s3.us-east-1.amazonaws.com
        bucket_name: REDACTED
      bucket_store:
        chunks_cache:
          memcached:
            timeout: 2s
    ruler_storage:
      backend: s3
      s3:
        endpoint: s3.us-east-1.amazonaws.com
        bucket_name: REDACTED
    alertmanager_storage:
      backend: s3
      s3:
        endpoint: s3.us-east-1.amazonaws.com
        bucket_name: REDACTED
    server:
      http_server_write_timeout: 30m
      http_server_read_timeout: 30m
      http_server_idle_timeout: 30m
    limits:
      accept_ha_samples: true
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: etcd
          etcd:
            endpoints:
              - mimir-etcd:2379
    frontend:
      max_outstanding_per_tenant: 2048
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048
@narqo added the helm label Jan 24, 2024
@dimitarvdimitrov (Contributor)

Thanks for reporting. I only now realized the default for backend is filesystem, not s3. I'll kick off a 5.2.1 release with a fix.

@dimitarvdimitrov (Contributor)

5.2.1 was just released reverting the offending commit. Sorry for the inconvenience.

This shouldn't cause any permanent data loss, since the blocks remain on the ingesters' disks until they are successfully shipped.

There could be temporary data loss as blocks move out of the time range for which queriers query the ingesters. This should resolve once the ingesters upload the blocks to S3 and the store-gateways load them (after installing 5.2.1 or updating the Mimir config).
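
Picking up the revert should just be a matter of upgrading to the fixed chart version. A minimal sketch, again assuming the release is named mimir and the chart is grafana/mimir-distributed:

# upgrade to the chart release containing the revert
helm repo update
helm upgrade mimir grafana/mimir-distributed --version 5.2.1 -f values.yaml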
