
The Helm chart 5.2.0 breaks existing installations using S3 as backend for block storage #7196

Closed
tobernguyen opened this issue Jan 23, 2024 · 2 comments

@tobernguyen

Describe the bug

The removal of the default backend: s3 for blocks_storage in the Mimir configuration breaks existing installations that use S3 as the backend. This is the offending commit: 20b578a

To fix this for my installation, I had to add the following to values.yaml:

mimir:
  structuredConfig:
    blocks_storage:
      backend: s3
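
The same override can also be applied from the command line instead of editing values.yaml. This is only a sketch: the release name mimir and the chart reference grafana/mimir-distributed are assumptions, not taken from this issue.

# apply the blocks_storage backend override on top of the existing values file
helm upgrade mimir grafana/mimir-distributed \
  -f values.yaml \
  --set mimir.structuredConfig.blocks_storage.backend=s3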

To Reproduce

Steps to reproduce the behavior:

  1. Upgrade the Helm chart from 5.1.x to 5.2.0 without setting mimir.structuredConfig.blocks_storage.backend, while using AWS S3 for block storage (an example upgrade command is sketched after the log below).
  2. The upgrade broke my installation; most components report that uploading blocks to long-term storage fails. For example, in the ingester:
ts=2024-01-23T19:20:21.940476241Z caller=shipper.go:162 level=error user=anonymous msg="uploading new block to long-term storage failed" block=01HMVVAR4WPFRCFWEPSE39MCGT err="upload chunks: upload file /data/tsdb/anonymous/01HMVVAR4WPFRCFWEPSE39MCGT/chunks/000001 as 01HMVVAR4WPFRCFWEPSE39MCGT/chunks/000001: mkdir /blocks: read-only file system"
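
For reference, the upgrade in step 1 was performed roughly like this; the release name mimir and the chart reference grafana/mimir-distributed are assumptions for illustration:

# refresh the chart repo and upgrade to the broken chart version
helm repo update
helm upgrade mimir grafana/mimir-distributed --version 5.2.0 -f values.yaml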

Expected behavior

It shouldn't break existing installations that use S3 (rather than MinIO). Alternatively, the breaking change should be called out so users know they need to set mimir.structuredConfig.blocks_storage.backend to s3.

Environment

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Additional Context

Full mimir block in values.yaml

mimir:
  structuredConfig:
    blocks_storage:
      backend: s3
      s3:
        endpoint: s3.us-east-1.amazonaws.com
        bucket_name: REDACTED
      bucket_store:
        chunks_cache:
          memcached:
            timeout: 2s
    ruler_storage:
      backend: s3
      s3:
        endpoint: s3.us-east-1.amazonaws.com
        bucket_name: REDACTED
    alertmanager_storage:
      backend: s3
      s3:
        endpoint: s3.us-east-1.amazonaws.com
        bucket_name: REDACTED
    server:
      http_server_write_timeout: 30m
      http_server_read_timeout: 30m
      http_server_idle_timeout: 30m
    limits:
      accept_ha_samples: true
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: etcd
          etcd:
            endpoints:
              - mimir-etcd:2379
    frontend:
      max_outstanding_per_tenant: 2048
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048
@narqo added the helm label Jan 24, 2024
@dimitarvdimitrov (Contributor)

Thanks for reporting. I only now realized the default for backend is filesystem, not s3. I'll kick off a 5.2.1 release with a fix.

@dimitarvdimitrov (Contributor)

5.2.1 was just released reverting the offending commit. Sorry for the inconvenience.

This shouldn't cause any permanent data loss, since the blocks remain on the ingesters' disks until they are successfully shipped.

There could be temporary data loss as blocks move out of the time range for which queriers query the ingesters. This should resolve once the ingesters upload the blocks to S3 and the store-gateways load them (after installing 5.2.1 or updating the Mimir config).
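
Picking up the revert should just be a matter of upgrading to the fixed chart version. A minimal sketch, again assuming the release is named mimir and the chart is grafana/mimir-distributed:

# upgrade to the chart release containing the revert
helm repo update
helm upgrade mimir grafana/mimir-distributed --version 5.2.1 -f values.yaml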
