
Incomplete IPv6 support: components can't discover each other or communicate with each other correctly #11362

Open
dbazhal opened this issue Dec 1, 2023 · 7 comments
Labels
type/bug Something is not working as expected

Comments

@dbazhal

dbazhal commented Dec 1, 2023

Describe the bug
Deployed the latest Loki (2.9.2) with the latest simple scalable Helm chart (5.39.0). Running a simple query fails with:

# k logs loki-read-85f5499f64-7w9g7 --tail 1
Defaulted container "loki" out of: loki, copy-vault-env (init)
level=error ts=2023-12-01T18:05:59.523140298Z caller=scheduler_processor.go:252 org_id=fake frontend=2a02:6bf:fa17:100:48c5::15:9095 msg="error health checking" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: address 2a02:6bf:fa17:100:48c5::15:9095: too many colons in address\""

(2a02:6bf:fa17:100:48c5::15 is the address of the reader instance, so it can't even communicate with itself.)
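
The "too many colons in address" error is what Go's standard library returns when an unbracketed IPv6 address is concatenated with a port. A minimal sketch (plain Go stdlib behavior, not Loki code) of why the dial fails and what a correctly formed address looks like:

package main

import (
	"fmt"
	"net"
)

func main() {
	addr := "2a02:6bf:fa17:100:48c5::15"

	// Naive concatenation, as in the error above: the extra colons make
	// the host:port split ambiguous.
	_, _, err := net.SplitHostPort(addr + ":9095")
	fmt.Println(err) // address 2a02:6bf:fa17:100:48c5::15:9095: too many colons in address

	// net.JoinHostPort brackets IPv6 hosts, producing a dialable address.
	fmt.Println(net.JoinHostPort(addr, "9095")) // [2a02:6bf:fa17:100:48c5::15]:9095
}

So any component that builds the scheduler address by plain string concatenation will produce an address the gRPC dialer can't parse.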

I used Loki 2.8.2 with similar MY_POD_IP hacks earlier, but I was forced to set

frontend:
  scheduler_address: ""

to disable the scheduler. With the scheduler enabled, the querier reported the same "too many colons in address" error: it couldn't connect to the query scheduler to return results.
I was hoping Loki had learned to work with IPv6 since then.
Without the MY_POD_IP hack, instances on an IPv6 stack can't even find each other.

To Reproduce

helm upgrade --install -n loki -f values.yaml loki grafana/loki --version 5.39.0

Here's values.yaml:

loki:
  commonConfig:
    instance_addr: "${MY_POD_IP}"
    ring:
      kvstore:
        store: memberlist
      instance_addr: "${MY_POD_IP}"
      instance_enable_ipv6: true
  extraMemberlistConfig:
    advertise_addr: "${MY_POD_IP}"
    bind_addr:
      - "${MY_POD_IP}"
  query_scheduler:
    use_scheduler_ring: true
    max_outstanding_requests_per_tenant: 4096
  ingester:
    max_chunk_age: 1h
    wal:
      dir: /var/loki/wal
    lifecycler:
      enable_inet6: true
      address: "${MY_POD_IP}"
  image:
    repository: grafana/loki
    tag: 2.8.2
  podAnnotations:
    vault.security.banzaicloud.io/vault-addr: "host"
    vault.security.banzaicloud.io/vault-path: "path"
    vault.security.banzaicloud.io/vault-role: "role"
    vault.security.banzaicloud.io/vault-skip-verify: "true"
  limits_config:
    retention_period: 14d
    enforce_metric_name: false
    reject_old_samples: false
    reject_old_samples_max_age: 14d
    max_cache_freshness_per_query: 10m
    ingestion_rate_mb: 4
    ingestion_burst_size_mb: 6
    max_global_streams_per_user: 5000
    split_queries_by_interval: 15m
    max_query_parallelism: 32
    max_query_lookback: 14d
    max_query_series: 1000
    max_chunks_per_query: 2000000
    max_streams_matchers_per_query: 1000
    query_timeout: 10m
  querier:
    query_ingesters_within: 1h
    engine:
      timeout: 10m
    max_concurrent: 10
  server:
    http_server_read_timeout: 10m
    http_server_write_timeout: 300s
  analytics:
    reporting_enabled: false
  storage_config:
    hedging: null
    boltdb_shipper:
      active_index_directory: /var/loki/boltdb-index
      cache_location: /var/loki/boltdb-cache
      query_ready_num_days: 7
      shared_store: s3
    tsdb_shipper:
      active_index_directory: /var/loki/tsdb-index
      cache_location: /var/loki/tsdb-cache
      query_ready_num_days: 7
      shared_store: s3
  storage:
    bucketNames:
      chunks: "loki-chunks"
    s3:
      region: "aws-region"
      accessKeyId: ${AWS_ACCESS_KEY_ID}
      secretAccessKey: ${AWS_SECRET_ACCESS_KEY}
  schemaConfig:
    configs:
      - from: 2020-01-01
        store: boltdb-shipper
        object_store: s3
        schema: v12
        index:
          prefix: boltdb_index_
          period: 24h
  compactor:
    working_directory: /var/loki/tsdb-compactor
    shared_store: s3
    retention_enabled: true
  auth_enabled: false

read:
  extraArgs:
    - '-config.expand-env=true'
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      value: "vault:x"
    - name: AWS_SECRET_ACCESS_KEY
      value: "vault:x"
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
  tolerations:
    - key: infrastructure_node
      operator: Exists
  nodeSelector:
    kubernetes.io/os: linux
    node_type: infrastructure

write:
  extraArgs:
    - '-config.expand-env=true'
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      value: "vault:x"
    - name: AWS_SECRET_ACCESS_KEY
      value: "vault:x"
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
  tolerations:
    - key: infrastructure_node
      operator: Exists
  nodeSelector:
    kubernetes.io/os: linux
    node_type: infrastructure

backend:
  extraArgs:
    - '-config.expand-env=true'
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      value: "vault:x"
    - name: AWS_SECRET_ACCESS_KEY
      value: "vault:x"
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
  tolerations:
    - key: infrastructure_node
      operator: Exists
  nodeSelector:
    kubernetes.io/os: linux
    node_type: infrastructure

gateway:
  extraEnv:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
  tolerations:
    - key: infrastructure_node
      operator: Exists
  nodeSelector:
    kubernetes.io/os: linux
    node_type: infrastructure

monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  lokiCanary:
    enabled: false
  dashboards:
    enabled: false
  rules:
    enabled: false

test:
  enabled: false

Here's the final Loki config from the Loki ConfigMap:

    analytics:
      reporting_enabled: false
    auth_enabled: false
    common:
      compactor_address: 'loki-backend'
      instance_addr: ${MY_POD_IP}
      path_prefix: /var/loki
      replication_factor: 3
      ring:
        instance_addr: ${MY_POD_IP}
        instance_enable_ipv6: true
        kvstore:
          store: memberlist
      storage:
        s3:
          access_key_id: ${AWS_ACCESS_KEY_ID}
          bucketnames: x
          insecure: false
          region: x
          s3forcepathstyle: false
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}
    compactor:
      retention_enabled: true
      shared_store: s3
      working_directory: /var/loki/tsdb-compactor
    frontend:
      scheduler_address: query-scheduler-discovery.loki.svc.cluster.local.:9095
    frontend_worker:
      scheduler_address: query-scheduler-discovery.loki.svc.cluster.local.:9095
    index_gateway:
      mode: ring
    ingester:
      lifecycler:
        address: ${MY_POD_IP}
        enable_inet6: true
      max_chunk_age: 1h
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      ingestion_burst_size_mb: 50
      ingestion_rate_mb: 30
      max_cache_freshness_per_query: 10m
      max_chunks_per_query: 2000000
      max_global_streams_per_user: 10000
      max_query_lookback: 7d
      max_query_parallelism: 32
      max_query_series: 1000
      max_streams_matchers_per_query: 1000
      query_timeout: 10m
      reject_old_samples: false
      reject_old_samples_max_age: 7d
      retention_period: 7d
      split_queries_by_interval: 15m
    memberlist:
      advertise_addr: ${MY_POD_IP}
      bind_addr:
      - ${MY_POD_IP}
      join_members:
      - loki-memberlist
    querier:
      engine:
        timeout: 10m
      max_concurrent: 10
      query_ingesters_within: 1h
    query_range:
      align_queries_with_step: true
    query_scheduler:
      max_outstanding_requests_per_tenant: 4096
      use_scheduler_ring: true
    ruler:
      storage:
        s3:
          access_key_id: ${AWS_ACCESS_KEY_ID}
          bucketnames: x
          insecure: false
          region: x
          s3forcepathstyle: false
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}
        type: s3
    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml
    schema_config:
      configs:
      - from: "2020-01-01"
        index:
          period: 24h
          prefix: boltdb_index_
        object_store: s3
        schema: v12
        store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
      http_server_read_timeout: 10m
      http_server_write_timeout: 300s
    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-index
        cache_location: /var/loki/boltdb-cache
        query_ready_num_days: 7
        shared_store: s3
      tsdb_shipper:
        active_index_directory: /var/loki/tsdb-index
        cache_location: /var/loki/tsdb-cache
        query_ready_num_days: 7
        shared_store: s3
    tracing:
      enabled: false

Deploy Loki, run a simple query, and look at the reader logs.

Expected behavior
I would expect all components to discover one another and communicate with one another correctly.

Environment:
EKS
Helm

Screenshots, Promtail config, or terminal output
None

@dbazhal
Author

dbazhal commented Dec 4, 2023

Just to add to the issue: when I set instance_addr or address to "[${MY_POD_IP}]", some components end up with addresses like [[::1]]:9005, and communication between components still fails.
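
That double-bracketed form is consistent with Go's address helpers bracketing a host that is already bracketed. A minimal sketch (again stdlib behavior, not Loki code; 9095 is assumed here as the gRPC port from the config above) of why wrapping the value in brackets yourself backfires:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Value already wrapped in brackets by the user, e.g. "[::1]".
	fmt.Println(net.JoinHostPort("[::1]", "9095")) // [[::1]]:9095 (double brackets, not dialable)

	// Bare address: brackets are added exactly once.
	fmt.Println(net.JoinHostPort("::1", "9095")) // [::1]:9095
}

So the address should be supplied bare and left to Loki to bracket.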

@JStickler JStickler added the type/bug Something is not working as expected label Dec 4, 2023
@periklis
Collaborator

periklis commented Dec 4, 2023

I believe this issue is addressed by this PR:

@JStickler
Contributor

@dbazhal Did the PR that Periklis referenced resolve your issue? Can we close this?

@dbazhal
Author

dbazhal commented Feb 5, 2024

@dbazhal Did the PR that Periklis referenced resolve your issue? Can we close this?

I just hope that it did; I'm waiting for the next release (it didn't make it into 2.9.4).

@periklis
Collaborator

periklis commented Feb 5, 2024

@dbazhal Actually it won't be released until 3.0 unless we request a backport to 2.9.x. Let me add the label and pursue this with the maintainers team.

@periklis
Collaborator

periklis commented Feb 5, 2024

I triggered a manual backport with #11870

@dbazhal
Author

dbazhal commented Feb 6, 2024

@periklis thank you!
