Error While Deploying Loki on AKS #12419

Closed
ritesh-makerble opened this issue Apr 1, 2024 · 6 comments

Comments

@ritesh-makerble

ritesh-makerble commented Apr 1, 2024

level=error ts=2024-04-01T12:59:12.470714561Z caller=loki.go:386 msg="module failed" module=memberlist-kv error="invalid service state: Failed, expected: Running, failure: service &{0xc0001e8b40 { true 10000000000 4 30000000000 20
level=error ts=2024-04-01T12:59:12.470937763Z caller=loki.go:386 msg="module failed" module=querier error="failed to start querier, because it depends on module store, which has failed: context canceled"
level=error ts=2024-04-01T12:59:12.471081065Z caller=loki.go:386 msg="module failed" module=compactor error="failed to start compactor, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, e
level=error ts=2024-04-01T12:59:12.471257767Z caller=loki.go:386 msg="module failed" module=query-scheduler error="failed to start query-scheduler, because it depends on module memberlist-kv, which has failed: invalid service stat
level=error ts=2024-04-01T12:59:12.471381269Z caller=loki.go:386 msg="module failed" module=usage-report error="failed to start usage-report, because it depends on module memberlist-kv, which has failed: invalid service state: Fai
level=error ts=2024-04-01T12:59:12.47152657Z caller=loki.go:386 msg="module failed" module=ingester-querier error="failed to start ingester-querier, because it depends on module memberlist-kv, which has failed: invalid service sta
level=error ts=2024-04-01T12:59:12.471687772Z caller=loki.go:386 msg="module failed" module=store error="failed to start store, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected:
level=error ts=2024-04-01T12:59:12.471978876Z caller=loki.go:386 msg="module failed" module=ring error="failed to start ring, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected: R
level=error ts=2024-04-01T12:59:12.472110977Z caller=loki.go:386 msg="module failed" module=ingester error="failed to start ingester, because it depends on module ring, which has failed: context canceled"
level=error ts=2024-04-01T12:59:12.472236979Z caller=loki.go:386 msg="module failed" module=query-frontend error="failed to start query-frontend, because it depends on module query-scheduler, which has failed: context canceled"
level=error ts=2024-04-01T12:59:12.47235488Z caller=loki.go:386 msg="module failed" module=distributor error="failed to start distributor, because it depends on module usage-report, which has failed: context canceled"
level=info ts=2024-04-01T12:59:12.471905075Z caller=module_service.go:114 msg="module stopped" module=query-frontend-tripperware
level=info ts=2024-04-01T12:59:12.47486741Z caller=modules.go:1090 msg="server stopped"
level=info ts=2024-04-01T12:59:12.474902711Z caller=module_service.go:114 msg="module stopped" module=server
level=info ts=2024-04-01T12:59:12.474935511Z caller=loki.go:375 msg="Loki stopped"
@win5923
Contributor

win5923 commented Apr 1, 2024

Can you provide the values.yaml? I'm currently using Loki on AKS without any issues.

@ritesh-makerble
Author

global:
  image:
    # -- Overrides the Docker registry globally for all images
    registry: null
  # -- Overrides the priorityClassName for all pods
  priorityClassName: null
  # -- configures cluster domain ("cluster.local" by default)
  clusterDomain: "cluster.local"
  # -- configures DNS service name
  dnsService: "kube-dns"
  # -- configures DNS service namespace
  dnsNamespace: "kube-system"
# -- Overrides the chart's name
nameOverride: null
# -- Overrides the chart's computed fullname
fullnameOverride: null
# -- Overrides the chart's cluster label
clusterLabelOverride: null
# -- Image pull secrets for Docker images
imagePullSecrets: []
kubectlImage:
  # -- The Docker registry
  registry: docker.io
  # -- Docker image repository
  repository: bitnami/kubectl
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: null
  # -- Overrides the image tag with an image digest
  digest: null
  # -- Docker image pull policy
  pullPolicy: IfNotPresent
loki:
  # Configures the readiness probe for all of the Loki pods
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 30
    timeoutSeconds: 1
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    # TODO: needed for 3rd target backend functionality
    # revert to null or latest once this behavior is released
    tag: null
    # -- Overrides the image tag with an image digest
    digest: null
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
  # -- Common annotations for all deployments/StatefulSets
  annotations: {}
  # -- Common annotations for all pods
  podAnnotations: {}
  # -- Common labels for all pods
  podLabels: {}
  # -- Common annotations for all services
  serviceAnnotations: {}
  # -- Common labels for all services
  serviceLabels: {}
  # -- The number of old ReplicaSets to retain to allow rollback
  revisionHistoryLimit: 10
  # -- The SecurityContext for Loki pods
  podSecurityContext:
    fsGroup: 10001
    runAsGroup: 10001
    runAsNonRoot: true
    runAsUser: 10001
  # -- The SecurityContext for Loki containers
  containerSecurityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
    allowPrivilegeEscalation: false
  # -- Whether enableServiceLinks should be enabled. Defaults to enabled
  enableServiceLinks: true
  # -- Specify an existing secret containing loki configuration. If non-empty, overrides `loki.config`
  existingSecretForConfig: ""
  # -- Defines what kind of object stores the configuration, a ConfigMap or a Secret.
  # In order to move sensitive information (such as credentials) from the ConfigMap/Secret to a more secure location (e.g. vault), it is possible to use [environment variables in the configuration](https://grafana.com/docs/loki/latest/configuration/#use-environment-variables-in-the-configuration).
  # Such environment variables can be then stored in a separate Secret and injected via the global.extraEnvFrom value. For details about environment injection from a Secret please see [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/#use-case-as-container-environment-variables).
  configStorageType: ConfigMap
  # -- Name of the Secret or ConfigMap that contains the configuration (used for naming even if config is internal).
  externalConfigSecretName: '{{ include "loki.name" . }}'
  # -- Config file contents for Loki
  # @default -- See values.yaml
  config: |
    {{- if .Values.enterprise.enabled}}
    {{- tpl .Values.enterprise.config . }}
    {{- else }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    {{- end }}

    {{- with .Values.loki.server }}
    server:
      {{- toYaml . | nindent 2}}
    {{- end}}

    memberlist:
      bind_addr:
        - ${MY_POD_IP}
    {{- if .Values.loki.memberlistConfig }}
      {{- toYaml .Values.loki.memberlistConfig | nindent 2 }}
    {{- else }}
    {{- if .Values.loki.extraMemberlistConfig}}
    {{- toYaml .Values.loki.extraMemberlistConfig | nindent 2}}
    {{- end }}
      join_members:
        - {{ include "loki.memberlist" . }}
        {{- with .Values.migrate.fromDistributed }}
        {{- if .enabled }}
        - {{ .memberlistService }}
        {{- end }}
        {{- end }}
    {{- end }}

    {{- with .Values.loki.ingester }}
    ingester:
      extraEnv:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
      extraArgs:
        - -config.expand-env=true
    {{- if .Values.loki.commonConfig}}
    common:
    {{- toYaml .Values.loki.commonConfig | nindent 2}}
      storage:
      {{- include "loki.commonStorageConfig" . | nindent 4}}
    {{- end}}

    {{- with .Values.loki.limits_config }}
    limits_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml

    {{- with .Values.loki.memcached.chunk_cache }}
    {{- if and .enabled (or .host .addresses) }}
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: {{ .batch_size }}
          parallelism: {{ .parallelism }}
        memcached_client:
          {{- if .host }}
          host: {{ .host }}
          {{- end }}
          {{- if .addresses }}
          addresses: {{ .addresses }}
          {{- end }}
          service: {{ .service }}
    {{- end }}
    {{- end }}

    {{- if .Values.loki.schemaConfig }}
    schema_config:
    {{- toYaml .Values.loki.schemaConfig | nindent 2}}
    {{- else }}
    schema_config:
      configs:
        - from: 2022-01-11
          store: boltdb-shipper
          object_store: {{ .Values.loki.storage.type }}
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
    {{- end }}

    {{ include "loki.rulerConfig" . }}

    {{- if or .Values.tableManager.retention_deletes_enabled .Values.tableManager.retention_period }}
    table_manager:
      retention_deletes_enabled: {{ .Values.tableManager.retention_deletes_enabled }}
      retention_period: {{ .Values.tableManager.retention_period }}
    {{- end }}

    {{- with .Values.loki.memcached.results_cache }}
    query_range:
      align_queries_with_step: true
      {{- if and .enabled (or .host .addresses) }}
      cache_results: {{ .enabled }}
      results_cache:
        cache:
          default_validity: {{ .default_validity }}
          memcached_client:
            {{- if .host }}
            host: {{ .host }}
            {{- end }}
            {{- if .addresses }}
            addresses: {{ .addresses }}
            {{- end }}
            service: {{ .service }}
            timeout: {{ .timeout }}
      {{- end }}
    {{- end }}

    {{- with .Values.loki.storage_config }}
    storage_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.query_scheduler }}
    query_scheduler:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.compactor }}
    compactor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.analytics }}
    analytics:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.querier }}
    querier:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.index_gateway }}
    index_gateway:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend }}
    frontend:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend_worker }}
    frontend_worker:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.distributor }}
    distributor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    tracing:
      enabled: {{ .Values.loki.tracing.enabled }}
  # Should authentication be enabled
  auth_enabled: true
  # -- memberlist configuration (overrides embedded default)
  memberlistConfig: {}
  # -- Extra memberlist configuration
  extraMemberlistConfig: {}
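  # Illustrative sketch only, not a chart default: one way to pass a bind address through
  # the value above instead of editing `loki.config` by hand. It assumes the `bind_addr`
  # lines added directly under `memberlist:` in `loki.config` are removed (so the key is
  # not rendered twice), that MY_POD_IP is injected into every Loki pod, and that Loki is
  # started with -config.expand-env=true so the placeholder is expanded.
  # extraMemberlistConfig:
  #   bind_addr:
  #     - ${MY_POD_IP}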
  # -- Tenants list to be created on nginx htpasswd file, with name and password keys
  tenants: []
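  # Illustrative sketch only, following the "name and password keys" description above.
  # The tenant name and password are placeholders, not values from this deployment.
  # tenants:
  #   - name: tenant-a
  #     password: <password>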
  # -- Check https://grafana.com/docs/loki/latest/configuration/#server for more info on the server configuration.
  server:
    http_listen_port: 3100
    grpc_listen_port: 9095
  # -- Limits config
  limits_config:
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    max_cache_freshness_per_query: 10m
    split_queries_by_interval: 15m
  # -- Provides a reloadable runtime configuration file for some specific configuration
  runtimeConfig: {}
  # -- Check https://grafana.com/docs/loki/latest/configuration/#common_config for more info on how to provide a common configuration
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    compactor_address: '{{ include "loki.compactorAddress" . }}'
  # -- Storage config. Providing this will automatically populate all necessary storage configs in the templated config.
  storage:
    bucketNames:
      chunks: chunks
      ruler: ruler
      admin: admin
    type: s3
    s3:
      s3: null
      endpoint: null
      region: null
      secretAccessKey: null
      accessKeyId: null
      signatureVersion: null
      s3ForcePathStyle: false
      insecure: false
      http_config: {}
      # -- Check https://grafana.com/docs/loki/latest/configure/#s3_storage_config for more info on how to provide a backoff_config
      backoff_config: {}
    gcs:
      chunkBufferSize: 0
      requestTimeout: "0s"
      enableHttp2: true
    azure:
      accountName: null
      accountKey: null
      connectionString: null
      useManagedIdentity: false
      useFederatedToken: false
      userAssignedId: null
      requestTimeout: null
      endpointSuffix: null
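    # Illustrative sketch only: an override (under `loki:`) that points the chart at Azure
    # Blob Storage could reuse the keys above roughly as follows. The account name, account
    # key, and container names are placeholders, not values taken from this deployment.
    # storage:
    #   type: azure
    #   bucketNames:
    #     chunks: <chunks-container>
    #     ruler: <ruler-container>
    #     admin: <admin-container>
    #   azure:
    #     accountName: <storage-account-name>
    #     accountKey: <storage-account-key>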
    swift:
      auth_version: null
      auth_url: null
      internal: null
      username: null
      user_domain_name: null
      user_domain_id: null
      user_id: null
      password: null
      domain_id: null
      domain_name: null
      project_id: null
      project_name: null
      project_domain_id: null
      project_domain_name: null
      region_name: null
      container_name: null
      max_retries: null
      connect_timeout: null
      request_timeout: null
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
  # -- Configure memcached as an external cache for chunk and results cache. Disabled by default;
  # you must enable it and specify a host for each cache you would like to use.
  memcached:
    chunk_cache:
      enabled: false
      host: ""
      service: "memcached-client"
      batch_size: 256
      parallelism: 10
    results_cache:
      enabled: false
      host: ""
      service: "memcached-client"
      timeout: "500ms"
      default_validity: "12h"
  # -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
  schemaConfig: {}
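  # Illustrative sketch only: an explicit schema that mirrors the fallback embedded in
  # `loki.config` above. `object_store: azure` is an assumption for an AKS setup; the
  # fallback itself uses whatever `loki.storage.type` is set to.
  # schemaConfig:
  #   configs:
  #     - from: 2022-01-11
  #       store: boltdb-shipper
  #       object_store: azure
  #       schema: v12
  #       index:
  #         prefix: loki_index_
  #         period: 24h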
  # -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
  rulerConfig: {}
  # -- Structured loki configuration, takes precedence over `loki.config`, `loki.schemaConfig`, `loki.storageConfig`
  structuredConfig: {}
  # -- Additional query scheduler config
  query_scheduler: {}
  # -- Additional storage config
  storage_config:
    hedging:
      at: "250ms"
      max_per_second: 20
      up_to: 3
  # --  Optional compactor configuration
  compactor: {}
  # --  Optional analytics configuration
  analytics: {}
  # --  Optional querier configuration
  querier: {}
  # --  Optional ingester configuration
  ingester: {}
  # --  Optional index gateway configuration
  index_gateway:
    mode: ring
  frontend:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
  frontend_worker:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
  # -- Optional distributor configuration
  distributor: {}
  # -- Enable tracing
  tracing:
    enabled: false
enterprise:
  # Enable enterprise features, license must be provided
  enabled: false
  # Default version of GEL to deploy
  version: v1.8.6
  # -- Optional name of the GEL cluster, otherwise will use .Release.Name
  # The cluster name must match what is in your GEL license
  cluster_name: null
  # -- Grafana Enterprise Logs license
  # In order to use Grafana Enterprise Logs features, you will need to provide
  # the contents of your Grafana Enterprise Logs license, either by providing the
  # contents of the license.jwt, or the name of the Kubernetes Secret that contains your
  # license.jwt.
  # To set the license contents, use the flag `--set-file 'enterprise.license.contents=./license.jwt'`
  license:
    contents: "NOTAVALIDLICENSE"
  # -- Set to true when providing an external license
  useExternalLicense: false
  # -- Name of external license secret to use
  externalLicenseName: null
  # -- Name of the external config secret to use
  externalConfigName: ""
  # -- If enabled, the correct admin_client storage will be configured. If disabled while running enterprise,
  # make sure auth is set to `type: trust`, or that `auth_enabled` is set to `false`.
  adminApi:
    enabled: true
  # enterprise specific sections of the config.yaml file
  config: |
    {{- if .Values.enterprise.adminApi.enabled }}
    {{- if or .Values.minio.enabled (eq .Values.loki.storage.type "s3") (eq .Values.loki.storage.type "gcs") (eq .Values.loki.storage.type "azure") }}
    admin_client:
      storage:
        s3:
          bucket_name: {{ .Values.loki.storage.bucketNames.admin }}
    {{- end }}
    {{- end }}
    auth:
      type: {{ .Values.enterprise.adminApi.enabled | ternary "enterprise" "trust" }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    cluster_name: {{ include "loki.clusterName" . }}
    license:
      path: /etc/loki/license/license.jwt
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/enterprise-logs
    # -- Docker image tag
    tag: null
    # -- Overrides the image tag with an image digest
    digest: null
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
  adminToken:
    # -- Alternative name for admin token secret, needed by tokengen and provisioner jobs
    secret: null
    # -- Additional namespace to also create the token in. Useful if your Grafana instance
    # is in a different namespace
    additionalNamespaces: []
  # -- Alternative name of the secret to store token for the canary
  canarySecret: null
  # -- Configuration for `tokengen` target
  tokengen:
    # -- Whether the job should be part of the deployment
    enabled: true
    # -- Comma-separated list of Loki modules to load for tokengen
    targetModule: "tokengen"
    # -- Additional CLI arguments for the `tokengen` target
    extraArgs: []
    # -- Additional Kubernetes environment
    env: []
    # -- Additional labels for the `tokengen` Job
    labels: {}
    # -- Additional annotations for the `tokengen` Job
    annotations: {}
    # -- Tolerations for tokengen Job
    tolerations: []
    # -- Additional volumes for Pods
    extraVolumes: []
    # -- Additional volume mounts for Pods
    extraVolumeMounts: []
    # -- Run containers as user `enterprise-logs(uid=10001)`
    securityContext:
      runAsNonRoot: true
      runAsGroup: 10001
      runAsUser: 10001
      fsGroup: 10001
    # -- Environment variables from secrets or configmaps to add to the tokengen pods
    extraEnvFrom: []
    # -- The name of the PriorityClass for tokengen Pods
    priorityClassName: ""
  # -- Configuration for `provisioner` target
  provisioner:
    # -- Whether the job should be part of the deployment
    enabled: true
    # -- Name of the secret to store provisioned tokens in
    provisionedSecretPrefix: null
    # -- Additional tenants to be created. Each tenant will get a read and write policy
    # and associated token. Tenant must have a name and a namespace for the secret containing
    # the token to be created in. For example
    # additionalTenants:
    #   - name: loki
    #     secretNamespace: grafana
    additionalTenants: []
    # -- Additional Kubernetes environment
    env: []
    # -- Additional labels for the `provisioner` Job
    labels: {}
    # -- Additional annotations for the `provisioner` Job
    annotations: {}
    # -- The name of the PriorityClass for provisioner Job
    priorityClassName: null
    # -- Run containers as user `enterprise-logs(uid=10001)`
    securityContext:
      runAsNonRoot: true
      runAsGroup: 10001
      runAsUser: 10001
      fsGroup: 10001
    # -- Provisioner image to use
    image:
      # -- The Docker registry
      registry: docker.io
      # -- Docker image repository
      repository: grafana/enterprise-logs-provisioner
      # -- Overrides the image tag whose default is the chart's appVersion
      tag: null
      # -- Overrides the image tag with an image digest
      digest: null
      # -- Docker image pull policy
      pullPolicy: IfNotPresent
    # -- Volume mounts to add to the provisioner pods
    extraVolumeMounts: []
# -- Options that may be necessary when performing a migration from another helm chart
migrate:
  # -- When migrating from a distributed chart like loki-distributed or enterprise-logs
  fromDistributed:
    # -- Set to true if migrating from a distributed helm chart
    enabled: false
    # -- If migrating from a distributed service, provide the distributed deployment's
    # memberlist service DNS so the new deployment can join its ring.
    memberlistService: ""
serviceAccount:
  # -- Specifies whether a ServiceAccount should be created
  create: true
  # -- The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name: null
  # -- Image pull secrets for the service account
  imagePullSecrets: []
  # -- Annotations for the service account
  annotations: {}
  # -- Labels for the service account
  labels: {}
  # -- Set this toggle to false to opt out of automounting API credentials for the service account
  automountServiceAccountToken: true
# RBAC configuration
rbac:
  # -- If pspEnabled true, a PodSecurityPolicy is created for K8s that use psp.
  pspEnabled: false
  # -- For OpenShift set pspEnabled to 'false' and sccEnabled to 'true' to use the SecurityContextConstraints.
  sccEnabled: false
  # -- Specify PSP annotations
  # Ref: https://kubernetes.io/docs/reference/access-authn-authz/psp-to-pod-security-standards/#podsecuritypolicy-annotations
  pspAnnotations: {}
  # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
  # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
  # -- Whether to install RBAC in the namespace only or cluster-wide. Useful if you want to watch ConfigMap globally.
  namespaced: false
# -- Section for configuring optional Helm test
test:
  enabled: true
  # -- Address of the prometheus server to query for the test
  prometheusAddress: "http://prometheus:9090"
  # -- How long to keep retrying the test before failing (a duration, e.g. 1m)
  timeout: 1m
  # -- Additional labels for the test pods
  labels: {}
  # -- Additional annotations for test pods
  annotations: {}
  # -- Image to use for loki canary
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/loki-helm-test
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: null
    # -- Overrides the image tag with an image digest
    digest: null
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
# Monitoring section determines which monitoring features to enable
monitoring:
  # Dashboards for monitoring Loki
  dashboards:
    # -- If enabled, create configmap with dashboards for monitoring Loki
    enabled: true
    # -- Alternative namespace to create dashboards ConfigMap in
    namespace: null
    # -- Additional annotations for the dashboards ConfigMap
    annotations: {}
    # -- Labels for the dashboards ConfigMap
    labels:
      grafana_dashboard: "1"
  # Recording rules for monitoring Loki, required for some dashboards
  rules:
    # -- If enabled, create PrometheusRule resource with Loki recording rules
    enabled: true
    # -- Include alerting rules
    alerting: true
    # -- Specify which individual alerts should be disabled
    # -- Instead of turning off each alert one by one, set the .monitoring.rules.alerting value to false instead.
    # -- If you disable all the alerts and keep .monitoring.rules.alerting set to true, the chart will fail to render.
    disabled: {}
    #  LokiRequestErrors: true
    #  LokiRequestPanics: true
    # -- Alternative namespace to create PrometheusRule resources in
    namespace: null
    # -- Additional annotations for the rules PrometheusRule resource
    annotations: {}
    # -- Additional labels for the rules PrometheusRule resource
    labels: {}
    # -- Additional labels for PrometheusRule alerts
    additionalRuleLabels: {}
    # -- Additional groups to add to the rules file
    additionalGroups: []
    # - name: additional-loki-rules
    #   rules:
    #     - record: job:loki_request_duration_seconds_bucket:sum_rate
    #       expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)
    #     - record: job_route:loki_request_duration_seconds_bucket:sum_rate
    #       expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)
    #     - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
    #       expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (node, namespace, pod, container)
  # ServiceMonitor configuration
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true
    # -- Namespace selector for ServiceMonitor resources
    namespaceSelector: {}
    # -- ServiceMonitor annotations
    annotations: {}
    # -- Additional ServiceMonitor labels
    labels: {}
    # -- ServiceMonitor scrape interval
    # Default is 15s because included recording rules use a 1m rate, and scrape interval needs to be at
    # least 1/4 rate interval.
    interval: 15s
    # -- ServiceMonitor scrape timeout in Go duration format (e.g. 15s)
    scrapeTimeout: null
    # -- ServiceMonitor relabel configs to apply to samples before scraping
    # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#relabelconfig
    relabelings: []
    # -- ServiceMonitor metric relabel configs to apply to samples before ingestion
    # https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#endpoint
    metricRelabelings: []
    # -- ServiceMonitor will use http by default, but you can pick https as well
    scheme: http
    # -- ServiceMonitor will use these tlsConfig settings to make the health check requests
    tlsConfig: null
    # -- If defined, will create a MetricsInstance for the Grafana Agent Operator.
    metricsInstance:
      # -- If enabled, MetricsInstance resources for Grafana Agent Operator are created
      enabled: true
      # -- MetricsInstance annotations
      annotations: {}
      # -- Additional MetricsInstance labels
      labels: {}
      # -- If defined a MetricsInstance will be created to remote write metrics.
      remoteWrite: null
  # Self monitoring determines whether Loki should scrape its own logs.
  # This feature currently relies on the Grafana Agent Operator being installed,
  # which is installed by default using the grafana-agent-operator sub-chart.
  # It will create custom resources for GrafanaAgent, LogsInstance, and PodLogs to configure
  # scrape configs to scrape its own logs with the labels expected by the included dashboards.
  selfMonitoring:
    enabled: true
    # -- Tenant to use for self monitoring
    tenant:
      # -- Name of the tenant
      name: "self-monitoring"
      # -- Namespace to create additional tenant token secret in. Useful if your Grafana instance
      # is in a separate namespace. Token will still be created in the canary namespace.
      secretNamespace: "{{ .Release.Namespace }}"
    # Grafana Agent configuration
    grafanaAgent:
      # -- Controls whether to install the Grafana Agent Operator and its CRDs.
      # Note that helm will not install CRDs if this flag is enabled during an upgrade.
      # In that case install the CRDs manually from https://github.com/grafana/agent/tree/main/production/operator/crds
      installOperator: true
      # -- Grafana Agent annotations
      annotations: {}
      # -- Additional Grafana Agent labels
      labels: {}
      # -- Enable the config read api on port 8080 of the agent
      enableConfigReadAPI: false
      # -- The name of the PriorityClass for GrafanaAgent pods
      priorityClassName: null
      # -- Resource requests and limits for the grafanaAgent pods
      resources: {}
      #   limits:
      #     memory: 200Mi
      #   requests:
      #     cpu: 50m
      #     memory: 100Mi
      # -- Tolerations for GrafanaAgent pods
      tolerations: []
    # PodLogs configuration
    podLogs:
      # -- PodLogs version
      apiVersion: monitoring.grafana.com/v1alpha1
      # -- PodLogs annotations
      annotations: {}
      # -- Additional PodLogs labels
      labels: {}
      # -- PodLogs relabel configs to apply to samples before scraping
      # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#relabelconfig
      relabelings: []
      # -- Additional pipeline stages to process logs after scraping
      # https://grafana.com/docs/agent/latest/operator/api/#pipelinestagespec-a-namemonitoringgrafanacomv1alpha1pipelinestagespeca
      additionalPipelineStages: []
    # LogsInstance configuration
    logsInstance:
      # -- LogsInstance annotations
      annotations: {}
      # -- Additional LogsInstance labels
      labels: {}
      # -- Additional clients for remote write
      clients: null
  # The Loki canary pushes logs to and queries from this loki installation to test
  # that it's working correctly
  lokiCanary:
    enabled: true
    # -- The name of the label to look for at loki when doing the checks.
    labelname: pod
    # -- Additional annotations for the `loki-canary` Daemonset
    annotations: {}
    # -- Additional labels for each `loki-canary` pod
    podLabels: {}
    service:
      # -- Annotations for loki-canary Service
      annotations: {}
      # -- Additional labels for loki-canary Service
      labels: {}
    # -- Additional CLI arguments for the `loki-canary` command
    extraArgs: []
    # -- Environment variables to add to the canary pods
    extraEnv: []
    # -- Environment variables from secrets or configmaps to add to the canary pods
    extraEnvFrom: []
    # -- Resource requests and limits for the canary
    resources: {}
    # -- DNS config for canary pods
    dnsConfig: {}
    # -- Node selector for canary pods
    nodeSelector: {}
    # -- Tolerations for canary pods
    tolerations: []
    # -- The name of the PriorityClass for loki-canary pods
    priorityClassName: null
    # -- Image to use for loki canary
    image:
      # -- The Docker registry
      registry: docker.io
      # -- Docker image repository
      repository: grafana/loki-canary
      # -- Overrides the image tag whose default is the chart's appVersion
      tag: null
      # -- Overrides the image tag with an image digest
      digest: null
      # -- Docker image pull policy
      pullPolicy: IfNotPresent
    # -- Update strategy for the `loki-canary` Daemonset pods
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
# Configuration for the write pod(s)
write:
  # -- Number of replicas for the write
  replicas: 3
  autoscaling:
    # -- Enable autoscaling for the write.
    enabled: false
    # -- Minimum autoscaling replicas for the write.
    minReplicas: 2
    # -- Maximum autoscaling replicas for the write.
    maxReplicas: 6
    # -- Target CPU utilisation percentage for the write.
    targetCPUUtilizationPercentage: 60
    # -- Target memory utilization percentage for the write.
    targetMemoryUtilizationPercentage:
    # -- Behavior policies while scaling.
    behavior:
      # -- see https://github.com/grafana/loki/blob/main/docs/sources/operations/storage/wal.md#how-to-scale-updown for scaledown details
      scaleUp:
        policies:
          - type: Pods
            value: 1
            periodSeconds: 900
      scaleDown:
        policies:
          - type: Pods
            value: 1
            periodSeconds: 1800
        stabilizationWindowSeconds: 3600
  image:
    # -- The Docker registry for the write image. Overrides `loki.image.registry`
    registry: null
    # -- Docker image repository for the write image. Overrides `loki.image.repository`
    repository: null
    # -- Docker image tag for the write image. Overrides `loki.image.tag`
    tag: null
  # -- The name of the PriorityClass for write pods
  priorityClassName: null
  # -- Annotations for write StatefulSet
  annotations: {}
  # -- Annotations for write pods
  podAnnotations: {}
  # -- Additional labels for each `write` pod
  podLabels: {}
  # -- Additional selector labels for each `write` pod
  selectorLabels: {}
  service:
    # -- Annotations for write Service
    annotations: {}
    # -- Additional labels for write Service
    labels: {}
  # -- Comma-separated list of Loki modules to load for the write
  targetModule: "write"
  # -- Additional CLI args for the write
  extraArgs: []
  # -- Environment variables to add to the write pods
  extraEnv: []
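  # Illustrative example only (the variable name and value are hypothetical);
  # entries follow the standard Kubernetes container `env` schema:
  # extraEnv:
  #   - name: EXAMPLE_VAR
  #     value: "example"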
  # -- Environment variables from secrets or configmaps to add to the write pods
  extraEnvFrom: []
  # -- Lifecycle for the write container
  lifecycle: {}
  # -- The default /flush_shutdown preStop hook is recommended as part of the ingester
  # scaledown process so it's added to the template by default when autoscaling is enabled,
  # but it's disabled to optimize rolling restarts in instances that will never be scaled
  # down or when using chunks storage with WAL disabled.
  # https://github.com/grafana/loki/blob/main/docs/sources/operations/storage/wal.md#how-to-scale-updown
  # -- Init containers to add to the write pods
  initContainers: []
  # -- Containers to add to the write pods
  extraContainers: []
  # -- Volume mounts to add to the write pods
  extraVolumeMounts: []
  # -- Volumes to add to the write pods
  extraVolumes: []
  # -- volumeClaimTemplates to add to StatefulSet
  extraVolumeClaimTemplates: []
  # -- Resource requests and limits for the write
  resources: {}
  # -- Grace period to allow the write to shutdown before it is killed. Especially for the ingester,
  # this must be increased. It must be long enough so writes can be gracefully shutdown flushing/transferring
  # all data and to successfully leave the member ring on shutdown.
  terminationGracePeriodSeconds: 300
  # -- Affinity for write pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  # affinity: |
  #   podAntiAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       - labelSelector:
  #           matchLabels:
  #             {{- include "loki.writeSelectorLabels" . | nindent 10 }}
  #         topologyKey: kubernetes.io/hostname
  # -- DNS config for write pods
  dnsConfig: {}
  # -- Node selector for write pods
  nodeSelector: {}
  # -- Topology Spread Constraints for write pods
  topologySpreadConstraints: []
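  # Illustrative example only (the label selector is an assumption about the labels
  # the chart puts on write pods); the fields follow the standard Kubernetes schema:
  # topologySpreadConstraints:
  #   - maxSkew: 1
  #     topologyKey: topology.kubernetes.io/zone
  #     whenUnsatisfiable: ScheduleAnyway
  #     labelSelector:
  #       matchLabels:
  #         app.kubernetes.io/component: write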
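  # -- Node affinity for write pods: only schedule onto nodes labeled `node_group=spot-0`.
  # Configured as a string because, as the affinity comment above notes, it is passed through `tpl`.
  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node_group
                operator: In
                values:
                  - spot-0
  # -- Tolerations for write pods
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"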
  # -- The default is to deploy all pods in parallel.
  podManagementPolicy: "Parallel"
  persistence:
    # -- Enable volume claims in pod spec
    volumeClaimsEnabled: true
    # -- Parameters used for the `data` volume when `volumeClaimsEnabled` is false
    dataVolumeParameters:
      emptyDir: {}
    # -- Enable StatefulSetAutoDeletePVC feature
    enableStatefulSetAutoDeletePVC: false
    # -- Size of persistent disk
    size: 10Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: null
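    # Illustrative example only: on AKS, one of the built-in storage classes could be
    # named here. `managed-csi` is assumed to exist in the cluster; check with
    # `kubectl get storageclass` before relying on it.
    # storageClass: managed-csi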
    # -- Selector for persistent disk
    selector: null
# Configuration for the table-manager
tableManager:
  # -- Specifies whether the table-manager should be enabled
  enabled: false
  image:
    # -- The Docker registry for the table-manager image. Overrides `loki.image.registry`
    registry: null
    # -- Docker image repository for the table-manager image. Overrides `loki.image.repository`
    repository: null
    # -- Docker image tag for the table-manager image. Overrides `loki.image.tag`
    tag: null
  # -- Command to execute instead of defined in Docker image
  command: null
  # -- The name of the PriorityClass for table-manager pods
  priorityClassName: null
  # -- Labels for table-manager pods
  podLabels: {}
  # -- Annotations for table-manager deployment
  annotations: {}
  # -- Annotations for table-manager pods
  podAnnotations: {}
  service:
    # -- Annotations for table-manager Service
    annotations: {}
    # -- Additional labels for table-manager Service
    labels: {}
  # -- Additional CLI args for the table-manager
  extraArgs: []
  # -- Environment variables to add to the table-manager pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to the table-manager pods
  extraEnvFrom: []
  # -- Volume mounts to add to the table-manager pods
  extraVolumeMounts: []
  # -- Volumes to add to the table-manager pods
  extraVolumes: []
  # -- Resource requests and limits for the table-manager
  resources: {}
  # -- Containers to add to the table-manager pods
  extraContainers: []
  # -- Grace period to allow the table-manager to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Affinity for table-manager pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  # affinity: |
  #   podAntiAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       - labelSelector:
  #           matchLabels:
  #             {{- include "loki.tableManagerSelectorLabels" . | nindent 10 }}
  #         topologyKey: kubernetes.io/hostname
  #     preferredDuringSchedulingIgnoredDuringExecution:
  #       - weight: 100
  #         podAffinityTerm:
  #           labelSelector:
  #             matchLabels:
  #               {{- include "loki.tableManagerSelectorLabels" . | nindent 12 }}
  #           topologyKey: failure-domain.beta.kubernetes.io/zone
  # -- DNS config for table-manager pods
  dnsConfig: {}
  # -- Node selector for table-manager pods
  nodeSelector: {}
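  # -- Node affinity for table-manager pods: only schedule onto nodes labeled `node_group=spot-0`.
  # Configured as a string because, as the affinity comment above notes, it is passed through `tpl`.
  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node_group
                operator: In
                values:
                  - spot-0
  # -- Tolerations for table-manager pods
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"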
  # -- Enable deletes by retention
  retention_deletes_enabled: false
  # -- Set retention period
  retention_period: 0
# Configuration for the read pod(s)
read:
  # -- Number of replicas for the read
  replicas: 3
  autoscaling:
    # -- Enable autoscaling for the read. This is only used if `queryIndex.enabled: true`
    enabled: false
    # -- Minimum autoscaling replicas for the read
    minReplicas: 2
    # -- Maximum autoscaling replicas for the read
    maxReplicas: 6
    # -- Target CPU utilisation percentage for the read
    targetCPUUtilizationPercentage: 60
    # -- Target memory utilisation percentage for the read
    targetMemoryUtilizationPercentage:
    # -- Behavior policies while scaling.
    behavior: {}
    #  scaleUp:
    #   stabilizationWindowSeconds: 300
    #   policies:
    #   - type: Pods
    #     value: 1
    #     periodSeconds: 60
    #  scaleDown:
    #   stabilizationWindowSeconds: 300
    #   policies:
    #   - type: Pods
    #     value: 1
    #     periodSeconds: 180
  image:
    # -- The Docker registry for the read image. Overrides `loki.image.registry`
    registry: null
    # -- Docker image repository for the read image. Overrides `loki.image.repository`
    repository: null
    # -- Docker image tag for the read image. Overrides `loki.image.tag`
    tag: null
  # -- The name of the PriorityClass for read pods
  priorityClassName: null
  # -- Annotations for read deployment
  annotations: {}
  # -- Annotations for read pods
  podAnnotations: {}
  # -- Additional labels for each `read` pod
  podLabels: {}
  # -- Additional selector labels for each `read` pod
  selectorLabels: {}
  service:
    # -- Annotations for read Service
    annotations: {}
    # -- Additional labels for read Service
    labels: {}
  # -- Comma-separated list of Loki modules to load for the read
  targetModule: "read"
  # -- Whether or not to use the 2 target type simple scalable mode (read, write) or the
  # 3 target type (read, write, backend). Legacy refers to the 2 target type, so true will
  # run two targets, false will run 3 targets.
  legacyReadTarget: false
  # -- Additional CLI args for the read
  extraArgs: []
  # -- Containers to add to the read pods
  extraContainers: []
  # -- Environment variables to add to the read pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to the read pods
  extraEnvFrom: []
  # -- Lifecycle for the read container
  lifecycle: {}
  # -- Volume mounts to add to the read pods
  extraVolumeMounts: []
  # -- Volumes to add to the read pods
  extraVolumes: []
  # -- Resource requests and limits for the read
  resources: {}
  # -- Grace period to allow the read to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Affinity for read pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  # affinity: |
  #   podAntiAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       - labelSelector:
  #           matchLabels:
  #             {{- include "loki.readSelectorLabels" . | nindent 10 }}
  #         topologyKey: kubernetes.io/hostname
  # -- DNS config for read pods
  dnsConfig: {}
  # -- Node selector for read pods
  nodeSelector: {}
  # -- Topology Spread Constraints for read pods
  topologySpreadConstraints: []
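  # -- Node affinity for read pods: only schedule onto nodes labeled `node_group=spot-0`.
  # Configured as a string because, as the affinity comment above notes, it is passed through `tpl`.
  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node_group
                operator: In
                values:
                  - spot-0
  # -- Tolerations for read pods
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"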
  # -- The default is to deploy all pods in parallel.
  podManagementPolicy: "Parallel"
  persistence:
    # -- Enable StatefulSetAutoDeletePVC feature
    enableStatefulSetAutoDeletePVC: true
    # -- Size of persistent disk
    size: 10Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: null
    # -- Selector for persistent disk
    selector: null
# Configuration for the backend pod(s)
backend:
  # -- Number of replicas for the backend
  replicas: 3
  autoscaling:
    # -- Enable autoscaling for the backend.
    enabled: false
    # -- Minimum autoscaling replicas for the backend.
    minReplicas: 3
    # -- Maximum autoscaling replicas for the backend.
    maxReplicas: 6
    # -- Target CPU utilization percentage for the backend.
    targetCPUUtilizationPercentage: 60
    # -- Target memory utilization percentage for the backend.
    targetMemoryUtilizationPercentage:
    # -- Behavior policies while scaling.
    behavior: {}
    #    scaleUp:
    #     stabilizationWindowSeconds: 300
    #     policies:
    #     - type: Pods
    #       value: 1
    #       periodSeconds: 60
    #    scaleDown:
    #     stabilizationWindowSeconds: 300
    #     policies:
    #     - type: Pods
    #       value: 1
    #       periodSeconds: 180
  image:
    # -- The Docker registry for the backend image. Overrides `loki.image.registry`
    registry: null
    # -- Docker image repository for the backend image. Overrides `loki.image.repository`
    repository: null
    # -- Docker image tag for the backend image. Overrides `loki.image.tag`
    tag: null
  # -- The name of the PriorityClass for backend pods
  priorityClassName: null
  # -- Annotations for backend StatefulSet
  annotations: {}
  # -- Annotations for backend pods
  podAnnotations: {}
  # -- Additional labels for each `backend` pod
  podLabels: {}
  # -- Additional selector labels for each `backend` pod
  selectorLabels: {}
  service:
    # -- Annotations for backend Service
    annotations: {}
    # -- Additional labels for backend Service
    labels: {}
  # -- Comma-separated list of Loki modules to load for the backend
  targetModule: "backend"
  # -- Additional CLI args for the backend
  extraArgs: []
  # -- Environment variables to add to the backend pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to the backend pods
  extraEnvFrom: []
  # -- Init containers to add to the backend pods
  initContainers: []
  # -- Volume mounts to add to the backend pods
  extraVolumeMounts: []
  # -- Volumes to add to the backend pods
  extraVolumes: []
  # -- Resource requests and limits for the backend
  resources: {}
  # -- Grace period to allow the backend to shutdown before it is killed. Especially for the ingester,
  # this must be increased. It must be long enough so backends can be gracefully shutdown flushing/transferring
  # all data and to successfully leave the member ring on shutdown.
  terminationGracePeriodSeconds: 300
  # -- Affinity for backend pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  # affinity: |
  #   podAntiAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       - labelSelector:
  #           matchLabels:
  #             {{- include "loki.backendSelectorLabels" . | nindent 10 }}
  #         topologyKey: kubernetes.io/hostname
  # -- DNS config for backend pods
  dnsConfig: {}
  # -- Node selector for backend pods
  nodeSelector: {}
  # -- Topology Spread Constraints for backend pods
  topologySpreadConstraints: []
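  # -- Node affinity for backend pods: only schedule onto nodes labeled `node_group=spot-0`.
  # Configured as a string because, as the affinity comment above notes, it is passed through `tpl`.
  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node_group
                operator: In
                values:
                  - spot-0
  # -- Tolerations for backend pods
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"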
  # -- The default is to deploy all pods in parallel.
  podManagementPolicy: "Parallel"
  persistence:
    # -- Enable volume claims in pod spec
    volumeClaimsEnabled: true
    # -- Parameters used for the `data` volume when `volumeClaimsEnabled` is false
    dataVolumeParameters:
      emptyDir: {}
    # -- Enable StatefulSetAutoDeletePVC feature
    enableStatefulSetAutoDeletePVC: true
    # -- Size of persistent disk
    size: 10Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: null
    # -- Selector for persistent disk
    selector: null
# Configuration for the single binary node(s)
singleBinary:
  # -- Number of replicas for the single binary
  replicas: 0
  autoscaling:
    # -- Enable autoscaling
    enabled: false
    # -- Minimum autoscaling replicas for the single binary
    minReplicas: 1
    # -- Maximum autoscaling replicas for the single binary
    maxReplicas: 3
    # -- Target CPU utilisation percentage for the single binary
    targetCPUUtilizationPercentage: 60
    # -- Target memory utilisation percentage for the single binary
    targetMemoryUtilizationPercentage:
  image:
    # -- The Docker registry for the single binary image. Overrides `loki.image.registry`
    registry: null
    # -- Docker image repository for the single binary image. Overrides `loki.image.repository`
    repository: null
    # -- Docker image tag for the single binary image. Overrides `loki.image.tag`
    tag: null
  # -- The name of the PriorityClass for single binary pods
  priorityClassName: null
  # -- Annotations for single binary StatefulSet
  annotations: {}
  # -- Annotations for single binary pods
  podAnnotations: {}
  # -- Additional labels for each `single binary` pod
  podLabels: {}
  # -- Additional selector labels for each `single binary` pod
  selectorLabels: {}
  service:
    # -- Annotations for single binary Service
    annotations: {}
    # -- Additional labels for single binary Service
    labels: {}
  # -- Comma-separated list of Loki modules to load for the single binary
  targetModule: "all"
  # -- Additional CLI args for the single binary
  extraArgs: []
  # -- Environment variables to add to the single binary pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to the single binary pods
  extraEnvFrom: []
  # -- Extra containers to add to the single binary loki pod
  extraContainers: []
  # -- Init containers to add to the single binary pods
  initContainers: []
  # -- Volume mounts to add to the single binary pods
  extraVolumeMounts: []
  # -- Volumes to add to the single binary pods
  extraVolumes: []
  # -- Resource requests and limits for the single binary
  resources: {}
  # -- Grace period to allow the single binary to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Affinity for single binary pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  # affinity: |
  #   podAntiAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       - labelSelector:
  #           matchLabels:
  #             {{- include "loki.singleBinarySelectorLabels" . | nindent 10 }}
  #         topologyKey: kubernetes.io/hostname
  # -- DNS config for single binary pods
  dnsConfig: {}
  # -- Node selector for single binary pods
  nodeSelector: {}
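  # -- Node affinity for single binary pods: only schedule onto nodes labeled `node_group=spot-0`.
  # Configured as a string because, as the affinity comment above notes, it is passed through `tpl`.
  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node_group
                operator: In
                values:
                  - spot-0
  # -- Tolerations for single binary pods
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"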
  persistence:
    # -- Enable StatefulSetAutoDeletePVC feature
    enableStatefulSetAutoDeletePVC: true
    # -- Enable persistent disk
    enabled: true
    # -- Size of persistent disk
    size: 10Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: null
    # -- Selector for persistent disk
    selector: null
# Use either this ingress or the gateway, but not both at once.
# If you enable this, make sure to disable the gateway.
# You'll need to supply authn configuration for your ingress controller.
ingress:
  enabled: false
  ingressClassName: ""
  annotations: {}
  #    nginx.ingress.kubernetes.io/auth-type: basic
  #    nginx.ingress.kubernetes.io/auth-secret: loki-distributed-basic-auth
  #    nginx.ingress.kubernetes.io/auth-secret-type: auth-map
  #    nginx.ingress.kubernetes.io/configuration-snippet: |
  #      proxy_set_header X-Scope-OrgID $remote_user;
  labels: {}
  #    blackbox.monitoring.exclude: "true"
  paths:
    write:
      - /api/prom/push
      - /loki/api/v1/push
    read:
      - /api/prom/tail
      - /loki/api/v1/tail
      - /loki/api
      - /api/prom/rules
      - /loki/api/v1/rules
      - /prometheus/api/v1/rules
      - /prometheus/api/v1/alerts
    singleBinary:
      - /api/prom/push
      - /loki/api/v1/push
      - /api/prom/tail
      - /loki/api/v1/tail
      - /loki/api
      - /api/prom/rules
      - /loki/api/v1/rules
      - /prometheus/api/v1/rules
      - /prometheus/api/v1/alerts
  # -- Hosts configuration for the ingress, passed through the `tpl` function to allow templating
  hosts:
    - loki.example.com
  # -- TLS configuration for the ingress. Hosts passed through the `tpl` function to allow templating
  tls: []
#    - hosts:
#       - loki.example.com
#      secretName: loki-distributed-tls

# Configuration for the memberlist service
memberlist:
  service:
    publishNotReadyAddresses: false
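    # Illustrative note (an assumption to verify, not a confirmed fix): setting this to
    # `true` makes the memberlist Service publish pod addresses before the pods are
    # Ready, which may help peers find each other while the ring is still forming.
    # publishNotReadyAddresses: true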
# Configuration for the gateway
gateway:
  # -- Specifies whether the gateway should be enabled
  enabled: true
  # -- Number of replicas for the gateway
  replicas: 1
  # -- Enable logging of 2xx and 3xx HTTP requests
  verboseLogging: true
  autoscaling:
    # -- Enable autoscaling for the gateway
    enabled: false
    # -- Minimum autoscaling replicas for the gateway
    minReplicas: 1
    # -- Maximum autoscaling replicas for the gateway
    maxReplicas: 3
    # -- Target CPU utilisation percentage for the gateway
    targetCPUUtilizationPercentage: 60
    # -- Target memory utilisation percentage for the gateway
    targetMemoryUtilizationPercentage:
    # -- Behavior policies while scaling.
    behavior: {}
    #    scaleUp:
    #     stabilizationWindowSeconds: 300
    #     policies:
    #     - type: Pods
    #       value: 1
    #       periodSeconds: 60
    #    scaleDown:
    #     stabilizationWindowSeconds: 300
    #     policies:
    #     - type: Pods
    #       value: 1
    #       periodSeconds: 180
  # -- See `kubectl explain deployment.spec.strategy` for more
  # -- ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
  deploymentStrategy:
    type: RollingUpdate
  image:
    # -- The Docker registry for the gateway image
    registry: docker.io
    # -- The gateway image repository
    repository: nginxinc/nginx-unprivileged
    # -- The gateway image tag
    tag: 1.24-alpine
    # -- Overrides the gateway image tag with an image digest
    digest: null
    # -- The gateway image pull policy
    pullPolicy: IfNotPresent
  # -- The name of the PriorityClass for gateway pods
  priorityClassName: null
  # -- Annotations for gateway deployment
  annotations: {}
  # -- Annotations for gateway pods
  podAnnotations: {}
  # -- Additional labels for gateway pods
  podLabels: {}
  # -- Additional CLI args for the gateway
  extraArgs: []
  # -- Environment variables to add to the gateway pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to the gateway pods
  extraEnvFrom: []
  # -- Lifecycle for the gateway container
  lifecycle: {}
  # -- Volumes to add to the gateway pods
  extraVolumes: []
  # -- Volume mounts to add to the gateway pods
  extraVolumeMounts: []
  # -- The pod-level SecurityContext for gateway pods
  podSecurityContext:
    fsGroup: 101
    runAsGroup: 101
    runAsNonRoot: true
    runAsUser: 101
  # -- The SecurityContext for gateway containers
  containerSecurityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
    allowPrivilegeEscalation: false
  # -- Resource requests and limits for the gateway
  resources: {}
  # -- Containers to add to the gateway pods
  extraContainers: []
  # -- Grace period to allow the gateway to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Affinity for gateway pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  # affinity: |
  #   podAntiAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       - labelSelector:
  #           matchLabels:
  #             {{- include "loki.gatewaySelectorLabels" . | nindent 10 }}
  #         topologyKey: kubernetes.io/hostname
  # -- DNS config for gateway pods
  dnsConfig: {}
  # -- Node selector for gateway pods
  nodeSelector: {}
  # -- Topology Spread Constraints for gateway pods
  topologySpreadConstraints: []
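  # -- Node affinity for gateway pods: only schedule onto nodes labeled `node_group=spot-0`.
  # Configured as a string because, as the affinity comment above notes, it is passed through `tpl`.
  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node_group
                operator: In
                values:
                  - spot-0
  # -- Tolerations for gateway pods
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"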
  # Gateway service configuration
  service:
    # -- Port of the gateway service
    port: 80
    # -- Type of the gateway service
    type: ClusterIP
    # -- ClusterIP of the gateway service
    clusterIP: null
    # -- (int) Node port if service type is NodePort
    nodePort: null
    # -- Load balancer IP address if service type is LoadBalancer
    loadBalancerIP: null
    # -- Annotations for the gateway service
    annotations: {}
    # -- Labels for gateway service
    labels: {}
  # Gateway ingress configuration
  ingress:
    # -- Specifies whether an ingress for the gateway should be created
    enabled: false
    # -- Ingress Class Name. MAY be required for Kubernetes versions >= 1.18
    ingressClassName: ""
    # -- Annotations for the gateway ingress
    annotations: {}
    # -- Labels for the gateway ingress
    labels: {}
    # -- Hosts configuration for the gateway ingress, passed through the `tpl` function to allow templating
    hosts:
      - host: gateway.loki.example.com
        paths:
          - path: /
            # -- pathType (e.g. ImplementationSpecific, Prefix, .. etc.) might also be required by some Ingress Controllers
            # pathType: Prefix
    # -- TLS configuration for the gateway ingress. Hosts passed through the `tpl` function to allow templating
    tls:
      - secretName: loki-gateway-tls
        hosts:
          - gateway.loki.example.com
  # Basic auth configuration
  basicAuth:
    # -- Enables basic authentication for the gateway
    enabled: false
    # -- The basic auth username for the gateway
    username: null
    # -- The basic auth password for the gateway
    password: null
    # -- Uses the specified users from the `loki.tenants` list to create the htpasswd file
    # if `loki.tenants` is not set, the `gateway.basicAuth.username` and `gateway.basicAuth.password` are used
    # The value is templated using `tpl`. Override this to use a custom htpasswd, e.g. in case the default causes
    # high CPU load.
    htpasswd: >-
      {{ if .Values.loki.tenants }}

        {{- range $t := .Values.loki.tenants }}
      {{ htpasswd (required "All tenants must have a 'name' set" $t.name) (required "All tenants must have a 'password' set" $t.password) }}

        {{- end }}
      {{ else }} {{ htpasswd (required "'gateway.basicAuth.username' is required" .Values.gateway.basicAuth.username) (required "'gateway.basicAuth.password' is required" .Values.gateway.basicAuth.password) }} {{ end }}
    # -- Existing basic auth secret to use. Must contain '.htpasswd'
    existingSecret: null
  # Configures the readiness probe for the gateway
  readinessProbe:
    httpGet:
      path: /
      port: http
    initialDelaySeconds: 15
    timeoutSeconds: 1
  nginxConfig:
    # -- Enable listener for IPv6, disable on IPv4-only systems
    enableIPv6: true
    # -- NGINX log format
    logFormat: |-
      main '$remote_addr - $remote_user [$time_local]  $status '
              '"$request" $body_bytes_sent "$http_referer" '
              '"$http_user_agent" "$http_x_forwarded_for"';
    # -- Allows appending custom configuration to the server block
    serverSnippet: ""
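    # Illustrative example only (the 100M limit is an assumed value): raise the
    # maximum request body size nginx accepts, using the standard
    # `client_max_body_size` directive:
    # serverSnippet: "client_max_body_size 100M;"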
    # -- Allows appending custom configuration to the http block, passed through the `tpl` function to allow templating
    httpSnippet: >-
      {{ if .Values.loki.tenants }}proxy_set_header X-Scope-OrgID $remote_user;{{ end }}
    # -- Override Read URL
    customReadUrl: null
    # -- Override Write URL
    customWriteUrl: null
    # -- Override Backend URL
    customBackendUrl: null
    # -- Allows overriding the DNS resolver address nginx will use.
    resolver: ""
    # -- Config file contents for Nginx. Passed through the `tpl` function to allow templating
    # @default -- See values.yaml
    file: |
      {{- include "loki.nginxFile" . | indent 2 -}}
networkPolicy:
  # -- Specifies whether Network Policies should be created
  enabled: false
  # -- Specifies whether the policies created will be standard Network Policies (flavor: kubernetes)
  # or Cilium Network Policies (flavor: cilium)
  flavor: kubernetes
  metrics:
    # -- Specifies the Pods which are allowed to access the metrics port.
    # As this is cross-namespace communication, you also need the namespaceSelector.
    podSelector: {}
    # -- Specifies the namespaces which are allowed to access the metrics port
    namespaceSelector: {}
    # -- Specifies specific network CIDRs which are allowed to access the metrics port.
    # In case you use namespaceSelector, you also have to specify your kubelet networks here.
    # The metrics ports are also used for probes.
    cidrs: []
  ingress:
    # -- Specifies the Pods which are allowed to access the http port.
    # As this is cross-namespace communication, you also need the namespaceSelector.
    podSelector: {}
    # -- Specifies the namespaces which are allowed to access the http port
    namespaceSelector: {}
  alertmanager:
    # -- Specify the alertmanager port used for alerting
    port: 9093
    # -- Specifies the alertmanager Pods.
    # As this is cross-namespace communication, you also need the namespaceSelector.
    podSelector: {}
    # -- Specifies the namespace the alertmanager is running in
    namespaceSelector: {}
  externalStorage:
    # -- Specify the ports used for external storage, e.g. AWS S3
    ports: []
    # -- Specifies specific network CIDRs you want to limit access to
    cidrs: []
  discovery:
    # -- (int) Specify the port used for discovery
    port: null
    # -- Specifies the Pods labels used for discovery.
    # As this is cross-namespace communication, you also need the namespaceSelector.
    podSelector: {}
    # -- Specifies the namespace the discovery Pods are running in
    namespaceSelector: {}
  egressWorld:
    # -- Enable additional cilium egress rules to external world for write, read and backend.
    enabled: false
  egressKubeApiserver:
    # -- Enable additional cilium egress rules to kube-apiserver for backend.
    enabled: false
# -------------------------------------
# Configuration for `minio` child chart
# -------------------------------------
minio:
  enabled: false
  replicas: 1
  # Minio requires 2 to 16 drives for erasure code (drivesPerNode * replicas)
  # https://docs.min.io/docs/minio-erasure-code-quickstart-guide
  # Since we only have 1 replica, that means 2 drives must be used.
  drivesPerNode: 2
  rootUser: enterprise-logs
  rootPassword: supersecret
  buckets:
    - name: chunks
      policy: none
      purge: false
    - name: ruler
      policy: none
      purge: false
    - name: admin
      policy: none
      purge: false
  persistence:
    size: 5Gi
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
# Create extra manifests via values. Would be passed through `tpl` for templating
extraObjects: []
# - apiVersion: v1
#   kind: ConfigMap
#   metadata:
#     name: loki-alerting-rules
#   data:
#     loki-alerting-rules.yaml: |-
#       groups:
#         - name: example
#           rules:
#           - alert: example
#             expr: |
#               sum(count_over_time({app="loki"} |~ "error" [5m])) > 0
#             for: 3m
#             labels:
#               severity: warning
#               category: logs
#             annotations:
#               message: "loki has encountered errors"

sidecar:
  image:
    # -- The Docker registry and image for the k8s sidecar
    repository: kiwigrid/k8s-sidecar
    # -- Docker image tag
    tag: 1.24.3
    # -- Docker image sha. If empty, no sha will be used
    sha: ""
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
  # -- Resource requests and limits for the sidecar
  resources: {}
  #   limits:
  #     cpu: 100m
  #     memory: 100Mi
  #   requests:
  #     cpu: 50m
  #     memory: 50Mi
  # -- The SecurityContext for the sidecar.
  securityContext: {}
  # -- Set to true to skip tls verification for kube api calls.
  skipTlsVerify: false
  # -- Ensure that rule files aren't conflicting and being overwritten by prefixing their name with the namespace they are defined in.
  enableUniqueFilenames: false
  # -- Readiness probe definition. Probe is disabled on the sidecar by default.
  readinessProbe: {}
  # -- Liveness probe definition. Probe is disabled on the sidecar by default.
  livenessProbe: {}
  rules:
    # -- Whether or not to create a sidecar to ingest rule from specific ConfigMaps and/or Secrets.
    enabled: true
    # -- Label that the configmaps/secrets with rules will be marked with.
    label: loki_rule
    # -- Label value that the configmaps/secrets with rules will be set to.
    labelValue: ""
    # -- Folder into which the rules will be placed.
    folder: /rules
    # -- Comma separated list of namespaces. If specified, the sidecar will search for config-maps/secrets inside these namespaces.
    # Otherwise the namespace in which the sidecar is running will be used.
    # It's also possible to specify 'ALL' to search in all namespaces.
    searchNamespace: null
    # -- Method to use to detect ConfigMap changes. With WATCH the sidecar will do a WATCH request, with SLEEP it will list all ConfigMaps, then sleep for 60 seconds.
    watchMethod: WATCH
    # -- Search in configmap, secret, or both.
    resource: both
    # -- Absolute path to the shell script to execute after a configmap or secret has been reloaded.
    script: null
    # -- WatchServerTimeout: request to the server, asking it to cleanly close the connection after that.
    # defaults to 60sec; much higher values like 3600 seconds (1h) are feasible for non-Azure K8S.
    watchServerTimeout: 60
    #
    # -- WatchClientTimeout: is a client-side timeout, configuring your local socket.
    # If you have a network outage dropping all packets with no RST/FIN,
    # this is how long your client waits before realizing & dropping the connection.
    # Defaults to 66sec.
    watchClientTimeout: 60
    # -- Log level of the sidecar container.
    logLevel: INFO

@win5923
Contributor

win5923 commented Apr 2, 2024

Replace kube-dns with coredns:

global:
   dnsService: "coredns"
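
For context, the chart exposes this setting alongside two related DNS settings under `global`. A minimal sketch, assuming the chart's usual defaults for the namespace and cluster domain; the right service name depends on the cluster, so it is worth checking what the DNS Service in `kube-system` is actually called before changing it:

```yaml
global:
  # Name of the cluster DNS Service (assumption: "coredns" on this cluster;
  # many clusters still name the Service "kube-dns").
  dnsService: "coredns"
  # Namespace the DNS Service runs in (assumed default).
  dnsNamespace: "kube-system"
  # Cluster domain used to build the resolver address (assumed default).
  clusterDomain: "cluster.local"
```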

@ritesh-makerble
Author

Same issue, @win5923. Can you share the values.yml file you used?

@win5923
Contributor

win5923 commented Apr 2, 2024

@ritesh-makerble Sure. I've replaced sensitive values with ***.
By the way, I can also deploy it using the default values.yaml.

# Ref: https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml
# -- Overrides the chart's computed fullname
fullnameOverride: loki

loki:
  # -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
  rulerConfig:
    enable_api: true
    alertmanager_url: http://kube-prometheus-stack-alertmanager.monitor:9093/
    storage:
      type: azure
      azure:
        # Your Azure storage account name
        account_name: essentialsblob  
        # For the account-key, see docs: https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal
        account_key: ***
        # See https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#containers
        container_name: loki-alert
        request_timeout: 0
    # Remote-write configuration to send rule samples to a Prometheus remote-write
    # endpoint.
    # remote_write:
    #   enabled: true
    #   client:
    #     url: http://kube-prometheus-stack-prometheus.monitor:9090/prometheus/api/v1/write

  # -- Limits config
  limits_config:
    retention_period: 4380h
  # --  Optional compactor configuration
  compactor:
    retention_enabled: true
  # -- Check https://grafana.com/docs/loki/latest/configure/#common_config for more info on how to provide a common configuration
  commonConfig:
    replication_factor: 1
  # -- Storage config. Providing this will automatically populate all necessary storage configs in the templated config.
  storage:
    bucketNames:
      chunks: chunks
      ruler: ruler
      admin: admin
    type: azure
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
  # New tsdb-shipper configuration
  # Ref: https://grafana.com/docs/loki/latest/operations/storage/tsdb/
  storage_config:
    tsdb_shipper:
      active_index_directory: /var/loki/tsdb-index
      cache_location: /var/loki/tsdb-cache
      shared_store: azure
    azure:
      # Your Azure storage account name
      account_name: essentialsblob  
      # For the account-key, see docs: https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal
      account_key: ***
      # See https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#containers
      container_name: loki-storage
      request_timeout: 0

  # -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
  schemaConfig: 
    configs:
        # New TSDB schema below
      - from: "2023-01-05" # <---- Start date for this schema period
        index:
          period: 24h
          prefix: index_
        object_store: azure
        schema: v12
        store: tsdb
  
  # -- Additional query scheduler config
  query_scheduler:
    max_outstanding_requests_per_tenant: 32768
  querier:
    max_concurrent: 16

# Monitoring section determines which monitoring features to enable
monitoring:
  # Dashboards for monitoring Loki
  dashboards:
    # -- If enabled, create configmap with dashboards for monitoring Loki
    enabled: false
    # -- Alternative namespace to create dashboards ConfigMap in
    namespace: monitor
  # Recording rules for monitoring Loki, required for some dashboards
  rules:
    # -- If enabled, create PrometheusRule resource with Loki recording rules
    enabled: true
    # -- Include alerting rules
    alerting: true
    # -- Specify which individual alerts should be disabled
    # -- Instead of turning off each alert one by one, set the .monitoring.rules.alerting value to false instead.
    # -- If you disable all the alerts and keep .monitoring.rules.alerting set to true, the chart will fail to render.
    disabled: {}
    #  LokiRequestErrors: true
    #  LokiRequestPanics: true
    # -- Alternative namespace to create PrometheusRule resources in
    namespace: monitor
  # Self monitoring determines whether Loki should scrape its own logs.
  # This feature currently relies on the Grafana Agent Operator being installed,
  # which is installed by default using the grafana-agent-operator sub-chart.
  # It will create custom resources for GrafanaAgent, LogsInstance, and PodLogs to configure
  # scrape configs to scrape its own logs with the labels expected by the included dashboards.
  selfMonitoring:
    enabled: false
    tenant:
      # -- Name of the tenant
      name: "promtail"
    grafanaAgent:
      # -- Controls whether to install the Grafana Agent Operator and its CRDs.
      # Note that helm will not install CRDs if this flag is enabled during an upgrade.
      # In that case install the CRDs manually from https://github.com/grafana/agent/tree/main/production/operator/crds
      installOperator: false
  # The Loki canary pushes logs to and queries from this loki installation to test
  # that it's working correctly
  lokiCanary:
    enabled: true
    # Basic auth configuration
    basicAuth:
      # -- The basic auth username for the gateway
      username: ***
      # -- The basic auth password for the gateway
      password: ***

# Configuration for the write pod(s)
write:
  # -- Number of replicas for the write
  replicas: 1
  autoscaling:
    # -- Enable autoscaling for the write.
    enabled: false
  persistence:
    # -- Enable volume claims in pod spec
    volumeClaimsEnabled: false

# Configuration for the read pod(s)
read:
  # -- Number of replicas for the read
  replicas: 1
  autoscaling:
    # -- Enable autoscaling for the read, this is only used if `queryIndex.enabled: true`
    enabled: false

# Configuration for the gateway
gateway:
  # -- Specifies whether the gateway should be enabled
  enabled: true
  # -- Number of replicas for the gateway
  replicas: 1
  # Basic auth configuration
  basicAuth:
    # -- Enables basic authentication for the gateway
    enabled: true
    # -- The basic auth username for the gateway
    username: ***
    # -- The basic auth password for the gateway
    password: ***

# Configuration for the backend pod(s)
backend:
  # -- Number of replicas for the backend
  replicas: 1
  autoscaling:
    # -- Enable autoscaling for the backend.
    enabled: false
  persistence:
    # -- Enable volume claims in pod spec
    volumeClaimsEnabled: false

# If you set the singleBinary.replicas value to 1, this chart configures Loki to run the all target in a monolithic mode
# for filesystem storage
# Configuration for the single binary node(s)
singleBinary:
  # -- Number of replicas for the single binary
  replicas: 0

# -- Section for configuring optional Helm test
test:
  enabled: false
  # -- Address of the prometheus server to query for the test
  prometheusAddress: "http://kube-prometheus-stack-prometheus.monitor:9090/prometheus"
  # -- How long to keep retrying the test before failing
  timeout: 1m

# Configuration for the table-manager
# Ref: https://grafana.com/docs/loki/latest/operations/storage/table-manager/
tableManager:
  # -- Specifies whether the table-manager should be enabled
  enabled: false

@win5923
Contributor

win5923 commented Apr 2, 2024

Perhaps you can set ${MY_POD_IP} as shown here; see #6370 (comment):

env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

extraArgs:
  config.expand-env: true

config:
  memberlist:
    bind_addr:
      - ${MY_POD_IP}
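
The snippet above follows the layout from the linked comment. For the grafana/loki chart used in this issue, the same idea might be expressed roughly as below. This is a sketch, not a verified configuration: it assumes the chart version in use supports `loki.extraMemberlistConfig` and per-component `extraArgs`/`extraEnv`, and that the `read` and `backend` sections take the same settings as `write`.

```yaml
loki:
  # Assumption: merged into the generated memberlist config by the chart.
  extraMemberlistConfig:
    bind_addr:
      - ${MY_POD_IP}

write:
  extraArgs:
    # Lets Loki expand ${MY_POD_IP} in its config file at startup.
    - -config.expand-env=true
  extraEnv:
    # Expose the pod's own IP to the container via the downward API.
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP

# Assumption: repeat the same extraArgs and extraEnv under `read:` and
# `backend:` so every memberlist participant binds to its pod IP.
```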
