
Separate Loki Component? #643

Closed
mizeng opened this issue Jun 3, 2019 · 11 comments

@mizeng
Contributor

mizeng commented Jun 3, 2019

https://github.com/grafana/loki/blob/master/docs/operations.md#scalability mentions that the ingester, distributor, and querier can run in different Loki processes with their respective roles. (BTW, I am not familiar with libsonnet; could anyone show three different Loki config examples, one for each role?)

However, for the sake of performance, I was wondering if the ingester, distributor, and querier can run on different nodes (e.g. separate VMs or pods)?
The reason is that if they sit on the same node, their memory/CPU usage impacts each other (I cannot find any CPU/memory isolation between the roles). I found that when I ran a query, it used up all the memory on the node, the whole Loki process was restarted, and the ingester broke.


@mizeng
Contributor Author

mizeng commented Jun 4, 2019

@daixiang0 could you help take a look? I tried starting multiple processes locally with different roles, but it did not work.

@mizeng
Contributor Author

mizeng commented Jun 4, 2019

I changed some code to start the table-manager, distributor, and ingester in one process, listening on HTTP port 3100 and gRPC port 9095, and the querier in another process, listening on HTTP port 3101 and gRPC port 9096.

Then the problem comes: the querier cannot find the ingester in the ring, so it cannot return query results. If I let the querier listen on gRPC port 9095, it fails with error initialising module: server: listen tcp :9095: bind: address already in use.

So how can I achieve "ingester, distributor, and querier running in different Loki processes with their respective roles"?
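For reference, the port clash can be avoided by giving each process its own server block. A minimal sketch, assuming the standard Loki server options (the file names are only for illustration):

# ingester-config.yaml (process started with -target=ingester)
server:
  http_listen_port: 3100
  grpc_listen_port: 9095

# querier-config.yaml (process started with -target=querier)
server:
  http_listen_port: 3101
  grpc_listen_port: 9096

With distinct ports per process there is no bind conflict; the remaining piece is a shared ring store so the querier can still discover the ingesters, which the answers below cover.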

@sh0rez
Member

sh0rez commented Jun 4, 2019

The mentioned production setup consists of the following:

Main components:

distributor.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: distributor
spec:
  minReadySeconds: 10
  replicas: 3
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: distributor
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: distributor
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=distributor
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: distributor
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        resources:
          limits:
            cpu: "1"
            memory: 200Mi
          requests:
            cpu: 500m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki

ingester.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: ingester
spec:
  minReadySeconds: 60
  replicas: 3
  revisionHistoryLimit: 10
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: ingester
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: ingester
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=ingester
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: ingester
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 15
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "2"
            memory: 10Gi
          requests:
            cpu: "1"
            memory: 5Gi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      terminationGracePeriodSeconds: 4800
      volumes:
      - configMap:
          name: loki
        name: loki

querier.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: querier
spec:
  minReadySeconds: 10
  replicas: 3
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: querier
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: querier
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=querier
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: querier
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki

table-manager.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: table-manager
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: table-manager
    spec:
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=table-manager
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: table-manager
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki

All of these share the same config:

config.yml

---
apiVersion: v1
data:
  config.yaml: |
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached.loki.svc.cluster.local
          service: memcached-client
      max_look_back_period: 0
      write_dedupe_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached-index-writes.loki.svc.cluster.local
          service: memcached-client
    ingester:
      chunk_block_size: 262144
      chunk_idle_period: 15m
      lifecycler:
        claim_on_rollout: false
        heartbeat_period: 5s
        interface_names:
        - eth0
        join_after: 10s
        num_tokens: 512
        ring:
          heartbeat_timeout: 1m
          kvstore:
            consul:
              consistentreads: true
              host: consul.loki.svc.cluster.local:8500
              httpclienttimeout: 20s
              prefix: ""
            store: consul
          replication_factor: 3
    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
    schema_config:
      configs:
      - from: 2018-04-15
        index:
          period: 168h
          prefix: loki_index_
        object_store: gcs
        schema: v9
        store: bigtable
    server:
      graceful_shutdown_timeout: 5s
      grpc_server_max_recv_msg_size: 67108864
      http_server_idle_timeout: 120s
    storage_config:
      bigtable:
        instance: ""
        project: ""
      gcs:
        bucket_name: ""
      index_queries_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached-index-queries.loki.svc.cluster.local
          service: memcached-client
    table_manager:
      chunk_tables_provisioning:
        inactive_read_throughput: 0
        inactive_write_throughput: 0
        provisioned_read_throughput: 0
        provisioned_write_throughput: 0
      index_tables_provisioning:
        inactive_read_throughput: 0
        inactive_write_throughput: 0
        provisioned_read_throughput: 0
        provisioned_write_throughput: 0
      retention_deletes_enabled: false
      retention_period: 0
kind: ConfigMap
metadata:
  name: loki

These individual components run in separate Docker containers (actually Kubernetes pods) and have resource limits in place, to prevent a single service from impacting the others.

Furthermore, a gateway (nginx) and memcached are running in front of it.
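For context, such a gateway can be a simple nginx reverse proxy in front of the components. A rough sketch, not the actual gateway config from this setup; the Service DNS names and the pre-1.0 /api/prom/* paths are assumptions:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gateway-nginx
data:
  nginx.conf: |
    worker_processes 1;
    events { worker_connections 1024; }
    http {
      server {
        listen 80;
        # Writes (e.g. from promtail) go to the distributors.
        location = /api/prom/push {
          proxy_pass http://distributor.loki.svc.cluster.local:80;
        }
        # Queries and label lookups go to the queriers.
        location / {
          proxy_pass http://querier.loki.svc.cluster.local:80;
        }
      }
    }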

I hope this helps, maybe take a look at the Kubernetes manifests in the <details> above.

@mizeng
Contributor Author

mizeng commented Jun 4, 2019

@sh0rez Thanks a lot for the reply!
I checked config.yml and it seems there is no "server" section. So I assume the individual components above use the default HTTP/gRPC ports, right?
Then how do they communicate with each other through the ring?

In my trial on a local machine, the querier (in one process, gRPC port 9096) cannot find the ingester (in another process, gRPC port 9095) to get logs. Am I missing something?

@sh0rez
Member

sh0rez commented Jun 4, 2019

> Am I missing something?

No, sorry, I did not provide the full manifests because I thought they were unnecessary. As this is deployed on Kubernetes, all components run on their default ports (see the pod specs) and they all have a matching Service. In config.yml they are configured to talk to each other using these Services.
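For illustration, the matching Service for the distributor would look roughly like this (a sketch derived from the Deployment above, not the exact generated manifest):

---
apiVersion: v1
kind: Service
metadata:
  name: distributor
spec:
  # Matches the pod label set by the distributor Deployment above.
  selector:
    name: distributor
  ports:
  - name: http-metrics
    port: 80
    targetPort: 80
  - name: grpc
    port: 9095
    targetPort: 9095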

For the ring, however, Hashicorp Consul is used, which is deployed to the cluster as well.
These behaviors are not Kubernetes-specific; you could also implement this e.g. using Docker networks and named containers, or multiple VMs with hostnames.

At the moment, you probably need to use Consul for the ring when running in distributed mode. (Refer to the Cortex docs, from which this functionality of Loki is taken. Maybe @tomwilkie can tell more about this?)
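For a local, non-Kubernetes trial, a minimal sketch of the ring section that every process would share might look like this (assuming a Consul agent reachable on localhost:8500 and a single ingester, hence replication_factor: 1):

ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
        consul:
          # All processes (distributor, ingester, querier) point at the same
          # Consul, so the querier can find the ingesters through the ring.
          host: localhost:8500
      replication_factor: 1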

Does that help?

@mizeng
Contributor Author

mizeng commented Jun 5, 2019

That definitely helps a lot, thanks! I will read the docs you provided and come back if I still have questions.

@mizeng
Contributor Author

mizeng commented Jun 5, 2019

@sh0rez BTW, I cannot find "consul/consul.libsonnet" or "ksonnet-util/kausal.libsonnet" in Loki.

@sh0rez
Member

sh0rez commented Jun 5, 2019

Hi, according to the Jsonnetfile, these are external dependencies, located in grafana/jsonnet-libs.

"name": "ksonnet-util",
"source": {
"git": {
"remote": "https://github.com/grafana/jsonnet-libs",
"subdir": "ksonnet-util"
}
},
"version": "master"
},
{
"name": "consul",
"source": {
"git": {
"remote": "https://github.com/grafana/jsonnet-libs",
"subdir": "consul"
}

  • consul/consul.libsonnet provides manifests to install consul (worth a look)
  • ksonnet-util/kausal.libsonnet on the other hand is a helper to create Kubernetes objects using the mixin-style of jsonnet.

@mizeng
Contributor Author

mizeng commented Jun 5, 2019

@sh0rez much appreciated for your help! To save other users' time, I would like to write a new doc focused on how to separate these components, aimed at beginners who haven't read any Cortex code and don't know anything about jsonnet.

@yang3808282

(Quotes @sh0rez's production setup above: distributor.yml, ingester.yml, querier.yml, table-manager.yml, and the shared config.yml.)

[image attached]
Is this right or wrong?

@MuhammadNaeemAkhtar

(Quotes @sh0rez's summary of the production setup above.)

Hi @sh0rez,
In this case, what should the Grafana datasource and log ingestion point to? I mean, which pod/Service should be used for log ingestion and which as the Grafana datasource?
Thank you!
