
[helm loki/promtail] update of DaemonSet takes a long time with a lot of nodes #1878

Closed
stefanandres opened this issue Apr 1, 2020 · 1 comment · Fixed by #1898
Assignees
Labels
type/enhancement Something existing could be improved

Comments

@stefanandres

Is your feature request related to a problem? Please describe.
When deploying a change to the promtail DaemonSet, the configured default is to roll over only one pod at a time.
This may take about one minute per pod. In a cluster with more than a few nodes, this takes an immense amount of time for simple changes.

Current default:

  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Describe the solution you'd like
Since a slow rollout does not provide any additional availability, we could allow more pods to be rolled over in parallel.

Additional context

I'd suggest changing the maxUnavailable setting in the updateStrategy.

Now we can also discuss a sane default for the Helm chart:

  1. The option should be configurable.
  2. What default should we use here? If we keep 1, updates are slow, but a misconfiguration would break only one logging Pod rather than all of them. If we use something like 100%, updates are fast, but a misconfigured Pod would break log collection everywhere at once.
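As a sketch of what the configurable option could look like (the key names and template path here are assumptions for illustration, not necessarily what the chart ends up using), values.yaml could expose the whole updateStrategy block and the DaemonSet template could render it verbatim:

```yaml
# values.yaml (hypothetical key, mirroring the Kubernetes updateStrategy field)
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 3   # Kubernetes accepts an absolute number or a percentage
---
# templates/daemonset.yaml (sketch): render the user-supplied block as-is
# spec:
#   updateStrategy:
# {{ toYaml .Values.updateStrategy | indent 4 }}
```

Passing the block through toYaml keeps the chart agnostic about which strategy fields the user sets, so future Kubernetes fields need no chart change.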

Let me know what you think and I can whip up a PR ;)

@cyriltovena
Contributor

I think 25% is a sane default. And yes we would love a contribution to improve this.
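With that suggestion, the shipped default would presumably become something like the following (a sketch; Kubernetes allows maxUnavailable on a DaemonSet to be a percentage of the desired pod count rather than an absolute number):

```yaml
updateStrategy:
  rollingUpdate:
    maxUnavailable: 25%
  type: RollingUpdate
```

On a 100-node cluster this would let roughly 25 promtail pods update in parallel, while a misconfigured image or config would still leave about three quarters of nodes shipping logs mid-rollout.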

Thank you :)

@slim-bean slim-bean added the type/enhancement Something existing could be improved label Apr 2, 2020
stefanandres pushed a commit to syseleven/loki that referenced this issue Apr 6, 2020
This commit makes UpdateStrategy of the promtail daemonset configurable.
On large installations, you may want to increase the value of maxUnavailable.

This fixes grafana#1878
cyriltovena pushed a commit that referenced this issue Apr 6, 2020
This commit makes UpdateStrategy of the promtail daemonset configurable.
On large installations, you may want to increase the value of maxUnavailable.

This fixes #1878
torstenwalter pushed a commit to torstenwalter/grafana-helm-charts that referenced this issue Oct 3, 2020
This commit makes UpdateStrategy of the promtail daemonset configurable.
On large installations, you may want to increase the value of maxUnavailable.

This fixes grafana/loki#1878
cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021

* querier.sum-shards
* addresses pr comments
* instruments frontend sharding, splitby
* LabelsSeriesID unexported again
* removes unnecessary codec interface in astmapping
* simplifies VectorSquasher as we never use matrices
* combines queryrange series & value files
* removes noops struct embedding strategy in schema, provides noop impls on all schemas instead
* NewSubtreeFolder no longer can return an error as it inlines the jsonCodec
* account for QueryIngestersWithin renaming
* fixes rebase import collision
* fixes rebase conflicts
* marks absent as non parallelizable
* upstream promql compatibility changes
* addresses pr comments
* import collisions
* linting - fixes goimports -local requirement
* fixes merge conflicts
* addresses pr comments
* stylistic changes
* s/downstream/sharded/
* s/sum_shards/parallelise_shardable_queries/
* query-audit docs
* notes sharded parallelizations are only supported by chunk store

Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
mraboosk pushed a commit to mraboosk/loki that referenced this issue Oct 7, 2024
This commit makes UpdateStrategy of the promtail daemonset configurable.
On large installations, you may want to increase the value of maxUnavailable.

This fixes grafana#1878