Helm charts enhancement #147

guidoiaquinti · 2021-10-06T12:47:55Z

👋 Hi! I’m going to list here a few random ideas on how we could improve our Helm charts, divided by topic:

📈 Scaling

We should support vertical and horizontal scaling of all our dependencies: Kafka, ClickHouse and PostgreSQL.
- vertical service scaling: this is usually the first mitigation in case of resource contention. It usually involves adding more CPU/memory/storage to a pod.
- horizontal service scaling: this is usually an operation that can take some time (depending on the dataset) and usually requires dataset partitioning/sharding and a cluster rebalance operation.
Related to ☝️, we should make sure we mount each service's data dir on top of resizable storage.
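One way to check the resizable-storage point is to look at whether the StorageClass behind each data volume sets `allowVolumeExpansion`, the Kubernetes field that permits growing a PersistentVolumeClaim in place. The sketch below is only an illustration: the function name and the sample manifests are hypothetical, and it works on already-parsed manifests (for example the output of `kubectl get ... -o json`) rather than talking to a cluster.

```python
def non_expandable_claims(storage_classes, claims):
    """Return the names of PVCs whose StorageClass does not allow expansion.

    Both arguments are lists of parsed Kubernetes manifests (plain dicts).
    PVCs without an explicit storageClassName rely on the cluster default
    and are flagged conservatively, since the default is unknown here.
    """
    expandable = {
        sc["metadata"]["name"]
        for sc in storage_classes
        if sc.get("allowVolumeExpansion", False)
    }
    return [
        pvc["metadata"]["name"]
        for pvc in claims
        if pvc["spec"].get("storageClassName") not in expandable
    ]


# Hypothetical manifests, made up for illustration only.
storage_classes = [
    {"metadata": {"name": "fast"}, "allowVolumeExpansion": True},
    {"metadata": {"name": "legacy"}},  # field absent: expansion not allowed
]
claims = [
    {"metadata": {"name": "data-kafka-0"}, "spec": {"storageClassName": "legacy"}},
    {"metadata": {"name": "data-clickhouse-0"}, "spec": {"storageClassName": "fast"}},
]

print(non_expandable_claims(storage_classes, claims))
```

Any claim this reports would need to be moved to an expandable StorageClass before vertical storage scaling can be done without recreating the volume.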

🚨 Monitoring & Alerting

As part of the Helm charts, we should ship a basic monitoring/alerting stack. I know we have some debugging information already built into PostHog and we could probably extend that, but I don’t think it will cover most of the cases we might need (e.g. how can we troubleshoot a problem when a PostHog installation is down?)
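The "installation is down" case is the one built-in debugging cannot cover, because the check has to run somewhere other than the thing being checked. A minimal sketch of such an external probe, using only the Python standard library, could look like this. The URL is a placeholder and not a confirmed PostHog endpoint, and the `opener` parameter is just a hypothetical hook so the function can be exercised without a network.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx status within `timeout` seconds.

    Connection errors, timeouts and HTTP error statuses all count as "down".
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError) for 4xx/5xx responses and
        # OSError-derived exceptions for refused connections and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the health URL of a real installation.
    print(is_up("https://posthog.example.com/_health"))
```

Run from outside the cluster (a cron job or a separate monitoring host), a check like this keeps reporting even when everything shipped inside the chart is unavailable.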

📑 Documentation

We should document all the maintenance operations & alerts in a runbook.

Please share your ideas and I'll add them to this post. Thank you!

macobo · 2021-10-06T12:50:18Z

Scaling & Documentation

Related issue: #129

As part of the helm charts, we should ship a basic monitoring/alerting stack.

This is kind of done (but undocumented) w/ the prometheus setup.

guidoiaquinti · 2021-10-06T12:51:37Z

This is kind of done (but undocumented) w/ the prometheus setup.

Do we ship basic alerts and related runbooks as well? This could have caught a few of the issues I've seen in the last few days.

macobo · 2021-10-06T12:54:39Z

Here's what we ship by default around alerting: https://github.com/PostHog/charts-clickhouse/blob/main/charts/posthog/values.yaml#L592-L671

Runbooks I think would live alongside our documentation in the handbook w/ troubleshooting sections.

tiina303 · 2021-10-06T14:50:48Z

Priorities, in my view atm:

high:

vertical scaling for Kafka, ... (ticket: Resizing Kafka on all platforms #146). I would say this is high priority, as we keep seeing problems/questions, we have no docs, and the solution of nuking data isn't great.

mid:

alerting on PostHog not working & making troubleshooting easier for users (there's a bunch of random questions in user Slack that are around k8s basics & stuff like: plugin server got into a bad state, try a restart, ok that worked, great)
docs around maintenance operations/alerts

low:

horizontal scaling: we have HPA for most things, people use it, and I haven't had many user questions about it. I just see them update the charts with improvements to it, so it seems to be working pretty well.

guidoiaquinti · 2022-10-11T15:26:05Z

In the last year we have implemented the majority of the improvements above. I'm going to close this issue as we are now tracking the remaining tasks individually.

guidoiaquinti added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels Oct 6, 2021

fuziontech added the helm (Helm chart work) label Jan 19, 2022

guidoiaquinti closed this as completed Oct 11, 2022
