Introduce cluster instance identity other than IP addresses #1835

Closed
mxinden opened this issue Apr 12, 2019 · 2 comments

Comments

@mxinden
Member

mxinden commented Apr 12, 2019

Scenario: Two Alertmanager clusters (A, B) are running in the same Kubernetes cluster.

If Alertmanager cluster A is scaled down by one instance with IP address X, and shortly afterwards Alertmanager cluster B is scaled up by one instance that receives the recycled IP address X, the two clusters will merge into one big cluster.

We are hitting this problem in the Prometheus Operator end-to-end test suite (prometheus-operator/prometheus-operator#2544) on a single node (small CIDR space) Kubernetes cluster.

How likely this is in production systems is unclear. Most setups probably include only a single Alertmanager cluster. In addition, CIDR ranges might be a lot bigger, and IP address recycling might not happen as frequently.

This could be prevented with a unique identifier per Alertmanager cluster, so that instances with a different identifier are not allowed to join. In addition, #1819, which introduces mutual TLS, could stop accidental cluster merges if trust chains are scoped per cluster.
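
A minimal sketch of the identifier check described above, assuming each instance advertises a cluster ID when it tries to join. All names here (`Peer`, `Cluster`, `ClusterID`, `Join`, `ErrClusterMismatch`) are hypothetical and are not part of Alertmanager's API; a real implementation would have to hook into the gossip layer's join/merge handling.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrClusterMismatch is a hypothetical sentinel error returned when a peer
// advertises a cluster ID different from the local one.
var ErrClusterMismatch = errors.New("cluster ID mismatch")

// Peer is a hypothetical description of an instance that wants to join.
type Peer struct {
	Addr      string // IP:port; may be a recycled address
	ClusterID string // identifier shared by all instances of one cluster
}

// Cluster tracks the admitted members of one logical cluster.
type Cluster struct {
	ID      string
	members map[string]Peer
}

// NewCluster creates an empty cluster with the given identifier.
func NewCluster(id string) *Cluster {
	return &Cluster{ID: id, members: make(map[string]Peer)}
}

// Join admits p only if its cluster ID matches the local one, so a peer
// that merely reuses a known IP address cannot merge two clusters.
func (c *Cluster) Join(p Peer) error {
	if p.ClusterID != c.ID {
		return fmt.Errorf("rejecting peer %s: %w (local %q, remote %q)",
			p.Addr, ErrClusterMismatch, c.ID, p.ClusterID)
	}
	c.members[p.Addr] = p
	return nil
}

// Size returns the number of admitted members.
func (c *Cluster) Size() int { return len(c.members) }
```

With this check, an instance of cluster B that comes up on the recycled address X would be rejected by cluster A, because the decision no longer depends on the IP address alone.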

The purpose of this issue is to document the failure for the future and to give anyone hitting the same problem a central place to discuss how to proceed.

@brancz
Member

brancz commented Apr 12, 2019

While a TLS chain of trust could accidentally solve this, I don’t think it is the appropriate solution. As you proposed, a separate mechanism sounds reasonable.

As for the probability, it is actually not all that low. I recall Kubernetes IP recycling having caused various problems across the board.

@simonpasquier
Member

Closed by #3354

3 participants