Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Add design document for Control Plane Ingress HA #3417

Merged
merged 1 commit into from
Jun 23, 2021

Conversation

TeddyAndrieux
Copy link
Collaborator

@TeddyAndrieux TeddyAndrieux commented Jun 9, 2021

Design for Control Plane Ingress HA


Sees: #2381

@TeddyAndrieux TeddyAndrieux added the kind:design Solution design choices label Jun 9, 2021
@bert-e
Copy link
Contributor

bert-e commented Jun 9, 2021

Hello teddyandrieux,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Copy link
Contributor

bert-e commented Jun 9, 2021

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

Comment on lines 40 to 42
`MetalLB <https://metallb.universe.tf/>`_ so that we can rely on layer2
ARP requests when possible or let MetalLB advertise IPs using BGP when
the user router support it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to support BGP-based address management?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, it's more a question for @thomasdanan
But to me, since it's built-in metallb it's only some field to fill in the metallb config it's easy to "support" it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we need to test it and make it configurable... If most of our customer base and target audience for ARTESCA is OK with the L2 approach, I think we should try to keep it constrained. Once we get a clear use-case to address, we can add it easily using your suggested approach 👌

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, I can remove it for the moment and stay on Layer2 only.
Maybe @xaf-scality you have an opinion on this one as well ?

docs/developer/architecture/control-plane-ingress.rst Outdated Show resolved Hide resolved
~~~~~~~~~~~~~~~~~~~~~

MetalLB is not deployed in every environment so it needs to be configurable
from the Bootstrap configuration file, that’s why we have a new field about
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this done? What's for configuration format? How is this backwards-compatible? How can it be made forward-compatible?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check #3418, basically, I have a metallb key in networks:controlPlane and in this dict I have enabled (which default to false) and an optional config field that get merge with a default MetalLB configuration

# MetalLB disabled by default
networks_data["controlPlane"].setdefault("metalLB", {}).setdefault("enabled", False)
if networks_data["controlPlane"]["metalLB"]["enabled"]:
if not networks_data["controlPlane"].get("ingressIP"):
errors.append(
"'ingressIP' for 'controlPlane' network is mandatory when 'metalLB'"
"is enabled"
)
else:
address_pools = (
networks_data["controlPlane"]["metalLB"]
.setdefault("config", {})
.setdefault("address-pools", [])
)
if not address_pools:
address_pools.append({})
address_pools[0].setdefault("name", "ingress-ip")
address_pools[0].setdefault("protocol", "layer2")
# Enfore address to Ingress IP
address_pools[0]["addresses"] = [
"{}/32".format(networks_data["controlPlane"]["ingressIP"])
]
address_pools[0]["auto-assign"] = False

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as I answered in another comment below, if we do not support BGP we do not need to expose config and we can also hide "metallb" from Bootstrap config and documentation, and just put manageVIP: true/false (or something else) in the bootstrap config

.. code-block:: yaml

address-pools:
- name: ingress-ip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's find a more descriptive name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, any suggestion?
Maybe metalk8s-control-plane-entrypoint-ip ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

control-plane-ingress-ip?

Comment on lines 113 to 162
But, since BGP router configuration can be really different between various
environments, we need to make sure this
`MetalLB configuration <https://metallb.universe.tf/configuration/#bgp-configuration>`_
is exposed, that’s why we also have the ability to override default MetalLB
configuration from the Bootstrap configuration file in order to provide BGP
router information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies we'd support BGP-based environments. Should we, now? Could we, later?

Copy link
Collaborator Author

@TeddyAndrieux TeddyAndrieux Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can say we will do this later, just it's really easy to expose this MetalLB functionally, but maybe we do not want to expose it for the moment and do not talk about it at all

docs/developer/architecture/control-plane-ingress.rst Outdated Show resolved Hide resolved
Comment on lines 152 to 198
- Describe all new Bootstrap configuration fields (with some links
to MetalLB documentation to configure MetalLB to use BGP protocol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we'd directly expose (a subset of) the MetalLB configuration in BootstrapConfiguration? Not sure that's wise. What if we want to switch MetalLB to something else later? If it really required to have access to all of MetalLBs tunables? Is that a feature we provide?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends if we want to support BGP or not, if we only support Layer2 mode then we do not need to expose MetalLB config and just add "something" to enable or not MetalLB (talking about MetaLLB directly in doc/bootstrap config or not)

@bert-e
Copy link
Contributor

bert-e commented Jun 14, 2021

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

The following reviewers are expecting changes from the author, or must review again:

@TeddyAndrieux TeddyAndrieux force-pushed the improvement/design-for-cp-ingress-ha branch from 3aee727 to 648509e Compare June 16, 2021 08:32
Copy link
Contributor

@gdemonet gdemonet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR

Generally OK 👌 with the design.
Simply questioning the deployment strategy change for the Ingress controller, with MetalLB in place (maybe I misunderstood something).
Also suggesting that we relegate BGP support to a later (optional) iteration.

docs/developer/architecture/control-plane-ingress.rst Outdated Show resolved Hide resolved
Comment on lines 40 to 42
`MetalLB <https://metallb.universe.tf/>`_ so that we can rely on layer2
ARP requests when possible or let MetalLB advertise IPs using BGP when
the user router support it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we need to test it and make it configurable... If most of our customer base and target audience for ARTESCA is OK with the L2 approach, I think we should try to keep it constrained. Once we get a clear use-case to address, we can add it easily using your suggested approach 👌

docs/developer/architecture/control-plane-ingress.rst Outdated Show resolved Hide resolved
then we expect the DNS server to switch to another IP of a working Control
Plane node.

This approach was rejected because it’s not a real HA and we expect DNS server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a real HA

Meaning...? Something along the lines of "transparent for clients", which isn't the case for DNS-based failover?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be transparent if the DNS server is active and properly switch from an IP to another in case of failure, but as is it's a "False HA" since we do not have any HA we just move the "problem": "we do not need the IP to be HA but a specific DNS"

Comment on lines +131 to +137
NOTE: Changing this Control Plane Ingress IP means we need to reconfigure
all Kubernetes APIServer since we use this Ingress IP as an OIDC provider.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will change with oauth2-proxy I think, but we'll need to reconfigure the proxy anyway 😇

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely yes, but since we do not have any design for oauth2-proxy yet I do not mention it

.. code-block:: yaml

address-pools:
- name: ingress-ip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

control-plane-ingress-ip?

Comment on lines +176 to +173
deployments, since MetalLB will be the entry point in the Kubernetes cluster
we do not need to use a DaemonSet running on every Control Plane nodes,
instead, we will use a Deployment with 2 replicas.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this cause unnecessary mesh networking? I guess it's not super critical for control plane, but I don't really see the benefit of switching to a Deployment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's less ressources, here we stick with 2 replicas for the full cluster (no matter the number of Control Plane nodes).
To me it's more like, "why should we use a DaemonSet when we can use a Deployment ?"
I think @NicolasT also had other arguments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK OK, I would just vote for lowering the priority of this switch:

  • it's an architecture change, so it has an impact on users willing to operate/monitor our stuff
  • it's not a huge win (an idle Ingress controller doesn't take much resources)

We also need to configure the Service for Ingress Controller differently
depending on if we use MetalLB or not when we use it we want to use a
LoadBalancer service, set the LoadBalancerIP to IngressIP provided by
the user and set externalTrafficPolicy to Local. If we do not use MetalLB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use Local, why don't we set up the Ingress controller pods to run locally to each node? What happens if no replica runs locally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm to me, when you use Local anyway the VIP will sit on one node that has some "pod" of the Service running locally,so it cannot happens

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, I misunderstood this behaviour then.

gdemonet
gdemonet previously approved these changes Jun 23, 2021
Copy link
Contributor

@gdemonet gdemonet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor typos) :shipit:

docs/developer/architecture/control-plane-ingress.rst Outdated Show resolved Hide resolved
docs/developer/architecture/control-plane-ingress.rst Outdated Show resolved Hide resolved
@TeddyAndrieux TeddyAndrieux dismissed NicolasT’s stale review June 23, 2021 11:10

all comment get fixed/answered (document can still be updated later if need to)

@bert-e
Copy link
Contributor

bert-e commented Jun 23, 2021

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

@TeddyAndrieux
Copy link
Collaborator Author

/approve

@bert-e
Copy link
Contributor

bert-e commented Jun 23, 2021

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.10

The following branches will NOT be impacted:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Copy link
Contributor

bert-e commented Jun 23, 2021

I have successfully merged the changeset of this pull request
into targetted development branches:

  • ✔️ development/2.10

The following branches have NOT changed:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

Please check the status of the associated issue None.

Goodbye teddyandrieux.

@bert-e bert-e merged commit e813922 into development/2.10 Jun 23, 2021
@bert-e bert-e deleted the improvement/design-for-cp-ingress-ha branch June 23, 2021 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:design Solution design choices
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Control plane Ingress controller is tied to Bootstrap
4 participants