Initial draft work-in-progress bring your own cluster docs #57

Merged
merged 20 commits into from
Aug 30, 2023

johnbelamaric
Member

/hold

Work in progress

@henderiw
Contributor

Will this be purely the mgmt cluster, or also the workload cluster?
If it also covers workload clusters, the networking needs to be sorted out, no?

For google multi-networking:

https://cloud.google.com/kubernetes-engine/docs/concepts/about-multinetwork-support-for-pods

  • The multi-networking support on Google described above fits very nicely with what we did, so we should just add a Google provider that provisions the networks and VPCs. The provider on the network allows this.

I am doing something similar, but in a way that works with any cloud or even on premises. I just need a daemon with host access to create the additional interfaces. I can also create interfaces dynamically, independent of the pod lifecycle.

@johnbelamaric
Member Author

For now, this is mgmt cluster. For workload clusters, it will require KCC running and a derivative of the workload cluster package. We can provision vanilla GKE clusters in that way but to do the full R1 example network on GKE, we will need to use the GKE multi-networking functionality.

Unfortunately, we can't quite do full provisioning of GKE clusters with those options yet: the feature is only in public preview, so the KCC resources for GKE clusters and node pools do not support it. If GKE multi-networking goes GA and KCC support is then added, we can do the full cluster provisioning, including the multi-network, via Nephio. We can still do the workload provisioning though, integrating with the GKE multi-network API instead of the Multus API.

For GA network resources like VPCs, we already have KCC resources and so can just create packages for those.

cc @mskrocki @diviner524

@henderiw
Contributor

The provider I am talking about would indeed interact with KCC. So what it does is select the ip addresses to ensure we don't overlap between clusters, but we would use KCC resources.

One q is do we have the ability for VLANs with the multi-networking once we go GA?

@johnbelamaric
Member Author

> One q is do we have the ability for VLANs with the multi-networking once we go GA?

I don't think so, I don't see anything in the docs about VLAN support. Do the workloads need to be VLAN-aware? Isn't simply attaching the right interfaces to the Pods enough, regardless of the underlying technology to keep the traffic sorted?

Limitations: https://cloud.google.com/kubernetes-engine/docs/how-to/setup-multinetwork-support-for-pods#general-limitations

@henderiw
Contributor

No, we don't need VLANs per se; I was just wondering. They are used in enterprise scenarios, but this is always a chicken-and-egg discussion about multi-tenant vs. non-multi-tenant UPF. What we see a lot is that the current solutions are not optimal for single tenant, as the resource consumption is too high.

Member

@adetalhouet left a comment

I still need to review from the "Nephio Stock Repositories" section onward.

install-guide/byoc.md (outdated, resolved)
```bash
kpt live init porch-dev
kpt live apply porch-dev --reconcile-timeout=15m --output=table
```

Member

@adetalhouet Aug 22, 2023

I believe you need to deploy the resource backend first:

 kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/resource-backend@v1.0.1
 kpt fn render resource-backend/
 kpt live init resource-backend/
 kpt live apply resource-backend/ --reconcile-timeout 15m --output=table

Without it, the nephio controller fails with:

2023-08-22T23:04:39.604Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: ipam.resource.nephio.org/v1alpha1: the server could not find the requested resource"}
2023-08-22T23:04:39.605Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: vlan.resource.nephio.org/v1alpha1: the server could not find the requested resource"}
2023-08-22T23:04:39.606Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: inv.nephio.org/v1alpha1: the server could not find the requested resource"}
2023-08-22T23:04:39.616Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: inv.nephio.org/v1alpha1: the server could not find the requested resource"}

I believe Gitea should be deployed before the nephio-controllers as well:

 kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/gitea@v1.0.1
 kpt fn render gitea/
 kpt live init gitea/
 kpt live apply gitea/ --reconcile-timeout 15m --output=table

This PR addresses most of the gitea issues: nephio-project/nephio-example-packages#77
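
The ordering suggested in this comment (resource-backend and Gitea first, then nephio-controllers) could be scripted roughly as follows. This is only a sketch: `install_pkg`, `install_all`, `PKG_REPO`, and `PKG_VERSION` are illustrative names that do not appear in the PR, and it assumes `kpt` is on the `PATH` and the current kubeconfig context points at the management cluster.

```bash
# Sketch only: install the R1 packages in dependency order.
# PKG_REPO, PKG_VERSION, install_pkg and install_all are illustrative names.
PKG_REPO="https://github.com/nephio-project/nephio-example-packages.git"
PKG_VERSION="v1.0.1"

# Fetch, render, and apply a single package; stop at the first failing step.
install_pkg() {
  local pkg="$1"
  kpt pkg get --for-deployment "${PKG_REPO}/${pkg}@${PKG_VERSION}" &&
    kpt fn render "${pkg}/" &&
    kpt live init "${pkg}/" &&
    kpt live apply "${pkg}/" --reconcile-timeout 15m --output=table
}

# Dependencies first, controllers last.
install_all() {
  local pkg
  for pkg in resource-backend gitea nephio-controllers; do
    install_pkg "${pkg}" || return 1
  done
}
```

Calling `install_all` from the working directory would run the same four `kpt` steps per package that are listed above, aborting as soon as one step fails so that the controllers are never applied without their dependencies.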

Member Author

Ok, yeah, in R2 we should try to eliminate those dependencies. We shouldn't need the resource-backend or gitea.

Member

Understood, but for now, the BYOC workflow should bring in the dependencies before installing nephio-controller.
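
One way to guard against the failure mode in the log above is a pre-flight check that the API groups are actually served before the controllers are installed. The sketch below is an assumption about how that could look, not something the PR implements: `check_api_groups` and `REQUIRED_API_GROUPS` are illustrative names, and the group list is simply copied from the error messages.

```bash
# Sketch only: verify the API groups named in the controller error log are served.
# REQUIRED_API_GROUPS and check_api_groups are illustrative names.
REQUIRED_API_GROUPS="ipam.resource.nephio.org vlan.resource.nephio.org inv.nephio.org"

# Report each missing API group on stderr; return non-zero if any is missing.
check_api_groups() {
  local group
  local missing=0
  for group in ${REQUIRED_API_GROUPS}; do
    # `kubectl api-resources --api-group=G -o name` prints nothing if G is not served.
    if [ -z "$(kubectl api-resources --api-group="${group}" -o name 2>/dev/null)" ]; then
      echo "missing API group: ${group}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A BYOC script could then run `check_api_groups || exit 1` immediately before applying the nephio-controllers package.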

Member Author

I think if we turn off the repo provisioning reconciler, we may not need to install Gitea. But for now we can just require it, and refine in R2.

kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/nephio-controllers@v1.0.1
kpt fn render nephio-controllers
kpt live init nephio-controllers
kpt live apply nephio-controllers --reconcile-timeout=15m --output=table
Member

For some reason I'm getting OOMKilled on the nephio-controller; I had to bump the memory limit to 256Mi.
Once that was fixed, I encountered the following error, hence the comment above about Gitea:

2023-08-22T23:16:05.836Z ERROR Cannot get secret, please follow README and create the gitea secret {"error": "secrets \"git-user-secret\" not found"}

Member Author

Yeah, I didn't really check in detail. I wonder if the OOM kill is due to some leak in the error handling code.

Member

The OOM kill happened once I was past the errors; i.e., it was OOM-killed with 124Mi while running normally.
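
For anyone hitting the same OOM kill, the limit bump mentioned in this thread could be applied to a running cluster along these lines. This is a sketch under stated assumptions: `bump_memory_limit` is an illustrative helper, and the namespace and Deployment name must be supplied by the caller because the thread does not state them.

```bash
# Sketch only: raise the memory limit on all containers of a Deployment.
# bump_memory_limit is an illustrative helper; namespace and name are caller-supplied.
bump_memory_limit() {
  local namespace="$1"
  local deployment="$2"
  local limit="${3:-256Mi}"   # 256Mi is the value reported to work in this thread
  kubectl set resources deployment "${deployment}" \
    --namespace "${namespace}" \
    --limits="memory=${limit}"
}
```

For example, `bump_memory_limit <namespace> <deployment>` with the names from your own install. Because this patches the live object, a later `kpt live apply` of the package may overwrite it, so a lasting fix probably belongs in the package itself.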

Comment on lines 104 to 127
Before we apply it to the cluster, however, we should configure it.

By default, it expects the webui to be reached via `http://localhost:7007`. If
you plan to expose the webui via a load balancer service instead, then you need
to configure the scheme, hostname, port, and service. Note that if you wish to
use HTTPS, you should set the `scheme` to `https`, but you will need to
terminate the TLS at the load balancer as the container currently only supports
HTTP.

This information is captured in the application ConfigMap for the webui, which
is generated by a KRM function. We can change the values in
`nephio-webui/gen-configmap.yaml` just using a text editor (change the
`hostname` and `port` values under `params:`), and those will take effect later
when we run `kpt fn render`. As an alternative to a text editor, you can run
these commands:

```bash
kpt fn eval nephio-webui --image gcr.io/kpt-fn/search-replace:v0.2.0 --match-kind GenConfigMap -- 'by-path=params.scheme' 'put-value=SCHEME'
kpt fn eval nephio-webui --image gcr.io/kpt-fn/search-replace:v0.2.0 --match-kind GenConfigMap -- 'by-path=params.hostname' 'put-value=HOSTNAME'
kpt fn eval nephio-webui --image gcr.io/kpt-fn/search-replace:v0.2.0 --match-kind GenConfigMap -- 'by-path=params.port' 'put-value=PORT'
```
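
The three commands above differ only in the parameter name and value, so they can be wrapped in a small helper. This is a sketch, not part of the PR: `set_webui_param` and `set_webui_url` are illustrative names, and the hostname in the usage note is a placeholder.

```bash
# Sketch only: set one GenConfigMap parameter using the same search-replace
# invocation as the commands above. set_webui_param is an illustrative name.
set_webui_param() {
  kpt fn eval nephio-webui --image gcr.io/kpt-fn/search-replace:v0.2.0 \
    --match-kind GenConfigMap -- "by-path=params.$1" "put-value=$2"
}

# Set scheme, hostname, and port in order; stop at the first failure.
set_webui_url() {
  set_webui_param scheme "$1" &&
    set_webui_param hostname "$2" &&
    set_webui_param port "$3"
}
```

For example, `set_webui_url https nephio.example.com 443` would fill in all three values before `kpt fn render` is run.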

If you want to expose the UI via a load balancer service, you can manually
change the Service `type` to `LoadBalancer`, or run:
Member

FYI - none of that configuration is needed for OpenShift, as the underlying Service will be exposed through a Route.

Member Author

Maybe we should just assume a Gateway or Ingress implementation?

Member

Yes, this would be a fair assumption.

Member Author

By "that" you mean the LB piece, right? Because I think you need the URL piece in order for Backstage to work properly, don't you? I think you probably also need it for OAuth or OIDC to work.

Member

So far, having everything set to localhost is working fine for me. Not sure it is necessary to change the URL when it is fronted with an Ingress. Same for OIDC; based on my testing so far.

Member Author

oo, nice

@johnbelamaric
Member Author

@adetalhouet I am thinking of re-organizing this into a few opinionated examples, instead of trying to allow a separate choice for each component. So, I am thinking of reworking this as:

  • A discussion of the different components and options, and how they fit together, without details on setting each up.
  • A separate page for each opinionated environment: one for GCP, one for OpenShift, etc.

Each opinionated environment would pick specific solutions, and show how to set it up with that. We can even create separate package repositories that create derivative packages based on the R1 originals. For example, for the GCP one, I would swap out KinD CAPI clusters with GKE-based clusters provisioned by KCC (which in fact will have a separate repository, and not use the mgmt repository for GCP infrastructure deployment).

I think that will be a lot simpler. Over time, if needed, we can create slightly different environments (choose GitHub instead of Gitea, for example). But I think if we just have a couple of good examples, such as Nephio-in-a-box (the current R1 demo), GCP, OpenShift, and maybe EKS if someone wants to do it, that will get us a long way. WDYT?

@adetalhouet
Member

@johnbelamaric sounds like a good idea. I'll let you finish this PR; and will submit a follow-up one for OpenShift.

@johnbelamaric
Member Author

I have reorganized this, and marked different pages as work-in-progress. Maybe we should merge it and that way we can collaboratively make edits?

See https://nephio.slack.com/archives/C03QV5ZSHL5/p1693262820452229 as well for where it is heading.

/hold cancel

@adetalhouet
Member

LGTM, thank you

install-guide/byoc.md (outdated, resolved)
Comment on lines 193 to 215
## Gitea Installation

While you may use other Git providers as well, Gitea is required in the R1
setup. To install Gitea, use `kpt`. From your `nephio-install` directory, run:

```bash
kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/gitea@v1.0.1
```

We need to make a few changes. The R1 Gitea package is designed for the sandbox
environment with Metal LB. Let's change that to:
- Use Secrets Manager to manage the Gitea secrets
- Use an internal load balancer for the Gitea git Service so that it is
  accessible in our VPC
- Use Gateway API to expose the Gitea Web UI to the Internet for
  consumption by our workstation


```bash
kpt fn render gitea/
kpt live init gitea/
kpt live apply gitea/ --reconcile-timeout 15m --output=table
```
Member

I think this Gitea process will be common among the different installations, so maybe we can put it in a separate document.

Member Author

That was my original thinking, too, but it won't be, because we need to reconfigure it based on a few different things. In particular the services (and later the secrets). We might be able to make it common enough that the only difference is the upstream package used. But I am not sure yet.

Comment on lines 246 to 251
```
kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/gitea@v1.0.1
kpt fn render gitea/
kpt live init gitea/
kpt live apply gitea/ --reconcile-timeout 15m --output=table
```
Member

These are the instructions for setting up a gitea service.

Member Author

yeah, wip. I will be adding instructions on how to modify the gitea package to work in the opinionated GCP install

johnbelamaric and others added 6 commits August 29, 2023 21:05
Co-authored-by: Victor Morales <chipahuac@hotmail.com>
@henderiw
Contributor

/lgtm

@electrocucaracha
Member

/lgtm

@johnbelamaric
Member Author

/approve

@nephio-prow
Contributor

nephio-prow bot commented Aug 30, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nephio-prow bot added the approved label Aug 30, 2023
@nephio-prow bot merged commit a8e012c into nephio-project:main Aug 30, 2023