Initial draft work-in-progress bring your own cluster docs #57
Will this be purely the mgmt cluster, or also the workload cluster? For Google multi-networking: https://cloud.google.com/kubernetes-engine/docs/concepts/about-multinetwork-support-for-pods
I am doing something similar, but in a way that works with any cloud or even on premises. I just need a daemon with host access to create the additional interfaces. I can also create interfaces dynamically, independent of the pod lifecycle.
For now, this is the mgmt cluster. For workload clusters, it will require KCC running and a derivative of the workload cluster package. We can provision vanilla GKE clusters in that way, but to do the full R1 example network on GKE, we will need to use the GKE multi-networking functionality. Unfortunately, we can't quite do full provisioning of GKE clusters with those options yet, because that functionality is only in public preview, so we do not yet have support for it in the KCC resources for GKE clusters and node pools. If GKE multi-networking goes GA and KCC support is then added, we can do the full cluster provisioning, including the multi-network, via Nephio. We can still do the workload provisioning though, integrating with the GKE multi-network API instead of the Multus API. For GA network resources like VPCs, we already have KCC resources and so can just create packages for those.
The provider I am talking about would indeed interact with KCC. What it does is select the IP addresses to ensure we don't overlap between clusters, but we would use KCC resources. One question: will we have the ability to use VLANs with multi-networking once it goes GA?
I don't think so, I don't see anything in the docs about VLAN support. Do the workloads need to be VLAN-aware? Isn't simply attaching the right interfaces to the Pods enough, regardless of the underlying technology to keep the traffic sorted? Limitations: https://cloud.google.com/kubernetes-engine/docs/how-to/setup-multinetwork-support-for-pods#general-limitations
No, we don't need VLANs per se; I was just wondering. They are used in enterprise scenarios, but this is always a chicken-and-egg discussion between multi-tenant and non-multi-tenant UPFs. What we see a lot is that the current solutions are not optimal for single-tenant use because the resource consumption is too high.
I still need to review from Nephio Stock Repositories
install-guide/byoc.md
Outdated
```bash
kpt live init porch-dev
kpt live apply porch-dev --reconcile-timeout=15m --output=table
```
I believe you need to deploy the resource backend first:
kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/resource-backend@v1.0.1
kpt fn render resource-backend/
kpt live init resource-backend/
kpt live apply resource-backend/ --reconcile-timeout 15m --output=table
Without it, the nephio controller fails with:
2023-08-22T23:04:39.604Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: ipam.resource.nephio.org/v1alpha1: the server could not find the requested resource"}
2023-08-22T23:04:39.605Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: vlan.resource.nephio.org/v1alpha1: the server could not find the requested resource"}
2023-08-22T23:04:39.606Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: inv.nephio.org/v1alpha1: the server could not find the requested resource"}
2023-08-22T23:04:39.616Z ERROR controller-runtime.source.EventHandler failed to get informer from cache {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: inv.nephio.org/v1alpha1: the server could not find the requested resource"}
I believe Gitea should be deployed before the nephio-controllers as well:
kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/gitea@v1.0.1
kpt fn render gitea/
kpt live init gitea/
kpt live apply gitea/ --reconcile-timeout 15m --output=table
This PR addresses most of the gitea issues: nephio-project/nephio-example-packages#77
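The ordering suggested in this comment can be sketched as a small script. This is only a sketch under assumptions: the package names and the `v1.0.1` tag come from the commands above, while the `run` helper, the `install_pkg`/`install_all` functions, and the `DRY_RUN` switch are hypothetical additions so the sequence can be previewed without touching a cluster.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Same repository and tag as in the commands quoted above.
REPO="https://github.com/nephio-project/nephio-example-packages.git"
TAG="v1.0.1"
# Hypothetical switch: 1 (default) only prints the commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Fetch, render, initialize, and apply a single package.
install_pkg() {
  local pkg="$1"
  run kpt pkg get --for-deployment "${REPO}/${pkg}@${TAG}"
  run kpt fn render "${pkg}/"
  run kpt live init "${pkg}/"
  run kpt live apply "${pkg}/" --reconcile-timeout 15m --output=table
}

# Dependencies first, controllers last.
install_all() {
  local pkg
  for pkg in resource-backend gitea nephio-controllers; do
    install_pkg "$pkg"
  done
}

install_all
```

Running it with `DRY_RUN=0` would execute the same commands in the same order; whether that order is sufficient in every environment is not something this sketch verifies.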
Ok, yeah, in R2 we should try to eliminate those dependencies. We shouldn't need the resource-backend or gitea.
Understood, but for now the BYOC workflow should bring in the dependencies before installing nephio-controller.
I think if we turn off the repo provisioning reconciler, we may not need to install Gitea. But for now we can just require it, and refine in R2.
install-guide/byoc.md
Outdated
kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/nephio-controllers@v1.0.1
kpt fn render nephio-controllers
kpt live init nephio-controllers
kpt live apply nephio-controllers --reconcile-timeout=15m --output=table
For some reason I'm getting `OOMKilled` on the `nephio-controller`. Had to bump the limit to `256Mi`.
Once that was fixed, I encountered the following error, hence the comment above about Gitea:
2023-08-22T23:16:05.836Z ERROR Cannot get secret, please follow README and create the gitea secret {"error": "secrets \"git-user-secret\" not found"}
Yeah, I didn't really check in detail. I wonder if the OOM kill is due to some leak in the error handling code.
The OOM kill happened once I was past the errors; i.e., it was OOM-killed with 124Mi while running normally.
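For anyone hitting the same two symptoms, a rough sketch of the corresponding checks follows. Only the `256Mi` limit and the `git-user-secret` name come from this thread; the namespace `nephio-system` and the Deployment name `nephio-controller` are assumptions that may not match the package, and the `run` helper and `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: adjust to wherever the controller actually runs.
NAMESPACE="${NAMESPACE:-nephio-system}"
DEPLOYMENT="${DEPLOYMENT:-nephio-controller}"
# Hypothetical switch: 1 (default) only prints the commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Raise the memory limit to the value reported as working in this thread.
bump_memory_limit() {
  run kubectl -n "$NAMESPACE" set resources deployment "$DEPLOYMENT" --limits=memory=256Mi
}

# Check that the secret named in the error message exists.
check_git_secret() {
  run kubectl -n "$NAMESPACE" get secret git-user-secret
}

bump_memory_limit
check_git_secret
```

A limit changed this way on the live object may be overwritten the next time the package is applied, so editing the limit in the package itself is probably the more durable option.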
install-guide/byoc.md
Outdated
Before we apply it to the cluster, however, we should configure it.

By default, it expects the webui to be reached via `http://localhost:7007`. If
you plan to expose the webui via a load balancer service instead, then you need
to configure the scheme, hostname, port, and service. Note that if you wish to
use HTTPS, you should set the `scheme` to `https`, but you will need to
terminate the TLS at the load balancer as the container currently only supports
HTTP.

This information is captured in the application ConfigMap for the webui, which
is generated by a KRM function. We can change the values in
`nephio-webui/gen-configmap.yaml` just using a text editor (change the
`hostname` and `port` values under `params:`), and those will take effect later
when we run `kpt fn render`. As an alternative to a text editor, you can run
these commands:

```bash
kpt fn eval nephio-webui --image gcr.io/kpt-fn/search-replace:v0.2.0 --match-kind GenConfigMap -- 'by-path=params.scheme' 'put-value=SCHEME'
kpt fn eval nephio-webui --image gcr.io/kpt-fn/search-replace:v0.2.0 --match-kind GenConfigMap -- 'by-path=params.hostname' 'put-value=HOSTNAME'
kpt fn eval nephio-webui --image gcr.io/kpt-fn/search-replace:v0.2.0 --match-kind GenConfigMap -- 'by-path=params.port' 'put-value=PORT'
```
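
The three placeholder commands above could also be driven from shell variables. The following is a sketch, assuming example values (`https`, `nephio.example.com`, `443`) that are purely illustrative; the `WEBUI_*` variable names, the `set_param` and `configure_webui` functions, the `run` helper, and the `DRY_RUN` switch are hypothetical. The names deliberately avoid `HOSTNAME`, which bash sets on its own.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative values only; substitute the ones for your environment.
WEBUI_SCHEME="${WEBUI_SCHEME:-https}"
WEBUI_HOSTNAME="${WEBUI_HOSTNAME:-nephio.example.com}"
WEBUI_PORT="${WEBUI_PORT:-443}"
# Hypothetical switch: 1 (default) only prints the commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Set one key under `params:` in the GenConfigMap resource.
set_param() {
  local key="$1" value="$2"
  run kpt fn eval nephio-webui \
    --image gcr.io/kpt-fn/search-replace:v0.2.0 \
    --match-kind GenConfigMap \
    -- "by-path=params.${key}" "put-value=${value}"
}

configure_webui() {
  set_param scheme "$WEBUI_SCHEME"
  set_param hostname "$WEBUI_HOSTNAME"
  set_param port "$WEBUI_PORT"
}

configure_webui
```

As with the text-editor route, the changed values only take effect once `kpt fn render` is run.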

If you want to expose the UI via a load balancer service, you can manually
change the Service `type` to `LoadBalancer`, or run:
FYI - none of that configuration is needed for OpenShift, as the underlying Service will be exposed through a Route.
Maybe we should just assume a Gateway or Ingress implementation?
Yes, this would be a fair assumption.
By "that" you mean the LB piece, right? Because I think you need the URL piece in order for Backstage to work properly, don't you? I think you probably also need it for OAuth or OIDC to work.
So far, having everything set to localhost is working fine for me. I'm not sure it is necessary to change the URL when it is fronted with an Ingress. The same goes for OIDC, based on my testing so far.
oo, nice
@adetalhouet I am thinking of re-organizing this into a few opinionated examples, instead of trying to allow a separate choice for each component. So, I am thinking of reworking this as:
Each opinionated environment would pick specific solutions, and show how to set it up with that. We can even create separate package repositories that create derivative packages based on the R1 originals. For example, for the GCP one, I would swap out KinD CAPI clusters with GKE-based clusters provisioned by KCC (which in fact will have a separate repository, and not use the I think that will be a lot simpler. Over time, if needed, we can create slightly different environments (choose GitHub instead of Gitea, for example). But I think if we just have a couple of good examples (Nephio-in-a-box (current R1 demo), GCP, OpenShift, maybe EKS if someone wants to do it), that will get us a long way. WDYT?
@johnbelamaric sounds like a good idea. I'll let you finish this PR; and will submit a follow-up one for OpenShift.
I have reorganized this, and marked different pages as work-in-progress. Maybe we should merge it and that way we can collaboratively make edits? See https://nephio.slack.com/archives/C03QV5ZSHL5/p1693262820452229 as well for where it is heading. /hold cancel
LGTM, thank you
install-guide/gcp.md
Outdated
## Gitea Installation

While you may use other Git providers as well, Gitea is required in the R1
setup. To install Gitea, use `kpt`. From your `nephio-install` directory, run:

```bash
kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/gitea@v1.0.1
```

We need to make a few changes. The R1 Gitea package is designed for the sandbox
environment with Metal LB. Let's change that to:
- Use Secrets Manager to manage the Gitea secrets
- Use an internal load balancer for the Gitea git Service so that it is
  accessible in our VPC
- Use Gateway API to expose the Gitea Web UI to the Internet for
  consumption by our workstation

```bash
kpt fn render gitea/
kpt live init gitea/
kpt live apply gitea/ --reconcile-timeout 15m --output=table
```
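
One of the changes listed above, the internal load balancer for the Gitea git Service, might be sketched as follows. This is an assumption-heavy illustration and not the package's actual instructions: it uses the `set-annotations` kpt function to add the GKE `networking.gke.io/load-balancer-type` annotation, and the Service name `gitea` is a hypothetical placeholder that must be replaced with the real name from the package. The `run` helper, the `make_internal_lb` function, and the `DRY_RUN` switch are likewise hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical Service name; check the gitea package for the real one.
GITEA_SERVICE="${GITEA_SERVICE:-gitea}"
# Hypothetical switch: 1 (default) only prints the command, 0 runs it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Annotate the Service so that GKE provisions an internal load balancer.
make_internal_lb() {
  run kpt fn eval gitea \
    --image gcr.io/kpt-fn/set-annotations:v0.1.4 \
    --match-kind Service --match-name "$GITEA_SERVICE" \
    -- "networking.gke.io/load-balancer-type=Internal"
}

make_internal_lb
```

The annotation only has an effect on a Service of `type: LoadBalancer`, and it would need to be applied before the `kpt live apply` step.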
I think this Gitea process will be common among the different installations, so maybe we can put it in a separate document.
That was my original thinking, too, but it won't be, because we need to reconfigure it based on a few different things. In particular the services (and later the secrets). We might be able to make it common enough that the only difference is the upstream package used. But I am not sure yet.
install-guide/gcp.md
Outdated
```
kpt pkg get --for-deployment https://github.com/nephio-project/nephio-example-packages.git/gitea@v1.0.1
kpt fn render gitea/
kpt live init gitea/
kpt live apply gitea/ --reconcile-timeout 15m --output=table
```
These are the instructions for setting up a gitea service.
yeah, wip. I will be adding instructions on how to modify the gitea package to work in the opinionated GCP install
Co-authored-by: Victor Morales <chipahuac@hotmail.com>
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: johnbelamaric
/hold
Work in progress