Deploying AGW and Orc8r On-Premises and Bare Metal #8852
-
Thank you for the detailed notes. I am able to bring up the orchestrator and AGW separately, but I still can't add the AGW to NMS because I can't create a 'Network' in the UI. When I try to create a new LTE network through the UI, I get a cryptic "Not Found" error. The error seems to stem from the fact that the lte-orc8r chart is not deployed by the bare-metal-ansible instructions above. I tried to manually issue a helm command to deploy the lte charts, but the new pods are stuck in the ContainerCreating state. This issue (#6252) has some clues, but I don't know what to modify in values.yaml.
-
When I look at the pods, it looks like some persistent volumes do not exist, which makes me think I missed an initialization step.
-
I tried this method. Everything went well: I can access master.nms.mydomain, create an organization, and log in as a user of that org, but then I get an error message. Indeed, https://organization.nms.mydomain/nms/apicontroller/magma/v1/networks answers "503 Service Unavailable". I don't know what to check in order to debug this. Any idea? Thanks
-
Try installing an older version of markupsafe, like this: pip3 install markupsafe==2.0.1. It worked for me.
-
I'm trying to install v1.8.0 and the problem with markupsafe is still present. I run source .venv/bin/activate, but then I get this error: ERROR! this task 'ansible.builtin.command' has extra params, which is only allowed in the following modules: include_tasks, shell, group_by, include, include_vars, set_fact, win_shell, import_role, script, import_tasks, include_role, command, add_host, raw, meta, win_command. It seems to be a problem with Ansible v2.9.6, so I'll try another version.
-
Just an update. I recently completed a local bring-up of the 1.8.0 Orc8r and AGW. I don't plan to produce something as detailed as the writeup above, especially since the whole process was much simpler than 1.6. Here are some top-level comments:
- Even though I have dedicated servers for both the AGW and the Orc8r, I deployed into a KVM VM on each server. That made it easier to deal with networking, and whenever I had an issue I could spin up a new VM.
- I used @ShubhamTatvamasi's guide for the Orc8r. Pretty straightforward.
- I used "Install Docker-based Access Gateway on Ubuntu" for the AGW. Note that the AGW won't check in until after the final
- I have not yet provisioned an eNB for the network. Will do that over the next few weeks. After we do that successfully, we'll migrate our 1.6 network over to 1.8.
- A few known issues as of this writing:
-
I am getting the following errors during installation: Unable to restart service docker: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details. It seems that the docker.service template is failing to start/restart. How do I proceed?
-
Hello @jblakley |
-
TL;DR
Recounts the tips and tricks of deploying a private, on-premises, baremetal Magma Orchestrator (orc8r) and Access Gateway (AGW). No eNodeB has yet been provisioned.
What and Why?
I recently completed the bring up of AGW and Orc8r in a "closed" baremetal on-premises environment. Since baremetal is a relatively new deployment model for magma, especially the orchestrator, I thought I'd share some of my heartaches, learnings, tips & tricks, and suggestions for improvements. I'd welcome others to communicate their learning as well -- although the magma slack channel is a better place for detailed back and forth.
Why Baremetal?
I started my learning on magma with the quickstart -- it is an excellent tool to get a first experience of magma "in an afternoon". However, at this time, on magma 1.6, the recommended deployment model is baremetal AGW on Ubuntu 20.04 and AWS for the orchestrator. Baremetal AGW makes sense for performance and networking reasons -- it's hard to plug an S1 interface into the cloud. A cloud-based Orc8r is a fine alternative, but I chose to bring it on-premises because:
What Baremetal?
Our AGW is an Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz with two 1Gbps NICs with 32GB RAM running Ubuntu 20.04. The Orc8r is an Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128GB RAM running Ubuntu 20.04.
This deployment is on magma 1.6.0 and will eventually be rolled into an already operational private LTE network.
Key Challenges
The key references for deployment come from these resources:
The challenges I describe below arose while following these directions. Judging from the dialogs in the slack channels, I'm not alone. There are some error messages I ran into that I don't discuss here, either because they were trivial or because they didn't seem to cause any problems (yet).
AGW System Constraints
The baremetal AGW requires that the network interfaces be specifically named: the SGi interface must be named eth0 and the S1 interface must be named eth1. In addition, netplan, the default Ubuntu 20.04 networking scheme, must be disabled and replaced with ifupdown. If the system is not initially configured this way, the agw_install_ubuntu.sh script will roll back to ifupdown and rename the interfaces. However, due to constraints in our networking environment, the automated reconfiguration assigned the wrong ports to those names. That issue required that I manually convert to ifupdown and rename the interfaces appropriately prior to running the install script. Instructions for this conversion can be found here. My /etc/network/interfaces file looks like this:
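As an illustration of the layout (the addresses below are placeholders, not the values from this deployment; eth0 is assumed to take DHCP and eth1 a static address):

```
# /etc/network/interfaces -- illustrative sketch only
# eth0 = SGi (uplink toward the internet), eth1 = S1 (toward the eNodeB)
auto lo
iface lo inet loopback

# SGi interface, assumed here to use DHCP
auto eth0
iface eth0 inet dhcp

# S1 interface, statically addressed; address/netmask are placeholders
auto eth1
iface eth1 inet static
  address 10.0.2.1
  netmask 255.255.255.0
```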
Certificates
Documentation for handling certificates is somewhat unclear in the baremetal instructions. Portions of the AWS Orc8r documentation apply; however, there are a few things that don't:
The Orc8r deployment script, deploy.sh, creates the certificates for Orc8r and stores them in /etc/orc8r/certs.
The rootCA.pem certificate to install on the AGW is there.
The admin_operator certificate is also there but you still need to run:
openssl pkcs12 -export -inkey admin_operator.key.pem -in admin_operator.pem -out admin_operator.pfx
to generate admin_operator.pfx for browsers.
The AGW certificate, gateway.crt, and its key, gateway.key, are generated when the AGW checks into the Orc8r. They won't be there until after you complete the bringup process and the gateway has successfully checked in.
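As a hedged sketch, getting rootCA.pem onto the AGW before running the install looks something like this; the hostname and user are placeholders, and /var/opt/magma/tmp/certs/ is the location the standard AGW install instructions use for it:

```bash
# Copy the Orc8r root CA from the Orc8r host to the AGW (names are placeholders)
scp /etc/orc8r/certs/rootCA.pem magma@agw-host:/tmp/rootCA.pem
ssh magma@agw-host \
  'sudo mkdir -p /var/opt/magma/tmp/certs && sudo mv /tmp/rootCA.pem /var/opt/magma/tmp/certs/rootCA.pem'
```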
Helm Charts
To build and publish helm charts, I used option #2 from here. The repository needs to be named magma-charts. I made the repository public since the package script did not seem to pick up the credentials. Since I have an LTE network, my package script command line was:
${MAGMA_ROOT}/orc8r/tools/helm/package.sh -d all
A temporary(?) anomaly is that the chart versions are 1.5.23 not 1.6.0 as expected.
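To sanity-check what actually got published, and the versions being served, commands along these lines work (the repository URL is a placeholder for wherever magma-charts is hosted):

```bash
# Add the published chart repository and list the chart versions it serves
helm repo add magma-charts https://example.com/magma-charts
helm repo update
helm search repo magma-charts --versions
```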
The Ansible Variable File
From the slack forum, many people seem to struggle with getting the ansible_vars.yaml variables set so that everything works. Here is an anonymized working version of mine:
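As a rough illustration of the kinds of values involved (the key names below are hypothetical placeholders, not the real bare-metal-ansible variable names -- take the authoritative keys from the example vars file in the magma repo):

```yaml
# Illustrative only: hypothetical key names showing the sort of data required
orc8r_domain: "mydomain.example"        # private domain used for controller/NMS hostnames
orc8r_certs_dir: "/etc/orc8r/certs"     # where deploy.sh writes the certificates
helm_repo_name: "magma-charts"          # repository holding the packaged charts
metallb_address_range: "192.168.1.200-192.168.1.220"  # load balancer IPs (VPN subnet)
```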
IP Connections
In my environment and without using public IP addresses, it was difficult to create an IP connection between the Orc8r and the AGW. After trying a number of approaches, what finally worked best was to create a wireguard VPN on 192.168.1.0, assign the metallb addresses from that subnet, and route AGW traffic to those services through the VPN. To do this persistently, see Section 3 here.
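As an illustration of the MetalLB piece, an address pool constrained to the VPN subnet looks roughly like this in the older ConfigMap-style MetalLB configuration (the exact range is a placeholder):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # Hand out service IPs from the wireguard subnet so the AGW can reach them
      - 192.168.1.200-192.168.1.220
```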
For debugging IP and DNS, these commands were invaluable:
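An illustrative set of such checks, with the namespace, domain, and addresses as placeholders:

```bash
# Which external IPs did MetalLB hand to the Orc8r services?
kubectl get services -n orc8r -o wide

# Does the private DNS server resolve the controller hostname?
dig @192.168.1.1 controller.mydomain.example

# Can the AGW actually reach the controller through the VPN? (443 as an example port)
nc -zv controller.mydomain.example 443
```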
Their output looks like:
DNS
Getting DNS right, both within the Orc8r k8s cluster and between the AGW and the Orc8r, was the most challenging part (especially with my lack of DNS experience). I wanted to use a private domain name and didn't want a dependency on an external DNS server. What I did:
Deployed a bind9 DNS server on the Orc8r host node and updated netplan to point to its public address as a nameserver. Here's the config file:
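A minimal sketch of the bind9 side, with the zone name, hostnames, and addresses as placeholders rather than the actual values used here:

```
// /etc/bind/named.conf.local -- placeholder zone name
zone "mydomain.example" {
    type master;
    file "/etc/bind/db.mydomain.example";
};
```

```
; /etc/bind/db.mydomain.example -- placeholder hostnames and load-balancer addresses
$TTL 604800
@    IN SOA ns.mydomain.example. admin.mydomain.example. ( 2 604800 86400 2419200 604800 )
@    IN NS  ns.mydomain.example.
ns                       IN A 192.168.1.1
controller               IN A 192.168.1.201
bootstrapper-controller  IN A 192.168.1.201
fluentd                  IN A 192.168.1.202
*.nms                    IN A 192.168.1.203
```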
Updated the k8s cluster coredns and nodelocaldns pods to use it as the only forwarding nameserver. Keeping 8.8.8.8 as an additional forwarding nameserver caused issues with internal resolution within the cluster.
Coredns :53 Block:
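A representative CoreDNS :53 block with the forward line pointed only at the private bind9 server (192.168.1.1 is a placeholder):

```
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    # Forward everything non-cluster to the local bind9 server only
    forward . 192.168.1.1
    cache 30
    loop
    reload
    loadbalance
}
```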
Nodelocaldns :53 Block:
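And the analogous node-local-dns :53 block; 169.254.20.10 is the usual NodeLocal DNSCache link-local address, and 192.168.1.1 again stands in for the bind9 server:

```
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10
    # Forward non-cluster queries to the bind9 server instead of 8.8.8.8
    forward . 192.168.1.1
    prometheus :9253
}
```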
To be safe, I also added the load balancer IPs to the /etc/hosts of the AGW:
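These entries map the Orc8r endpoints the AGW talks to onto the load balancer addresses, roughly like this (hostnames and IPs are placeholders):

```
# /etc/hosts on the AGW -- placeholder addresses from the wireguard subnet
192.168.1.201  controller.mydomain.example
192.168.1.201  bootstrapper-controller.mydomain.example
192.168.1.202  fluentd.mydomain.example
```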
Don't forget to create an Orc8r and NMS user and import your admin_operator certificate into your browser.
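For reference, the 1.6 Orc8r documentation does this with commands along the following lines; the namespace and deployment names may differ in a bare-metal install, so treat these as a sketch:

```bash
# Register the admin_operator certificate as an admin user on the Orc8r
kubectl --namespace orc8r exec -it deploy/orc8r-orchestrator -- \
  /var/opt/magma/bin/accessc add-existing -admin \
  -cert /var/opt/magma/certs/admin_operator.pem admin_operator

# Create an NMS user (organization, email, and password are placeholders)
kubectl --namespace orc8r exec -it deploy/nms-magmalte -- \
  yarn setAdminPassword master admin@mydomain.example MyPassword123
```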
Once you get this far, you should be able to:
Here's the NMS dashboard for the network at this stage. I have not yet added an eNodeB although I have connected it to the S1 interface.
Suggestions for Improvement
Please feel free to share comments and tips of your own!