
[FEATURE] Consider agent-based node join #2476

Open
axsms opened this issue Jul 10, 2022 · 5 comments
Labels
kind/enhancement Issues that improve or augment existing functionality

Comments


axsms commented Jul 10, 2022

Is your feature request related to a problem? Please describe.
Installing on various hardware in remote datacenters can take hours due to limited resources in the region or the higher cost of adding resources close to the target bare-metal machines. Please remember that much of this hardware and the underlying network is unmanaged or requires remote hands, which is very expensive. This would work on VMs too, of course.

Describe the solution you'd like
Create an agent-based join flow where a supported OS is installed on bare metal, and the agent installed afterward is configured via CLI or SSH to connect to an admin-designated Rancher or master node.
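For illustration only, the agent could be driven by a small config dropped onto the already-installed OS over SSH or via a CLI. The field names below are hypothetical, not an existing Harvester interface; they just capture the kind of information a join would need (management endpoint, cluster token, which NIC to hand over):

```yaml
# Hypothetical agent join config -- illustrative field names only,
# not an existing Harvester interface.
agent:
  # Management endpoint of the existing cluster (Rancher URL or the
  # first node's VIP), chosen by the admin.
  server_url: https://harvester-vip.example.com
  # Cluster join token generated on the management side.
  token: <cluster-join-token>
  # NIC the agent hands over to Harvester once the node and its
  # networking have already been validated with standard OS tooling.
  management_interface: eno1
  role: worker
```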

Possible Benefits:

  • OS and underlying hardware would be configured by standardized methods and workflows
  • OS ISO installs take minutes, whereas the Harvester ISO is taking hours in some cases (even with the ISO hosted in the same DC)
  • Networking is confirmed at OS install time using standard methods and tools, allowing sysadmin types to validate public and private networks at that stage instead of adding Harvester to the calculus (on Slack, this seems to be a very common area of failure/confusion, and the extra support it requires is likely a limit to adoption)
  • Troubleshooting underlying issues before Harvester is added increases ownership of those issues by the individuals doing the work and reduces the number of issues attributed to Harvester

Describe alternatives you've considered

  1. Current documented options for install are limited in some datacenter scenarios.
  2. Adding more interactive scripts to install and validate underlying configurations and test connectivity, including the cluster token, before the full install completes.

Additional context
Because we rent commodity hardware from unrelated datacenters, usually when deep price cuts are available, we do not have standardized infrastructure. We can always count on spine-and-leaf switches, but networking/connectivity issues are hard for us to troubleshoot with Harvester. Asking the DC to work with us on determining why our servers work when any other OS is installed, but not when we install Harvester, is not possible for obvious reasons. It is for this reason that I believe an agent-based option would simplify matters and increase adoption rates. I don't know whether this is or would become a common use case. I do see a pull away from lock-in with the usual hyperscale providers and a strong desire from me and my clients to avoid their high prices for large projects.

I genuinely appreciate how committed the SUSE team has been to delivering a fantastic product. I believe it has a great future. Unfortunately, I have spent MANY thousands of USD and man-hours in various datacenters trying to get a consistent experience, in hopes of deploying dozens of clusters for long-term installs in this same manner. Yet we have not been successful. I can chalk that up to a lack of understanding of how to troubleshoot the issues. For now, I will table my company's interest in deploying Harvester and see if there are indicators that this will not be an issue for us in the future. Thank you for your time.

@axsms axsms added the kind/enhancement Issues that improve or augment existing functionality label Jul 10, 2022
@axsms axsms changed the title Consider agent-based node join [FEATURE] [FEATURE] Consider agent-based node join Jul 10, 2022
@axsms axsms closed this as completed Sep 2, 2022
@yasker yasker reopened this Sep 17, 2022

yasker (Member) commented Sep 17, 2022

@axsms Sorry, we haven't had time to look at this request yet. Is there any reason you've closed it (besides it taking a while...)?

Just to let you know, we're also working on #2346 (seeder), which should help with lab provisioning.


axsms (Author) commented Sep 17, 2022

@yasker, no worries. I should have left a comment when closing. I see that a number of my concerns are being addressed in other requests.

That being said, I had recently rethought the closing, since I do think agents may solve a number of issues and provide opportunities in remote management, similar to Remote Monitoring and Management (RMM). And if the agent were coupled with WireGuard — Wow.

#2346 seems to be a great start on what I am requesting; thanks for pointing it out. That said, I think it has opportunities for onboarding nodes and clusters regardless of location. L3 is essential for us.


yasker (Member) commented Sep 19, 2022

Hmm, provisioning over L3 is going to be a much bigger effort. And since Harvester is designed to manage the hardware from the OS layer, it won't be possible to swap out the OS.

However, we're working on an alternative way of installing Harvester via a cloud-init-enabled disk image (instead of an ISO). See #2198.
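As a rough sketch only (not the actual #2198 design), a cloud-init-enabled image could pick up its join settings at first boot using the standard write_files and runcmd modules; the file path and the join step below are placeholders:

```yaml
#cloud-config
# Sketch only: write_files and runcmd are standard cloud-init modules,
# but the target path and the join step are placeholders, not the
# mechanism being built in #2198.
write_files:
  - path: /oem/join-config.yaml        # placeholder path
    permissions: "0600"
    content: |
      server_url: https://harvester-vip.example.com
      token: <cluster-join-token>
runcmd:
  # The real join step would be whatever #2198 ships; this just shows
  # where it would hook in at first boot.
  - [ sh, -c, "echo 'node join would run here using /oem/join-config.yaml'" ]
```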


axsms (Author) commented Sep 20, 2022

#2198 looks pretty good. With bare-metal options at the edge, we are hoping to avoid the need for L2 infrastructure until it can be provided through Harvester, since most of the vendors we work with provide a VLAN and public IP blocks for each node. They will often offer a second NIC as an option, but there's no L2 available until we build it.

At the moment, we set up a hypervisor with the necessary VMs to mimic a typical on-prem environment: virtual firewall (pfSense), LAN, DHCP, DNS, and VPN. We are hoping we can avoid using a node for that purpose when we really only intend to use Harvester at each DC. This is how we envision HCI at the edge. Are we off course in our thinking? Would #2198 and #2346 get us to where we think we can go?

iosifnicolae2 commented


In our case (we're using Hetzner), it's extremely complicated to do this because we have one NIC, and their virtual networking solution does not allow us to get internet access through the pfSense router.

It would be awesome if we could configure four network types from Harvester:

  1. mgmt network (no internet access, used by nodes to communicate with each other)
  2. storage network (usually a switch with static IP addresses)
  3. internet network (this interface will be used to provide internet access to VMs)
  4. internal VM network (this network will be used for VM-to-VM communication)

In our case, we will have something like this:

  1. mgmt network (static ip on mgmt-br.4000 + VIP static ip)
  2. storage network (static ip on mgmt-br.4001)
  3. internet network (DHCP on mgmt-br) <- this is a custom use case for Hetzner
  4. internal VM network (static ip on mgmt-br.4002)
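As a sketch of what that separation could look like on the node side, here is a netplan-style (cloud-init network config v2) layout. The VLAN IDs come from the list above; the physical NIC name and addresses are placeholders, and this is not how Harvester itself builds mgmt-br, just an illustration of the intent:

```yaml
# Netplan-style sketch of the layout above. VLAN IDs 4000-4002 are from
# the comment; the NIC name and addresses are placeholders. Not a claim
# about Harvester's own bridge setup.
network:
  version: 2
  ethernets:
    eno1: {}                     # single physical NIC (the Hetzner case)
  bridges:
    mgmt-br:
      interfaces: [eno1]
      dhcp4: true                # "internet network": DHCP on mgmt-br
  vlans:
    mgmt-br.4000:                # mgmt network (VIP handled separately)
      id: 4000
      link: mgmt-br
      addresses: [10.40.0.11/24]
    mgmt-br.4001:                # storage network
      id: 4001
      link: mgmt-br
      addresses: [10.40.1.11/24]
    mgmt-br.4002:                # internal VM network
      id: 4002
      link: mgmt-br
      addresses: [10.40.2.11/24]
```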

I've added more details here: #1762 (comment)

Projects
Status: Evaluating
Development

No branches or pull requests

4 participants