iPXE boot doesn't load cloud-config (DHCP based DNS is not being used) #1790
Comments
oh wonderful - that change was made to fix AWS cloud-init. |
@gizmotronic - can you post the log file of boot with |
@SvenDowideit It took me a bit to figure out how to get past the rate limited printk, but here it is: rancheros-dmesg.txt This was really helpful. My cloud-config can't load because the host it's coming from is on my internal network, which can't be resolved by Google DNS. I don't expect it to be using Google because my DHCP server is providing the necessary (private) DNS configuration. In case some future reader is wondering how I turned off the rate limiting, I added |
yes, that's not good. can you add your local dns server setting to the boot cmdline for now? |
this might be the root cause of something I'm tracking down atm - thank you for the analysis |
I've confirmed that setting the local DNS server is an effective workaround. |
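The workaround above can be sketched as a small script generator. This is only an illustration, not the exact setting used in this thread: it assumes the kernel parameter mirrors the `rancher.network.dns.nameservers` config key, and the function name `render_ipxe`, the boot host `boot.example.internal`, and the nameserver address are all hypothetical.

```python
def render_ipxe(base_url, nameservers):
    """Render an iPXE script whose kernel line pins the DNS servers explicitly."""
    if not nameservers:
        raise ValueError("at least one nameserver is required")
    # Assumption: the DNS setting is passed as rancher.network.dns.nameservers,
    # in the same bracketed-list form as rancher.cloud_init.datasources.
    dns_arg = "rancher.network.dns.nameservers=[" + ",".join(nameservers) + "]"
    kernel_args = [
        "rancher.cloud_init.datasources=[url:${base-url}/cloud-config]",
        dns_arg,
    ]
    lines = [
        "#!ipxe",
        "set base-url " + base_url,
        # ${base-url} is expanded by iPXE at boot time, not by Python.
        "kernel ${base-url}/vmlinuz " + " ".join(kernel_args),
        "initrd ${base-url}/initrd",
        "boot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; substitute the real boot server and local resolver.
    print(render_ipxe("http://boot.example.internal:8000", ["192.168.1.1"]))
```

Pinning the resolver on the command line sidesteps whatever DHCP did or did not deliver at boot, which is why it works as a stopgap but not as a fix.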
argh! it works for my server (on an installed disk), guess I need to kick it with pxe-dust
|
yup, and the dns servers are also set and working at cloud-init-save when I pixieboot it with a local dns only datasource url :(
|
@gizmotronic I guess the next question is - what do you get when you run |
The VM I'm using (typical of my environment) is diskless. The problem doesn't happen when a disk is attached, whether with a local install or used as a state partition. I'm readily able to reproduce the problem with v1.0.0, but it appears to be resolved in v1.0.1. I've just booted v1.0.1 with 6 different configurations and had no trouble with any of them. I was also able to build and boot ToT without any trouble. Any ideas on what might have changed to fix this? |
we think there's a race condition when there's a delay in the DHCP - but so far, it's all theories - I can't manage to make it happen. |
@gizmotronic I made a test build with #1921 in it - see https://github.com/rancher/os/releases/tag/v1.1.0-test1 - is there any chance you could see if this also solves your problem? hopefully, it's related to #1812 |
I had no trouble booting v1.1.0-test1 in my test VM. |
Thanks @SvenDowideit. I have hit the same problem, and your test build solves it. Could we release this ASAP? |
Have you tried v1.0.3? It contains one part of the 1.1.0 test changes. |
I'm hoping this will be resolved by #1921 |
I think I did; only 1.1.0 works. But I can double-check when I'm in the office tomorrow. |
I ran my test again, using https://releases.rancher.com/os/latest/vmlinuz |
Sorry, my fault! I have a disk attached to my machine, and it is formatted with RANCHER_STATE. It stored the config I had loaded earlier! A new test shows that only v1.1.0-test works! |
Even v1.1.0-test is no longer working. :( |
#1921 backported to v1.0.4 too |
Hi @SvenDowideit I ran v1.0.4. This is still not working. Here is my boot script:

    #!ipxe
    set base-url http://10.10.10.1:8000
    kernel ${base-url}/vmlinuz rancher.autologin=tty1 rancher.state.dev=LABEL=RANCHER_STATE rancher.state.autoformat=[/dev/sda,/dev/vda] rancher.cloud_init.datasources=[url:${base-url}/cloud-config]
    initrd ${.base-url}/initrd

    [rancher@rancher ~]$ sudo ros config export
    EXTRA_CMDLINE: /init
    rancher:
      autologin: tty1
      cloud_init:
        datasources:
        - url:http://10.10.10.1:8000/cloud-config
      environment:
        EXTRA_CMDLINE: /init
      state:
        autoformat:
        - /dev/sda
        - /dev/vda
        dev: LABEL=RANCHER_STATE
    ssh_authorized_keys: []

    [rancher@rancher ~]$ wget http://10.10.10.1:8000/cloud-config
    Connecting to 10.10.10.1:8000 (10.10.10.1:8000)
    cloud-config 100% |*************************************************************************************************************| 531 0:00:00 ETA
    [rancher@rancher ~]$ cat cloud-config
    #cloud-config
    hostname: host-119
    ssh_authorized_keys:
    - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbw3HEUCApEnStLH5NibRhP6KipG3l8ENCdXTBnDzQ51dUsD/sVgEIA1OwJUcEcNWCgSbnP7GE7hdsRfySNUjNcGDEIv70uR59b0r/nJ6ySgAcRL9RlvuiW/Vas7ZUS6JW/8uOVrb1D32Z0pV804nAU4Afym3NiIpH9GqSZMg9Etge764pT8aiWMx1RKl8UiYznIuBnT/gzWGOnm+s/udRAx9g8xAYd67Gzw4H05RlnR/3yHOdMTXJhlcovsDOpoKBEsG+MmM2W/S9G/ia84zbfWkUSI7bLy+UxMs6nJEAYCr66JWTr+EB4IpOYmiu5H6cyuZhSTy9QnHRGigv6RUP liyi.meng@ericsson.com
    rancher:
      network:
        interfaces:
          eth0:
            dhcp: true
    [rancher@rancher ~]$ cat /etc/issue
    , , ______ _ _____ _____TM ,------------|'------'| | ___ \\ | | / _ / ___| / . '-' |- | |_/ /__ _ _ __ ___| |__ ___ _ __ | | | \\ '--. \\/| | | | // _' | '_ \\ / __| '_ \\ / _ \\ '__' | | | |'--. \\ | .________.'----' | |\\ \\ (_| | | | | (__| | | | __/ | | \\_/ /\\__/ / | | | | \\_| \\_\\__,_|_| |_|\\___|_| |_|\\___|_| \\___/\\____/ \\___/ \\___/ \s \r
    RancherOS v1.0.4 \n \l
    eth0: 10.10.10.119
    eth1: 10.168.122.222
    lo: 127.0.0.1
    [rancher@rancher ~]$

|
@SvenDowideit BTW, a personal question: are you the only guy working on RancherOS now? :) |
I just re-did the handling of resolv.conf during boot; it should make DHCP-based DNS much more functional. It'll be released in v1.1.0 (GA) this week. |
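One way to check on a booted system whether DHCP-provided nameservers actually landed in `/etc/resolv.conf` is sketched below. It is a diagnostic aid only, not part of RancherOS: the helper names are hypothetical, and treating `8.8.8.8` and `8.8.4.4` (Google Public DNS) as the fallback set is an assumption based on the symptom reported earlier in this thread.

```python
# Assumption: the fallback resolvers are the Google Public DNS addresses.
GOOGLE_DNS = frozenset({"8.8.8.8", "8.8.4.4"})


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';' at line start).
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def only_fallback_dns(servers, fallback=GOOGLE_DNS):
    """True when every configured nameserver is a fallback address."""
    return bool(servers) and all(s in fallback for s in servers)
```

Passing the contents of `/etc/resolv.conf` to `parse_nameservers` and the result to `only_fallback_dns` gives a quick yes/no on whether the private DNS server from DHCP is missing.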
This has resolved the issue for me. I was having trouble with v1.0.4 late last week but v1.1.0 boots without issue. Thank you! |
This is still not working if you have more than one interface on your machine. To reproduce, create a KVM VM with two interfaces and PXE boot from one of them. If another DHCP server is also running on the other interface, it fails almost 100% of the time. I propose that RancherOS add another kernel parameter to indicate which interface the iPXE boot is supposed to happen from. |
@liyimeng can you please raise a new issue for that - I keep forgetting that there's an extra problem |
RancherOS Version: (ros os version)
0.9.2-rc2 and later
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)
vSphere Hypervisor 6.5.0 (ESXi) virtual machine, iPXE, diskless
RancherOS loads on a diskless VM using iPXE, but starting with 0.9.2-rc2, it's unable to load the cloud-config specified using the rancher.cloud_init.datasources kernel command line parameter.
I've isolated the problem to changeset 79a7e59. Reverting this change allows 1.0.0 to work normally.