RPi 3 "diskless" network boot ignores DHCPOFFER #894

Closed
hlev opened this issue Oct 24, 2017 · 5 comments

Comments

@hlev
commented Oct 24, 2017

I have not been able to make this work reliably with my desired setup:

DHCP server (router) - 192.168.102.1
TFTP server (dnsmasq, a host on the network), for example 192.168.102.99
RPI3 in the same physical LAN

Tried setting in the DHCPOFFER:

  • both next-server (siaddr) and Option 66 set to the TFTP server address
  • only Option 66 set to the TFTP server address

Both work intermittently.
Wireshark dissections and tcpdump captures show that the DHCPOFFERs acted upon are practically identical to the ones ignored.

This looks related to #862, where it was mentioned that a random broadcast ping on the subnet usually fixes the Pi's behavior. I noticed that it works whenever the router sends an ARP request between the Pi's DHCPDISCOVER and its own DHCPOFFER, as in "Who has <IP_i_know_previously_was_associated_with_this_Pi_but_not_currently_leased>? Tell router." So if there is no valid lease, the Pi boots fine. It also boots fine if I manually remove the lease on the router and reset the Pi. The ARP request is sent as a broadcast or a unicast depending on whether the Pi's MAC is in the router's ARP cache, and in both cases the Pi proceeds with the offer and loads bootcode.bin.

But if the Pi is reset while the lease is in "bound" status, no ARP request is sent, just a DHCPOFFER for each DHCPDISCOVER (4).

I can make it work with dnsmasq on the TFTP server doing DHCP too (only with dhcp-reply-delay though), but the point is that I'd like to figure out whether I can deploy to a LAN where I don't necessarily control the upstream router that normally handles DHCP for every host. Assuming it also sends Option 66 pointing to my TFTP server I would only rely on that to boot the Pis and would not mix in a second concurrent DHCP server.

Could someone confirm whether this can be resolved at all, given that it seems to be entirely bound to the 1st stage bootloader?
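
For readers comparing the two settings tried above: next-server travels in the fixed siaddr header field of the DHCP message, while Option 66 lives in the variable-length options area. The following is a minimal sketch in C of scanning that options area for Option 66, assuming the standard code/length/value layout from RFC 2132. The function name find_dhcp_option is hypothetical and is not taken from the Pi bootloader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DHCP_OPT_PAD          0   /* single byte, no length field */
#define DHCP_OPT_TFTP_SERVER  66  /* "TFTP server name" */
#define DHCP_OPT_END          255 /* terminates the options area */

/*
 * Hypothetical helper: scan a DHCP options area (the bytes after the magic
 * cookie) for option `code`. Returns a pointer to the option data and stores
 * its length in *len_out, or returns NULL if the option is absent or the
 * buffer is truncated.
 */
const uint8_t *find_dhcp_option(const uint8_t *opts, size_t opts_len,
                                uint8_t code, uint8_t *len_out)
{
    size_t i = 0;

    assert(opts != NULL && len_out != NULL);

    while (i < opts_len) {
        uint8_t cur = opts[i];

        if (cur == DHCP_OPT_END)
            return NULL;
        if (cur == DHCP_OPT_PAD) {
            i++;
            continue;
        }
        /* Every other option is: code, length, then `length` data bytes. */
        if (i + 1 >= opts_len)
            return NULL;
        uint8_t len = opts[i + 1];
        if (i + 2 + (size_t)len > opts_len)
            return NULL;
        if (cur == code) {
            *len_out = len;
            return &opts[i + 2];
        }
        i += 2 + (size_t)len;
    }
    return NULL;
}
```

Calling find_dhcp_option(opts, opts_len, DHCP_OPT_TFTP_SERVER, &len) on a captured offer would show whether the router actually included Option 66, which is one way to compare an accepted offer with an ignored one byte for byte.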

@ghollingworth

Contributor

commented Oct 27, 2017

Does adding the dhcp-reply-delay fix the problem reliably?

If it does then the problem is to do with a bad bit of logic in the code where I check the ip_address, tftp_address and timeout in the dhcp_reply function (i.e. waiting for a reply from a dhcp server)

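/* Pseudocode for the wait in dhcp_reply (the buggy logic as it stands) */
ipaddr = 0;
tftpaddr = 0;
timeout = TIME + 1_SECONDS;

/* Exits only once BOTH addresses are set AND the timeout has passed */
while(ipaddr == 0 || tftpaddr == 0 || !(TIME > timeout))
{
    wait_rx_packet();      /* blocks until a packet arrives */

    process_rx_packet();
}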

Unfortunately this means it will keep repeating the loop until the timeout has elapsed even if it has already received an IP address and TFTP boot address. Combine this with a bug in wait_rx_packet(), which only checks its timeout when a packet is received from the network, and you get: no other devices on the network -> no network traffic -> no packets received -> no timeout -> never completing the wait_rx_packet -> never exiting this function!

The solution should be to make sure the device receives the odd broadcast packet (it has to be broadcast, otherwise it will not make it through the packet filter). I've tried this in the past using ping -b, but I'm not absolutely sure it always works...

Gordon
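
To make the two problems described in this comment concrete, here is a minimal sketch in C of what a corrected wait loop could look like. It is not the bootloader's actual code: struct sim_net, poll_rx_packet and dhcp_wait_reply are hypothetical names, and the network is replaced by a tiny simulation so the logic can be run on a desktop. The sketch assumes two changes: the loop exits as soon as both addresses are known, and the deadline is checked on every pass whether or not a packet arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values filled in from DHCP offers; 0 means "not received yet". */
struct dhcp_state {
    uint32_t ipaddr;
    uint32_t tftpaddr;
};

/* Simulated environment standing in for the hardware (hypothetical). */
struct sim_net {
    uint32_t now_ms;       /* simulated clock */
    uint32_t offer_at_ms;  /* when a complete offer arrives; UINT32_MAX = never */
    uint32_t offer_ip;
    uint32_t offer_tftp;
};

/* Waits one slice for a packet and reports whether one is available.
 * Unlike the wait described above, it returns after slice_ms even on a
 * completely quiet network. */
static bool poll_rx_packet(struct sim_net *net, uint32_t slice_ms)
{
    net->now_ms += slice_ms;
    return net->now_ms >= net->offer_at_ms;
}

/* Copies whichever fields are still missing from the simulated offer. */
static void process_rx_packet(const struct sim_net *net, struct dhcp_state *st)
{
    if (st->ipaddr == 0)
        st->ipaddr = net->offer_ip;
    if (st->tftpaddr == 0)
        st->tftpaddr = net->offer_tftp;
}

/* Returns true once both addresses are known, false if timeout_ms elapses. */
bool dhcp_wait_reply(struct sim_net *net, struct dhcp_state *st,
                     uint32_t timeout_ms)
{
    assert(net != NULL && st != NULL);

    uint32_t start = net->now_ms;

    st->ipaddr = 0;
    st->tftpaddr = 0;

    /* Keep going only while something is still missing. */
    while (st->ipaddr == 0 || st->tftpaddr == 0) {
        /* Deadline is checked every pass, packet or no packet. */
        if (net->now_ms - start >= timeout_ms)
            return false;
        if (poll_rx_packet(net, 10))
            process_rx_packet(net, st);
    }
    return true;
}
```

With a simulated offer arriving at 30 ms and a 1000 ms timeout, dhcp_wait_reply returns true at 30 ms instead of waiting out the full second. With no traffic at all, it returns false at 1000 ms instead of blocking forever.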

@hlev

Author

commented Oct 27, 2017

Thanks Gordon.

Yes, dhcp-reply-delay fixes it reliably. Your response also explains another clue:

  1. I set dnsmasq to allocate IPs ending 11-20 in the /24 network. The upstream router allocates the whole range ending 2-254, itself being 1. Nothing special.
  2. At boot, the upstream router responds quicker, given dnsmasq is delayed.
  3. The Pi happily takes the IP the upstream router offers, but since the router's offer does not contain anything else, it keeps looping.
  4. When finally the offer from dnsmasq arrives, the Pi apparently "cherry-picks" the tftpaddr and Option 43 from the offer and proceeds with booting.

I only noticed this by looking at the dnsmasq logs, which showed the IP of the Pi fetching the boot files was not from the 11-20 range dnsmasq could possibly allocate, but one that the router picked. I was puzzled at first but it worked so all is well.

I guess this could be another possible bug in process_rx_packet() then? I would have expected the Pi to release the incomplete lease, and take the one that is whole.
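
The sequence in steps 1 to 4 above can be modelled as a per-field "first offer wins" merge. The sketch below, in C, is only a model of the observed behaviour, not the bootloader's real process_rx_packet(): the names lease_state, dhcp_offer and merge_offer are hypothetical, and the router-assigned address used in the usage note is an illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a dotted quad into a host-order 32-bit value. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* What one DHCPOFFER carries; 0 means the field is absent. */
struct dhcp_offer {
    uint32_t yiaddr;    /* offered client address */
    uint32_t tftpaddr;  /* TFTP server, if the offer names one */
};

/* What the client has collected so far; 0 means "still missing". */
struct lease_state {
    uint32_t ipaddr;
    uint32_t tftpaddr;
};

/* Each field is taken from the first offer that supplies it and is never
 * overwritten by later offers (assumed model of the observed behaviour). */
void merge_offer(struct lease_state *st, const struct dhcp_offer *offer)
{
    assert(st != NULL && offer != NULL);

    if (st->ipaddr == 0)
        st->ipaddr = offer->yiaddr;
    if (st->tftpaddr == 0)
        st->tftpaddr = offer->tftpaddr;
}
```

Feeding this a router offer of 192.168.102.57 with no TFTP address, followed by a delayed dnsmasq offer of 192.168.102.11 with TFTP server 192.168.102.99, leaves the state holding the router's IP and dnsmasq's TFTP address, which matches what the dnsmasq logs showed.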

@ghollingworth

Contributor

commented Oct 27, 2017

Yes,

It will take the first IP address it is offered... Although I'm surprised it doesn't also overwrite with the second one.

Are you sure dnsmasq isn't just running as a proxy and therefore not providing a second IP address?

I'll add it to a list of things to test if I ever write a newer version of this!

Thanks

@hlev

Author

commented Oct 27, 2017

I'm certain it wasn't just a proxy configuration, but it is not a major issue as the Pi can boot and then re-negotiate the IP with the preferred server.

Thanks for explaining the details of the flow. Let's close this one.

@hlev hlev closed this Oct 27, 2017

@hlev

Author

commented Oct 30, 2017
