RPi3B+ PXE boot failures #991

Closed
mangodan2003 opened this issue May 8, 2018 · 29 comments
Labels
Bootcode/Netboot/PXE Specific flag for issues with the bootcode and network booting Waiting for internal response Waiting for response from OP

Comments

@mangodan2003

mangodan2003 commented May 8, 2018

We're booting a bunch of PXE client RPis using the internal ROM and seeing apparently random boot failures with various different modes of failure. One of the odd things about this is that they seem to be more prone to failing to boot at the start of the day. After one boot attempt, power cycling multiple times afterwards seems to be more reliable. I cannot see how the RPi itself would be affected by the elapsed time between boots, as it would have no way (that I am aware of) to know this time. So I assume it is related to something like issuing DHCP leases, or ARP entries/cache etc. I wonder if anybody else can suggest anything that we should try to investigate.

Setup: one RPi master with an SD card, which runs dnsmasq as a DHCP proxy, TFTP server, NFS server etc.
Presently 8 clients without any SD card. Each has a 3 A PSU on a board mounted to the RPi (no long skinny wires with high voltage drops).

We allow the master to boot first. When all are powered up together they sometimes all boot, but there is a race condition between the client bootloader's usual 5 DHCP attempts and the master's dnsmasq proxy being ready.

All pis :
vcgencmd get_throttled
throttled=0x0

failure modes :

  1. Apparently no attempt whatsoever as a DHCP client - no DHCPDISCOVER received by master.
  2. only one attempt made, OFFER made, nothing further, no DHCPREQUEST.
  3. failure to boot if two or more clients offered same IP (posted as a separate issue RPI3B+ bootloader possibly doesn't handle dhcp correctly #986 and worked around for now).
  4. Boot fails if USB clients attached (worked around by powering up after booting)
  5. occasional boot issues later on but think they are outside the scope of this issue.

Often a few Pis fail to boot at the same time, each seemingly exhibiting the exact same mode of failure.

We have 2 of these setups (1 as a test rig on my bench, the other on site) and have tried a few Ethernet configurations:

2 x netgear prosafe gs108 or 1 x GS116 - the first option seems more reliable but more testing is needed to be sure.

1 x netgear prosafe gs108 and a ZyXEL GS1900-BHP.

I'll follow up with some tcpdump data.

@pelwell
Contributor

pelwell commented May 8, 2018

Duplicate of #986?

@mangodan2003
Author

mangodan2003 commented May 8, 2018

Well, no. As mentioned above, that one covered a specific case. This one, whilst mentioning that case, covers other modes of failure.

@maxnet

maxnet commented May 10, 2018

One of the odd things about this is that they seem to be more prone to failing to boot at the start of the day

I know someone else also claimed he needed to generate some artificial broadcast traffic for things to work properly:
https://www.raspberrypi.org/blog/piserver/#comment-1375406

Never needed to myself though.
And the typical network has plenty of that already.

@ghollingworth
Contributor

One of the bugs in the bootrom is that there is a fifo in the ethernet controller which doesn't get flushed, so if you only receive a single packet it will not get into the bootrom. So you need to occasionally receive broadcast packets to flush through the others...

I've tried in the past using ping -b on the server to send occasional broadcast packets on the network... Like maxnet says, it depends on your network and what's on there...

Gordon
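
As a rough illustration of the "occasional broadcast packets" idea, here is a minimal Python sketch of an alternative to ping -b that sends small UDP broadcasts at a fixed interval. The function names, the payload, the use of the discard port (9) and the one-second interval are all assumptions made for this example, and nothing in this thread confirms that such traffic is sufficient to flush the FIFO.

```python
import socket
import time


def make_broadcast_socket():
    """Create a UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def send_broadcasts(count, interval=1.0, address="255.255.255.255", port=9,
                    payload=b"pxe-flush", sock=None, sleep=time.sleep):
    """Send `count` UDP broadcast datagrams, pausing `interval` seconds between them.

    Port 9 (discard) and the payload are arbitrary choices for this sketch.
    Returns the number of datagrams sent.
    """
    own_socket = sock is None
    if own_socket:
        sock = make_broadcast_socket()
    sent = 0
    try:
        for i in range(count):
            sock.sendto(payload, (address, port))
            sent += 1
            if i + 1 < count:
                sleep(interval)
    finally:
        if own_socket:
            sock.close()
    return sent
```

Calling send_broadcasts(60) on the server while the clients are powering up would emit one broadcast per second for about a minute.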

@mangodan2003
Author

Yes I've seen the comments re broadcast packets and have done some limited experimenting with a broadcast ping running but it didn't seem to help much if at all.

Is anybody booting multiple clients and not seeing any issues?

@maxnet

maxnet commented May 11, 2018

Is anybody booting multiple clients and not seeing any issues?

You emphasize multiple clients.
Does that mean that you are not seeing all those issues if you let the clients boot after each other, instead of powering up all at once?

Can imagine that to be the case with the IP being assigned to multiple clients problem.
(probably caused by the bootrom cutting too many corners, and starting to use the IP immediately upon receiving the dhcpoffer without accepting the lease properly by dhcprequest. In which case the DHCP server is indeed free to offer it to other clients as well.)
But not for your other issues.

@mangodan2003
Author

We've worked around the IP being assigned to multiple clients issue by setting up reserved IP/MAC address assignment on the DHCP server. But that was only one of several ways in which boot could fail early on.
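
For readers wanting to reproduce the reservation workaround, a minimal Python sketch that turns a MAC-to-IP table into dnsmasq dhcp-host lines might look like the following. The MAC and IP values in the usage note are made up for illustration, and the helper name is hypothetical; only the dhcp-host=<mac>,<ip> line format comes from dnsmasq itself.

```python
import ipaddress
import re
from typing import Dict, List

MAC_PATTERN = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}")


def dhcp_host_lines(reservations: Dict[str, str]) -> List[str]:
    """Build one dnsmasq `dhcp-host=` line per MAC address.

    Raises ValueError on a malformed MAC or IP, or if two MACs share an IP,
    since a duplicated IP would recreate the collision being worked around.
    """
    lines = []
    seen_ips = set()
    for mac, ip in reservations.items():
        mac_norm = mac.strip().lower().replace("-", ":")
        if not MAC_PATTERN.fullmatch(mac_norm):
            raise ValueError("malformed MAC address: %r" % mac)
        ip_norm = str(ipaddress.ip_address(ip))  # raises ValueError if invalid
        if ip_norm in seen_ips:
            raise ValueError("IP address reserved twice: %s" % ip_norm)
        seen_ips.add(ip_norm)
        lines.append("dhcp-host=%s,%s" % (mac_norm, ip_norm))
    return lines
```

For example, dhcp_host_lines({"B8:27:EB:00:00:01": "192.168.1.101"}) yields a single line that can be pasted into the configuration of the dnsmasq instance acting as the main DHCP server.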

Having previously tried and given up with the original RPi 3, I bought 5 RPi 3B+ when they came out to test with. With 4 clients and 1 master on my bench for two weeks I hadn't noticed any problems. We then got 3 more to test on site with. It seemed OK. Having scaled up to 8 clients on site, it became apparent there were still problems. I subsequently bought another 4 to see if I could reproduce the problem on the bench (bringing the number of clients up to 8 also), and found that indeed there are still issues with booting. Whether it's due to the number of clients booting simultaneously, or just that we are more likely to see a failure when we have more of them, I don't know. It's hard to investigate, particularly as a lot of the time it does work, and as mentioned above it seems more likely that there will be a failure when the system has not been booted for some time.

The most frequent failure mode seen now is that a client doesn't boot, having either not tried DHCP at all or only tried once or twice, whereas normally it will try up to 5 times.

When I have a bit more time to study this I'll go back to testing more thoroughly with fewer clients, to see whether the failures really do only occur when booting more than one or a small number of clients.

In the meantime I have worked around this by creating a boot script that initialises an empty microSD card, if one is inserted, and subsequently keeps it in sync. This has the added advantage of better boot times after the initial successful PXE boot.

@maxnet

maxnet commented May 11, 2018

The most frequent failure mode seen now is that a client doesn't boot, having either not tried DHCP at all or only tried once or twice, whereas normally it will try up to 5 times.

And that is also the case when only using your simple unmanaged Netgears, right?

The moment you start adding managed switches to the mix, functionality like the spanning tree protocol can drop packets for up to 30 or so seconds after the network link goes up, before the port state goes to forwarding.
(Normally not a problem when network booting normal computers, as PXE specification requires clients to keep sending DHCP discovers for a minute. But bootrom does not follow formal spec).

@ghollingworth
Contributor

Is this just another timeout problem? Where you have 10 Pis on a single connection, you've got to reply to every device with bootcode.bin within the first 30 seconds or so.

I wonder if there is some rate limiting in the DHCP server which means it won't reply to all the devices?

Might try setting up a bigger network here to understand what has happened and whether this is a problem or whether it's failing with dropped packets or something?

@mangodan2003
Author

I don't think there is any such limiting on the DHCP server - especially as a lot of the time it works - as in all clients boot. It's just not reliable - as in sometimes some clients don't boot.

I do not have any managed switched hubs on the network here, all basic switched hubs. There might be on site. I'll check later.

I have just run into another problem which I'll check existing bugs for. It may not be related, as it occurs later on once Linux is running, but it may be involved in the failures later in boot that I mentioned above. Whilst trying to rsync the system to a microSD card I am frequently seeing hangs where the host becomes unreachable. Having connected a keyboard and display to one of them and reproduced the issue, sometimes there is a kernel oops. The keyboard also no longer responds, so it possibly takes out USB altogether or maybe even the whole OS. Sometimes the keyboard will still accept input and there is no apparent oops. I am trying to repeat this situation to see if it's possible to get Ethernet back when in this state, but I suspect that, as the system is hosted via Ethernet, there will be a limit to what I can do before it hangs trying to reach the server.

I do wonder if whatever causes this bug is either hardware related, and so affecting the bootloader as well, or software related, with the same or similar bugs in both the bootcode and kernel code.

@mangodan2003
Author

I'm now trying to reproduce it just using iperf, with no writes to the SD card. If I cannot, I'll try booting from the SD card and try again.

@maxnet

maxnet commented May 16, 2018

I do not have any managed switched hubs on the network here, all basic switched hubs.

The Zyxel model you mentioned is advertised as managed.
But if you have same problem when only using the Netgears, that is not the problem.

BTW I also can imagine it may take longer for some DHCP servers to reply if multiple requests come in, as DHCP servers tend to do ping checks to see if the IP they intend to hand out is really free, and some only do one check at a time.
You may want to experiment with a no-ping setting if the DHCP server used has one.

@mangodan2003
Author

If it helps I'm using dnsmasq as the dhcp server both as the main DHCP server for the LAN locally and on site and as the dhcp-proxy on the master RPi s in both locations. This is the only software acting as DHCP server for any of these tests.

Apologies re the Zyxel. I know it's a PoE switch, I had not realised it was managed. I will double check later and eliminate it from the test setup.

@mangodan2003
Author

Yes, as pointed out, the Zyxel is managed. I've removed it from my test rig, leaving just a Netgear ProSafe GS108 and a GS605 now. So far no boot problems encountered, but it's too early to tell. As previously mentioned, the problem seems more likely to occur when considerable time (hours) has elapsed since the previous boot.
Also on site (where the problems seem more frequent) there are no managed switched hubs.

@maxnet

maxnet commented May 18, 2018

If it helps I'm using dnsmasq as the dhcp server both as the main DHCP server for the LAN locally and on site and as the dhcp-proxy on the master RPi s in both locations.

dnsmasq skips ping checks if you have static IP assignments, so then that's not the problem either.

==

Another point of concern is the server side of things.
You are network booting 8 client Pi's off a single server Pi which has an amount of available network bandwidth that is kinda little for a server.

If bandwidth is maxed out it will start dropping packets.
And I am not sure how well the TFTP client code of the bootrom deals with that, and retries properly.
If I simulate 0.1% packet loss on my own test server with the command tc qdisc add dev eth0 root netem loss 0.1%, my Pi 3+ fails to boot as well, so I am guessing not so good...

(I recall dnsmasq only prints to log when a TFTP transfer completes, and not when one starts.
So if you only see one DHCP discover/offer sequence in the log, but nothing about TFTP it can also mean that it did start a TFTP transfer for bootcode.bin but stalled somewhere.
Can only see what is really going on by sniffing packets)

@mangodan2003
Author

Yes, I'm aware there is potentially a bottleneck with the initial UDP-based boot process. However, what seems curious to me is that most of the time this is not a problem. For example, I have logged 19 boots today and all passed the initial boot stage; on 2 occasions hosts failed later on in boot or after having booted (they seemed to have locked up, or Ethernet locked up, but this is a separate topic).

I can be reasonably confident that if I leave it all on or off now and don't try booting/power cycling again till tomorrow (or more likely monday) morning that one or more of them will give up booting very early on having not even obtained an IP address and not tried the full 5 times to do so. This is what I find strange and don't understand.
I'll likely then be able to boot them over and over without a problem.

@maxnet

maxnet commented May 18, 2018

having not even obtained an IP address and not tried the full 5 times to do so.

How did you conclude it did not obtain an IP-address?
Did you sniff packets and verify there was not any traffic after the first discover/offer sequence?

Or are you drawing that conclusion by looking at dnsmasq logs?
Those can be a bit hard to interpret in this case.
The bootrom never properly claims the offered address with a DHCPREQUEST (it just starts using the address), so there are no log entries for that.
And there are no log entries for the TFTP traffic if the transfer did not complete.

@mangodan2003
Author

I'm using dhcpdump, however having just checked again I am now not seeing what I had or thought I had been seeing.. though as you say I had previously been relying on the dnsmasq logs (and via syslog) so that may well have led to some confusion. I will check again with dhcpdump when I next boot it.

I thought I had been seeing a DHCPDISCOVER,DHCPOFFER(from main server and from rpi master),DHCPREQUEST,DHCPACK.. and sometimes DHCPDISCOVER a number (up to 5) of times before boot starts. but sometimes when boot fails early on I would only have seen one or 2 DHCPDISCOVER,DHCPOFFERs and nothing further.. I was expecting to see a DHCPREQUEST,DHCPACK but if the bootrom doesn't do these then i see what you are saying.

So it seems the DHCPREQUEST,DHCPACK parts don't come till later when the OS dhcp client runs. I'm feeling somewhat tired and confused now, I could have sworn the DHCPREQUEST,DHCPACK had previously been happening too soon to be from the OS. I've also just run rpi-update in the hope of alleviating the Ethernet issues later, but I guess that can't have changed anything as all the early boot stuff is embedded in ROM? I have already noticed some delays and inaccuracies in the timing of logs via syslog, so I'll refrain from using syslog at all now, as those logs seem to be more confusing than helpful with this, and rely solely on dhcpdump / tcpdump.

@maxnet

maxnet commented May 18, 2018

so it seems the ,DHCPREQUEST,DHCPACK parts don't come till later when the OS dhcp client runs.

Yes, DHCP is done 3 times.

  1. by bootrom
  2. pretty soon after that by Linux kernel so that it can mount nfsroot
  3. by dhcpcd under Linux, which is then responsible for renewing it so it does not expire

I think you got confused by the acks from 2 and 3.
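
When working from raw captures rather than dnsmasq logs, the message type of each DHCP packet can be read from option 53. The following Python sketch assumes you already have the UDP payload of a DHCP packet as bytes (for instance extracted from a tcpdump capture by other means); the function name is made up for this example. Going by the description above, packets from the bootrom phase would show up as DHCPDISCOVER only, while DHCPREQUEST and DHCPACK would belong to the kernel or dhcpcd phases.

```python
from typing import Optional

MAGIC_COOKIE = bytes([0x63, 0x82, 0x53, 0x63])  # marks the start of DHCP options
OPTIONS_OFFSET = 240  # 236-byte BOOTP header followed by the 4-byte cookie

MESSAGE_TYPES = {
    1: "DHCPDISCOVER", 2: "DHCPOFFER", 3: "DHCPREQUEST", 4: "DHCPDECLINE",
    5: "DHCPACK", 6: "DHCPNAK", 7: "DHCPRELEASE", 8: "DHCPINFORM",
}


def dhcp_message_type(payload: bytes) -> Optional[str]:
    """Return the DHCP message type name from a UDP payload, or None."""
    if len(payload) < OPTIONS_OFFSET or payload[236:240] != MAGIC_COOKIE:
        return None
    i = OPTIONS_OFFSET
    while i < len(payload):
        code = payload[i]
        if code == 0:        # pad option: a single byte with no length field
            i += 1
            continue
        if code == 255:      # end option
            break
        if i + 1 >= len(payload):
            break            # truncated option header
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and length == 1 and len(value) == 1:
            return MESSAGE_TYPES.get(value[0], "UNKNOWN(%d)" % value[0])
        i += 2 + length
    return None
```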

@ghollingworth
Contributor

ghollingworth commented May 19, 2018 via email

@maxnet

maxnet commented May 19, 2018

Attached a packet capture (tftp traffic only) with packet loss as well as a normal network boot without loss for comparison.
Can view the files with wireshark.

Packet loss is while downloading start.elf in block 95
Seems dnsmasq tries to resend that block 2 seconds later because it does not get acknowledged, but bootcode does not seem to care anymore about it, and already continued to download fixup.dat instead after only 1 second.

This is with a Pi 3+.
It fails with a 4-blink LED pattern.

packetcaptures-tftp-only.zip
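
For contrast with the behaviour seen in the capture, here is a minimal Python sketch of how a lock-step TFTP receiver could tolerate a lost packet: on a timeout it repeats its last ACK instead of abandoning the file, and on a retransmitted block it re-ACKs without storing the data twice. This is not the bootcode's implementation; the function name, the event model (None standing for a timeout) and the retry limit are assumptions for illustration.

```python
def receive_file(events, block_size=512, max_timeouts=5):
    """Simulate a TFTP receiver over a sequence of events.

    Each event is either None (a receive timeout) or a (block_number, data)
    tuple. Returns (file_bytes, acks_sent, complete).
    """
    expected = 1          # TFTP data blocks are numbered from 1
    chunks = []
    acks = []
    timeouts = 0
    for event in events:
        if event is None:
            timeouts += 1
            if timeouts > max_timeouts:
                break     # give up only after repeated timeouts
            if expected > 1:
                acks.append(expected - 1)  # repeat the last ACK
            continue
        block, data = event
        if block == expected:
            timeouts = 0
            chunks.append(data)
            acks.append(block)
            expected += 1
            if len(data) < block_size:
                # A short block marks the end of the transfer.
                return b"".join(chunks), acks, True
        elif block == expected - 1:
            acks.append(block)  # retransmitted block: ACK again, keep one copy
        # Any other block number is ignored in this sketch.
    return b"".join(chunks), acks, False
```

The design point is simply that a timeout or a duplicate is treated as a reason to resynchronise with the server, not as the end of the transfer.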

@ghollingworth
Contributor

Ah OK, it doesn't handle dropping an acknowledgement packet, that'll be the problem. Although this is the same as Pi 3...

I'm guessing the more Pis you attach to a network the higher the probability it'll fail
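
That guess can be put into rough numbers. The Python sketch below assumes, as a simplification, that every DATA and ACK packet is lost independently with the same probability and that any single loss aborts the boot; neither assumption is established in this thread beyond the one capture above. The only protocol fact relied on is that a TFTP transfer with the default 512-byte block size needs floor(size / 512) + 1 blocks, because the final block must be shorter than 512 bytes. The function names are invented for the example.

```python
def transfer_failure_probability(file_bytes, loss_rate, block_size=512):
    """Chance that at least one packet of a TFTP transfer is lost.

    Counts one DATA and one ACK packet per block and treats losses as
    independent events (an assumption, not a measurement).
    """
    blocks = file_bytes // block_size + 1  # final block is always a short one
    packets = 2 * blocks
    return 1.0 - (1.0 - loss_rate) ** packets


def any_client_fails(per_client_failure, clients):
    """Chance that at least one of `clients` independent boots fails."""
    return 1.0 - (1.0 - per_client_failure) ** clients
```

Feeding in the size of start.elf for a given firmware release and a measured loss rate gives a per-client estimate, which any_client_fails then scales to the number of Pis booting at once.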

@maxnet

maxnet commented Jun 29, 2018 via email

@JamesH65
Contributor

JamesH65 commented Jan 9, 2019

Anything to add from any of the participants? Is this closeable?

@JamesH65 JamesH65 added Waiting for response from OP Waiting for internal response Bootcode/Netboot/PXE Specific flag for issues with the bootcode and network booting labels Jan 9, 2019
@maxnet

maxnet commented Jan 10, 2019

Anything to add from any of the participants? Is this closeable?

While it should now deal better with packet loss (due to pelwell's fix in the linked issue), I doubt the other issue mentioned: that bootcode starts using the IP-address without claiming it (meaning DHCP server is free to hand out same IP-address to other clients booting at same time) ever got fixed.

@ghollingworth
Contributor

That's true, although DHCP servers won't hand out the same IP address to two different requests within a small period of time. They temporarily reserve the address while waiting for the acknowledgement. It will then get ACKed when the Linux kernel boots.

@maxnet

maxnet commented Jan 10, 2019

That's true although DHCP servers won't hand out the same IP address to two different requests within a small period of time.

Negative.
They are not required to.

I recall there was a thread in the dnsmasq mailing list about this topic some time ago, but will need to look it up.

@maxnet

maxnet commented Jan 10, 2019

By default dnsmasq operates in a mode in which it does some hashing magic on the MAC address to attempt to give a client the same IP over time, if it is still available.
No reservations between OFFER and REQUEST are done in that mode.
If multiple clients boot at the same time, the only reason why it works most of the time is that you are having the dumb luck that their MAC addresses hash to different IPs.
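
How much luck is involved can be estimated with a birthday-problem calculation. The Python sketch below assumes an idealised hash that maps each MAC address uniformly and independently onto the pool; dnsmasq's real hash and its handling of addresses that are already leased are not modelled, and the function name is invented for this example.

```python
def collision_probability(clients, pool_size):
    """Chance that at least two of `clients` hash to the same address.

    Assumes each client lands on one of `pool_size` addresses uniformly at
    random, independently of the others (an idealisation).
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    if clients > pool_size:
        return 1.0  # more clients than addresses: a collision is certain
    p_all_distinct = 1.0
    for i in range(clients):
        p_all_distinct *= (pool_size - i) / pool_size
    return 1.0 - p_all_distinct
```

With 8 clients the result stays small for a large pool and grows quickly as the DHCP range shrinks, which matches the observation that simultaneous boots usually, but not always, get distinct offers.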

It does have a mode to do sequential IP assignment instead:

--dhcp-sequential-ip

Dnsmasq is designed to choose IP addresses for DHCP clients using a hash of the client's MAC address. This normally allows a client's address to remain stable long-term, even if the client sometimes allows its DHCP lease to expire. In this default mode IP addresses are distributed pseudo-randomly over the entire available address range. There are sometimes circumstances (typically server deployment) where it is more convenient to have IP addresses allocated sequentially, starting from the lowest available address, and setting this flag enables this mode. Note that in the sequential mode, clients which allow a lease to expire are much more likely to move IP address; for this reason it should not be generally used.

It does do a little bit of soft reservation in that mode, but it is not the default, and the reservation is not in name of the full MAC address either, but on a hash as well.

==

[Dnsmasq-discuss] multiple offers with same IP to different MAC addresses
http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2016q1/010353.html

dnsmasq author:

The log you include is exactly how it's supposed to work in the case of
a hash collision: whichever client REQUESTs the address first gets it,
and the other one gets a NAK and has to try again.

==

So you should be prepared for the situation that the address can be handed out to multiple clients.
If you have not claimed it by DHCPREQUEST, and received an ACK (and not a NAK), it is not yours to use.
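
The rule stated above can be written down as a small state check. This Python sketch walks through the DHCP messages of one client exchange in order and only reports the address as usable after a DHCPREQUEST has been answered with a DHCPACK. The function and state names are made up for illustration, and the model is deliberately reduced to the four-message handshake.

```python
def may_use_offered_address(exchange):
    """Return True only if the exchange ends with the address properly claimed.

    `exchange` is an ordered list of DHCP message type names seen for one
    client, e.g. ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"].
    """
    state = "INIT"
    for msg in exchange:
        if state == "INIT" and msg == "DHCPDISCOVER":
            state = "SELECTING"
        elif state == "SELECTING" and msg == "DHCPOFFER":
            state = "OFFERED"      # an offer alone grants nothing
        elif state == "OFFERED" and msg == "DHCPREQUEST":
            state = "REQUESTING"
        elif state == "REQUESTING" and msg == "DHCPACK":
            return True            # the server has confirmed the lease
        elif state == "REQUESTING" and msg == "DHCPNAK":
            return False           # someone else got it: start over
        # Extra offers (e.g. from a proxy DHCP server) are ignored here.
    return False
```

A discover/offer sequence with nothing after it, which is what the bootrom is described as doing in this thread, returns False.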

@ghollingworth
Contributor

Fair enough...

Ah well, good to know... But unfixable. For any other PXE boot issues, it's worth rpi-updating and trying again, a couple of issues have been fixed... The first was a problem with TFTP across subnets, the other was a rebooting issue after a crash (or occasionally a really short power cycle) which would fail with a flashing LED.
