PXE RPi3 B+ - TFTP in different subnet (similar to #983 and #670) #1078

Closed
antonio-c-mariani opened this issue Nov 30, 2018 · 17 comments
Labels
Bootcode/Netboot/PXE Specific flag for issues with the bootcode and network booting

Comments

@antonio-c-mariani

Our campus network has many subnets. It has two centralized servers, one for DHCP and another for PXE+TFTP+NFS. We have been using this architecture for many years to remotely boot Linux clients. Now we intend to use the same infrastructure to load the kernel and Raspbian on the RPI3 B+. We successfully configured the servers and were able to load the kernel and Raspbian, but only because our Cisco router is configured to use proxy ARP.

The image below shows that the RPI3 B+ makes an ARP request for the subnet router (10.1.1.1) and then correctly starts loading the "bootcode.bin" file from the remote TFTP server (192.168.56.101).
[Image: pxe1]

After the "bootcode.bin" file is loaded, another ARP request is made, but this time for the TFTP server (192.168.56.101), as if it were on the same subnet as the client (RPI3 B+).
[Image: pxe2]

Thanks to proxy ARP on the Cisco router, the RPI3 B+ can load the remaining files and boot normally.
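For readers unfamiliar with proxy ARP, the router's decision can be modeled roughly as below. This is a simplified sketch, not Cisco IOS behavior: it assumes the router only proxies for directly connected networks, the interface names and MAC addresses are hypothetical, and the server-side subnet 192.168.56.0/24 is an assumption based on the TFTP server address in this thread.

```python
import ipaddress
from typing import Optional

# Hypothetical router interfaces: name -> (attached network, router MAC on it).
# 192.168.56.0/24 for the server side is an assumption.
INTERFACES = {
    "client_lan": (ipaddress.IPv4Network("10.1.1.0/24"), "aa:bb:cc:00:00:01"),
    "server_lan": (ipaddress.IPv4Network("192.168.56.0/24"), "aa:bb:cc:00:00:02"),
}


def proxy_arp_reply(ingress: str, target_ip: str) -> Optional[str]:
    """Return the MAC the router answers an ARP request with, or None."""
    ingress_net, ingress_mac = INTERFACES[ingress]
    target = ipaddress.IPv4Address(target_ip)
    if target in ingress_net:
        # Target is on the requester's own link: the real host answers.
        return None
    for name, (net, _mac) in INTERFACES.items():
        if name != ingress and target in net:
            # Target lives behind another interface: answer with the router's
            # own MAC so the client unknowingly sends its frames to the router.
            return ingress_mac
    # Simplification: no directly connected network matches, so stay silent.
    return None


# The Pi's second ARP request (for 192.168.56.101) is answered by the router.
print(proxy_arp_reply("client_lan", "192.168.56.101"))  # aa:bb:cc:00:00:01
```

With proxy ARP disabled, that second request would go unanswered, which matches the report that booting only works because the router has proxy ARP enabled.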

Issues #983 and #670 describe the same problem on the RPI3 B. Does this problem remain in the B+ version? Is it related to the "bootcode.bin" file? Is it possible to fix it?

@Flole998

Flole998 commented Dec 5, 2018

Yes, it is related to bootcode.bin; yes, it is still there in the B+; yes, I am also having it; and yes, it is possible to fix it (if someone wants to do it). I'm going to return quite a lot of devices this week, so this will definitely be noticed somewhere. I am also having issues when a kernel panic happens (I have the watchdog enabled, though I'm not sure that makes a difference): the Pi stops at loading start.elf and hangs there forever. If you could confirm this with the newest bootcode, it would help establish that this is a separate issue in the netboot functionality.

My experiences are described at the bottom of #859 if you are interested. Someone there with an older bootcode does not have the start.elf issue.

@antonio-c-mariani
Author

We are using the "bootcode.bin" that comes with Raspbian 2018-11-13. It seems to be the same as the one in the master branch on GitHub (from a month ago).

@Flole998

Flole998 commented Dec 7, 2018

Same here, but we need to distinguish between two issues: the TFTP server on a different subnet, which I am having with bootcode.bin, and the "stuck at start.elf" issue after a kernel panic, which only I seem to have. It would be awesome if you could check that as well: provoke a kernel panic and see whether the Pi comes up fine again or gets stuck after loading start.elf.

@antonio-c-mariani
Author

I would appreciate it if we could focus on the first issue here (the ARP request).

Do you think there is any other information needed to identify and fix the problem?

@Flole998

The problem is already identified (at least to me it's clear what's happening here), so all that's needed is someone with access to the sources to fix this.

@antonio-c-mariani
Author

Recently ghollingworth said (#859):

... the router problem was fixed with the 3B+

Well if the gateway is advertised in the DHCP reply then it should use it... Obviously it'll still do an ARP for the gateway itself but shouldn't ARP for the TFTP server, just go through the gateway.

In our case the DHCP server (dnsmasq) configuration includes:

dhcp-range=10.1.1.2,10.1.1.100,255.255.255.0
dhcp-boot="bootcode.bin","192.168.56.101",192.168.56.101
dhcp-option=3,10.1.1.1

Even so, the RPI3 B+ looks for the TFTP server on the local network instead of just going through the gateway, as shown in the pictures above. Are we doing something wrong? We would appreciate any answer.
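The behavior ghollingworth describes is ordinary IPv4 next-hop selection. As a minimal sketch (Python, standard `ipaddress` module), a client holding a lease from the dnsmasq configuration above should ARP for the gateway from option 3, not for the TFTP server, because 192.168.56.101 lies outside 10.1.1.0/24. The function name and the client address 10.1.1.50 are hypothetical; the address is just one lease from the configured range.

```python
import ipaddress


def arp_target(client_ip: str, netmask: str, gateway: str, dest: str) -> str:
    """Return the IPv4 address a client should ARP for to reach `dest`."""
    # Derive the client's on-link network from its address and netmask.
    subnet = ipaddress.IPv4Network(f"{client_ip}/{netmask}", strict=False)
    if ipaddress.IPv4Address(dest) in subnet:
        # Destination is on the local link: resolve it directly.
        return dest
    # Destination is off-link: resolve the default gateway (DHCP option 3) instead.
    return gateway


# Values from the dnsmasq configuration; 10.1.1.50 is a hypothetical lease.
print(arp_target("10.1.1.50", "255.255.255.0", "10.1.1.1", "192.168.56.101"))  # 10.1.1.1
```

The second capture in this issue shows the Pi taking the first branch for 192.168.56.101, which is the behavior being reported as a bug.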

@Flole998

@ghollingworth was talking about the boot ROM; the bootcode itself isn't fixed yet. So no, you're not doing anything wrong (apart from maybe not having returned the devices yet). This is still a bug and still not fixed, and nobody has said they have started working on it, so probably nobody is looking into it yet.

@antonio-c-mariani
Author

Maybe the issue is a bit trickier. I booted an RPI3 B+ from an SD card containing only the boot/bootcode.bin file. The image below shows that after loading bootcode.bin, the RPI3 B+ makes an ARP request for the subnet router (10.1.1.1) and then correctly starts loading the remaining files from the remote TFTP server (192.168.56.101).

[Image: rpi3]

So bootcode.bin seems to be working fine. But then, what is going wrong?

I'd really appreciate hearing some comments from the Raspberry Pi staff. We depend on this to decide our next step.

@ghollingworth
Contributor

I've been trying to reproduce this, but I'm having trouble setting up a suitable network... I'm now done for Christmas, so I won't be able to get back to it until the new year.

@Flole998

Great to hear that you are working on it, for everyone who's still suffering from this. The network setup shouldn't be too complex; it's even possible with two Raspberry Pis (plus one to test the bootcode). Anyway, happy holidays, and hear from you next year!

@antonio-c-mariani
Author

I set up a testing environment using a laptop configured as a DHCP server (dnsmasq - 10.1.1.1/24) and a virtual machine (VirtualBox) as a TFTP server (tftp-hpa - 192.168.56.101).

I wish you all the best. Thanks.

@ghollingworth
Contributor

So after learning all about iptables, nf_nat_tftp and nf_conntrack_tftp I've finally got it reproduced on my desk!

Ten minutes later I've got a fix, although it requires re-sending the DHCP request/reply, which may make the process a little slower...

Can you test this? Then I'll push the change.

bootcode.zip

@Flole998

Flole998 commented Jan 3, 2019

Just checked it on two RPI3 B+ boards and it works. Unfortunately the start.elf issue is still there: after a watchdog reset (not sure if that is the trigger; a plain reboot did the same), bootcode.bin is downloaded, bootsig.bin is NAKed, start.elf is NAKed, and the Pi gets stuck without requesting anything else.

@ghollingworth
Contributor

OK, I've been able to reproduce the problem and am trying to understand it. Can you create a separate issue for it? Then we can make sure people see this one as related only to the subnet booting issue.

I'll close this once I've submitted a patch and it's been pulled...

@antonio-c-mariani
Author

I've tested the new bootcode.bin and it seems OK.
Thanks.

@Flole998

Flole998 commented Jan 4, 2019

By the way: UART output was still enabled, which stopped Kodi from working at all (probably because the GPU debug output slows everything down). That should be disabled before this gets pushed.

popcornmix added a commit that referenced this issue Jan 9, 2019
kernel: ASoC: Add support for AudioSense-Pi add-on soundcard
See: raspberrypi/linux#2793

kernel: USB Audio: generic DSD detection for XMOS-based implemtations
See: raspberrypi/linux#2790

firmware: Added ability to have an third transpose buffer
See: #837

firmware: isp: Correct the conversion tables changed in adding the gamma block
See: #1084

firmware: raspberrypi_full variant: Drop unused Camplus sw stages

bootcode: Reset WiFi and BT devices before resetting the expander
See: #1088

bootcode: Fix Ethernet boot on a different subnet
See: #1078
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Jan 9, 2019
kernel: ASoC: Add support for AudioSense-Pi add-on soundcard
See: raspberrypi/linux#2793

kernel: USB Audio: generic DSD detection for XMOS-based implemtations
See: raspberrypi/linux#2790

firmware: Added ability to have an third transpose buffer
See: raspberrypi/firmware#837

firmware: isp: Correct the conversion tables changed in adding the gamma block
See: raspberrypi/firmware#1084

firmware: raspberrypi_full variant: Drop unused Camplus sw stages

bootcode: Reset WiFi and BT devices before resetting the expander
See: raspberrypi/firmware#1088

bootcode: Fix Ethernet boot on a different subnet
See: raspberrypi/firmware#1078
@popcornmix
Contributor

Potential fix now in latest rpi-update firmware

@JamesH65 JamesH65 added the Bootcode/Netboot/PXE Specific flag for issues with the bootcode and network booting label Jan 10, 2019
5 participants