LTE usage broken with 20.7-RC1 (HBSD/FBSD 12.1) #67

Closed
tk-wfischer opened this issue Jul 22, 2020 · 33 comments
Labels: upstream (Third party issue)

@tk-wfischer

Important notices
Before you add a new report, we ask you kindly to acknowledge the following:

[X] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md

[X] I have searched the existing issues and I'm convinced that mine is new.

Describe the bug
When configuring an LTE connection as described in https://docs.opnsense.org/manual/how-tos/cellular.html, the system reboots shortly after the LTE connection has been enabled. About 2 seconds after enabling the LTE connection, a lot of output scrolls through the console, and after a while the firewall reboots. On the next login the dashboard shows "A problem was detected. Click here for more information." (I clicked it and submitted the crash report several times.)
Info: Full report of my tests can be found on the forum here: https://forum.opnsense.org/index.php?topic=17417.0

LTE worked without issues with OPNsense 18.7, 19.1, 19.7 and 20.1. Now, 20.7 causes the following issue:

To Reproduce
Steps to reproduce the behavior:

  1. Install OPNsense 20.7beta or 20.7-RC1
  2. Configure the modem via "Interfaces -> Point-to-Point -> Devices" as described in https://www.thomas-krenn.com/de/wiki/OPNsense_LTE_Verbindung#Konfiguration_Modem. For a Quectel modem use /dev/cuaU0.2, for a Huawei ME909u-521 use /dev/cuaU0.0
  3. Switch to "Interfaces -> Assignments" and configure for "WAN" the network port "ppp0". Click "Save"
  4. Immediately after that, the console shows the following output (in bold): "WARNING: attempt to domain_add(netgraph) after domainfinalize". This is expected: the same warning is also displayed with OPNsense 20.1, 19.7, ... (and with those older versions LTE works without any issues).
  5. After 1-2 seconds a lot of output scrolls through the console, and after a while the firewall reboots.

Expected behavior
When an LTE connection is configured, the system should keep running and not reboot.

Relevant log files
ppps.log
system.log

Additional context
The interesting thing is that configuring the LTE connection on the command line does not cause this issue; it works without problems. To reproduce:

  1. Install OPNsense 20.7-RC1
  2. Activate SSH access
  3. Create /var/etc/mpd_opt1-wernertest.conf with the contents of the following file (adjust "set modem device" and APN according to your modem / LTE provider): mpd_opt1-wernertest.conf.txt
  4. Execute:
# cp -a /usr/local/opnsense/scripts/interfaces/mpd.script /var/etc/
# /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_opt1-wernertest.conf -p /var/run/ppp_opt1-wernertest.pid -s ppp pppclient

Then it works.

Here are the log files of this manual connection process:
ppps-manual.log
system-manual.log

When I compare the entries in system.log when configuring via the web interface with those when configuring via the command line, I see the following additional entries (I think they trigger the issue):
system-additional-when-configuring-via-web.txt
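A minimal sketch of how such a comparison could be reproduced, assuming both logs were saved as plain BSD-syslog text with a `Mon DD HH:MM:SS host` prefix (as in the excerpts quoted in this thread). The function names are hypothetical; the idea is simply to strip the timestamp and hostname so identical messages compare equal, then list messages that only occur in the web-interface log.

```python
import re

# Assumed syslog prefix, e.g. "Jul 23 05:29:39 OPNsense " (month, day, time, host).
SYSLOG_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ ")


def normalize(line: str) -> str:
    """Drop the timestamp/host prefix so the same message from two runs matches."""
    return SYSLOG_PREFIX.sub("", line.rstrip("\n"))


def additional_entries(web_lines, manual_lines):
    """Return messages seen in the web-UI log but not in the manual log, in order."""
    manual = {normalize(line) for line in manual_lines}
    seen = set()
    extra = []
    for line in web_lines:
        msg = normalize(line)
        if msg and msg not in manual and msg not in seen:
            seen.add(msg)
            extra.append(msg)
    return extra
```

Usage, assuming hypothetical copies of the two logs on disk: `additional_entries(open("system.log"), open("system-manual.log"))`. Set membership keeps the comparison order-independent, which matters because the two runs happen at different times.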

Environment
OPNsense 20.7.r1-amd64
Intel(R) Celeron(R) CPU N3160 @ 1.60GHz (4 cores)
LTE Modem: "Quectel EG25-G" and "Huawei ME909u-521"

@tniedermeier

This issue also affects OPNsense 20.1 installations with activated LTE connection if you do an upgrade to 20.7 RC1.
I changed the release type to development and after a reboot the version was 20.7.b_224.

Then I did the necessary upgrade steps via console:
opnsense-update -t opnsense-devel
opnsense-code core
cd /usr/core && make upgrade

After a reboot I went back to the web interface and unlocked the release upgrade.
This process ran fine and the firewall rebooted again.
Then it processed the upgrade and, after a couple of reboots, successfully upgraded to 20.7-RC1.
It booted the upgraded installation and, because the LTE interface was still configured and enabled, it crashed immediately at the end of the boot process.

So if you rely on an LTE connection as your WAN interface, you shouldn't upgrade OPNsense 20.1 to 20.7 until this issue is fixed.

@mimugmail
Member

Do you have a screenshot of the crash dump? Usually it is only shown on the console/monitor ...

@tniedermeier

I don't just have a screenshot, I made a video... I hope it helps.
Link to the video: https://files.thomas-krenn.com/index.php/s/JbpN95WTZRaQMPR

@tk-wfischer
Author

Here you can find the serial console output of a test I did with OPNsense 20.7.b_108:
console-test2.txt

In line 297 it starts with a warning when enabling the LTE interface, but as mentioned above this warning causes no problem, because we see it with OPNsense 20.1 too (and there we have no crash):
WARNING: attempt to domain_add(netgraph) after domainfinalize()

So the interesting part of the output starts at line 300:
Fatal trap 12: page fault while in kernel mode
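When comparing many serial console captures, a small script can locate these two markers automatically. This is a minimal sketch, not an OPNsense tool: the function name is hypothetical, and the only assumption is that the capture is plain text containing the two strings quoted above (the benign netgraph warning and the "Fatal trap" panic line).

```python
# Strings quoted in this report: the netgraph warning is benign (it also appears
# on 20.1, where LTE works), while the "Fatal trap" line marks the kernel panic.
BENIGN_WARNING = "attempt to domain_add(netgraph) after domainfinalize"
PANIC_MARKER = "Fatal trap"


def find_panic(console_lines):
    """Return 1-based line numbers (warning_line, panic_line); None if not found."""
    warning_line = panic_line = None
    for number, line in enumerate(console_lines, start=1):
        if warning_line is None and BENIGN_WARNING in line:
            warning_line = number
        if panic_line is None and PANIC_MARKER in line:
            panic_line = number
    return warning_line, panic_line


# Example: a short, made-up excerpt shaped like the capture described above.
capture = [
    "WARNING: attempt to domain_add(netgraph) after domainfinalize()",
    "",
    "",
    "Fatal trap 12: page fault while in kernel mode",
]
print(find_panic(capture))  # (1, 4)
```

A capture where the warning is present but no panic line follows (as on 20.1, or with the patched kernel) yields `(warning_line, None)`.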

I'll re-do the tests tomorrow morning with OPNsense 20.7-RC1 - but the output should be roughly the same.

@mimugmail
Member

@fichtner do you think you can push a test kernel out?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242406

@mimugmail
Member

@tk-wfischer Ah, just saw the patch .. I'm building an image for you to test.

@mimugmail
Member

@tk-wfischer @tniedermeier can you check this usb image (VGA): https://cloud.leute.server.de/index.php/s/tRXDanaS5vQWshp

@tk-wfischer
Author

@mimugmail Wow - thank you so much. I'm downloading the image and will immediately start installing & testing. I should have a first feedback within 30-60 minutes.

@tk-wfischer
Author

🥇 🥇 🥇 @mimugmail It seems you are now our new super-hero 🥇 🥇 🥇

In my current test system with a LES v3 and a Huawei ME909u-521 (device cuaUx.0), configuring it via the web interface worked.

system.log shows me the following when I establish the connection:

Jul 23 05:29:39 OPNsense kernel: WARNING: attempt to domain_add(netgraph) after domainfinalize()
Jul 23 05:29:39 OPNsense kernel: ng0: changing name to 'ppp0'
Jul 23 05:29:41 OPNsense kernel: pflog0: promiscuous mode disabled
Jul 23 05:29:41 OPNsense kernel: pflog0: promiscuous mode enabled
Jul 23 05:29:43 OPNsense kernel: pflog0: promiscuous mode disabled
Jul 23 05:29:43 OPNsense kernel: pflog0: promiscuous mode enabled

ppps.log shows no entries, I'll debug further why I see nothing here.

But the very good news is this:

root@OPNsense:~ # ifconfig ppp0
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
	inet 10.197.215.156 --> 10.64.64.0 netmask 0xffffffff
	inet6 fe80::230:18ff:fe06:e4d6%ppp0 prefixlen 64 scopeid 0x7
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
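The check above (link flagged UP and RUNNING, with an IPv4 address assigned) can be scripted for repeated tests. This is a minimal sketch under the assumption that the output has the FreeBSD `ifconfig` layout shown above; the helper name is hypothetical.

```python
import re


def ppp_link_status(ifconfig_output: str):
    """Parse `ifconfig ppp0` text: return (is_up_and_running, local_ipv4 or None)."""
    flags_match = re.search(r"flags=[0-9a-f]+<([^>]*)>", ifconfig_output)
    flags = set(flags_match.group(1).split(",")) if flags_match else set()
    # "inet " with a trailing space, so "inet6" lines are not matched.
    inet_match = re.search(r"^\s*inet (\d+\.\d+\.\d+\.\d+)", ifconfig_output, re.MULTILINE)
    local_ip = inet_match.group(1) if inet_match else None
    return {"UP", "RUNNING"} <= flags, local_ip
```

On the firewall, the text could come from `subprocess.run(["ifconfig", "ppp0"], capture_output=True, text=True).stdout`; for the output above the helper would report the link as up with local address 10.197.215.156.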

I'll do further testing now, but at first glance it seems the patch and your test image resolve the issue.

@fichtner @AdSchellevis @jschellevis Could you include this patch in the final 20.7 release? (As mentioned, I'll do further in-depth testing now, and I should have definitive test results for all the different modems by 11:00am today.)

@mimugmail
Member

Good news! I'll report upstream to FreeBSD ...

@fichtner
Member

Sure, I'll take it for 20.7, but I have to ask you to retest later today just to be sure.

@fichtner fichtner self-assigned this Jul 23, 2020
@fichtner fichtner pinned this issue Jul 23, 2020
@fichtner fichtner unpinned this issue Jul 23, 2020
@fichtner fichtner transferred this issue from opnsense/core Jul 23, 2020
@fichtner fichtner added the upstream Third party issue label Jul 23, 2020
@fichtner fichtner changed the title LTE usage broken with 20.7-RC1 LTE usage broken with 20.7-RC1 (HBSD/FBSD 12.1) Jul 23, 2020
@fichtner
Member

In any case thanks to all for figuring out the solution! 👍

@tk-wfischer
Author

@mimugmail As mentioned, I'll do further re-testing with the other modem brand as well, so we ensure that both Quectel and Huawei work. I also want to figure out why ppps.log is empty. I'm testing and will give further feedback by 11:00am. Maybe you can wait until 11:00am before reporting upstream to FreeBSD, so that we are 100% sure everything is fine.

@fichtner Thank you for including it in 20.7. Sure, we can do further tests: @tniedermeier and I are doing the LTE testing, so just let us know when there is a new image to test.

@tk-wfischer
Author

Again good news: I have now done a test with a LES compact 4L with the Quectel modem:

  1. Installed OPNsense using https://cloud.leute.server.de/index.php/s/tRXDanaS5vQWshp
  2. Activated SSH
  3. Configured LTE as described in https://www.thomas-krenn.com/de/wiki/OPNsense_LTE_Verbindung (I created the Point-to-Point connection and assigned ppp0 to the WAN interface)

Everything went fine, I also got entries in ppps.log:

So 👍 for Quectel modems, too. The patch really fixes the issue.

I keep on debugging why ppps.log was empty in my first test. I'll let you know once I have more details on that.

@fichtner
Member

On 20.7.r1 try this....

# opnsense-update -kr 20.7-lte
# opnsense-shell reboot

Cheers,
Franco

@tniedermeier

tniedermeier commented Jul 23, 2020

I installed 20.7.r1, issued the commands and configured the LTE connection in the web interface; it looks fine.
Great job, thank you so much @fichtner and @mimugmail!

Here is my console output:

root@OPNsense:~ # opnsense-update -kr 20.7-lte
Fetching kernel-20.7-lte-amd64.txz: ........... done
!!!!!!!!!!!! ATTENTION !!!!!!!!!!!!!!!
! A critical upgrade is in progress. !
! Please do not turn off the system. !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Installing kernel-20.7-lte-amd64.txz... done
Please reboot.
root@OPNsense:~ # opnsense-shell reboot
The system will reboot. Do you want to proceed? [y/N]: y

@fichtner
Member

Thanks @tk-wfischer @tniedermeier and @mimugmail ... commit is queued up for the final 20.7 build now on stable/20.7 ❤️

@mimugmail
Member

@tk-wfischer @tniedermeier there was a change in the upstream patch; can you please test this new image, which provides a cleaner patch than the previous one:

https://cloud.leute.server.de/index.php/s/fUKKlzDarTmulBK

@tniedermeier

Hi @mimugmail,
of course, I will test the new image and report the outcome!

Cheers,
Thomas

@tniedermeier

Unfortunately, after I configured and activated the LTE interface with the Quectel modem, OPNsense crashed and rebooted in a similar way as it did before you applied the first patch. I can collect the output from the serial console and attach it if it helps.

@mimugmail
Member

@tniedermeier The crashdump would be nice to see the panic.

@mimugmail
Member

mimugmail commented Jul 30, 2020

@tniedermeier forget it .. it seems I made a mistake in patching .. you'll get a fresh image later ..

@fichtner fichtner reopened this Jul 30, 2020
@tk-wfischer
Author

I have just checked the final 20.7 ISO taken from https://mirrors.dotsrc.org/opnsense/releases/20.7/OPNsense-20.7-OpenSSL-vga-amd64.img.bz2 (using link from https://forum.opnsense.org/index.php?topic=18314.0) and I have good news:

LTE is working on my LES compact 4L with the Quectel modem using the final OPNsense 20.7, which was released yesterday, July 30th. There is no crash when I activate it. 👍

@fichtner
Member

fichtner commented Jul 31, 2020

@tk-wfischer thanks for confirming. The patch needs to be replaced in a later 20.7.x but for now we are happy this could be solved in time. Thanks again!

@mimugmail
Member

Werner or Thomas, would one of you please give this one a shot? https://cloud.leute.server.de/index.php/s/Vql9Px1HqUC66UO

@tk-wfischer
Author

you are welcome :-)

@mimugmail: sure :-) I was just about to write "just let us know in advance when you plan to replace the patch in a later 20.7.x, and @tniedermeier and/or I will test it before it gets released", and just 1 second before I could click "Comment" you had already written the answer 🥇

We will let you know soon when we have tested your image.

PS @fichtner: you are welcome. And regarding the upgrade path 20.1 -> 20.7 with LTE: I plan to test the upgrade today at home (I have an LES compact 4L with a Quectel LTE modem there, too, for my Internet connection).

@tniedermeier

Hi @mimugmail I tested it successfully! No problems at all, very good!

@tk-wfischer
Author

I have tested the upgrade process from 20.1 -> 20.7 with my LTE-based system: everything went fine, LTE keeps on working 👍

@fichtner
Member

fichtner commented Aug 6, 2020

Please try the new patch via:

# opnsense-update -kr 20.7-lte2

Thanks,
Franco

@tniedermeier

Hi Franco,
thanks for the patch, of course I will test it! I'm not quite sure, though: should I apply it onto a freshly installed 20.7 final release, or should I upgrade from 20.1 and then apply the patch?

Best regards,
Thomas

@fichtner
Member

fichtner commented Aug 7, 2020

If you can, use a fresh 20.7-RC1 image install and replace the kernel accordingly. If it starts working, we know the new patch is good and the old one does not interfere. 😊

@tniedermeier

Hi @fichtner,
I installed 20.7.r1 and applied the 20.7-lte2 patch; after the reboot I configured the LTE interface and activated it. No errors, no problems at all. I checked /var/log/system.log and also /var/log/ppps.log. I switched to the LTE connection and generated some traffic. Looks fine!

Thanks and best regards,
Thomas

@fichtner
Member

fichtner commented Aug 7, 2020

Thanks for confirming. We will be adding this to the next kernel update which should be 20.7.1 by the looks of the FreeBSD security advisories released recently. ❤️

@fichtner fichtner closed this as completed Aug 7, 2020