FS#227 - VLAN support mismatch between preinit and default network config #5576

Open
openwrt-bot opened this issue Oct 14, 2016 · 16 comments

@openwrt-bot openwrt-bot commented Oct 14, 2016

acarlo:

PPPoE is broken on WRT1900ACS

Upgraded from LEDE r578 to the latest LEDE r1814 and PPPoE doesn't work anymore, although the pppd version and PPPoE version are the same:

  • Linksys WRT1900ACS
  • LEDE reboot r1814

pppd debug log:

Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Timeout waiting for PADO packets
Unable to complete PPPoE Discovery
Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]

While on the same hardware running LEDE r578, the PPPoE module works as expected:

Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 40
dst c2:56:27:ca:d7:d4 src a0:f3:e4:34:d8:21
[service-name] [AC-name acc-aln1.hac] [AC-cookie 75 58 37 a5 ba 3c e4 a5 2a 61 bb 23 92 5c 1b dc]
Send PPPOE Discovery V1T1 PADR session 0x0 length 24
dst a0:f3:e4:34:d8:21 src c2:56:27:ca:d7:d4
[service-name] [AC-cookie 75 58 37 a5 ba 3c e4 a5 2a 61 bb 23 92 5c 1b dc]
Recv PPPOE Discovery V1T1 PADS session 0x30b length 4
dst c2:56:27:ca:d7:d4 src a0:f3:e4:34:d8:21
[service-name]
PADS: Service-Name: ''
PPP session is 779
Connected to a0:f3:e4:34:d8:21 via interface eth0
using channel 2
Using interface pppoe-wan
Connect: pppoe-wan <--> eth0
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xc6952556>]
rcvd [LCP ConfReq id=0x66 <mru 1492> <magic 0x4cc73648>]
sent [LCP ConfAck id=0x66 <mru 1492> <magic 0x4cc73648>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0xc6952556>]
sent [LCP EchoReq id=0x0 magic=0xc6952556]
rcvd [CHAP Challenge id=0x1 <7131a44524d1de8f1cd1061cac6d8c071d8bfe7351bc4ea7bd08f56684428475f229ba177a192696ebab32>, name = "acc-aln1.hac"]
sent [CHAP Response id=0x1 <4bb1a418b298790b128ad4d7ef3109ad>, name = "bthomehub@btbroadband.com"]
rcvd [LCP EchoRep id=0x0 magic=0x4cc73648]
rcvd [CHAP Success id=0x1 "CHAP authentication success"]
CHAP authentication succeeded: CHAP authentication success
CHAP authentication succeeded
peer from calling number A0:F3:E4:34:D8:21 authorized
sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
sent [IPV6CP ConfReq id=0x1 ]
rcvd [IPV6CP ConfReq id=0x7b ]
sent [IPV6CP ConfAck id=0x7b ]
rcvd [IPCP ConfReq id=0x38 <addr 172.16.12.12>]
sent [IPCP ConfAck id=0x38 <addr 172.16.12.12>]
rcvd [IPCP ConfNak id=0x1 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
sent [IPCP ConfReq id=0x2 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
rcvd [IPV6CP ConfAck id=0x1 ]
local LL address fe80::c595:37d1:3987:1929
remote LL address fe80::0221:05ff:feb4:8824
Script /lib/netifd/ppp-up started (pid 2646)
rcvd [IPCP ConfAck id=0x2 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
local IP address 81.146.2.155
remote IP address 172.16.12.12
primary DNS address 81.139.57.100
secondary DNS address 81.139.56.100
Script /lib/netifd/ppp-up started (pid 2653)
Script /lib/netifd/ppp-up finished (pid 2646), status = 0x9
Script /lib/netifd/ppp-up finished (pid 2653), status = 0x9

@openwrt-bot openwrt-bot commented Oct 14, 2016

mkresin:

Would you please attach or paste your /etc/config/network? Do you have to VLAN-tag your PPPoE traffic? Which ISP do you use?

Are you able to compile your own image? It would be helpful if you can do a git bisect to find the commit which broke PPPoE on your WRT1900ACS.

@openwrt-bot openwrt-bot commented Oct 14, 2016

acarlo:

This is the interface config for the PPPoE traffic:

config interface 'wan'
option ifname 'eth0'
option proto 'pppoe'
option username 'bthomehub@btbroadband.com'
option password 'bt'
option timeout '10'

I use the same config for the working and the non-working LEDE build.
The provider is BT in the UK.

Yes, I do build my own image, but I am not familiar with git bisect. I will check how to use it and come back on this point.

@openwrt-bot openwrt-bot commented Oct 14, 2016

mkresin:

Would you please provide your complete /etc/config/network?

@openwrt-bot openwrt-bot commented Oct 14, 2016

acarlo:

Full network:

root@OpenWrt:/etc/config# cat network

config interface 'loopback'
option ifname 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'

config globals 'globals'
option ula_prefix 'fd7b:f926:6250::/48'

config interface 'lan'
option type 'bridge'
option proto 'static'
option netmask '255.255.255.0'
option ip6assign '60'
option ipaddr '192.168.20.254'
option igmp_snooping '1'
option _orig_ifname 'eth1 wlan0 wlan1'
option _orig_bridge 'true'
option ifname 'eth1 eth2'

config interface 'wan'
option ifname 'eth0'
option proto 'pppoe'
option username 'bthomehub@btbroadband.com'
option password 'bt'
option timeout '10'

config interface 'wan6'
option ifname 'eth0'
option proto 'dhcpv6'

config interface 'iptv'
option ifname 'eth0'
option proto 'static'
option ipaddr '10.22.22.1'
option netmask '255.255.255.0'

config interface 'vpn0'
option ifname 'tun0'
option proto 'none'
option auto '1'

config interface 'guest'
option _orig_ifname 'radio1.network2'
option _orig_bridge 'false'
option proto 'static'
option ipaddr '192.168.99.254'
option netmask '255.255.255.0'

root@OpenWrt:/etc/config#

@openwrt-bot openwrt-bot commented Oct 14, 2016

acarlo:

Just found this topic on the OpenWrt forum:

https://forum.openwrt.org/viewtopic.php?pid=335168#p335168

From the topic:
(BTW: R1297 is running ok, so must be a change of the last week)
edit 1: This seems to be the only change to the PPP package: https://git.lede-project.org/?p=source. … 344006173)
edit 2: just reverted that change and rebuild the setup, still not working so it must be collateral damage from something else.

@openwrt-bot openwrt-bot commented Oct 14, 2016

mkresin:

Nice finding.

The version reported as working, r1297, has the git commit hash 4e8c6f3.

The report in the forum is from 2016-08-20. The last commit of this date has the commit hash 35be928 (r1396).

The first step would be to test both of these commits, to make sure that r1297 works and r1396 is really broken.

$ git checkout master
$ git checkout 4e8c6f340751c66a602b98b727af28b2a9004313
$ make dirclean
$ make menuconfig
$ make

the same with 35be928

If you have a good and a bad version you can use git bisect (git bisect start <bad> <good>):

$ git checkout master
$ git bisect start 35be9284668d19a565d354a33febb508b0e28131 4e8c6f340751c66a602b98b727af28b2a9004313
$ make dirclean
$ make menuconfig
$ make
$ git bisect good OR git bisect bad
$ make dirclean
$ make menuconfig
$ make
$ git bisect good OR git bisect bad
...

In the end, git bisect will tell you which commit introduced the regression.
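
A minimal, self-contained sketch of that workflow, run in a throwaway repository rather than the LEDE tree. The ten dummy commits, the `state` file and the function name `bisect_demo` are made up for illustration, and an automated check stands in for flashing and testing an image by hand. A revision that cannot be tested at all (for example an image that does not boot for an unrelated reason) can be excluded with git bisect skip instead of being marked good or bad.

```shell
# Demo only: build a scratch repository and let git bisect find the first bad commit.
bisect_demo() (
    set -eu
    demo_dir=$(mktemp -d)
    cd "$demo_dir"
    git init -q . 2>/dev/null
    git config user.name "Bisect Demo"
    git config user.email "demo@example.invalid"

    # Ten commits; from commit 7 onwards the file contains the word "broken".
    i=1
    while [ "$i" -le 10 ]; do
        if [ "$i" -ge 7 ]; then
            echo "broken $i" > state
        else
            echo "fine $i" > state
        fi
        git add state
        git commit -q -m "commit $i"
        i=$((i + 1))
    done

    good=$(git rev-list --max-parents=0 HEAD)   # first commit, known good
    bad=$(git rev-parse HEAD)                   # latest commit, known bad

    # Same argument order as in the thread: bad revision first, then good.
    git bisect start "$bad" "$good" >/dev/null
    # Exit status 0 marks a revision good, 1 marks it bad, 125 would skip it.
    git bisect run sh -c '! grep -q broken state' >/dev/null

    # refs/bisect/bad now points at the first bad commit.
    echo "first bad commit: $(git log -1 --format=%s refs/bisect/bad)"

    git bisect reset >/dev/null 2>&1
    cd /
    rm -rf "$demo_dir"
)
```

Calling `bisect_demo` prints the subject of the commit that introduced the marker. With real firmware builds each step still has to be judged by hand with git bisect good, git bisect bad or git bisect skip.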

@openwrt-bot openwrt-bot commented Oct 15, 2016

acarlo:

here you go:

carlo@ubuntu:/source$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c18edce] base-files: add preinit ifname detection based on board.json
carlo@ubuntu:/source$

Important: while testing the builds, some of them built without errors but did not let the router boot, so I marked them as bad.

@openwrt-bot openwrt-bot commented Oct 15, 2016

mkresin:

Good job!

Would you please apply the attached patch on top of the latest git and check if the issue is gone. The patch is not a fix! It's just to confirm that c18edcec4500008a1dabf0b017322eb23b059c58 (https://git.lede-project.org/c18edcec4500008a1dabf0b017322eb23b059c58) is really the cause of your issue.

$ git checkout master
$ patch -p1 < fs227_confirmation.patch

to confirm that the patch is applied successfully

$ git diff

build and test image

$ make dirclean
$ make menuconfig
$ make

@openwrt-bot openwrt-bot commented Oct 15, 2016

acarlo:

it works :)
I applied the patch to this build: LEDE Reboot (HEAD, r1845) and finally got the PPPoE connection back :)

Thanks for your help, hopefully we will get a permanent fix in the trunk (soon) :)

@openwrt-bot openwrt-bot commented Oct 16, 2016

mkresin:

Please attach the output of the following commands from a working image and from a not working image:

$ dmesg
$ swconfig dev switch0 show
$ cat /etc/board.json
$ cat /etc/config/network
$ for iface in $(ls /sys/class/net/);do echo "${iface}: $(cat /sys/class/net/${iface}/carrier)";done

Please do not keep your settings during the test.
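
The last command in that list can also be written as a small function. This is only a sketch: the name `print_carrier` and the optional directory argument are assumptions added for illustration. It avoids parsing ls output and prints a placeholder when the carrier file cannot be read, which happens for interfaces that are down.

```shell
# Print "<iface>: <carrier>" for every interface below a sysfs-style directory.
print_carrier() {
    net_dir=${1:-/sys/class/net}
    for path in "$net_dir"/*; do
        # Skip plain files; only directories (or symlinks to them) are interfaces.
        [ -d "$path" ] || continue
        iface=${path##*/}
        # Reading the carrier file can fail, for example on a down interface.
        if state=$(cat "$path/carrier" 2>/dev/null); then
            echo "$iface: $state"
        else
            echo "$iface: unavailable"
        fi
    done
}
```

Run without arguments it reads /sys/class/net; passing another directory is mainly useful for trying the function out on a desktop machine.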

@openwrt-bot openwrt-bot commented Oct 16, 2016

Johnnysl:

FYI: That message on the openwrt forum was mine.
I could eventually trace it to the changes done to enable vlans on the Switch by default, while my config didn't really use those.
After wiping my /etc/config/network, rebooting, reconfiguring from scratch based on switch vlans, everything started to work again.
PPPoE is still quite slow though, often taking multiple attempts over a couple of minutes to log in.

@openwrt-bot openwrt-bot commented Oct 16, 2016

mkresin:

According to the code, the vlans were set up already before c18edce.

But since commit c18edce vlans are enabled in failsafe/preinit as well. This might cause some unexpected side effects on mvebu boards, since they never had support for failsafe (which is really bad).

Due to your remark regarding a changed vlan config, I've updated the post where I'm asking for some output.

As a general note, please report bugs here and do not hide them in the forum. To my knowledge, no dev is monitoring the forum for bug reports.

@openwrt-bot openwrt-bot commented Oct 16, 2016

Johnnysl:

Usually i like to understand if it is me, or a bug. Don't want to clutter this page with all issues i run into.
Due to nobody complaining, and me "fixing" it with a reconfigure of /etc/config/network i assumed it was not a real bug...

@openwrt-bot openwrt-bot commented Oct 16, 2016

acarlo:

attached there is the commands' output for the same build (working and not working version)

@openwrt-bot openwrt-bot commented Oct 16, 2016

mkresin:

> Usually i like to understand if it is me, or a bug. Don't want to clutter this page with all issues i run into.
> Due to nobody complaining, and me "fixing" it with a reconfigure of /etc/config/network i assumed it was not a real bug...

Thanks for that! Indeed, that is the way to go and not to spam the bugtracker with support requests.

> attached there is the commands' output for the same build (working and not working version)

Okay, now I can see the real issue.

It's a bug in "set up vlans in preinit/failsafe" which is revealed by a config that differs from the default network config.

During preinit, VLAN support is enabled ("enable_vlan: 1" in the swconfig output) since it is (now) the default for the board, but it is not disabled afterwards. Since your /etc/config/network lacks the VLAN part, it can neither disable VLAN support nor set up the desired VLAN config on its own.
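
Whether a given config carries the switch part can be checked with a quick grep. A rough sketch, assuming the section types are written unquoted (`config switch`, `config switch_vlan`), as the interface sections are in the config posted earlier in this ticket; the function name `has_switch_config` is hypothetical:

```shell
# Succeed if the UCI network config declares a switch or switch_vlan section.
has_switch_config() {
    # $1: path to the config file (defaults to the usual location)
    grep -Eq '^[[:space:]]*config[[:space:]]+switch(_vlan)?([[:space:]]|$)' \
        "${1:-/etc/config/network}"
}

# Example use:
#   has_switch_config || echo "no switch/VLAN sections in /etc/config/network"
```

A config like the one posted in this ticket, which only has interface sections, makes the function return a non-zero status.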

That your LAN interfaces are working is more luck than anything else.

For now, the best is to disable vlan support after boot. Everything should work after that using an unmodified LEDE image:

swconfig dev switch0 set enable_vlan 0
swconfig dev switch0 set apply

I will try to get in contact with the author of this change to discuss the issue. I'm not interested in committing a fix that possibly introduces a new bug.
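
Until a proper fix exists, the two workaround commands could be wrapped so that they only run when VLAN support is actually switched on. This is a sketch under assumptions: the function name `disable_switch_vlan` and the `SWCONFIG` override variable are hypothetical, and it relies on `swconfig dev <dev> get enable_vlan` printing the current value.

```shell
# Disable switch VLAN support on a device, but only if it is currently enabled.
disable_switch_vlan() {
    dev=${1:-switch0}
    # SWCONFIG is a hypothetical override, handy for dry runs with a stub.
    swconfig_bin=${SWCONFIG:-swconfig}

    # Query the current state; give up if the switch cannot be read.
    state=$("$swconfig_bin" dev "$dev" get enable_vlan) || return 1

    if [ "$state" = "1" ]; then
        "$swconfig_bin" dev "$dev" set enable_vlan 0 &&
            "$swconfig_bin" dev "$dev" set apply
    fi
}
```

Calling `disable_switch_vlan switch0` after boot would then have the same effect as the two commands, and do nothing if VLAN support is already off.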

@openwrt-bot openwrt-bot commented Jul 8, 2018

psyborg:

your ticket breaks spacing on 1280x800 screens. also i don't see a point in using tags...
