[odhcp6c] DHCPv6 client loses connection every 8 hours (after the second expiration of PD valid lifetime) on Japan NTT 10G Hikari #13454

Open
1 task done
missing233 opened this issue Sep 14, 2023 · 11 comments
Labels
bug issue report with a confirmed bug

Comments

@missing233
Contributor

missing233 commented Sep 14, 2023

Describe the bug

The issue appears to be as follows.

I'm using OpenWRT with NTT 10G Hikari (Flets Cross). I've encountered a specific issue regarding the DHCPv6 PD.
From the NTT DHCPv6 server, my DHCPv6 Client has been assigned a /56 PD prefix with the following lifetimes:

T1: 7200
T2: 10800
Preferred Lifetime: 12600
Valid Lifetime: 14400

The problem manifests at specific intervals, typically at the 8-hour mark. Within about a second of NTT's DHCPv6 server sending a Reconfigure message, I get this notice:
'daemon.notice netifd: Interface 'wan6' has lost the connection'.
The connection will restore in about 10 seconds.

In the packet capture data during this process, it shows that at the 28,800-second mark (8 hours), the DHCPv6 server sent a Reconfigure Message to the client. Subsequently, within less than a second, the wan6 connection was lost. This was immediately followed by the Solicit Message and Request Message. Interestingly, the likelihood of encountering this issue at the 14,400-second mark (4 hours) is almost negligible. When the DHCPv6 server sends a Reconfigure Message at this time, odhcp6c successfully responds with a Renew Message. The issue is most likely to occur at the 8-hour mark, which is when the second valid lifetime expires.
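
The 8-hour figure follows from the lifetimes listed above: two valid lifetimes of 14400 seconds add up to 28800 seconds. A small shell sketch of that arithmetic, using only the values from this report (the helper name to_hours is just for illustration):

```shell
#!/bin/sh
# Lifetimes in seconds, as reported for the delegated /56 in this issue.
T1=7200
T2=10800
PREFERRED=12600
VALID=14400

# Convert seconds to hours with one decimal place (integer arithmetic only).
to_hours() {
    secs=$1
    echo "$(( secs / 3600 )).$(( (secs % 3600) * 10 / 3600 ))"
}

echo "T1:                         ${T1} s = $(to_hours "$T1") h"
echo "T2:                         ${T2} s = $(to_hours "$T2") h"
echo "Preferred lifetime:         ${PREFERRED} s = $(to_hours "$PREFERRED") h"
echo "Valid lifetime:             ${VALID} s = $(to_hours "$VALID") h"
echo "Second valid lifetime ends: $(( 2 * VALID )) s = $(to_hours $(( 2 * VALID ))) h"
```

The last line reports 28800 s = 8.0 h, which matches the mark at which the Reconfigure message and the disconnect were observed.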

My ISP uses an IPv4 over IPv6 protocol, requiring a special IPv6 address as a tunnel endpoint. This issue only pops up when I assign this particular IPv6 address to the IPv4 over IPv6 tunnel interface (by using option ip6ifaceid).

The ISP provided me with a router, the XG-100NE (OEM by NEC Platforms), and it's rock solid. I've also done packet capture analysis on it. I've noticed that its Renew messages differ from OpenWRT's, and that OpenWRT's response seems to come a few seconds later than the ISP router's.

OpenWRT:

Identity Association for Prefix Delegation
    Option: Identity Association for Prefix Delegation (25)
    Length: 41
    IAID: 00000001
    T1: 0
    T2: 0
    IA Prefix
        Option: IA Prefix (26)
        Length: 25
        Preferred lifetime: 0
        Valid lifetime: 0
        Prefix length: 56
        Prefix address: 2400:xxxx:xxxx:xx00::

ISP router:

Identity Association for Prefix Delegation
    Option: Identity Association for Prefix Delegation (25)
    Length: 41
    IAID: 00000000
    T1: 7200
    T2: 10800
    IA Prefix
        Option: IA Prefix (26)
        Length: 25
        Preferred lifetime: 12600
        Valid lifetime: 14400
        Prefix length: 56
        Prefix address: 2400:xxxx:xxxx:xx00::
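
The Renew messages above come from packet captures. A minimal sketch of how DHCPv6 traffic could be captured for this kind of comparison, assuming tcpdump is installed and the WAN device is eth1 (both are assumptions; the report does not say which tool or device was used, and the function names and output path are hypothetical):

```shell
#!/bin/sh
# Capture filter for DHCPv6: clients use UDP port 546, servers and relays use 547.
dhcpv6_filter() {
    echo 'udp port 546 or udp port 547'
}

# Write DHCPv6 packets seen on a device to a pcap file for later inspection.
# $1: network device (assumption: the WAN device, e.g. eth1)
# $2: output file (hypothetical default path)
capture_dhcpv6() {
    dev=$1
    out=${2:-/tmp/dhcpv6.pcap}
    tcpdump -i "$dev" -n -w "$out" "$(dhcpv6_filter)"
}
```

Calling capture_dhcpv6 eth1 and leaving it running past the 8-hour mark should record the Reconfigure, Renew, Solicit and Request messages discussed here; the file can then be opened in Wireshark.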

OpenWrt version

r23763-46ed38adeb

OpenWrt target/subtarget

x86_64

Device

GoWin Solution R86S - Intel(R) Pentium(R) Silver N6005 @ 2.00GHz : 4 Core 4 Thread

Image kind

Official downloaded image

Steps to reproduce

Using Japan's NTT FLET'S HIKARI CROSS, set WAN6 as a DHCPv6 client and assign an ifaceid to the WAN6 interface.

Actual behaviour

After two Valid lifetime expirations, WAN6 lost connection.

Expected behaviour

Make odhcp6c work properly

Additional info

As a workaround, I've set up a cron job that makes odhcp6c send a Renew message every hour to refresh the IPv6 lifetimes.
It seems to work, but I don't think it is a good solution.
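
A minimal sketch of such an hourly job, assuming BusyBox-style pgrep and kill are available. The use of SIGUSR1 matches the command visible in the cron log later in this thread; the wrapper function and the path /root/renew-odhcp6c.sh are hypothetical and not necessarily what the linked gist contains.

```shell
#!/bin/sh
# Send SIGUSR1 to every running odhcp6c instance. Per this thread, that makes
# the client send a Renew without releasing the lease.
renew_odhcp6c() {
    pids=$(pgrep odhcp6c) || return 1   # nothing to do if no client is running
    for pid in $pids; do
        kill -s USR1 "$pid"
    done
}

# Only act when called with "run", so the file can also be sourced safely.
if [ "${1:-}" = "run" ]; then
    renew_odhcp6c
fi

# Hypothetical crontab entry (top of every hour):
# 0 * * * * /root/renew-odhcp6c.sh run
```

Renewing every hour keeps the lease well inside the 7200-second T1 reported above, which is presumably why the workaround avoids the 8-hour disconnect.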

Diffconfig

No response

Terms

  • I am reporting an issue for OpenWrt, not an unsupported fork.
@missing233 missing233 added the bug issue report with a confirmed bug label Sep 14, 2023
@missing233 missing233 changed the title DHCPv6 PD lifetime on Japan NTT 10G Hikari [odhcp6c] DHCPv6 PD lifetime on Japan NTT 10G Hikari Sep 14, 2023
@missing233 missing233 changed the title [odhcp6c] DHCPv6 PD lifetime on Japan NTT 10G Hikari [odhcp6c] DHCPv6 client loses connection every 8 hours (after the second expiration of PD valid lifetime) on Japan NTT 10G Hikari Sep 14, 2023
@Z61p

Z61p commented Sep 23, 2023

I have also encountered the same phenomenon (Using Japan's NTT FLET'S HIKARI CROSS).
Until I found this issue, I was seriously considering whether I should give up on using OpenWRT and buy a YAMAHA RTX1300.
As a temporary solution, I will make odhcp6c send Renew messages using crontab...

@cre8ivejp

@Z61p, hi.
I came here with the same issue.
I'm using the NTT 10G Hikari (MAP-E), too. Every day the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?

Thank you.

@rany2
Contributor

rany2 commented Jan 27, 2024

@cre8ivejp I'm guessing here as there simply isn't enough information but it might be that you need to set noserverunicast.

My theory is that the server is advertising a unicast address which gets used when trying to renew. Setting noserverunicast will cause it to always use the multicast address even if ISP's servers advertise a unicast address and might fix your issue with renew.
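
For reference, a sketch of how that option could be set from the shell, assuming the DHCPv6 client interface is named wan6 as in the log message earlier in this thread. The function name is hypothetical. As reported further down, the option did not resolve the problem in this case.

```shell
#!/bin/sh
# Enable noserverunicast on a DHCPv6 client interface and bring it up again
# so that the new setting takes effect.
# $1: logical interface name (assumption: wan6)
set_noserverunicast() {
    iface=${1:-wan6}
    uci set "network.${iface}.noserverunicast=1" &&
    uci commit network &&
    ifup "$iface"
}
```

Calling set_noserverunicast wan6 on the router applies the change; removing the option again with uci delete restores the default behaviour.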

@missing233
Contributor Author

> @cre8ivejp I'm guessing here as there simply isn't enough information but it might be that you need to set noserverunicast.
>
> My theory is that the server is advertising a unicast address which gets used when trying to renew. Setting noserverunicast will cause it to always use the multicast address even if ISP's servers advertise a unicast address and might fix your issue with renew.

It didn't work; the disconnect still happened at the 8th hour.

@missing233
Contributor Author

> @Z61p, hi. I came here with the same issue. I'm using the NTT 10G Hikari (MAP-E), too. Every day the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?
>
> Thank you.

Try this script:
https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd

@cre8ivejp

> Try this script: https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd

Thank you for the script.

When you lose the IPv6 connection, does it come back automatically, or is sending SIGUSR1 the only way to make it work again?
I get the same warning message as you, but in my case the connection comes back 30-60 seconds after it is lost. Sometimes this happens while I'm gaming or on a video call, which is very annoying.

Before you shared your script, I used kill -SIGUSR2 $(pidof odhcp6c), but this takes down all the upstreams, which causes about 30 seconds of downtime until they come back.

I ran your kill -SIGUSR1 $(pgrep odhcp6c), and it doesn't take down the upstream, which is good. I also noticed that each time it runs, it logs this message once: daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding ', but I think this is expected.

@cre8ivejp

cre8ivejp commented Jan 28, 2024

> @cre8ivejp I'm guessing here as there simply isn't enough information but it might be that you need to set noserverunicast.
>
> My theory is that the server is advertising a unicast address which gets used when trying to renew. Setting noserverunicast will cause it to always use the multicast address even if ISP's servers advertise a unicast address and might fix your issue with renew.

Thank you for the link! I'll take a look at it.
Did setting that option work for you?

@rany2
Contributor

rany2 commented Jan 28, 2024

@cre8ivejp no but I figured it was worth a shot as this option is meant to fix issues similar to this with some servers (renewal seemingly failing). In this case, the issue is different as manually renewing does seem to fix it.

@Z61p

Z61p commented Jan 28, 2024

> @Z61p, hi. I came here with the same issue. I'm using the NTT 10G Hikari (MAP-E), too. Every day the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?
>
> Thank you.

@cre8ivejp
I received a notification via email so I went to check it out, but it seems like it has already been resolved.
In my environment, I just put the following two lines in the scheduled tasks; it can hardly be called a script.

In my environment, the WAN went down 12 hours after odhcp6c acquired the address.
Because a kill scheduled at exactly 12 hours would very likely coincide with the outage, I run the kill 2 hours earlier.
Because the times in the scheduled task are fixed, I reset the schedule manually whenever OpenWRT is restarted.

(example)
OpenWRT starts at 8:45 ⇒ 12 hours later minus 2 hours, so run kill at 18:45
Run again 12 hours after that ⇒ run kill at 6:45

I have been using this configuration for about 4 months, and I have been able to avoid WAN failures.
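
The schedule described above can be derived from the boot time: the first kill runs 10 hours after start (12 hours minus 2), and the second 12 hours after that. A small sketch that prints matching crontab lines, assuming the job sends SIGUSR1 with the same command seen elsewhere in this thread. The function name is hypothetical, and these are not the poster's actual two lines, which are not shown in the post.

```shell
#!/bin/sh
# Print two crontab lines that signal odhcp6c 10 h and 22 h after the given
# start time, i.e. 2 h before each 12-hour mark.
# $1: start hour (0-23), $2: start minute (0-59)
renew_cron_lines() {
    hour=$(( ${1#0} + 0 ))      # strip one leading zero so "08" is not read as octal
    minute=$(( ${2#0} + 0 ))
    first=$(( (hour + 10) % 24 ))
    second=$(( (hour + 22) % 24 ))
    cmd='kill -SIGUSR1 $(pgrep odhcp6c)'   # kept literal; cron runs it later
    printf '%d %d * * * %s\n' "$minute" "$first" "$cmd"
    printf '%d %d * * * %s\n' "$minute" "$second" "$cmd"
}

# Example from the post: OpenWRT started at 8:45.
renew_cron_lines 8 45
```

For a start time of 8:45 this prints entries for 18:45 and 6:45, the same times as in the example above.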

*By the way, I am Japanese (lol). All my posts are translated with Google Translate.

@missing233
Contributor Author

missing233 commented Jan 28, 2024

> Thank you for the script.
>
> When you lose the ipv6 connection, does it come back automatically, or is the only way to make it work again to kill the SIGUSR1 once? I get the same warning message as you, but in my case, it comes back in 30-60 seconds after losing connection. But sometimes it happens when I'm gaming or having a video call, which is very annoying.
>
> Before you shared your script, I used this kill -SIGUSR2 pidof odhcp6c, but this kills all the upstreams, which causes a 30-second downtime until it backs.
>
> I executed yours kill -SIGUSR1 $(pgrep odhcp6c), and it doesn't kill the upstream, which is good. Also, I noticed when it is executed, it shows the same log once daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding ', but I think this is expected.

My WAN6 comes back up within about 10 seconds after losing connection, and sending SIGUSR1 to odhcp6c does not result in a "No Binding" message.
Sending SIGUSR2 to odhcp6c makes it send a RELEASE and restart the state machine with SOLICIT messages, while SIGUSR1 makes it send a Renew message.
I guess the different behaviors might be due to NTT configuring its DHCPv6 servers differently in different regions, even though they're all equally bad compared to ISPs in other countries. lol

@cre8ivejp

> My WAN6 links up within about 10 seconds after losing connection, and sending SIGUSR1 to odhcp6c does not result in a "No Binding" message. Sending SIGUSR2 signal to odhcp6c means to send RELEASE and restart the state machine by sending SOLICIT messages, while SIGUSR1 sends a renew message. I guess the different behaviors might be due to the varied configurations of DHCPv6 servers by NTT in different regions, even though they're all equally bad compared to ISPs in other countries. lol

Thanks for the explanation.

Every hour, when the cron job sends SIGUSR1, that message shows up in my logs.
I've been checking since yesterday whether the same message appears at other times, but so far it only appears when SIGUSR1 is sent, and there have been no lost connections.
As long as it doesn't lose connection anymore, I can live with those logs, lol.

Mon Jan 29 03:00:00 2024 cron.err crond[1564745]: USER root pid 1571009 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 03:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 04:00:00 2024 cron.err crond[1564745]: USER root pid 1571838 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 04:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 05:00:00 2024 cron.err crond[1564745]: USER root pid 1572818 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 05:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 06:00:00 2024 cron.err crond[1564745]: USER root pid 1573641 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 06:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 07:00:00 2024 cron.err crond[1564745]: USER root pid 1574480 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 07:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
...
