[odhcp6c] DHCPv6 client loses connection every 8 hours (after the second expiration of PD valid lifetime) on Japan NTT 10G Hikari #13454

missing233 · 2023-09-14T16:54:03Z

Describe the bug

The issue appears to be as follows.

I'm using OpenWrt with NTT 10G Hikari (FLET'S Hikari Cross) and have run into a specific issue with DHCPv6 PD.
NTT's DHCPv6 server assigns my DHCPv6 client a /56 PD prefix with the following timers and lifetimes (in seconds):

T1: 7200
T2: 10800
Preferred Lifetime: 12600
Valid Lifetime: 14400

The problem appears at specific intervals, typically at the 8-hour mark. Within about a second of NTT's DHCPv6 server sending a Reconfigure message, I get this notice:
'daemon.notice netifd: Interface 'wan6' has lost the connection'.
The connection is restored after about 10 seconds.

In the packet capture data during this process, it shows that at the 28,800-second mark (8 hours), the DHCPv6 server sent a Reconfigure Message to the client. Subsequently, within less than a second, the wan6 connection was lost. This was immediately followed by the Solicit Message and Request Message. Interestingly, the likelihood of encountering this issue at the 14,400-second mark (4 hours) is almost negligible. When the DHCPv6 server sends a Reconfigure Message at this time, odhcp6c successfully responds with a Renew Message. The issue is most likely to occur at the 8-hour mark, which is when the second valid lifetime expires.
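The timing described above is plain arithmetic on the reported values, and the following Python sketch only makes that explicit. It assumes, as the capture suggests, that a new cycle starts every valid lifetime (14,400 seconds); the function names `hms` and `lease_timeline` are made up for this illustration and are not part of odhcp6c. It does not explain why the second cycle fails, it only shows where the 4-hour and 8-hour marks come from.

```python
# Timer values (seconds) quoted in the report above.
T1, T2, PREFERRED, VALID = 7200, 10800, 12600, 14400


def hms(seconds):
    """Format a second count as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def lease_timeline(cycles):
    """Absolute offsets of each timer, assuming a new cycle starts every VALID seconds."""
    rows = []
    for n in range(cycles):
        start = n * VALID  # assumption: cycle n+1 begins when the previous valid lifetime ends
        rows.append({
            "cycle": n + 1,
            "t1": start + T1,
            "t2": start + T2,
            "preferred": start + PREFERRED,
            "valid": start + VALID,
        })
    return rows


timeline = lease_timeline(2)
for row in timeline:
    # Columns: cycle, T1, T2, preferred lifetime end, valid lifetime end
    print(row["cycle"], *(hms(row[key]) for key in ("t1", "t2", "preferred", "valid")))
```

Under that assumption the first valid lifetime ends at 4:00:00 and the second at 8:00:00, which matches the 14,400-second and 28,800-second marks in the capture.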

My ISP uses an IPv4 over IPv6 protocol, requiring a special IPv6 address as a tunnel endpoint. This issue only pops up when I assign this particular IPv6 address to the IPv4 over IPv6 tunnel interface (by using option ip6ifaceid).

The ISP provided me with an XG-100NE router (OEM by NEC Platforms), and it is rock solid. I've also captured its packets. Its Renew messages differ from OpenWrt's, and OpenWrt's response seems to arrive a few seconds later than the ISP router's.

OpenWRT:

Identity Association for Prefix Delegation
    Option: Identity Association for Prefix Delegation (25)
    Length: 41
    IAID: 00000001
    T1: 0
    T2: 0
    IA Prefix
        Option: IA Prefix (26)
        Length: 25
        Preferred lifetime: 0
        Valid lifetime: 0
        Prefix length: 56
        Prefix address: 2400:xxxx:xxxx:xx00::

ISP router:

Identity Association for Prefix Delegation
    Option: Identity Association for Prefix Delegation (25)
    Length: 41
    IAID: 00000000
    T1: 7200
    T2: 10800
    IA Prefix
        Option: IA Prefix (26)
        Length: 25
        Preferred lifetime: 12600
        Valid lifetime: 14400
        Prefix length: 56
        Prefix address: 2400:xxxx:xxxx:xx00::
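To make the difference between the two Renew messages easier to compare offline, here is a minimal Python sketch that encodes and decodes an IA_PD option carrying one IA Prefix option, using the field layout visible in the captures above (option codes 25 and 26, lengths 41 and 25). The helper names `build_ia_pd` and `parse_ia_pd` are invented for this example, and `2001:db8:0:100::` is a documentation prefix standing in for the masked address. It is not how odhcp6c builds its packets, only a way to see which fields differ.

```python
import ipaddress
import struct

OPTION_IA_PD = 25      # Identity Association for Prefix Delegation
OPTION_IAPREFIX = 26   # IA Prefix


def build_ia_pd(iaid, t1, t2, preferred, valid, prefix_len, prefix):
    """Encode an IA_PD option that carries a single IA Prefix option."""
    prefix_bytes = ipaddress.IPv6Address(prefix).packed
    # IA Prefix body: preferred (4) + valid (4) + prefix length (1) + prefix (16) = 25 bytes
    ia_prefix_body = struct.pack("!IIB16s", preferred, valid, prefix_len, prefix_bytes)
    ia_prefix = struct.pack("!HH", OPTION_IAPREFIX, len(ia_prefix_body)) + ia_prefix_body
    # IA_PD body: IAID (4) + T1 (4) + T2 (4) + nested IA Prefix option (4 + 25) = 41 bytes
    ia_pd_body = struct.pack("!III", iaid, t1, t2) + ia_prefix
    return struct.pack("!HH", OPTION_IA_PD, len(ia_pd_body)) + ia_pd_body


def parse_ia_pd(data):
    """Decode the fields shown in the captures from an encoded IA_PD option."""
    code, length, iaid, t1, t2 = struct.unpack_from("!HHIII", data, 0)
    sub_code, sub_len, preferred, valid, prefix_len, prefix = struct.unpack_from(
        "!HHIIB16s", data, 16  # the nested option starts after the 4-byte header and 12-byte IA_PD fields
    )
    return {
        "option": code, "length": length, "iaid": iaid, "t1": t1, "t2": t2,
        "sub_option": sub_code, "sub_length": sub_len,
        "preferred": preferred, "valid": valid, "prefix_len": prefix_len,
        "prefix": str(ipaddress.IPv6Address(prefix)),
    }


# Values from the two captures; the prefix is a placeholder for the masked one.
openwrt = parse_ia_pd(build_ia_pd(0x00000001, 0, 0, 0, 0, 56, "2001:db8:0:100::"))
isp_router = parse_ia_pd(build_ia_pd(0x00000000, 7200, 10800, 12600, 14400, 56, "2001:db8:0:100::"))

# Print only the fields whose values differ between the two Renew messages.
for key in openwrt:
    if openwrt[key] != isp_router[key]:
        print(key, openwrt[key], isp_router[key])
```

With these inputs the differing fields are the IAID, T1, T2, and the preferred and valid lifetimes; the option lengths (41 and 25) are the same in both messages.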

OpenWrt version

r23763-46ed38adeb

OpenWrt target/subtarget

x86_64

Device

GoWin Solution R86S - Intel(R) Pentium(R) Silver N6005 @ 2.00GHz : 4 Core 4 Thread

Image kind

Official downloaded image

Steps to reproduce

Using Japan's NTT FLET'S HIKARI CROSS, set WAN6 as a DHCPv6 client and assign an ifaceid to the WAN6 interface.

Actual behaviour

After the valid lifetime has elapsed twice, WAN6 loses its connection.

Expected behaviour

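odhcp6c should keep WAN6 connected: at the 8-hour mark it should answer the server's Reconfigure message with a Renew message, as it does at the 4-hour mark, instead of dropping the connection.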

Additional info

As a workaround, I've set up a crontab scheduled task that makes odhcp6c send a Renew message every hour to refresh the IPv6 lifetimes.
It seems to work, but I don't think it is a good solution.

Diffconfig

No response

Terms

I am reporting an issue for OpenWrt, not an unsupported fork.

Z61p · 2023-09-23T08:55:48Z

I have also encountered the same phenomenon (using Japan's NTT FLET'S HIKARI CROSS).
Until I found this issue, I was seriously considering giving up on OpenWrt and buying a YAMAHA RTX1300.
As a temporary workaround, I will make odhcp6c send a Renew message using crontab...

cre8ivejp · 2024-01-27T17:11:45Z

@Z61p, hi.
I came here with the same issue.
I'm using the NTT 10G Hikari (MAP-E), too. Every day the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?

Thank you.

rany2 · 2024-01-27T17:49:26Z

@cre8ivejp I'm guessing here as there simply isn't enough information but it might be that you need to set noserverunicast.

My theory is that the server is advertising a unicast address which gets used when trying to renew. Setting noserverunicast will cause it to always use the multicast address even if ISP's servers advertise a unicast address and might fix your issue with renew.

missing233 · 2024-01-28T02:36:55Z

> @cre8ivejp I'm guessing here as there simply isn't enough information but it might be that you need to set noserverunicast.
>
> My theory is that the server is advertising a unicast address which gets used when trying to renew. Setting noserverunicast will cause it to always use the multicast address even if ISP's servers advertise a unicast address and might fix your issue with renew.

It didn't work; the problem still happened at the 8th hour.

missing233 · 2024-01-28T02:37:43Z

> @Z61p, hi. I came here with the same issue. I'm using the NTT 10G Hikari (MAP-E), too. Every day the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?
>
> Thank you.

Try this script:
https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd

cre8ivejp · 2024-01-28T10:14:33Z

> > @Z61p, hi. I came here with the same issue. I'm using the NTT 10G Hikari (MAP-E), too. Every day, the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?
> > Thank you.
>
> Try this script: https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd

Thank you for the script.

When you lose the IPv6 connection, does it come back automatically, or is sending SIGUSR1 the only way to make it work again?
I get the same warning message as you, but in my case the connection comes back 30-60 seconds after it is lost. Sometimes it happens while I'm gaming or on a video call, which is very annoying.

Before you shared your script, I used kill -SIGUSR2 $(pidof odhcp6c), but this takes down all the upstreams, which causes about 30 seconds of downtime until the connection comes back.

I ran your kill -SIGUSR1 $(pgrep odhcp6c), and it doesn't take down the upstream, which is good. I also noticed that each time it runs, the following message appears once in the log: daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding ', but I think this is expected.

cre8ivejp · 2024-01-28T10:15:20Z

> @cre8ivejp I'm guessing here as there simply isn't enough information but it might be that you need to set noserverunicast.
>
> My theory is that the server is advertising a unicast address which gets used when trying to renew. Setting noserverunicast will cause it to always use the multicast address even if ISP's servers advertise a unicast address and might fix your issue with renew.

Thank you for the link! I'll take a look at it.
Did setting that option work for you?

rany2 · 2024-01-28T10:19:43Z

@cre8ivejp no but I figured it was worth a shot as this option is meant to fix issues similar to this with some servers (renewal seemingly failing). In this case, the issue is different as manually renewing does seem to fix it.

Z61p · 2024-01-28T13:57:45Z

> @Z61p, hi. I came here with the same issue. I'm using the NTT 10G Hikari (MAP-E), too. Every day the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?
>
> Thank you.

@cre8ivejp
I received a notification via email so I went to check it out, but it seems like it has already been resolved.
In my environment, I just put the following two lines in the scheduled tasks; it can hardly be called a script.

In my environment, the WAN went down 12 hours after odhcp6c acquired the address.
Because a kill run exactly at the 12-hour mark would very likely coincide with the outage, I run the kill 2 hours earlier.
Because the times in the scheduled task are fixed, I reset them manually whenever OpenWrt is restarted.

(example)
OpenWrt starts at 8:45 ⇒ 12 hours later minus 2 hours, so run kill at 18:45
Run again 12 hours after that ⇒ run kill at 6:45
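The schedule arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation described above: the function name `kill_times` and its parameters are made up for illustration, and the 12-hour period and 2-hour lead are the values reported for this one environment, not general constants.

```python
from datetime import datetime, timedelta


def kill_times(start_hhmm, period_hours=12, lead_hours=2):
    """Return the two daily times (HH:MM) at which to signal odhcp6c.

    start_hhmm   -- time at which odhcp6c acquired its address, e.g. "8:45"
    period_hours -- observed interval between outages (12 in the comment above)
    lead_hours   -- how long before the expected outage to run the kill
    """
    start = datetime.strptime(start_hhmm, "%H:%M")
    first = start + timedelta(hours=period_hours - lead_hours)
    second = first + timedelta(hours=period_hours)
    return [first.strftime("%H:%M"), second.strftime("%H:%M")]


schedule = kill_times("8:45")
print(schedule)  # ['18:45', '06:45']
```

The two results correspond to the 18:45 and 6:45 entries in the example; they have to be recomputed whenever the start time changes, which is why the schedule is reset by hand after a restart.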

I have been using this configuration for about 4 months, and I have been able to avoid WAN failures.

*By the way, I am Japanese (lol). All my posts are translated with Google Translate.

missing233 · 2024-01-28T17:47:04Z

> > > @Z61p, hi. I came here with the same issue. I'm using the NTT 10G Hikari (MAP-E), too. Every day, the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?
> > > Thank you.
> >
> > Try this script: https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd
>
> Thank you for the script.
>
> When you lose the IPv6 connection, does it come back automatically, or is sending SIGUSR1 the only way to make it work again? I get the same warning message as you, but in my case the connection comes back 30-60 seconds after it is lost. Sometimes it happens while I'm gaming or on a video call, which is very annoying.
>
> Before you shared your script, I used kill -SIGUSR2 $(pidof odhcp6c), but this takes down all the upstreams, which causes about 30 seconds of downtime until the connection comes back.
>
> I ran your kill -SIGUSR1 $(pgrep odhcp6c), and it doesn't take down the upstream, which is good. I also noticed that each time it runs, the following message appears once in the log: daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding ', but I think this is expected.

My WAN6 link comes back up within about 10 seconds of losing the connection, and sending SIGUSR1 to odhcp6c does not produce a "No Binding" message.
Sending SIGUSR2 to odhcp6c makes it send a RELEASE and restart the state machine with SOLICIT messages, while SIGUSR1 makes it send a Renew message.
I guess the different behaviors might be due to the varied configurations of DHCPv6 servers by NTT in different regions, even though they're all equally bad compared to ISPs in other countries. lol
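For anyone scripting this outside of cron, the distinction between the two signals described in this comment can be written down as a small Python sketch. The mapping of SIGUSR1 to a renew and SIGUSR2 to a release-and-restart is taken from the comment above; the names `ODHCP6C_SIGNALS` and `signal_odhcp6c` are hypothetical, and the demonstration uses a recording stub instead of `os.kill` so that running it does not touch any real process.

```python
import os
import signal

# Mapping as described in the comment above; the action names are illustrative.
ODHCP6C_SIGNALS = {
    "renew": signal.SIGUSR1,    # send a Renew message, upstream stays up
    "restart": signal.SIGUSR2,  # send RELEASE, then restart with SOLICIT
}


def signal_odhcp6c(action, pids, send=os.kill):
    """Send the signal for `action` to every PID in `pids` and return the signal used."""
    sig = ODHCP6C_SIGNALS[action]
    for pid in pids:
        send(pid, sig)
    return sig


# Dry run: record what would be sent instead of signalling a real process.
sent = []
signal_odhcp6c("renew", [1562429], send=lambda pid, sig: sent.append((pid, sig)))
print(sent)
```

On the router itself the one-line kill commands quoted in this thread do the same job; the sketch is only meant to make the choice between the two signals explicit.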

cre8ivejp · 2024-01-29T12:41:01Z

> > > > @Z61p, hi. I came here with the same issue. I'm using the NTT 10G Hikari (MAP-E), too. Every day, the internet goes down. Did your temporary solution work? If so, can you share the script running in the crontab?
> > > > Thank you.
> > >
> > > Try this script: https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd
> >
> > Thank you for the script.
> > When you lose the IPv6 connection, does it come back automatically, or is sending SIGUSR1 the only way to make it work again? I get the same warning message as you, but in my case the connection comes back 30-60 seconds after it is lost. Sometimes it happens while I'm gaming or on a video call, which is very annoying.
> > Before you shared your script, I used kill -SIGUSR2 $(pidof odhcp6c), but this takes down all the upstreams, which causes about 30 seconds of downtime until the connection comes back.
> > I ran your kill -SIGUSR1 $(pgrep odhcp6c), and it doesn't take down the upstream, which is good. I also noticed that each time it runs, the following message appears once in the log: daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding ', but I think this is expected.
>
> My WAN6 link comes back up within about 10 seconds of losing the connection, and sending SIGUSR1 to odhcp6c does not produce a "No Binding" message. Sending SIGUSR2 to odhcp6c makes it send a RELEASE and restart the state machine with SOLICIT messages, while SIGUSR1 makes it send a Renew message. I guess the different behaviors might be due to the varied configurations of DHCPv6 servers by NTT in different regions, even though they're all equally bad compared to ISPs in other countries. lol

Thanks for the explanation.

Every hour, when the cron job sends SIGUSR1, that message appears in my logs.
Since yesterday I've been checking whether the same message appears at other times, but so far it only appears when SIGUSR1 is sent, and the connection has not dropped.
As long as it doesn't lose connection anymore, I can live with those logs, lol.

Mon Jan 29 03:00:00 2024 cron.err crond[1564745]: USER root pid 1571009 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 03:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 04:00:00 2024 cron.err crond[1564745]: USER root pid 1571838 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 04:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 05:00:00 2024 cron.err crond[1564745]: USER root pid 1572818 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 05:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 06:00:00 2024 cron.err crond[1564745]: USER root pid 1573641 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 06:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
Mon Jan 29 07:00:00 2024 cron.err crond[1564745]: USER root pid 1574480 cmd kill -SIGUSR1 $(pgrep odhcp6c)
Mon Jan 29 07:00:00 2024 daemon.warn odhcp6c[1562429]: Server returned IA_PD status 'No Binding '
...

missing233 added the bug issue report with a confirmed bug label Sep 14, 2023

missing233 changed the title ~~DHCPv6 PD lifetime on Japan NTT 10G Hikari~~ [odhcp6c] DHCPv6 PD lifetime on Japan NTT 10G Hikari Sep 14, 2023

missing233 changed the title ~~[odhcp6c] DHCPv6 PD lifetime on Japan NTT 10G Hikari~~ [odhcp6c] DHCPv6 client loses connection every 8 hours(after the second expiration of PD valid lifetime) on on Japan NTT 10G Hikari Sep 14, 2023
