Prevent concurrent instances of dhcp6c #8467

bensmithurst · 2025-03-21T21:06:27Z

After an unexpected reboot due to a power cut earlier I noticed some dhcp6c errors in the log about XID mismatch, however IPv6 was working fine. On closer inspection it turns out there are 2 dhcp6c processes running, both started at the same time by rtsold.

That script does check for an existing pid file and if found, sends SIGHUP to an existing instance, however it seems as if there is a race condition where if the two runs of rtsold_script.sh are close enough, two instances of dhcp6c are started.

Avoiding this seems to be as simple as wrapping dhcp6c inside a flock command.

Log snippet showing 2 instances being started:

<13>1 2025-03-21T10:48:53+00:00 HOSTNAME dhcp6c 16941 - [meta sequenceId="280"] RTSOLD script - Starting dhcp6c daemon
<29>1 2025-03-21T10:48:53+00:00 HOSTNAME dhcp6c 17028 - [meta sequenceId="281"] extracted an existing DUID from /var/db/dhcp6c_duid: ****
<29>1 2025-03-21T10:48:53+00:00 HOSTNAME dhcp6c 17028 - [meta sequenceId="282"] <3>[interface] (9)
<29>1 2025-03-21T10:48:53+00:00 HOSTNAME dhcp6c 17028 - [meta sequenceId="283"] <5>[igb0] (4)
<13>1 2025-03-21T10:48:53+00:00 HOSTNAME dhcp6c 18091 - [meta sequenceId="284"] RTSOLD script - Starting dhcp6c daemon
<29>1 2025-03-21T10:48:53+00:00 HOSTNAME dhcp6c 17028 - [meta sequenceId="285"] <3>begin of closure [{] (1)

This unsurprisingly leads to interleaving of requests, and the wrong instance reading a response to the other instance's requests, leading to the mismatches:

<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="484"] Sending Solicit
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="485"] a new XID (f5b51) is generated
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="486"] set client ID (len 14)
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="487"] set elapsed time (len 2)
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="488"] set option request (len 4)
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="489"] set IA_PD prefix
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="490"] set IA_PD
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="491"] send solicit to ff02::1:2%igb0
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="492"] reset a timer on igb0, state=SOLICIT, timeo=0, retrans=1052
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="493"] receive advertise from fe80::****%igb0 on igb0
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="494"] get DHCP option server ID, len 10
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="495"]   DUID: ****
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="496"] get DHCP option client ID, len 14
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="497"]   DUID: ****
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="498"] get DHCP option IA_PD, len 41
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="499"]   IA_PD: ID=0, T1=1800, T2=2880
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="500"] get DHCP option IA_PD prefix, len 25
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="501"]   IA_PD prefix: 2a0e:***::/56 pltime=3600 vltime=3600
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="502"] get DHCP option DNS, len 32
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="503"] XID mismatch
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="504"] Sending Solicit
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="505"] a new XID (ba5ecf) is generated
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="506"] set client ID (len 14)
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="507"] set elapsed time (len 2)
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="508"] set option request (len 4)
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="509"] set IA_PD prefix
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="510"] set IA_PD
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="511"] send solicit to ff02::1:2%igb0
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="512"] reset a timer on igb0, state=SOLICIT, timeo=0, retrans=1020
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="513"] receive advertise from fe80::****%igb0 on igb0
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="514"] get DHCP option server ID, len 10
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="515"]   DUID: ****
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="516"] get DHCP option client ID, len 14
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="517"]   DUID: ****
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="518"] get DHCP option IA_PD, len 41
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="519"]   IA_PD: ID=0, T1=1800, T2=2880
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="520"] get DHCP option IA_PD prefix, len 25
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="521"]   IA_PD prefix: 2a0e:***::/56 pltime=3600 vltime=3600
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="522"] get DHCP option DNS, len 32
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="523"] server ID: ****, pref=-1
<29>1 2025-03-21T10:48:54+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="524"] reset timer for igb0 to 0.995808
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="525"] Sending Solicit
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="526"] set client ID (len 14)
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="527"] set elapsed time (len 2)
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="528"] set option request (len 4)
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="529"] set IA_PD prefix
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="530"] set IA_PD
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="531"] send solicit to ff02::1:2%igb0
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 18518 - [meta sequenceId="532"] reset a timer on igb0, state=SOLICIT, timeo=1, retrans=2008
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="533"] receive advertise from fe80::****%igb0 on igb0
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="534"] get DHCP option server ID, len 10
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="535"]   DUID: ****
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="536"] get DHCP option client ID, len 14
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="537"]   DUID: ****
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="538"] get DHCP option IA_PD, len 41
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="539"]   IA_PD: ID=0, T1=1800, T2=2880
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="540"] get DHCP option IA_PD prefix, len 25
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="541"]   IA_PD prefix: 2a0e:***::/56 pltime=3600 vltime=3600
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="542"] get DHCP option DNS, len 32
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="543"] XID mismatch
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="544"] picked a server (ID: ****)
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="545"] Sending Request
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="546"] a new XID (68923e) is generated
<29>1 2025-03-21T10:48:55+00:00 HOSTNAME dhcp6c 19076 - [meta sequenceId="547"] set client ID (len 14)

On my test instance I haven't seen the 2 instances happen for real but I have been able to simulate it to test this fix using

/var/etc/rtsold_script.sh em0 & /var/etc/rtsold_script.sh em0 &

which leads to the same XID mismatch error so seems to be representative of what can happen for real. When rebooting my test instance I have seen logs showing the double instance protection is kicking in though.

<13>1 2025-03-21T20:00:01+00:00 HOSTNAME dhcp6c 6143 - [meta sequenceId="265"] RTSOLD script - Starting dhcp6c daemon
<13>1 2025-03-21T20:00:01+00:00 HOSTNAME dhcp6c 9650 - [meta sequenceId="334"] RTSOLD script - Starting dhcp6c daemon
<13>1 2025-03-21T20:00:01+00:00 HOSTNAME dhcp6c 15501 - [meta sequenceId="335"] RTSOLD script - Failed to start dhcp6c daemon

After this I do see a single dhcp6c running as expected and IPv6 connectivity comes up as normal.

I'm not sure how/why the 2 instances of dhcp6c happened and haven't dug into the rtsold source to see exactly how it runs the wrapper script, but this seems to me to be a useful protection against two instances running.

This avoids two instances running simultaneously due to a potential race condition where rtsold may run rtsold_script twice very close together.

fichtner · 2025-03-21T21:57:23Z

Sounds like #8431 but also that something in 25.1 caused this as we haven’t had this for a long time. The surface timing is odd, either due to FreeBSD 14.2 or the rtsold no delay option 7fe558c

bensmithurst · 2025-03-21T22:57:19Z

I had a quick look and I couldn't see any changes in rtsold upstream.

The two execs here in interface_dhcpv6_configure could be a problem, perhaps? Starting rtsold which will likely run rtsold_script.sh almost immediately and then immediately also running rtsold_script.sh directly.

fichtner · 2025-04-15T09:58:51Z

src/etc/inc/interfaces.inc

+    $dhcp6ccommand = exec_safe('flock -nF /var/run/dhcp6c.lock /usr/local/sbin/dhcp6c -c %s -p %s', [
            '/var/etc/dhcp6c.conf',
            '/var/run/dhcp6c.pid',


Suggested change

$dhcp6ccommand = exec_safe('flock -nF /var/run/dhcp6c.lock /usr/local/sbin/dhcp6c -c %s -p %s', [

'/var/etc/dhcp6c.conf',

'/var/run/dhcp6c.pid',

$dhcp6ccommand = exec_safe('/usr/local/bin/flock -n -E 0 -o %s /usr/local/sbin/dhcp6c -c %s -p %s', [

'/var/etc/dhcp6c.conf',

'/var/etc/dhcp6c.conf',

'/var/run/dhcp6c.pid',

I don't see much use erroring when returning from not being able to lock since another process will do it for us.

hmm, the -E 0 change seems safe, I thought I did have to use -F to make it work though, which I see you've removed here, I'll have to look again when I have chance. I can't quite remember why I needed to use -F.

I think we need the fork to release the lock properly, otherwise the process keeps it (see -o)

with -F the dhcp6c process holds the lock but it will be implicitly released when it exits, which is what we need. Although I guess your way works fine, just means the flock process holds the lock instead. I suppose your way is safer in case an update to dhcp6c makes it close all file descriptors when it starts or something like that.

I think I was trying to avoid the case where someone kills the flock process, meaning the lock is released, even though dhcp6c might still be running. But admins doing silly things is perhaps not something we can completely defend against.

dhcp6c shouldn't own the lock since we just want to avoid the start race. letting it keep the lock going into zombie mode would be problematic and at the time of the daemon()/fork() call in dhcp6c process the race should not be there anymore. After that it chugs along for the most part being handled by a "safer" SIGHUP.

I have added your change but for the record from my testing I can still - occasionally - reproduce the race condition using your suggested change. I take your point about dhcp6c itself not holding the lock, but the problem with your change from what I can see in my tests that dhcp6c will fork into the background and exit, and flock then exits, which will release the lock, before the second run has tried to get the lock, leading to a possible race condition. With my original change I can't ever trigger a race, so far.

I think this PR in its current state is a definite improvement but it does feel like there's still a small chance of the race condition lurking. Maybe adding some locking into the opnsense/dhcp6c repo would be useful? I don't mind looking at that, when I have some time.

I agree that it would be safer to handle this in dhcp6c. We could also remove the immediate flag but the race is still there either way as you indicate and since we have the source code that would be best.

I have created opnsense/dhcp6c#40 now. If you think this is the right overall approach I'll update this PR to just use the new dhcp6c -l parameter.

src/etc/inc/interfaces.inc

fichtner · 2025-04-15T10:01:19Z

I had a quick look and I couldn't see any changes in rtsold upstream.

The two execs here in interface_dhcpv6_configure could be a problem, perhaps? Starting rtsold which will likely run rtsold_script.sh almost immediately and then immediately also running rtsold_script.sh directly.

I specifically meant

% git grep solicit.imme
business/24.10/24.10:o interfaces: send IPv6 solicit immediately on WAN interfaces
community/24.7/24.7.6:o interfaces: send IPv6 solicit immediately on WAN interfaces

bensmithurst · 2025-04-15T10:21:30Z

I specifically meant

% git grep solicit.imme
business/24.10/24.10:o interfaces: send IPv6 solicit immediately on WAN interfaces
community/24.7/24.7.6:o interfaces: send IPv6 solicit immediately on WAN interfaces

Understood - I was just meaning that because we start rtsold and the script at the same time, the change making rtsold send the solicit immediately (and thus run the script) is going to make the race condition more likely, I think.

Remove unnecessary logging of failure to start dhcp6c. Co-authored-by: Franco Fichtner <franco@lastsummer.de>

fichtner · 2025-04-15T10:40:33Z

Understood - I was just meaning that because we start rtsold and the script at the same time, the change making rtsold send the solicit immediately (and thus run the script) is going to make the race condition more likely, I think.

Yes, agreed.

Add suggestion from PR - don't use flock -F flag, and don't report error on lock failure. Co-authored-by: Franco Fichtner <franco@lastsummer.de>

This is needed to avoid concurrent runs of dhcp6c which can lead to IPv6 connectivity problems. Related to opnsense/core#8467 and opnsense/core#8431

fichtner · 2025-04-17T12:45:46Z

No change needed on core, but thanks very much to moving this along!

Wrap dhcp6c daemon in flock command.

0e735da

This avoids two instances running simultaneously due to a potential race condition where rtsold may run rtsold_script twice very close together.

fichtner linked an issue Mar 21, 2025 that may be closed by this pull request

No IPv6 connectivity after reboot / duplicate dhcp6c processes #8431

Closed

2 tasks

fichtner self-assigned this Mar 21, 2025

fichtner reviewed Apr 15, 2025

View reviewed changes

src/etc/inc/interfaces.inc Outdated Show resolved Hide resolved

Update src/etc/inc/interfaces.inc

d4d303c

Remove unnecessary logging of failure to start dhcp6c. Co-authored-by: Franco Fichtner <franco@lastsummer.de>

bensmithurst and others added 2 commits April 15, 2025 22:41

Merge remote-tracking branch 'origin/master' into dhcp6c-lockfile

7644242

Update src/etc/inc/interfaces.inc

012d077

Add suggestion from PR - don't use flock -F flag, and don't report error on lock failure. Co-authored-by: Franco Fichtner <franco@lastsummer.de>

bensmithurst mentioned this pull request Apr 16, 2025

Add lockfile (-l) command line parameter. opnsense/dhcp6c#40

Closed

fichtner closed this Apr 17, 2025

Prevent concurrent instances of dhcp6c #8467

Prevent concurrent instances of dhcp6c #8467

Uh oh!

Conversation

bensmithurst commented Mar 21, 2025

Uh oh!

fichtner commented Mar 21, 2025

Uh oh!

bensmithurst commented Mar 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fichtner commented Apr 15, 2025

Uh oh!

bensmithurst commented Apr 15, 2025

Uh oh!

fichtner commented Apr 15, 2025

Uh oh!

fichtner commented Apr 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants