3 ping packet loss on CARP failover from Master -> Slave, fine failing back #3163

Closed
iMiMx opened this issue Jan 29, 2019 · 15 comments

@iMiMx

iMiMx commented Jan 29, 2019

I have HA set up and working well. However, I noticed that when failing over from Master -> Slave by enabling persistent CARP maintenance mode, 3 pings are lost (state is synced fine, SSH sessions remain, etc.). When failing back from Slave -> Master everything is fine, with no loss.

I ran a quick Google and it turned up the below, from pfSense:

https://forum.netgate.com/topic/123871/carp-master-manual-switch-introduces-packet-loss

The suggested workaround there seems to fix the problem. Executing the below on the master causes traffic to shift to the slave with 0 ping loss:

sysctl net.inet.carp.demotion=240
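
A minimal sketch of that workaround as reusable shell functions, in case it helps anyone reproducing this. The function names and the SYSCTL override are illustrative and not part of OPNsense. The undo step assumes that a value written to this OID is treated as a signed adjustment to the current demotion factor, which is how carp(4) describes it on FreeBSD; please verify on your own system before relying on it.

```sh
#!/bin/sh
# Sketch only: wraps the sysctl call quoted in this thread.
# SYSCTL is overridable so the commands can be previewed,
# e.g. SYSCTL="echo sysctl" for a dry run.
SYSCTL="${SYSCTL:-sysctl}"
OID="net.inet.carp.demotion"

# Print the current demotion factor.
carp_demotion_get() {
    $SYSCTL -n "$OID"
}

# Add AMOUNT (default 240, the value used above) to the demotion factor.
carp_demote() {
    $SYSCTL "$OID=${1:-240}"
}

# Subtract AMOUNT again. Assumption: the written value is a signed
# adjustment, so a negative number lowers the demotion factor.
carp_undemote() {
    $SYSCTL "$OID=-${1:-240}"
}
```

With SYSCTL="echo sysctl" set, calling carp_demote only prints the command it would run, which is a safe way to check the syntax before touching a production master.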

Is this a known/accepted issue?

@AdSchellevis added the "support" (Community support) label on Jan 29, 2019
@AdSchellevis
Member

You can alter the demotion factor while in maintenance. The option sets the advskew on the interface, but there might be a small glitch (I'm not 100% sure):

$carp_maintenancemode = isset($config["virtualip_carp_maintenancemode"]);
if ($carp_maintenancemode) {
    $advskew = "advskew 254";

@iMiMx
Author

iMiMx commented Jan 29, 2019

By the time I've entered maintenance, the loss has already happened though? Or am I missing something? :)

  • Click CARP maintenance
  • 3 ping loss
  • Fail over completes successfully with state, etc
  • Reboot master
  • Exit CARP Maintenance
  • Fail back 0 ping loss

If instead:

  • sysctl net.inet.carp.demotion=240
  • 0 ping loss
  • Fail over has happened
  • Reboot master
  • Preempt fail back
  • Fail back 0 ping loss
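
To put a number on the loss in each of the two sequences above, one option is to keep a ping running against the CARP VIP while triggering the failover and then read the loss from ping's summary line. The helper below is a rough sketch: the name ping_lost is made up for illustration, and it assumes the usual "N packets transmitted, M [packets] received" summary format printed by the FreeBSD and Linux ping utilities.

```sh
#!/bin/sh
# Sketch only: read ping output on stdin and print the number of
# lost packets, computed from the summary line.
ping_lost() {
    awk '/packets transmitted/ {
        for (i = 1; i <= NF; i++) {
            # "<tx> packets transmitted,"
            if ($i == "transmitted,") tx = $(i - 2)
            # "<rx> packets received," (FreeBSD) or "<rx> received," (Linux)
            if ($i == "received,") rx = ($(i - 1) == "packets") ? $(i - 2) : $(i - 1)
        }
        print tx - rx
    }'
}
```

Usage would be along the lines of running ping with a fixed count against the VIP from a client, piping the output into ping_lost, and switching the master into maintenance (or setting the sysctl) while the ping is in flight.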

@AdSchellevis
Member

Yes, your sysctl net.inet.carp.demotion=240 adds the amount to advskew, which is functionally the same, but the ifconfig call might lose a packet when setting advskew.

The difference is what happens after a reboot: with the sysctl the node fails back, whereas persistent mode keeps the advskew. It would be smoother if persistent mode first set net.inet.carp.demotion and kept the advskew for the next boot.
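
The two-step idea described here could be sketched roughly as follows. This is only an illustration of the ordering, not the change that was eventually committed: the function names are invented, and the marker file is a hypothetical stand-in for the persistent maintenance flag that OPNsense keeps in its configuration. The interface name and vhid are passed in by the caller.

```sh
#!/bin/sh
# Sketch only. SYSCTL and IFCONFIG are overridable for dry runs.
SYSCTL="${SYSCTL:-sysctl}"
IFCONFIG="${IFCONFIG:-ifconfig}"
# Hypothetical marker file standing in for the persistent config flag.
FLAG="${FLAG:-/tmp/carp_maintenance.flag}"

# Runtime: move traffic with the demotion counter only, and remember
# the request so the next boot can apply the advskew.
maintenance_enter() {
    $SYSCTL net.inet.carp.demotion=240 && : > "$FLAG"
}

# Boot time: if maintenance was requested, configure the VIP with
# advskew 254. Arguments: interface name and vhid.
maintenance_apply_on_boot() {
    if [ -e "$FLAG" ]; then
        $IFCONFIG "$1" vhid "$2" advskew 254
    fi
}
```

The point of splitting it this way is that the running system never touches advskew through ifconfig, which is the step suspected of dropping packets, while a rebooted node still comes up demoted.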

@iMiMx
Author

iMiMx commented Jan 29, 2019

Surely this should be classified as a bug then? Or is the 3 packet loss accepted as part of the current HA implementation?

@AdSchellevis
Member

A feature request, sure; a bug is debatable.

@AdSchellevis added the "feature" (Adding new functionality) label and removed the "support" (Community support) label on Jan 29, 2019
@mimugmail
Member

I'd like to verify this on some systems before any changes are made. Are you sure your default demotions are good?

@AdSchellevis
Member

@mimugmail thank you!

@iMiMx
Author

iMiMx commented Jan 29, 2019

@mimugmail as far as I can tell, no errors in the logs, sessions aren't lost etc.

@fichtner
Member

fichtner commented Mar 6, 2019

@mimugmail ping :)

@mimugmail
Member

I'm working from home all this week, sorry. I also have to hunt down the 3-second timer when unplugging a cable from an LACP bundle (does anyone have an idea?). I can test this next Wednesday.

Sorry for the delay :/

@fichtner
Member

fichtner commented Mar 7, 2019

No worries, just curious about the state :)

@fichtner added the "help wanted" (Contributor missing / timeout) label and removed the "feature" (Adding new functionality) label on May 4, 2019
@mimugmail
Member

@AdSchellevis I can confirm that there's no packet loss when demoting to 240.
The only thing is that the status page then shows this warning:

CARP has detected a problem and this unit has been demoted to BACKUP status.
Check link status on all interfaces with configured CARP VIPs.

This doesn't happen when entering maintenance mode manually. It might confuse users ... perhaps you know the impact better than I do :)
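
For anyone who wants to check from the shell whether a node is in the state that triggers this warning, a small test on the demotion counter may be enough. This is a sketch under an assumption: that the status page ties the warning to net.inet.carp.demotion being greater than zero, which matches the behaviour observed here but has not been checked against the GUI code. The function name is illustrative.

```sh
#!/bin/sh
# Sketch only. SYSCTL is overridable for testing.
SYSCTL="${SYSCTL:-sysctl}"

# Exit status 0 when the demotion factor is above zero.
# Assumption: this is the condition behind the status page warning.
carp_is_demoted() {
    [ "$($SYSCTL -n net.inet.carp.demotion)" -gt 0 ]
}
```

It can be used as a plain condition, for example in an if statement that prints a note when the node has been demoted by the sysctl rather than by maintenance mode.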

@AdSchellevis
Member

@mimugmail I think you're looking for 0e9912c

@fichtner added the "cleanup" (Low impact changes) label and removed the "help wanted" (Contributor missing / timeout) label on May 8, 2019
@fichtner added this to the 19.7 milestone on May 8, 2019
@mimugmail
Member

Best code in town, works great! :)

@fichtner
Member

fichtner commented May 8, 2019

Adding it to 19.1.8, thanks!

@fichtner closed this as completed on May 8, 2019
fichtner pushed a commit that referenced this issue May 8, 2019
EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019
EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019