dispatcher: Fix handling of inactive destination for alg 13 #2964

Merged
merged 1 commit into from Jan 11, 2022

mtryfoss
Member

@mtryfoss mtryfoss commented Dec 8, 2021

Pre-Submission Checklist

  • Commit message has the format required by CONTRIBUTING guide
  • Commits are split per component (core, individual modules, libs, utils, ...)
  • Each component has a single commit (if not, squash them into one commit)
  • No commits to README files for modules (changes must be done to docbook files
    in doc/ subfolder, the README file is autogenerated)

Type Of Change

  • Small bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would change existing functionality)

Checklist:

  • PR should be backported to stable branches
  • Tested changes locally
  • Related to issue #XXXX (replace XXXX with an open issue number)

Description

Alg 13 tried to distribute calls to inactive destinations.
If the highest-priority destination is inactive, the hash is not updated
and the xavp is not set. As a result, the failover mechanism
does not work at all for the given call.

When the hash variable is not updated, alg 13 behaves like
round robin in the scenario above. If you have two destinations
and the highest-priority one is out of service, 50% of the calls will fail.
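
The 50% figure can be illustrated with a toy model. This is not the dispatcher module's code: the names `Destination` and `select_without_fix` are hypothetical, and the model simply assumes that the selection index rotates per call and that a call fails when it lands on the inactive destination because no failover list was set.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    uri: str
    priority: int
    active: bool


def select_without_fix(dests, call_index):
    """Toy model of the faulty behaviour: the index rotates per call,
    so inactive destinations are still selected."""
    return dests[call_index % len(dests)]


# Two destinations; the highest-priority one is out of service.
dests = [
    Destination("sip:primary.example", priority=10, active=False),
    Destination("sip:backup.example", priority=5, active=True),
]

calls = 100
failed = sum(
    1 for i in range(calls) if not select_without_fix(dests, i).active
)
failure_rate = failed / calls
print(f"{failed} of {calls} calls hit the inactive destination")
```

Every second call is sent to the inactive destination, which matches the round-robin-like behaviour described above.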

I have now tried a simpler approach: updating hash with the first
entry of the sorted list.
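
A minimal sketch of that idea, written as a standalone Python model rather than the actual patch. It assumes the sorted list places active destinations first and orders them by descending priority; `Destination`, `sort_destinations` and `select_with_fix` are hypothetical names, and the returned failover list only stands in for what the xavp would carry.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    uri: str
    priority: int
    active: bool


def sort_destinations(dests):
    # Assumed ordering: active entries first, then highest priority first.
    return sorted(dests, key=lambda d: (not d.active, -d.priority))


def select_with_fix(dests):
    """Return (hash_index, failover_list) based on the first sorted entry."""
    ordered = sort_destinations(dests)
    first = ordered[0]
    if not first.active:
        # Nothing usable in the set at all.
        return None, []
    # The hash always points at the first entry of the sorted list.
    hash_index = dests.index(first)
    # Remaining active entries model the failover list.
    failover = [d for d in ordered[1:] if d.active]
    return hash_index, failover


dests = [
    Destination("sip:primary.example", priority=10, active=False),
    Destination("sip:backup.example", priority=5, active=True),
    Destination("sip:third.example", priority=1, active=True),
]
hash_index, failover = select_with_fix(dests)
print(dests[hash_index].uri, [d.uri for d in failover])
```

With the highest-priority destination inactive, the model selects the backup and still leaves a non-empty failover list for the call.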

Also fixed a typo in a variable name.

@miconda
Member

miconda commented Dec 8, 2021

@jchavanton - any comments on this PR?

@mtryfoss
Member Author

mtryfoss commented Dec 8, 2021

Please note that we use this with latency stats disabled. It is just a way of getting round-robin distribution within each priority for more than two destinations.

For example, in normal operation we use round robin across two destinations, with a third as failover if the first two both go offline.
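
The intended distribution can be sketched as follows. This is an illustrative Python model of the use case only, not of how the dispatcher module implements alg 13; `Destination`, `active_top_tier` and `pick` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    uri: str
    priority: int
    active: bool


def active_top_tier(dests):
    """Active destinations sharing the highest priority among active ones."""
    active = [d for d in dests if d.active]
    if not active:
        return []
    top = max(d.priority for d in active)
    return [d for d in active if d.priority == top]


def pick(dests, call_index):
    """Round robin inside the best available priority tier."""
    tier = active_top_tier(dests)
    if not tier:
        return None
    return tier[call_index % len(tier)]


# Normal operation: two destinations share the load, a third is standby.
normal = [
    Destination("sip:a.example", 10, True),
    Destination("sip:b.example", 10, True),
    Destination("sip:c.example", 5, True),
]
normal_picks = [pick(normal, i).uri for i in range(4)]

# Both primary destinations offline: the third takes every call.
degraded = [
    Destination("sip:a.example", 10, False),
    Destination("sip:b.example", 10, False),
    Destination("sip:c.example", 5, True),
]
degraded_picks = [pick(degraded, i).uri for i in range(2)]
print(normal_picks, degraded_picks)
```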

@miconda
Member

miconda commented Dec 13, 2021

@mtryfoss - have you checked ds_select_routes(), isn't it more suitable for what you need?

@mtryfoss
Member Author

It seems like that's more of a thing to combine multiple dispatcher sets using potentially different algos?
That could be useful in some other scenarios, but in the current setup this is working perfectly with the applied patch.

My point here is that algo 13 seems to be broken (failover does not work) if the first selected (highest priority) destination is marked as inactive. The xavp var is not set.

While the destination is being probed but is not yet marked inactive, the call to it will time out, but the xavp will be set and the on-failure logic can trigger as expected.
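
The difference between the two cases can be modelled roughly like this. It is a simplified Python sketch, not Kamailio configuration: `place_call` is a hypothetical function, the failover list stands in for the xavp, and `None` models the xavp not being set.

```python
def place_call(first_choice, failover_list, send):
    """Try the selected destination, then any failover entries.

    `send` is a callable returning True when the destination answers.
    """
    if send(first_choice):
        return first_choice
    # Failure handling can only retry if a failover list was populated.
    for uri in failover_list or []:
        if send(uri):
            return uri
    return None


reachable = {"sip:backup.example"}


def send(uri):
    return uri in reachable


# Destination still being probed: first attempt times out, failover works.
with_xavp = place_call("sip:primary.example", ["sip:backup.example"], send)

# Destination marked inactive and no list set: nothing to fail over to.
without_xavp = place_call("sip:primary.example", None, send)
print(with_xavp, without_xavp)
```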

@miconda
Member

miconda commented Dec 22, 2021

I am not using that algorithm, so I have no experience with its expected behaviour. If @jchavanton has no comments soon, it will probably be merged; I see it affects only code specific to the algorithm.

@miconda miconda merged commit adba3ca into kamailio:master Jan 11, 2022