
dmq_usrloc: sync with multi contacts per message #1054

Merged: 2 commits merged into kamailio:master on Apr 21, 2017

Conversation

jchavanton
Member

@jchavanton commented Apr 3, 2017

This was tested with 100K contacts and 4 servers (all running on the same host); when restarting a server, we get a full sync from the other 3 nodes in a few seconds:

show registrations: first x
    "usrloc:location-contacts = 100000",
show registrations: second x
    "usrloc:location-contacts = 100000",
show registrations: third x
    "usrloc:location-contacts = 100000",
show registrations: fourth x
    "usrloc:location-contacts = 100000",
postgres database: 100000

@jchavanton
Member Author

jchavanton commented Apr 4, 2017

I think I should add a check for maximum packet length, to make sure we never try to send a datagram larger than 60K.
Next step, we could add compression :)

150 contacts may be too much in some cases, because each contact has a max length of 1024 bytes; we should handle this automatically.

@charlesrchance
Member

I am currently reviewing the patch, but as for limiting to 60KB, I'm not sure it is necessary for this application. It will be difficult to be 100% accurate anyway, since you only have control over the body; therefore, the only option is to assume a sensible size for the rest of the message and set that aside in your calculation.

I think Kamailio handles fragmentation just fine, and since this is really just a few packets on startup/initial sync, I don't think we need to be concerned about it (although it may be worth a mention in the readme). Others may have a different opinion, of course.

@jchavanton
Member Author

This is now deployed on a cluster running on AWS; fragmentation is taking place and everything is performing much better, with no more transaction storms.

OK to update the readme, changing the example to 50 contacts and adding a comment about budgeting 1024 bytes per contact in order to stay below the 65536-byte UDP send limit.
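For illustration, a hedged kamailio.cfg sketch of what the updated example could look like. The 50-contact figure and the 1024-byte budget are from this discussion; the batch_msg_contacts parameter name is my assumption for the module's contacts-per-message knob, so verify it against the module docs:

```
loadmodule "dmq_usrloc.so"

modparam("dmq_usrloc", "enable", 1)
# 50 contacts per message x 1024 bytes max per contact = 51200 bytes,
# which stays below the 65536-byte UDP datagram limit.
modparam("dmq_usrloc", "batch_msg_contacts", 50)
```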

@miconda
Member

miconda commented Apr 4, 2017

@jchavanton: the documentation has to be edited in the DocBook XML files located in the doc/ subfolder of each module. The readme file must not be edited directly; it is generated on the server from those files.

@jchavanton
Member Author

OK, corrected the documentation; I squashed to avoid too many commits.

@jchavanton
Member Author

The overhead per contact is 188 chars, which is quite a lot; we may select an alternate format later.
1,:{"action":,"aor":"","ruid":"","c":"","received":"","path":"","callid":"","user_agent":"","instance":"","expires":,"cseq":,"flags":,"cflags":,"q":,"last_modified":,"methods":,"reg_id":},

I think we could validate the size of the packet while building it, so that we automatically send once it grows beyond 60000 bytes, for example.
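For illustration, a minimal C sketch of such a build-time size check, assuming a 60000-byte cap; append_contact() and flush_batch() are hypothetical names, not the module's actual functions:

```c
/*
 * Sketch of a build-time batch size check -- not the actual
 * dmq_usrloc code. flush_batch() stands in for the real DMQ send.
 */
#include <stdio.h>
#include <string.h>

#define MAX_MSG_SIZE 60000  /* stay below the 65536-byte UDP limit */

static char batch[MAX_MSG_SIZE];
static size_t used = 0;

/* Placeholder for the real DMQ send; here it just reports the size. */
static void flush_batch(void)
{
    if (used == 0)
        return;
    printf("sending batch of %zu bytes\n", used);
    used = 0;
}

/* Append one serialized contact, flushing first if it would not fit. */
static void append_contact(const char *contact_json)
{
    size_t len = strlen(contact_json);

    if (len > MAX_MSG_SIZE)
        return;  /* a single oversized contact can never be sent */
    if (used + len > MAX_MSG_SIZE)
        flush_batch();
    memcpy(batch + used, contact_json, len);
    used += len;
}

int main(void)
{
    /* Simulate many ~1KB contacts; batches flush near the 60000 cap. */
    char contact[1024];
    memset(contact, 'x', sizeof(contact) - 1);
    contact[sizeof(contact) - 1] = '\0';

    for (int i = 0; i < 200; i++)
        append_contact(contact);
    flush_batch();  /* send the final partial batch */
    return 0;
}
```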

@charlesrchance
Member

Are there other changes you'd like to make to this PR?

@jchavanton
Member Author

jchavanton commented Apr 5, 2017

Hi Charles, I am implementing the size check. I think it would be nightmarish not to know whether contacts could sometimes be missing because of oversized messages; this will simplify things for anyone using this feature.

I am verifying that the calculation matches and I will test and commit later today.

Thanks for your review so far

@jchavanton
Member Author

I added dmq_usrloc_batch_msg_size to make sure we never try to send messages that are too large, and to simplify the configuration.
I ran a bunch of tests again.
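A hedged config sketch using the new parameter (assuming the usual module-local naming, i.e. modparam name batch_msg_size; the 60000 value mirrors the 60K limit discussed earlier, and the actual default may differ):

```
# Cap the serialized batch so a single DMQ message never exceeds
# 60000 bytes, regardless of how many contacts would otherwise fit.
modparam("dmq_usrloc", "batch_msg_size", 60000)
```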

@jchavanton
Member Author

Hi @charlesrchance, I am not planning any other modifications to this PR; the sync performance is now acceptable for our needs at this point.

@jchavanton
Member Author

jchavanton commented Apr 7, 2017

One thing that could help would be to increase the retransmission timer for the sync traffic only, but I did not see any obvious way to do this. I guess this is off topic; I was planning to discuss it on the mailing list.

@miconda
Member

miconda commented Apr 18, 2017

@charlesrchance I guess this is ready to be merged if you are fine with everything so far.

@charlesrchance
Member

Returning after an extended Easter break, so I have not tested the recent changes, but it looks OK from the source. If it has been tested already and everyone else is happy, then it can be merged. Otherwise, I will do it tomorrow.

@jchavanton
Member Author

jchavanton commented Apr 19, 2017

I did test the latest version in my lab, not in prod yet.
(By the latest version I mean the one with the max size check; we are already using multi contacts per message in prod.)

@jchavanton
Member Author

The lab test I did was load testing with 4 nodes; before going to prod we do integration testing, and in this case I see no reason why it would fail.
The only reason this is not in prod yet is that instead of backporting to 4.5 we will start using 5.x.

@charlesrchance charlesrchance merged commit f314edc into kamailio:master Apr 21, 2017