rtpengine: hash table to keep the selected nodes #390

smititelu · 2015-11-05T15:13:25Z

Shared memory hash table with global hashtable lock.
Add state maintaining the selected rtp node, for a given callid.
Hashtable entry expiration time configurable using hash_entry_tout modparam.
The actual deletion happens on the fly while insert/remove/lookup are called.
Updated doku.
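
As a rough illustration of the "deletion on the fly" idea above, here is a minimal sketch for a single bucket chain. The struct layout and the names bucket_insert/bucket_lookup are hypothetical, plain malloc()/free() stand in for the shared memory allocator, and the current time is passed in as a parameter to keep the sketch easy to exercise. It is not the code from this pull request.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical entry layout; field names are illustrative only. */
struct entry {
    char callid[64];
    int node_id;            /* stand-in for the selected rtpengine node */
    time_t expires_at;      /* absolute expiration time */
    struct entry *next;
};

/* Insert at the head of a bucket chain; 0 on success, -1 if out of memory. */
int bucket_insert(struct entry **head, const char *callid, int node_id,
                  time_t now, time_t tout)
{
    struct entry *e = malloc(sizeof(*e));   /* the module would use shm memory */
    if (!e)
        return -1;
    strncpy(e->callid, callid, sizeof(e->callid) - 1);
    e->callid[sizeof(e->callid) - 1] = '\0';
    e->node_id = node_id;
    e->expires_at = now + tout;
    e->next = *head;
    *head = e;
    return 0;
}

/* Look up a callid. Expired entries met during the walk are unlinked and
 * freed, so no separate cleanup timer is needed. Returns node id or -1. */
int bucket_lookup(struct entry **head, const char *callid, time_t now)
{
    struct entry **pp = head;
    while (*pp) {
        struct entry *e = *pp;
        if (e->expires_at <= now) {
            *pp = e->next;      /* unlink the expired entry */
            free(e);
            continue;
        }
        if (strcmp(e->callid, callid) == 0)
            return e->node_id;
        pp = &e->next;
    }
    return -1;
}
```

One consequence of this design is that an expired entry only goes away when some later operation walks its chain, so memory for stale entries is reclaimed lazily rather than at the exact timeout.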

smititelu · 2015-11-05T15:30:01Z

Did basic testing, with a focus on entry expiration. Further tests will follow.

What do you think about the current implementation of expired-entry deletion? Do you have any other advice or proposals?

This is my understanding of the current shared memory node list implementation. Correct me if I'm wrong.

Allow configurable table size. Updated doku.

Print the total number of hash entries in the hash table, at the given moment. Updated doku.

When the param is enabled, current sessions are allowed to finish and new sessions are denied for rtpengine nodes that were manually deactivated via kamctl, i.e. "disabled(permanent)" nodes. This is useful when deactivating nodes for maintenance. The default value is 0, so the current behaviour is maintained (i.e. don't send commands to any deactivated proxy). Updated doku.

smititelu · 2015-11-10T15:15:35Z

Updated pull request. Added some modparams and a kamctl command for the hashtable. Added a modparam to allow session deletion on manually disabled nodes (useful when performing maintenance). Updated doku.

In my opinion tests were ok. Tested this for 100 calls with media, 10 calls/s rate. Sessions were deleted successfully upon BYE. No memory leaks found at the hash table level (used "kamcmd mod.stats rtpengine shm" for this).

miconda · 2015-11-11T13:51:52Z

What would happen when there is a restart?

I find the feature useful overall; any limitations that apply when it is enabled need to be documented.

Hopefully @rfuchs can comment from his point of view and merge when all is ok.

smititelu · 2015-11-11T14:50:58Z

On a restart (i.e. rtpengine's mod_destroy() is called), rtpengine_hash_table_destroy() is called which frees the entire shm hashtable.

Possible limitations:

overhead added by the rtpengine hashlock (i.e. decrease in calls/sec rate)
more shm used (i.e. one has to increase the shm using kamailio's "-m" parameter)

What is the current known limitation for kamailio (calls/sec) ?

miconda · 2015-11-11T14:56:47Z

I was looking more from the perspective of the items in the hash table being lost: if the restart happens in the middle of a call, then after the restart, when the BYE is processed, which rtpengine is updated?

smititelu · 2015-11-11T15:08:50Z

Hmm.. I think I get the point. If kamailio restarts in the middle of a call, the hashtable is flushed. When the BYE comes, no entry is found for that callid and no rtpengine is updated.

smititelu · 2015-11-11T15:12:36Z

One approach to this would be to fall back to the stateless behaviour when no entry is found in the hashtable and just select an rtpengine based on the hashed callid (this matches the rtpengine machine for that call 100% of the time, provided none crashed in the meantime).
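
A minimal sketch of that fallback, under stated assumptions: stored_node is whatever the hash table lookup returned (with -1 meaning "no entry"), nodes are identified by a plain index, and callid_hash is an illustrative djb2 hash rather than the function the module actually uses. Both function names are made up for this example.

```c
/* Illustrative string hash (djb2); not necessarily what the module uses. */
unsigned int callid_hash(const char *callid)
{
    unsigned int h = 5381;
    for (const unsigned char *p = (const unsigned char *)callid; *p; p++)
        h = h * 33u + *p;
    return h;
}

/*
 * stored_node: node index found in the hash table, or -1 when there is no
 * entry (for example because the table was flushed by a restart).
 * Returns the node index to use, or -1 if no nodes are configured.
 */
int select_node(int stored_node, const char *callid, unsigned int n_nodes)
{
    if (stored_node >= 0)
        return stored_node;                         /* stateful path */
    if (n_nodes == 0)
        return -1;
    return (int)(callid_hash(callid) % n_nodes);    /* stateless fallback */
}
```

The fallback only lands on the original node if the set of usable nodes is the same as when the call was set up, which is the caveat already noted in the comment above.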

rfuchs · 2015-11-11T16:04:23Z

modules/rtpengine/rtpengine.c

+
+	/* Insert the key<->entry from the hashtable */
+	if (!rtpengine_hash_table_insert(&callid, entry)) {
+		LM_ERR("rtpengine hash table fail to insert node=%.*s for calllen=%d callid=%.*s",


The error case needs some shm_free() I think
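
To make the point concrete, here is a hedged sketch of an insert error path that releases what was allocated before the failure. Everything in it is hypothetical: xalloc/xfree are counting stand-ins for shm_malloc()/shm_free(), the entry layout is simplified, and the insert callback only mimics the "returns zero on failure" convention visible in the diff.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for shm_malloc()/shm_free() that count live allocations. */
int live_allocs = 0;
void *xalloc(size_t n) { void *p = malloc(n); if (p) live_allocs++; return p; }
void xfree(void *p) { if (p) { live_allocs--; free(p); } }

/* Simplified, hypothetical entry. */
struct entry {
    char *callid;
    int node_id;
};

/* Example callbacks: one that always refuses, one that keeps the entry. */
struct entry *kept_entry = NULL;
int insert_always_fails(struct entry *e) { (void)e; return 0; }
int insert_keep(struct entry *e) { kept_entry = e; return 1; }

/* Build an entry and hand it to insert(); on any failure, free what was
 * allocated so far. Returns 0 on success, -1 on error. */
int remember_node(const char *callid, int node_id,
                  int (*insert)(struct entry *))
{
    struct entry *e = xalloc(sizeof(*e));
    if (!e)
        return -1;
    e->callid = xalloc(strlen(callid) + 1);
    if (!e->callid) {
        xfree(e);
        return -1;
    }
    strcpy(e->callid, callid);
    e->node_id = node_id;

    if (!insert(e)) {
        /* The table did not take ownership: release both allocations. */
        xfree(e->callid);
        xfree(e);
        return -1;
    }
    return 0;
}
```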

gaaf · 2015-11-12T07:56:44Z

Why not just require the dialog module and store the selected rtpproxy in the dialog?

What happens on spiraled/hairpin calls? The call-id alone does not seem specific enough.

miconda · 2015-11-12T08:06:46Z

I would not like a dependency on the dialog module. It is quite common to run the rtp relay on an edge proxy, which is typically more or less a lightweight load balancer, and dialog management on those nodes would add a lot of overhead.

Falling back to hashing the callid after a restart seems reasonable to me (given the current behaviour).

For spiralled calls, iirc, there is an option to provide via branch to rtpproxy/rtpengine, but this is independent of selecting what rtpproxy/rtpengine to use.

smititelu · 2015-11-12T08:09:01Z

We also thought of using dialog module for keeping the node. The main reason against it was to keep the rtpengine module as independent as it could be. Maybe for spiraled calls this could be extended to consider also the from/to-tag when calculating the hash index ?!

juha-h · 2015-11-12T08:27:23Z

Alex writes:

Why not just require the dialog module and store the selected rtpproxy in the dialog?

dialog module is a complicated mess and its use cannot be warranted just for enabling this feature.

…

-- juha

gaaf · 2015-11-12T08:47:05Z

So, the dialog module is a no-go.

Have you considered just putting the selected rtpproxy in an RR parameter? For me that has worked perfectly for years now.

juha-h · 2015-11-12T08:57:59Z

Alex writes:

Have you considered just putting the selected rtpproxy in an RR parameter? For me that has worked perfectly for years now.

that is what i'm doing also: ;pm[=setid]

…

-- juha

miconda · 2015-11-12T09:08:14Z

That is only the setid not the selected rtp relay within the set. Afaik, there is no way to specify the rtp relay within a set from config file.

juha-h · 2015-11-12T09:17:40Z

Daniel-Constantin Mierla writes:

That is only the setid not the selected rtp relay within the set. Afaik, there is no way to specify the rtp relay within a set from config file.

it works if you have only one relay per set.

…

-- juha

miconda · 2015-11-12T09:23:50Z

My understanding was that the patch adds support for keeping the relation between the selected rtpengine and the callid when there is more than one rtpengine per set. If one of the rtpengines in the set is unavailable, then the current hashing selection becomes inconsistent once that rtpengine becomes available again.

If there is only one rtpengine per set, then no hashing selection is done.

- shm NULL checks and free already alloc'ed shm
- default entry tout to 3600 sec
- return node only, not the whole entry
- zero shm hashtable parts
- lookup and select new node if lookup fails; this is done for all commands and assures fallback behaviour
- change void to struct specific
- make set_rtp_inst_pvar() static -> used only in rtpengine.c
- fix typos rtpproxy vs rtpengine

- hash table entry contains callid, viabranch
- hash table lookup based on callid, viabranch (useful for branching scenarios); keep doing the hash table remove right away
- remove op param when select_rtpp_node(); not needed

smititelu · 2015-11-16T15:01:38Z

Updated pull request. Some of the most important updates:

free previously allocated shm when the memory limit is reached and solve possible segfaults; tested this using -m 128 and a large table: kamailio didn't crash when it had no memory for the entries and still accepted calls afterwards.
always look up the hashtable before selecting a new node; this falls back to the previous behaviour
also check the viabranch, in addition to the callid, when matching the call; tested this for simple calls, i.e. viabranch was STR_NULL; didn't test it for real-life branching scenarios as we are not using them; here I'd need some help from your side, if you use those scenarios and have time for it; IMHO the matching should be fine as I'm checking the viabranch value I get from the struct sip_msg.
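
The callid plus viabranch matching described in the last item could look roughly like the sketch below. It assumes a counted-string type similar to Kamailio's str (redefined locally as str_t so the example is self-contained) and treats an empty or NULL viabranch as matching only another empty viabranch. The names str_equal and entry_matches are hypothetical.

```c
#include <string.h>

/* Local look-alike of a counted string; defined here so the sketch
 * compiles on its own. */
typedef struct {
    const char *s;
    int len;
} str_t;

/* Compare two counted strings; two empty strings (including NULL
 * pointers with len 0) compare equal. */
int str_equal(str_t a, str_t b)
{
    if (a.len != b.len)
        return 0;
    if (a.len == 0)
        return 1;
    return memcmp(a.s, b.s, (size_t)a.len) == 0;
}

/* Hypothetical entry holding both keys. */
struct entry {
    str_t callid;
    str_t viabranch;
    int node_id;
};

/* An entry matches only if both the callid and the viabranch agree. */
int entry_matches(const struct entry *e, str_t callid, str_t viabranch)
{
    return str_equal(e->callid, callid) &&
           str_equal(e->viabranch, viabranch);
}
```

With this rule, two branches of the same call that carry different viabranch values get separate entries, while the simple case with no viabranch behaves like plain callid matching.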

smititelu · 2015-11-23T10:54:07Z

Accidentally pushed all the commits when trying to "git push upstream p_usrloc_NULL_checks master"; forgot the ":".

Shall I re-push the old files? (can't push -f a reset on previous commit due to locking).

linuxmaniac · 2015-11-23T11:01:36Z

git revert is your friend

linuxmaniac · 2015-11-23T11:06:38Z

Done. I reverted all your commits

smititelu · 2015-11-23T11:07:23Z

Thank you; didn't think of it :)

miconda · 2015-11-23T11:14:36Z

@linuxmaniac -- you should add the commands you executed in the wiki:

https://www.kamailio.org/wiki/devel/git-commit-guidelines#useful_commands

It will be useful for devs getting into git.

linuxmaniac · 2015-11-23T12:11:05Z

you should add the commands you executed in the wiki:
https://www.kamailio.org/wiki/devel/git-commit-guidelines#useful_commands

Done
https://www.kamailio.org/wiki/devel/git-commit-guidelines#revert_already_pushed_commit

Don't dup a NULL str->s, to avoid a warning message. This usually happened when viabranch is not used (the default being NULL).

If the allow_op modparam is enabled, send commands to the disabled machines for existing calls. So far this was done only for manually deactivated machines. This is useful because a machine may be disabled due to a proxy timeout, in which case you may still want to allow the operations for the existing calls.

This should further improve concurrent access to rtpengine's hash table.

For consistency with the per-row locks, statistics should also be per row.

- struct rtpengine_hash_table now contains the table size.
- rename the entry_list to row_entry_list

smititelu · 2015-12-09T15:01:48Z

Updated the pull request with a focus on the hashtable API:

implemented per-row hashtable locking instead of a global hashtable lock; this should further reduce the overhead introduced by the hashtable locks;
when enabled, the allow_op modparam allows sessions to finish for both disabled(permanent) and disabled nodes (they might be disabled due to a timeout rather than a real machine failure, and one might still want the current sessions to finish);
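
For readers unfamiliar with the pattern, per-row locking with per-row totals can be sketched as below. This is only an illustration: pthread mutexes stand in for Kamailio's own lock type, calloc() stands in for shared memory, and the struct and function names are invented for the example.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical layout: one lock and one counter per row. */
struct row {
    pthread_mutex_t lock;
    unsigned int total;         /* per-row statistics */
};

struct table {
    struct row *rows;
    unsigned int size;
};

/* Allocate the rows and initialise one lock per row. */
int table_init(struct table *t, unsigned int size)
{
    if (size == 0)
        return -1;
    t->rows = calloc(size, sizeof(*t->rows));
    if (!t->rows)
        return -1;
    t->size = size;
    for (unsigned int i = 0; i < size; i++)
        pthread_mutex_init(&t->rows[i].lock, NULL);
    return 0;
}

/* Only the row the key hashes to is locked; other rows stay available. */
void table_note_insert(struct table *t, unsigned int hash)
{
    struct row *r = &t->rows[hash % t->size];
    pthread_mutex_lock(&r->lock);
    r->total++;                 /* a real insert would link the entry here */
    pthread_mutex_unlock(&r->lock);
}

/* Sum the per-row totals, taking each row lock in turn. */
unsigned int table_total(struct table *t)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < t->size; i++) {
        pthread_mutex_lock(&t->rows[i].lock);
        sum += t->rows[i].total;
        pthread_mutex_unlock(&t->rows[i].lock);
    }
    return sum;
}

void table_destroy(struct table *t)
{
    for (unsigned int i = 0; i < t->size; i++)
        pthread_mutex_destroy(&t->rows[i].lock);
    free(t->rows);
    t->rows = NULL;
    t->size = 0;
}
```

Because the total is summed row by row, it is a snapshot that may already be slightly stale when returned, which is usually acceptable for a statistics command.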

rfuchs · 2015-12-10T07:17:39Z

modules/rtpengine/rtpengine_hash.c

+	rtpengine_hash_table->row_entry_list = shm_malloc(rtpengine_hash_table->size * sizeof(struct rtpengine_hash_entry*));
+	if (!rtpengine_hash_table->row_entry_list) {
+		LM_ERR("no shm left to create rtpengine_hash_table->row_entry_list\n");
+		rtpengine_hash_table_destroy();


A minor nitpick: These calls to _destroy() here won't ever actually do anything, because the _sanity_checks() will always fail.

Right. The _destroy() should do sanity checking, with memory free, when possible. Will correct this.

rfuchs · 2015-12-10T07:40:16Z

Other than that, looks good to me.

_destroy() sanity checking, with memory free, when possible:
- alloc the locks first.
- free the locks last.
- consider content already handled for a NULL lock (or NULL lock vector).
- make _free_row_lock() static.
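
The ordering described in this commit message can be sketched as follows. This is an assumption-laden illustration, not the merged code: the lock vector is modelled as a plain int array, xalloc/xfree are counting stand-ins for the shm allocator, and the fail_at parameter exists only to simulate an allocation failure at a given step.

```c
#include <stdlib.h>

/* Counting stand-ins for the shm allocator, used as a leak check. */
int live_allocs = 0;
void *xalloc(size_t n) { void *p = calloc(1, n); if (p) live_allocs++; return p; }
void xfree(void *p) { if (p) { live_allocs--; free(p); } }

/* Hypothetical table: lock vector plus row list. */
struct table {
    int *row_locks;             /* stand-in for the lock vector */
    void **row_entry_list;
    unsigned int size;
};

struct table *tbl = NULL;

/* Safe to call at any stage of a partially completed init. */
void table_destroy(void)
{
    if (!tbl)
        return;
    if (tbl->row_locks) {
        /* Content first; a NULL row list is simply skipped by xfree(). */
        xfree(tbl->row_entry_list);
        /* Locks are released last. */
        xfree(tbl->row_locks);
    }
    /* With a NULL lock vector there is no content to handle, because the
     * locks are always allocated before the rows. */
    xfree(tbl);
    tbl = NULL;
}

/* fail_at simulates an allocation failure at step 1..3; 0 means none. */
int table_init(unsigned int size, int fail_at)
{
    tbl = (fail_at == 1) ? NULL : xalloc(sizeof(*tbl));
    if (!tbl)
        return -1;
    tbl->size = size;

    /* Locks are allocated first ... */
    tbl->row_locks = (fail_at == 2) ? NULL : xalloc(size * sizeof(int));
    if (!tbl->row_locks) {
        table_destroy();
        return -1;
    }

    /* ... and the rows second. */
    tbl->row_entry_list = (fail_at == 3) ? NULL : xalloc(size * sizeof(void *));
    if (!tbl->row_entry_list) {
        table_destroy();
        return -1;
    }
    return 0;
}
```

The point of the ordering is that the destroy routine can infer how far initialisation got from which pointers are non-NULL, instead of refusing to run when a sanity check on the whole table fails.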

linuxmaniac assigned rfuchs Nov 5, 2015

Stefan Mititelu added 4 commits November 6, 2015 17:17

rtpengine: Update doku for node enable/disable

02d8a62

rtpengine: Add hash_table_size modparam

7375d0b

rtpengine: kamctl fifo nh_show_hash_total

74fdbe2

rfuchs reviewed Nov 11, 2015

Stefan Mititelu added 2 commits November 16, 2015 13:26

rtpengine: Fix deletion for branching scenarios

5f936a3

Stefan Mititelu added 5 commits December 4, 2015 13:16

rtpengine: Don't shm_str_dup() a NULL str->s

6390e8b

rtpengine: Add per rows hash table locks

a22b59f

rtpengine: Add per rows totals statistics

c4f2b55

rtpengine: Move the size inside the hash table

5ad022a

rfuchs reviewed Dec 10, 2015

rtpengine: _destroy() sanity + memory free

95cd106

rfuchs merged commit 95cd106 into kamailio:master Dec 10, 2015
