Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep track of connections, introduce per-vhost limits #891

Merged
merged 36 commits into from Aug 26, 2016

Conversation

michaelklishin
Copy link
Member

@michaelklishin michaelklishin commented Jul 22, 2016

This introduces (client) connection tracking that's aware of connection

  • vhost
  • username
  • protocol

and so on across the cluster. We already collect much of this information
in node-local state but that's not sufficient for #500, #607, and related features.

This changes internal database schema and thus can only go into 3.7.0.

This also makes sure that when a vhost is deleted, we force close all connections in it.

This is a 3rd iteration on the design. All connections and counters are tracked using
node-specific tables which are only updated on the node itself but read by all cluster nodes.
This avoids concurrent counter updates without going all the way to CRDTs.

The implementation isn't perfect but I run out of practical test cases to add. Points of improvement for the future:

  • Parallel folding over per-node connection tracking tables
  • Management plugin extensions related to limit management
  • rabbitmqctl status and report entries related to limits
  • As with many other things, replacing Mnesia would open new possibilities but it's a long term goal

Fixes #500, #627.
Lays ground work for #501, #607.

Depends on rabbitmq/rabbitmq-common#121.

Fixes #500, #627.

Squashed commit of the following:

commit 88036dc
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Thu Jul 21 03:31:25 2016 +0300

    Refactor

commit fc84b7a
Merge: df745e2 df28c63
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jul 20 18:30:19 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit df745e2
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jul 20 18:04:59 2016 +0300

    Force close connections when vhost is deleted

    Fixes #627, related to #500.

commit 2167f8f
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jul 20 16:00:35 2016 +0300

    Add tests for per-vhost connection limits

commit 2a032a3
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jul 20 01:53:07 2016 +0300

    Rename a few tests

commit 86ce592
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jul 20 01:44:10 2016 +0300

    Tests for connection re-registration idempotency

commit a774c7b
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Tue Jul 19 04:05:20 2016 +0300

    Ask nodes that come back to re-register their connections

    Depending on the partition handling mode used there may or may not
    be any clients still connected. We make sure that registration
    and deregistration functions are idempotent and assume there
    may be connections on the node that has come back.

    Point of improvement: when a node comes back up, N-1 nodes
    will tell it to re-register connections. It could be fewer
    than N-1, ideally just 1.

commit 24e4c0e
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Mon Jul 18 17:05:17 2016 +0300

    Fix boot step

commit 62da3c6
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Mon Jul 18 11:16:21 2016 +0300

    Compile

commit b656f9e
Merge: f2831e1 492406e
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Thu Jul 14 15:25:49 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit f2831e1
Merge: e5858e9 7b10a4e
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Thu Jul 7 13:45:31 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit e5858e9
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jul 6 12:32:56 2016 +0300

    Towards working connection re-registration after (inter-node) network splits

commit 548df73
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jul 6 12:32:07 2016 +0300

    Make network split simulation work as expected

commit 4028c66
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Tue Jul 5 14:43:37 2016 +0300

    Close connections using rabbit_ct_client_helpers

    Per discussion with @dumbbell.

commit 26fecc9
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Tue Jul 5 04:17:52 2016 +0300

    Extract connection limit partition tests into a separate suite

commit 8a466f1
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Tue Jul 5 04:17:41 2016 +0300

    Better logging

commit b06de9b
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Mon Jul 4 02:54:54 2016 +0300

    Modify a test so that it (expectedly) fails

commit 078a78a
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Mon Jul 4 02:44:58 2016 +0300

    Towards covering node termination/unavailability in connection tracking

commit ab99361
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sun Jul 3 15:25:10 2016 +0300

    These are moved to rabbit_ct_broker_helpers

commit 520b6ef
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sun Jul 3 03:54:52 2016 +0300

    {allow,block}_traffic_between/2 are moved to rabbit_ct_broker_helpers

commit b842eaa
Merge: 26eb1fa d4f031e
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sun Jul 3 03:14:27 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit 26eb1fa
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sun Jul 3 02:39:09 2016 +0300

    dist_proxy helpers moved to rabbit_ct_broker_helpers

commit 3d741f4
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sun Jul 3 01:28:44 2016 +0300

    Cluster node shutdown test

commit 57c7129
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sat Jul 2 23:01:46 2016 +0300

    Refactor

commit b736b30
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sat Jul 2 22:49:42 2016 +0300

    More tests

commit dc1cb5f
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sat Jul 2 22:27:16 2016 +0300

    More tests

commit e94edfe
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sat Jul 2 17:08:34 2016 +0300

    Initial per-vhost connection limit tests

commit 15b7b4e
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sat Jul 2 15:04:57 2016 +0300

    Adapt to master, compile

commit dc7f333
Merge: e4884ff bb1fa55
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Sat Jul 2 02:44:18 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit e4884ff
Merge: 71e2710 f0f43f8
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Wed Jun 29 14:27:40 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit 71e2710
Merge: b1ec9f3 704a2b5
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Thu Mar 31 01:55:29 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

    Conflicts:
    	src/rabbit_control_main.erl
    	src/rabbit_types.erl

commit b1ec9f3
Author: Michael Klishin <mklishin@pivotal.io>
Date:   Mon Feb 15 13:51:37 2016 +0300

    Stub out event handlers for #627 and #628

commit f3cfb57
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Sat Feb 13 01:33:50 2016 +0300

    Use a counter column to track number of connections per vhost

    Limit query time is now 50-70 microseconds for
    50M connections.

commit e9132f1
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Fri Feb 12 06:23:51 2016 +0300

    Ignore ./debug

commit 976e3ae
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Fri Feb 12 06:20:01 2016 +0300

    Switch to ets:select_count/2

commit ec23cf1
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Thu Feb 11 05:11:08 2016 +0300

    Enforce max connection limit

    Also introduce `rabbitmqctl clear_vhost_limits`
    and fix rabbitmqctl(1).

commit ba20887
Merge: 49a1886 8974581
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Thu Feb 11 02:16:24 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit 49a1886
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Wed Feb 10 16:45:00 2016 +0300

    Spelling

commit 723e6e4
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Wed Feb 10 16:31:34 2016 +0300

    Create secondary indices on rabbit_tracked_connection.vhost and username

commit b468c0f
Merge: 6940d05 0120438
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Wed Feb 10 12:23:38 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit 6940d05
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Mon Feb 8 01:30:03 2016 +0300

    Spam

commit 032c2a6
Merge: 46da39c 2374ae8
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Fri Feb 5 23:48:38 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit 46da39c
Merge: 655e351 05361e6
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Wed Feb 3 11:20:08 2016 +0300

    Merge branch 'master' into rabbitmq-server-500

commit 655e351
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Wed Feb 3 11:19:23 2016 +0300

    Store and delete tracked connections in a table

commit 4e849cf
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Tue Jan 19 17:56:14 2016 +0300

    Compile

commit 504adde
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Tue Jan 19 17:55:08 2016 +0300

    Switch to a handler for connection tracking (WIP)

commit 3e1d2b4
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Tue Jan 19 14:46:09 2016 +0300

    Migrations for virtual host limits and tracked connections

commit 7499020
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Fri Jan 8 19:40:36 2016 +0300

    Compile

commit f3a1101
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Fri Jan 8 19:14:12 2016 +0300

    Switch rabbitmqctl set_vhost_limits to use JSON payload values

    Just like policies do.

commit 7fc5f1a
Author: Michael Klishin <michael@clojurewerkz.org>
Date:   Wed Jan 6 19:07:50 2016 +0300

    Stub out set_vhost_limits in ctl
@michaelklishin michaelklishin added this to the n/a milestone Jul 22, 2016
@michaelklishin
Copy link
Member Author

Feel free to QA but I have one more test case to add => do not merge yet.

@michaelklishin
Copy link
Member Author

Also: this has a rabbitmqctl(1) entry but still needs a doc guide.

[] ->
mnesia:write(?TABLE, Conn, write),
mnesia:dirty_update_counter(
rabbit_tracked_connection_per_vhost, VHost, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to use ?PER_VHOST_COUNTER_TABLE here.

@michaelklishin
Copy link
Member Author

This approach suffers from the lost updates problem when connections from a failed node are removed concurrently. We have evaluated a couple of approaches and the most prominent so far seems to be:

  • Have a pair of connection tracking tables per node
  • Every node updates its own tables
  • Every node reads from all tables in order to come up with a total count

So, one writer, many readers. This should also make it possible to eliminate the connection re-registration mechanism in place today.

case rabbit_connection_tracking:list(VHost) of
[] -> {ok, State};
Cs ->
[rabbit_networking:close_connection(Pid, rabbit_misc:format("vhost '~s' is deleted", [VHost])) || #tracked_connection{pid = Pid} <- Cs],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a case statement is necessary here? if Cs = [], then nothing to evaluate in the list comprehension, and you can just return {ok, State}.

This way we have a single writer, multiple readers and no
lost counter update due to natural race conditions.
Data for tables from available nodes are then aggregated.

Two open questions:

 * Whether to use a parallel version of lists:map/2
 * When to clean up the tables
connected_at = proplists:get_value(connected_at, EventDetails),
pid = proplists:get_value(pid, EventDetails),
peer_host = proplists:get_value(peer_host, EventDetails),
peer_port = proplists:get_value(peer_port, EventDetails)}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might not lie on the critical path, but use of proplists:get_value/2 seems quite extensive. how about using rabbit_misc:pget/2 instead, considering that it was optimised in rabbitmq/rabbitmq-common#51

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thank you.

@michaelklishin michaelklishin changed the title Keep track of connections, introduce per-vhost limits DO NOT MERGE Keep track of connections, introduce per-vhost limits Aug 12, 2016
@michaelklishin
Copy link
Member Author

michaelklishin commented Aug 12, 2016

This affected connection tracking in the stats DB, now fixed: rabbitmq/rabbitmq-management#274.

@michaelklishin michaelklishin changed the title DO NOT MERGE Keep track of connections, introduce per-vhost limits Keep track of connections, introduce per-vhost limits Aug 12, 2016
@michaelklishin
Copy link
Member Author

@dumbbell this is ready for another round.

catch _ -> 0
end
end, rabbit_mnesia:cluster_nodes(running)),
lists:foldl(fun(X, Acc) -> Acc + X end, 0, Ns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelklishin you're going through 2 lists just to calculate number of connections? imagine how much this will impact performance (per connection open request) as cluster size increases? can you use lists:foldl/3 only please, with connection count as your accumulator. there's no need for lists:map/2 here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but you are worrying about the wrong thing here. What really needs to happen is doing a parallel map but for small clusters it will likely be less efficient so I decided that it can be done at a later point.

It's not common to see clusters with more than maybe 7-9 nodes and it's unlikely to be a major contributor to how many connections/second can be accepted. Plus if you don't have any limits configured, none of this code will be executed.

Tab = tracked_connection_table_name_for(Node),
mnesia:dirty_match_object(Tab, #tracked_connection{vhost = VHost, _ = '_'})
end, rabbit_mnesia:cluster_nodes(running)),
lists:foldl(fun(Chunk, Acc) -> Acc ++ Chunk end, [], Chunks).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also make Chunks your accumulator in lists:foldl/3 for both your listing functions (unless you were planning something else)

michaelklishin and others added 13 commits August 18, 2016 09:32
Conflicts:
	src/rabbit_control_main.erl
When the node is offline (`--offline` is set on the command line),
we must skip the emission of the `node_deleted` event because the
rabbit_event event handler is not setup (RabbitMQ is stopped).

Otherwise, the command fails with `badarg` in gen_event.

References #500.
[#116521809]
The connection tracking tables are not replicated because the table
tracking connections on node A logically exists only the node A.

The backup made during the rename of a node failed because it wanted to
access a remote offline node to backup its connection tracking tables.
Obviously it didn't work. The solution is to not backup those tables.
This is correct because they are only relevant while the node is
running.

References #500.
[#116521809]
…lose

This helps in CI where the connection tracking tables may not yet
be up-to-date at the time the testcase verifies them.

References #500.
[#116521809]
While here, be more informative when a node rename fails.
This makes sure the limits are still good after a node was renamed and
thus, the tracking tables were recreated with a new name too.

While here, do some cleanup in the init/end functions in both the two
per-vhost limit testsuite.

References #500.
[#116521809]
@michaelklishin michaelklishin merged commit 6d9d506 into master Aug 26, 2016
@dumbbell dumbbell deleted the rabbitmq-server-500-squashed branch January 2, 2018 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Per-vhost connection limit
3 participants