Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use mnesia for IQ response tracking #3560

Merged
merged 4 commits into from
Mar 22, 2022

Conversation

arcusfelis
Copy link
Contributor

@arcusfelis arcusfelis commented Feb 23, 2022

The code I am touching here is a bit old and a bit hairy.
I honestly didn't read that one and it took sometime to understand why we use mnesia there to begin with.

So, TLDR:

  • the code I am touching allows to route an IQ from the SERVER to the CLIENT (the opposite way they usually go).
  • it is used in mod_ping.
  • though, the last thing I want is to replicate data, that is used by mod_ping in a big cluster (because of how often it is written).
  • in the real life, that iq_response would never do RPC calls (at least for mod_ping, because the process sending IQ and receiving IQ is a c2s process). We still need to test it though :(

Proposed changes include:

  • This change removes replication for IQ response tracking (useful in the big clusters, that use mod_ping)
  • We use IQ IDs to locate the metadata for the IQ instead

@codecov
Copy link

codecov bot commented Feb 23, 2022

Codecov Report

Merging #3560 (f476beb) into master (b4392ac) will increase coverage by 0.00%.
The diff coverage is 81.57%.

@@           Coverage Diff           @@
##           master    #3560   +/-   ##
=======================================
  Coverage   79.23%   79.24%           
=======================================
  Files         421      421           
  Lines       32283    32280    -3     
=======================================
- Hits        25581    25579    -2     
+ Misses       6702     6701    -1     
Impacted Files Coverage Δ
src/ejabberd_node_id.erl 91.30% <75.00%> (-3.44%) ⬇️
src/ejabberd_local.erl 77.68% <82.35%> (+0.34%) ⬆️
src/mongoose_tcp_listener.erl 76.59% <0.00%> (-4.26%) ⬇️
src/rdbms/mongoose_rdbms.erl 62.17% <0.00%> (-1.13%) ⬇️
src/offline/mod_offline.erl 76.08% <0.00%> (-1.09%) ⬇️
src/mod_roster.erl 78.52% <0.00%> (-0.24%) ⬇️
src/mod_muc_log.erl 78.11% <0.00%> (ø)
src/mod_muc_room.erl 77.09% <0.00%> (+0.05%) ⬆️
src/pubsub/mod_pubsub.erl 73.31% <0.00%> (+0.17%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4392ac...f476beb. Read the comment docs.

@mongoose-im

This comment was marked as outdated.

This change removes replication for IQ response tracking
(useful in the big clusters, that use mod_ping)
We use IQ IDs to locate the metadata for the IQ instead
@arcusfelis arcusfelis force-pushed the mu-local-iq-response-without-mnesia branch from 51a1be3 to 82809ec Compare February 23, 2022 20:31
@mongoose-im

This comment was marked as outdated.

@mongoose-im
Copy link
Collaborator

mongoose-im commented Mar 7, 2022

small_tests_24 / small_tests / 962079a
Reports root / small


small_tests_23 / small_tests / 962079a
Reports root / small


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 962079a
Reports root/ big
OK: 2723 / Failed: 0 / User-skipped: 239 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 962079a
Reports root/ big
OK: 2706 / Failed: 0 / User-skipped: 256 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 962079a
Reports root/ big
OK: 2723 / Failed: 0 / User-skipped: 239 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 962079a
Reports root/ big
OK: 2723 / Failed: 0 / User-skipped: 239 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / 962079a
Reports root/ big
OK: 1512 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / 962079a
Reports root/ big
OK: 1512 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / 962079a
Reports root/ big
OK: 1553 / Failed: 0 / User-skipped: 349 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 962079a
Reports root/ big
OK: 1686 / Failed: 1 / User-skipped: 369 / Auto-skipped: 0

bosh_SUITE:essential:accept_higher_hold_value
{error,
  {{assertEqual,
     [{module,bosh_SUITE},
      {line,251},
      {expression,"get_bosh_sessions ( )"},
      {expected,[]},
      {value,
        [{bosh_session,<<"fff9ba1bbbb81ce45068c730f278fa2e334cc4f6">>,
           <8643.6226.0>}]}]},
   [{bosh_SUITE,accept_higher_hold_value,1,
      [{file,"/home/circleci/project/big_tests/tests/bosh_SUITE.erl"},
       {line,251}]},
    {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
    {test_server,run_test_case_eval1,6,
      [{file,"test_server.erl"},{line,1292}]},
    {test_server,run_test_case_eval,9,
      [{file,"test_server.erl"},{line,1224}]}]}}

Report log


mysql_redis_24 / mysql_redis / 962079a
Reports root/ big
OK: 3092 / Failed: 0 / User-skipped: 253 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 962079a
Reports root/ big
OK: 3097 / Failed: 0 / User-skipped: 248 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 962079a
Reports root/ big
OK: 3097 / Failed: 0 / User-skipped: 248 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / 962079a
Reports root/ big
OK: 3097 / Failed: 0 / User-skipped: 248 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 962079a
Reports root/ big
OK: 1692 / Failed: 0 / User-skipped: 364 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 962079a
Reports root/ big
OK: 1687 / Failed: 0 / User-skipped: 369 / Auto-skipped: 0

Comment on lines 141 to 143
{ok, NodeId} = ejabberd_node_id:node_id(),
BinNodeId = integer_to_binary(NodeId),
case binary:split(ID, <<"_">>, []) of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this some kind of parse_iq_id next to make_iq_id, so that knowledge about the structured of this ID is encapsulated in two related and close-to-each-other functions, makes it a bit less magic.

But this still feels like a hack, so I'd say hacks are welcome but only if we really need one, still thinking about this one 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do parse_iq_id for sure.

yea, it is also used in pusher push to route a pubsub requests to a remote servers. D:
Which means that response could arrive from any s2s connection (meh).

So, the issue here is too generic API, which is now just writes data to other nodes which is not used :)

mod_caps/mod_ping - just local routes.
pusher_push - remote calls (s2s is possible).

BTW, ejabberd_local means jids like servername (without user@ part).

That pusher_push code, will try to read it one more time:

    ResponseHandler =
    fun(_From, _To, FAcc, Response) ->
            mod_event_pusher_push:cast(Host, fun handle_publish_response/5,
                                       [Host, To, PubsubJID, Node, Response]),
            FAcc
    end,
    %% The IQ is routed from the recipient's server JID to pubsub JID
    %% This is recommended in the XEP and also helps process replies to this IQ
    NotificationFrom = jid:make(<<>>, Host, <<>>),
    mod_event_pusher_push:cast(Host, fun ejabberd_local:route_iq/5,
                               [NotificationFrom, PubsubJID, Acc, Stanza, ResponseHandler]).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that's ejabberd_local, all right. So it would be good to have two mechanisms here, one for iqs that are entirely processed in the current c2s process, a different one for iqs processed far away 🤔 How's the re-read for the pusher_push?
Nevertheless, maybe that would be too complicated. At the very least I'd insist in that make_iq_iq/parse_iq_id encapsulation and that might do just fine as a first iteration on this topic.

@NelsonVides
Copy link
Collaborator

Yeah, I like the idea of one less mnesia here, I didn't even know ping was replicating all its handlers across all nodes, that seems like a waaaaaaste.

But I'd also say, if we don't have any module doing rpcs nor using this cluster logic, we can throw away that code to begin with. It feels a bit like a hack to have those prefixes in the ID and to use rpcs, and also, I imagine ejabberd_local to do local stuff, non-clustered logic, I'd make ejabberd_sm a bit fatter for that instead. When we need distributed iqs, we'll implement it, until then, less hacky code to maintain.

@mongoose-im
Copy link
Collaborator

mongoose-im commented Mar 22, 2022

riak_mnesia_24 / riak_mnesia / f476beb
Reports root


small_tests_24 / small_tests / f476beb
Reports root / small


small_tests_23 / small_tests / f476beb
Reports root / small


dynamic_domains_mysql_redis_24 / mysql_redis / f476beb
Reports root/ big
OK: 2706 / Failed: 0 / User-skipped: 256 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / f476beb
Reports root/ big
OK: 2723 / Failed: 0 / User-skipped: 239 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / f476beb
Reports root/ big
OK: 2723 / Failed: 0 / User-skipped: 239 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / f476beb
Reports root/ big
OK: 1512 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / f476beb
Reports root/ big
OK: 1553 / Failed: 0 / User-skipped: 349 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / f476beb
Reports root/ big
OK: 1493 / Failed: 5 / User-skipped: 348 / Auto-skipped: 65

connect_SUITE:just_tls:feature_order:tls_authenticate
{error,{{badrpc,timeout},
    [{escalus_rpc,call_with_cookie_match,
            [mongooseim@localhost,ejabberd_admin,register,
             [<<"secure_joe_tls_authenticate_540">>,<<"localhost">>,
            <<"break_me">>],
             3000,ejabberd],
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_rpc.erl"},
             {line,34}]},
     {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
     {escalus_ejabberd,create_users,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_ejabberd.erl"},
              {line,211}]},
     {escalus_fresh,create_users,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,62}]},
     {escalus_fresh,create_fresh_user,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,112}]},
     {connect_SUITE,tls_authenticate,1,
            [{file,"/home/circleci/project/big_tests/tests/connect_SUITE.erl"},
             {line,478}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]}]}}

Report log

connect_SUITE:just_tls:feature_order:auth_bind_compression_session
{error,{{badrpc,timeout},
    [{escalus_rpc,call_with_cookie_match,
            [mongooseim@localhost,ejabberd_admin,register,
             [<<"secure_joe_auth_bind_compression_session_543">>,
            <<"localhost">>,<<"break_me">>],
             3000,ejabberd],
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_rpc.erl"},
             {line,34}]},
     {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
     {escalus_ejabberd,create_users,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_ejabberd.erl"},
              {line,211}]},
     {escalus_fresh,create_users,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,62}]},
     {escalus_fresh,create_fresh_user,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,112}]},
     {connect_SUITE,auth_bind_compression_session,1,
            [{file,"/home/circleci/project/big_tests/tests/connect_SUITE.erl"},
             {line,516}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]}]}}

Report log

connect_SUITE:just_tls:feature_order:tls_authenticate_compression
{error,{{badrpc,timeout},
    [{escalus_rpc,call_with_cookie_match,
            [mongooseim@localhost,ejabberd_admin,register,
             [<<"secure_joe_tls_authenticate_compression_545">>,
            <<"localhost">>,<<"break_me">>],
             3000,ejabberd],
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_rpc.erl"},
             {line,34}]},
     {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
     {escalus_ejabberd,create_users,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_ejabberd.erl"},
              {line,211}]},
     {escalus_fresh,create_users,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,62}]},
     {escalus_fresh,create_fresh_user,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,112}]},
     {connect_SUITE,tls_authenticate_compression,1,
            [{file,"/home/circleci/project/big_tests/tests/connect_SUITE.erl"},
             {line,468}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]}]}}

Report log

connect_SUITE:just_tls:feature_order:auth_compression_bind_session
{error,{{badrpc,timeout},
    [{escalus_rpc,call_with_cookie_match,
            [mongooseim@localhost,ejabberd_admin,register,
             [<<"secure_joe_auth_compression_bind_session_544">>,
            <<"localhost">>,<<"break_me">>],
             3000,ejabberd],
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_rpc.erl"},
             {line,34}]},
     {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
     {escalus_ejabberd,create_users,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_ejabberd.erl"},
              {line,211}]},
     {escalus_fresh,create_users,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,62}]},
     {escalus_fresh,create_fresh_user,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,112}]},
     {connect_SUITE,auth_compression_bind_session,1,
            [{file,"/home/circleci/project/big_tests/tests/connect_SUITE.erl"},
             {line,506}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]}]}}

Report log

connect_SUITE:just_tls:feature_order:bind_server_generated_resource
{error,{{badrpc,timeout},
    [{escalus_rpc,call_with_cookie_match,
            [mongooseim@localhost,ejabberd_admin,register,
             [<<"secure_joe_bind_server_generated_resource_547">>,
            <<"localhost">>,<<"break_me">>],
             3000,ejabberd],
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_rpc.erl"},
             {line,34}]},
     {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
     {escalus_ejabberd,create_users,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_ejabberd.erl"},
              {line,211}]},
     {escalus_fresh,create_users,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,62}]},
     {escalus_fresh,create_fresh_user,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_fresh.erl"},
             {line,112}]},
     {connect_SUITE,bind_server_generated_resource,1,
            [{file,"/home/circleci/project/big_tests/tests/connect_SUITE.erl"},
             {line,583}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]}]}}

Report log

gdpr_SUITE:init_per_suite
{'EXIT',{{badrpc,timeout},
     [{distributed_helper,rpc,
                [#{node => mongooseim@localhost},
                 mongoose_modules,ensure_started,
                 [<<"localhost">>,mod_muc,
                [{host,{prefix,<<"muc.">>}},
                 {backend,mnesia},
                 {hibernate_timeout,2000},
                 {hibernated_room_check_interval,1000},
                 {hibernated_room_timeout,2000},
                 {access,muc},
                 {access_create,muc_create}]]],
                [{file,"/home/circleci/project/big_tests/tests/distributed_helper.erl"},
                 {line,117}]},
      {muc_helper,load_muc,0,
            [{file,"/home/circleci/project/big_tests/tests/muc_helper.erl"},
             {line,53}]},
      {gdpr_SUITE,init_per_suite,1,
            [{file,"/home/circleci/project/big_tests/tests/gdpr_SUITE.erl"},
             {line,189}]},
      {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
      {test_server,run_test_case_eval1,6,
             [{file,"test_server.erl"},{line,1380}]},
      {test_server,run_test_case_eval,9,
             [{file,"test_server.erl"},{line,1224}]}]}}

Report log


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / f476beb
Reports root/ big
OK: 1687 / Failed: 0 / User-skipped: 369 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / f476beb
Reports root/ big
OK: 3097 / Failed: 0 / User-skipped: 248 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / f476beb
Reports root/ big
OK: 3097 / Failed: 0 / User-skipped: 248 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / f476beb
Reports root/ big
OK: 3097 / Failed: 0 / User-skipped: 248 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / f476beb
Reports root/ big
OK: 3092 / Failed: 0 / User-skipped: 253 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / f476beb
Reports root/ big
OK: 1512 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / f476beb
Reports root/ big
OK: 1692 / Failed: 0 / User-skipped: 364 / Auto-skipped: 0

Copy link
Collaborator

@NelsonVides NelsonVides left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One less mnesia! 💪🏽

{ok, Module, Function};
case ets:lookup(?IQRESPONSE, ID) of
[{ID, Callback, TRef}] ->
erlang:cancel_timer(TRef),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think, the ping IQs timeouts are handled twice here... c2s also registers its own. Well This PR solves not using mnesia, which is a huge improvement, and everything else actually doesn't change, so it seems fine. But hell, there's an important tech debt waiting here. Let's keep the PRs small and work on that on the next one.

@NelsonVides NelsonVides merged commit bda07f6 into master Mar 22, 2022
@NelsonVides NelsonVides deleted the mu-local-iq-response-without-mnesia branch March 22, 2022 15:05
@Premwoik Premwoik added this to the 5.1.0 milestone May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants