Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inbox/async removes #3616

Merged
merged 14 commits into from
Apr 1, 2022
Merged

Inbox/async removes #3616

merged 14 commits into from
Apr 1, 2022

Conversation

NelsonVides
Copy link
Collaborator

@NelsonVides NelsonVides commented Mar 29, 2022

Building on top of #3608 and #3611

This PR fixes the fact that the asynchronous inbox is currently dangerously inconsistent. If on two nodes one writes new messages to a user's conversation with a room, and in another node that user is removed from the room, the order in which those events are flushed into the DB is now not ensured, and the risk of the removal being flushed before the insert, and therefore the insert recovering a conversation forever, is much higher than in the synchronous inbox.

With the concept of boxes, the async backend can instead have a deleted box, called bin, so that removals aren't really removals, but just moves to another box, which are atomic and can interleave with modifications to that entry's other columns. Then asynchronously the deleted boxes can be collected, much like email clears the Bin after so many days.

Later on #3619 builds on top of this one.

It also implements three different mechanisms to clean the inbox bin after the asynchronous backend: an XMPP endpoint for the user, two REST endpoints for per-user and global cleanups, and an automatic collector working with timeouts.

@codecov
Copy link

codecov bot commented Mar 29, 2022

Codecov Report

Merging #3616 (5c359e0) into master (7cbb30d) will decrease coverage by 0.03%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master    #3616      +/-   ##
==========================================
- Coverage   80.96%   80.92%   -0.04%     
==========================================
  Files         424      426       +2     
  Lines       32061    32126      +65     
==========================================
+ Hits        25958    25998      +40     
- Misses       6103     6128      +25     
Impacted Files Coverage Δ
src/mod_commands.erl 93.08% <ø> (ø)
src/mongoose_api_admin.erl 83.05% <ø> (ø)
src/mongoose_collector.erl 53.84% <53.84%> (ø)
src/inbox/mod_inbox_rdbms_async.erl 66.66% <68.96%> (-2.83%) ⬇️
src/inbox/mod_inbox.erl 86.80% <88.00%> (-0.37%) ⬇️
src/inbox/mod_inbox_backend.erl 100.00% <100.00%> (ø)
src/inbox/mod_inbox_commands.erl 100.00% <100.00%> (ø)
src/inbox/mod_inbox_entries.erl 97.64% <100.00%> (+0.02%) ⬆️
src/inbox/mod_inbox_muc.erl 91.17% <100.00%> (ø)
src/inbox/mod_inbox_muclight.erl 100.00% <100.00%> (ø)
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7cbb30d...5c359e0. Read the comment docs.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im
Copy link
Collaborator

mongoose-im commented Mar 29, 2022

small_tests_24 / small_tests / b7b8b54
Reports root / small


small_tests_23 / small_tests / b7b8b54
Reports root / small


dynamic_domains_mysql_redis_24 / mysql_redis / b7b8b54
Reports root/ big
OK: 2833 / Failed: 0 / User-skipped: 150 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / b7b8b54
Reports root/ big
OK: 2850 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / b7b8b54
Reports root/ big
OK: 2850 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / b7b8b54
Reports root/ big
OK: 2850 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / b7b8b54
Reports root/ big
OK: 1507 / Failed: 0 / User-skipped: 399 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / b7b8b54
Reports root/ big
OK: 1507 / Failed: 0 / User-skipped: 399 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / b7b8b54
Reports root/ big
OK: 1548 / Failed: 0 / User-skipped: 358 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / b7b8b54
Reports root/ big
OK: 1855 / Failed: 0 / User-skipped: 366 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / b7b8b54
Reports root/ big
OK: 3219 / Failed: 0 / User-skipped: 147 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / b7b8b54
Reports root/ big
OK: 3224 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / b7b8b54
Reports root/ big
OK: 3224 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / b7b8b54
Reports root/ big
OK: 3224 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / b7b8b54
Reports root/ big
OK: 1698 / Failed: 0 / User-skipped: 365 / Auto-skipped: 0

Base automatically changed from async_rdbms_order to master March 30, 2022 11:28
@NelsonVides NelsonVides marked this pull request as ready for review March 30, 2022 11:45
@NelsonVides NelsonVides mentioned this pull request Mar 31, 2022
Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good in general. I have a few general remarks:

  • There are no docs explaining what bin is and e.g. if messages from bin will be returned by default or not.
  • It is counter-intuitive for me that enabling the async writer implicitly changes the removal semantics in quite a significant way. The user should definitely know that from the docs. It is quite unexpected that e.g. if I switch off the async writer, suddenly my deleted messages stop landing in the bin.

big_tests/tests/inbox_SUITE.erl Outdated Show resolved Hide resolved
doc/modules/mod_inbox.md Outdated Show resolved Hide resolved
@NelsonVides
Copy link
Collaborator Author

The code looks good in general. I have a few general remarks:

* There are no docs explaining what `bin` is and e.g. if messages from bin will be returned by default or not.

* It is counter-intuitive for me that enabling the async writer implicitly changes the removal semantics in quite a significant way. The user should definitely know that from the docs. It is quite unexpected that e.g. if I switch off the async writer, suddenly my deleted messages stop landing in the bin.

Huh, I think those points are answered precisely in the PR made on top of this one, here: #3619 so perhaps I should push them both into a single PR 🤔

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed most of the code and it looks fine, but I have some doubts about the tight coupling between the bin and the async writer. IMO the user would benefit from using the bin independently of the async writer. I also think that the REST commands should be put in a separate module and decoupled from the specific backend.

src/inbox/mod_inbox_rdbms_async.erl Outdated Show resolved Hide resolved
src/inbox/mod_inbox_rdbms_async.erl Outdated Show resolved Hide resolved
doc/modules/mod_inbox.md Outdated Show resolved Hide resolved
Parallel to how the exml is parsed on backend code, we also encode the
exml packet in the backend code, for several reasons: consistency, being
able to modify the payload in custom code before is put on the db, and,
performance, as, if the entry is aggregated, the payload didn't need to
be encoded, and such operation can be skipped.
This fixes the fact that the asynchronous inbox is currently dangerously
inconsistent. If on two nodes one writes new messages to a user's
conversation with a room, and in another node that user is removed from
the room, the order in which those events are flushed into the DB is now
not ensured, and the risk of the removal being flushed before the
insert, and therefore the insert recovering a conversation forever, is
much higher than in the synchronous inbox.

With the concept of boxes, the async backend can instead have a deleted
box, called bin, so that removals aren't really removals, but just moves
to another box, which are atomic and can interleave with modifications
to that entry's other columns. Then asynchronously the deleted boxes can
be collected, much like email clears the Bin after so many days.
Note that the handle_result callback in the API stack had to be
modified. The HTTP standard allows to return a message body for a DELETE
call under the status code 200-OK, so we want to allow for such
possibility in the code. Two old delete operations needed to be updated
though, as they were inconsistently registering they'd return a message
that was then ignored.
Note specially the partial index for postgres, which allows us to keep
very small indexes for all rows that jump into the bin box. Smaller
index means faster performance and faster lookups for the bin box.
@mongoose-im

This comment was marked as outdated.

@mongoose-im
Copy link
Collaborator

mongoose-im commented Apr 1, 2022

small_tests_24 / small_tests / 72a01cc
Reports root / small


small_tests_23 / small_tests / 72a01cc
Reports root / small


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 72a01cc
Reports root/ big
OK: 2858 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 72a01cc
Reports root/ big
OK: 2858 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 72a01cc
Reports root/ big
OK: 2853 / Failed: 1 / User-skipped: 150 / Auto-skipped: 0

muc_SUITE:hibernation:hibernated_room_can_be_queried_for_archive
{error,{{assertion_failed,assert,is_groupchat_message,
              [<<"Restorable message">>],
              undefined,"undefined"},
    [{escalus_new_assert,assert_true,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
                {line,84}]},
     {muc_SUITE,wait_for_mam_result,3,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4383}]},
     {muc_SUITE,'-hibernated_room_can_be_queried_for_archive/1-fun-0-',3,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4124}]},
     {escalus_story,story,4,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,72}]},
     {muc_SUITE,hibernated_room_can_be_queried_for_archive,1,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4120}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1224}]}]}}

Report log


ldap_mnesia_23 / ldap_mnesia / 72a01cc
Reports root/ big
OK: 1506 / Failed: 0 / User-skipped: 401 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / 72a01cc
Reports root/ big
OK: 1506 / Failed: 0 / User-skipped: 401 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 72a01cc
Reports root/ big
OK: 2866 / Failed: 1 / User-skipped: 133 / Auto-skipped: 0

disco_and_caps_SUITE:disco_with_caps:user_can_query_friend_resources
{error,{{assertion_failed,assert_many,false,[is_roster_set],[],[]},
    [{escalus_new_assert,assert_true,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
                {line,84}]},
     {escalus_story,'-make_all_clients_friends/1-fun-0-',2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,108}]},
     {escalus_utils,'-each_with_index/3-fun-0-',3,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_utils.erl"},
             {line,87}]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1267}]},
     {escalus_utils,'-each_with_index/3-fun-0-',3,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_utils.erl"},
             {line,87}]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1267}]},
     {escalus_utils,distinct_pairs,2,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_utils.erl"},
             {line,60}]},
     {escalus_story,make_all_clients_friends,1,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,106}]}]}}

Report log


internal_mnesia_24 / internal_mnesia / 72a01cc
Reports root/ big
OK: 1547 / Failed: 0 / User-skipped: 360 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 72a01cc
Reports root/ big
OK: 3232 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 72a01cc
Reports root/ big
OK: 1854 / Failed: 0 / User-skipped: 368 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / 72a01cc
Reports root/ big
OK: 3227 / Failed: 0 / User-skipped: 147 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / 72a01cc
Reports root/ big
OK: 3232 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 72a01cc
Reports root/ big
OK: 1697 / Failed: 0 / User-skipped: 367 / Auto-skipped: 0

Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good 👍
Do you think we might enable using the bin with the synchronous writer in the future? Because I see it is unsupported now.

src/inbox/mod_inbox.erl Outdated Show resolved Hide resolved
@NelsonVides
Copy link
Collaborator Author

It looks good 👍 Do you think we might enable using the bin with the synchronous writer in the future? Because I see it is unsupported now.

I think it is enabled already, it is added in the process_inbox_boxes callback together with the archive and inbox names, regardless of the backend 🤔
There's just no automatic movement of entries to the bin, but rather proper deletes. I didn't want to introduce the moves in the regular backend yet, for easier migrations, if somebody's not using the asynchronous writer he shouldn't need to change anything to how his deployment is working (like adding logic to clean the bin on the frontend or using the API). But maybe later we can enforce the moves globally, dunno.

@mongoose-im
Copy link
Collaborator

mongoose-im commented Apr 1, 2022

small_tests_24 / small_tests / 5c359e0
Reports root / small


small_tests_23 / small_tests / 5c359e0
Reports root / small


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 5c359e0
Reports root/ big
OK: 2858 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 5c359e0
Reports root/ big
OK: 2858 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 5c359e0
Reports root/ big
OK: 2841 / Failed: 0 / User-skipped: 150 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / 5c359e0
Reports root/ big
OK: 1506 / Failed: 0 / User-skipped: 401 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / 5c359e0
Reports root/ big
OK: 1506 / Failed: 0 / User-skipped: 401 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / 5c359e0
Reports root/ big
OK: 1547 / Failed: 0 / User-skipped: 360 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 5c359e0
Reports root/ big
OK: 2858 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 5c359e0
Reports root/ big
OK: 3232 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 5c359e0
Reports root/ big
OK: 3232 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 5c359e0
Reports root/ big
OK: 1854 / Failed: 0 / User-skipped: 368 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / 5c359e0
Reports root/ big
OK: 3239 / Failed: 1 / User-skipped: 147 / Auto-skipped: 0

muc_SUITE:hibernation:hibernated_room_can_be_queried_for_archive
{error,{{assertion_failed,assert,is_groupchat_message,
              [<<"Restorable message">>],
              undefined,"undefined"},
    [{escalus_new_assert,assert_true,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
                {line,84}]},
     {muc_SUITE,wait_for_mam_result,3,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4383}]},
     {muc_SUITE,'-hibernated_room_can_be_queried_for_archive/1-fun-0-',3,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4124}]},
     {escalus_story,story,4,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,72}]},
     {muc_SUITE,hibernated_room_can_be_queried_for_archive,1,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4120}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1224}]}]}}

Report log


mssql_mnesia_24 / odbc_mssql_mnesia / 5c359e0
Reports root/ big
OK: 3232 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 5c359e0
Reports root/ big
OK: 1697 / Failed: 0 / User-skipped: 367 / Auto-skipped: 0

@chrzaszcz chrzaszcz merged commit a4606d5 into master Apr 1, 2022
@chrzaszcz chrzaszcz deleted the inbox/async_removes branch April 1, 2022 14:24
@Premwoik Premwoik added this to the 5.1.0 milestone May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants