Add debugging #1856

Closed

kanes115
Contributor

DO NOT MERGE
This PR was created only so that the branch can be run in a container.

@kanes115 kanes115 changed the base branch from master to improve_reload_cluster_v2 May 18, 2018 07:50
@codecov

codecov bot commented May 18, 2018

Codecov Report

Merging #1856 into improve_reload_cluster_v2 will decrease coverage by 4.27%.
The diff coverage is 5.12%.

@@                      Coverage Diff                      @@
##           improve_reload_cluster_v2    #1856      +/-   ##
=============================================================
- Coverage                      74.38%   70.11%   -4.28%     
=============================================================
  Files                            290      298       +8     
  Lines                          26949    28000    +1051     
=============================================================
- Hits                           20045    19631     -414     
- Misses                          6904     8369    +1465

Impacted Files                        Coverage Δ
src/ejabberd_config.erl               59.4% <5.12%> (-5.24%) ⬇️
src/mam_message_xml.erl               0% <0%> (-100%) ⬇️
src/mongoose_riak_sup.erl             0% <0%> (-100%) ⬇️
src/mod_offline_riak.erl              0% <0%> (-96.43%) ⬇️
src/mod_last_riak.erl                 0% <0%> (-95.24%) ⬇️
src/mod_roster_riak.erl               0% <0%> (-94.12%) ⬇️
src/mod_vcard_riak.erl                0% <0%> (-91.53%) ⬇️
src/mod_mam_riak_timed_arch_yz.erl    0% <0%> (-87.97%) ⬇️
src/mongoose_riak.erl                 10.63% <0%> (-85.11%) ⬇️
src/mod_mam_cassandra_arch.erl        0% <0%> (-84.68%) ⬇️
... and 97 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 95e017e...084221c.

@kanes115 kanes115 force-pushed the improve_reload_cluster_debug branch from 235af1b to f75f558 Compare May 18, 2018 08:32
@kanes115
Contributor Author

kanes115 commented May 21, 2018

I managed to reproduce the problem. An error occurs when reloading right after a config reload in which one option's value was changed. The option was somewhere in global_distrib, and it was one of those that must not differ between nodes.

EDIT: OK, it does not always happen.

EDIT: I can't reproduce it anymore; it happened only once.

@kanes115
Contributor Author

kanes115 commented May 21, 2018

The reproduction steps seem to be as follows.
Assumptions:
node 1 - local node
node 2 - remote node

  1. Change the config on node 2 (choose an option that is kept in memory after being parsed from the config file; in my case: {connections_per_endpoint, 10}).
  2. Run reload_local on node 2 (this makes the in-memory configs differ).
  3. Change the option from step 1 back to its previous value.
  4. Run reload_cluster from node 1 (this step is probably optional).
  5. Run reload_local on node 2 (this makes the in-memory configs the same again).
  6. Run reload_cluster from node 1. It should now reload successfully, but it doesn't.
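
The outcome one would expect from these steps can be illustrated with a toy model. This is only a sketch: the Node class, the configs_in_sync helper and the values 10 and 20 are hypothetical, and the idea that a cluster-wide reload requires identical in-memory configs on every node is an assumption, not something taken from the MongooseIM code. Step 4 is left out because it is probably optional.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one node: config on disk vs. config held in memory."""
    file_config: dict
    memory_config: dict = field(default_factory=dict)

    def reload_local(self) -> None:
        # Only this node re-reads its own file; other nodes are not touched.
        self.memory_config = dict(self.file_config)


def configs_in_sync(nodes) -> bool:
    """Assumed precondition for a cluster reload: all in-memory configs are equal."""
    first = nodes[0].memory_config
    return all(node.memory_config == first for node in nodes)


ORIGINAL, CHANGED = 10, 20  # arbitrary illustrative values
base = {"connections_per_endpoint": ORIGINAL}
node1 = Node(dict(base), dict(base))
node2 = Node(dict(base), dict(base))

node2.file_config["connections_per_endpoint"] = CHANGED   # step 1
node2.reload_local()                                       # step 2
diverged = not configs_in_sync([node1, node2])

node2.file_config["connections_per_endpoint"] = ORIGINAL  # step 3
node2.reload_local()                                       # step 5
back_in_sync = configs_in_sync([node1, node2])

print(diverged, back_in_sync)
```

In this model the nodes diverge after step 2 and agree again after step 5, which is why step 6 would be expected to succeed. The model cannot show the bug itself, only the expectation that the real system violated.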

Relevant part of the output of diff local_host_config remote_host_config-mongooseim\@c1-mim-2.vidyo (taken from the improve_reload_cluster_debug_sorted branch, where the config is sorted before it is saved):

>    [mod_global_distrib_bounce,
>     [[3,max_retries],
>      [300,resend_after_ms],
>      [bounce,[[3,max_retries],[300,resend_after_ms]]],
>      [cache,[[60,domain_lifetime_seconds]]],
>      [connections,
>       [[10,connections_per_endpoint],
>        [advertised_endpoints,[[5555,"--.12cdiimmovy"]]],
>        [endpoints,[[5555,"...0000"]]],
>        [tls_opts,
>         [[cafile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [certfile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [ciphers,
>           "----------------------------111122222222222233334444555555556666666688888888:::::::AAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEGGGGHHHHHHHHHHHHHHHHMMMMRRRRSSSSSSSSSSSSSSSSSSSSSSSS"],
>          [dhfile,".//___addeeeefhiiklmopprrrsssvvvy"],
>          [protocol_options,
>           ["11123_______cceeeeeeefhillllnnnnnoooopprrrrrsssssssttvvvvv||||"]]]]]],
>      [global_host,".diimmovy"],
>      [local_host,"-.1cdiimmovy"],
>      [redis,[[24,pool_size],[server,"...01111227"]]]]],
>    [mod_global_distrib_disco,
>     [[bounce,[[3,max_retries],[300,resend_after_ms]]],
>      [cache,[[60,domain_lifetime_seconds]]],
>      [connections,
>       [[10,connections_per_endpoint],
>        [advertised_endpoints,[[5555,"--.12cdiimmovy"]]],
>        [endpoints,[[5555,"...0000"]]],
>        [tls_opts,
>         [[cafile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [certfile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [ciphers,
>           "----------------------------111122222222222233334444555555556666666688888888:::::::AAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEGGGGHHHHHHHHHHHHHHHHMMMMRRRRSSSSSSSSSSSSSSSSSSSSSSSS"],
>          [dhfile,".//___addeeeefhiiklmopprrrsssvvvy"],
>          [protocol_options,
>           ["11123_______cceeeeeeefhillllnnnnnoooopprrrrrsssssssttvvvvv||||"]]]]]],
>      [global_host,".diimmovy"],
>      [local_host,"-.1cdiimmovy"],
>      [redis,[[24,pool_size],[server,"...01111227"]]]]],
>    [mod_global_distrib_mapping,
>     [[60,domain_lifetime_seconds],
>    [mod_global_distrib_receiver,
>     [[10,connections_per_endpoint],
>      [advertised_endpoints,[[5555,"--.12cdiimmovy"]]],
>      [bounce,[[3,max_retries],[300,resend_after_ms]]],
>      [cache,[[60,domain_lifetime_seconds]]],
>      [connections,
>       [[10,connections_per_endpoint],
>        [advertised_endpoints,[[5555,"--.12cdiimmovy"]]],
>        [endpoints,[[5555,"...0000"]]],
>        [tls_opts,
>         [[cafile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [certfile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [ciphers,
>           "----------------------------111122222222222233334444555555556666666688888888:::::::AAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEGGGGHHHHHHHHHHHHHHHHMMMMRRRRSSSSSSSSSSSSSSSSSSSSSSSS"],
>          [dhfile,".//___addeeeefhiiklmopprrrsssvvvy"],
>          [protocol_options,
>           ["11123_______cceeeeeeefhillllnnnnnoooopprrrrrsssssssttvvvvv||||"]]]]]],
>      [endpoints,[[5555,"...0000"]]],
>      [global_host,".diimmovy"],
>      [local_host,"-.1cdiimmovy"],
>      [redis,[[24,pool_size],[server,"...01111227"]]],
>      [tls_opts,
>       [[cafile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>        [certfile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>        [ciphers,
>         "----------------------------111122222222222233334444555555556666666688888888:::::::AAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEGGGGHHHHHHHHHHHHHHHHMMMMRRRRSSSSSSSSSSSSSSSSSSSSSSSS"],
>        [dhfile,".//___addeeeefhiiklmopprrrsssvvvy"],
>        [protocol_options,
>         ["11123_______cceeeeeeefhillllnnnnnoooopprrrrrsssssssttvvvvv||||"]]]]]],
>    [mod_global_distrib_sender,
>     [[10,connections_per_endpoint],
>      [advertised_endpoints,[[5555,"--.12cdiimmovy"]]],
>      [bounce,[[3,max_retries],[300,resend_after_ms]]],
>      [cache,[[60,domain_lifetime_seconds]]],
>      [connections,
>       [[10,connections_per_endpoint],
>        [advertised_endpoints,[[5555,"--.12cdiimmovy"]]],
>        [endpoints,[[5555,"...0000"]]],
>        [tls_opts,
>         [[cafile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [certfile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>          [ciphers,
>           "----------------------------111122222222222233334444555555556666666688888888:::::::AAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEGGGGHHHHHHHHHHHHHHHHMMMMRRRRSSSSSSSSSSSSSSSSSSSSSSSS"],
>          [dhfile,".//___addeeeefhiiklmopprrrsssvvvy"],
>          [protocol_options,
>           ["11123_______cceeeeeeefhillllnnnnnoooopprrrrrsssssssttvvvvv||||"]]]]]],
>      [endpoints,[[5555,"...0000"]]],
>      [global_host,".diimmovy"],
>      [local_host,"-.1cdiimmovy"],
>      [redis,[[24,pool_size],[server,"...01111227"]]],
>      [tls_opts,
>       [[cafile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>        [certfile,".//___aacdeeffhiiiklllmnopprssuvvy"],
>        [ciphers,
>         "----------------------------111122222222222233334444555555556666666688888888:::::::AAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEGGGGHHHHHHHHHHHHHHHHMMMMRRRRSSSSSSSSSSSSSSSSSSSSSSSS"],
>        [dhfile,".//___addeeeefhiiklmopprrrsssvvvy"],
>        [protocol_options,
>         ["11123_______cceeeeeeefhillllnnnnnoooopprrrrrsssssssttvvvvv||||"]]]]]],

However, this reproduction involves reload_local, which is said to be dangerous because it can desynchronise nodes.
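
A note on reading the diff output above: the string values look scrambled, which is consistent with the sort on the debug branch recursing into strings (character lists) and ordering their characters. Pairs such as [3,max_retries] look reversed, presumably because tuples were turned into lists and sorted as well. The short check below is a sketch under that assumption. The candidate originals are guesses: only the node name c1-mim-2.vidyo appears in the diff command, and a match shows only that the characters agree, not that the guess is the real value.

```python
def scramble(text: str) -> str:
    """Sort the characters of a string by code point, as a deep sort over character lists would."""
    return "".join(sorted(text))


# Values copied from the diff output.
observed = {
    "endpoints": "...0000",
    "global_host": ".diimmovy",
    "local_host": "-.1cdiimmovy",
    "advertised_endpoints": "--.12cdiimmovy",
}

# Guessed originals (assumptions, see the note above).
candidates = {
    "endpoints": "0.0.0.0",
    "global_host": "mim.vidyo",
    "local_host": "c1-mim.vidyo",
    "advertised_endpoints": "c1-mim-2.vidyo",
}

# True for a key when the guessed original is an anagram of the observed value.
matches = {key: scramble(candidates[key]) == observed[key] for key in observed}
print(matches)
```

If the assumption holds, the string values in this output can only be compared for equality between nodes; they cannot be read back as hostnames, paths or cipher lists.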

@michalwski
Contributor

@kanes115 @fenek can we close this PR? I understand the issue was solved by #1948. Is that right?

@michalwski michalwski closed this Jul 6, 2018
@michalwski michalwski deleted the improve_reload_cluster_debug branch July 6, 2018 09:54