Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatible IPA versions, pausing replication #2353

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
6 participants
@tbordaz
Copy link
Contributor

commented Sep 11, 2018

Bug Description:
The bug is not systematic. Symptoms are replication broken and this message
ERR - repl_version_plugin_recv_acquire_cb - [file ipa_repl_version.c, line 119]: Incompatible IPA versions, pausing replication
IPA Version Replication is a plugin that checks replication version (payload).
It does the checking in the early phase of replication protocol (acquire_replica).
The plugin must be enabled on both side, or disabled on both side.
If we have a mixed setting (i.e. disabled on Master and enabled on Replica), there is
a risk that the Master sends an incomplete payload. The incomplete payload will be rejected
by the Replica and replication in both direction will fail.
The risk is that on the master, the plugin got initialized but not started.
Initialization of the plugin registers a broker API that allows a plugin (replication) to call
a callback even if the plugin is not started.
The problem is that to build valid payload the 'IPA Version Replication' needs to be started
(data_version to be initialized).

Fix Description:
The fix consist in:
- define data_version during plugin initialization
- Make plugin callback as noop if the plugin is not started. This is just for perf.
- Prevent to enable the 'IPA Version Replication' when a replica is created.

https://pagure.io/freeipa/issue/7690

@tbordaz tbordaz added the WIP label Sep 11, 2018

@flo-renaud flo-renaud added the re-run label Sep 11, 2018

@flo-renaud

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2018

Added the re-run label to trigger PR-CI as tbordaz is not in the whitelist yet.

@rcritten

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2018

The test_vault case is failing to install a replica with: [Error (49) - LDAP error: Invalid credentials]

@@ -321,9 +320,6 @@ def run(self):

logger.info("Preparing replica for %s from %s",
self.replica_fqdn, api.env.host)
enable_replication_version_checking(
api.env.realm,
self.dirman_password)

This comment has been minimized.

Copy link
@rcritten

rcritten Sep 11, 2018

Contributor

The whole point of this is to provide a relief valve in the case where we make some breaking architectural change in IPA so that one version can't replicate with another. If the check is not done at install time this is pointless, right?

This comment has been minimized.

Copy link
@tbordaz

tbordaz Sep 11, 2018

Author Contributor

Currently the mechanism checks at each replication session, that versions are compatible. So it is not only at install. In addition I am not sure this mechanism would prevent install between incompatible versions.
If we want to check compatible version we can also add a version entry/attribute in the DIT instead of breaking replication

This comment has been minimized.

Copy link
@rcritten

rcritten Sep 11, 2018

Contributor

I don't mean to imply it is only at install but if it isn't checked at install time then the installation could be successful only to fail at the first replication event right?

We don't have to stick with this replication version but it was, as I understood it, the best or only way to do this years ago.

This comment has been minimized.

Copy link
@tbordaz

tbordaz Sep 12, 2018

Author Contributor

according to https://bugzilla.redhat.com/show_bug.cgi?id=1623761#c12, the problem is dynamic and transient. So it is quite possible that even if the problem occurs during install, replication retries and install script does not detect the error. During runtime, retry occurs and replication looks just fine.

Repl_version is a nice piece of code and I think the intention was to prevent incompatible versions to pollute each other. So it is enforcing compatibility in a stronger way than uniq/regular version checking from DIT entry. Now being a plugin there is always the risk that it is disabled on one/both side and version not checked.

If we want to keep this ability to prevent a new version to break an old one (or the opposite), we should keep this plugin enabled.

This comment has been minimized.

Copy link
@tbordaz

tbordaz Sep 14, 2018

Author Contributor

@rcritten , I have a mixed feeling regarding the repl_version checking.

This plugin was created to prevent incompatible versions to break each other. So it is addressing rare condition that never occurred so far and now we are willing to disable it by default.

Now if we want to keep such safety mechanism . It should be enabled on most or all instances. incompatibility checking requires that the both sides have it enabled.

In short, currently repl_version is disabled on Master and enabled on Replica. This patch is disabling it on Replica. I wonder if in the opposite it should not enable it as well on master.

@tbordaz tbordaz force-pushed the tbordaz:ticket-7690 branch 2 times, most recently from eea30fc to 40edc91 Sep 11, 2018

@tbordaz tbordaz removed the WIP label Sep 11, 2018

Incompatible IPA versions, pausing replication
Bug Description:
    The bug is not systematic. Symptoms are replication broken and this message
    ERR - repl_version_plugin_recv_acquire_cb - [file ipa_repl_version.c, line 119]: Incompatible IPA versions, pausing replication
    IPA Version Replication is a plugin that checks replication version (payload).
    It does the checking in the early phase of replication protocol (acquire_replica).
    The plugin must be enabled on both side, or disabled on both side.
    If we have a mixed setting (i.e. disabled on Master and enabled on Replica), there is
    a risk that the Master sends an incomplete payload. The incomplete payload will be rejected
    by the Replica and replication in both direction will fail.
    The risk is that on the master, the plugin got initialized but not started.
    Initialization of the plugin registers a broker API that allows a plugin (replication) to call
    a callback even if the plugin is not started.
    The problem is that to build valid payload the 'IPA Version Replication' needs to be started
    (data_version to be initialized).

Fix Description:
    The fix consist in:
        - define data_version during plugin initialization
        - Make plugin callback as noop if the plugin is not started. This is just for perf.
        - Prevent to enable the 'IPA Version Replication' when a replica is created.

https://pagure.io/freeipa/issue/7690

@tbordaz tbordaz force-pushed the tbordaz:ticket-7690 branch from 40edc91 to 18a9b8f Sep 14, 2018

@flo-renaud flo-renaud added re-run and removed needs rebase labels Sep 14, 2018

@freeipa-pr-ci freeipa-pr-ci removed the re-run label Sep 14, 2018

@flo-renaud flo-renaud added the re-run label Sep 17, 2018

@flo-renaud

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2018

Adding re-run as fedora-28/dnssec — RunPytest Failed to publish artifacts

@tiran

This comment has been minimized.

Copy link
Member

commented Nov 8, 2018

I removed needsreview label. The change needs a deeper technical discussion between @rcritten and @tbordaz first.

@fcami fcami added the re-run label May 7, 2019

@freeipa-pr-ci freeipa-pr-ci removed the re-run label May 7, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.