Fix IndexAuditTrail rolling restart on rollover edge #35988

albertzaharovits · 2018-11-28T10:28:15Z

This fixes two independent bugs, both of which trip integ test failures.
Both are made possible by the rolling nature of the audit index. Moreover,
both manifest only during a rolling upgrade that is executed while the audit
index rolls over.

The bugs:

1. During a rolling upgrade, the master is responsible for installing the new index template. New non-master nodes (upgraded before the master) should hold off from creating new indices on rollover until the new template is installed. Because of a backwards condition, onOrAfter, they fail to do so and index the event data of the new nodes using the index template of the old nodes.
2. Also during a rolling upgrade, new non-master nodes can wait indefinitely for a mapping update. The mapping update for the audit index is also the responsibility of the master node. However, the master can fail to update the mapping for the preceding index (for which other new nodes might have events) and only update it for the following index. Precisely on the clock edge, a master with no pending events will only try to update the new index.

Closes #33867

elasticmachine · 2018-11-28T10:28:16Z

Pinging @elastic/es-security

albertzaharovits · 2018-11-28T10:31:16Z

...gin/security/src/main/java/org/elasticsearch/xpack/security/audit/index/IndexAuditTrail.java

@@ -277,7 +277,7 @@ private boolean canStart(ClusterState clusterState) {
        }

        if (TemplateUtils.checkTemplateExistsAndVersionMatches(INDEX_TEMPLATE_NAME, SECURITY_VERSION_STRING,
-                clusterState, logger, Version.CURRENT::onOrAfter) == false) {
+                clusterState, logger, Version.CURRENT::onOrBefore) == false) {


Bug 1.
New nodes, which have a greater version, are not prevented from creating a rolled-over index using an older template.

Hmm, maybe I am misunderstanding but I don't follow the logic.

template does not exist so we should not start - this makes sense

template exists but is onOrBefore the current version of the node. On a new node, say 6.5.1, this means that we start if the version in the template is <= 6.5.1. Why would we start with an older template?

I think the scenario you're describing is just the scenario for onOrAfter:

public boolean onOrAfter(Version version) { return version.id <= id; }

In this case id is 6.5.1 and version.id is the version of the template, because the predicate is Version.CURRENT::onOrAfter.

As another anchor, further down in the function:

Version.fromString(versionString).onOrAfter(Version.CURRENT)

is correct, because it has the template version on the left of CURRENT.

You don't want to know for how long I had been banging my head on it....

Gah! This is not easy to understand. For my sanity:

Version.CURRENT on new node = 6.5.1
template version = 6.5.0

Using onOrAfter would be:
6.5.0 <= 6.5.1 = true

Using onOrBefore would be:
6.5.0 >= 6.5.1 = false
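The comparison above can be reproduced with a minimal sketch. The `Version` class here is a hypothetical stand-in that models only the `id` comparison quoted earlier in this thread, and `templateVersionMatches` is an assumed helper that simply applies the predicate to the template version. It is not the real `TemplateUtils` code.

```java
import java.util.function.Predicate;

public class VersionPredicateSketch {

    /** Hypothetical stand-in for the real Version class; only the id comparison is modelled. */
    static final class Version {
        final int id;

        Version(int id) {
            this.id = id;
        }

        // Same shape as the method quoted above: true when `this` is on or after `version`.
        boolean onOrAfter(Version version) {
            return version.id <= id;
        }

        // Mirror image: true when `this` is on or before `version`.
        boolean onOrBefore(Version version) {
            return version.id >= id;
        }
    }

    /** Assumed helper: applies the predicate to the template version. */
    static boolean templateVersionMatches(Version templateVersion, Predicate<Version> predicate) {
        return predicate.test(templateVersion);
    }

    public static void main(String[] args) {
        Version current = new Version(60501);     // node version 6.5.1 (illustrative id)
        Version oldTemplate = new Version(60500); // template version 6.5.0 (illustrative id)

        // current::onOrAfter binds `current` as the receiver, so this evaluates 6.5.0 <= 6.5.1.
        System.out.println(templateVersionMatches(oldTemplate, current::onOrAfter));  // true

        // current::onOrBefore evaluates 6.5.0 >= 6.5.1, so the old template is rejected.
        System.out.println(templateVersionMatches(oldTemplate, current::onOrBefore)); // false
    }
}
```

The point of confusion is that the method reference fixes the node version as the receiver and passes the template version as the argument, so the predicate name reads the opposite way round from the comparison it performs on the template.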

albertzaharovits · 2018-11-28T10:33:07Z

...gin/security/src/main/java/org/elasticsearch/xpack/security/audit/index/IndexAuditTrail.java

@@ -381,7 +390,7 @@ void updateCurrentIndexMappingsIfNecessary(ClusterState state) {
            IndexMetaData indexMetaData = indices.get(0);
            MappingMetaData docMapping = indexMetaData.mapping("doc");
            if (docMapping == null) {
-                if (indexToRemoteCluster || state.nodes().isLocalNodeElectedMaster()) {
+                if (indexToRemoteCluster || state.nodes().isLocalNodeElectedMaster() || hasStaleMessage()) {
                    putAuditIndexMappingsAndStart(index);


Bug 2.
The master will only update the next rolled-over index if there are no events (on the master) for the old index. However, new nodes could have events for the old index, whose template will never be updated.

I'd rather have the master issue a mapping update for the previous index or indices. What do you think?

I've been entertaining this idea too. I chose this alternative because the code is much simpler. With the other approach, the previous index might not exist, or the rollover setting might have changed during the restart, which I think is too complex to handle in code that is already convoluted.
I think the surface for the dreaded cluster update storm is acceptable, as there is a very good chance the non-master nodes will not be in sync during the repeated start retries.
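To illustrate the clock-edge case discussed above, here is a minimal sketch of what a stale-message check could look like. The body of `hasStaleMessage()` is not shown in this diff, so everything below is an assumption: the index name prefix, the daily rollover pattern, the queue of event timestamps, and the explicit `now` parameter are all hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayDeque;
import java.util.Queue;

public class StaleMessageSketch {

    // Assumed prefix and daily rollover pattern, for illustration only.
    static final String PREFIX = "audit-sketch-";
    static final DateTimeFormatter DAILY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    /** Resolves the rolled-over index name that an event with this timestamp belongs to. */
    static String resolveIndex(Instant timestamp) {
        return PREFIX + DAILY.format(timestamp);
    }

    /**
     * Hypothetical check: the oldest queued event is stale when it resolves to a
     * different index than the one the current time resolves to.
     */
    static boolean hasStaleMessage(Queue<Instant> pendingEventTimestamps, Instant now) {
        Instant oldest = pendingEventTimestamps.peek();
        if (oldest == null) {
            return false; // nothing queued, nothing can be stale
        }
        return !resolveIndex(oldest).equals(resolveIndex(now));
    }

    public static void main(String[] args) {
        Instant beforeMidnight = Instant.parse("2018-11-27T23:59:59Z");
        Instant afterMidnight = Instant.parse("2018-11-28T00:00:01Z");

        Queue<Instant> queue = new ArrayDeque<>();
        // A node with an empty queue only ever looks at the new index.
        System.out.println(hasStaleMessage(queue, afterMidnight)); // false

        // A node that queued an event just before the rollover still needs the old index.
        queue.add(beforeMidnight);
        System.out.println(hasStaleMessage(queue, afterMidnight)); // true
    }
}
```

Under these assumptions, a non-master node holding an event from before the rollover would issue the mapping update itself instead of waiting for a master that only looks at the new index.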

albertzaharovits · 2018-11-29T14:15:52Z

If you're here because of the silly commit message, I blame GitHub for failing to keep the merge message when retrying a failed squash and merge. 🙏

This fixes two independent bugs, both of which trip integ test failures. Both are made possible by the rolling nature of the audit index. Moreover, both manifest only during a rolling upgrade that is executed while the audit index rolls over.

While most peoples' opinions change, the conviction of their correctness never does.

1f974ce

albertzaharovits added >bug v7.0.0 :Security/Audit X-Pack Audit logging v6.6.0 labels Nov 28, 2018

albertzaharovits self-assigned this Nov 28, 2018

albertzaharovits requested a review from jaymode November 28, 2018 10:28

jaymode approved these changes Nov 28, 2018

albertzaharovits merged commit 5eb7040 into elastic:master Nov 29, 2018

albertzaharovits deleted the fix_index_audit_upgrade_it branch November 29, 2018 14:12

albertzaharovits mentioned this pull request Feb 3, 2019

Fix IndexAuditTrail rolling upgrade on rollover edge - take 2 #38286

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
