Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 3223: RetentionTest failure #3272

Merged
merged 15 commits into from Jan 17, 2019

Conversation

RaulGracia
Copy link
Contributor

@RaulGracia RaulGracia commented Jan 10, 2019

Change log description
This PR explicitly creates Controller bucket zNodes before a change listener is registered. By doing this, we make sure that a bucket zNode is not of type "container", which may be the case if it is created by Curator as a parent of an actual zNode. This prevents Zookeeper from automatically deleting bucket zNodes in its internal process of garbage collection of empty containers.

Purpose of the change
Fixes #3223.

What the code does
This PR initializes the Stream bucket znode when we register a listener for changes on that bucket.

The reason for doing so is the following: Zookeeper recently (v3.5+) introduced a new kind of nodes called containers (ref). This kind of nodes differ from traditional zNodes in the following: once a container is populated and then emptied (i.g., all its child nodes have been deleted), it is candidate for garbage collection in Zookeeper (see ContainerManager.java):

...
Once started, it periodically checks container nodes that have a cversion > 0 and have no children. A
delete is attempted on the node.
...

By default, Curator exploits this feature when available. This implies that, when creating a zNode with a non-existent parent, Curator can create a container for the parent node and a zNode for the child one (i.e., PERSISTENT).

This was the case for the issue that motivates this PR. Before the execution of RetentionTest, another test (RestAPITest) created and deleted a Stream with a retention policy, so it was added and removed from a Stream bucket (buckets/0). Curator seems to create bucket/0 as a container and, according to the previous description of ContainerManager, the bucket node was candidate for garbage collection (i.e., filled, and then emptied) before the execution of RetentionTest. Once Zookeeper removes that bucket, the retention service (StreamCutBucketService) stops working, given that the watch on the bucket is broken.

To avoid empty Stream buckets from being deleted, this PR explicitly creates the bucket node as a zNode. This prevents Curator from making it of type container and suffices to prevent Zookeeper from deleting it.

Note: We also tried to use the option CuratorFrameworkFactory.builder().dontUseContainerParents(), but it seems to do not work as expected when used jointly with creatingParentsIfNeeded().

How to verify it
Locally replayed failure scenario of RetentionTest and works.
System test build passes entirely (only BookieFailoverTest has been disabled).
All tests should be passing as before.

@RaulGracia
Copy link
Contributor Author

RetentionTest failure

@codecov
Copy link

codecov bot commented Jan 10, 2019

Codecov Report

Merging #3272 into master will decrease coverage by <.01%.
The diff coverage is 75%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3272      +/-   ##
============================================
- Coverage     78.85%   78.85%   -0.01%     
- Complexity     8024     8032       +8     
============================================
  Files           609      609              
  Lines         31497    31512      +15     
  Branches       3046     3048       +2     
============================================
+ Hits          24838    24849      +11     
- Misses         4865     4870       +5     
+ Partials       1794     1793       -1
Impacted Files Coverage Δ Complexity Δ
...ller/store/stream/InMemoryStreamMetadataStore.java 88.27% <100%> (+0.08%) 51 <1> (+1) ⬆️
.../controller/server/retention/StreamCutService.java 79.03% <100%> (+0.34%) 15 <2> (+1) ⬆️
...controller/store/stream/ZKStreamMetadataStore.java 85.58% <69.23%> (-1.02%) 72 <5> (+5)
...main/java/io/pravega/controller/task/TaskBase.java 77.57% <0%> (-6.55%) 26% <0%> (-3%)
...a/io/pravega/common/util/OrderedItemProcessor.java 87.01% <0%> (-2.6%) 19% <0%> (-3%)
...ga/controller/store/client/StoreClientFactory.java 79.59% <0%> (-2.05%) 10% <0%> (-1%)
...egmentstore/server/reading/ContainerReadIndex.java 86.98% <0%> (-1.78%) 38% <0%> (ø)
...o/pravega/segmentstore/server/logs/DurableLog.java 89.26% <0%> (-0.49%) 67% <0%> (-1%)
.../segmentstore/server/writer/SegmentAggregator.java 78.5% <0%> (ø) 237% <0%> (+1%) ⬆️
...ga/controller/task/Stream/StreamMetadataTasks.java 82.29% <0%> (+0.31%) 143% <0%> (+1%) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9fe1985...5dcb36b. Read the comment docs.

@shrids
Copy link
Contributor

shrids commented Jan 11, 2019

From the curator java docs observing that
client.create().creatingParentsIfNeeded() would cause the parent nodes to be created with org.apache.zookeeper.CreateMode#PERSISTENT type by default.

org.apache.curator.framework.api.CreateModable#withMode javadoc also indicates that by default the CreateMode is Persistent.

@RaulGracia
Copy link
Contributor Author

Yes @shrids, you are right. But apparently, this behavior is not completely honored. Even using dontUseContainerParents() in CuratorFrameworkFactory doesn't fix the problem.

The difference between master and this PR is the following.

In master, a Stream bucket is created by the first task (either StreamMetadataTasks or UpdateStreamTask) that invokes addUpdateStreamForAutoStreamCut(). So the creation of the bucket zNode is automatically performed as a parent of a Stream that goes into that bucket:

String retentionPath = String.format(RETENTION_PATH, bucket, encodedScopedStreamName(scope, stream));
byte[] serialize = SerializationUtils.serialize(retentionPolicy);
...
storeHelper.createZNodeIfNotExist(retentionPath, serialize);

In this PR, we first create the bucket as a znode before adding any Stream (in registerBucketChangeListener):

storeHelper.addNode(bucketRoot)

This avoids that Curator creates the bucket parent (.../buckets/x) in the process of registering the first Stream on that bucket, and thus set its type to container.

This leads to two considerations:

  • Are there other znodes that we believe that are not going to be deleted, but they actually are in some cases? For instance, I see these attempts to automatically delete znodes in ZK logs of a system test run, which originated the problem that this PR tries to solve:
2019-01-11 07:59:22,326 [myid:1] - INFO  [ContainerManagerTask:ContainerManager@119] - Attempting to delete candidate container: /pravega/pravega/cluster/faulthandlerleader
2019-01-11 07:59:22,332 [myid:1] - INFO  [ContainerManagerTask:ContainerManager@119] - Attempting to delete candidate container: /pravega/pravega/cluster/controllers
  • Should be create and issue in the Curator project? Apparently, the behavior that we are encountering is not the one specified in the docs, so maybe it would be useful to create an issue for that in Curator.

@@ -45,6 +46,7 @@

@Slf4j
@RunWith(SystemTestRunner.class)
@Ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a temporary change for testing purposes, is reverted now.

@@ -321,6 +322,9 @@ public void unregisterBucketOwnershipListener() {
public void registerBucketChangeListener(int bucket, BucketChangeListener listener) {
Preconditions.checkNotNull(listener);

String bucketRoot = String.format(BUCKET_PATH, bucket);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does not seem right. There are other listeners in this class too, like the ownership one. Also, this is the wrong place to do this. If we are to create this bucket znode, it should be during the process of acquiring ownership, not while registering a listener.

I need to do some further investigation before making a concrete proposal here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up question: why not create all necessary persistent znodes during the initialization of the controller process?

https://github.com/pravega/pravega/blob/master/controller/src/main/java/io/pravega/controller/server/Main.java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fpj This is another possibility that may be interesting, given that I observed the same phenomenon related to other znodes. This can be done when instantiating a new ZKStreamMetadataStore object. In fact, ZKHostStore does something similar in the method tryInit(), but instead of ensuring that the znode is created in the constructor, it is checked on every call to Zookeeper (but executed only once). I will try to find a better, broader znode initialization approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following your suggestion, I added a new method to StreamMetadataStore interface called initializeMetadataStore(). In the case of ZKStreamMetadataStore, it contains the logic to initialize the Stream buckets. This method is invoked in ControllerServiceStarter, just after the StreamMetadataStore object gets created. This may be a suitable "centralized" point to initialize all the necessary metadata items to facilitate maintenance.

Note that this solution fixes the reported problem (system tests are passing with the current version of the code) and could be easily cherry-picked to previous versions. But it could subject to changes in PR #3230, given that the initialization of "buckets" will be done for multiple services, not just for the retention service.

@@ -460,6 +463,35 @@ public void unregisterBucketListener(int bucket) {
});
}

/**
* When managing the Controller's metadata in Zookeeper, we explicitly create parent bucket zNodes (so they are of
* type "zNode"). Otherwise, they may be inadvertently created as Zookeeper "containers" by Curator. This would lead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point in this comment to curator documentation that says this, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a reference for this. Actually, the first line of the page says: NOTE: Most Curator recipes will autocreate parent nodes of paths given to the recipe as CreateMode.CONTAINER.

Predicate<Throwable> isDataExistsException = ex -> Exceptions.unwrap(ex) instanceof StoreException.DataExistsException;
for (int bucket = 0; bucket < bucketCount; bucket++) {
final String bucketPath = String.format(BUCKET_PATH, bucket);
initializationFutures.add(retryPolicy.retryWhen(isDataExistsException.negate()).run(() ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does curator have its own retry mechanism? Do we need to retry ourselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@shiveshr shiveshr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaulGracia please remove the initializeStore method and have specific metadata creation apis that should be called from specific modules that need those metadata initialized.

@RaulGracia
Copy link
Contributor Author

@fpj @shiveshr Please, let me know if the current approach reasonably satisfies the existing points of view on this issue. If so, I will run system tests again with the latest changes.

Copy link
Contributor

@fpj fpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this current approach is fine by me, awaiting @shiveshr 's take.

Copy link
Contributor

@shiveshr shiveshr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggested improvements.

@Override
public CompletableFuture<Void> createBucketsRoot() {
List<CompletableFuture<Void>> initializationFutures = new ArrayList<>();
initializationFutures.add(initializeZNode(BUCKET_OWNERSHIP_PATH)); // Also initialize ownership zNode.
Copy link
Contributor

@shiveshr shiveshr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add the root path explicitly as well..
following should be done:
"/buckets"
"/buckets/ownership"
"buckets/<bucket-id>"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also want to chain bucketRootPath --> bucketOwnershipPath --> ListOf bucketPath rather than run them concurrently as bucket path creation may fail if bucketRootPath is not yet created.

…type Container).

Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
…ode.

Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
…stances to contain metadata initialization tasks during Controller start.

Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
@RaulGracia RaulGracia force-pushed the issue-3223-retentionTest-failure branch from 6cc707f to 6c0e3bc Compare January 16, 2019 19:30
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
@Override
public CompletableFuture<Void> createBucketsRoot() {
List<CompletableFuture<Void>> initializationFutures = new ArrayList<>();
initializationFutures.add(initializeZNode(BUCKET_ROOT_PATH).thenCompose(v -> initializeZNode(BUCKET_OWNERSHIP_PATH)));
Copy link
Contributor

@shiveshr shiveshr Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following is what i was suggesting as it will ensure that parent znodes are created and only then do we attempt children znode creation:

return initializeZNode(BUCKET_ROOT_PATH)
.thenCompose(v -> initializeZNode(BUCKET_OWNERSHIP_PATH))
.thenCompose(v -> {
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            final String bucketPath = String.format(BUCKET_PATH, bucket);
            initializationFutures.add(initializeZNode(bucketPath));
        }
         return Futures.allOf(initializationFutures); 
})

Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
Copy link
Contributor

@fpj fpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are testing this change primarily with system tests, but shouldn't we have (mocked) unit tests to exericize the code paths we are adding here?

Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
@fpj fpj merged commit f18ac01 into pravega:master Jan 17, 2019
fpj pushed a commit that referenced this pull request Jan 22, 2019
* Cherry pick changes from PR #3272 to branch r0.4.

Signed-off-by: Raúl Gracia <raul.gracia@emc.com>
adrianmo added a commit to adrianmo/pravega that referenced this pull request Jan 23, 2019
* master: (31 commits)
  Issue 3279: Client is performing blocking operations inside Netty Event Loop (pravega#3280)
  Issue 3200: Document Review: wire-protocol.md (pravega#3201)
  Issue 2933: (SegmentStore) testEndToEndWithFencing fixes (pravega#3294)
  Issue pravega#2768: Rename PravegaCredsWrapper class and its member variable (pravega#3302)
  Issue pravega#3299: Reduce dependencies in client (pravega#3300)
  Issue 2895: Wire Protocol implementation for TableStore (pravega#3283)
  Issue-3239: Make DYNAMIC_LOGGER non-static (pravega#3293)
  Issue 2956: (SegmentStore) TableSegment.deleteIfEmpty (pravega#3263)
  Issue pravega#3289: Use compile-time dependencies where possible (pravega#3290)
  Issue 3223: RetentionTest failure (pravega#3272)
  Issue 2001: Make a defensive copy of streamCut's position map (pravega#3281)
  Issue 3287: Exclude "passwd" extensions from rat. (pravega#3288)
  Issue 3285: Changed image pull policy. (pravega#3286)
  Issue 3266: Ensure bookkeeper.bkAckQuorumSize is 3 for BookieFailover test. (pravega#3267)
  Issue 3264: Return a SegmentHandle from Storage create method. (pravega#3276)
  Issue 3086: Controller Diagrams(new) included (pravega#3097)
  Issue 2570: Update segment metrics when a transaction merge happens (pravega#3239)
  Issue 3149: (SegmentStore) Pinned Segments (pravega#3215)
  Issue 3269: Fix invalid initialization order in AbstractScaleTests. (pravega#3270)
  Issue 3148: (Segment Store) Segment Metadata Store (pravega#3206)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants