PutManager initial implementation #255

Merged
merged 23 commits into from
Apr 7, 2016

pnarayanan
Contributor

This PR contains the core Put Manager implementation. A few other components that the Put Manager depends on, and certain flows within the NonBlocking router as a whole, had to be modified or refactored in the process. However, the core implementation is limited to three main classes:

  • PutManager: manages all put operations.
  • PutOperation: created by the PutManager for every operation.
  • PutChunk/MetadataPutChunk: inner classes of PutOperation that maintain state for the chunks of an operation.

A quick look at the PutManagerTest will help get an idea of what the Put Manager supports and how it behaves.

Primary reviewers: Gopal, Ming, Siva


Production Code
To review, I would start with the core classes:

ambry-router/src/main/java/com.github.ambry.router/NonBlockingRouter.java
ambry-router/src/main/java/com.github.ambry.router/PutManager.java
ambry-router/src/main/java/com.github.ambry.router/PutOperation.java (PutOperation + PutChunk)

Most, if not all, of the changes made elsewhere in the production code should get covered as the changes above are reviewed.

Test Code
Helpers
In order to test the Put Manager (and the other operation managers going forward), the following are introduced:

  • MockServer: In-memory mock implementation of a server that exposes a send() method that takes a Send (request) and returns a BoundedByteBufferReceive (response). It also provides hooks to introduce server errors and to deterministically simulate intermittent errors (which helps in testing slipped puts and similar cases).

ambry-router/src/test/java/com.github.ambry.router/MockServer.java

  • MockSelector: Intercepts all the router requests (instead of them going to the real selector) and sends and receives requests and responses from MockServers. It also allows for setting bad states to cause failures during connect, send or poll.

ambry-router/src/test/java/com.github.ambry.router/MockSelector.java

  • MockServerLayout: A class that maintains a mapping from a DataNodeId in the MockClusterMap to a corresponding MockServer. Used by the MockSelector to get the MockServer to send a request to, given the host and port. This class is also used by the tests themselves to get a list of all the MockServers and the requests they received, in order to perform verification.

ambry-router/src/test/java/com.github.ambry.router/MockServerLayout.java
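
As a rough illustration of the helper design described above, here is a minimal sketch of a mock server with scripted (deterministic) errors and a layout that maps host and port to a server. All names here (SketchMockServer, SketchServerLayout, SketchError, scriptErrors) are hypothetical, and requests are plain strings; the actual MockServer works with Send and BoundedByteBufferReceive and may be structured differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical error codes for the sketch; not the real server error codes.
enum SketchError { NONE, DISK_UNAVAILABLE }

// Hypothetical in-memory server: records requests and answers with scripted errors.
class SketchMockServer {
  private final Deque<SketchError> scriptedErrors = new ArrayDeque<>();
  private final List<String> receivedRequests = new ArrayList<>();
  private SketchError defaultError = SketchError.NONE;

  /** Error returned for every request that the script does not cover. */
  void setDefaultError(SketchError error) {
    defaultError = error;
  }

  /** Queue errors for the next requests, in order, so "intermittent" failures are repeatable. */
  void scriptErrors(List<SketchError> errors) {
    scriptedErrors.addAll(errors);
  }

  /** Record the request and return the next scripted error, or the default one. */
  SketchError send(String request) {
    receivedRequests.add(request);
    SketchError scripted = scriptedErrors.poll();
    return scripted != null ? scripted : defaultError;
  }

  /** Requests seen so far, for verification by tests. */
  List<String> getReceivedRequests() {
    return receivedRequests;
  }
}

// Hypothetical layout: maps "host:port" to the mock server that plays that node.
class SketchServerLayout {
  private final Map<String, SketchMockServer> servers = new HashMap<>();

  void addServer(String host, int port) {
    servers.put(host + ":" + port, new SketchMockServer());
  }

  /** Returns the server for the given host and port, or null if none is registered. */
  SketchMockServer getServer(String host, int port) {
    return servers.get(host + ":" + port);
  }

  Collection<SketchMockServer> getAllServers() {
    return servers.values();
  }
}
```

Scripting the errors as a queue, rather than failing at random, is what makes a scenario such as "first attempt fails, retry succeeds" reproducible from run to run.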

Actual tests

  • PutManager test: Main set of tests for the Put Manager, covering various scenarios. These tests submit putBlob() operations and assert success or failure as appropriate. Tests ensure that all requests for the same chunk are identical, that the content of the blobs created is identical to their original content, that all the cleanup happens as expected when the router is closed, etc.

ambry-router/src/test/java/com.github.ambry.router/PutManagerTest.java

  • ChunkFillTest: This is simply a test that directly executes the chunk filling flow within PutOperation. This helped exercise and stabilize that flow before the whole operation was tested.

ambry-router/src/test/java/com.github.ambry.router/ChunkFillTest.java
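
To give a feel for what a chunk-filling check can look like, here is a minimal sketch that splits a buffer into fixed-size chunks and reassembles them for comparison with the original. This is an assumption-laden simplification: fillChunks, reassemble and the maxChunkSize parameter are hypothetical names, and the sketch is synchronous and reads from a ByteBuffer, whereas the flow in PutOperation is part of a non-blocking router and may work quite differently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class ChunkFillSketch {
  /** Copies the remaining bytes of source into chunks of at most maxChunkSize bytes each. */
  static List<ByteBuffer> fillChunks(ByteBuffer source, int maxChunkSize) {
    if (maxChunkSize <= 0) {
      throw new IllegalArgumentException("maxChunkSize must be positive");
    }
    List<ByteBuffer> chunks = new ArrayList<>();
    while (source.hasRemaining()) {
      int size = Math.min(maxChunkSize, source.remaining());
      ByteBuffer chunk = ByteBuffer.allocate(size);
      // Use a bounded view so exactly `size` bytes are copied into the chunk.
      ByteBuffer view = source.duplicate();
      view.limit(view.position() + size);
      chunk.put(view);
      chunk.flip();
      source.position(source.position() + size);
      chunks.add(chunk);
    }
    return chunks;
  }

  /** Concatenates the chunks back into one array, without disturbing the chunk positions. */
  static byte[] reassemble(List<ByteBuffer> chunks) {
    int total = 0;
    for (ByteBuffer chunk : chunks) {
      total += chunk.remaining();
    }
    ByteBuffer out = ByteBuffer.allocate(total);
    for (ByteBuffer chunk : chunks) {
      out.put(chunk.duplicate());
    }
    return out.array();
  }
}
```

A test in this style would fill chunks from known content and assert that the reassembled bytes equal the original, which mirrors the "content of blobs is identical to the original" check mentioned for the PutManager tests.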


Coverage

Package                            Class, %           Method, %           Line, %
com.github.ambry.router            100% (29/ 29)      90.2% (147/ 163)    88.9% (727/ 818)

Class                              Class, %           Method, %           Line, %
ChunkState                         100% (1/ 1)        100% (2/ 2)         100% (5/ 5)
CoordinatorBackedRouter            100% (1/ 1)        100% (13/ 13)       90.7% (78/ 86)
CoordinatorBackedRouterFactory     100% (1/ 1)        100% (2/ 2)         100% (16/ 16)
CoordinatorBackedRouterMetrics     100% (1/ 1)        100% (1/ 1)         100% (25/ 25)
CoordinatorOperation               100% (2/ 2)        100% (7/ 7)         93.8% (76/ 81)
CoordinatorOperationType           100% (1/ 1)        100% (2/ 2)         100% (5/ 5)
DeleteManager                      100% (1/ 1)        50% (1/ 2)          50% (1/ 2)
FutureResult                       100% (1/ 1)        69.2% (9/ 13)       68.2% (15/ 22)
GetManager                         100% (1/ 1)        100% (1/ 1)         100% (1/ 1)
NonBlockingRouter                  100% (3/ 3)        69% (20/ 29)        69.3% (88/ 127)
NonBlockingRouterFactory           100% (1/ 1)        100% (3/ 3)         85% (17/ 20)
NonBlockingRouterMetrics           100% (1/ 1)        100% (1/ 1)         100% (2/ 2)
PutManager                         100% (3/ 3)        100% (13/ 13)       95.9% (70/ 73)
PutOperation                       100% (5/ 5)        100% (47/ 47)       94% (220/ 234)
ReadableStreamChannelInputStream   100% (2/ 2)        87.5% (7/ 8)        90.6% (48/ 53)
RouterErrorCode                    100% (1/ 1)        100% (3/ 3)         87.5% (14/ 16)
RouterException                    100% (1/ 1)        80% (4/ 5)          76.9% (10/ 13)
SimpleOperationTracker             100% (2/ 2)        100% (11/ 11)       97.3% (36/ 37)

    return isOpen.get();
  }

  /**
   * Close the PutManager.
   */
  void close() {
Contributor

If you feel we should have close, shouldn't all the managers implement Closeable?

Contributor

Question about line 199. We wait on chunkFillerThread to join here. But we set the open state to false on exiting (finally block) the chunkFillerThread. So, does it make any difference? I feel like it's more of a no-op.

Contributor

got it. thanks

Contributor Author

PutManager is special because it has a thread running in it. But we are being consistent across the operation managers.
I don't think there is a particular value in extending Closeable as these are not public classes.

while (TestUtils.numThreadsByThisName("RequestResponseHandlerThread") > 0) {
  Thread.yield();
}

Contributor

Similar to the test above, you could try to put a blob here and ensure that all 3 operations fail.

Contributor Author

I think that is unnecessary here. If the RequestResponseHandler thread dies, it will close all operations. For the other test where the ChunkFiller thread dies, we close operations only when a new request comes in and realizes it (which was done to keep things simple).
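
For readers unfamiliar with the wait loop in the test snippet above, here is a minimal sketch of how a thread-count-by-name helper could be written, plus a bounded variant of the loop. ThreadNameSketch, numThreadsByName and awaitNoThreads are hypothetical names; this is not the TestUtils.numThreadsByThisName implementation, whose matching rule may differ (the sketch matches on a name fragment).

```java
class ThreadNameSketch {
  /** Counts live threads whose name contains the given fragment. */
  static int numThreadsByName(String fragment) {
    int count = 0;
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive() && t.getName().contains(fragment)) {
        count++;
      }
    }
    return count;
  }

  /**
   * Yields until no matching thread is alive, giving up after timeoutMs.
   * Returns true if all matching threads exited in time.
   */
  static boolean awaitNoThreads(String fragment, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (numThreadsByName(fragment) > 0) {
      if (System.currentTimeMillis() > deadline) {
        return false;
      }
      Thread.yield();
    }
    return true;
  }
}
```

The deadline is a design choice of the sketch: an unbounded loop hangs the test run if the thread never exits, while a bounded one turns that into an ordinary assertion failure.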

private static final AtomicLong operationIdGenerator = new AtomicLong(0);
private static final AtomicInteger currentOperationsCount = new AtomicInteger(0);

static final int SHUTDOWN_WAIT_MS = 10 * Time.MsPerSec;
Contributor

Should this be 10 instead of 10 * Time.MsPerSec?

Contributor Author

No, it is meant to be 10 seconds.
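
To make the units explicit, here is a small sketch of the arithmetic and of one plausible use of such a constant: a bounded join on a worker thread. It assumes Time.MsPerSec is 1000, which is consistent with the "10 seconds" answer above; MS_PER_SEC and awaitShutdown are hypothetical names, and whether the router uses the constant in exactly this way is an assumption.

```java
class ShutdownWaitSketch {
  // Assumed to match Time.MsPerSec.
  static final int MS_PER_SEC = 1000;
  // 10 * 1000 = 10000 milliseconds, i.e. 10 seconds.
  static final int SHUTDOWN_WAIT_MS = 10 * MS_PER_SEC;

  /** Waits up to waitMs for the worker to exit; returns true if it is no longer alive. */
  static boolean awaitShutdown(Thread worker, long waitMs) {
    try {
      worker.join(waitMs);
    } catch (InterruptedException e) {
      // Preserve the interrupt for the caller and report the current state.
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }
}
```

With a plain 10 instead of 10 * MS_PER_SEC, a join like this would wait only 10 milliseconds, which is the confusion the question was probing.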

@nsivabalan
Contributor

Looks good to me.

} catch (Exception e) {
  logger.error("Aborting, as ChunkFillerThread received an exception: ", e);
} finally {
  isOpen.set(false);
Contributor

@pnarayanan Please correct me if I am wrong. If there is any exception in the ChunkFiller thread, it will finally set PutManager's isOpen to false. So, the PutManager is closed.

Then, what is the behavior of the rest of the world? The router will not be closed, right? There can be no more PutOperations submitted to the PutManager, because it will check if the PutManager is closed. But how about the pending PutOperations? Will they be timed out? Also, it seems the OperationController is still able to handle responses for PutOperation in a normal way.

Contributor

As Ming is taking off for the rest of the week, I am gonna handle this comment. He just has this one comment. Once this is addressed, we are good to merge.

Contributor Author

If the ChunkFillerThread fails, we do not immediately close the router. The close logic is already a bit complicated, and trying to do that would overcomplicate things. We are keeping it simple and making sure that if the ChunkFillerThread dies, it will simply close the PutManager.

The router will get closed when the next put operation gets submitted, and all existing operations will be disposed of. I think that is good enough, unless we see an issue in the future.

As an aside, note that in other parts of our code, we do not generally attempt to close everything down when a thread dies.
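
The behavior described in this reply can be sketched roughly as follows: the filler thread's finally block closes only the manager, and the router notices this on the next submission, closes itself, and fails whatever is pending. SketchPutManager, SketchRouter, runFiller and the string-based "operations" are all hypothetical simplifications; the filler body is invoked directly instead of on its own thread to keep the sketch deterministic, and the real classes handle this with more state and different signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class LazyCloseSketch {
  // Hypothetical miniature of a manager that owns a filler thread.
  static class SketchPutManager {
    private final AtomicBoolean isOpen = new AtomicBoolean(true);
    final List<String> pending = new ArrayList<>();

    /** Body of the filler thread: however it exits, only this manager is marked closed. */
    void runFiller(Runnable work) {
      try {
        work.run();
      } catch (Exception e) {
        // The real code logs the exception here.
      } finally {
        isOpen.set(false);
      }
    }

    boolean isOpen() {
      return isOpen.get();
    }
  }

  // Hypothetical miniature of the router that detects the closed manager lazily.
  static class SketchRouter {
    final SketchPutManager putManager = new SketchPutManager();
    final List<String> failed = new ArrayList<>();
    private boolean closed = false;

    /** Returns true if the operation was accepted. */
    synchronized boolean putBlob(String blob) {
      if (!closed && !putManager.isOpen()) {
        // The manager died earlier; this submission is the first to notice.
        close();
      }
      if (closed) {
        failed.add(blob);
        return false;
      }
      putManager.pending.add(blob);
      return true;
    }

    /** Closes the router and fails every operation that was still pending. */
    synchronized void close() {
      closed = true;
      failed.addAll(putManager.pending);
      putManager.pending.clear();
    }

    synchronized boolean isClosed() {
      return closed;
    }
  }
}
```

The trade-off is visible in the sketch: between the filler failure and the next putBlob() call, the router still reports itself as open and pending operations sit untouched, which is the window the reviewers ask to revisit.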

Contributor

I get what you are trying to say. I would really like to revisit this, but let's not hold this patch any longer. Can you add this to your follow-up list as well, so that we can discuss it later?

@xiahome
Contributor

xiahome commented Apr 7, 2016

Left one comment, the rest LGTM.

@nsivabalan
Contributor

Looks good. Have you applied coding style and has the build passed successfully after all the changes?

@pnarayanan
Contributor Author

The changes build fine and the coding style is taken care of. There are intermittent errors that we have seen from time to time in the server tests, but those are unrelated to this patch.

Could you choose the squash and merge option when we merge? Here are the steps:

  1. Click on "Merge pull request"
  2. On the "Confirm Merge" button, click on the dropdown and select "Squash and Merge"
  3. You will see each commit's title in the message. Remove everything and just keep "PutManager initial implementation".
  4. Click on "Confirm squash and merge".

@nsivabalan
Contributor

Cool. Merging now. I mean, squashing and merging.

@nsivabalan nsivabalan merged commit ac1cfbf into linkedin:master Apr 7, 2016
@pnarayanan pnarayanan deleted the PutImplementationBak branch May 17, 2016 19:58