-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PutManager initial implementation #255
Conversation
return isOpen.get(); | ||
} | ||
|
||
/** | ||
* Close the PutManager. | ||
*/ | ||
void close() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you feel we should have close, shouldn't all the managers implement Closeable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about line 199. We wait on chunkFillerThread to join here. But we set the open state to false on exiting (finally block) the chunkFillerThread. So, does it make any difference. I feel like its more like a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PutManager is special because it has a thread running in it. But we are being consistent across the operation managers.
I don't think there is a particular value in extending Closeable
as these are not public classes.
while (TestUtils.numThreadsByThisName("RequestResponseHandlerThread") > 0) { | ||
Thread.yield(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above test, you could try to put a blob here and ensure all 3 operations are failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is unnecessary here. If the RequestResponseHandler thread dies, it will close all operations. For the other test where the ChunkFiller thread dies, we close operations only when a new request comes in and realizes it (which was done to keep things simple).
private static final AtomicLong operationIdGenerator = new AtomicLong(0); | ||
private static final AtomicInteger currentOperationsCount = new AtomicInteger(0); | ||
|
||
static final int SHUTDOWN_WAIT_MS = 10 * Time.MsPerSec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 10
instead of 10 * Time.MsPerSec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is meant to be 10 seconds.
Looks good to me. |
} catch (Exception e) { | ||
logger.error("Aborting, as ChunkFillerThread received an exception: ", e); | ||
} finally { | ||
isOpen.set(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pnarayanan Please correct me if I was wrong. If there is any exception during ChunkFiller
thread, it will finally set PutManager
's isOpen
to false
. So, the PutManager
is closed.
Then, what is the behavior of the rest of the world? The router will not be closed, right? There can be no more PutOperations
submitted to the PutManager
, because it will check if the PutManager
is closed. But how about the pending PutOperations? Will they be timed out? Also, it seems the OperationController
is still able to handle responses for PutOperation
in a normal way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Ming is taking off for the rest of the week, I am gonna handle this comment. He just has this one comment. Once this is addressed, we are good to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ChunkFillerThread
fails, we do not immediately close the router. The close logic is already a bit complicated and trying to do that would over complicate things. We are keeping it simple and making sure that if the ChunkFillerThread
dies, it will simply close the PutManager
.
The router will get closed when the next put operation gets submitted, and all existing operations will be disposed off. I think that is good enough, unless we see an issue in the future.
As an aside, note that in other parts of our code, we do not generally attempt to close everything down when a thread dies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get what you are trying to say. But I would really like to revisit this, but lets not hold this patch anymore. So, can you add this also to your follow up list so that we can discuss later.
Left one comment, the rest LGTM. |
Looks good. Have you applied coding style and has the build passed successfully after all the changes? |
The changes build fine and the coding style is taken care of. There are intermittent errors that we have seen from time to time in the server tests, but those are unrelated to this patch. Could you choose the squash and merge option when we merge. Here are the steps:
|
Cool. Merging now. I mean, squashing and merging. |
This PR contains the core Put Manager implementation. A few other components that the Put Manager depends on, and certain flows within the NonBlocking router as a whole had to be modified/refactored in the process. However, the core implementation is limited to three main classes:
PutManager
: that manages all put operations,PutOperation
: created by thePutManager
for every operation.PutChunk
/MetadataPutChunk
: inner classes of aPutOperation
, that do state maintenance for the chunks of an operation.A quick look at the
PutManagerTest
will help get an idea of what the Put Manager supports and how it behaves.Primary reviewers: Gopal, Ming, Siva
Production Code
To review, I would start with the core classes:
Most, if not all, of the changes made elsewhere in the production code should get covered as the changes above are reviewed.
Test Code
Helpers
In order to test the Put Manager (and the other operation managers going forward), the following are introduced:
MockServer
: In-memory mock implementation of a server that exposes asend()
method that takes aSend
(request) and returns aBoundedByteBufferReceive
(response). It also provides hooks to introduce server errors and to deterministically simulate intermittent errors (that helps in testing slipped puts and such).ambry-router/src/test/java/com.github.ambry.router/MockServer.java
MockSelector
: Intercepts all the router requests (instead of them going to the real selector) and sends and receives requests and responses fromMockServer
s. It also allows for setting bad states to cause failures during connect, send or poll.ambry-router/src/test/java/com.github.ambry.router/MockSelector.java
MockServerLayout
: A class that maintains a mapping from aDataNodeId
in theMockClusterMap
to a correspondingMockServer
. Used by theMockSelector
to get theMockServer
to send a request to, given the host and port. This class is also used by the tests themselves to get a list of all theMockServer
s and the requests they received, in order to perform verification.ambry-router/src/test/java/com.github.ambry.router/MockServerLayout.java
Actual tests
putBlob()
operations and assert success or failure as appropriate. Tests ensure that all requests for the same chunk are identical, that the content of blobs created are identical to their original content, all the cleanup happen as expected when the router is closed, etc.ambry-router/src/test/java/com.github.ambry.router/PutManagerTest.java
PutOperation
. This helped exercise and stabilize that flow before the whole operation was tested.ambry-router/src/test/java/com.github.ambry.router/ChunkFillTest.java
Coverage