New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transport: shortcut local execution #10350

Closed
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@bleskes
Member

bleskes commented Mar 31, 2015

In several places in the code we need to notify a node it needs to do something (typically the master). When that node is the local node, we have an optimization in serveral places that runs the execution code immediately instead of sending the request through the wire to itself. This is a shame as we need to implement the same pattern again and again. On top of that we may forget (see note bellow) to do so and we might have to write some craft if the code need to run under another thread pool.

This commit folds the optimization in the TrasnportService, shortcutting wire serliazition if the target node is local.

Note: this was discovered by #10247 which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to master, the code didn't take into account that fact that the local node may master. If this happens quickly enough, one would get a NodeNotConnected exception causing the dangling indices not to be imported. This will succeed after 10s where InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.

Transport: shortcut local execution
In several places in the code we need to notify a node it needs to do something (typically the master). When that node is the local node, we have an optimization in serveral places that runs the execution code immediately instead of sending the request through the wire to itself. This is a shame as we need to implement the same pattern again and again. On top of that we may forget (see note bellow) to do so and we might have to write some craft if the code need to run under another thread pool.

This commit folds the optimization in the TrasnportService, shortcutting wire serliazition if the target node is local.

Note: this was discovered by #10247 which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to master, the code didn't take into account that fact that the local node may master. If this happens quickly enough, one would get a NodeNotConnected exception causing the dangling indices not to be imported. This will succeed after 10s where InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.
@kimchy

This comment has been minimized.

Show comment
Hide comment
@kimchy

kimchy Apr 2, 2015

Member

@bleskes added a few comments

Member

kimchy commented Apr 2, 2015

@bleskes added a few comments

@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes

bleskes Apr 2, 2015

Member

@kimchy thx. pushed a minor commit plus responded.

Member

bleskes commented Apr 2, 2015

@kimchy thx. pushed a minor commit plus responded.

@kimchy

This comment has been minimized.

Show comment
Hide comment
@kimchy

kimchy Apr 2, 2015

Member

@bleskes somehow my comment about disconnect from node got lost, I am not sure we should throw an unsupported exception if its from the local node, it breaks the contract of not doing anything for connect if its a local node (and what happens with unicast disco where the existing host is provided as well?)

Member

kimchy commented Apr 2, 2015

@bleskes somehow my comment about disconnect from node got lost, I am not sure we should throw an unsupported exception if its from the local node, it breaks the contract of not doing anything for connect if its a local node (and what happens with unicast disco where the existing host is provided as well?)

@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes

bleskes Apr 2, 2015

Member

@kimchy your comment is folded up here. Here is my response:

re this being a noop - yeah, that's how I originally implemented it as well. Then I thought that strictly speaking we don't honor the disconnect (you can still ask nodeConnected(localNode) and get true), so we should throw an exception. I think your point valid regarding connectToNodeLight , but in that case, at least currently, we never use a known node id. Doubting which one is the lesser evil.

Member

bleskes commented Apr 2, 2015

@kimchy your comment is folded up here. Here is my response:

re this being a noop - yeah, that's how I originally implemented it as well. Then I thought that strictly speaking we don't honor the disconnect (you can still ask nodeConnected(localNode) and get true), so we should throw an exception. I think your point valid regarding connectToNodeLight , but in that case, at least currently, we never use a known node id. Doubting which one is the lesser evil.

@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes

bleskes Apr 2, 2015

Member

@kimchy I pushed another commit. I was hesitant to add the optimization where we check that the response is returned on the same executor as the request. The API doesn't guarantee us that the channel was not handed off to another thread. I would prefer not to do that one. Feels dangerous.

Member

bleskes commented Apr 2, 2015

@kimchy I pushed another commit. I was hesitant to add the optimization where we check that the response is returned on the same executor as the request. The API doesn't guarantee us that the channel was not handed off to another thread. I would prefer not to do that one. Feels dangerous.

@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes

bleskes Apr 2, 2015

Member

@kimchy pushed the noop disconnect thing.

Member

bleskes commented Apr 2, 2015

@kimchy pushed the noop disconnect thing.

// ignore if its null, the adapter logs it
if (handler != null) {
final String executor = handler.executor();
if (ThreadPool.Names.SAME.equals(executor)) {

This comment has been minimized.

@kimchy

kimchy Apr 5, 2015

Member

can we pass to the channel the executor the TransportRequestHandler is executing on, and on top of SAME, if its the same as the executor the request is executing on, we don't need to fork it again?

@kimchy

kimchy Apr 5, 2015

Member

can we pass to the channel the executor the TransportRequestHandler is executing on, and on top of SAME, if its the same as the executor the request is executing on, we don't need to fork it again?

}
final RemoteTransportException rtx = (RemoteTransportException) error;
final String executor = handler.executor();
if (ThreadPool.Names.SAME.equals(executor)) {

This comment has been minimized.

@kimchy

kimchy Apr 5, 2015

Member

same as above for non exception case

@kimchy

kimchy Apr 5, 2015

Member

same as above for non exception case

@kimchy

This comment has been minimized.

Show comment
Hide comment
@kimchy

kimchy Apr 5, 2015

Member

@bleskes left another small comment, other than that LGTM. There are several other places where we check on local node that we can remove and simplify the code, SearchServiceTransportAction is a great example :). We can push this and keep it small, and then go and cleanup the places to keep the changes manageable?

Member

kimchy commented Apr 5, 2015

@bleskes left another small comment, other than that LGTM. There are several other places where we check on local node that we can remove and simplify the code, SearchServiceTransportAction is a great example :). We can push this and keep it small, and then go and cleanup the places to keep the changes manageable?

@kimchy kimchy assigned bleskes and unassigned kimchy Apr 5, 2015

@bleskes bleskes closed this in 80e86e5 Apr 8, 2015

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 8, 2015

Transport: shortcut local execution
In several places in the code we need to notify a node it needs to do something (typically the master). When that node is the local node, we have an optimization in serveral places that runs the execution code immediately instead of sending the request through the wire to itself. This is a shame as we need to implement the same pattern again and again. On top of that we may forget (see note bellow) to do so and we might have to write some craft if the code need to run under another thread pool.

This commit folds the optimization in the TrasnportService, shortcutting wire serliazition if the target node is local.

Note: this was discovered by #10247 which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to master, the code didn't take into account that fact that the local node may master. If this happens quickly enough, one would get a NodeNotConnected exception causing the dangling indices not to be imported. This will succeed after 10s where InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.

Closes #10350

@bleskes bleskes deleted the bleskes:transport_local_shortcut branch Apr 8, 2015

@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes

bleskes Apr 8, 2015

Member

@kimchy thx. I committed. Agreed on cleaning up more places as a second iteration.

Member

bleskes commented Apr 8, 2015

@kimchy thx. I committed. Agreed on cleaning up more places as a second iteration.

bleskes added a commit that referenced this pull request Apr 8, 2015

Revert "Transport: shortcut local execution, #10350"
This reverts commit d8bb760.

This causes BWC issues for some plugins

@bleskes bleskes removed the v1.6.0 label Apr 8, 2015

@clintongormley clintongormley removed the review label Aug 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment