Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Stats API: moved detector count call outside transport layer and make asynchronous #108

Merged

Conversation

jmazanec15
Copy link
Member

Issue #, if available:
#107

Description of changes:
Previously, getting the detector count for the stats API was leading to timeouts. The detector count was retrieved by making a timed blocking SearchRequest in the transport layer and tallying up the number of results.

This PR changes the way in which the detector count is retrieved. It moves the logic to get the detector count to a nonblocking IndicesStatsRequest in the Rest layer. Now, the rest layer makes two asynchronous requests:

  1. IndicesStatsRequest to get detector count (only if needed)
  2. ADStatsRequest to get the node level stats

On response from those requests, it combines the node stats and the cluster stats together and sends the response back to the user.

In order to test, I modified existing tests and added a new one. I also manually tested on a local cluster using the command ./gradlew run -PnumNodes=3

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jmazanec15 jmazanec15 requested review from kaituo and wnbts May 5, 2020 18:58
@jmazanec15 jmazanec15 changed the title moved detector count call outside transport layer and make asynchronous Stats API: moved detector count call outside transport layer and make asynchronous May 5, 2020

@Override
public void merge(Mergeable other) {
if (this == other || other == null || getClass() != other.getClass()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor. other instanceof ADStatsResponse seems better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point will fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to use instanceof since this makes a subclass relationship returns true. We want to have exact same class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, will change back.


ADStatsResponse otherResponse = (ADStatsResponse) other;

if (otherResponse.adStatsNodesResponse != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor. Some edge cases might be handled more clearly, such as when both have node responses/cluster stats, throw an exception if unexpected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this object, the expected behavior if other has both node response and cluster stats is to merge them both into the object.

Set<String> statsToBeRetrieved = adStatsRequest.getStatsToBeRetrieved();
for (String statName : adStats.getClusterStats().keySet()) {
if (statsToBeRetrieved.contains(statName)) {
clusterStats.put(statName, adStats.getStats().get(statName).getValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor. Seems more consistent to use adStats.getClusterStats() as in the for loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a different way might be iterating the entries using map::entrySet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion


public static final ADStatsAction INSTANCE = new ADStatsAction();
public static final ADStatsNodesAction INSTANCE = new ADStatsNodesAction();
public static final String NAME = "cluster:admin/ad_stats_action";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor. should ad_stats_action be renamed to ad_stats_node_action ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should good catch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of our transport actions are using prefix "cluster:admin/ad" like:

"cluster:admin/ad/result"
"cluster:admin/ad/cron"
"cluster:admin/ad/model/delete"

It would be good to have the same style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out. Will update.

* @param channel Channel to send response
* @param adStatsRequest Request containing stats to be retrieved
*/
public void getStats(Client client, RestChannel channel, ADStatsRequest adStatsRequest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor. The get methods should be private because they are implementation not contract. Tests are done based on contract not implementation. Because implementation is not client-facing, the full documentation is optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will update them.

Set<String> statsToBeRetrieved = adStatsRequest.getStatsToBeRetrieved();
for (String statName : adStats.getClusterStats().keySet()) {
if (statsToBeRetrieved.contains(statName)) {
clusterStats.put(statName, adStats.getStats().get(statName).getValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a different way might be iterating the entries using map::entrySet

"Unable to return AD Stats"
);

onGetClusterStats(client, delegateListener, adStatsRequest);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on is more of a name for listener. You can change to getClusterStats or sth similar. Same applies below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will update both.

ADStatsResponse adStatsResponse = new ADStatsResponse();
if (adStatsRequest.getStatsToBeRetrieved().contains(StatNames.DETECTOR_COUNT.getName())) {
if (clusterService.state().getRoutingTable().hasIndex(AnomalyDetector.ANOMALY_DETECTORS_INDEX)) {
IndicesStatsRequest indicesStatsRequest = new IndicesStatsRequest();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only need docs statistics. .setDocs(true) seems what you need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right that is correct. Will update.

.setValue(indicesStatsResponse.getIndex(AnomalyDetector.ANOMALY_DETECTORS_INDEX).getPrimaries().docs.getCount());
adStatsResponse.setClusterStats(getClusterStats(adStatsRequest));
listener.onResponse(adStatsResponse);
}, e -> listener.onFailure(new RuntimeException("Failed to get AD cluster stats: " + e))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new RuntimeException("Failed to get AD cluster stats", e) is more common.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will update.

listener.onResponse(adStatsResponse);
}
} else {
adStatsResponse.setClusterStats(getClusterStats(adStatsRequest));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you didn't set a value here, would it be possible that we pick some value set previously?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else block is only entered if the user does not want to get the detector count stat. So, it will not be retrieved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Is SettableSupplier's value a singleton shared by all of the requests? Would we have concurrent issue like one thread is writing to it while another is reading it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes SettableSupplier's value is a singleton. SettableSupplier uses AtomicLong for thread safety.

I believe this is the scenario of concern:

  1. async request(req_1) to get detector count is made (detector count = 3)
  2. detector is created (detector count = 4)
  3. async request(req_2) to get detector count is made (detector count = 4)
  4. req_2 returns and sets detector count to 4
  5. req_1 returns and sets detector count to 3
  6. stats are returned for req_1 with detector count = 3
  7. stats are returned for req_2 with detector count = 3

I think this is a minor issue for a couple of reasons. (1) On each stats request, the detector count is updated so it will not put the system in a permanent state of inconsistency (2) The scenario above is low probability because detector count is not expected to be that volatile and also stats api calls are not expected to be in that tight of a window.

A workaround could be to pass the detector count to getClusterStatsMap(ADStatsRequest adStatsRequest) and have a special case for DetectorCount stat. However, I am not sure this is best practice. Do you have any other suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree it is a minor issue since document count is only used by humans, right?

Can you create a github issue for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes only humans will read.

#112

// Use MultiResponsesDelegateActionListener to execute 2 async requests and create the response once they finish
MultiResponsesDelegateActionListener<ADStatsResponse> delegateListener = new MultiResponsesDelegateActionListener<>(
getRestStatsListener(channel),
2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it always 2? If customers don't want to get detector count or only want to get detector count, this value should be 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is always 2. We still call the listeners onResponse method even if the calls are not made. For example, here

@jmazanec15 jmazanec15 merged commit 5bc0ccd into opendistro-for-elasticsearch:master May 6, 2020
jmazanec15 added a commit to jmazanec15/anomaly-detection that referenced this pull request May 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants