Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HWKMETRICS-566 Power up the external alerter #727

Merged
merged 1 commit into from
Feb 2, 2017

Conversation

jshaughn
Copy link
Contributor

  • Introduces a new mechanism for defining conditions using
    flexible expressions with embedded stats queries.
  • Replaces the previous expression syntax.

Note:

  • Introduces use of a new 3rd party component, EvalEx. A nifty
    tool for handling expression evaluation.
  • Added 'samples' to AvailabilityBucketPoint
    • good for consistency, and useful for determining upCount
  • Add Percentile.toString() for better debugging

@FilipB
Copy link

FilipB commented Jan 18, 2017

retest this please

@jshaughn jshaughn force-pushed the hwkmetrics-566 branch 3 times, most recently from df2f188 to 2df52db Compare January 23, 2017 22:32
@jshaughn
Copy link
Contributor Author

@stefannegrea @lucasponce This is now ready, I think, for a code review prior to merge. Since the meeting I've cleaned things up a bit and incorporated feedback:

  • Added support for EvalType.EACH which performs evaluations against each metric individually, as opposed to EvalType.ALL (the default) which combines the datapoints from all metrics and performs a single evaluation.
    -- Note, EvalType was formerly named ResultType.
    -- Note, I moved EvalType from Query level to ConditionExpression level, which I think is more straightforward and less confusing than supporting different approaches for different queries used in the eval expression.
  • I migrated from having the Alerter generate Data to generating (non-persisted) Events when an evaluation is true. As discussed with @lucasponce , this is a better semantic for External Alerters. It also avoids an issue with alerting tossing out Data for the same DataId, when timestamps are too near together.
  • I also added better handling for when queries return no data. In this situation the evaluation simply resolves to false and things continue (debug logging as well, if enabled).

Please take a look, time permitting.

}
this.expression.eval();
} catch (Exception e) {
throw new ExpressionException("Invalid eval [" + eval + "]: " + e.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a IllegalArgumentException instead to propagate this one com.udojava.evalex.Expression.ExpressionException ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, replaced in two places so as not to propagate the 3rd party Exception class.

import com.udojava.evalex.Expression;
import com.udojava.evalex.Expression.ExpressionException;

public class ConditionEvaluator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ConditionEvaluator is dealing with the expression, perhaps a BNF representation of valid expressions here is good.
It has not to be complete but some lines would definitely help.
For example, I put a similar one in this other alerter:
https://github.com/hawkular/hawkular-alerts/blob/master/hawkular-alerts-engine-extensions/hawkular-alerts-events-aggregation/src/main/java/org/hawkular/alerts/extensions/Expression.java#L31

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, a couple of the new classes were missing class-level jdoc. The supported Expression syntax is more fully defined at the EvalEx page, I have created class jdoc that points to it, as well as adds more info. I also added class jdoc for ConditionExpression.

case uptimeRatio:
return data.getUptimeRatio().doubleValue();
case samples:
return data.getSamples() * 1.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally used the cast operator for this. I guess this is for converting to double, isn't it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiplying by a double literal always returns a double, so a cast is not necessary.


// First, collect the queries
List<String> queries = new ArrayList<String>();
Matcher m = Pattern.compile("q\\(.*?\\)").matcher(eval);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor optimization.
If this one pattern is going to be used often you could make it static as part of the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, create a static Pattern.


public QueryFunc(String query) {
super();
Matcher m = Pattern.compile("q\\((.*?),\\s*(.*?)\\)").matcher(query);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Pattern can be static to the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, created a static Pattern.

* the evaluation is performed once (per run of the ConditionExpression). EACH means that data for each
* metric will be aggregated separately and the evaluation is performed once per metric. Note that if multiple
* queries are involved in the eval expression that only metrics common to each query will be evaluated.
* The default is ALL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have read the comment, but I have some fear that perhaps I could misinterpreted the behavior, perhaps a couple of lines with examples of ALL/EACH to clarify could help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a more verbose comment to hopefully make things more clear. This same information will be in the documentation.

private static final Logger log = Logger.getLogger(ConditionExpression.class);

/** 10mn */
public static final String DEFAULT_FREQUENCY = "10mn";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"mn" means "minutes" here ? Why don't use just "m" ? In other DSL, the some-sort-standard prefixes are d/h/m/s etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"mn" is the suffix used by Metrics' Duration. I question their decision as well but I am only conforming to their documented Duration type: http://www.hawkular.org/docs/rest/rest-metrics.html#_custom_string_formats.


private static final Integer THREAD_POOL_SIZE = 20;

ScheduledThreadPoolExecutor expressionExecutor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self.
ThreadPools are growing.
I guess that at some point it could be nice to have a wrapper for these classes and put more descriptive names to them.
Then when we dump a JVM we can trace better the threads involved.

Just a side comment, no action required, but it is not first time that I think on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, they are everywhere.


if (evaluator.evaluate()) {
try {
Data externalData = Data.forString(externalCondition.getTenantId(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I am not 100% sure if I understand well some details of the logic.
But in EACH mode, the ExternalCondition will expect and Event and in ALL mode a Data to trigger the Alert.
Am I right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the reason you are confused is that this is a bug, leftover from the conversion from Data to Event sending. Thanks for pointing it out. Now the code sends Events consistently.

// Average over the last 1 minute > 50, check every minute
Query qNow = Query.builder("qNow").metric(metricId).duration("1mn").build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add tests over the REST API through plain JSON too.
Perhaps the same but with plain text.
There are not support for extensions with Java clients, and also, I am not 100% if python/ruby clients are tested against ExternalCondition (probably not).
So, these JSON tests will be the closest example to a live example.
It would help to point people into the feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what that test would exercise. This code is already using REST to post the ExternalCondition. Are you concerned that python/ruby clients may not be able to post an ExternalCondition using a JSON string for the expression value?

Copy link
Contributor

@lucasponce lucasponce Jan 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that groovy is unbeatable for these tests where Java objects are "JSON-lized" under the hood in a pretty straightforward way.
But experience showed us when someone else needs to use this feature, and we point them to these examples, they are not very useful, as they need to access via Java API.
I think these tests are great and we should continue using but what I do is to introduce some additional test where instead of an object we define the json representation in raw format.
That helps to catch up with the technology.
And point potential users to take the test as a valid example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you are talking about this from the perspective of a code example. OK, I can create one like that. In the formal documentation I have a JSON example as well.

@jshaughn
Copy link
Contributor Author

@lucasponce Muchas gracias for the review! I have pushed a commit with the changes (all but the requested JSON ITest example).

@jshaughn
Copy link
Contributor Author

With the review from @lucasponce and the follow-up commits I think I can remove the DoNotMerge label. @stefannegrea, or anyone, it is still open for more review comments or questions if you have them.

@jshaughn
Copy link
Contributor Author

Re-added DNM, I want to use only the the new tag query string, so waiting on PR 725...

@jshaughn
Copy link
Contributor Author

jshaughn commented Feb 2, 2017

@stefannegrea @lucasponce, I have removed the DNM label. I have rebased the PR on master, applied final changes wrt the new tag query work, and sqashed. Assuming travis is happy and if there are no further comments, I think this can be merged along with the associated documentation PR (hawkular/hawkular.github.io#265).

- Introduces a new mechanism for defining conditions using
  flexible expressions with embedded stats queries.
- Replaces the previous expression syntax.

Note:
- Introduces use of a new 3rd party component, EvalEx.  A nifty
  tool for handling expression evaluation.
  - https://github.com/uklimaschewski/EvalEx
- Added 'samples' to AvailabilityBucketPoint
  - good for consistency, and useful for determining upCount
- Add Percentile.toString() for better debugging
- Supports the new metrics tag query expressions
@stefannegrea
Copy link
Contributor

Thank you @jshaughn!

@stefannegrea stefannegrea merged commit 2fd2541 into hawkular:master Feb 2, 2017
@jshaughn jshaughn deleted the hwkmetrics-566 branch May 11, 2017 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants