Add more math functions to influxql #9620

Tomcat-Engineering · 2018-03-22T18:21:29Z

Basic trig functions were recently added by @jsternberg. This PR adds a load more, using the same mechanisms.

New functions

abs(x)
asin(x), acos(x), atan(x), atan2(x, y)
exp(x), ln(x), log(x, y), log2(x), log10(x)
pow(x, y), sqrt(x)

Required for all non-trivial PRs

Sign CLA (if not already signed)

Required only if applicable

You can erase any checkboxes below this note if they are not applicable to your Pull Request.

[InfluxData Documentation] see existing ticket

jsternberg · 2018-03-23T17:50:06Z

I'm going to review this, but before I do, I just want to point out this: +120, -27.

🎉 🎉 🎉

I'm very happy. I hope this refactor made it much easier to implement these for you.

jsternberg

Overall, this looks good and thank you so much for helping us to add these functions.

Would it be possible for you to remove exp(x) though? We are still discussing internally how we should handle special constants like π and I think e fits into that same discussion. I think it would be good for us to make an issue for supporting π and e (especially after this is merged) so we can have a public discussion about what's best, but I don't want to block the other functions.

Another one is log(). @rbetts @timhallinflux do you have any specific opinion about log()? I was thinking we should have ln(x) and log(x, y) instead, but I'm not sure what is more commonly expected by users and the change of base formula is pretty easy. I'm just used to those being the functions on calculators.

jsternberg · 2018-03-23T17:52:28Z

query/compile.go

+	}
+
+	// Compile all the argument expressions that are not just literals
+	for _, arg := range expr.Args {


Can you just do this instead?

if _, ok := arg.(influxql.Literal); ok { continue } // stuff

I don't think the switch is necessary and I think it hurts the readability.

jsternberg · 2018-03-23T17:59:00Z

query/compile.go

-			return fmt.Errorf("invalid number of arguments for %s, expected %d, got %d", expr.Name, exp, got)
+
+		// How many arguments are we expecting?
+		var nargs int


nargs := 1 switch expr.Name { case "atan2", "pow": nargs = 2 }

It makes it so the default clause isn't necessary.

That would be neater, but I saw this: https://github.com/influxdata/influxdb/blob/master/CODING_GUIDELINES.md#always-include-a-default-case-in-a-switch-statement

jsternberg · 2018-03-23T18:00:30Z

query/compile.go

 		}
-		return c.validateCondition(expr.Args[0])
+		return nil
+


Extra newline here.

jsternberg · 2018-03-23T18:06:06Z

query/math.go

 		switch name {
+		case "abs":
+			fn = math.Abs


I think we need to be more careful with this. Most of these functions operate the same as the trigonometry functions. For example, log() and its derivatives should always return a float. I don't think it makes sense for them not to. But some of these methods it doesn't necessarily make sense. For example, if I have abs(value::integer) I don't think it makes sense for it to return a float. I also don't think pow(value::integer, 2) makes sense to return a float either. If I were a user, I would expect using two integers with pow() to return an integer. On the other hand, I would expect sqrt(x) to always return a float since I don't think there's any scenario where it makes sense for it not to return a float.

So not all of these can reuse that code. On first glance, I see abs() and pow() as the only two I would apply that integer rule to. The others should always return floats in my opinion.

Yes, last time I implemented these I did preserve int datatypes where possible, but given that the latest code for sin happily accepted strings and all sorts I figured datatypes weren't that important now :-)

Happy to try and preserve ints where possible, I'll have a look in the next few days if I have time. pow is quite subtle because pow(x, y) only (usually) returns an int if x is and integer AND y is a positive integer...

jsternberg · 2018-03-23T18:07:47Z

query/math.go

+		if fn != nil {
+			result := fn(arg0, arg1)
+			if math.IsNaN(result) {
+				return nil, false


This should probably return nil, true and should probably be more like this:

result := fn(arg0, arg1) if math.IsNaN(result) { result = nil } return result, true

The ok is supposed to signify "was this call handled?" It was handled, it just returned a result that we do not support and cannot serialize back to the user to begin with.

That makes sense, I guessed that might be what I should return but wasn't sure.

jsternberg · 2018-03-23T19:29:55Z

Also, can you add a test for each of the new functions? Just having one query that is valid in TestCompile_Success and one in the failures is good. It just ensures that the compiler is tested at least a little bit.

CAFxX · 2018-03-23T22:00:02Z

query/math.go

 		case "sin":
-			return v.callTrigFunction(math.Sin, args[0])
+			fn =  math.Sin


How about avoiding the double indirection (switch+indirect call) to give the inliner a chance to do its job? E.g. instead of

fn = math.FnName

call the function directly

return handleNaN(math.FnName(arg0))

where handleNaN is

func handleNaN(result float64) (float64, bool) { if math.IsNaN(result) { return nil, false } return result, true }

If there are readability concerns then the something like the following should take care of them:

var res float64 switch name { case "abs": res = math.Abs(arg0) ... default: return nil, false } return handleNaN(res)

Sounds fine to me. You would likely need to do something like this:

switch name { case "sin", "cos", ... all of the other functions that return float ...: arg0, ok := asFloat(args[0]) if !ok { return nil, true } var res float64 // Need to do another switch to allow inlining. switch name { case "sin": res = math.Sin(arg0)) default: panic("need to panic here") } if math.IsNaN(res) { return nil, true } return res, true case "abs": // Handled differently because it has different code depending on whether the input // is a float or integer. }

It would be a tiny bit more repetitive with some extra typing (plus, you have to do a name switch twice). I don't think that's a big deal though.

I was thinking about it though and you probably don't want handleNaN. The main issue is that it would have to return an interface so returning a value type as an interface{} usually results in an allocation. That would likely undo any extra optimizations from inlining.

I think we need to get the original PR merged though before talking about optimizations.

As an aside, I just realized the current code is wrong for string and boolean inputs. When it encounters something like sin(x::string), it should return nil, true, but it currently returns nil, false.

(edited, I really need to read the names of users before responding)

jsternberg · 2018-03-26T16:08:20Z

@Tomcat-Engineering for function names, we should just match the functions Prometheus has.

So exp() is fine, but we should have log10(x), log2(x), and ln(x) too. If we want a generic log function, we can do log(x, y) since that function isn't used in Prometheus. The primary reason is to avoid confusion between the two platforms.

jsternberg · 2018-03-27T13:53:01Z

@Tomcat-Engineering whenever you have the time, this PR is relevant to the NaN question: #9637.

This came about when a regression test caught stddev() acting weirdly because it previously returned null and started getting filled with the fill value after the regression. Likely, we're going to handle it by allowing NaN to propagate through the system, but have the cursor return a sentinel null value so NaN will get encoded as null in the JSON. I think this is likely relevant to what we should do with other NaN values.

That likely means you don't have to check for NaN. It'll be handled as a null value which is the equivalent of return nil, true anyway.

jsternberg

One more comment and then we can probably merge this.

jsternberg · 2018-03-27T17:51:45Z

query/math.go

+		if arg0, ok := v.getIntegerValue(args[0]); ok {
+			if arg1, ok := v.getIntegerValue(args[1]); ok && arg1 >= 0 {
+				// Do the calc as a float then convert back to int if possible
+				result := math.Pow(float64(arg0), float64(arg1))


I would prefer if we did the integer power using only integers if possible. It should likely be implemented like this:

switch arg1 { case 2: return arg0 << 1, true case 4: return arg0 << 2, true case 8: return arg0 << 3, true case 16: return arg0 << 4, true default: val := arg0 for i := 1; i < arg1; i++ { val *= arg0 } return val, true }

This would also include an optimization for bitshift. I'm not sure if the go compiler would get that automatically, but I would guess it would not in this scenario. It might be worth checking the optimizer though.

One advantage of the method used here is that we would be able to detect overflow, but no other integer operation really checks for that so I don't think that's as important as accuracy for the integer type.

As a reminder, absolute value and power need to also be implemented for the unsigned type. For unsigned, if one of the two is unsigned and one is integer, it needs to treat it as unsigned.

Hmmm, bitshifting two places to the left would multiply by 4, not raise to the power of 4. And I suspect that the naive "iterate arg1 times" approach will usually be way slower than converting to float, using the heavily optimised math library algorithm and converting back to int.

Maybe we should handle powers of 2 and 3 by using basic multiplication, that should optimise the most likely scenarios I guess. I'll push a change.

Do we try to return a consistent datatype across the calculation? At the moment we are returning int for pow(3, my_int_series) where my_int_series is >= 0 and a float otherwise... depending on the values of the series we could have a mixed return type. Should we actually only return int if the second argument is a literal int >= 0?

The number of questions that are coming up because I'm being dumb is becoming problematic so I'll say something different that will hopefully be less dumb.

We should go back to your original implementation and pow() should only return float values and then it should be put in the documentation. For anything with x^2 you can always just do x * x if you need to guarantee accuracy as an int. For something like 2^x, we support bitshifts and I hope that's good enough. We wouldn't have 3^x, but I think the additional complication caused by this request is creating too big of an issue.

Then we just put it in the documentation. I hope this isn't a bad idea as I cannot predict what people are going to need, but we could add a new power function for influxql if it comes up and we can fix the behavior in ifql.

Sorry for throwing you around so much on this. I didn't realize it would be so complicated to do what I thought.

Tomcat-Engineering · 2018-03-28T08:08:12Z

It would be nice to have some unit tests which actually check the values and datatypes returned from these functions... if you guys provide some for the trig functions that you implemented, I'm happy to extend the coverage to these new functions. The scaffolding required (as seen in select_test.go) looks somewhat complicated so I'm going to let you do that bit!

jsternberg · 2018-03-28T12:17:17Z

I'll add a unit test for that. I've been meaning to do that anyway.

I'm thinking of changing the interface for call functions though. I thought the original interface would work well, but I've been finding the interface from the type mapper is much better and easier for construction. So here's the interface I'm thinking of changing it to:

type CallValuer interface {
    Call(name string, nargs int, args func() []interface{}) (interface{}, bool)
}

So similar to the current interface, but it passes the name of the function, the number of arguments, and a function that can be used to obtain the arguments. The purpose is to make it so that the call arguments aren't evaluated until the function knows it can handle them.

Thoughts?

jsternberg · 2018-04-02T16:48:26Z

I've now updated the current code to use the new interface: #9666

jsternberg · 2018-04-17T15:08:25Z

Hi @Tomcat-Engineering, do you have time to complete this? We are planning to do a release candidate probably early next week and I would like to get this included. I can help finish this up if you don't have the time.

- abs - asin, acos, atan, atan2 - exp, ln, log, log2, log10 - pow, sqrt

jsternberg · 2018-04-17T18:07:09Z

Thanks @Tomcat-Engineering for your help in starting this. I did the work of adapting my comments and then squashing it. I'm going to merge it now.

Once again, thank you so much for your help in getting these additional functions in. We're going to get them into 1.6.

ghost added the proposed label Mar 22, 2018

jsternberg self-assigned this Mar 23, 2018

jsternberg suggested changes Mar 23, 2018

View reviewed changes

CAFxX reviewed Mar 24, 2018

View reviewed changes

jsternberg suggested changes Mar 27, 2018

View reviewed changes

ghost added review and removed proposed labels Apr 17, 2018

jsternberg added the kind/feature-request label Apr 17, 2018

jsternberg approved these changes Apr 17, 2018

View reviewed changes

Add new math functions:

42581c7

- abs - asin, acos, atan, atan2 - exp, ln, log, log2, log10 - pow, sqrt

jsternberg merged commit 672a32a into influxdata:master Apr 17, 2018

ghost removed the review label Apr 17, 2018

jsternberg mentioned this pull request Apr 17, 2018

Document additional math functions influxdata/docs.influxdata.com-ARCHIVE#1550

Closed

timhallinflux mentioned this pull request Apr 17, 2018

Document Trig Functions in InfluxQL influxdata/docs.influxdata.com-ARCHIVE#1552

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add more math functions to influxql #9620

Add more math functions to influxql #9620

Tomcat-Engineering commented Mar 22, 2018 •

edited by jsternberg

Loading

jsternberg commented Mar 23, 2018

jsternberg left a comment

jsternberg Mar 23, 2018

jsternberg Mar 23, 2018

Tomcat-Engineering Mar 27, 2018

jsternberg Mar 23, 2018

jsternberg Mar 23, 2018

Tomcat-Engineering Mar 26, 2018

jsternberg Mar 23, 2018

Tomcat-Engineering Mar 26, 2018

jsternberg commented Mar 23, 2018

CAFxX Mar 23, 2018

jsternberg Mar 26, 2018 •

edited

Loading

jsternberg commented Mar 26, 2018

jsternberg commented Mar 27, 2018

jsternberg left a comment

jsternberg Mar 27, 2018

Tomcat-Engineering Mar 27, 2018

Tomcat-Engineering Mar 27, 2018 •

edited

Loading

jsternberg Mar 27, 2018

Tomcat-Engineering commented Mar 28, 2018

jsternberg commented Mar 28, 2018

jsternberg commented Apr 2, 2018

jsternberg commented Apr 17, 2018

jsternberg commented Apr 17, 2018

Add more math functions to influxql #9620

Add more math functions to influxql #9620

Conversation

Tomcat-Engineering commented Mar 22, 2018 • edited by jsternberg Loading

Required for all non-trivial PRs

Required only if applicable

jsternberg commented Mar 23, 2018

jsternberg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsternberg commented Mar 23, 2018

Choose a reason for hiding this comment

jsternberg Mar 26, 2018 • edited Loading

Choose a reason for hiding this comment

jsternberg commented Mar 26, 2018

jsternberg commented Mar 27, 2018

jsternberg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Tomcat-Engineering Mar 27, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Tomcat-Engineering commented Mar 28, 2018

jsternberg commented Mar 28, 2018

jsternberg commented Apr 2, 2018

jsternberg commented Apr 17, 2018

jsternberg commented Apr 17, 2018

Tomcat-Engineering commented Mar 22, 2018 •

edited by jsternberg

Loading

jsternberg Mar 26, 2018 •

edited

Loading

Tomcat-Engineering Mar 27, 2018 •

edited

Loading