Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more math functions to influxql #9620

Merged
merged 1 commit into from
Apr 17, 2018
Merged

Add more math functions to influxql #9620

merged 1 commit into from
Apr 17, 2018

Conversation

Tomcat-Engineering
Copy link
Contributor

@Tomcat-Engineering Tomcat-Engineering commented Mar 22, 2018

Basic trig functions were recently added by @jsternberg. This PR adds a load more, using the same mechanisms.

New functions

  • abs(x)
  • asin(x), acos(x), atan(x), atan2(x, y)
  • exp(x), ln(x), log(x, y), log2(x), log10(x)
  • pow(x, y), sqrt(x)
Required for all non-trivial PRs
  • Sign CLA (if not already signed)
Required only if applicable

You can erase any checkboxes below this note if they are not applicable to your Pull Request.

@ghost ghost added the proposed label Mar 22, 2018
@jsternberg jsternberg self-assigned this Mar 23, 2018
@jsternberg
Copy link
Contributor

I'm going to review this, but before I do, I just want to point out this: +120, -27.

🎉 🎉 🎉

I'm very happy. I hope this refactor made it much easier to implement these for you.

Copy link
Contributor

@jsternberg jsternberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good and thank you so much for helping us to add these functions.

Would it be possible for you to remove exp(x) though? We are still discussing internally how we should handle special constants like π and I think e fits into that same discussion. I think it would be good for us to make an issue for supporting π and e (especially after this is merged) so we can have a public discussion about what's best, but I don't want to block the other functions.

Another one is log(). @rbetts @timhallinflux do you have any specific opinion about log()? I was thinking we should have ln(x) and log(x, y) instead, but I'm not sure what is more commonly expected by users and the change of base formula is pretty easy. I'm just used to those being the functions on calculators.

query/compile.go Outdated
}

// Compile all the argument expressions that are not just literals
for _, arg := range expr.Args {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just do this instead?

if _, ok := arg.(influxql.Literal); ok {
    continue
}
// stuff

I don't think the switch is necessary and I think it hurts the readability.

query/compile.go Outdated
return fmt.Errorf("invalid number of arguments for %s, expected %d, got %d", expr.Name, exp, got)

// How many arguments are we expecting?
var nargs int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nargs := 1
switch expr.Name {
case "atan2", "pow":
    nargs = 2
}

It makes it so the default clause isn't necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query/compile.go Outdated
}
return c.validateCondition(expr.Args[0])
return nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra newline here.

query/math.go Outdated
switch name {
case "abs":
fn = math.Abs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be more careful with this. Most of these functions operate the same as the trigonometry functions. For example, log() and its derivatives should always return a float. I don't think it makes sense for them not to. But some of these methods it doesn't necessarily make sense. For example, if I have abs(value::integer) I don't think it makes sense for it to return a float. I also don't think pow(value::integer, 2) makes sense to return a float either. If I were a user, I would expect using two integers with pow() to return an integer. On the other hand, I would expect sqrt(x) to always return a float since I don't think there's any scenario where it makes sense for it not to return a float.

So not all of these can reuse that code. On first glance, I see abs() and pow() as the only two I would apply that integer rule to. The others should always return floats in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, last time I implemented these I did preserve int datatypes where possible, but given that the latest code for sin happily accepted strings and all sorts I figured datatypes weren't that important now :-)

Happy to try and preserve ints where possible, I'll have a look in the next few days if I have time. pow is quite subtle because pow(x, y) only (usually) returns an int if x is and integer AND y is a positive integer...

query/math.go Outdated
if fn != nil {
result := fn(arg0, arg1)
if math.IsNaN(result) {
return nil, false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably return nil, true and should probably be more like this:

result := fn(arg0, arg1)
if math.IsNaN(result) {
    result = nil
}
return result, true

The ok is supposed to signify "was this call handled?" It was handled, it just returned a result that we do not support and cannot serialize back to the user to begin with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I guessed that might be what I should return but wasn't sure.

@jsternberg
Copy link
Contributor

Also, can you add a test for each of the new functions? Just having one query that is valid in TestCompile_Success and one in the failures is good. It just ensures that the compiler is tested at least a little bit.

query/math.go Outdated
case "sin":
return v.callTrigFunction(math.Sin, args[0])
fn = math.Sin
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about avoiding the double indirection (switch+indirect call) to give the inliner a chance to do its job? E.g. instead of

fn = math.FnName

call the function directly

return handleNaN(math.FnName(arg0))

where handleNaN is

func handleNaN(result float64) (float64, bool) {
  if math.IsNaN(result) {
    return nil, false
  }
  return result, true
}

If there are readability concerns then the something like the following should take care of them:

var res float64
switch name {
case "abs":
	res = math.Abs(arg0)
...
default:
	return nil, false
}
return handleNaN(res)

Copy link
Contributor

@jsternberg jsternberg Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me. You would likely need to do something like this:

switch name {
case "sin", "cos", ... all of the other functions that return float ...:
    arg0, ok := asFloat(args[0])
    if !ok {
        return nil, true
    }

    var res float64
    // Need to do another switch to allow inlining.
    switch name {
    case "sin":
        res = math.Sin(arg0))
    default:
        panic("need to panic here")
    }

    if math.IsNaN(res) {
        return nil, true
    }
    return res, true
case "abs":
    // Handled differently because it has different code depending on whether the input
    // is a float or integer.
}

It would be a tiny bit more repetitive with some extra typing (plus, you have to do a name switch twice). I don't think that's a big deal though.

I was thinking about it though and you probably don't want handleNaN. The main issue is that it would have to return an interface so returning a value type as an interface{} usually results in an allocation. That would likely undo any extra optimizations from inlining.

I think we need to get the original PR merged though before talking about optimizations.

As an aside, I just realized the current code is wrong for string and boolean inputs. When it encounters something like sin(x::string), it should return nil, true, but it currently returns nil, false.

(edited, I really need to read the names of users before responding)

@jsternberg
Copy link
Contributor

@Tomcat-Engineering for function names, we should just match the functions Prometheus has.

So exp() is fine, but we should have log10(x), log2(x), and ln(x) too. If we want a generic log function, we can do log(x, y) since that function isn't used in Prometheus. The primary reason is to avoid confusion between the two platforms.

@jsternberg
Copy link
Contributor

@Tomcat-Engineering whenever you have the time, this PR is relevant to the NaN question: #9637.

This came about when a regression test caught stddev() acting weirdly because it previously returned null and started getting filled with the fill value after the regression. Likely, we're going to handle it by allowing NaN to propagate through the system, but have the cursor return a sentinel null value so NaN will get encoded as null in the JSON. I think this is likely relevant to what we should do with other NaN values.

That likely means you don't have to check for NaN. It'll be handled as a null value which is the equivalent of return nil, true anyway.

Copy link
Contributor

@jsternberg jsternberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment and then we can probably merge this.

query/math.go Outdated
if arg0, ok := v.getIntegerValue(args[0]); ok {
if arg1, ok := v.getIntegerValue(args[1]); ok && arg1 >= 0 {
// Do the calc as a float then convert back to int if possible
result := math.Pow(float64(arg0), float64(arg1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we did the integer power using only integers if possible. It should likely be implemented like this:

switch arg1 {
case 2:
    return arg0 << 1, true
case 4:
    return arg0 << 2, true
case 8:
    return arg0 << 3, true
case 16:
    return arg0 << 4, true
default:
    val := arg0
    for i := 1; i < arg1; i++ {
        val *= arg0
    }
    return val, true
}

This would also include an optimization for bitshift. I'm not sure if the go compiler would get that automatically, but I would guess it would not in this scenario. It might be worth checking the optimizer though.

One advantage of the method used here is that we would be able to detect overflow, but no other integer operation really checks for that so I don't think that's as important as accuracy for the integer type.

As a reminder, absolute value and power need to also be implemented for the unsigned type. For unsigned, if one of the two is unsigned and one is integer, it needs to treat it as unsigned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, bitshifting two places to the left would multiply by 4, not raise to the power of 4. And I suspect that the naive "iterate arg1 times" approach will usually be way slower than converting to float, using the heavily optimised math library algorithm and converting back to int.

Maybe we should handle powers of 2 and 3 by using basic multiplication, that should optimise the most likely scenarios I guess. I'll push a change.

Copy link
Contributor Author

@Tomcat-Engineering Tomcat-Engineering Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we try to return a consistent datatype across the calculation? At the moment we are returning int for pow(3, my_int_series) where my_int_series is >= 0 and a float otherwise... depending on the values of the series we could have a mixed return type. Should we actually only return int if the second argument is a literal int >= 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of questions that are coming up because I'm being dumb is becoming problematic so I'll say something different that will hopefully be less dumb.

We should go back to your original implementation and pow() should only return float values and then it should be put in the documentation. For anything with x^2 you can always just do x * x if you need to guarantee accuracy as an int. For something like 2^x, we support bitshifts and I hope that's good enough. We wouldn't have 3^x, but I think the additional complication caused by this request is creating too big of an issue.

Then we just put it in the documentation. I hope this isn't a bad idea as I cannot predict what people are going to need, but we could add a new power function for influxql if it comes up and we can fix the behavior in ifql.

Sorry for throwing you around so much on this. I didn't realize it would be so complicated to do what I thought.

@Tomcat-Engineering
Copy link
Contributor Author

It would be nice to have some unit tests which actually check the values and datatypes returned from these functions... if you guys provide some for the trig functions that you implemented, I'm happy to extend the coverage to these new functions. The scaffolding required (as seen in select_test.go) looks somewhat complicated so I'm going to let you do that bit!

@jsternberg
Copy link
Contributor

I'll add a unit test for that. I've been meaning to do that anyway.

I'm thinking of changing the interface for call functions though. I thought the original interface would work well, but I've been finding the interface from the type mapper is much better and easier for construction. So here's the interface I'm thinking of changing it to:

type CallValuer interface {
    Call(name string, nargs int, args func() []interface{}) (interface{}, bool)
}

So similar to the current interface, but it passes the name of the function, the number of arguments, and a function that can be used to obtain the arguments. The purpose is to make it so that the call arguments aren't evaluated until the function knows it can handle them.

Thoughts?

@jsternberg
Copy link
Contributor

I've now updated the current code to use the new interface: #9666

@jsternberg
Copy link
Contributor

Hi @Tomcat-Engineering, do you have time to complete this? We are planning to do a release candidate probably early next week and I would like to get this included. I can help finish this up if you don't have the time.

- abs
- asin, acos, atan, atan2
- exp, ln, log, log2, log10
- pow, sqrt
@jsternberg
Copy link
Contributor

Thanks @Tomcat-Engineering for your help in starting this. I did the work of adapting my comments and then squashing it. I'm going to merge it now.

Once again, thank you so much for your help in getting these additional functions in. We're going to get them into 1.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants