-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add temporal functions #872
Conversation
return aux / count | ||
|
||
default: | ||
panic("unknown aggregation type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right, but, on the other hand, if we get to this point, something is seriously wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than doing this, you can have
type aggFunc func(vals float64) float64
Then have these all be seperate functions, and have aggFuncs be a map of these, similar to how aggregation functions are done in aggregation/base.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought it'd be a little messier, but on second thought, this is probably better. Will make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Another nice thing you get is that you don't need to check opType twice
return aux / count | ||
|
||
default: | ||
panic("unknown aggregation type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than doing this, you can have
type aggFunc func(vals float64) float64
Then have these all be seperate functions, and have aggFuncs be a map of these, similar to how aggregation functions are done in aggregation/base.go
assert.Len(t, edges, 1) | ||
assert.Equal(t, edges[0].ParentID, parser.NodeID("0"), "fetch should be the parent") | ||
assert.Equal(t, edges[0].ChildID, parser.NodeID("1"), "aggregation should be the child") | ||
var temporalParseTests = []struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
min = math.Min(min, v) | ||
} | ||
} | ||
return min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails edge cases; if all values are NaNs, this will return math.Inf(1) instead of math.NaN()
max = math.Max(max, v) | ||
} | ||
} | ||
return max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails edge cases; if all values are NaNs, this will return math.Inf(-1) instead of math.NaN()
for _, v := range values { | ||
if !math.IsNaN(v) { | ||
count++ | ||
mean += (v - mean) / count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need a rolling average here? Can't you just keep track of sum and count then return sum/count? You'll also need to account for cases where count == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was following how Prom did it, but yeah we can just do that instead.
} | ||
return sum | ||
|
||
case StdDevTemporalType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is calculating population standard deviation rather than standard deviation, is that what prom uses?
Also will need to have special cases for count == 0 so you don't get div by zeroes if all values are NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's what Prom uses.
aux += delta * (v - mean) | ||
} | ||
} | ||
return aux / count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here re: population standard variance
// B1 has NaN in first series, first position | ||
func TestAggregation(t *testing.T) { | ||
for _, testCase := range testCases { | ||
values, bounds := test.GenerateValuesAndBounds(nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately if you're using this as your values
, you only get a really nice set of data points which do not help catch weird edge case issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add other test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually can we remove this GenerateValues thing, it leads to really confusing tests and it's honestly a bit weird that we generate test data somewhere different from the test itself :\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah- I can do that in another pr
|
||
// B1 has NaN in first series, first position | ||
func TestAggregation(t *testing.T) { | ||
for _, testCase := range testCases { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typical way of running these tabled tests is:
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
...
) | ||
|
||
// NewAggOp creates a new base temporal transform with a specified node | ||
func NewAggOp(args []interface{}, optype string) (transform.Params, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend just taking floats or whatever as your args, and have the logic for providing usable values in the parser instead, so you don't end up with extra parsing scattered around in the function op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the problem here is that some ops require more than one args. Then the question is where should that logic live which determines which argument is need by which function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a todo to consolidate how we pass args around, #884 should help us standardize scalar parameters, but unfortunately there's a few special cases in prom which take Scalar, Fetch
instead of Fetch, Scalar
, or even Fetch, Scalar, Scalar
. Prom have a set of required param types by position; may be an idea to do something similar?
{math.NaN(), math.NaN(), math.NaN(), math.NaN(), 1.4142}, | ||
}, | ||
afterAllBlocks: [][]float64{ | ||
{1.4142, 1.4142, 1.4142, 1.4142, 1.4142}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right, aren't there two NaNs in the first series? How does that give the same expected as the other series?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls respond
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one NaN, but it gets converted back to the original value (0) on line 270.
values[0][0] = original
Codecov Report
@@ Coverage Diff @@
## master #872 +/- ##
===========================================
+ Coverage 55.8% 78.57% +22.76%
===========================================
Files 392 396 +4
Lines 33328 33549 +221
===========================================
+ Hits 18599 26360 +7761
+ Misses 13067 5385 -7682
- Partials 1662 1804 +142
Continue to review full report at Codecov.
|
min = math.Min(min, v) | ||
} | ||
} | ||
if count == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newlines
Also consider doing something like aggregation functions do where you have a sumAndCount
function which returns (sum, count float64)
, making a bunch of these easier
sum += v | ||
} | ||
} | ||
if min == math.Inf(1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newlines
Also this might be a bit dumb, but if math.Inf somehow appears in the series, you'd get incorrect results
bounds block.Bounds | ||
) | ||
|
||
if vals == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just pass in values instead of doing this generation thing? It's really bad for comprehension and understanding why the results are what they are
} | ||
|
||
func newAggNode(op baseOp, controller *transform.Controller) Processor { | ||
return &aggNode{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just store the function directly so that we don't have to do the lookup in Process ?
return aggFuncs[a.op.operatorType](values) | ||
} | ||
|
||
func avgOverTime(values []float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these functions look very similar to what Artem wrote. Consider consolidating them ?
return sum | ||
} | ||
|
||
func stddevOverTime(values []float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of implementing all these functions, can you use : gonum ? Eg: https://github.com/gonum/gonum/blob/73ea1e732937f96d723d31dc5263d214a275d204/stat/stat.go#L1199
// NewAggOp creates a new base temporal transform with a specified node | ||
func NewAggOp(args []interface{}, optype string) (transform.Params, error) { | ||
if _, ok := aggFuncs[optype]; !ok { | ||
return emptyOp, fmt.Errorf("unknown aggregation type: %s", optype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: nil
should work instead of emptyOp
return &aggNode{ | ||
op: op, | ||
controller: controller, | ||
aggFunc: aggFuncs[op.operatorType], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've already retrieved the aggFunction previously, when you were checking for map membership in NewAggOp
. Can you just pass that through instead of re-fetching it here?
type aggNode struct { | ||
op baseOp | ||
controller *transform.Controller | ||
aggFunc func([]float64) float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: define this as a type, i.e. type aggFunc func([]float64) float64
) | ||
|
||
var ( | ||
aggFuncs = map[string]func([]float64) float64{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: define and use an aggFunc
type here
return sum | ||
} | ||
|
||
func stddevOverTime(values []float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not using gonum, can this just return math.Sqrt(stdvarOverTime(values))
@@ -141,8 +141,10 @@ func NewFunctionExpr(name string, argValues []interface{}) (parser.Params, error | |||
linear.MinuteType, linear.MonthType, linear.YearType: | |||
return linear.NewDateOp(name) | |||
|
|||
case temporal.CountTemporalType: | |||
return temporal.NewCountOp(argValues) | |||
case temporal.AvgTemporalType, temporal.CountTemporalType, temporal.MinTemporalType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take it or leave it: All of these are getting pretty long, may be an idea to make it into a function that checks if it's one of these instead? Your call
count++ | ||
} | ||
} | ||
return count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline missing in most of these pre-return
return a.aggFunc(values) | ||
} | ||
|
||
func avgOverTime(values []float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked @nikunjgit's idea in the regular aggregation functions to make a function that returns (sum, count) which simplifies sum, count, average, and can help with stddev/stdvar
} | ||
|
||
func minOverTime(values []float64) float64 { | ||
var count float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be a bool, seenNotNaN
or something?
} | ||
|
||
func maxOverTime(values []float64) float64 { | ||
var count float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be a bool, seenNotNaN
or something?
} | ||
} | ||
|
||
sum, count := sumAndCount(values) | ||
if count == 0 { | ||
return math.NaN() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this logic might be better to put in sumAndCount?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then you'd still have to have an if statement here, no?
@@ -64,18 +66,23 @@ var ( | |||
|
|||
// NewAggOp creates a new base temporal transform with a specified node | |||
func NewAggOp(args []interface{}, optype string) (transform.Params, error) { | |||
if _, ok := aggFuncs[optype]; !ok { | |||
return emptyOp, fmt.Errorf("unknown aggregation type: %s", optype) | |||
var ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would read cleaner as
if aggregationFunc, ok := aggFuncs[optype]; ok {
return newBaseOp(args, optype, newAggNode, aggregationFunc)
}
return nil, fmt.Errorf("unknown aggregation type: %s", optype)
} | ||
} | ||
|
||
sum, count := sumAndCount(values) | ||
if count == 0 { | ||
return math.NaN() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this logic might be better to put in sumAndCount?
f123dd9
to
629e97e
Compare
return sum / count | ||
} | ||
|
||
func countOverTime(values []float64) float64 { | ||
_, count := sumAndCount(values) | ||
if count == 0 { | ||
return math.NaN() | ||
} | ||
return count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline
No description provided.