Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add temporal functions #872

Merged
merged 11 commits into from
Sep 7, 2018
Merged

Add temporal functions #872

merged 11 commits into from
Sep 7, 2018

Conversation

benraskin92
Copy link
Collaborator

No description provided.

return aux / count

default:
panic("unknown aggregation type")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right, but, on the other hand, if we get to this point, something is seriously wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this, you can have
type aggFunc func(vals float64) float64
Then have these all be seperate functions, and have aggFuncs be a map of these, similar to how aggregation functions are done in aggregation/base.go

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought it'd be a little messier, but on second thought, this is probably better. Will make the change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Another nice thing you get is that you don't need to check opType twice

return aux / count

default:
panic("unknown aggregation type")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this, you can have
type aggFunc func(vals float64) float64
Then have these all be seperate functions, and have aggFuncs be a map of these, similar to how aggregation functions are done in aggregation/base.go

assert.Len(t, edges, 1)
assert.Equal(t, edges[0].ParentID, parser.NodeID("0"), "fetch should be the parent")
assert.Equal(t, edges[0].ChildID, parser.NodeID("1"), "aggregation should be the child")
var temporalParseTests = []struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

min = math.Min(min, v)
}
}
return min
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails edge cases; if all values are NaNs, this will return math.Inf(1) instead of math.NaN()

max = math.Max(max, v)
}
}
return max
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails edge cases; if all values are NaNs, this will return math.Inf(-1) instead of math.NaN()

for _, v := range values {
if !math.IsNaN(v) {
count++
mean += (v - mean) / count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a rolling average here? Can't you just keep track of sum and count then return sum/count? You'll also need to account for cases where count == 0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was following how Prom did it, but yeah we can just do that instead.

}
return sum

case StdDevTemporalType:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is calculating population standard deviation rather than standard deviation, is that what prom uses?

Also will need to have special cases for count == 0 so you don't get div by zeroes if all values are NaN

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what Prom uses.

aux += delta * (v - mean)
}
}
return aux / count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re: population standard variance

// B1 has NaN in first series, first position
func TestAggregation(t *testing.T) {
for _, testCase := range testCases {
values, bounds := test.GenerateValuesAndBounds(nil, nil)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately if you're using this as your values, you only get a really nice set of data points which do not help catch weird edge case issues

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add other test cases

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually can we remove this GenerateValues thing, it leads to really confusing tests and it's honestly a bit weird that we generate test data somewhere different from the test itself :\

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah- I can do that in another pr


// B1 has NaN in first series, first position
func TestAggregation(t *testing.T) {
for _, testCase := range testCases {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typical way of running these tabled tests is:

	for _, tt := range testCases {
 		t.Run(tt.name, func(t *testing.T) {
...

)

// NewAggOp creates a new base temporal transform with a specified node
func NewAggOp(args []interface{}, optype string) (transform.Params, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend just taking floats or whatever as your args, and have the logic for providing usable values in the parser instead, so you don't end up with extra parsing scattered around in the function op.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the problem here is that some ops require more than one args. Then the question is where should that logic live which determines which argument is need by which function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a todo to consolidate how we pass args around, #884 should help us standardize scalar parameters, but unfortunately there's a few special cases in prom which take Scalar, Fetch instead of Fetch, Scalar, or even Fetch, Scalar, Scalar. Prom have a set of required param types by position; may be an idea to do something similar?

{math.NaN(), math.NaN(), math.NaN(), math.NaN(), 1.4142},
},
afterAllBlocks: [][]float64{
{1.4142, 1.4142, 1.4142, 1.4142, 1.4142},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right, aren't there two NaNs in the first series? How does that give the same expected as the other series?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls respond

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one NaN, but it gets converted back to the original value (0) on line 270.

values[0][0] = original

@codecov
Copy link

codecov bot commented Sep 4, 2018

Codecov Report

Merging #872 into master will increase coverage by 22.76%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #872       +/-   ##
===========================================
+ Coverage    55.8%   78.57%   +22.76%     
===========================================
  Files         392      396        +4     
  Lines       33328    33549      +221     
===========================================
+ Hits        18599    26360     +7761     
+ Misses      13067     5385     -7682     
- Partials     1662     1804      +142
Flag Coverage Δ
#dbnode 81.27% <ø> (+11.63%) ⬆️
#m3ninx 72.39% <ø> (+50.76%) ⬆️
#query 69.49% <100%> (+61.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2cf04be...106cab0. Read the comment docs.

min = math.Min(min, v)
}
}
if count == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newlines

Also consider doing something like aggregation functions do where you have a sumAndCount function which returns (sum, count float64), making a bunch of these easier

sum += v
}
}
if min == math.Inf(1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newlines

Also this might be a bit dumb, but if math.Inf somehow appears in the series, you'd get incorrect results

bounds block.Bounds
)

if vals == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just pass in values instead of doing this generation thing? It's really bad for comprehension and understanding why the results are what they are

}

func newAggNode(op baseOp, controller *transform.Controller) Processor {
return &aggNode{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just store the function directly so that we don't have to do the lookup in Process ?

return aggFuncs[a.op.operatorType](values)
}

func avgOverTime(values []float64) float64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these functions look very similar to what Artem wrote. Consider consolidating them ?

return sum
}

func stddevOverTime(values []float64) float64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of implementing all these functions, can you use : gonum ? Eg: https://github.com/gonum/gonum/blob/73ea1e732937f96d723d31dc5263d214a275d204/stat/stat.go#L1199

// NewAggOp creates a new base temporal transform with a specified node
func NewAggOp(args []interface{}, optype string) (transform.Params, error) {
if _, ok := aggFuncs[optype]; !ok {
return emptyOp, fmt.Errorf("unknown aggregation type: %s", optype)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nil should work instead of emptyOp

return &aggNode{
op: op,
controller: controller,
aggFunc: aggFuncs[op.operatorType],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've already retrieved the aggFunction previously, when you were checking for map membership in NewAggOp. Can you just pass that through instead of re-fetching it here?

type aggNode struct {
op baseOp
controller *transform.Controller
aggFunc func([]float64) float64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: define this as a type, i.e. type aggFunc func([]float64) float64

)

var (
aggFuncs = map[string]func([]float64) float64{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: define and use an aggFunc type here

return sum
}

func stddevOverTime(values []float64) float64 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not using gonum, can this just return math.Sqrt(stdvarOverTime(values))

@@ -141,8 +141,10 @@ func NewFunctionExpr(name string, argValues []interface{}) (parser.Params, error
linear.MinuteType, linear.MonthType, linear.YearType:
return linear.NewDateOp(name)

case temporal.CountTemporalType:
return temporal.NewCountOp(argValues)
case temporal.AvgTemporalType, temporal.CountTemporalType, temporal.MinTemporalType,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take it or leave it: All of these are getting pretty long, may be an idea to make it into a function that checks if it's one of these instead? Your call

count++
}
}
return count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline missing in most of these pre-return

return a.aggFunc(values)
}

func avgOverTime(values []float64) float64 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked @nikunjgit's idea in the regular aggregation functions to make a function that returns (sum, count) which simplifies sum, count, average, and can help with stddev/stdvar

}

func minOverTime(values []float64) float64 {
var count float64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be a bool, seenNotNaN or something?

}

func maxOverTime(values []float64) float64 {
var count float64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be a bool, seenNotNaN or something?

}
}

sum, count := sumAndCount(values)
if count == 0 {
return math.NaN()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this logic might be better to put in sumAndCount?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then you'd still have to have an if statement here, no?

@@ -64,18 +66,23 @@ var (

// NewAggOp creates a new base temporal transform with a specified node
func NewAggOp(args []interface{}, optype string) (transform.Params, error) {
if _, ok := aggFuncs[optype]; !ok {
return emptyOp, fmt.Errorf("unknown aggregation type: %s", optype)
var (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would read cleaner as

if aggregationFunc, ok := aggFuncs[optype];  ok {
 return newBaseOp(args, optype, newAggNode, aggregationFunc)
} 

return nil, fmt.Errorf("unknown aggregation type: %s", optype)

}
}

sum, count := sumAndCount(values)
if count == 0 {
return math.NaN()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this logic might be better to put in sumAndCount?

@arnikola arnikola closed this Sep 6, 2018
@arnikola arnikola deleted the braskin/time_funcs branch September 6, 2018 17:52
@benraskin92 benraskin92 restored the braskin/time_funcs branch September 6, 2018 18:17
@benraskin92 benraskin92 reopened this Sep 6, 2018
return sum / count
}

func countOverTime(values []float64) float64 {
_, count := sumAndCount(values)
if count == 0 {
return math.NaN()
}
return count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline

@benraskin92 benraskin92 merged commit 27e784d into master Sep 7, 2018
@prateek prateek deleted the braskin/time_funcs branch September 29, 2018 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants