New metrics #3

mattevans · 2016-12-13T02:43:36Z

This change removes avg as a metric and splits it into three new metrics (mean, median and mode). It also implements sum, cardinality and stdev.

snikch

A few changes to be made.

I have to apologise for not leading by example with the comments. I dropped the ball in the measurers. It'd be good to add some comments in the appropriate format for each of the measurers.

snikch · 2016-12-13T04:56:54Z

metrics.go

+		return nil
+	}
+
+	numbers := []float64{}


So a couple of questions about the mode.

Do we need to do two loops? Looks like you could cover this all in a single loop.

Why do we return an empty loop if the modes are all of the numbers / one? I haven't dealt with modes too much but wouldn't it still be relevant to have the only possible value returned regardless?

If you could add some comments re: mode that'd be good too - in fact we should quickly explain what each of the averages are - I always get them mixed up, so it'd be nice not to have to think too hard.

snikch · 2016-12-13T05:00:19Z

metrics.go

@@ -91,3 +183,76 @@ func (a *max) Result() interface{} {
 	result, _ := a.amount.Float64()
 	return result
 }
+
+// Cadinality


snikch · 2016-12-13T05:02:22Z

metrics.go

+}
+
+func (a *cardinality) AddDatum(datum interface{}) {
+	a.size++


Cardinality should be the total number of unique elements added, so this should really be a map[interface{}]bool where the values are comparable, and the Result should call len(that_map).

If you can't just use interface{} because it doesn't consider a == b, then perhaps make an actual interface.

Or you could just do a known type switch, e.g.:

switch t := datum.(type) { case string: a.values[t]++ … }

snikch · 2016-12-13T20:17:37Z

metrics.go

+	// Even slice? Get the mean of the middle values.
+	if len(a.list)%2 == 0 {
+		prev, _ := a.list[middle-1].Float64()
+		median = (median + prev) / 2


Might be worth doing decimal math here.

snikch · 2016-12-13T20:18:19Z

metrics.go

+	sort.Sort(decimalSortNumerical(a.list))
+
+	// Determine median value.
+	middle := len(a.list) / 2


You could probably do this in an else below, rather than double working.

snikch · 2016-12-13T20:20:07Z

processor.go

@@ -245,8 +246,22 @@ func (p *queryProcessor) measure() {
 			// Now add all of the data to the measurer.
 			for j := range bucket.sourceRows {
 				row := bucket.sourceRows[j]
+
+				// If measurer is cardinality, we can +1 not worrying about field value.
+				if (reflect.TypeOf(m) == reflect.TypeOf(&cardinality{})) {


You should do this for a type check instead: if _, ok := m.(*cardinality); ok {

But this doesn't really apply any more: since cardinality should count unique values, I'm not sure this shortcut still works.
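
For reference, a small sketch of the type assertion form next to the types it would be checking. The measurer interface and the two implementations here are simplified, hypothetical versions of what the PR defines; only the assertion idiom itself is the point.

```go
package main

import "fmt"

// measurer is a hypothetical, reduced version of the interface in the PR.
type measurer interface {
	AddDatum(datum interface{})
	Result() interface{}
}

// cardinality and sum are simplified stand-ins for the real measurers.
type cardinality struct{ seen map[interface{}]struct{} }

func (c *cardinality) AddDatum(d interface{}) { c.seen[d] = struct{}{} }
func (c *cardinality) Result() interface{}    { return len(c.seen) }

type sum struct{ total float64 }

func (s *sum) AddDatum(d interface{}) {
	if f, ok := d.(float64); ok {
		s.total += f
	}
}
func (s *sum) Result() interface{} { return s.total }

// isCardinality uses a type assertion instead of comparing reflect.TypeOf
// values. It needs no reflect import and is checked by the compiler.
func isCardinality(m measurer) bool {
	_, ok := m.(*cardinality)
	return ok
}

func main() {
	fmt.Println(isCardinality(&cardinality{seen: map[interface{}]struct{}{}}))
	fmt.Println(isCardinality(&sum{}))
}
```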


mattevans · 2016-12-14T02:48:45Z

Thanks @snikch - Really appreciate it.

Added commenting
Fixed smaller issues (decimal math, removed extra loop, etc)
cardinality metric now actually calculates the cardinality. 👍
Added valueCount metric, which is exactly that.

Questions on mode metric:

Why do we return an empty loop if the modes are all of the numbers / one? I haven't dealt with modes too much but wouldn't it still be relevant to have the only possible value returned regardless?

I was always under the impression mode was the value(s) that occur/repeat most often within a set. If all values have the same number of occurrences, then they aren't technically a mode? After a quick google, some say they should be considered mode, others say they shouldn't. Now I'm not so sure. hahaha. Thoughts? I'll ask Lance if he has a preference for either option.

Examples:

[1,1,2,3,4,5,5,5]
Mode = 5

[1,1,1,2,3,3,3]
Mode = [1,3]

[1,1,1,2,2,2,3,3,3]
Mode = nil

[1,2,3,4,5]
Mode = nil

If the dataset is one (referenced as tip in the code), then it means no repeating values were found. Now that I've added comments (and cleaned it up), it should make a bit more sense. Sorry about that.
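
The convention in the examples above can be sketched as follows. This is not the PR's implementation (which is not shown in this thread); the function name mode and the float64 input are assumptions. Counting and tracking the highest count happen in a single pass over the data, followed by one pass over the distinct values. When every distinct value ties for the highest count, the sketch returns nil, which matches the examples but is only one of the two conventions discussed.

```go
package main

import (
	"fmt"
	"sort"
)

// mode returns the value(s) that occur most often in numbers, sorted
// ascending. Under the convention assumed here, it returns nil when all
// distinct values occur equally often (including an empty or all-unique set).
func mode(numbers []float64) []float64 {
	counts := make(map[float64]int)
	highest := 0
	// Single pass over the data: count occurrences and track the maximum.
	for _, n := range numbers {
		counts[n]++
		if counts[n] > highest {
			highest = counts[n]
		}
	}

	// Collect every distinct value that reached the highest count.
	var modes []float64
	for n, c := range counts {
		if c == highest {
			modes = append(modes, n)
		}
	}

	// Every distinct value is tied: treat this as "no mode".
	if len(modes) == len(counts) {
		return nil
	}
	sort.Float64s(modes)
	return modes
}

func main() {
	fmt.Println(mode([]float64{1, 1, 2, 3, 4, 5, 5, 5}))
	fmt.Println(mode([]float64{1, 1, 1, 2, 3, 3, 3}))
	fmt.Println(mode([]float64{1, 2, 3, 4, 5}) == nil)
}
```

Note that under this rule a set with a single distinct value, such as [5,5,5], also yields nil. Switching to the other convention would only require dropping the tie check.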

In fact we should quickly explain what each of the averages are - I always get them mixed up, so it'd be nice not to have to think too hard.

Added comments to each metric.

Metricable string values

With cardinality (and now valueCount) measurers, these needed to be run against StringCells. You pointed out the shortcut wasn't going to work once cardinality was implemented correctly. That shortcut is gone!

To resolve, I'm passing the measurer{} to IsMetricable() (see here).

Previously, the type of cell determined if the field was metricable. Passing the measurer allows us to run certain metrics against certain cells. As it stands, this is only needed for StringCells, but it could be handy for different metrics in the future.

Do you think this is an alright solution? Maybe I should define what measurers can be used on what cells outside of each XCell implementation?
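
One way to picture the approach described above, as a rough sketch only: every type and method below (measurer, cell, stringCell, floatCell, the marker method) is hypothetical and reduced for illustration, since the actual IsMetricable signature and cell types are not reproduced in this thread. Each cell decides which measurers it accepts by switching on the measurer's concrete type.

```go
package main

import "fmt"

// measurer is a hypothetical marker interface for the metric types.
type measurer interface{ isMeasurer() }

type cardinality struct{}
type valueCount struct{}
type mean struct{}

func (*cardinality) isMeasurer() {}
func (*valueCount) isMeasurer()  {}
func (*mean) isMeasurer()        {}

// cell is a hypothetical interface; each cell type reports whether a given
// measurer can run against it.
type cell interface {
	IsMetricable(m measurer) bool
}

// stringCell only supports metrics that do not need numeric values.
type stringCell struct{ value string }

func (stringCell) IsMetricable(m measurer) bool {
	switch m.(type) {
	case *cardinality, *valueCount:
		return true
	}
	return false
}

// floatCell supports every measurer in this sketch.
type floatCell struct{ value float64 }

func (floatCell) IsMetricable(m measurer) bool { return true }

func main() {
	cells := []cell{stringCell{value: "a"}, floatCell{value: 1.5}}
	for _, c := range cells {
		fmt.Printf("%T cardinality=%v mean=%v\n",
			c, c.IsMetricable(&cardinality{}), c.IsMetricable(&mean{}))
	}
}
```

The alternative raised in the question, defining the allowed measurer/cell pairs outside each cell implementation, would move the type switch into a single lookup table at the cost of touching that table whenever a cell or metric is added.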

mattevans · 2016-12-15T00:23:24Z

Going to merge this after our discussion/look over last night.
Can make further adjustments if needed.

Hope your Xmas party is in full swing!

mattevans added 3 commits December 13, 2016 15:14

Using const for repeating values

de89a53

Added new metrics (mean, median, mode, stdev, sum, cardinality)

ee5f400

Updated tests for new metrics

8368ba0

mattevans self-assigned this Dec 13, 2016

mattevans requested a review from snikch December 13, 2016 02:43

snikch requested changes Dec 13, 2016


mattevans added 7 commits December 14, 2016 14:35

Removed shortcut to bypass non-metricable cell for cardinality. Now p…

fc28ea6

…assing measuer{} to IsMetricable to determine if metric can run

IsMetricable() now requires measurer{}

02f4843

Comments. Implemented cardinality correctly. Added valueCount metric

2233a5d

Tests updated to reflect changes made

e2dc5d5

Using decimal math for median metric

1cc6987

Commenting

a95e36b

Comments

13bbd1d

mattevans merged commit 1f5dc61 into master Dec 15, 2016

mattevans deleted the new-metrics branch December 15, 2016 00:23
