Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pdata: add [Type]Slice.Sort(func) to sort slices #3671

Merged
merged 4 commits into from Jul 22, 2021
Merged

pdata: add [Type]Slice.Sort(func) to sort slices #3671

merged 4 commits into from Jul 22, 2021

Conversation

jacobmarble
Copy link
Contributor

Description:

@tigrannajaryan
This change helps my efforts to validate the recent changes to the pdata package, as requested

Comparing two generated slices can be tedious when the slices are not
ordered, and order does not matter for the test.

This change adds a .Sort(func(i, j int) bool) method to generated [Type]Slice. This is
similar to method StringMap.Sort(), but takes a less function because
the order may differ by context, where the sort order of a map is obvious (by key).

Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.>

Documentation:

Added some godoc.

@jacobmarble jacobmarble requested a review from a team as a code owner July 19, 2021 01:10
@project-bot project-bot bot added this to In progress in Collector Jul 19, 2021
@jacobmarble jacobmarble changed the title pdata: add TypeSlice.Sort(func) to sort slices pdata: add [Type]Slice.Sort(func) to sort slices Jul 19, 2021
@bogdandrutu
Copy link
Member

@jacobmarble so this is mostly for testing correct? Need a bit of motivation and where this is needed.

@jacobmarble
Copy link
Contributor Author

@jacobmarble so this is mostly for testing correct? Need a bit of motivation and where this is needed.

Correct, this is mostly for testing.

I found one use of StringMap.Sort() outside of tests:

func timeseriesSignature(ilmName string, metric pdata.Metric, labels pdata.StringMap) string {
var b strings.Builder
b.WriteString(metric.DataType().String())
b.WriteString("*" + ilmName)
b.WriteString("*" + metric.Name())
labels.Sort().Range(func(k string, v string) bool {
b.WriteString("*" + k + "*" + v)
return true
})
return b.String()
}

I also found one use of AttributeMap.Sort() outside of tests:

func attributeMapToString(av pdata.AttributeMap) string {
var b strings.Builder
b.WriteString("{\n")
av.Sort().Range(func(k string, v pdata.AttributeValue) bool {
fmt.Fprintf(&b, " -> %s: %s(%s)\n", k, v.Type(), tracetranslator.AttributeValueToString(v))
return true
})
b.WriteByte('}')
return b.String()
}

Neither of the above Sort() calls could be replaced by an Equals(other) method. Equivalence is what I really need; sorting TypeSlice objects allows me to do the following (I haven't found a better way to sort buckets yet):

func SortResourceMetrics(rm pdata.ResourceMetricsSlice) {
	for i := 0; i < rm.Len(); i++ {
		r := rm.At(i)
		for j := 0; j < r.InstrumentationLibraryMetrics().Len(); j++ {
			il := r.InstrumentationLibraryMetrics().At(j)
			for k := 0; k < il.Metrics().Len(); k++ {
				m := il.Metrics().At(k)
				switch m.DataType() {
				case pdata.MetricDataTypeGauge:
					for l := 0; l < m.Gauge().DataPoints().Len(); l++ {
						m.Gauge().DataPoints().At(l).LabelsMap().Sort()
					}
				case pdata.MetricDataTypeSum:
					for l := 0; l < m.Sum().DataPoints().Len(); l++ {
						m.Sum().DataPoints().At(l).LabelsMap().Sort()
					}
				case pdata.MetricDataTypeHistogram:
					for l := 0; l < m.Histogram().DataPoints().Len(); l++ {
						sortBuckets(m.Histogram().DataPoints().At(l))
						m.Histogram().DataPoints().At(l).LabelsMap().Sort()
					}
				case pdata.MetricDataTypeSummary:
					for l := 0; l < m.Summary().DataPoints().Len(); l++ {
						m.Summary().DataPoints().At(l).LabelsMap().Sort()
						m.Summary().DataPoints().At(l).QuantileValues().Sort(func(i, j int) bool {
							left := m.Summary().DataPoints().At(l).QuantileValues().At(i)
							right := m.Summary().DataPoints().At(l).QuantileValues().At(j)
							return left.Quantile() < right.Quantile()
						})
					}
				default:
					panic(fmt.Sprintf("unsupported metric data type %d", m.DataType()))
				}
			}
			il.Metrics().Sort(func(i, j int) bool {
				return il.Metrics().At(i).Name() < il.Metrics().At(j).Name()
			})
		}
		r.InstrumentationLibraryMetrics().Sort(func(i, j int) bool {
			left := r.InstrumentationLibraryMetrics().At(i).InstrumentationLibrary()
			right := r.InstrumentationLibraryMetrics().At(j).InstrumentationLibrary()
			if left.Name() == right.Name() {
				return left.Version() < right.Version()
			}
			return left.Name() < right.Name()
		})
		r.Resource().Attributes().Sort()
	}
}

func sortBuckets(hdp pdata.HistogramDataPoint) {
	buckets := make(sortableBuckets, len(hdp.ExplicitBounds()))
	for i := range hdp.ExplicitBounds() {
		buckets[i] = sortableBucket{hdp.BucketCounts()[i], hdp.ExplicitBounds()[i]}
	}
	sort.Sort(buckets)
	for i, bucket := range buckets {
		hdp.BucketCounts()[i], hdp.ExplicitBounds()[i] = bucket.count, bucket.bound
	}
}

type sortableBucket struct {
	count uint64
	bound float64
}

type sortableBuckets []sortableBucket

func (s sortableBuckets) Len() int {
	return len(s)
}

func (s sortableBuckets) Less(i, j int) bool {
	return s[i].bound < s[j].bound
}

func (s sortableBuckets) Swap(i, j int) {
	s[i].count, s[j].count = s[j].count, s[i].count
	s[i].bound, s[j].bound = s[j].bound, s[i].bound
}

Comparing two generated slices can be tedious when the slices are not
ordered, and order does not matter for the test.

This change adds a .Sort(func(i, j int) bool) method to generated [Type]Slice. This is
similar to method StringMap.Sort(), but takes a less function because
the order may differ by context, where the sort order of a map is obvious (by key).
@punya
Copy link
Member

punya commented Jul 21, 2021

Sorry for the late feedback, but given that this is solely for tests, is it worth writing a single assertion that's implemented using reflection, rather than generating all this code? I was imagining a signature like

// slice1 and slice2 are []T
// compareFunc is func(a, b T) int
func AssertEqualSlices(t *testing.T, slice1, slice2, compareFunc interface{}) { ... }

Clearly this signature is less self-documenting than the generated signatures. But could it be a better trade-off than all the code generation.

(This is also a matter of personal preference, so I would defer to the author or maintainers if they believe the original idea is a better fit.)

@jacobmarble
Copy link
Contributor Author

@punya your comment reflects my comment about StringMap.Sort() and AttributeMap.Sort(). If Sort() is a method on those map/structs, then it should also exist on slice/structs, following the same pattern.

@punya
Copy link
Member

punya commented Jul 21, 2021

@jacobmarble understood. I would make a different trade-off for those too, but I recognize that this is a personal preference.

@bogdandrutu
Copy link
Member

@punya there are some metrics protocols that require ordering of timeseries/points. In that case I think we may need to do a sorting of the points by timestamp, so sorting is most likely to be needed for non-test code as well. Same may apply for logs or even span.

@jacobmarble
Copy link
Contributor Author

@bogdandrutu is there anything left for me on this one?

Collector automation moved this from In progress to Reviewer approved Jul 22, 2021
@bogdandrutu bogdandrutu merged commit 6814536 into open-telemetry:main Jul 22, 2021
Collector automation moved this from Reviewer approved to Done Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Collector
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants