pdata: add [Type]Slice.Sort(func) to sort slices #3671

jacobmarble · 2021-07-19T01:10:27Z

Description:

@tigrannajaryan
This change helps my efforts to validate the recent changes to the pdata package, as requested

Comparing two generated slices can be tedious when the slices are not
ordered, and order does not matter for the test.

This change adds a .Sort(func(i, j int) bool) method to generated [Type]Slice. This is
similar to method StringMap.Sort(), but takes a less function because
the order may differ by context, where the sort order of a map is obvious (by key).

Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.>

Documentation:

Added some godoc.

bogdandrutu · 2021-07-20T18:20:44Z

@jacobmarble so this is mostly for testing correct? Need a bit of motivation and where this is needed.

jacobmarble · 2021-07-20T21:25:32Z

@jacobmarble so this is mostly for testing correct? Need a bit of motivation and where this is needed.

Correct, this is mostly for testing.

I found one use of StringMap.Sort() outside of tests:

opentelemetry-collector/exporter/prometheusexporter/accumulator.go

Lines 374 to 384 in f9dacb8

    
           func timeseriesSignature(ilmName string, metric pdata.Metric, labels pdata.StringMap) string { 
        
           	var b strings.Builder 
        
           	b.WriteString(metric.DataType().String()) 
        
           	b.WriteString("*" + ilmName) 
        
           	b.WriteString("*" + metric.Name()) 
        
           	labels.Sort().Range(func(k string, v string) bool { 
        
           		b.WriteString("*" + k + "*" + v) 
        
           		return true 
        
           	}) 
        
           	return b.String() 
        
           }

I also found one use of AttributeMap.Sort() outside of tests:

opentelemetry-collector/internal/otlptext/databuffer.go

Lines 303 to 313 in c6f9f29

    
           func attributeMapToString(av pdata.AttributeMap) string { 
        
           	var b strings.Builder 
        
           	b.WriteString("{\n") 
        
           	av.Sort().Range(func(k string, v pdata.AttributeValue) bool { 
        
           		fmt.Fprintf(&b, "     -> %s: %s(%s)\n", k, v.Type(), tracetranslator.AttributeValueToString(v)) 
        
           		return true 
        
           	}) 
        
           	b.WriteByte('}') 
        
           	return b.String() 
        
           }

Neither of the above Sort() calls could be replaced by an Equals(other) method. Equivalence is what I really need; sorting TypeSlice objects allows me to do the following (I haven't found a better way to sort buckets yet):

func SortResourceMetrics(rm pdata.ResourceMetricsSlice) {
	for i := 0; i < rm.Len(); i++ {
		r := rm.At(i)
		for j := 0; j < r.InstrumentationLibraryMetrics().Len(); j++ {
			il := r.InstrumentationLibraryMetrics().At(j)
			for k := 0; k < il.Metrics().Len(); k++ {
				m := il.Metrics().At(k)
				switch m.DataType() {
				case pdata.MetricDataTypeGauge:
					for l := 0; l < m.Gauge().DataPoints().Len(); l++ {
						m.Gauge().DataPoints().At(l).LabelsMap().Sort()
					}
				case pdata.MetricDataTypeSum:
					for l := 0; l < m.Sum().DataPoints().Len(); l++ {
						m.Sum().DataPoints().At(l).LabelsMap().Sort()
					}
				case pdata.MetricDataTypeHistogram:
					for l := 0; l < m.Histogram().DataPoints().Len(); l++ {
						sortBuckets(m.Histogram().DataPoints().At(l))
						m.Histogram().DataPoints().At(l).LabelsMap().Sort()
					}
				case pdata.MetricDataTypeSummary:
					for l := 0; l < m.Summary().DataPoints().Len(); l++ {
						m.Summary().DataPoints().At(l).LabelsMap().Sort()
						m.Summary().DataPoints().At(l).QuantileValues().Sort(func(i, j int) bool {
							left := m.Summary().DataPoints().At(l).QuantileValues().At(i)
							right := m.Summary().DataPoints().At(l).QuantileValues().At(j)
							return left.Quantile() < right.Quantile()
						})
					}
				default:
					panic(fmt.Sprintf("unsupported metric data type %d", m.DataType()))
				}
			}
			il.Metrics().Sort(func(i, j int) bool {
				return il.Metrics().At(i).Name() < il.Metrics().At(j).Name()
			})
		}
		r.InstrumentationLibraryMetrics().Sort(func(i, j int) bool {
			left := r.InstrumentationLibraryMetrics().At(i).InstrumentationLibrary()
			right := r.InstrumentationLibraryMetrics().At(j).InstrumentationLibrary()
			if left.Name() == right.Name() {
				return left.Version() < right.Version()
			}
			return left.Name() < right.Name()
		})
		r.Resource().Attributes().Sort()
	}
}

func sortBuckets(hdp pdata.HistogramDataPoint) {
	buckets := make(sortableBuckets, len(hdp.ExplicitBounds()))
	for i := range hdp.ExplicitBounds() {
		buckets[i] = sortableBucket{hdp.BucketCounts()[i], hdp.ExplicitBounds()[i]}
	}
	sort.Sort(buckets)
	for i, bucket := range buckets {
		hdp.BucketCounts()[i], hdp.ExplicitBounds()[i] = bucket.count, bucket.bound
	}
}

type sortableBucket struct {
	count uint64
	bound float64
}

type sortableBuckets []sortableBucket

func (s sortableBuckets) Len() int {
	return len(s)
}

func (s sortableBuckets) Less(i, j int) bool {
	return s[i].bound < s[j].bound
}

func (s sortableBuckets) Swap(i, j int) {
	s[i].count, s[j].count = s[j].count, s[i].count
	s[i].bound, s[j].bound = s[j].bound, s[i].bound
}

cmd/pdatagen/internal/base_slices.go

Comparing two generated slices can be tedious when the slices are not ordered, and order does not matter for the test. This change adds a .Sort(func(i, j int) bool) method to generated [Type]Slice. This is similar to method StringMap.Sort(), but takes a less function because the order may differ by context, where the sort order of a map is obvious (by key).

punya · 2021-07-21T14:21:38Z

Sorry for the late feedback, but given that this is solely for tests, is it worth writing a single assertion that's implemented using reflection, rather than generating all this code? I was imagining a signature like

// slice1 and slice2 are []T
// compareFunc is func(a, b T) int
func AssertEqualSlices(t *testing.T, slice1, slice2, compareFunc interface{}) { ... }

Clearly this signature is less self-documenting than the generated signatures. But could it be a better trade-off than all the code generation.

(This is also a matter of personal preference, so I would defer to the author or maintainers if they believe the original idea is a better fit.)

jacobmarble · 2021-07-21T14:31:44Z

@punya your comment reflects my comment about StringMap.Sort() and AttributeMap.Sort(). If Sort() is a method on those map/structs, then it should also exist on slice/structs, following the same pattern.

punya · 2021-07-21T15:08:12Z

@jacobmarble understood. I would make a different trade-off for those too, but I recognize that this is a personal preference.

bogdandrutu · 2021-07-21T15:54:15Z

@punya there are some metrics protocols that require ordering of timeseries/points. In that case I think we may need to do a sorting of the points by timestamp, so sorting is most likely to be needed for non-test code as well. Same may apply for logs or even span.

jacobmarble · 2021-07-21T18:18:22Z

@bogdandrutu is there anything left for me on this one?

jacobmarble requested a review from a team as a code owner July 19, 2021 01:10

jacobmarble requested a review from bogdandrutu July 19, 2021 01:10

project-bot bot added this to In progress in Collector Jul 19, 2021

tigrannajaryan assigned bogdandrutu Jul 19, 2021

jacobmarble changed the title ~~pdata: add TypeSlice.Sort(func) to sort slices~~ pdata: add [Type]Slice.Sort(func) to sort slices Jul 19, 2021

jacobmarble mentioned this pull request Jul 19, 2021

Extract pdata into a separate Go Module #3104

Closed

9 tasks

bogdandrutu reviewed Jul 20, 2021

View reviewed changes

cmd/pdatagen/internal/base_slices.go Outdated Show resolved Hide resolved

jacobmarble added 2 commits July 20, 2021 16:22

fix: improve TypeSlice.Sort(less) signature

5458d80

fix: resolve goimports CI check

48ab7da

chore: make genpdata

5474b2c

bogdandrutu approved these changes Jul 22, 2021

View reviewed changes

Collector automation moved this from In progress to Reviewer approved Jul 22, 2021

bogdandrutu merged commit 6814536 into open-telemetry:main Jul 22, 2021

Collector automation moved this from Reviewer approved to Done Jul 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pdata: add [Type]Slice.Sort(func) to sort slices #3671

pdata: add [Type]Slice.Sort(func) to sort slices #3671

jacobmarble commented Jul 19, 2021

bogdandrutu commented Jul 20, 2021

jacobmarble commented Jul 20, 2021

punya commented Jul 21, 2021

jacobmarble commented Jul 21, 2021

punya commented Jul 21, 2021

bogdandrutu commented Jul 21, 2021

jacobmarble commented Jul 21, 2021

pdata: add [Type]Slice.Sort(func) to sort slices #3671

pdata: add [Type]Slice.Sort(func) to sort slices #3671

Conversation

jacobmarble commented Jul 19, 2021

bogdandrutu commented Jul 20, 2021

jacobmarble commented Jul 20, 2021

punya commented Jul 21, 2021

jacobmarble commented Jul 21, 2021

punya commented Jul 21, 2021

bogdandrutu commented Jul 21, 2021

jacobmarble commented Jul 21, 2021