Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Ordered with an iterator in export.Labels. #567

Merged
merged 24 commits into from
Mar 19, 2020

Conversation

krnowak
Copy link
Member

@krnowak krnowak commented Mar 17, 2020

This is to address the first step in #539. The metric.SDK.Labels() used to store the slice passed to it, so the slice could still be modified after the function returned. Now it still modifies the passed slice in place, but it isn't used after the function is finished. The export.Labels does not expose the labels slice anymore. Instead it has a getter for an iterator, so we are not forced to use a slice, but have some other possibilities of storing the labels.

Currently, for empty labelset, we use a "zero" iterator, for labelsets with less than 11 keys, we use the reflect-based iterator, for larger labelsets we do a one-time allocation at the checkpoint time and use a slice iterator.

Benchmarks before:

BenchmarkLabels_1-4    8917160    140 ns/op     96 B/op   2 allocs/op
BenchmarkLabels_2-4    6682400    178 ns/op    144 B/op   2 allocs/op
BenchmarkLabels_4-4    4760353    244 ns/op    240 B/op   2 allocs/op
BenchmarkLabels_8-4    3176577    390 ns/op    432 B/op   2 allocs/op
BenchmarkLabels_16-4    785216   1465 ns/op   1584 B/op   3 allocs/op

Benchmarks after:

BenchmarkLabels_1-4    8303445    144 ns/op    128 B/op   2 allocs/op
BenchmarkLabels_2-4    5826736    181 ns/op    176 B/op   2 allocs/op
BenchmarkLabels_4-4    4678610    251 ns/op    272 B/op   2 allocs/op
BenchmarkLabels_8-4    2968780    390 ns/op    464 B/op   2 allocs/op
BenchmarkLabels_16-4    807482   1513 ns/op   1616 B/op   3 allocs/op

Iterator benchmarks:

BenchmarkIterator_0-4    313194547    3.84 ns/op   0 B/op   0 allocs/op
BenchmarkIterator_1-4    54833654    20.2  ns/op   0 B/op   0 allocs/op
BenchmarkIterator_2-4    58748677    20.2  ns/op   0 B/op   0 allocs/op
BenchmarkIterator_4-4    54932950    20.5  ns/op   0 B/op   0 allocs/op
BenchmarkIterator_8-4    58982492    20.4  ns/op   0 B/op   0 allocs/op
BenchmarkIterator_16-4   152314864    7.90 ns/op   0 B/op   0 allocs/op

@jmacd
Copy link
Contributor

jmacd commented Mar 18, 2020

I apologize in advance, but there are some things I don't like about this approach. Sorry!

The use of an iterator pattern feels not like Go to me, and it requires some additional allocations that don't feel necessary. When the SDK creates its orderedLabels value (i.e., the variable-size array copied from the original slice), the array is immutable at that point and can be used in the export pipeline.

We have a labels struct in the SDK that is already a pointer, so it can be placed in an interface without an allocation. I expect the labels struct to go away, but the record will take its place and store the orderedLabels value. So either the labels or record can implement an interface with two methods, instead of an iterator to address the problem at hand--a NumLabels() int and a GetLabel(int) core.KeyValue perhaps. I think this would be a better option--although it is a slightly less usable interface, it saves on an allocation. It forces the user to allocate an index and step through an array.

Of the three fields in labels today, two can disappear (meter becomes available in the embedded instrument in #560, and slice disappears as a result of this work). The remaining orderedLabels array can move into the record and at that point, the export pipeline will take a *record that implements the two methods, which will be passed into the export.Labels. These methods will use reflection and will not cause an allocation if the address is used, like so:

type ExportLabels interface {
   NumLabels() int
   GetLabel(int) core.KeyValue
}
func (r *record) NumLabels(idx int) core.KeyValue {
   return reflect.ValueOf(record.ordered).Len()
}
func (r *record) GetLabel(idx int) core.KeyValue {
   return *(reflect.ValueOf(record.ordered).Index(i).Addr().Interface().(*core.KeyValue))
}

The export.NewLabels() would take a ExportLabels which is really just a pointer to the record. A slightly different approach is required for observers. Can we change the asynchronousInstrument's map[orderedLabels]labeledRecorder to a map[orderedLabels]*labeledRecorder, put the orderedLabels value inside labeledRecorder, and then have labeledRecorder also implement ExportLabels.

Sorry to ask for another approach--I'm open to debate on this topic. Maybe I've got it wrong.

As for the next part of #539, I was imagining new fields added to the record and labeledRecorder to support a cache of encoded strings by encoder. These two types would also, then, implement a new interface to support getting access to the cached encoded string. It makes sense to create a new struct that would contain just an orderedLabels at this point, later it can be extended with support for cached label encodings.

@krnowak
Copy link
Member Author

krnowak commented Mar 18, 2020

I apologize in advance, but there are some things I don't like about this approach. Sorry!

No worries.

The use of an iterator pattern feels not like Go to me, and it requires some additional allocations that don't feel necessary.

I got the inspiration for the iterator pattern from reflect.MapIter and bufio.Scanner. I guess this is as Go-like as doing for i := 0; i < storage.NumLabels(); i++ (which feels like C to me, IMO ;) ).

When the SDK creates its orderedLabels value (i.e., the variable-size array copied from the original slice), the array is immutable at that point and can be used in the export pipeline.

We have a labels struct in the SDK that is already a pointer, so it can be placed in an interface without an allocation. I expect the labels struct to go away, but the record will take its place and store the orderedLabels value. So either the labels or record can implement an interface with two methods, instead of an iterator to address the problem at hand--a NumLabels() int and a GetLabel(int) core.KeyValue perhaps. I think this would be a better option--although it is a slightly less usable interface, it saves on an allocation. It forces the user to allocate an index and step through an array.

Alright, I have kept the iterator, but I made it a simple struct, so creating it shouldn't result in any allocation. This resulted in dropping the zero iterator and reflect iterator thingies. So that's good. I added the LabelStorage interface with methods as you described, so labels implement it for now.

Of the three fields in labels today, two can disappear (meter becomes available in the embedded instrument in #560, and slice disappears as a result of this work). The remaining orderedLabels array can move into the record and at that point, the export pipeline will take a *record that implements the two methods, which will be passed into the export.Labels.

I think I'd like to do it in the next PR to avoid making this bigger. But one thing about the slice (which I renamed to sortSlice in this PR) - we need to keep it for now to avoid the allocation in Labels().

Also (for the future PR), I can't yet see how to move contents of labels into record before actually dropping the Labels method from Meter - I can't return *record from Labels. I think this could be done in a single step.

These methods will use reflection and will not cause an allocation if the address is used, like so:

type ExportLabels interface {
   NumLabels() int
   GetLabel(int) core.KeyValue
}
func (r *record) NumLabels(idx int) core.KeyValue {
   return reflect.ValueOf(record.ordered).Len()
}
func (r *record) GetLabel(idx int) core.KeyValue {
   return *(reflect.ValueOf(record.ordered).Index(i).Addr().Interface().(*core.KeyValue))
}

We seem to lose the ability of using Addr() after r.ordered = reflect.ValueOf(array).Interface(). See https://play.golang.org/p/IM0qJfW04HV. But maybe we can get away from allocations during indexing, so using Addr() should not be needed. Benchmark seems to confirm that.

The export.NewLabels() would take a ExportLabels which is really just a pointer to the record. A slightly different approach is required for observers. Can we change the asynchronousInstrument's map[orderedLabels]labeledRecorder to a map[orderedLabels]*labeledRecorder, put the orderedLabels value inside labeledRecorder, and then have labeledRecorder also implement ExportLabels.

Sorry to ask for another approach--I'm open to debate on this topic. Maybe I've got it wrong.

It's all good - it actually led me to a simpler design I think. Thanks for that. I have pushed my changes so far. One thing - I called my interface LabelStorage instead of ExportLabels. I think it's a bit clearer name as ExportLabels is yet another name with "labels" in it in the export package. Also, export.ExportLabels. Stutter bad.

As for the next part of #539, I was imagining new fields added to the record and labeledRecorder to support a cache of encoded strings by encoder. These two types would also, then, implement a new interface to support getting access to the cached encoded string. It makes sense to create a new struct that would contain just an orderedLabels at this point, later it can be extended with support for cached label encodings.

Yeah, something to experiment with when this PR lands.

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

exporters/otlp/internal/transform/metric.go Show resolved Hide resolved
sdk/export/metric/metric.go Outdated Show resolved Hide resolved
sdk/export/metric/metric.go Show resolved Hide resolved
sdk/metric/sdk.go Show resolved Hide resolved
sdk/metric/sdk.go Outdated Show resolved Hide resolved
sdk/metric/sdk.go Show resolved Hide resolved
@krnowak
Copy link
Member Author

krnowak commented Mar 19, 2020

Updated.

This is really an inconvenient implementation detail leak - we may
want to store labels in a different way. Replace it with an iterator -
it does not force us to use slice of key values as a storage in the
long run.
It may come in handy in several situations, where we don't have access
to export.Labels object, but only to the label iterator.
Makes my life easier when writing a benchmark. Might also be an
alternative to cloning the iterator.
By not using the value created by `reflect.New()`, but rather by
`reflect.ValueOf()`, we get a non-addressable array in the value,
which does not infer an allocation cost when getting an element from
the array.
This can be substituted by a reflect value iterator that goes over a
value with a zero-sized array.
In the long run this will completely replace the LabelIterator
interface.
It's a leftover from interface times and now it's pointless - the
iterator is a simple struct, so cloning it is a simple copy.
The sole existence of Reset was actually for benchmarking convenience.
Now we can just copy the iterator cheaply, so a need for Reset is no
more.
So we won't get into problems when several goroutines want to iterate
the same labels at the same time. Not sure if this would be a big
deal, since every goroutine would compute the same reflect.Value, but
concurrent write to the same memory is bad anyway. And it doesn't cost
us any extra allocations anyway.
@krnowak
Copy link
Member Author

krnowak commented Mar 19, 2020

Rebased, resolved the conflicts.

@jmacd jmacd merged commit a01f63b into open-telemetry:master Mar 19, 2020
@krnowak krnowak deleted the labelset-removal-prep branch March 25, 2020 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants