
Metrics export pipeline + metrics stdout exporter #341

Merged: 48 commits into open-telemetry:master on Feb 11, 2020

Conversation

@lzchen (Contributor) commented Dec 19, 2019

This completes an end-to-end metrics export pipeline.

It borrows a lot of elements from the Go implementation as well as the WIP spec.

The new files all relate to the export pipeline: batcher.py, aggregate.py, and controller.py, which contain the implementations of the Batcher, Aggregator, and Controller classes (see this diagram for an overview of how the pieces work together).

Currently, only the UngroupedBatcher and CounterAggregator are implemented. This also means that there are no AggregationSelectors, since aggregations for Measure have not been implemented. A PushController is also included, which is a worker thread that continuously initiates collection of metrics and calls the exporter. metrics_example.py in the examples folder shows how to use these components; a minimal wiring sketch also follows the SDK changes listed below.

Some SDK changes:

  • Metric objects now hold a reference to the Meter that created them. This is used to construct an aggregator for the specific metric type when a handle is created.
  • LabelSet now has a field for its encoded value.
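
For orientation, here is a rough sketch of how these pieces fit together, loosely based on metrics_example.py. The module paths and call signatures reflect this PR's revision and are illustrative rather than authoritative:

```python
import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import Counter, Meter
from opentelemetry.sdk.metrics.export import ConsoleMetricsExporter
from opentelemetry.sdk.metrics.export.controller import PushController

# Install the SDK Meter as the global meter implementation via the loader.
metrics.set_preferred_meter_implementation(lambda _: Meter())
meter = metrics.meter()

# The stdout exporter from this PR, driven by a PushController worker thread
# that collects and exports every 5 seconds.
exporter = ConsoleMetricsExporter()
controller = PushController(meter, exporter, 5)

requests_counter = meter.create_metric(
    "requests", "number of requests", "requests", int, Counter, ("environment",)
)
label_set = meter.get_label_set({"environment": "staging"})
requests_counter.add(25, label_set)

# Keep the process alive long enough for the controller to tick at least once.
time.sleep(10)
```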

@lzchen lzchen requested a review from a team as a code owner December 19, 2019 19:14
@codecov-io commented Dec 31, 2019

Codecov Report

Merging #341 into master will not change coverage.
The diff coverage is 100%.


@@           Coverage Diff           @@
##           master     #341   +/-   ##
=======================================
  Coverage   85.85%   85.85%           
=======================================
  Files          41       41           
  Lines        2078     2078           
  Branches      242      242           
=======================================
  Hits         1784     1784           
  Misses        223      223           
  Partials       71       71
Impacted Files Coverage Δ
...elemetry-api/src/opentelemetry/metrics/__init__.py 95.58% <100%> (ø) ⬆️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@toumorokoshi (Member) left a comment

As this is built on a WIP spec, I don't see much reason not to approve it.

A couple thoughts:

  1. It would be really nice if there were some clear, concise examples. The unit tests seem to cover different pieces without giving me a full example of how to configure a meter in an app to actually record and export a value to the console. That would be very helpful if the goal is to evaluate the API.
  2. To that end, some sort of document that outlines how to work up to more and more complex cases would be helpful, even before implementing any more code. For example:

a. start with configuring a single metric with no dimensions and exporting it to the console
b. a single metric with dimensions (introduce label sets)
c. multiple metrics that need to be aggregated (introduce the aggregator).

I've left some thoughts, but it's very hard to evaluate the interface and usage without a common use case. Thoughts on making the next PR a full example of collecting application metrics (CPU / memory, thread count) so we can see how it'll actually work in practice?

@@ -220,7 +220,7 @@ def create_metric(
     metric_type: Type[MetricT],
     label_keys: Sequence[str] = (),
     enabled: bool = True,
-    monotonic: bool = False,
+    alternate: bool = False,
Member:

IMO "alternate" doesn't really correspond to the behavior. Is this still in flux? What's the best way to give feedback if this is a spec-level decision?

Contributor Author (lzchen):

You're correct. There are new terms for expressing default behaviour here.


@@ -298,9 +373,6 @@ def get_label_set(self, labels: Dict[str, str]):
# Use simple encoding for now until encoding API is implemented
encoded = tuple(sorted(labels.items()))
Member:

What's the purpose behind the label encodings? Is this similar to the previous PR?

There isn't much of a performance improvement here, since the creation of the encoded cache key to even check against is quite expensive (iterate dict -> sort -> tuple).

Member:

Thinking through it, there's some merit in not recreating the same label set over and over again, but constructing the labels dictionary is itself quite expensive.

Contributor Author (lzchen):

Correct, I believe the optimization is so that the same label set does not need to be recalculated over and over again. Currently, the encoding used is a default placeholder encoding type. There is talk of implementing a configurable encoder in the SDK, so users can configure the type of encoding appropriate to the exporter they want to send metrics to. There is no spec for it yet, but the Go implementation has created one. I think once we have this, vendors can pass in their own encoders with whatever encoding they want, so the iterate dict -> sort -> tuple algorithm does not always apply.
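
To make the intended optimization concrete, here is a hypothetical sketch (not the PR's code) of a label-set cache keyed on a pluggable encoder; all names here are invented for illustration:

```python
from typing import Callable, Dict


def default_encoder(labels: Dict[str, str]) -> str:
    # Placeholder encoding, equivalent in spirit to tuple(sorted(labels.items())).
    return ",".join("{}={}".format(k, v) for k, v in sorted(labels.items()))


class LabelSet:
    def __init__(self, labels: Dict[str, str], encoded: str):
        self.labels = labels
        self.encoded = encoded  # pre-computed key for downstream use


class MeterSketch:
    def __init__(
        self, encoder: Callable[[Dict[str, str]], str] = default_encoder
    ):
        self._encoder = encoder
        self._label_sets: Dict[str, LabelSet] = {}

    def get_label_set(self, labels: Dict[str, str]) -> LabelSet:
        encoded = self._encoder(labels)
        # Reuse an existing LabelSet so handles and batchers can key on the
        # pre-computed `encoded` value instead of re-encoding the labels.
        if encoded not in self._label_sets:
            self._label_sets[encoded] = LabelSet(labels, encoded)
        return self._label_sets[encoded]
```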

def __init__(self):
self.current = None
self.check_point = None
self._lock = threading.Lock()
Member:

Is a lock the right thing to put in the base class? I understand most implementations will want one, but it's not particularly related to the interface, and we may be needlessly locking if we have an async aggregator or use atomic integers and doubles.

Contributor Author (lzchen):

I'm not sure an async aggregator makes sense, since we would want to export metric values at a certain point in time. That said, all implementations would need a way to handle concurrent updates, so I would think putting the lock in the base class makes sense.

Member:

More generally, I think there's a good reason for keeping implementation out of ABCs even though it's technically allowed. We addressed (but didn't resolve) this in #311 (comment).

In this particular case I still don't think it makes sense to instantiate the lock here. This (half-abstract) implementation doesn't use it, and it's not part of the class' API.
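
To illustrate the alternative being argued for, here is a sketch in which the ABC stays purely abstract and the concrete CounterAggregator owns its own lock (simplified, not the PR's exact code):

```python
import abc
import threading


class Aggregator(abc.ABC):
    """Interface only: no lock and no storage decisions in the base class."""

    @abc.abstractmethod
    def update(self, value):
        """Record a new measurement."""

    @abc.abstractmethod
    def checkpoint(self):
        """Snapshot the accumulated value for export."""


class CounterAggregator(Aggregator):
    def __init__(self):
        self.current = 0
        self.check_point = 0
        self._lock = threading.Lock()  # the concurrency choice lives here

    def update(self, value):
        with self._lock:
            self.current += value

    def checkpoint(self):
        with self._lock:
            self.check_point = self.current
            self.current = 0
```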

self.current += value

def checkpoint(self):
# TODO: Implement lock-free algorithm for concurrency
Member:

To completely contradict myself earlier: is it worth implementing a lock-free algorithm here? checkpoint shouldn't be called particularly often, and lock-free algorithms are easy to get wrong and can often lead to worse performance.

Contributor Author (lzchen):

There is no spec yet for how checkpoint should handle concurrency. There was talk that using a lock would be too slow, but I agree that lock-free algorithms are error-prone and don't necessarily perform better. I might change this to a question rather than a TODO.

self.current = 0

def merge(self, other):
self.check_point += other.check_point
Member:

Should current be updated as well? It seems like this addition would be cleared immediately by the next call to checkpoint, which will overwrite it with whatever exists in current.

Contributor Author (lzchen):
merge is usually called right after checkpoint() in a Batcher's process step. This handles the race condition where an update to the metric happens at the same time as process: merge folds the other checkpoint value into this one to prepare for exporting, while current is left untouched (it is changed separately by the update and exported later on).
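
A condensed sketch of that checkpoint/merge interplay (field names follow the PR; everything else is simplified and illustrative, with locking omitted):

```python
class CounterAggregatorSketch:
    def __init__(self):
        self.current = 0      # still receiving live updates
        self.check_point = 0  # frozen value handed to the exporter

    def update(self, value):
        self.current += value

    def checkpoint(self):
        self.check_point = self.current
        self.current = 0

    def merge(self, other):
        # Fold another aggregator's checkpointed value into ours; `current` is
        # deliberately untouched so concurrent updates land in the next cycle.
        self.check_point += other.check_point


# Typical sequence inside a batcher's process step:
handle_agg = CounterAggregatorSketch()
batcher_agg = CounterAggregatorSketch()
handle_agg.update(3)
handle_agg.checkpoint()        # freezes 3 into handle_agg.check_point
batcher_agg.merge(handle_agg)  # batcher_agg.check_point is now 3
```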

# Collect all of the meter's metrics to be exported
self.meter.collect()
# Export the given metrics in the batcher
self.exporter.export(self.meter.batcher.check_point_set())
Member:

Is there a design concern with calling batcher methods directly for things like setting checkpoints and declaring the collection finished, but calling the more general meter for collect?

I guess to be clearer: are there operations that the general meter should perform, on top of just the batcher?

Contributor Author (lzchen):

The architecture is designed so that the meter handles the "metric-related events" such as construction, updates, and label sets, while the batcher is responsible for collection and aggregation. Yes, the meter could technically perform all the instructions handled by the batcher, but this separation of responsibilities seems like a good logical pattern. What design concerns are you thinking of specifically?
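
For context, the quoted lines come from the controller's periodic tick; below is a simplified sketch of that worker loop. The constructor shape and the finished_collection call are assumptions based on this discussion rather than verbatim PR code:

```python
import threading


class PushControllerSketch(threading.Thread):
    """A worker thread that periodically collects and exports metrics."""

    def __init__(self, meter, exporter, interval_seconds):
        super().__init__(daemon=True)
        self.meter = meter
        self.exporter = exporter
        self.interval = interval_seconds
        self.finished = threading.Event()
        self.start()

    def run(self):
        # Wake up every `interval` seconds until shutdown() sets the event.
        while not self.finished.wait(self.interval):
            self.tick()

    def tick(self):
        # Metric-related events stay with the meter; aggregation state lives in
        # the batcher, which the controller only reads in order to export.
        self.meter.collect()
        self.exporter.export(self.meter.batcher.check_point_set())
        self.meter.batcher.finished_collection()  # assumed "collection done" hook

    def shutdown(self):
        self.finished.set()
```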

aggregator.update(1.0)
label_set = {}
batch_map = {}
batch_map[(metric, "")] = (aggregator, label_set)
Member:

It looks like this block sets a very specific expectation around the format of the batch_map. In general, this feels a bit error-prone, since assigning a value requires knowing the right pair of tuples to set as the key and the value.

Is there some refactoring that could avoid direct manipulation of the batch_map, providing helper methods instead? That way there's clear documentation of the structure of the key-value pairs, or ideally one doesn't have to set the key-value pairs at all.

Contributor Author (lzchen):

Good point. I think adding a helper method to assign values to the batch map would be good.
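
Something like the following hypothetical helper would capture that suggestion, so the (metric, encoded labels) → (aggregator, label set) shape of the batch map is documented and enforced in one place (all names invented for illustration):

```python
class BatchMap:
    """Wraps the dict the batcher uses so callers never build keys by hand."""

    def __init__(self):
        self._entries = {}

    def put(self, metric, label_set, aggregator):
        # Key: (metric, encoded labels); value: (aggregator, label_set).
        self._entries[(metric, label_set.encoded)] = (aggregator, label_set)

    def get(self, metric, label_set):
        return self._entries.get((metric, label_set.encoded))

    def items(self):
        return self._entries.items()
```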

Contributor Author (lzchen):

EDIT: There seem to be some problems with circular imports (__init__.py depends on batcher.py, and batcher.py would have to depend on __init__.py for typing). Sphinx also complains, since the forward refs can't be resolved. Any ideas on how to fix this?

Member:

Where do you see the circular imports? In general this can be solved by factoring the circular component out into its own module and having the circular importees import that instead.
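
If the cycle exists only for type hints, one common workaround (a sketch assuming batcher.py needs the SDK Metric type solely for annotations; names and paths are illustrative) is to guard the import with typing.TYPE_CHECKING and use a string annotation:

```python
# e.g. in the SDK's batcher.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers, never at runtime, so importing the
    # package here cannot create an import cycle when the SDK loads.
    from opentelemetry.sdk.metrics import Metric


class UngroupedBatcher:
    def aggregator_for(self, metric: "Metric"):
        """Return the aggregator to use for this metric (body omitted)."""
```

The other route, as suggested above, is to move the shared types into their own small module that both __init__.py and batcher.py import.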

@toumorokoshi added the discussion and needs reviewers labels on Jan 1, 2020
@lzchen (Contributor Author) commented Jan 7, 2020

@toumorokoshi
metrics_example.py shows how users can create, update and export metrics. Is the example sufficient?

@lzchen (Contributor Author) commented Jan 7, 2020

I've added some more examples under examples/metrics/ with comments. It should outline the basic usages of metrics.

@mauriciovasquezbernal (Member) left a comment

This is a partial review; I haven't finished, but I don't want to risk losing my comments.

So far I am concerned about two points:

  • I don't see usage of locks / atomic types to handle concurrency.
  • Handles are not being released.

@mauriciovasquezbernal (Member) left a comment

A few nits, nothing to worry about.

I think it is a good base to move ahead with. It would be nice to have a list of tasks to tackle after this is merged, such as implementing the other aggregators.

self.tick()

def shutdown(self):
self.finished.set()

Member:

Should the controller perform a final tick() before shutting down?

Contributor Author (lzchen):

Good question. There isn't clearly defined behaviour yet for flushing metrics on application close. I think this is a discussion for the spec.
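
If the spec does end up requiring a flush on close, the change would look roughly like this sketch (mirroring the controller sketch earlier in the thread; whether to do this at all is the open question):

```python
import threading


class FlushingPushController(threading.Thread):
    """Sketch only: same loop as before, but shutdown() ticks one last time."""

    def __init__(self, meter, exporter, interval):
        super().__init__(daemon=True)
        self.meter = meter
        self.exporter = exporter
        self.interval = interval
        self.finished = threading.Event()
        self.start()

    def run(self):
        while not self.finished.wait(self.interval):
            self.tick()

    def tick(self):
        self.meter.collect()
        self.exporter.export(self.meter.batcher.check_point_set())

    def shutdown(self):
        self.finished.set()
        self.tick()  # final flush so the last partial interval is still exported
```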

@lzchen (Contributor Author) commented Feb 8, 2020

@mauriciovasquezbernal Thanks so much for reviewing! Can you approve if everything else looks okay?

@mauriciovasquezbernal (Member) left a comment

Last point: if tick() is not added per #341 (comment), then a sleep has to be added to the record example, otherwise it won't print anything. Besides that, LGTM!

@c24t (Member) left a comment

Some drive-by comments on units in the examples.

@c24t (Member) commented Feb 10, 2020

I don't want to include too many loader changes in this PR. As of now, the examples would not work because the loader doesn't seem to address this use case.

I opened lzchen#7 to fix the default meter type. A better long-term fix might be to pass both the abstract and default types to the loader.

@c24t (Member) left a comment

No more blocking comments from me, I'll follow up with a comment about my lingering design concerns.

@c24t c24t merged commit fe99dbc into open-telemetry:master Feb 11, 2020
toumorokoshi pushed a commit to toumorokoshi/opentelemetry-python that referenced this pull request Feb 17, 2020
Initial implementation of the end-to-end metrics pipeline.
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020
* feat(jaeger-exporter): adds flushing on an interval

closes open-telemetry/opentelemetry-js#340

* respond to comments

* yarn fix
Labels: discussion, metrics, needs reviewers
7 participants