stats: introduce stats client #992

Merged
merged 33 commits into from Aug 11, 2020

Contributor

@jingwei99 jingwei99 commented Jul 30, 2020

Description: Introduces a StatsClient capable of giving out Counters that, when incremented, emit stats from Envoy's internal stats engine. Also refactors the top-level platform Engine interface to make both the StreamClient and the StatsClient available.
Risk Level: Low
Testing: Pending

Co-authored-by: Mike Schore <mike.schore@gmail.com>

Contributor

@rebello95 rebello95 left a comment

Overall I like this approach! One thing I'm wondering about is if we can make elements more intuitive/user friendly in the public interface. Would it make sense to have the platform expose a strongly typed interface/type for that rather than having the user provide an array of strings?
Documentation on the expected usage would also be great.

@goaway goaway changed the title from "Jh ms/stats poc" to "stats: introduce stats client" Aug 1, 2020
@goaway goaway marked this pull request as ready for review August 1, 2020 07:35
Contributor

goaway commented Aug 3, 2020

@rebello95 I like the idea of having Element be a strong type. There are actually invalid character strings that one could pass, and we can prevent that by having Element have a conditional initializer. This would allow error discovery to happen more or less statically.

goaway and others added 11 commits August 4, 2020 18:30
Co-authored-by: Jingwei <57155915+jingwei99@users.noreply.github.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
goaway and others added 2 commits August 4, 2020 13:23
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Jingwei Hao and others added 2 commits August 5, 2020 18:25
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
) : StatsClient {

  override fun getCounter(vararg elements: Element): Counter {
    return Counter(WeakReference(engine), elements.asList())
Contributor

iOS has a strong ref below, we should keep them consistent (I think a strong ref is what you want here)

Member

I believe the weak reference is intended. We would not want to have stats spread through the application to necessarily hold the engine alive, right?

Contributor

Ah I see, this is passing a weak reference to the Counter, whereas on iOS the Counter itself makes the reference weak. Would it make sense to move WeakReference(engine) into the counter file's constructor to make it clear that it doesn't retain the engine?

Separately, I'm wondering if it makes sense to allow counters to retain the engine. Do we want the engine to stick around as long as a consumer is holding on to a counter ref?

Contributor Author

> Would it make sense to move WeakReference(engine) into the counter file's constructor to make it clear that it doesn't retain the engine?

yea

> Do we want the engine to stick around as long as a consumer is holding on to a counter ref?

probably not, counters could be referenced by feature modules of the platform (iOS/Android) code, and we probably don't want the feature modules to (indirectly) hold a strong ref to the engine

Contributor

My intuition right now is that it seems natural for engine clients to strongly retain the engine, but perhaps less so for the artifacts they produce. This has more to do with users of the library being able to implicitly manage the lifecycle of the engine than anything else. I should say that I'm not strongly convinced this is right, but it seems reasonable for now for experimentation.

Contributor

Was there a resolution here? Looks like the weak ref is still being passed from this line
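
For reference, the suggestion in this thread (moving WeakReference(engine) into the Counter's constructor) might look roughly like the sketch below. It is an illustration under assumptions, not the code that was merged: EnvoyEngine is a one-method stand-in, elements are plain strings rather than Element, the "." join is inferred from the "test.stat" assertion in the Kotlin test, and the increment signature is hypothetical.

```kotlin
import java.lang.ref.WeakReference

// Hypothetical stand-in for the engine; only the call the counter needs is modeled.
interface EnvoyEngine {
  fun recordCounter(elements: String, count: Int)
}

// The counter receives the engine itself and wraps it in a WeakReference
// internally, so this file alone shows that a counter never retains the engine.
class Counter(engine: EnvoyEngine, elements: List<String>) {
  private val engineRef = WeakReference(engine)
  private val series = elements.joinToString(".")

  // Becomes a no-op once the engine has been garbage collected.
  fun increment(count: Int = 1) {
    engineRef.get()?.recordCounter(series, count)
  }
}
```

With this shape, a caller such as getCounter would presumably pass the engine directly, and the weak hold would no longer be visible (or forgettable) at the call site.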

Member

@junr03 junr03 left a comment

Some comments, and a big question.

@@ -204,6 +204,9 @@ stats_flush_interval: {{ stats_flush_interval_seconds }}s
- safe_regex:
    google_re2: {}
    regex: '^http.dispatcher.*'
- safe_regex:
    google_re2: {}
    regex: '^client.*'
Member

I'd prefer having a more restrictive rule so that people have to think before their application level stats get emitted.

Contributor Author

what rules do you have in mind?
right now all the elements used for the stats are checked against the regex: ^[A-Za-z_]+$

Contributor

Talked with @junr03, long-term it might make sense to make people specify this in configuration, so that some thought goes into the stats they emit. For now, as a new, experimental interface, this is probably good enough to move forward.

@@ -93,6 +93,19 @@ Engine::~Engine() {
  main_thread_.join();
}

void Engine::recordCounter(std::string elements, uint64_t count) {
Member

Can we cover this in tests?

Contributor

Adding a ticket to cover testing generally at this layer, will link here.

Member

@goaway don't forget to add this ticket

Contributor

Created (but it's internal).

Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Jingwei Hao and others added 11 commits August 7, 2020 17:01
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Member

@junr03 junr03 left a comment

My big question about performance was answered, and a disclaimer was added. @goaway and @jingwei99 will follow up with @jmarantz to discuss the dynamic vs static stats path.

1 missing issue. Otherwise, LGTM.

void Engine::recordCounter(std::string elements, uint64_t count) {
  if (server_) {
    server_->dispatcher().post([this, elements, count]() -> void {
      static const std::string client = "client";

No non-POD statics (static init fiasco). Moreover, since this string is known at compile time, what you really want here is to save a StatName for "client" using either a StatNamePool or StatNameManagedStorage. You want to do that in a context object that is created once at process startup and then reused. Then you have no thread contention issues.

Contributor

@rebello95 rebello95 left a comment

Looks good overall, but some comments.

  • We seem to be missing Kotlin stats client tests
  • Can we also update documentation / tutorials which are now invalid due to the engine interface changes?
  • Would also be good to add an issue for documenting the stats interface and linking that in the PR description

Comment on lines +57 to +58
"StatsClient.kt",
"StatsClientImpl.kt",
Contributor

Shouldn't these be in stats/ which is captured below with globbing?

Contributor Author

I think these ideally live here, basically in the same level as StreamClient, as the interface Engine.kt exposes a StatsClient and a StreamClient.

Comment on lines 8 to 14
 * Element values must conform to the regex /^[A-Za-z_]+$/.
 */
class Element(val element: String) {
  init {
    if (!Pattern.compile("^[A-Za-z_]+\$").matcher(element).matches()) {
      throw IllegalArgumentException(
        "Element values must conform to the regex /^[A-Za-z_]+$/"
Contributor

The regex is hardcoded in 3 places here. Can we use a constant instead so we don't accidentally fail to change one of the usages in the future?

Contributor Author

fair
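
A minimal sketch of what the constant-based version could look like. The thread does not show how the follow-up was implemented, so the constant's name (ELEMENT_REGEX), its placement in a companion object, and the precompiled PATTERN are all assumptions made for illustration. The standard-library require call throws IllegalArgumentException, which matches the exception type in the original check.

```kotlin
import java.util.regex.Pattern

/**
 * Element values must conform to [Element.ELEMENT_REGEX].
 */
class Element(val element: String) {
  init {
    // require throws IllegalArgumentException when the condition is false.
    require(PATTERN.matcher(element).matches()) {
      "Element values must conform to the regex /$ELEMENT_REGEX/"
    }
  }

  companion object {
    // Hypothetical name; the only place the expression is written down.
    const val ELEMENT_REGEX = "^[A-Za-z_]+\$"

    // Compiled once instead of on every construction.
    private val PATTERN: Pattern = Pattern.compile(ELEMENT_REGEX)
  }
}
```

The doc comment, the check, and the error message now all derive from one definition, so changing the allowed character set is a single-line edit.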

Comment on lines +26 to +27
"StatsClient.swift",
"StatsClientImpl.swift",
Contributor

Same here regarding living in stats/

Contributor Author

same as above, StatsClient and its implementation ideally live at the same level as StreamClient given the Engine.swift interface.

Comment on lines 9 to 10
/// - parameter elements: Elements to identify a counter
/// - returns: A Counter based on the joined elements.
Contributor

Suggested change
- /// - parameter elements: Elements to identify a counter
- /// - returns: A Counter based on the joined elements.
+ /// - parameter elements: Elements to identify a counter
+ ///
+ /// - returns: A Counter based on the joined elements.

MockEnvoyEngine.onRecordCounter = nil
}

func testCounterDelegatesToEngine() throws {
Contributor

Suggested change
- func testCounterDelegatesToEngine() throws {
+ func testCounterDelegatesToEngine() {

XCTAssertEqual(actualCount, 1)
}

func testCounterDelegatesToEngineWithCount() throws {
Contributor

Suggested change
- func testCounterDelegatesToEngineWithCount() throws {
+ func testCounterDelegatesToEngineWithCount() {

XCTAssertEqual(actualCount, 5)
}

func testCounterWeaklyHoldsEngine() throws {
Contributor

Suggested change
- func testCounterWeaklyHoldsEngine() throws {
+ func testCounterWeaklyHoldsEngine() {

import Foundation
import XCTest

final class StatsClientImplTests: XCTestCase {
Contributor

Can we add these tests for Kotlin too?

Comment on lines 55 to 64
name = "engine_builder_tests",
srcs = [
"StreamClientBuilderTests.swift",
"EngineBuilderTests.swift",
],
deps = [
"//library/objective-c:envoy_engine_objc_lib",
],
)

envoy_mobile_swift_test(
Contributor

nit: can we alphabetize these test cases?

Contributor Author

thinking out loud: prolly worth adding a lint/pre-commit check for this in the long run

@jingwei99
Contributor Author

@rebello95 : for this comment, yea i missed it in my previous commit, and have a fix locally, planned to push along with other suggested changes

Jingwei Hao added 2 commits August 10, 2020 19:17
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
Signed-off-by: Jingwei Hao <jingweih@lyft.com>
assertThat(elementsCaptor.getValue()).isEqualTo("test.stat")
assertThat(countCaptor.getValue()).isEqualTo(5)
}
}
Contributor Author

tried to emulate StatsClientImplTests.testCounterWeaklyHoldsEngine, but I wasn't able to get the EnvoyEngine instance garbage collected - as calling System.gc() doesn't guarantee gc.
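
One deterministic alternative, sketched here under assumptions, is to skip System.gc() and clear the WeakReference by hand with Reference.clear(), after which get() returns null just as it would following collection. This only checks that a counter tolerates a missing engine; it does not prove that the counter holds the engine weakly. EnvoyEngine, RecordingEngine, and this Counter are simplified stand-ins, not the classes from the PR.

```kotlin
import java.lang.ref.WeakReference

// Hypothetical stand-in for the engine; only recordCounter is modeled.
interface EnvoyEngine {
  fun recordCounter(elements: String, count: Int)
}

// Simplified counter that is handed a weak reference, as in the snippet under review.
class Counter(private val engine: WeakReference<EnvoyEngine>, private val series: String) {
  fun increment(count: Int = 1) {
    engine.get()?.recordCounter(series, count)
  }
}

// Test double that records every call it receives.
class RecordingEngine : EnvoyEngine {
  val calls = mutableListOf<Pair<String, Int>>()
  override fun recordCounter(elements: String, count: Int) {
    calls.add(elements to count)
  }
}

// Returns how many calls reach the engine after the reference is cleared.
fun recordedCallsAfterClear(): Int {
  val engine = RecordingEngine()
  val ref = WeakReference<EnvoyEngine>(engine)
  val counter = Counter(ref, "test.stat")
  ref.clear() // simulates collection without relying on System.gc()
  counter.increment(5)
  return engine.calls.size
}
```

Because the reference is cleared explicitly, the result does not depend on garbage collector timing.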

@goaway goaway merged commit b9a4334 into main Aug 11, 2020
@goaway goaway deleted the jh-ms/stats-poc branch August 11, 2020 17:50
@rebello95
Contributor

@jingwei99 @goaway can we update the docs to reflect the interface changes from this PR?

@jingwei99
Contributor Author

> @jingwei99 @goaway can we update the docs to reflect the interface changes from this PR?

yup, i'll work on the docs.

@rebello95
Contributor

Thanks!
