
Migrate from okhttp to armeria #2653

Merged: 27 commits, Jul 14, 2019

Conversation

@anuraaga (Contributor) commented Jul 8, 2019

Fixes #2646

I haven't had a chance to skim through this myself yet, so there are probably silly cleanups still needed, but feel free to take an initial look too if you have time. I also haven't run the integration tests yet.

In a follow-up PR, I'll probably migrate from moshi to jackson, since we already depend on the latter through armeria / spring; that lets us remove moshi and okio for a few hundred KB of library savings. The fact that we no longer natively use okio buffers makes using moshi more complicated, and probably slower.
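(For illustration: a minimal sketch of the kind of Jackson-based parsing that could replace moshi, reading straight from a ByteBuffer via the same ByteBufferBackedInputStream that appears later in this PR's diff. The JsonBodies class and parse method are made-up names, not code from this PR.)

  import com.fasterxml.jackson.databind.JsonNode;
  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.databind.util.ByteBufferBackedInputStream;
  import java.io.IOException;
  import java.nio.ByteBuffer;

  final class JsonBodies {
    static final ObjectMapper MAPPER = new ObjectMapper();

    /** Parses a JSON response body without first copying it into a byte[]. */
    static JsonNode parse(ByteBuffer body) throws IOException {
      return MAPPER.readTree(new ByteBufferBackedInputStream(body));
    }
  }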

@anuraaga changed the title from "Migrate from okhttp to elasticsearch" to "Migrate from okhttp to armeria" on Jul 8, 2019
@codefromthecrypt (Member) commented Jul 8, 2019 via email

@codefromthecrypt (Member) commented Jul 9, 2019 via email

@anuraaga (Contributor, Author) commented Jul 9, 2019

Updated zipkin-server too, and this is getting better. One remaining WIP is setting the connect timeout, which isn't too hard but will bring back the close method on the storage, which I'd thought wasn't needed when using Armeria's default ClientFactory.
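(A sketch of what setting the connect timeout involves, assuming the Armeria 0.8x ClientFactoryBuilder API; method names are from memory and may differ by version:)

  import com.linecorp.armeria.client.ClientFactory;
  import com.linecorp.armeria.client.ClientFactoryBuilder;
  import java.time.Duration;

  // A custom factory owns connection resources, so whoever builds it must
  // close it; the shared ClientFactory.DEFAULT needs no such lifecycle.
  ClientFactory factory = new ClientFactoryBuilder()
      .connectTimeout(Duration.ofSeconds(10))
      .build();
  // ... build HTTP clients against this factory ...
  factory.close(); // this is why close() comes back on the storage component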

Also, for the record, I want to say it was quite painful updating the zipkin-server tests due to Kotlin. Not only is it unfamiliar and a big context switch, but I find it has quite strange syntax and quirks when integrating with normal Java libraries - for the life of me I couldn't get assertThat(Throwable).satisfies { checks } to even compile. So another posthumous vote against the Kotlin tests ;)

@codefromthecrypt (Member) commented Jul 9, 2019 via email

@anuraaga (Contributor, Author)

V6 integration tests are working fine now, but for some reason the Armeria client can't connect to V7. The passing build is deceptive, since the V7 tests were skipped due to the connection failure. Will look into it more.

@anuraaga (Contributor, Author)

OK, V7 should be all good too.

@codefromthecrypt (Member) left a comment

Thanks for all the hard yards here.

The main last mile will be making sure zipkin-aws elasticsearch can be refactored on top of the work here:
https://github.com/openzipkin/zipkin-aws/tree/master/autoconfigure/storage-elasticsearch-aws/src/main/java/zipkin/autoconfigure/storage/elasticsearch/aws

var client: OkHttpClient = OkHttpClient.Builder()
    .addNetworkInterceptor(BasicAuthInterceptor(ZipkinElasticsearchStorageProperties(false, 0)))
    .build()

companion object {
@codefromthecrypt (Member):

sorry.. this looks like it probably wasn't fun to figure out

}

@Override
public void close() {
  if (!shutdownClientOnClose()) return;
@codefromthecrypt (Member):

Intentional to remove the shutdownClientOnClose?

@anuraaga (Contributor, Author):

Yeah, without customization we use the default ClientFactory, and closing it is a no-op, so no need for us to worry.

* This returns a Dns provider that combines the IPv4 or IPv6 addresses from a supplied list of
* urls, provided they are all http and share the same port.
*/
final class PseudoAddressRecordSet {
@codefromthecrypt (Member):

wow, forgot how much code this was..

@@ -13,17 +13,20 @@
*/
package zipkin2.elasticsearch.internal.client;

import java.io.Closeable;
import com.fasterxml.jackson.databind.util.ByteBufferBackedInputStream;
@codefromthecrypt (Member):

Can we delay jackson to a later pull request with some of the okio utilities?

E.g. does one or the other here help?
https://github.com/square/okio/pull/346/files
https://github.com/square/okio/blob/d7fcee8db94177ac4d02e205f928d8425f352d52/okio/src/jvmTest/java/okio/NioTest.java

If not, add a TODO: to follow up and swap out okio/moshi, as we shouldn't really have multiple things doing the same job if we can avoid it.

@anuraaga (Contributor, Author):

Sure - fixed the bug in ReadBuffer and used it.

@anuraaga (Contributor, Author) left a comment

First Kotlin weirdness, now the weirdest Armeria issue ever: line/armeria#1895

I've learned my lesson not to call issues straightforward without care ;)


@codefromthecrypt (Member)

Build is green, apart from running too long, which made Travis kill the job.

@codefromthecrypt (Member)
Thanks for all the work on this, Rag.

I started ES 6.8 then ran zipkin off this branch like so:

$ curl -sSL https://jitpack.io/com/github/anuraaga/zipkin/zipkin-server/elasticsearch-armeria-SNAPSHOT/zipkin-server-elasticsearch-armeria-SNAPSHOT-exec.jar > zipkin.jar
$ STORAGE_TYPE=elasticsearch ES_HTTP_LOGGING=BODY java -jar zipkin.jar

I then hit the health check endpoint like this:

$ curl -s localhost:9411/health

The health check hasn't returned a value (it hung). I looked at the server logs and can see the requests being made (though it isn't unrolling the gzipped response into the console, which IIRC the old code did):

2019-07-12 09:31:11.045  INFO 34688 --- [-worker-nio-2-5] c.l.a.c.l.LoggingClient                  : Request: {startTime=2019-07-12T00:31:11.035Z(1562891471035865), length=0B, duration=7805µs(7805101ns), scheme=none+h1c, headers=[:method=GET, :path=/_cluster/health/zipkin*span-*, :authority=localhost:9200, :scheme=http, accept-encoding=gzip,deflate, user-agent=armeria/0.88.0]}
2019-07-12 09:31:11.071  INFO 34688 --- [-worker-nio-2-5] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-12T00:31:11.070Z(1562891471070979), length=234B, duration=372µs(372805ns), headers=[:status=200, content-type=application/json; charset=UTF-8, content-encoding=gzip, content-length=234]}
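(Possibly relevant to the gzip note above: a sketch of decorating the client so responses are decompressed before they reach the logger, assuming Armeria 0.8x's HttpDecodingClient; decorator names and ordering are from memory, not verified against this PR:)

  import com.linecorp.armeria.client.HttpClient;
  import com.linecorp.armeria.client.HttpClientBuilder;
  import com.linecorp.armeria.client.encoding.HttpDecodingClient;
  import com.linecorp.armeria.client.logging.LoggingClient;

  HttpClient client = new HttpClientBuilder("http://localhost:9200")
      .decorator(HttpDecodingClient.newDecorator()) // inner: gunzips response bodies
      .decorator(LoggingClient.newDecorator())      // added last, so it logs the decoded response
      .build();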

@codefromthecrypt (Member)

I added --no-cache to the docker-compose instructions just in case.. I will do it with the exact commit id if we think somehow jitpack is stale..

@codefromthecrypt (Member)

Sorry, will check!

@anuraaga (Contributor, Author) commented Jul 13, 2019

Thanks :) FWIW, I ran a simple stress test via ITElasticStorage on master (hence log4j2.properties is different from the normal server) and compared; the logs look closer to those ones, which must be OkHttp logs:

INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (3970ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (3977ms, 248851-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4031ms, 248631-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (3956ms, 248455-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4043ms, 248522-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4064ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4028ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4072ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4138ms, 249038-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4297ms, 248647-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4180ms, 248849-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4334ms, 248827-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4341ms, 248816-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://localhost:32800/_bulk?refresh=wait_for (4246ms, 248712-byte body)
Jul 13, 2019 8:21:42 PM okhttp3.internal.platform.Platform log
INFO: --> POST http://localhost:32800/_bulk?refresh=wait_for (-1-byte body)

@codefromthecrypt (Member)

The problem was that I looked at the stats, and 100% of the bulk requests failed. I'll lock in the commit and try again though.

@codefromthecrypt (Member)

OK, I rebuilt using the commit tag:

--- a/zipkin/Dockerfile
+++ b/zipkin/Dockerfile
@@ -16,7 +16,7 @@ FROM alpine
 
 ENV USER anuraaga
 ENV BRANCH elasticsearch-armeria
-ENV VERSION SNAPSHOT
+ENV VERSION 76263f9bd0-1
 
 # Add environment settings for supported storage types
 COPY zipkin/ /zipkin/

From here: https://jitpack.io/com/github/anuraaga/zipkin/elasticsearch-armeria-76263f9bd0-1/build.log

I'll do the test again now..

@codefromthecrypt (Member)

Definitely I had an old version before.. maybe because I didn't do --no-cache at first, but anyway I will only test with commit IDs now to reduce distraction.

I'm sure you are right, by the way: they were okhttp logs!

The new logs definitely look different:

zipkin                      | 2019-07-13 11:34:10.080  INFO 1 --- [orker-epoll-4-6] c.l.a.c.l.LoggingClient                  : Request: {startTime=2019-07-13T11:34:10.074Z(1563017650074736), length=3739KiB(3829669B), duration=5635µs(5635700ns), scheme=none+h1c, headers=[]}
zipkin                      | 2019-07-13 11:34:10.313  INFO 1 --- [orker-epoll-4-7] c.l.a.c.l.LoggingClient                  : Request: {startTime=2019-07-13T11:34:10.311Z(1563017650311478), length=3397KiB(3479449B), duration=1783µs(1783500ns), scheme=none+h1c, headers=[]}
zipkin                      | 2019-07-13 11:34:12.416  WARN 1 --- [orker-epoll-4-1] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:12.416Z(1563017652416394), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:13.197  WARN 1 --- [orker-epoll-4-8] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:13.197Z(1563017653197137), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:15.379  WARN 1 --- [orker-epoll-4-2] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:15.378Z(1563017655378782), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:17.530  WARN 1 --- [orker-epoll-4-3] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:17.530Z(1563017657530053), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:18.613  WARN 1 --- [orker-epoll-4-4] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:18.613Z(1563017658613401), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:18.815  WARN 1 --- [orker-epoll-4-5] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:18.815Z(1563017658815107), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:20.228  WARN 1 --- [orker-epoll-4-6] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:20.228Z(1563017660228235), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 
zipkin                      | 2019-07-13 11:34:20.304  WARN 1 --- [orker-epoll-4-7] c.l.a.c.l.LoggingClient                  : Response: {startTime=2019-07-13T11:34:20.304Z(1563017660304298), length=0B, duration=0ns, cause=com.linecorp.armeria.client.ResponseTimeoutException, headers=[]}
zipkin                      | 
zipkin                      | com.linecorp.armeria.client.ResponseTimeoutException: null
zipkin                      | 

@codefromthecrypt (Member)

So basically it is still dropping almost every span, before and after, I guess..

@codefromthecrypt (Member)

$ curl -s localhost:9411/metrics|jq .
{
  "counter.zipkin_collector.messages.http": 46,
  "counter.zipkin_collector.spans_dropped.http": 253822,
  "gauge.zipkin_collector.message_bytes.http": 1857454,
  "counter.zipkin_collector.bytes.http": 92070197,
  "gauge.zipkin_collector.message_spans.http": 5877,
  "counter.zipkin_collector.spans.http": 258104,
  "counter.zipkin_collector.messages_dropped.http": 0
}

@codefromthecrypt (Member)

I'm running against the last release to check also..

@codefromthecrypt (Member)

To make docker not fall over, I needed to change the sender configuration to send less data at a time. This applies both before and after the change. I just got a clean run of "before".. will do one of the "after" now.

diff --git a/webmvc4/pom.xml b/webmvc4/pom.xml
index af9f420..79d2a2c 100755
--- a/webmvc4/pom.xml
+++ b/webmvc4/pom.xml
@@ -18,7 +18,7 @@
 
     <spring.version>4.3.24.RELEASE</spring.version>
     <log4j.version>2.11.2</log4j.version>
-    <brave.version>5.6.6</brave.version>
+    <brave.version>5.6.7</brave.version>
   </properties>
 
   <dependencyManagement>
diff --git a/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java b/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
index 095c043..38722b2 100644
--- a/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
+++ b/webmvc4/src/main/java/brave/webmvc/TracingConfiguration.java
@@ -20,6 +20,7 @@ import org.springframework.context.annotation.Import;
 import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
 import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;
 import zipkin2.Span;
+import zipkin2.codec.Encoding;
 import zipkin2.reporter.AsyncReporter;
 import zipkin2.reporter.Sender;
 import zipkin2.reporter.okhttp3.OkHttpSender;
@@ -38,7 +39,10 @@ public class TracingConfiguration extends WebMvcConfigurerAdapter {
 
   /** Configuration for how to send spans to Zipkin */
   @Bean Sender sender() {
-    return OkHttpSender.create("http://127.0.0.1:9411/api/v2/spans");
+    return OkHttpSender.newBuilder()
+        .endpoint("http://127.0.0.1:9411/api/v2/spans")
+        .messageMaxBytes(4096)
+        .encoding(Encoding.PROTO3).build();
   }
 
   /** Configuration for how to buffer spans into messages for Zipkin */

@anuraaga (Contributor, Author)

Tweaked toString to get this now:

{"status":"UP","zipkin":{"status":"UP","details":{"ElasticsearchStorage{clientCustomizer=[TimeoutCustomizer{timeout=10000ms}, LoggingCustomizer{httpLogging=BODY}], clientFactoryCustomizer=[TimeoutCustomizer{timeout=10000ms}], hostsSupplier=[http://localhost:9200], pipeline=null, flushOnWrites=false, maxRequests=64, strictTraceId=true, searchEnabled=true, autocompleteKeys=[], autocompleteTtl=3600000, autocompleteCardinality=20000, indexShards=5, indexReplicas=1, indexNameFormatter=IndexNameFormatter{index=zipkin, dateSeparator=-, dateFormat=yyyy-MM-dd}, namesLookback=86400000}":{"status":"UP"}}}}

@codefromthecrypt (Member)

Cool... I did a run with the prior commit, but metrics weren't reported at all for spans. Some unrelated glitch, I'm sure..

curl -s localhost:9411/metrics|jq .
{
  "counter.zipkin_collector.messages.http": 17424,
  "counter.zipkin_collector.spans_dropped.http": 0,
  "gauge.zipkin_collector.message_bytes.http": 2736,
  "counter.zipkin_collector.bytes.http": 69106481,
  "gauge.zipkin_collector.message_spans.http": 0,
  "counter.zipkin_collector.spans.http": 0,
  "counter.zipkin_collector.messages_dropped.http": 0
}

Anyway, I'm running off ENV VERSION c1d3eac893-1 and will try again

@codefromthecrypt (Member)

Metrics aren't coming through, but it is only the span count.. so weird.

@codefromthecrypt (Member)

I will build locally, as I can't easily see why the metrics are zero for spans. Debugging the client shows the HTTP response was 202.

@codefromthecrypt (Member)

The problem in span stats is kind of making it hard for me to tell what's going on. It is possibly something subtle from the bump to the latest micrometer, but it is so weird. E.g. I can manually send traffic, and a few times it will increase the number, but after a while it doesn't. Meanwhile the other stats seem fine.

@codefromthecrypt (Member)

[Screenshot, 2019-07-13 at 9:41 PM]

OK, something's odd.

@codefromthecrypt (Member)

The above has to do with proto encoding. I'm switching tests back to JSON, just with smaller messages, until it is figured out.

@codefromthecrypt (Member)

OK, sorry about all the chatter. A successful test scenario where any load occurs requires storage throttling, as we know ES falls over, especially on localhost.

E.g. if running directly, like this:

$ STORAGE_THROTTLE_ENABLED=true STORAGE_TYPE=elasticsearch java -jar ./zipkin-server/target/zipkin-server-*exec.jar

Then: something unrelated to this PR is busted in proto3 encoding. To avoid it, don't use it :P E.g. use the following sender setup in brave-webmvc-example. It is important to override the max bytes to something less than the default 5MiB if using a laptop etc., but I don't know what the right value is.

  /** Configuration for how to send spans to Zipkin */
  @Bean Sender sender() {
    return OkHttpSender.newBuilder()
        .endpoint("http://127.0.0.1:9411/api/v2/spans")
        .messageMaxBytes(4096)
        .build();
  }

Anyway, with storage throttling enabled and the above, running directly off this branch, I got a clean bench against brave-webmvc-example: zero spans dropped. I haven't tried to set this up in docker yet, but I don't have it in me to do that tonight :P

@anuraaga marked this pull request as ready for review July 13, 2019 14:01
@anuraaga (Contributor, Author)

Well, that's enough for me to move this off of draft :)

@codefromthecrypt (Member) left a comment

This was a great amount of work, thank you! While not critical for this PR, I want to look into the self-tracing and see why it isn't working. Otherwise, when you feel ready to merge, go ahead!

@@ -85,7 +85,7 @@ public ObjectNode fetchMetricsFromMicrometer() {
 // Delegates the health endpoint from the Actuator to the root context path and can be deprecated
 // in future in favour of Actuator endpoints
 @Get("/health")
-public HttpResponse getHealth() throws JsonProcessingException {
+public AggregatedHttpResponse getHealth() throws JsonProcessingException {
@codefromthecrypt (Member):

Side q: will any of the other read methods end up blocking the event loop? E.g. getTraces etc.?

@anuraaga (Contributor, Author):

Nope, they all return AggregatedHttpResponse, so they don't run on the event loop. Though I'm definitely interested in eventually migrating them to use enqueue instead of execute to be fully async :)
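(For reference, a sketch of the two execution styles on zipkin2.Call referred to here; the spanStore and queryRequest variables are illustrative:)

  import java.util.List;
  import zipkin2.Call;
  import zipkin2.Callback;
  import zipkin2.Span;

  Call<List<List<Span>>> call = spanStore.getTraces(queryRequest);

  // Blocking style (what the handlers do today, off the event loop):
  List<List<Span>> traces = call.execute();

  // Async style (the eventual goal mentioned above). A Call is single-use,
  // hence the clone() before executing it a second way:
  call.clone().enqueue(new Callback<List<List<Span>>>() {
    @Override public void onSuccess(List<List<Span>> value) { /* render the traces */ }
    @Override public void onError(Throwable t) { /* surface the failure */ }
  });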

zipkin-storage/elasticsearch/README.md (comment thread resolved)
* Customizes the {@link HttpClientBuilder} used when connecting to ElasticSearch. Mostly for
* testing.
*/
public abstract Builder clientCustomizer(Consumer<HttpClientBuilder> clientCustomizer);
@codefromthecrypt (Member):

If only for testing, we should make it package-private like the above. The more we expose, the more we break on the next refactor :P

@codefromthecrypt (Member):

I suspect one or two of these are exposed for zipkin-aws. If so, a comment under the javadoc saying literally what is using it will keep us from accidentally breaking it.
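(A hypothetical illustration of why zipkin-aws needs the hook: it has to decorate the underlying client, e.g. to sign requests. AwsSignatureDecorator is a made-up stand-in; only the clientCustomizer entry point comes from this PR:)

  builder.clientCustomizer(clientBuilder ->
      // Decorate the HTTP client so each request is signed before being sent.
      clientBuilder.decorator(AwsSignatureDecorator.newDecorator()));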

@codefromthecrypt (Member)

Self-tracing seems to have been busted prior to this. We can follow up on the small things in future pull requests. Great job @anuraaga!

@codefromthecrypt codefromthecrypt merged commit fc8cc19 into openzipkin:master Jul 14, 2019
@codefromthecrypt (Member)

Self-tracing follow-up: #2663

@codefromthecrypt (Member) commented Jul 15, 2019 via email

trustin pushed a commit to line/armeria that referenced this pull request on Jul 18, 2019, with the message:

"I ran into this while debugging why `awaitInitialEndpoints` in openzipkin/zipkin#2653 often takes seconds to resolve `localhost`. I don't think I found the real culprit yet, but it does look like when closing and re-registering a health checked endpoint group, health checks from the closed one will still continue to be sent forever."

abesto pushed a commit to abesto/zipkin that referenced this pull request on Sep 10, 2019.
fmguerreiro pushed a commit to fmguerreiro/armeria that referenced this pull request on Sep 19, 2020 (same commit message as above).
Successfully merging this pull request may close these issues:

Consider what to do about OkHttp switching to Kotlin