Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase grpc retry to 1000 in xdock #1419

Merged

Conversation

pavolloffay
Copy link
Member

Related to d8ab694#r32679636

Signed-off-by: Pavol Loffay ploffay@redhat.com

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
@ghost ghost assigned pavolloffay Mar 11, 2019
@ghost ghost added the review label Mar 11, 2019
pavolloffay referenced this pull request Mar 11, 2019
* Make grpc reporter default and add retry

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Polish

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Fix port

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Polish

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Use higher retry

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Increase retry to 100

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
@codecov
Copy link

codecov bot commented Mar 11, 2019

Codecov Report

Merging #1419 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1419   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files         164     164           
  Lines        7502    7502           
======================================
  Hits         7502    7502

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85f9ea9...ca79b25. Read the comment docs.

@objectiser
Copy link
Contributor

Short term solution, as if still fails, it would take a long time to report - but useful to see if this makes things more stable.

@pavolloffay
Copy link
Member Author

Yep this is our goal.

The current configuration waits 1000*50ms with 10% jitter so around 5s -+10%.

@pavolloffay pavolloffay merged commit 91d9f95 into jaegertracing:master Mar 11, 2019
@ghost ghost removed the review label Mar 11, 2019
@yurishkuro
Copy link
Member

Restarting the build in jaegertracing/jaeger-client-java#602

@yurishkuro
Copy link
Member

The build still failed.

@pavolloffay
Copy link
Member Author

This is unfortunate. We could go even higher to 10000.

Isn't there problem with the PR itself. It failed on

 [endtoend] test_driver→ (services=java-https testdriver=test_driver) ⇒ Fail: could not retrieve traces from query service

@yurishkuro
Copy link
Member

It's possible, I didn't have time to investigate. I just know that PRs in many client libs have been failing recently on crossdock.

@iori-yja
Copy link

I got the reason. I remembered that once I had trouble with nginx which caches the name resolve result forever by default, so I wrote a workaround.
My hypothesis is that:

  1. jaeger-collector dies until the database comes up.
  2. docker restarts collector and occasionally it changes the IP.
  3. nginx caches old IP so it shows Bad Gateway.

I wrote:

  1. loop over wget collector's healthcheck before nginx's start-up.
  2. nginx exposes :8080 port plainly as healthcheck.
  3. crossdock waits for nginx.

On one of my environments that tends to fail (my other environments tend to succeed) could build successfully for several times.
I will push it to see it works for Travis, but the change isn't related to my PR. Should I revert it after seeing the result?

@iori-yja
Copy link

iori-yja commented Mar 12, 2019

@yurishkuro
Copy link
Member

Should I revert it after seeing the result?

@iori-yja what are you referring to?

@iori-yja
Copy link

@yurishkuro This commit is intended to work around the behavior and a little redundant. If the same problem happens in other PRs in client libs, possibly fix it in jaeger repository should be better (adding wait_for in crossdock's docker-compose, for example).

@yurishkuro
Copy link
Member

I think the Java PR may be at fault for the failure itself, e.g. a Go client PR was successful (jaegertracing/jaeger-client-go#379).

@iori-yja
Copy link

OK, it's figured out. The fault is in my change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants