No response body from /metrics with multiple servers #151
Hi @ahmgeek. We haven't seen this issue before. While JRuby is supported, we're not really running JRuby in production, so we don't have much experience with it, unfortunately.
hey @dmagliola, we are already using
Hey @dmagliola,
Is this meant to be
I see it rescued here
So I'm not sure if it should be rescued somewhere else. I don't think it should be rescued, but for some reason, when it raises the error a couple of times, the
client_ruby/lib/prometheus/client.rb Lines 10 to 16 in b3bbe23
Is this thread safe? Notice
Not really sure how to answer that. It's not meant to be rescued within the client itself: you'll get this error if you try to register the same metric twice. Adding metrics is thread-safe; however, if multiple threads add the same metric, you'll get this error. If you're going to have multiple threads declaring metrics (adding them to the registry), you may need to either synchronize the "check whether it's already there, and add if not" step, or rescue the exception and get the already-registered metric from the registry. It's a bit hard to say what's best without knowing how your code is doing this.
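The two approaches suggested here (synchronize the check-and-add, or rescue and fetch) can be sketched in plain Ruby. `TinyRegistry` and its `AlreadyRegisteredError` are hypothetical stand-ins that only mirror the shape of `Prometheus::Client::Registry`, so the sketch runs without the gem:

```ruby
# Hypothetical stand-in mirroring Prometheus::Client::Registry's shape:
# register raises on duplicates, like the gem's AlreadyRegisteredError.
class TinyRegistry
  AlreadyRegisteredError = Class.new(StandardError)

  def initialize
    @metrics = {}
    @mutex = Mutex.new
  end

  def register(name, metric)
    @mutex.synchronize do
      raise AlreadyRegisteredError, name.to_s if @metrics.key?(name)
      @metrics[name] = metric
    end
  end

  def get(name)
    @mutex.synchronize { @metrics[name] }
  end

  # The "rescue and fetch" pattern: safe to call from many threads;
  # on a duplicate registration, return the already-registered metric.
  def register_or_get(name, metric)
    register(name, metric)
    metric
  rescue AlreadyRegisteredError
    get(name)
  end
end

registry = TinyRegistry.new
results = 8.times.map do
  Thread.new { registry.register_or_get(:http_requests, Object.new) }
end.map(&:value)

# Every thread ends up holding the same metric object.
puts results.uniq.size
```

Whichever variant you pick, the key point is that the check and the add must happen under one lock, or the duplicate-registration error must be treated as a normal "already there" signal.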
No, you are right. That is very much not thread safe. Could you try temporarily adding a mutex, in your code, around the code that accesses this method, to see if that solves the problem?
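A minimal sketch of the suggested fix, assuming the problem is the unsynchronized `@registry ||=` memoization referenced above (`FakeClient` is a hypothetical stand-in, not a patch to the gem): wrap the memoized accessor in a `Mutex` so two threads can't both build a registry.

```ruby
# Hypothetical stand-in for the memoized Prometheus::Client.registry
# accessor, with the mutex the comment above suggests adding.
module FakeClient
  REGISTRY_MUTEX = Mutex.new

  def self.registry
    # Without the mutex, two threads can both see @registry as nil
    # and each build their own registry; with it, only one does.
    REGISTRY_MUTEX.synchronize { @registry ||= Object.new }
  end
end

a = nil
b = nil
t1 = Thread.new { a = FakeClient.registry }
t2 = Thread.new { b = FakeClient.registry }
[t1, t2].each(&:join)

# Both threads observe the same registry instance.
puts a.equal?(b)
```

On MRI the GIL often hides this race, which would be consistent with it only surfacing under JRuby's real parallelism.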
@dmagliola Thanks for the response. I'm doing pretty much nothing within my code, just using the collector and the exporter as they are:

```ruby
require "prometheus/middleware/collector"
require "prometheus/middleware/exporter"

module Sinatra
  class Api < Base
    # Used for Prometheus
    use Rack::Deflater
    use Prometheus::Middleware::Collector
    use Prometheus::Middleware::Exporter

    configure do
      set :raise_errors, true
    end
  end
end
```

However, I have the issues because of JRuby 😄 I figured there's a race condition or something, because the errors look like a couple of threads are trying to register the same metric, and then it fails to continue for some reason from inside the collector. I am trying to reproduce this locally with tests and then send a PR to fix it. Will let you know the outcome soonish.
Turned out there's a raised error, but it's swallowed by Rack's
I was having a related issue. We're running inside an AWS Fargate container, with Puma. Our initialization code was some code I found somewhere, and it looked like this:
Notice the 3 lines I commented out for deleting the old files. I believe this was the cause of the problem. I understand removing old files if you're in a persistent scenario with a single-threaded startup. But I watched it producing files, then suddenly they all disappeared and /metrics stopped returning anything useful. I believe it was Puma finally deciding to start another thread, but I'm not sure how to prove it. No idea why I felt I was supposed to delete old files, but as I shouldn't be bouncing the app willy-nilly without bouncing the entire container, I don't feel it's necessary.
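The snippet this comment refers to is not shown, but the failure mode it describes can be illustrated with a runnable sketch (file names and contents here are hypothetical, not the prometheus-client on-disk format). With a file-backed store, each process appends its counters to per-process files under a shared directory; if a late-booting worker runs a "delete old files" cleanup, it wipes the live files of workers already serving traffic, and /metrics goes blank:

```ruby
require "tmpdir"
require "fileutils"

dir = Dir.mktmpdir("prom-demo")

# Worker 1 boots first and writes its (hypothetical) metric file.
File.write(File.join(dir, "counter_worker1.bin"), "requests_total 42")

# Worker 2 boots later and runs the "clean up old files" initializer --
# roughly what the removed lines were doing at startup.
Dir[File.join(dir, "*.bin")].each { |f| File.unlink(f) }

# Worker 2 now writes its own file; worker 1's live data is gone.
File.write(File.join(dir, "counter_worker2.bin"), "requests_total 1")

remaining = Dir[File.join(dir, "*.bin")].map { |f| File.basename(f) }.sort
puts remaining.inspect
```

Deleting stale files is only safe when it happens exactly once, before any worker has started writing; doing it per-worker races against workers that are already up.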
I'm going to close this on the basis that it's pretty old and we have fixed a bunch of threading issues in the meantime. I'm slightly uneasy at the idea that we may have something outstanding that only gets surfaced by the parallelism of JRuby, but even if that's the case then this lead has long since gone cold, and we need a fresh report that we can chase up. |
Technology
We use JRuby-9.2.4.1 and the Trinidad server.
Setup
We have 2 servers running the application with the same Ruby version and the same Trinidad server. In front of them, HAProxy acts as a load balancer.
The issue
When you call the /metrics page, HAProxy forwards the request to an available server; if both are available, the selection is a bit random. The issue is: on one server you get the response and can see the metrics in OpenMetrics format; on the other, you get a 200 status code back but no body. Just a blank page.
curl-ing the servers directly: one works fine and the response body has the metrics; the other also responds fine, but the body is empty. Logs are also fine.
I didn't really dig deep into the application to see what's wrong (maybe a threading issue with the data stores).
But I wanted to ask you first; maybe you've had this issue before or have some hints.