
Add MetricsFormatter to allow developers to collect metrics in their own way #788

Closed
wants to merge 4 commits into from

Conversation

tjiuming

Add MetricsFormatter to allow developers to collect metrics in their own way.
For more context, you can see #782

@tjiuming tjiuming changed the title from "Add MetricsFormatter to allow develops collect metrics by their own way." to "Add MetricsFormatter to allow develops could collect metrics by their own way." May 24, 2022
@dhoard
Collaborator

dhoard commented May 24, 2022

@tjiuming as a test, for fun... here is an example initial version of a Collector callback visitor implementation:

https://github.com/dhoard/client_java/tree/WIP-collector-callback-support https://github.com/dhoard/client_java/tree/WIP-collector-registry-visitor-support

While I like the callback visitor pattern, etc... I think the paradigm shift would be hard to get adopted.

Update: branch name

@tjiuming
Author

@dhoard Do you mean that this PR would be hard to get adopted?
I believe it's an important feature; could you please give some suggestions that would help get this PR merged?

@fstab
Member

fstab commented May 24, 2022

@tjiuming I am struggling to understand what the feature is about. The description is just a reference to #782, so is this about a performance improvement?

@dhoard
Collaborator

dhoard commented May 24, 2022

@dhoard Do you mean that this PR would be hard to get adopted? I believe it's an important feature; could you please give some suggestions that would help get this PR merged?

This is just my opinion given the amount of work that I feel would be required to implement it correctly.

Implement your solution and provide benchmarks that prove your solution increases performance/decreases memory usage (GCs), etc. (which it may well do.)

@tjiuming
Author

@dhoard This PR provides a facade; it's an extension point. I don't think we need to worry about a custom implementation's performance.

@dhoard
Collaborator

dhoard commented May 27, 2022

@tjiuming just pushed some changes to my branch https://github.com/dhoard/client_java/tree/WIP-collector-callback-support https://github.com/dhoard/client_java/tree/WIP-collector-registry-visitor-support that simulate what you are trying to do (at least the core changes).

Based on my profiling with YourKit, the performance isn't as good when using a "TextFormat" class similar to the current one.

Update: branch name

@tjiuming
Author

tjiuming commented May 27, 2022

@dhoard you can try #782 (the code is too messy and needs to be refactored); there are some optimization methods:

  1. Use Unsafe to avoid generating String objects in places like doubleToString
  2. Use Netty's CompositeDirectByteBuf to avoid buffer expansions
  3. When exposing metrics, we can also use Jetty and avoid a memory copy by using its HttpOutput#write(java.nio.ByteBuffer)
  4. When writing an extraLabelName and extraLabelValue, we can use write(List<String> labelNames, String extraLabelName, List<String> labelValues, double extraLabelValue) to avoid allocating a new ArrayList() (see the sketch after this list)
  5. In TextFormat#write004 there is a local variable omFamilies; we could also write it to a subMetricWrite and, finally, after all metrics are collected, append subMetricWrite directly to mainMetricWrite's tail.

Each of the above methods may bring only a small performance improvement, but together they are very useful for systems which have many meters.
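For illustration only, here is a minimal sketch of idea 4, assuming a hypothetical SampleWriter helper (names and signatures are illustrative, not the PR's API; the extra label value is taken as a String for simplicity, and label-value escaping is omitted):

    import java.io.IOException;
    import java.io.Writer;
    import java.util.List;

    // Hypothetical sketch of idea 4 above: pass the extra label name/value alongside
    // the existing lists instead of copying everything into a new ArrayList per sample.
    final class SampleWriter {
        static void writeLabels(Writer writer,
                                List<String> labelNames, List<String> labelValues,
                                String extraLabelName, String extraLabelValue) throws IOException {
            writer.write('{');
            for (int i = 0; i < labelNames.size(); i++) {
                if (i > 0) {
                    writer.write(',');
                }
                writer.write(labelNames.get(i));
                writer.write("=\"");
                writer.write(labelValues.get(i));   // note: escaping omitted in this sketch
                writer.write('"');
            }
            if (extraLabelName != null) {
                if (!labelNames.isEmpty()) {
                    writer.write(',');
                }
                writer.write(extraLabelName);
                writer.write("=\"");
                writer.write(extraLabelValue);
                writer.write('"');
            }
            writer.write('}');
        }
    }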

@dhoard
Collaborator

dhoard commented May 27, 2022

@tjiuming you need to close one of the PRs. It's not clear which branch of code you are actually working with.

@tjiuming
Author

@tjiuming you need to close one of the PRs. It's not clear which branch of code you are actually working with.

I closed that PR. Please help approve this PR.

@dhoard
Collaborator

dhoard commented May 31, 2022

@tjiuming In my opinion (not an official maintainer)... You need to provide reproducible evidence that the PR provides value.

In my preliminary tests of the PR (ignoring the actual code to write the metrics, since this is what you have done in MetricsFormatterTest)...

  1. The new collection method takes longer to collect metrics
  2. GC count has increased
  3. Average per GC time has decreased
  4. Overall GC time (GC count * average GC time) has increased.

The stated goal of the original PR was to decrease the creation of Sample objects... but I don't see that as being an issue based on profiling.

The stated goal of this PR is to "allow developers to collect metrics in their own way." Given that the Enumeration<Collector.MetricFamilySamples> lazily collects metrics, I feel this can be accomplished without any changes.
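For illustration, a minimal sketch of that approach, collecting from the existing lazily-evaluated Enumeration without any library changes (the class and method names here are made up for the example):

    import io.prometheus.client.Collector;
    import io.prometheus.client.CollectorRegistry;
    import java.util.Enumeration;

    // Sketch only: iterate the lazily-evaluated Enumeration from the default registry
    // and handle each sample however the application wants.
    public class CustomCollection {
        public static void collectMyOwnWay() {
            Enumeration<Collector.MetricFamilySamples> families =
                    CollectorRegistry.defaultRegistry.metricFamilySamples();
            while (families.hasMoreElements()) {
                Collector.MetricFamilySamples family = families.nextElement();
                for (Collector.MetricFamilySamples.Sample sample : family.samples) {
                    // write sample.name, sample.labelNames, sample.labelValues, sample.value
                    // to whatever sink the application prefers
                }
            }
        }
    }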

@tjiuming
Author

tjiuming commented Jun 7, 2022

@tjiuming I am struggling to understand what the feature is about. The description is just a reference to #782, so is this about a performance improvement?

I want to contribute the facade to Prometheus so that developers can implement it in their own projects.

@tjiuming
Author

tjiuming commented Jun 7, 2022

@dhoard @fstab I pushed some commits to the branch https://github.com/tjiuming/client_java/tree/dev/collect_performance; you can run io.prometheus.client.exporter.common.CollectPerformanceTest locally and use jconsole to profile - heap usage and CPU usage are different.

@tjiuming
Author

@dhoard @fstab could you please help review?

@asafm

asafm commented Jun 29, 2022

Hi @dhoard ,

First, thanks for taking the time to read and experiment with this PR.

I think I can contribute a bit about the motivation for this PR.

When an application that uses the Prometheus client library for defining and exposing metrics has a very large number of time series (metric + labels), and by that I mean millions, every scrape of the /metrics endpoint generates millions of Sample objects. This has a hefty price in terms of memory allocations, which eventually leads to longer Garbage Collection cycles: more CPU time dedicated to GC and more expensive stop-the-world pauses, meaning more latency.

Applications that only generate thousands of samples per scrape wouldn't really care/notice. Those at the other end of the spectrum are heavily affected. One example is Apache Pulsar, a messaging system similar to Apache Kafka. As opposed to Kafka, a broker can support up to 1 million topics, each with its own producers and consumers. Each topic has its own metrics, so you can easily reach millions of samples per scrape.
Since Pulsar is an infrastructure component, it is very sensitive to memory, CPU, and the latency caused by GCs.

This PR takes a stab at reducing memory consumption and memory allocations by introducing an additional mechanism for collecting the samples, based on the visitor pattern.

A Prometheus Collector would have an additional method for obtaining the samples:

void collect(SamplesVisitor samplesVisitor) 

and SamplesVisitor would have different methods for visiting:

   counterStart(name, labels, description);
   counterSample(name, labels, value)
   histogramStart(name, labels, description);
   ...

This allows passing the sample information without any memory allocations.
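A rough sketch of that idea in Java (all names follow the pseudocode above and are illustrative, not the PR's actual interfaces):

    import java.util.List;

    // Hedged sketch of the visitor-based collection API described above.
    // Names and signatures are illustrative only.
    interface SamplesVisitor {
        void counterStart(String name, List<String> labelNames, String description);
        void counterSample(String name, List<String> labelValues, double value);
        void histogramStart(String name, List<String> labelNames, String description);
        // ... gauges, histogram buckets, summary quantiles, etc.
    }

    abstract class VisitableCollector {
        // The existing eager collect() stays as-is; this variant streams samples
        // straight to the visitor without allocating MetricFamilySamples/Sample objects.
        public abstract void collect(SamplesVisitor samplesVisitor);
    }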

Apache Pulsar in this example would implement its own SamplesVisitor. In our case we write to multiple ByteBufs (from Netty), which are located off-heap (thus no GC), and flush them in a certain order to the HTTP OutputStream.

That's the gist of the idea.

I do believe we could create a DevNullSamplesObserver and run it on 1M and 5M sample scrapes, comparing the amount of garbage-collectible memory allocated as well as CPU time, but I'm not sure it's required for this case.

I haven't used the exact implementation names and design of @tjiuming , because I wanted to convey the idea first.

What do you think?

@dhoard
Collaborator

dhoard commented Jun 30, 2022

@asafm Based on the initial description provided by @tjiuming, as well as your description, I am in alignment on the purpose of the change.

My branch https://github.com/dhoard/client_java/tree/WIP-collector-registry-visitor-support contains my initial implementation of the change (as you described.) I personally like the approach.

The 2 PRs (as presented), in my opinion, are disjointed (unclear/messy) and provide no basis (profiling, proof, testing, etc.) that it's an improvement.

My basic profiling of the change using YourKit (ignoring the actual Visitor implementation) doesn't show a remarkable improvement on my machine using a JUnit test (https://github.com/dhoard/client_java/blob/WIP-collector-registry-visitor-support/simpleclient/src/test/java/io/prometheus/client/CollectorRegistryTest2.java) - real-world usage would be better.

I'm not a maintainer, so other than my own interest/curiosity, my opinions are just that - one person's opinions.

@fstab @tomwilkie @brian-brazil and others would need to weigh in on this alternate approach to collecting metrics once @tjiuming and you can provide the necessary proof that it's a good, solid change. This may or may not be a direction they want to go with the library - I can't speak for them.

@brian-brazil
Contributor

My basic profiling of the change using YourKit (ignoring the actual Visitor implementation) doesn't show a remarkable improvement on my machine using a JUnit test

Java GC is pretty smart when it comes to very short-lived garbage, so I'm not surprised. Even then, we're probably only talking a few hundred MB, which isn't much given the use case. Considering how often this is called, this sort of micro-optimisation is unlikely to make sense in broader terms when weighed against the additional API complexity. The API we have already provides a generic way to allow for other output formats.

More generally a single target exposing millions of samples is far beyond what is typical, and will likely cause cardinality problems on the Prometheus end, if you can even manage to consistently scrape it successfully once per scrape interval.

I would suggest you look at solutions that don't involve trying to expose per-topic information in systems with millions of topics.

@asafm

asafm commented Jul 4, 2022

@brian-brazil Thanks for taking the time to review this.
I do agree this isn't the standard case.
I do agree the API will be more complicated, by adding an additional method for collecting metrics.

On the other hand, in systems that are super sensitive to latency, such as Apache Pulsar or databases in general, a 3rd-party library, especially an observability one, should aim to inflict as little impact as possible - to be a "fly on the wall" in the "room", so to speak.
In my opinion, that additional visitor-pattern-based collect() method can be used to achieve that, especially if you also change the way the client writes the collected metrics - directly to the HTTP response stream, without going through the intermediate objects. Then the "default" implementation will have less impact on the application side.

We're definitely looking at application-level ways to overcome the 1M-unique-time-series issue.

I understand if you decide to leave things as is.

@brian-brazil
Contributor

if you also change the way the client writes the collected metrics - directly to the HTTP response stream, without going through the intermediate objects.

We have to buffer the output at least; otherwise we don't know if an error occurs during collection, which requires an HTTP 500 to be returned.

@asafm

asafm commented Jul 4, 2022

I'm not familiar enough with the client - what can cause an error when iterating over existing collectors? It's all under the client's control, no?

@brian-brazil
Contributor

A collector could throw an exception, or in future there may be additional checking that the output would be valid.

@asafm

asafm commented Jul 4, 2022

@brian-brazil I've read the code a few times now, and I can't find where the HTTP server catches an exception during collection and returns 500. I couldn't find any buffering either. I saw that each time we get a collector, we ask for its samples and iterate over them, writing them one by one to the stream.

Could you please guide me to that location?

@brian-brazil
Contributor

It's not code that we explicitly have, the HTTP servers do it for us.

You can see output buffering in https://github.com/prometheus/client_java/blob/master/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L86 for example.

@asafm

asafm commented Jul 4, 2022

I searched through the Sun HTTP server and HTTPServer and couldn't find anything that catches an exception and sends a 500.
So I took TestHTTPServer.testSimpleRequest() and modified it a bit:

  @Test
  public void testSimpleRequest() throws IOException {
    HTTPServer httpServer = new HTTPServer(new InetSocketAddress(0), registry);

    Collector collector = new Collector() {
      @Override
      public List<MetricFamilySamples> collect() {
        throw new RuntimeException("Something went wrong during collect");
      }
    };

    registry.register(collector);
    try {
      HttpResponse response = createHttpRequestBuilder(httpServer, "/metrics").build().execute();
      System.out.println("Status code = " + response.getResponseCode());
    } finally {
      httpServer.close();
    }
  }

The result is that the connection is dead in execute(), since it reaches:

          } catch (Exception e4) {
                logger.log (Level.TRACE, "ServerImpl.Exchange (4)", e4);
                closeConnection(connection);
            } catch (Throwable t) {
                logger.log(Level.TRACE, "ServerImpl.Exchange (5)", t);
                throw t;
            }

in ServerImpl

Regarding the buffering - thanks for the reference, I somehow missed that.

@dhoard
Collaborator

dhoard commented Jul 4, 2022

@asafm You are correct - a 500 Internal Server Error response is not sent if an Exception occurs during collection.

The HTTPServer code is designed/implemented such that a collection has to be 100% successful for a response (hence the buffering.)

For a failed collection, there are two approaches to handle the Exception:

  1. return no response/close the connection.
  2. return a 500 Internal Server Error response (a sketch of this approach follows below).

The code currently implements approach 1.

There are arguments for both approaches... but ideally, in my opinion, we should at least catch/log the Exception.
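For illustration, a minimal sketch of approach 2 using the JDK's com.sun.net.httpserver API (the scrape() helper is a placeholder, not existing client_java code):

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpHandler;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // Hedged sketch only: buffer the whole scrape, and return a 500 if collection fails.
    class SafeMetricsHandler implements HttpHandler {
        @Override
        public void handle(HttpExchange exchange) throws IOException {
            byte[] body;
            int status;
            try {
                body = scrape();   // placeholder: run collection + text formatting into a buffer
                status = 200;
            } catch (RuntimeException e) {
                body = "Internal Server Error".getBytes(StandardCharsets.UTF_8);
                status = 500;
            }
            try {
                exchange.sendResponseHeaders(status, body.length);
                exchange.getResponseBody().write(body);
            } finally {
                exchange.close();
            }
        }

        private byte[] scrape() {
            // placeholder for CollectorRegistry iteration + TextFormat writing
            throw new UnsupportedOperationException("sketch only");
        }
    }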

@asafm

asafm commented Jul 5, 2022

Thanks for taking the time to verify @dhoard.

I do agree that if the library already has buffering in place, it makes sense to try-catch and return a 500 in case the collection fails.

One idea that can be used, whether in tandem with what I suggested or not, is a ByteBuffer-backed OutputStream. That
ByteBuffer would be allocated directly, meaning it's off-heap, which removes any need for garbage collection.
I found this as an example.

Then, using the new visitor pattern, the default HTTP server could use it and write directly off-heap, without any additional objects allocated, thereby minimizing the impact on the application this library is running in.
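A minimal sketch of such a ByteBuffer-backed OutputStream, assuming a fixed capacity (growth, bounds checking, and flushing are left out):

    import java.io.OutputStream;
    import java.nio.ByteBuffer;

    // Hedged sketch of an OutputStream backed by a direct (off-heap) ByteBuffer.
    // A real implementation would need growth/flush handling.
    class ByteBufferOutputStream extends OutputStream {
        private final ByteBuffer buffer;

        ByteBufferOutputStream(int capacity) {
            this.buffer = ByteBuffer.allocateDirect(capacity); // off-heap allocation
        }

        @Override
        public void write(int b) {
            buffer.put((byte) b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            buffer.put(b, off, len);
        }

        ByteBuffer byteBuffer() {
            return buffer;
        }
    }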

@dhoard
Collaborator

dhoard commented Jul 5, 2022

@asafm the code uses a thread-local ByteArrayOutputStream when writing results:

ByteArrayOutputStream response = this.response.get();
response.reset();

This eliminates the GC issue around the buffer. (See the conversation on ByteArrayOutputStream memory usage in #703.)

Moving to a ByteBufferOutputStream would move the memory off-heap, but shouldn't be a performance improvement.
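The pattern looks roughly like this (a simplified sketch; the initial buffer size is an assumption, not necessarily what HTTPServer uses):

    import java.io.ByteArrayOutputStream;

    // Simplified sketch of the thread-local buffer pattern referenced above:
    // each handler thread reuses one ByteArrayOutputStream across scrapes,
    // so the buffer itself is not re-allocated per request.
    class LocalByteArray extends ThreadLocal<ByteArrayOutputStream> {
        @Override
        protected ByteArrayOutputStream initialValue() {
            return new ByteArrayOutputStream(1 << 20); // assumed initial size
        }
    }

    // usage inside the HTTP handler:
    //   ByteArrayOutputStream response = localByteArray.get();
    //   response.reset();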

@asafm

asafm commented Jul 5, 2022

Interesting. I didn't think of those implications - having the same buffer 5 times although you only need one does not seem so good.
I agree with you - it does eliminate the GC issue.
Moving to a ByteBuffer wouldn't improve performance, but it would save on memory consumption (5 mostly-unused arrays --> 1 array off-heap).

Circling back to the original issue: the visitor pattern can reduce the burden of memory allocations, with a bit of added complexity. As I said previously, I believe an observability library should aim to minimize its impact on a running application as much as possible. I think the price to pay is quite small for one more step towards that goal.

Combine it with moving to a ByteBufferOutputStream and you reduce both heap memory consumption (with the off-heap buffer) and garbage collection work (with the visitor pattern). Both serve to minimize the impact on the hosting application.

@dhoard
Collaborator

dhoard commented Jul 5, 2022

@asafm As I pointed out earlier, I like the visitor pattern, which is why I am in the conversation/provided feedback.

Adoption/inclusion is up to the maintainers. (I'm not a maintainer.)

@asafm

asafm commented Jul 7, 2022

Ok, I'll wait to see what @brian-brazil thinks about all of this conversation.

@fstab
Member

fstab commented Jul 9, 2022

Hi, first of all, thanks a lot for the discussion, I really appreciate everyone is taking the time to look into this and to find a good solution.

A few remarks on points that were mentioned above but aren't directly related to the PR:

having the same buffer 5 times

We start only 1 thread by default, so there is only one thread local buffer (see executor.setCorePoolSize(1)). The number of threads will increase up to 5 if there are multiple scrapes in parallel, but as long as scrapes are sequential there should be only a single thread and only a single buffer.

couldn't find anything that catches an exception and sends 500

Yes, your analysis is correct. I just want to add that closing the connection without sending an HTTP response will make the Prometheus server assume a failed scrape, and the up metric will be 0. So the Prometheus server handles this in exactly the same way as a 500 response.

catch/log the Exception

Technically it is caught and logged in the HttpServer, so if you configure java.util.logging with a ConsoleHandler and set the log level for "com.sun.net.httpserver" to ALL, you'll see it. I totally get that this is not convenient. However, I'm also hesitant to include a logging library as a dependency in client_java. This would almost certainly conflict with the logging libraries in the applications that use client_java. To avoid potential conflicts it might be better to just throw the exception and not introduce our own logging in client_java.
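For reference, enabling that output looks roughly like this (a sketch using only standard java.util.logging calls):

    import java.util.logging.ConsoleHandler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    // Attach a ConsoleHandler and set the "com.sun.net.httpserver" logger to ALL
    // so exceptions logged by the built-in HttpServer become visible on the console.
    public class HttpServerLogging {
        public static void enable() {
            Logger logger = Logger.getLogger("com.sun.net.httpserver");
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.ALL);
            logger.addHandler(handler);
            logger.setLevel(Level.ALL);
        }
    }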

Anyway, here are some comments on the actual PR:

  • I understand that this is about a performance improvement. However, the PR does not implement the actual performance improvement, but provides an API that allows users to implement their own optimized collect() methods. Is that correct? I'm wondering why you don't provide the performance improvement itself, so that everyone can benefit from it.
  • Performance improvements should always be verified with experiments. Please implement some benchmarks using jmh. You can find some examples in benchmarks. Please run the benchmarks with the current implementation and the new implementation and post the results here, so that we can compare the numbers and discuss whether the performance gain is worth the additional complexity (a minimal skeleton is sketched below).
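A minimal JMH skeleton along those lines might look like the following (the registry setup and cardinality are illustrative; a second @Benchmark method would exercise the visitor-based path once its API is settled):

    import io.prometheus.client.CollectorRegistry;
    import io.prometheus.client.Counter;
    import io.prometheus.client.exporter.common.TextFormat;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    import java.io.IOException;
    import java.io.StringWriter;
    import java.io.Writer;

    @State(Scope.Benchmark)
    public class ScrapeBenchmark {

        private CollectorRegistry registry;

        @Setup
        public void setup() {
            registry = new CollectorRegistry();
            Counter counter = Counter.build()
                    .name("requests_total").help("help")
                    .labelNames("topic").register(registry);
            for (int i = 0; i < 100_000; i++) {   // illustrative cardinality
                counter.labels("topic-" + i).inc();
            }
        }

        @Benchmark
        public Writer scrapeWithTextFormat() throws IOException {
            Writer writer = new StringWriter();
            TextFormat.write004(writer, registry.metricFamilySamples());
            return writer;
        }
    }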

@asafm

asafm commented Dec 29, 2022

One thought that crossed my mind today: Apache Pulsar can easily accommodate 100k topics per broker, with plans to support 1M topics per broker. Each topic has roughly 70 unique metrics (metric names). Given 1M topics times 70 unique metrics, we get 70M garbage-collected objects per collection cycle. For a system that targets less than 5 ms latency per message, this causes the GC to choke and increases latency.
Given that scenario, I don't think JMH is required to prove that a sudden peak of 70M objects to be garbage collected in a live system like Apache Pulsar will cause latency degradation. We have witnessed this several times in real production systems with only 100k topics per broker, so imagine 1M.

Of course, we are fully aware that no human can consume so many metrics, so we planned to have a "filter" that aggregates it to reasonable groups of topics, limiting the final amount going out.

Regarding JMH in general, in my opinion it's hard to measure the impact on the garbage collector in a stand-alone test. Those 70M objects to be garbage collected could add enough work for the GC to push latency well beyond the required 5 ms. That's something you can't see in a lab JMH test.
