PrestoExchangeSource HttpClient histogram metric is blocking query #23338

Open
karteekmurthys opened this issue Jul 30, 2024 · 4 comments

karteekmurthys commented Jul 30, 2024

PrometheusStatsReporter updates worker stats synchronously. This is fine for most stats, because they are updated periodically by PeriodicTaskManager in a separate thread without blocking queries.

But this specific metric is recorded in the query path and is updated far too frequently:

        RECORD_HISTOGRAM_METRIC_VALUE(
            kCounterHttpClientPrestoExchangeOnBodyBytes, bufferBytes);
      });
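For context, here is a minimal illustration of why this hurts; the class and function names below are hypothetical stand-ins, not the actual presto_cpp/Velox code. A synchronous, lock-protected histogram update runs once per received chunk, so every exchange thread contends on the same reporter lock in the data path:

    #include <cstdint>
    #include <map>
    #include <mutex>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for a reporter that records histogram values
    // under a single process-wide mutex.
    class SyncHistogramReporter {
     public:
      void observe(const std::string& key, int64_t value) {
        std::lock_guard<std::mutex> guard(mutex_);  // every caller contends here
        values_[key].push_back(value);              // bookkeeping on the hot path
      }

     private:
      std::mutex mutex_;
      std::map<std::string, std::vector<int64_t>> values_;
    };

    // Hypothetical exchange-source body callback: invoked once per received
    // chunk, so the observe() above runs millions of times for a large shuffle.
    void onBody(SyncHistogramReporter& reporter, int64_t bufferBytes) {
      reporter.observe(
          "presto_cpp.http.client.presto_exchange_source.on_body_bytes",
          bufferBytes);
    }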

Query Q67 of the SF10K TPC-DS workload has taken several hours to finish since we merged PrometheusStatsReporter. We are working on improving this in the stats reporter itself, but we should also avoid reporting metrics in the query path where possible.

Requesting a review of whether presto_cpp.http.client.presto_exchange_source.on_body_bytes can be removed or made optional.

We tested Q67 at SF10K with this metric removed, and the query completes successfully.

We also have a workaround that updates the histogram asynchronously in PrometheusStatsReporter; this was also tested successfully against Q67. A rough sketch of the idea follows.
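This is only a minimal sketch of the asynchronous-update approach, not the actual PrometheusStatsReporter change; the class and member names are hypothetical. The query thread only enqueues the value, and a background thread drains the queue and performs the histogram updates:

    #include <condition_variable>
    #include <cstdint>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <utility>

    // Hypothetical reporter: addHistogramValue() only enqueues, so the query
    // path never blocks on histogram bookkeeping; a background thread updates.
    class AsyncHistogramReporter {
     public:
      AsyncHistogramReporter() : worker_([this] { run(); }) {}

      ~AsyncHistogramReporter() {
        {
          std::lock_guard<std::mutex> guard(mutex_);
          stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
      }

      // Called from the query path: O(1) enqueue, no histogram math here.
      void addHistogramValue(std::string key, int64_t value) {
        {
          std::lock_guard<std::mutex> guard(mutex_);
          pending_.emplace_back(std::move(key), value);
        }
        cv_.notify_one();
      }

     private:
      void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_ || !pending_.empty()) {
          cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
          auto batch = std::move(pending_);
          pending_.clear();
          lock.unlock();
          for (auto& [key, value] : batch) {
            // A real implementation would update the Prometheus histogram
            // here; this sketch simply discards the drained values.
            (void)key;
            (void)value;
          }
          lock.lock();
        }
      }

      std::mutex mutex_;
      std::condition_variable cv_;
      std::deque<std::pair<std::string, int64_t>> pending_;
      bool stop_{false};
      std::thread worker_;
    };

The trade-off is that histogram values become visible to the scraper slightly later and queue up under load, but the query threads no longer wait on metrics bookkeeping.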

karteekmurthys self-assigned this Jul 30, 2024
@karteekmurthys

cc: @majetideepak @amitkdutta


karteekmurthys commented Jul 31, 2024

@pramodsatya gathered some stats on the total time spent at the different locations in Presto C++ where RECORD_HISTOGRAM is called. Q67 was run for over an hour, and the call site at PrestoExchangeSource.cpp:117 accounted for a total of 3256278906 us over that hour of query run.

reg/add   function                               location                        total time (us)
Add       addRootPool                            Memory.cpp:248                                76
          clearThread                            Driver.h:153                              166326
          operator()                             ExchangeClient.cpp:163                     87166
                                                 ExchangeClient.cpp:167                     61760
                                                 ExchangeClient.cpp:170                     31759
                                                 FileHandle.cpp:64                             45
                                                 PrestoExchangeSource.cpp:117          3256278906
          runInternal                            Driver.cpp:522                            348599
          updatePrestoExchangeSourceMemoryStats  PeriodicTaskManager.cpp:268                18196
          ~MemoryPoolImpl                        MemoryPool.cpp:462                            45
          ~ScopedArbitration                     SharedArbitrator.cpp:880                    1110
                                                 SharedArbitrator.cpp:910                      12


jaystarshot commented Jul 31, 2024

What if the implementation instead created a subprocess, like the Exposer from https://github.com/jupp0r/prometheus-cpp? Resources can be controlled when starting the exposer. At Uber we run the exposer as a separate process, so we don't have to worry about maintaining the new HTTP server. A minimal usage sketch follows.
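For reference, this is a minimal sketch of prometheus-cpp's Exposer, following the library's README; the port, metric name, and label here are placeholders for illustration, not anything presto_cpp actually exposes:

    #include <memory>

    #include <prometheus/counter.h>
    #include <prometheus/exposer.h>
    #include <prometheus/registry.h>

    int main() {
      // The Exposer runs its own small HTTP server that Prometheus can scrape.
      prometheus::Exposer exposer{"127.0.0.1:8080"};

      // The Registry holds the metric families to expose.
      auto registry = std::make_shared<prometheus::Registry>();

      // Placeholder counter family; a real metric would be registered the same way.
      auto& counter_family = prometheus::BuildCounter()
                                 .Name("exchange_source_on_body_bytes_total")
                                 .Help("Placeholder counter for illustration")
                                 .Register(*registry);
      auto& counter = counter_family.Add({{"worker", "example"}});
      counter.Increment(1024);

      // Ask the exposer to serve this registry on /metrics.
      exposer.RegisterCollectable(registry);

      // A real deployment would keep this process running; omitted here.
      return 0;
    }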

@jaystarshot

I don't think this will directly help here, though, since I could repro this issue internally.
