
Log SQL metrics #1875

Merged
1 commit merged into main from sql-metrics on Jun 17, 2024

Conversation

@aehmttw (Contributor) commented on Jun 14, 2024

Is this a user-visible change (yes/no): no

This pull request goes with another one in the benchmarks repository: https://github.com/feldera/benchmarks/pull/4

Changes in this PR:

  • add one more statistic to the results of the Nexmark SQL test: peak memory usage.
  • add per-run metrics logging for Nexmark SQL tests to sql_nexmark_metrics.csv. The logged metrics are: test name, elapsed_seconds, rss_bytes, buffered_input_records, total_input_records, and total_processed_records. For runs using disk storage, several disk metrics are also logged: total_files_created, total_bytes_written, total_writes_success, buffer_cache_hit, and write_latency_histogram.
  • add csv-metrics and metrics-interval args to feldera-sql/run.py to control where the metrics CSV file is written and how often metrics are recorded in it.

These files are then used by the benchmarks repository to generate new graphs. See that PR for more details.
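For orientation, here is a minimal sketch of the kind of per-interval CSV logging described above. The column names come from the list in this description; the function name append_metrics_row, the global_metrics dict, and the hard-coded sample values are illustrative assumptions, not the actual feldera-sql/run.py code.

import csv

# Column order from the PR description; the last five are disk metrics,
# left blank for runs that do not use disk storage.
COLUMNS = [
    "test_name", "elapsed_seconds", "rss_bytes",
    "buffered_input_records", "total_input_records",
    "total_processed_records", "total_files_created",
    "total_bytes_written", "total_writes_success",
    "buffer_cache_hit", "write_latency_histogram",
]

def append_metrics_row(writer, pipeline_name, elapsed, global_metrics):
    # One row per recorded interval; the disk columns start blank and
    # are filled in later if the run reports them.
    writer.writerow([
        pipeline_name,
        elapsed,
        global_metrics["rss_bytes"],
        global_metrics["buffered_input_records"],
        global_metrics["total_input_records"],
        global_metrics["total_processed_records"],
    ] + [""] * 5)

# Hypothetical usage with made-up values:
with open("sql_nexmark_metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)
    append_metrics_row(writer, "nexmark-q0", 10.0, {
        "rss_bytes": 1_234_567_890,
        "buffered_input_records": 0,
        "total_input_records": 100_000_000,
        "total_processed_records": 100_000_000,
    })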

@gz (Collaborator) left a comment:

I'm not too familiar with this script, so maybe @blp can take a look too.

# Record a metrics row once per metrics interval.
if elapsed - last_metrics > metricsinterval:
    last_metrics = elapsed
    metrics_list = [pipeline_name, elapsed, global_metrics["rss_bytes"],
                    global_metrics["buffered_input_records"],
                    global_metrics["total_input_records"],
                    global_metrics["total_processed_records"]]
    disk_index = 6  # disk metrics start after the six global columns
Collaborator:

This seems a little brittle; maybe it should be disk_index = len(metrics_list) + 1 or similar?

@aehmttw (Contributor, Author):

I'm going to go with this: we don't necessarily know what order we receive the metrics in, so we want to maintain the same column order in the list of lists that is output as a CSV.

# Reserve a blank column, then fill the disk metric at its fixed index.
metrics_list += [""]
for s in stats["metrics"]:
    if s["key"] == "disk.total_files_created":
        metrics_list[disk_index] = s["value"]["Counter"]
Collaborator:

Or could this just use .append() here instead of assigning by index?
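To make the trade-off concrete, here is a sketch of the fixed-position approach the author describes, assuming stats["metrics"] entries have the {"key": ..., "value": {"Counter": ...}} shape shown in the snippet above; the DISK_COLUMNS mapping and the fill_disk_metrics name are illustrative, not the merged code. A plain .append() would interleave values in whatever order the server reports them, whereas the CSV columns must stay fixed.

# Fixed CSV column for each disk metric key, so the column order stays
# stable regardless of the order in which the server reports metrics.
DISK_COLUMNS = {
    "disk.total_files_created": 6,
    "disk.total_bytes_written": 7,
    "disk.total_writes_success": 8,
    "disk.buffer_cache_hit": 9,
    "disk.write_latency_histogram": 10,
}

def fill_disk_metrics(metrics_list, stats):
    # Pre-extend the row with blanks so every row has the same width,
    # then drop each reported value into its fixed column. The
    # ["value"]["Counter"] shape is assumed from the snippet above.
    metrics_list += [""] * len(DISK_COLUMNS)
    for s in stats["metrics"]:
        idx = DISK_COLUMNS.get(s["key"])
        if idx is not None:
            metrics_list[idx] = s["value"]["Counter"]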

Signed-off-by: Matei <matei@feldera.com>
@aehmttw mentioned this pull request on Jun 17, 2024
@aehmttw merged commit 35e47f9 into main on Jun 17, 2024
5 checks passed
@aehmttw deleted the sql-metrics branch on June 17, 2024 at 22:46