
Log SQL metrics #1875

Merged
1 commit merged into main from sql-metrics on Jun 17, 2024

Conversation

@aehmttw (Contributor) commented on Jun 14, 2024

Is this a user-visible change (yes/no): no

This pull request goes with another one in the benchmarks repository: https://github.com/feldera/benchmarks/pull/4

Changes in this PR:

  • add one more statistic to the results of the Nexmark SQL test: peak memory usage.
  • add per-run metrics logging for Nexmark SQL tests to sql_nexmark_metrics.csv. The logged metrics are: test name, elapsed_seconds, rss_bytes, buffered_input_records, total_input_records, and total_processed_records. For runs using disk storage, several disk metrics are also logged: total_files_created, total_bytes_written, total_writes_success, buffer_cache_hit, and write_latency_histogram.
  • add csv-metrics and metrics-interval args to feldera-sql/run.py to control where the metrics CSV file is written and how often metrics are recorded in it.

These files are then used by the benchmarks repository to generate new graphs. See that PR for more details.
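For orientation, here is a minimal sketch of the kind of per-interval CSV logging described above. The column names come from the list in this description; the function name append_metrics_row, the global_metrics dict, and the hard-coded sample values are illustrative assumptions, not the actual feldera-sql/run.py code.

import csv

# Column order from the PR description; the last five are disk metrics,
# left blank for runs that do not use disk storage.
COLUMNS = [
    "test_name", "elapsed_seconds", "rss_bytes",
    "buffered_input_records", "total_input_records",
    "total_processed_records", "total_files_created",
    "total_bytes_written", "total_writes_success",
    "buffer_cache_hit", "write_latency_histogram",
]

def append_metrics_row(writer, pipeline_name, elapsed, global_metrics):
    # One row per recorded interval; the disk columns start blank and
    # are filled in later if the run reports them.
    writer.writerow([
        pipeline_name,
        elapsed,
        global_metrics["rss_bytes"],
        global_metrics["buffered_input_records"],
        global_metrics["total_input_records"],
        global_metrics["total_processed_records"],
    ] + [""] * 5)

# Hypothetical usage with made-up values:
with open("sql_nexmark_metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)
    append_metrics_row(writer, "nexmark-q0", 10.0, {
        "rss_bytes": 1_234_567_890,
        "buffered_input_records": 0,
        "total_input_records": 100_000_000,
        "total_processed_records": 100_000_000,
    })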

@gz (Collaborator) left a comment:

I'm not too familiar with this script, so maybe @blp can take a look too.

# Record a metrics row once per metrics interval.
if elapsed - last_metrics > metricsinterval:
    last_metrics = elapsed
    metrics_list = [pipeline_name, elapsed, global_metrics["rss_bytes"],
                    global_metrics["buffered_input_records"],
                    global_metrics["total_input_records"],
                    global_metrics["total_processed_records"]]
    disk_index = 6  # disk metrics start after the six global columns
Collaborator:

This seems a little brittle; maybe it should be disk_index = len(metrics_list) + 1 or similar?

@aehmttw (Contributor, Author):

I'm going to go with this: we don't necessarily know what order we receive the metrics in, so we want to maintain the same column order in the list of lists that is output as a CSV.

# Reserve a blank column, then fill the disk metric at its fixed index.
metrics_list += [""]
for s in stats["metrics"]:
    if s["key"] == "disk.total_files_created":
        metrics_list[disk_index] = s["value"]["Counter"]
Collaborator:

Or could this just use .append() here instead of assigning by index?
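To make the trade-off concrete, here is a sketch of the fixed-position approach the author describes, assuming stats["metrics"] entries have the {"key": ..., "value": {"Counter": ...}} shape shown in the snippet above; the DISK_COLUMNS mapping and the fill_disk_metrics name are illustrative, not the merged code. A plain .append() would interleave values in whatever order the server reports them, whereas the CSV columns must stay fixed.

# Fixed CSV column for each disk metric key, so the column order stays
# stable regardless of the order in which the server reports metrics.
DISK_COLUMNS = {
    "disk.total_files_created": 6,
    "disk.total_bytes_written": 7,
    "disk.total_writes_success": 8,
    "disk.buffer_cache_hit": 9,
    "disk.write_latency_histogram": 10,
}

def fill_disk_metrics(metrics_list, stats):
    # Pre-extend the row with blanks so every row has the same width,
    # then drop each reported value into its fixed column. The
    # ["value"]["Counter"] shape is assumed from the snippet above.
    metrics_list += [""] * len(DISK_COLUMNS)
    for s in stats["metrics"]:
        idx = DISK_COLUMNS.get(s["key"])
        if idx is not None:
            metrics_list[idx] = s["value"]["Counter"]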

Signed-off-by: Matei <matei@feldera.com>
@aehmttw mentioned this pull request on Jun 17, 2024
@aehmttw merged commit 35e47f9 into main on Jun 17, 2024
5 checks passed
@aehmttw deleted the sql-metrics branch on June 17, 2024 at 22:46