#1537: Sanity checking on client side load should be included into core metrics and async warnings #1601

Merged
merged 5 commits into from
Oct 12, 2023

Conversation

ShaunakDas88
Contributor

@ShaunakDas88 ShaunakDas88 commented Oct 4, 2023

PR for #1537

@ShaunakDas88 ShaunakDas88 marked this pull request as draft October 4, 2023 23:46
@ShaunakDas88
Contributor Author

ShaunakDas88 commented Oct 4, 2023

None of the parsers or the checker are currently living in the correct place. Please suggest a better place for me to move those classes to.

* limitations under the License.
*/

package io.nosqlbench.engine.api.activityimpl.uniform;
Contributor

@jshook jshook Oct 5, 2023

We should probably have all of these classes under some "clientload" package in the engine-core module.

Collaborator

@MarkWolters MarkWolters left a comment

This is going to be really cool to have! Given the similarities in the various Reader classes you could probably collapse them into a class that provides the correct logic based on input but that's more of a stylistic choice than anything else. Nice work!

private static final Logger logger = LogManager.getLogger(ClientSystemMetricChecker.class);
private final int pollIntervalSeconds;
private final ScheduledExecutorService scheduler;
private final Map<String,Gauge<Double>> nameToNumerator;
Collaborator

It seems to me that, since these values are all accessed together, it might be more efficient to create a nested class like
private class ClientMetrics {
    Gauge<Double> numerator;
    Gauge<Double> denominator;
    ...
}
then store instances of it by name in a single Map, so that everything is stored and retrieved with one call.
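
A minimal sketch of that suggestion, under stated assumptions: `DoubleSupplier` stands in for the `Gauge<Double>` fields so the example needs no metrics library, and the `threshold` field, the class names, and the constant readings are hypothetical illustrations rather than the PR's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

class ClientMetricsHolderSketch {

    // Hypothetical holder: groups the values that are always accessed together.
    // DoubleSupplier is a stand-in for the Gauge<Double> type used in the PR.
    static final class ClientMetrics {
        final DoubleSupplier numerator;
        final DoubleSupplier denominator;
        final double threshold; // assumed field, not confirmed by the PR

        ClientMetrics(DoubleSupplier numerator, DoubleSupplier denominator, double threshold) {
            this.numerator = numerator;
            this.denominator = denominator;
            this.threshold = threshold;
        }
    }

    // A single map replaces several parallel name-keyed maps.
    static Map<String, ClientMetrics> buildRegistry() {
        Map<String, ClientMetrics> byName = new HashMap<>();
        // Illustrative constant readings; real gauges would return live values.
        byName.put("cpu_user_percent", new ClientMetrics(() -> 30.0, () -> 120.0, 50.0));
        return byName;
    }
}
```

One lookup by name then yields the numerator, denominator, and threshold together, instead of one lookup per map.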

int sectorSizeBytes = 512;
String line;
while ((line = reader.readLine()) != null) {
String[] parts = line.trim().split("\\s+");
Collaborator

I would recommend putting a try/catch inside the while loop as well, so that if a single line in the file is corrupted (e.g. parseDouble runs into something weird and throws an exception), the whole file-reading process doesn't have to bail.
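
A rough sketch of the per-line guard described here. The class and method names are hypothetical, and parsing only the first whitespace-separated field is a simplification of what the real readers do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineTolerantParser {

    // Parses the first whitespace-separated field of each line as a double.
    // A line that fails to parse is skipped instead of aborting the whole read.
    static List<Double> parseFirstFields(BufferedReader reader) {
        List<Double> values = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                try {
                    String[] parts = line.trim().split("\\s+");
                    values.add(Double.parseDouble(parts[0]));
                } catch (NumberFormatException e) {
                    // Corrupted line: ignore it and continue with the next one.
                }
            }
        } catch (IOException e) {
            // A failure of the stream itself is still fatal for this read.
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

The inner catch handles line-level problems, while the outer one is reserved for failures of the file read as a whole.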

metricsMap.put("loadAvg15min", loadAvgFifteenMinute);

} catch (FileNotFoundException e) {
return;
Collaborator

This then results in the null value bubbling up to the original caller silently. If that's what's intended, that's fine. Just mentioning it.
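
If the silent null turns out not to be intended, one possible alternative is to make the absence explicit in the return type. The sketch below is an assumption about how that could look, not what the PR does; `ExplicitAbsence` and `readFirstValue` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.OptionalDouble;

class ExplicitAbsence {

    // Reads the first whitespace-separated field of the first line as a double.
    // A missing or unreadable file yields an empty result rather than null.
    static OptionalDouble readFirstValue(String path) {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line = reader.readLine();
            if (line == null) {
                return OptionalDouble.empty();
            }
            return OptionalDouble.of(Double.parseDouble(line.trim().split("\\s+")[0]));
        } catch (FileNotFoundException e) {
            // The file does not exist on this system (e.g. a non-Linux host).
            return OptionalDouble.empty();
        } catch (IOException | NumberFormatException e) {
            // Unreadable or malformed content is also reported as "no value".
            return OptionalDouble.empty();
        }
    }
}
```

The caller then has to decide what a missing value means, rather than meeting a null further up the call chain.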

Contributor

@jshook jshook left a comment

Some notes on approach:
It would be best to reduce the usage of Map here where possible. Many small maps will have an out-sized impact on memory usage. In some cases here, you could easily use arrays or Lists, since the consumer side of this data simply wants to iterate over the established data sources.
I would also advocate for bundling the basic (find, extract, check) style patterns into a type which can be applied to each, rather than keep things in tabular form. Happy to go into more details if you like.
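
One way to read the (find, extract, check) suggestion is as a small type holding the three steps, with the data sources kept in a List rather than in maps. Everything in this sketch is hypothetical (`Probe`, its fields, and `countFailures`); it only illustrates the shape of the idea.

```java
import java.util.List;
import java.util.function.DoublePredicate;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

class ProbeSketch {

    // Hypothetical type bundling the three steps for a single data source.
    static final class Probe {
        final String name;
        final Supplier<String> find;            // locate the raw text
        final ToDoubleFunction<String> extract; // turn the text into a number
        final DoublePredicate check;            // true when the value is acceptable

        Probe(String name, Supplier<String> find,
              ToDoubleFunction<String> extract, DoublePredicate check) {
            this.name = name;
            this.find = find;
            this.extract = extract;
            this.check = check;
        }

        boolean passes() {
            return check.test(extract.applyAsDouble(find.get()));
        }
    }

    // The consumer only iterates, so a List is enough; no name lookup is needed.
    static int countFailures(List<Probe> probes) {
        int failures = 0;
        for (Probe probe : probes) {
            if (!probe.passes()) {
                failures++;
            }
        }
        return failures;
    }
}
```

Adding a metric then means adding one more `Probe` to the list, with its own find, extract, and check logic.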

@ShaunakDas88 ShaunakDas88 force-pushed the client_metrics branch 4 times, most recently from ec9e848 to 19769fc Compare October 11, 2023 04:48
@ShaunakDas88 ShaunakDas88 marked this pull request as ready for review October 11, 2023 05:16
@ShaunakDas88
Contributor Author

ShaunakDas88 commented Oct 11, 2023

Thanks for the feedback, this is now ready for another round of reviews. A few changes, so the highlights:

  • moved all the appropriate classes under engine.core.clientload package

  • Inheritance. base LinuxSystemFileReader class added, which contains much of the common logic across the different file readers

    • Less code duplication
  • Regex. Per Shooky's suggestion, each file reader now leverages regex to match for the first line in a file for a given pattern. This now avoids trying to parse all the things for each call to a public reader method for a Double:

    • more efficient (not processing all the things within a file, when invoking for a particular metric)
    • still get to re-use a common method
    • gets rid of Maps within the reader classes
    • NOTE. I avoided creating a class for each client-side metric (where you provide a name, a file, and a regex pattern for matching a line in the file), because too many of the metrics need further arithmetic applied to them (e.g. StatReader.getTotalTime(), DiskStatsReader.getKbReadForDevice()). Even with some inheritance, that felt like class overkill for now, so I opted for methods within the file readers. However, if we prefer that approach, it can be done (with lambdas, I imagine).
  • ClientMetric object. As suggested by Mark, we should encapsulate all the things I previously mapped a name to in ClientSystemMetricChecker within a separate class. However, beyond just serving as a struct, motivated by Shooky's comment, I added extract() and check() methods as well within this class, so that all value-parsing and checking logic lives here, and the ClientSystemMetricChecker logic is simpler to read

    • gets rid of Maps within ClientSystemMetricChecker class
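
The regex approach from the highlights above might look roughly like the following. This is a sketch under assumptions: the class name, method name, and pattern are illustrative and not taken from the PR, and the input is passed as a list of lines so the example runs without touching /proc.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchReader {

    // Illustrative pattern for a /proc/meminfo style line such as
    // "MemTotal:       32592764 kB"; group 1 captures the number.
    static final Pattern MEM_TOTAL = Pattern.compile("^MemTotal:\\s+(\\d+)\\s+kB");

    // Returns capture group 1 of the first matching line as a double,
    // stopping at that line instead of parsing the rest of the input.
    static OptionalDouble firstMatch(Iterable<String> lines, Pattern pattern) {
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                return OptionalDouble.of(Double.parseDouble(matcher.group(1)));
            }
        }
        return OptionalDouble.empty();
    }
}
```

Because the scan stops at the first match, a call for one metric does not pay for parsing every other field in the file.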

Local Testing
These are all the metrics that get reported from a run on my local machine (apologies, it's a lot; but that's the goal):

shaunak@shaunak-Precision-5530:~$ cat test_file | head -6000000 | grep -i GAUGE | grep -v ro
  11023 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=cpu_idle, value=1.49710641E8
  11023 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=cpu_iowait, value=124459.0
  11024 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=cpu_system, value=1301349.0
  11024 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=cpu_total, value=1.59660476E8
  11024 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=cpu_user, value=8398934.0
  11025 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=docker0_rx_bytes, value=0.0
  11025 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=docker0_rx_packets, value=0.0
  11026 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=docker0_tx_bytes, value=0.0
  11026 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=docker0_tx_packets, value=0.0
  11027 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=lo_rx_bytes, value=3.3086387E7
  11027 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=lo_rx_packets, value=439409.0
  11028 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=lo_tx_bytes, value=3.3086387E7
  11028 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=lo_tx_packets, value=439409.0
  11028 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loadavg_15min, value=1.79
  11028 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loadavg_1min, value=0.98
  11029 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loadavg_5min, value=1.06
  11029 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop0_kB_read, value=113.0
  11029 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop0_kB_written, value=0.0
  11029 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop0_transactions, value=41.0
  11030 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop10_kB_read, value=1154.0
  11031 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop10_kB_written, value=0.0
  11032 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop10_transactions, value=139.0
  11032 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop11_kB_read, value=8.0
  11033 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop11_kB_written, value=0.0
  11034 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop11_transactions, value=5.0
  11034 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop12_kB_read, value=1117.0
  11035 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop12_kB_written, value=0.0
  11035 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop12_transactions, value=102.0
  11035 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop13_kB_read, value=1322.0
  11036 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop13_kB_written, value=0.0
  11036 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop13_transactions, value=1024.0
  11036 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop14_kB_read, value=1332.0
  11036 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop14_kB_written, value=0.0
  11037 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop14_transactions, value=310.0
  11037 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop15_kB_read, value=1169.0
  11037 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop15_kB_written, value=0.0
  11038 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop15_transactions, value=144.0
  11038 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop16_kB_read, value=1048.0
  11038 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop16_kB_written, value=0.0
  11038 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop16_transactions, value=53.0
  11039 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop17_kB_read, value=1134.0
  11039 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop17_kB_written, value=0.0
  11039 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop17_transactions, value=119.0
  11040 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop18_kB_read, value=1161.0
  11040 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop18_kB_written, value=0.0
  11040 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop18_transactions, value=146.0
  11040 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop19_kB_read, value=48.0
  11041 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop19_kB_written, value=0.0
  11041 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop19_transactions, value=18.0
  11041 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop1_kB_read, value=1694.0
  11041 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop1_kB_written, value=0.0
  11042 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop1_transactions, value=696.0
  11042 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop20_kB_read, value=1192.0
  11042 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop20_kB_written, value=0.0
  11042 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop20_transactions, value=175.0
  11043 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop21_kB_read, value=21730.0
  11043 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop21_kB_written, value=0.0
  11043 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop21_transactions, value=20746.0
  11043 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop22_kB_read, value=1045.0
  11044 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop22_kB_written, value=0.0
  11044 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop22_transactions, value=56.0
  11044 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop23_kB_read, value=1146.0
  11044 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop23_kB_written, value=0.0
  11045 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop23_transactions, value=152.0
  11045 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop24_kB_read, value=5894.0
  11045 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop24_kB_written, value=0.0
  11045 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop24_transactions, value=4871.0
  11046 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop25_kB_read, value=51.0
  11046 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop25_kB_written, value=0.0
  11046 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop25_transactions, value=21.0
  11046 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop26_kB_read, value=1186.0
  11047 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop26_kB_written, value=0.0
  11047 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop26_transactions, value=176.0
  11047 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop27_kB_read, value=514.0
  11047 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop27_kB_written, value=0.0
  11048 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop27_transactions, value=213.0
  11048 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop28_kB_read, value=1158.0
  11048 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop28_kB_written, value=0.0
  11048 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop28_transactions, value=146.0
  11049 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop29_kB_read, value=1338.0
  11049 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop29_kB_written, value=0.0
  11049 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop29_transactions, value=323.0
  11049 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop2_kB_read, value=108.0
  11050 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop2_kB_written, value=0.0
  11050 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop2_transactions, value=39.0
  11050 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop30_kB_read, value=124.0
  11050 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop30_kB_written, value=0.0
  11051 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop30_transactions, value=49.0
  11051 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop31_kB_read, value=1072.0
  11051 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop31_kB_written, value=0.0
  11051 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop31_transactions, value=69.0
  11052 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop32_kB_read, value=33189.0
  11052 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop32_kB_written, value=0.0
  11052 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop32_transactions, value=32169.0
  11052 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop33_kB_read, value=126.0
  11053 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop33_kB_written, value=0.0
  11053 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop33_transactions, value=48.0
  11053 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop34_kB_read, value=1222.0
  11053 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop34_kB_written, value=0.0
  11054 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop34_transactions, value=205.0
  11054 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop35_kB_read, value=51.0
  11054 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop35_kB_written, value=0.0
  11055 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop35_transactions, value=21.0
  11055 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop36_kB_read, value=1351.0
  11055 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop36_kB_written, value=0.0
  11055 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop36_transactions, value=338.0
  11056 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop3_kB_read, value=1123.0
  11056 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop3_kB_written, value=0.0
  11056 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop3_transactions, value=115.0
  11056 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop4_kB_read, value=525.0
  11057 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop4_kB_written, value=0.0
  11057 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop4_transactions, value=233.0
  11057 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop5_kB_read, value=49.0
  11057 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop5_kB_written, value=0.0
  11058 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop5_transactions, value=19.0
  11058 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop6_kB_read, value=1077.0
  11058 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop6_kB_written, value=0.0
  11058 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop6_transactions, value=78.0
  11058 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop7_kB_read, value=1991.0
  11059 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop7_kB_written, value=0.0
  11059 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop7_transactions, value=974.0
  11059 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop8_kB_read, value=1151.0
  11059 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop8_kB_written, value=0.0
  11060 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop8_transactions, value=140.0
  11060 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop9_kB_read, value=526.0
  11060 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop9_kB_written, value=0.0
  11060 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=loop9_transactions, value=228.0
  11060 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=mem_available, value=1.92181E7
  11061 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=mem_buffered, value=847784.0
  11061 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=mem_cached, value=6630832.0
  11061 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=mem_free, value=1.258554E7
  11061 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=mem_total, value=3.2592764E7
  11061 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=mem_used, value=2.0007224E7
  11061 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1_kB_read, value=3.3481276E7
  11062 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1_kB_written, value=3.89650153E8
  11062 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1_transactions, value=4226781.0
  11062 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p1_kB_read, value=5891.0
  11062 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p1_kB_written, value=5.0
  11063 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p1_transactions, value=223.0
  11063 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p2_kB_read, value=88.0
  11063 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p2_kB_written, value=0.0
  11063 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p2_transactions, value=22.0
  11064 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p3_kB_read, value=2908.0
  11064 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p3_kB_written, value=6216.0
  11064 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p3_transactions, value=1001.0
  11064 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p4_kB_read, value=3.3470741E7
  11064 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p4_kB_written, value=3.89643932E8
  11065 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=nvme0n1p4_transactions, value=3915819.0
  11067 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=swap_free, value=991996.0
  11067 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=swap_total, value=999420.0
  11068 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=swap_used, value=7424.0
  11068 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=wlp59s0_rx_bytes, value=3.058544805E9
  11068 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=wlp59s0_rx_packets, value=5087893.0
  11068 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=wlp59s0_tx_bytes, value=1.24846801E9
  11068 INFO  [metrics-logger-reporter-2-thread-1] MetricReporters type=GAUGE, name=wlp59s0_tx_packets, value=3553609.0

For the client metric checking, I ran cpuburn locally to artificially max out CPU usage. This triggers the threshold check for the percentage of CPU time spent in user space, and no further warnings are logged once cpuburn is killed (during the sleep 30s):

shaunak@shaunak-Precision-5530:~$ grep -i cpu_user_percent test_file ; sleep 30s; echo; grep -i cpu_user_percent test_file
 111038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 96.89999999999999 > threshold 50.0
 121038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 95.64202983084743 > threshold 50.0
 131038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 96.59138261521794 > threshold 50.0
 141038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 95.18333333333334 > threshold 50.0
 151038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 96.233647196067 > threshold 50.0

 111038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 96.89999999999999 > threshold 50.0
 121038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 95.64202983084743 > threshold 50.0
 131038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 96.59138261521794 > threshold 50.0
 141038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 95.18333333333334 > threshold 50.0
 151038 WARN  [pool-2-thread-1] ClientMetric cpu_user_percent value = 96.233647196067 > threshold 50.0
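
For illustration, a threshold check of this kind could be sketched as below. It assumes that cpu_user_percent is derived from the change in cumulative counters between two polls, which the thread does not confirm; the class and method names are hypothetical, and the message merely mirrors the format of the log lines above.

```java
import java.util.Optional;

class CpuUserPercentSketch {

    // Share of the interval's total CPU time taken by one component, in percent,
    // computed from two samples of cumulative counters (assumed approach).
    static double percentOfDelta(double prevPart, double currPart,
                                 double prevTotal, double currTotal) {
        double totalDelta = currTotal - prevTotal;
        if (totalDelta <= 0.0) {
            return 0.0; // no time elapsed between samples, or a counter reset
        }
        return 100.0 * (currPart - prevPart) / totalDelta;
    }

    // Returns a warning message when the value exceeds the threshold.
    static Optional<String> check(String name, double value, double threshold) {
        if (value > threshold) {
            return Optional.of("ClientMetric " + name + " value = " + value
                + " > threshold " + threshold);
        }
        return Optional.empty();
    }
}
```

A poller would call `percentOfDelta` with the previous and current samples on each tick and log whatever `check` returns.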

Collaborator

@MarkWolters MarkWolters left a comment

Looks really good. The only thing I would change is to move the creation/registration logic from NBCLI to NBSession. In the new component based model these metrics should be part of the hierarchy and NBCLI stands outside of the hierarchy.

Contributor

@jshook jshook left a comment

This generally looks good to me.

@ShaunakDas88
Contributor Author

Thanks for the reviews, guys.

@MarkWolters I agree that it makes sense to move the registration to NBSession, but that class currently is not available in main. Let me go ahead and merge this, and I'll create a separate ticket to move all that to NBSession once the component-based changes land in main

@ShaunakDas88 ShaunakDas88 merged commit 7ae6028 into main Oct 12, 2023
8 checks passed
@ShaunakDas88 ShaunakDas88 deleted the client_metrics branch October 12, 2023 20:49

3 participants