After a discussion with @ribalba this morning and a look into the current draft of the criteria for the Blue Angel certification for server-side applications, I can now answer one of my questions: warming up the software under test is not a considered aspect of the measurement in the Blue Angel certification process. Only the number of repetitions of a measurement run is specified: at least 10 is mandatory, 30 are recommended. This is of course not sufficient to warm up e.g. a JVM-based application.
---
Hey @davidkopp, excellent issue, thanks for bringing this up. The GMT is agnostic about this, which makes it powerful in one way, since users can decide how to do it, but it may also leave a user puzzled about which is "the way to go". Here is my take: I think a benchmark should reflect how the application behaves, in order to give a third party an estimate of where load occurs and how many resources the application will consume. I see two variants here:

A. The warmup is not part of how the application is typically deployed (e.g. it is triggered by an external script or by synthetic requests).
B. The warmup happens as part of the application's normal boot/deployment process.

In both cases I would argue that the warmup should be part of the usage scenario. In case A I would make it part of the Runtime phase as a separate step; in case B I would make it part of the boot process through the already mentioned `setup-commands`. One example: an app we monitor comes with a warmup script, which is however not how the app is typically deployed, so there the warmup belongs in the Runtime phase. Having an optionally empty warmup phase is certainly also a way to go, but it would limit the GMT to having this phase at a fixed location. I think it is quite valuable to be able to put the warmup in the Runtime phase, and maybe not even as the first step. I hope I could give some insights. Does this resonate with you?
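To make case A concrete, here is a minimal sketch of a usage_scenario.yml with the warmup as a separate step in the Runtime phase. All names (images, endpoint, step names) are hypothetical, and the exact schema should be checked against the current GMT documentation:

```yaml
name: warmup-as-runtime-step-sketch
description: Warmup modeled as an explicit, separately measured flow step

services:
  backend:
    image: my-spring-boot-app        # hypothetical image name
  load-generator:
    image: alpine/curl               # any container providing curl works

flow:
  # Separate, clearly labeled warmup step: it is measured like any
  # other step, so you can see when the JVM reaches steady state and
  # exclude this step when comparing applications.
  - name: Warmup
    container: load-generator
    commands:
      - type: console
        command: sh -c 'for i in $(seq 1 1000); do curl -s http://backend:8080/api/hello > /dev/null; done'

  # The actual measurement runs against the already-warmed JVM.
  - name: Measurement
    container: load-generator
    commands:
      - type: console
        command: sh -c 'for i in $(seq 1 100); do curl -s http://backend:8080/api/hello > /dev/null; done'
```

The design point here is that the warmup is just another flow step: it does not have to sit at a fixed location, and its cost still shows up in the per-step metrics.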
---
Answer #2, on your questions regarding the JVM: I think the performance gain is due to how the Java JIT compiler behaves. Do you know about the tuning switches? I have not worked with Java lately, but I know they exist, so I consulted ChatGPT about it, and its suggestions seem like a good starting point.
My first step would be to query the JVM for the default settings in the container you are using, and then drive these values up/down and see how the tests change. If you do, please ping me with the results, as I am really curious to see them! I could even imagine making a case study / article about it, if that is something that interests you.
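As a hedged sketch of that first step: `java -XX:+PrintFlagsFinal -version` prints the effective HotSpot flags, and `JAVA_TOOL_OPTIONS` is picked up by the JVM automatically, so both could be wired into a scenario roughly like this (service name and image are hypothetical, and whether GMT supports an `environment` key should be verified against its docs):

```yaml
services:
  backend:
    image: my-spring-boot-app              # hypothetical
    environment:
      # One real HotSpot tuning switch to sweep up/down:
      # TieredStopAtLevel=1 restricts JIT compilation to C1
      # (faster warmup, lower peak performance).
      JAVA_TOOL_OPTIONS: "-XX:TieredStopAtLevel=1"

flow:
  # Dump the effective JIT-related defaults of the container's JVM
  # so each measurement run can be correlated with its flag values.
  - name: Inspect JVM defaults
    container: backend
    commands:
      - type: console
        command: sh -c 'java -XX:+PrintFlagsFinal -version | grep -i tiered'
```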
---
In performance benchmarking of, e.g., Java web applications, it is crucial to warm up the application before running the actual measurement. Without a warm-up phase, the measurement would be unrealistic and unfair. Java and other languages with a runtime environment that uses interpretation and/or Just-In-Time (JIT) compilation are able to optimize the overall performance over time at runtime (e.g. by optimizing the code paths of frequently used methods). Here is a short article about "How to Warm Up the JVM". I don't know much about this topic, but the optimizations that are possible seem to be quite significant.
So this principle from performance benchmarking should also hold for energy benchmarking, right?
My main question now is: should I always warm up a (Java) application before running actual energy measurements, for better accuracy? My current use case is the comparison of two Java-based web applications in terms of their energy efficiency.
To better understand the problem, I ran a quick test with GMT, executing 25 flows with the same workload (100 executions of 3 HTTP requests each to a backend):
https://metrics.green-coding.berlin/stats.html?id=b5478c99-c8b4-4f65-a25b-99180f5ced2f
In this run, a total of 7500 requests were made to the backend (a Java web application based on Spring Boot). As far as I know, this number is quite low; usually you need a much higher number of executions to warm up the application (at least for microbenchmarks). Nevertheless, this quick measurement seems to show that optimizations take place after a short time already: the average CPU utilization of the backend component in the first flow (29.38 %) and the second flow (23.90 %) is much higher than in the subsequent flows (<14 %). After the 10th flow, the average CPU utilization stays under 10 %. Or is there a different reason for the reduction?
Energy consumption also decreased accordingly (though not by much, because most of the energy was consumed by the load generator rather than by the backend).
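For reference, the repeated-flow setup described above could be expressed roughly like this; the endpoints and names are hypothetical stand-ins, not the actual scenario behind the linked run:

```yaml
flow:
  # One of 25 identical steps; each runs 100 iterations of 3 HTTP
  # requests (7500 requests in total across all steps).
  - name: Flow 1
    container: load-generator
    commands:
      - type: console
        command: >
          sh -c 'for i in $(seq 1 100); do
          curl -s http://backend:8080/a > /dev/null;
          curl -s http://backend:8080/b > /dev/null;
          curl -s http://backend:8080/c > /dev/null;
          done'
  # Flow 2 .. Flow 25 repeat exactly the same step; comparing the
  # per-step CPU utilization then exposes the warmup effect over time.
```

Since every step is identical, a downward trend in CPU utilization across steps can only come from the runtime itself (JIT compilation, caches), not from a changing workload.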
GMT is based on the concept of a standard usage scenario (SUS). The concept was originally introduced for the Blue Angel certification for software, which until now has only been available for desktop applications. Warming up is probably not relevant for desktop applications, but it probably is for server applications. As far as I know, the Blue Angel certification will soon also be available for server applications. Will the issue of warming up play a role there?
If the conclusion is that there are situations in which a warm-up is important, the next question is: how? I see several options:

- as a separate step in the runtime phase
- via `setup-commands` as part of the boot phase (disadvantage: you can't see when the application can be considered warm)
- as a dedicated, optionally empty warmup phase

I am new to the field of benchmarks (performance, load, energy, etc.), so I am very happy about any answers and suggestions!
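For contrast with the runtime-phase variant sketched in the reply above, the `setup-commands` option might look like this (the image and script name are hypothetical, and the exact `setup-commands` syntax should be checked against the GMT docs):

```yaml
services:
  backend:
    image: my-spring-boot-app        # hypothetical
    setup-commands:
      # Hypothetical warmup script shipped with the image; it runs
      # during the boot phase, so its cost is attributed to boot and,
      # as noted above, you cannot tell from the runtime metrics when
      # the application actually became "warm".
      - ./warmup.sh
```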