After a discussion with @ribalba this morning and a look into the current draft of the criteria for the Blue Angel certification for server-side applications, I can now answer one of my questions: warming up the software under test is not a considered aspect of the measurement in the Blue Angel certification process. Only the number of repetitions of a measurement run is specified: at least 10 is mandatory, 30 are recommended. This is of course not sufficient to warm up e.g. a JVM-based application.
---
Hey @davidkopp, excellent issue, thanks for bringing this up. The GMT is agnostic about this, which makes it powerful in one way, since users can decide how to do it, but it may also leave a user puzzled about which is "the way to go". Here is my take: I think a benchmark should reflect how the application behaves, in order to give a third party an estimate of where load occurs and how many resources the application will consume. I see two variants here:

A. The warmup is not part of how the application is typically deployed (e.g. it is triggered by an external script or by synthetic requests).
B. The warmup happens as part of the application's normal boot/deployment process.

In both cases I would argue that the warmup should be part of the usage scenario. In case A I would make it part of the Runtime phase as a separate step; in case B I would make it part of the boot process through the already mentioned `setup-commands`. One example: an app we monitor comes with a warmup script, which is however not how the app is typically deployed, so there the warmup belongs in the Runtime phase. Having an optionally empty warmup phase is certainly also a way to go, but it would limit the GMT to having this phase at a fixed location. I think it is quite valuable to be able to put the warmup in the Runtime phase, and maybe not even as the first step. I hope I could give some insights. Does this resonate with you?
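To make case A concrete, here is a minimal sketch of a usage_scenario.yml with the warmup as a separate step in the Runtime phase. All names (images, endpoint, step names) are hypothetical, and the exact schema should be checked against the current GMT documentation:

```yaml
name: warmup-as-runtime-step-sketch
description: Warmup modeled as an explicit, separately measured flow step

services:
  backend:
    image: my-spring-boot-app        # hypothetical image name
  load-generator:
    image: alpine/curl               # any container providing curl works

flow:
  # Separate, clearly labeled warmup step: it is measured like any
  # other step, so you can see when the JVM reaches steady state and
  # exclude this step when comparing applications.
  - name: Warmup
    container: load-generator
    commands:
      - type: console
        command: sh -c 'for i in $(seq 1 1000); do curl -s http://backend:8080/api/hello > /dev/null; done'

  # The actual measurement runs against the already-warmed JVM.
  - name: Measurement
    container: load-generator
    commands:
      - type: console
        command: sh -c 'for i in $(seq 1 100); do curl -s http://backend:8080/api/hello > /dev/null; done'
```

The design point here is that the warmup is just another flow step: it does not have to sit at a fixed location, and its cost still shows up in the per-step metrics.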
---
Answer #2, on your questions regarding the JVM: I think the performance gain is due to how the Java JIT compiler behaves. Do you know about the tuning switches? I have not worked with Java lately, but I know they exist, so I consulted ChatGPT about it, and its suggestions seem like a good starting point.
My first step would be to query the JVM for the default settings in the container you are using, and then drive these values up/down and see how the tests change. If you do, please ping me with the results, as I am really curious to see them! I could even imagine making a case study / article about it, if that is something that interests you.
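As a hedged sketch of that first step: `java -XX:+PrintFlagsFinal -version` prints the effective HotSpot flags, and `JAVA_TOOL_OPTIONS` is picked up by the JVM automatically, so both could be wired into a scenario roughly like this (service name and image are hypothetical, and whether GMT supports an `environment` key should be verified against its docs):

```yaml
services:
  backend:
    image: my-spring-boot-app              # hypothetical
    environment:
      # One real HotSpot tuning switch to sweep up/down:
      # TieredStopAtLevel=1 restricts JIT compilation to C1
      # (faster warmup, lower peak performance).
      JAVA_TOOL_OPTIONS: "-XX:TieredStopAtLevel=1"

flow:
  # Dump the effective JIT-related defaults of the container's JVM
  # so each measurement run can be correlated with its flag values.
  - name: Inspect JVM defaults
    container: backend
    commands:
      - type: console
        command: sh -c 'java -XX:+PrintFlagsFinal -version | grep -i tiered'
```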
---
In performance benchmarking of, e.g., Java web applications, it is crucial to warm up the application before running the actual measurement. Without a warm-up phase, the measurement would be unrealistic and unfair. Java and other languages with a runtime environment that uses interpretation and/or Just-In-Time (JIT) compilation are able to optimize the overall performance over time at runtime (e.g. by optimizing the code paths of frequently used methods). Here is a short article about "How to Warm Up the JVM". I don't know much about this topic, but the optimizations that are possible seem to be quite significant.
So this principle from performance benchmarking should also hold for energy benchmarking, right?
My main question now is: should I always warm up a (Java) application before running actual energy measurements, for better accuracy? My current use case is the comparison of two Java-based web applications in terms of their energy efficiency.
To better understand the problem, I ran a quick test with GMT, executing 25 flows with the same workload (100 executions of 3 HTTP requests each to a backend):
https://metrics.green-coding.berlin/stats.html?id=b5478c99-c8b4-4f65-a25b-99180f5ced2f
In this run, a total of 7500 requests were made to the backend (a Java web application based on Spring Boot). As far as I know, this number is quite low; usually you need a much higher number of executions to warm up the application (at least for microbenchmarks). Nevertheless, this quick measurement seems to show that optimizations take place after a short time already: the average CPU utilization of the backend component in the first flow (29.38 %) and the second flow (23.90 %) is much higher than in the subsequent flows (<14 %). After the 10th flow, the average CPU utilization stays under 10 %. Or is there a different reason for the reduction?
Energy consumption also decreased accordingly (though not by much, because most of the energy was consumed by the load generator rather than by the backend).
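For reference, the repeated-flow setup described above could be expressed roughly like this; the endpoints and names are hypothetical stand-ins, not the actual scenario behind the linked run:

```yaml
flow:
  # One of 25 identical steps; each runs 100 iterations of 3 HTTP
  # requests (7500 requests in total across all steps).
  - name: Flow 1
    container: load-generator
    commands:
      - type: console
        command: >
          sh -c 'for i in $(seq 1 100); do
          curl -s http://backend:8080/a > /dev/null;
          curl -s http://backend:8080/b > /dev/null;
          curl -s http://backend:8080/c > /dev/null;
          done'
  # Flow 2 .. Flow 25 repeat exactly the same step; comparing the
  # per-step CPU utilization then exposes the warmup effect over time.
```

Since every step is identical, a downward trend in CPU utilization across steps can only come from the runtime itself (JIT compilation, caches), not from a changing workload.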
GMT is based on the concept of a standard usage scenario (SUS). The concept was originally introduced for the Blue Angel certification for software, which until now has only been available for desktop applications. Warming up is probably not relevant for desktop applications, but it probably is for server applications. As far as I know, the Blue Angel certification will soon also be available for server applications. Will the issue of warming up play a role there?
If the conclusion is that there are situations in which a warm-up is important, the next question is: how? I see several options:

- as a separate step in the runtime phase
- via `setup-commands` as part of the boot phase (disadvantage: you can't see when the application can be considered warm)
- as a dedicated, optionally empty warmup phase

I am new to the field of benchmarks (performance, load, energy, etc.), so I am very happy about any answers and suggestions!
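For contrast with the runtime-phase variant sketched in the reply above, the `setup-commands` option might look like this (the image and script name are hypothetical, and the exact `setup-commands` syntax should be checked against the GMT docs):

```yaml
services:
  backend:
    image: my-spring-boot-app        # hypothetical
    setup-commands:
      # Hypothetical warmup script shipped with the image; it runs
      # during the boot phase, so its cost is attributed to boot and,
      # as noted above, you cannot tell from the runtime metrics when
      # the application actually became "warm".
      - ./warmup.sh
```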