# Java Service Startup

This started as an effort to gather Java snapshot performance numbers for marketing/BD purposes, and grew into something bigger once container startup overhead was added. It is now a mix of performance numbers and product implementation lessons.

Application startup time is becoming increasingly important as the world moves from monolithic applications to applications built as collections of microservices. As the number of microservices grows, it becomes increasingly expensive to maintain acceptable response times by keeping the binaries for these microservices pre-started and resident in memory.

Many microservices are implemented in Java using frameworks such as Spring Boot and Micronaut. These frameworks automate configuration and component linkage tasks that are extremely error-prone when done by hand. The cost is a long startup time, because the components must be initialized before the application is ready to process requests.

The Kontain Monitor (KM) can start an application from an on-disk image (a snapshot) taken after initialization has completed, so the application is ready to process requests in a fraction of the time the original startup takes.

The performance metric is the 'startup to active' interval, with the following definitions:

startup
: the time when the application process is started.

active
: the time when the process first responds to HTTP requests as measured by a `curl` loop with a 1 ms sleep.
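
The measurement described above can be sketched in Python as follows. This is a minimal illustration, not the script that produced the numbers in this notebook: the function names `http_probe` and `startup_to_active`, the timeout values, and the use of `urllib` in place of a `curl` loop are all assumptions made for the example.

```python
import http.client
import time
from typing import Callable, List
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_probe(url: str, timeout_s: float = 0.5) -> bool:
    """Return True if the service answers an HTTP request at `url`."""
    try:
        with urlopen(url, timeout=timeout_s):
            return True
    except HTTPError:
        # The server answered, just not with a 2xx status: it is responding.
        return True
    except (URLError, OSError, http.client.HTTPException):
        # Connection refused, reset, or timed out: not active yet.
        return False


def startup_to_active(start: Callable[[], object],
                      probe: Callable[[], bool],
                      sleep_s: float = 0.001,
                      timeout_s: float = 60.0) -> List[int]:
    """Return [startup_ns, active_ns] for one run.

    startup_ns is taken just before `start()` launches the process;
    active_ns is taken when `probe()` first succeeds.
    """
    startup_ns = time.time_ns()
    start()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return [startup_ns, time.time_ns()]
        time.sleep(sleep_s)  # the 1 ms sleep between polls
    raise TimeoutError("service never became active")


# Hypothetical usage; the command line and URL path are placeholders:
# import subprocess
# run = startup_to_active(
#     start=lambda: subprocess.Popen(["java", "-jar", "app.jar"]),
#     probe=lambda: http_probe("http://localhost:9091/"),
# )
```

Each call yields one `[startup_ns, active_ns]` pair of nanosecond timestamps, the same shape as the raw data recorded later in this notebook. Note that the polling interval and probe cost put a floor on the resolution of the measurement.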

These tests were run against a Micronaut application (`https://github.com/CopadoSolutions/micronaut-poc.git`). Previous results showed that the Micronaut application starts significantly faster than the Spring Boot application, while the KM snapshot startup time was the same for both.

The application requires a `redis` server and a `postgres` server. Docker containers with the latest public images for these components were used. We are only measuring the startup time of the Micronaut application, so the `redis` and `postgres` servers are already running throughout the startup time measurements.

Previous results from runs started directly from the CLI showed:

> The mean startup time for the native Micronaut application was 1.352 seconds with a standard deviation of 0.004 seconds. In contrast, a KM snapshot of the application had a mean startup time of 0.074 seconds with a standard deviation of 0.004 seconds.

These results serve as a baseline for further measurements with `docker run`. Anything beyond the baseline is attributed to Docker overhead.

## Docker Overhead Measurements

Four sets of tests were run: two for Native Java and two for KM snapshots. For each, one set started Docker with standard network virtualization (`-p 9091:9091`) and the other ran with `--network=host` to isolate the cost of network virtualization. A summary of the results is given in the table below.

| Test | Docker Start (s) | Baseline (s) | Difference (s) | Std Dev (s) |
| ---- | ---- | ---- | ---- | ---- |
| Native Java (docker -p 9091:9091) | 1.876 | 1.352 | 0.524 | 0.036 |
| Native Java (docker --network=host) | 1.747 | 1.352 | 0.395 | 0.034 |
| KM Snapshot (docker -p 9091:9091) | 1.370 | 0.074 | 1.296 | 0.184 |
| KM Snapshot (docker --network=host) | 0.560 | 0.074 | 0.486 | 0.171 |


What stands out is that the Native Java results are fairly consistent, as measured by the Std Dev values. The `--network=host` option reduced mean native startup time by 0.129 sec (1.876 vs 1.747).

By contrast, the KM Snapshot measurements have fairly high variance, as measured by Std Dev. In addition, the difference between Docker networking and host networking is much larger than in the native case (0.810 sec vs 0.129 sec).

In either case, the Docker overhead (0.486 to 1.296 sec) is roughly an order of magnitude larger than the KM snapshot startup time itself (0.074 sec).
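
The arithmetic behind the table above can be reproduced with a few lines of Python. The values are copied from the table; the dictionary and function names are only illustrative.

```python
# CLI baselines (seconds), from the earlier non-Docker measurements.
BASELINE_S = {"Native Java": 1.352, "KM Snapshot": 0.074}

# Mean `docker run` startup-to-active times (seconds), from the table above.
DOCKER_START_S = {
    ("Native Java", "-p 9091:9091"): 1.876,
    ("Native Java", "--network=host"): 1.747,
    ("KM Snapshot", "-p 9091:9091"): 1.370,
    ("KM Snapshot", "--network=host"): 0.560,
}


def docker_overhead(kind: str, network: str) -> float:
    """Docker start minus the CLI baseline: the 'Difference' column."""
    return round(DOCKER_START_S[(kind, network)] - BASELINE_S[kind], 3)


def network_cost(kind: str) -> float:
    """Port-mapped start minus host-network start for one application kind."""
    return round(DOCKER_START_S[(kind, "-p 9091:9091")]
                 - DOCKER_START_S[(kind, "--network=host")], 3)


for kind, network in DOCKER_START_S:
    print(f"{kind:12s} {network:15s} overhead = {docker_overhead(kind, network):.3f} s")
for kind in BASELINE_S:
    print(f"{kind:12s} network virtualization cost = {network_cost(kind):.3f} s")
```

`docker_overhead` gives the Difference column, and `network_cost` gives the 0.129 sec and 0.810 sec figures quoted in the text.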

## Technical Design Information

The most important product implementation lesson learned is that initializing the virtual network typically takes over a second. If possible, we want to draw from a pool of pre-created virtual networks rather than create virtual networks on the fly. This applies to all Kontainer types, not just Java.
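
The pooling idea can be sketched generically as below. This is only an illustration of the pattern, not a description of any existing Kontain or Docker component: `PrecreatedPool` is a hypothetical class, and `create` is a stand-in callable for whatever actually sets up a virtual network.

```python
import queue
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PrecreatedPool(Generic[T]):
    """Keep `size` resources created ahead of time so `acquire` is fast."""

    def __init__(self, create: Callable[[], T], size: int) -> None:
        self._create = create
        self._ready: "queue.Queue[T]" = queue.Queue()
        for _ in range(size):
            self._ready.put(create())  # pay the setup cost up front

    def acquire(self) -> T:
        """Hand out a pre-created resource, or create one if the pool is empty."""
        try:
            return self._ready.get_nowait()
        except queue.Empty:
            return self._create()  # pool exhausted: slow, on-the-fly path

    def refill(self) -> None:
        """Create one replacement; meant to run off the startup path."""
        self._ready.put(self._create())

    def available(self) -> int:
        """Number of resources currently ready to hand out."""
        return self._ready.qsize()
```

The point of the design is that the expensive `create` call happens before a container start is requested, so the startup path only pays for a queue lookup. How pooled networks would be cleaned and reused is left open here.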

The biggest lesson learned about Java is that class loader overhead is high. Startup-to-active time for Java processes can be greatly improved by warming up the application before taking a snapshot. For example, in the Copado Spring Boot example we saw this behavior:

| Snapshot Created | Time to Active |
| ------------- | ------- |
| After an HTTP call | ~70ms |
| Before an HTTP call | ~300ms |
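
The ordering that matters here (serve at least one request, then snapshot) can be sketched as follows. Both callables are hypothetical placeholders: `send_request` stands for an HTTP call against the running application, and `take_snapshot` stands for whatever mechanism triggers the KM snapshot, which is not shown in this notebook.

```python
import time
from typing import Callable


def warm_up_then_snapshot(send_request: Callable[[], bool],
                          take_snapshot: Callable[[], None],
                          warmup_requests: int = 1,
                          retry_sleep_s: float = 0.001,
                          max_attempts: int = 10_000) -> int:
    """Serve `warmup_requests` successful requests, then take the snapshot.

    Returns the number of request attempts made, including failed ones.
    """
    served = 0
    attempts = 0
    while served < warmup_requests:
        if attempts >= max_attempts:
            raise TimeoutError("application never served the warm-up requests")
        attempts += 1
        if send_request():
            served += 1  # the request path has now run at least this many times
        else:
            time.sleep(retry_sleep_s)  # not ready yet; retry shortly
    take_snapshot()  # only reached after the warm-up requests succeeded
    return attempts
```

Taking the snapshot only after a successful request means the classes on the request path are already loaded when the snapshot is restored, which is the effect the table above suggests.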


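```python
# IPython magic: only valid inside a Jupyter/IPython session.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

NS_PER_SEC = 1_000_000_000

def mean_time(times):
    """Return (mean, std dev) in seconds of the startup-to-active intervals.

    `times` is a list of [startup_ns, active_ns] nanosecond timestamp pairs.
    np.std defaults to the population standard deviation (ddof=0).
    """
    intervals = [active - startup for startup, active in times]
    return (np.mean(intervals) / NS_PER_SEC, np.std(intervals) / NS_PER_SEC)
```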

### Docker Native Java

Run the Copado test service inside a container with `docker run`.


```python
# docker run -p 9091:9091
mn_docker_native_run = [
    [1595278353511510465, 1595278355367045560],
    [1595278398008031140, 1595278399898038626],
    [1595278536101508326, 1595278537926023951],
    [1595278607566591376, 1595278609433606683],
    [1595278664878484104, 1595278666822209275],
    [1595278900991058176, 1595278902868976371],
]

res = mean_time(mn_docker_native_run)
print("Docker Native startup time (-p 9091:9091):   {} stddev: {}".format(res[0], res[1]))

# docker run --network=host
nm_docker_native_nonet_run = [
    [1595279518128891227, 1595279519835081660],
    [1595279555340967913, 1595279557095066808],
    [1595279642346918183, 1595279644074168690],
    [1595279686626700561, 1595279688433605363],
    [1595279718472849617, 1595279720214826372]
]
res2 = mean_time(nm_docker_native_nonet_run)
print("Docker Native Startup Time (--network=host): {} stddev:{}".format(res2[0], res2[1]))
```

```
Docker Native startup time (-p 9091:9091):   1.8764528131666667 stddev: 0.03636742039838661
Docker Native Startup Time (--network=host): 1.7472842784 stddev:0.03382091590909761
```


### Docker KM Snapshot



```python
# docker run -p 9091:9091
nm_docker_snap_run = [
    [1595280583538337306, 1595280584696069852],
    [1595280744765748262, 1595280746230770515],
    [1595280793914133391, 1595280795446732738],
    [1595280839758130102, 1595280841316198830],
    [1595281110020447581, 1595281111158720069]
]
res3 = mean_time(nm_docker_snap_run)
print("Docker Snapshot Startup Time (-p 9091:9091): {} stddev:{}".format(res3[0], res3[1]))

# docker run --network=host
nm_docker_snap_nonet_run = [
    [1595281673304859810, 1595281673668762526],
    [1595281754360749812, 1595281755042536242],
    [1595281860439049533, 1595281861131077214],
    [1595281927239362437, 1595281927963292009],
    [1595282015781949280, 1595282016120140919]
]
res4 = mean_time(nm_docker_snap_nonet_run)
print("Docker Snapshot Startup Time (--network=host): {} stddev:{}".format(res4[0], res4[1]))
```

```
Docker Snapshot Startup Time (-p 9091:9091): 1.3703390724000002 stddev:0.1841695738478267
Docker Snapshot Startup Time (--network=host): 0.5599676076 stddev:0.1713413179672848
```
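
Since the snapshot runs show a much larger standard deviation than the native runs, it can help to look at the individual intervals rather than only the mean. The sketch below repeats the snapshot timestamps from above so that it is self-contained; the variable and function names are illustrative.

```python
# [startup_ns, active_ns] pairs copied from the KM snapshot runs above.
snap_port_mapped = [  # docker run -p 9091:9091
    [1595280583538337306, 1595280584696069852],
    [1595280744765748262, 1595280746230770515],
    [1595280793914133391, 1595280795446732738],
    [1595280839758130102, 1595280841316198830],
    [1595281110020447581, 1595281111158720069],
]
snap_host_network = [  # docker run --network=host
    [1595281673304859810, 1595281673668762526],
    [1595281754360749812, 1595281755042536242],
    [1595281860439049533, 1595281861131077214],
    [1595281927239362437, 1595281927963292009],
    [1595282015781949280, 1595282016120140919],
]


def durations_s(runs):
    """Convert [startup_ns, active_ns] pairs to interval lengths in seconds."""
    return [(active - startup) / 1e9 for startup, active in runs]


for label, runs in (("-p 9091:9091", snap_port_mapped),
                    ("--network=host", snap_host_network)):
    d = sorted(durations_s(runs))
    listing = ", ".join(f"{x:.3f}" for x in d)
    print(f"{label:15s} {listing}  (range {d[-1] - d[0]:.3f} s)")
```

In this data the runs appear to fall into a faster and a slower group in both configurations, which would account for the large standard deviation. Five runs per configuration are too few to say why.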
