Performance
How to measure a logging library performance? It is a good question. Here we are going to make a relative comparison with log4j well-known among java community. We will take one logger configuration for gflog and log4j and see how they perform from 3 different angles: sustained throughput, service latency and memory footprint. We will use log4j async garbage-free mode and gflog almost-default configuration. For sanity's sake we will consider only one benchmark case with 5 params message. But you can always check out the sources to change, build and run any benchmark, as well as the results to different benchmark cases tested on the same environment (you need to clone wiki repo and go to benchmark directory).
Software/Hardware:
- Java 11 (OpenJDK 11.0.10)
- CentOS 7
- Kernel Boot Parameters: isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7
- Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz (8 cores, HT disabled)
- Samsung SSD 850 (500 GB capacity, 540 MB/sec sequential read, 520 MB/sec sequential write)
Hyper-threading is disabled. 2-7 cpus are isolated from the kernel scheduler. Application threads use thread affinity to pin themselves to a specific cpu in sequential order (0 -> 2, 1 -> 3, etc). The background logger threads are not pinned and allowed to migrate from 0-1 cpus.
Versions:
- gflog 3.0.1
- log4j 2.14.0, disruptor 3.4.2.
gflog config:
<config>
<appender name="noop" factory="com.epam.deltix.gflog.benchmark.util.NoOpAppenderFactory"/>
<!-- noop or file (see the value in chart's brackets)-->
<appender name="file" factory="com.epam.deltix.gflog.core.appender.FileAppenderFactory" file="${temp-file}">
<layout template="%d %p '%c' [%t] %m%n"/>
</appender>
<logger>
<appender-ref ref="noop"/>
<!-- or -->
<appender-ref ref="file"/>
</logger>
<service entryEncoding="UTF-8"/>
</config>
The basic idea is not to change gflog defaults to provide optimal performance out of the box. So only one changed setting is entry encoding. ASCII encoding is default one and a subset of UTF-8 encoding. It is changed to make an apple-to-apple comparison, though only ASCII characters are used in benchmarks.
log4j config:
<Configuration name="Benchmark" status="info">
<Appenders>
<Null name="noop">
</Null>
<!-- noop or file (see the value in chart's brackets)-->
<File name="file" fileName="${sys:temp-file}" immediateFlush="false">
<PatternLayout pattern="%d %p '%c' [%t] %m%n"/>
</File>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="noop"/>
<!-- or -->
<AppenderRef ref="file"/>
</Root>
</Loggers>
</Configuration>
log4j system properties:
Log4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
AsyncLogger.RingBufferSize=262144
AsyncLogger.WaitStrategy=Yield
log4j2.enable.threadlocals=true
log4j2.enable.direct.encoders=true
The same log4j configuration is used as here.
Benchmark parameters:
- Logger Name (30 chars): "123456789012345678901234567890"
- Thread Name (23 chars): "12345678901234567890123"
- Log Message (52 chars): "Some array: [string,c,1234567,12345678901234,string]"
- Log Record (122 chars): "2021-03-30 17:48:43.390 INFO '123456789012345678901234567890' [main] Some array: [string,c,1234567,12345678901234,string]\n"
- Log Message Params (5): string, int, long, char, string
Java Microbenchmark Harness (JMH) library is used to measure sustained throughput:
- Warmup Iterations: 2
- Warmup Interval: 10 sec
- Measurement Iterations: 5
- Measurement Interval: 10 sec
- Forks: 3
- Threads: 1-6
- Affinity: 2-7
A no-operation (noop) appender is used to measure the max achievable sustained throughput. Log records are formatted and passed to the background thread, but the appender does nothing.
A file appender is used to measure the sustained throughput when logging to a file.
Disk write throughput utilization from the declared (520 MB/s):
- gflog - 71.55% - (3.05 * 122 / 520)
- log4j - 29.56% - (1.26 * 122 / 520)
The following code is used to measure service latency:
- Warmup Duration: 1 min
- Measurement Duration: 3 min
- Thread Interval: 50 us
- Thread Throughput: 10K
- Overall Throughput: 50K
- Threads: 5
- Affinity: 2-6
A file appender is used to measure the service latency. The duration (3 min) is split into intervals (50 us per a thread) at which a message is logged. An application thread catches up instantly if the next point is missed. Note that it might cause a batch of messages to be logged at once on a context switch.
baseline is an empty method and represents the overhead of the harness which runs latency benchmarks.
timestamp is System.currentTimeMillis() and represents the cost of getting the current time within the harness.
Java Object Layout (JOL) library is used to estimate memory footprint. It is run to traverse object reachability graph from a logger instance after logging 1/1M/10M/100M messages with the following random distribution:
- 50% - a message with a length from 0 to 100
- 20% - a message with a length from 0 to 200
- 10% - a message with a length from 0 to 300
- 10% - a message with a length from 0 to 400
- 9% - a message with a length from 0 to 500
- 1% - a message with a length from 0 to 1000
gflog top 5 (java: 11, messages: 1)
Instances | Footprint/Instances (bytes) | Footprint (bytes) | Type |
---|---|---|---|
3512 | 200 | 702416 | byte[] |
2443 | 36 | 88256 | java.lang.Object[] |
3390 | 24 | 81360 | java.lang.String |
2338 | 32 | 74816 | java.lang.invoke.LambdaForm$Name |
148 | 504 | 74624 | int[] |
24960 | - | 1654232 | (total) |
gflog top 5 (java: 11, messages: 1 million)
Instances | Footprint/Instances (bytes) | Footprint (bytes) | Type |
---|---|---|---|
3738 | 190 | 710792 | byte[] |
1235 | 72 | 88920 | java.lang.reflect.Field |
2443 | 36 | 88256 | java.lang.Object[] |
3602 | 24 | 86448 | java.lang.String |
632 | 120 | 75840 | java.lang.Class |
25984 | - | 1702872 | (total) |
log4j top 7 (java: 11, messages: 1)
Instances | Footprint/Instances (bytes) | Footprint (bytes) | Type |
---|---|---|---|
262144 | 104 | 27262976 | org.apache.logging.log4j.core.async.RingBufferLogEvent |
262143 | 32 | 8388576 | org.apache.logging.log4j.util.SortedArrayStringMap |
262144 | 24 | 6291456 | org.apache.logging.log4j.core.time.MutableInstant |
437 | 2583 | 1128808 | int[] |
601 | 1812 | 1089336 | java.lang.Object[] |
5486 | 148 | 816392 | byte[] |
5331 | 24 | 127944 | java.lang.String |
815846 | - | 45866000 | (total) |
log4j top 7 (java: 11, messages: 1 million)
Instances | Footprint/Instances (bytes) | Footprint (bytes) | Type |
---|---|---|---|
268263 | 283 | 76023096 | byte[] |
262144 | 104 | 27262976 | org.apache.logging.log4j.core.async.RingBufferLogEvent |
262745 | 60 | 15769368 | java.lang.Object[] |
262144 | 24 | 6291456 | java.lang.StringBuilder |
262144 | 24 | 6291456 | org.apache.logging.log4j.core.time.MutableInstant |
452 | 2497 | 1129048 | int[] |
5958 | 24 | 142992 | java.lang.String |
1342827 | - | 133755584 | (total) |
Note that JOL does not count off-heap memory and static roots. 10 MB is added to gflog result. 8 MB is off-heap ring buffer. 2 MB is extra space in reserve.