Skip to content

kreier/benchmark

Repository files navigation

Benchmark collection

There are countless benchmarks out there. This here is just an overview of benchmarks I used over the years, and their results. Some benchmarks were created back in 1972. I measured the speed of a Lattice C compiler against Omicron Basic on my Atari ST in 1992. The data collection presented here started in 2007 with Memtest86+ and the website http://saiht.de/computer/benchmark.html .

Whetstone - 1972

This synthetic benchmark from 1972 is one of the oldest ones to estimate the FLOPS - floating point operations a CPU can perform per second. Well described in an article of arstechnica from 2013. And it is neither related to wet nor a whetstone, but the town of Whetstone in England.

Linpack - FLOPS since 1979

Well described at Wikipedia this benchmark from 1979 is still usefull today because it scales with IPC (instructions per cycle), frequency and number of cores. It is therefore used since 1993 to determine the speed of supercomputers with a single value to put it in a list for comparison as the TOP500. It is measured in FLOPS in DP (double precision, 64bit).

Dhrystone - 1984

If whet (wet) is for floating point operations, then dhry (dry) is for integer performance. This led to the Dhrystone benchmark in 1984. Gave it a try as well.

NBench - 1996

This benchmark (Wikipedia) form the mid 1990s runs on a variety of hardware and has been maintained until 2012 by Uwe F. Mayer. For modern computers it is rather lightwight and can be compiled on a simple linux machine with gcc in merely seconds, then run for some minutes. I collect data for this benchmark since 2006.

The data gives some inside to the architechture of the CPU and the speed of the connected memory.

CoreMark - 2009 for embedded systems

Intended for embedded systems by EEMBC in 2009 for embedded system it is too large for an Arduino Uno, but runs from an Arduino Mega 2560 onwards to multithreaded Octacore Xeon processor. The results show that it scales mostly with frequency and a little with improved IPC.

Results 2020

The results vary from some 7 points for the Arduino Mega 2560 (the Leonardo and Uno have not enough RAM) to 39082 for an i7-13700T or 390614 in a 24-thread execution. That's 5580x or 55802x faster, almost 5 magnitudes! That's hard to display with linear bar graphs. The range of frequencies is vast as well from 16 MHz in the Arduino to 4600 MHz in a Quadcore i7, another 280x. Putting these two values (CoreMark and Frequency) into relationship narrows the differences:

Results 2020

It is still comparing an old 8bit CPUs with a modern 32bit ARM and 64bit x86 CPUs, with long pipelines, many registers and large caches. The difference in simple IPC is seen.

embench - 2019 for RISC-V comparison

When comparing the power of the new RISC-V ISA some people used drystone and coremark on May 12th, 2018. Not everybody was happy. While 4.9 CoreMarks/MHz and 2.5 DMIPS/Mhz sound interesting, there are shortcomings. To provide a proper tool for comparison they developed this suite from 2019 on. Version 0.5 was released in June 2020 at the Embedded World Conference in Nürnberg, Germany.

3Dmark - since 1999 for 3D graphics cards

Testing the speed of a 3D graphics card is inherent to its purpose of presenting a smooth image for the user. The benchmark from 1999 still runs on modern hardware. I longed for years to eventually have the hardware just to see the benchmark without stutter. Instead of investing in expensive hardware I relied on Moore's law and waited. Endurance payed off!

GeekBench 2007, 2019 (5.0)

Some collected values from version 2 to 5 and compared in a table.

Evolved speed of non volatile memory NVM over the years. Compare my Samsung R20 from 2008 with mechanical harddrive to my Lenovo Yoga 370 from 2017 with NVMe:

The benchmarks drove the innovation in browser development significantly, but became obsolete quite fast as well. Notable examples are Browermark 2.0, Basemark Web 3.0, peacekeeper, octane, sunspider, kraken, Jetstream and Speedometer.

GPU performance

With image classification using GPUs in AlexNet 2012 people saw the potential of GPUs for machine learning. And in 2022 the whole world payed attention with ChatGPT. The performance today is comparable with Supercomputers of the late 90s of the last century:

GFLOPS logarithmic

GLFOPS over time

A limiting factor for these LLMs is often the memory speed, not just processing power in GFLOPS. Here is how some of them compare to other ways of information transfer like Ethernet, DDR3 and USB4.

Comparison speed linear

The speed differences are so vast I had to include a logarithmic graph. This visualizes the magnitudes of differences between the solutions.

Comparison speed logarithmic

SuperPi 1M

Calcualting Pi to 1 million digits does not take too long nowadayes:

Prime numbers in basic, python and CPU

One of my first benchmarks, written 1992 in Omicron Basic on the Atari ST 520 STFM and then compared to an edition in Lattice C. Surprisingly the Basic variant was faster.

With text output in Mu it took the ESP32-S2 38.9 seconds. Commenting the print command in line 17 reduced the time to 13.3 seconds.

I used it for some microcomputers as well:

Frequency ESP8266 ESP32 Raspberry Pi 1 Raspberry Pi 4
40 MHz - 44427 ms
80 MHz 32807 ms 23323 ms
160 MHz 16113 ms 11375 ms
240 MHz - 7783 ms

Ideas are taken from:

Benchmark game - Which programming language is the fastest?

Compiled in 2018 (with history going back to 2002) several benchmarks compare the execution speed of programs in 24 languages. Many are optimized for multicore parallel execution, to make modern processors comparable. Probably disable the efficiency cores might speed up the processes.