Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
102 lines (71 sloc) 4.85 KB

Test-driving ARM cloud servers for development

ARM is coming to the cloud. Looking at some marketing materials, it seems the primary target is so-called "scale-out workloads", like lightweight web servers, that need to stay up but aren't exactly CPU-intensive. It's slower than x86, but it's also cheaper than x86! That seems to be the slogan.

My use case is the opposite. I am interested in using ARM cloud servers as a development environment, to troubleshoot porting issues to the ARM architecture. This often involves building large amounts of code, so CPU performance is important. On the other hand, compilation parallelizes well, so with enough cores (or vCPUs, as they say on the cloud) single-core performance is less important.

In this post, I will compare ARM cloud server offerings from Scaleway, AWS, and Packet. To benchmark them as a development environment, I chose building LLVM 9.0.0 in Release configuration as a workload. If you know of other ARM cloud server offerings I should try, please let me know.

LLVM was configured as follows:

$ cmake -G Ninja -D CMAKE_BUILD_TYPE=Release ..

To provide the context for numbers below, here is the result for my local build server at home, 4 cores of Intel Core i7-6700K with 16 GB of RAM.

$ time ninja -j4
real    23m9.424s
user    88m45.267s
sys     3m51.048s

4.2% spent in kernel, with 100.0% utilization. Took 92.6 minutes in single-core basis.

Local build server runs Debian GNU/Linux unstable, amd64. I chose Ubuntu 18.04 arm64 port for cloud servers. All systems were idle except for the build (and systemd and sshd). Everything else was the default unless specified otherwise. Note that cloud pricing is ever-shifting: pricing information below is as of the publication of this article.

Scaleway

Scaleway's ARM64 instances come in seven different kinds, from ARM64-2GB to ARM64-128GB. I chose ARM64-4GB for a comparison. It comes with 6 vCPUs of Cavium ThunderX and 4 GB of RAM. Price was 0.012 EUR per hour.

I found that building with 6 jobs in parallel results in out-of-memory error. It seems about 1 GB of RAM per job is required, so I used 4 jobs instead.

$ time ninja -j4
real    256m33.810s
user    992m21.012s
sys     32m41.956s

3.2% spent in kernel, with 99.9% utilization. Took 1025.0 minutes in single-core basis.

AWS

AWS's A1 instances come in five different kinds, from a1.medium to a1.4xlarge. I chose a1.large for a comparison. It comes with 2 vCPUs of AWS Graviton (using ARM Neoverse cores) and 4 GB of RAM. Price was 0.0642 USD per hour in Tokyo region, which is the closest to me. (Seoul region is closer, but A1 instances are not available in Seoul.)

$ time ninja -j2
real    170m18.482s
user    326m17.396s
sys     14m3.670s

4.1% spent in kernel, with 99.9% utilization. Took 340.4 minutes in single-core basis.

Packet

Packet is a "bare metal" cloud. You need to rent the whole machine and can't rent virtualized slices. I chose c1.large.arm and c2.large.arm for a comparison.

c1.large.arm has 96 cores of Cavium ThunderX (the same chip used by Scaleway) and 128 GB of RAM. Price was 0.5 USD per hour. c2.large.arm has 32 cores of Ampere eMAG and 128 GB of RAM. Price was 1 USD per hour.

$ time ninja -j96
real    12m35.765s
user    883m14.923s
sys     42m58.907s

4.6% spent in kernel, with 76.6% utilization. Took 926.2 minutes in single-core basis. LLVM build is parallel, but not 96 jobs parallel. As expected, the result is comparable to Scaleway.

$ time ninja -j32
real    13m38.964s
user    404m19.502s
sys     15m52.135s

3.8% spent in kernel, with 96.2% utilization. Took 420.2 minutes in single-core basis. LLVM build is 32 jobs parallel!

Summary

I used 1 EUR = 1.1 USD in the table below. Price is per core per hour in USD. Build time is per core in minutes.

Name Hardware Price Build time
Local Intel i7-6700K 92.6
Scaleway ARM64 Cavium ThunderX 0.0033 1025.0
AWS A1 AWS Graviton 0.0321 340.4
Packet c1.large.arm Cavium ThunderX 0.0052 926.2
Packet c2.large.arm Ampere eMAG 0.0313 420.2

Here it is again in relative numbers standardized at Scaleway = 1. Performance is reciprocal of build time.

Name Price Performance
Local 11.1
Scaleway ARM64 1.0 1.0
AWS A1 9.7 3.0
Packet c1.large.arm 1.6 1.1
Packet c2.large.arm 9.5 2.4

Local x86 is >10x faster than Scaleway and >3x faster than AWS. While AWS is 3x faster than Scaleway, it's also ~10x expensive. Packet c1.large.arm is comparable to Scaleway and c2.large.arm is comparable to AWS, but they can't be rented in slices.

You can’t perform that action at this time.