Create support files to run benchmarks in GKE. (#2756)
To build and run benchmarks in a (mostly) reproducible way I created
configuration files to build a Docker image with the benchmarks
installed, and to run that image in GKE. Eventually I would like
these to build and run automatically using Google Cloud Build, but for
now they are very handy when we need to run 20 copies of the benchmarks.
coryan committed Jun 12, 2019
1 parent e8c138c commit 0347fb5
Showing 5 changed files with 414 additions and 0 deletions.
100 changes: 100 additions & 0 deletions ci/benchmarks/Dockerfile
@@ -0,0 +1,100 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM ubuntu:bionic AS google-cloud-cpp-dependencies

RUN apt update && \
    apt install -y build-essential cmake git gcc g++ \
    libc-ares-dev libc-ares2 libcurl4-openssl-dev libssl-dev make \
    pkg-config tar wget zlib1g-dev

# #### crc32c

WORKDIR /var/tmp/build
RUN wget -q https://github.com/google/crc32c/archive/1.0.6.tar.gz
RUN tar -xf 1.0.6.tar.gz
WORKDIR /var/tmp/build/crc32c-1.0.6
RUN cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=yes \
    -DCRC32C_BUILD_TESTS=OFF \
    -DCRC32C_BUILD_BENCHMARKS=OFF \
    -DCRC32C_USE_GLOG=OFF \
    -H. -Bcmake-out/crc32c
RUN cmake --build cmake-out/crc32c --target install -- -j $(nproc)
RUN ldconfig

# #### Protobuf

WORKDIR /var/tmp/build
RUN wget -q https://github.com/google/protobuf/archive/v3.7.1.tar.gz
RUN tar -xf v3.7.1.tar.gz
WORKDIR /var/tmp/build/protobuf-3.7.1/cmake
RUN cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=yes \
    -Dprotobuf_BUILD_TESTS=OFF \
    -H. -Bcmake-out
RUN cmake --build cmake-out --target install -- -j $(nproc)
RUN ldconfig

# #### gRPC

WORKDIR /var/tmp/build
RUN wget -q https://github.com/grpc/grpc/archive/v1.21.0.tar.gz
RUN tar -xf v1.21.0.tar.gz
WORKDIR /var/tmp/build/grpc-1.21.0
RUN make -j $(nproc)
RUN make install
RUN ldconfig

# Get the source code

FROM google-cloud-cpp-dependencies AS google-cloud-cpp-build

WORKDIR /w
COPY . /w

# #### google-cloud-cpp

ARG CMAKE_BUILD_TYPE=Release
ARG SHORT_SHA=""

RUN cmake -H. -Bcmake-out \
    -DGOOGLE_CLOUD_CPP_DEPENDENCY_PROVIDER=package -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
    -DGOOGLE_CLOUD_CPP_BUILD_METADATA=${SHORT_SHA}
RUN cmake --build cmake-out -- -j $(nproc)
WORKDIR /w/cmake-out
RUN cmake --build . --target install

# ================================================================

# Prepare the final image. This image is much smaller because we only install:
# - The final binaries, without any intermediate object files.
# - The run-time dependencies, without the build-tool dependencies.
FROM ubuntu:bionic
RUN apt update && \
    apt install -y ca-certificates libc-ares2 libcurl4 \
    libstdc++6 libssl1.1 zlib1g
RUN /usr/sbin/update-ca-certificates

COPY --from=google-cloud-cpp-build /usr/local/lib /usr/local/lib
COPY --from=google-cloud-cpp-build /usr/local/bin /usr/local/bin
COPY --from=google-cloud-cpp-build /w/cmake-out/google/cloud/storage/benchmarks/*_benchmark /r/
COPY --from=google-cloud-cpp-build /w/cmake-out/google/cloud/storage/examples/*_samples /r/
COPY --from=google-cloud-cpp-build /w/ci/benchmarks/storage/*.sh /r/
RUN ldconfig

CMD ["/bin/false"]
30 changes: 30 additions & 0 deletions ci/benchmarks/README.md
@@ -0,0 +1,30 @@
# Setting up GKE Deployments for the Benchmarks.

`google-cloud-cpp` has a number of benchmarks that cannot be executed as part
of the CI builds because:

- They take too long to run (multiple hours).
- Their results require an analysis pipeline to determine if there is a problem.
- They need production resources (and therefore production credentials) to run.

Instead, we execute these benchmarks in a Google Kubernetes Engine (GKE)
cluster. Currently these benchmarks are started manually. Eventually we would
like to set up a continuous deployment (CD) pipeline using Google Cloud Build
and GKE.

## Creating the Docker image

The benchmarks run from a single Docker image. To build this image we use
Google Cloud Build:

```console
$ gcloud builds submit \
--substitutions=SHORT_SHA=$(git rev-parse --short HEAD) \
--config cloudbuild.yaml .
```
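
Once the build completes you can (optionally) confirm the image landed in the registry; `google-cloud-cpp-benchmarks` is the image name used by the deployment commands in the storage README:

```console
$ gcloud container images list-tags \
    "gcr.io/${PROJECT_ID}/google-cloud-cpp-benchmarks"
```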

## Setting up GKE and other configuration

Please read the README file for the [storage](storage/README.md) benchmarks for
more information on how to set up a GKE cluster and execute the benchmarks in
that cluster.
180 changes: 180 additions & 0 deletions ci/benchmarks/storage/README.md
@@ -0,0 +1,180 @@
# Setting up GKE Deployments for the GCS C++ Client Benchmarks.

This document describes how to set up GKE deployments to continuously run the
Google Cloud Storage C++ client benchmarks. Please see the general
[README](../README.md) about benchmarks for the motivations behind this design.

## Creating the Docker image

The benchmarks run from a single Docker image. To build this image we use
Google Cloud Build:

```console
$ cd $HOME/google-cloud-cpp
$ gcloud builds submit \
--substitutions=SHORT_SHA=$(git rev-parse --short HEAD) \
--config cloudbuild.yaml .
```

## Create a GKE cluster for the Storage Benchmarks

Select the project and zone where you will run the benchmarks.

```console
$ PROJECT_ID=... # The name of your project
# e.g., PROJECT_ID=$(gcloud config get-value project)
$ STORAGE_BENCHMARKS_ZONE=... # e.g. us-central1-a
$ STORAGE_BENCHMARKS_REGION="$(gcloud compute zones list \
"--filter=(name=${STORAGE_BENCHMARKS_ZONE})" \
"--format=csv[no-heading](region)")"
```

Create the GKE cluster. All the performance benchmarks run in one cluster
because they work well with a single CPU each.

```console
$ gcloud container clusters create --zone=${STORAGE_BENCHMARKS_ZONE} \
--num-nodes=30 storage-benchmarks-cluster
```
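
A quick way to verify the cluster is up before proceeding (an optional check, not required by the following steps):

```console
$ gcloud container clusters list \
    --filter="name=storage-benchmarks-cluster" \
    --format="csv[no-heading](name,status,currentNodeCount)"
```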

Pick a name for the service account, for example:

```console
$ SA_NAME=storage-benchmark-sa
```

Then create the service account:

```console
$ gcloud iam service-accounts create ${SA_NAME} \
--display-name="Service account to run GCS C++ Client Benchmarks"
```

Grant the service account the `roles/storage.admin` role. The benchmarks create
and delete their own buckets; this is the only predefined role that grants
these permissions, and it incidentally also grants permission to read, write,
and delete objects:

```console
$ gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/storage.admin
```
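
To double-check that the binding took effect, you can inspect the project's IAM policy; this read-only check is optional:

```console
$ gcloud projects get-iam-policy ${PROJECT_ID} \
    --flatten="bindings[].members" \
    --filter="bindings.members:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --format="table(bindings.role)"
```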

Create a new key for this service account and download it to a temporary
location:

```console
$ gcloud iam service-accounts keys create /dev/shm/key.json \
--iam-account ${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
```

Copy the key to the GKE cluster:

```console
$ gcloud container clusters get-credentials \
--zone "${STORAGE_BENCHMARKS_ZONE}" storage-benchmarks-cluster
$ kubectl create secret generic service-account-key \
--from-file=key.json=/dev/shm/key.json
```
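
You can confirm the secret exists (without revealing the key material) with:

```console
$ kubectl get secret service-account-key
$ kubectl describe secret service-account-key
```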

And then remove it from your machine:

```bash
$ rm /dev/shm/key.json
```

### Starting the Throughput vs. CPU Benchmark

The throughput vs. CPU benchmark measures the effective upload and download
throughput for different object sizes, as well as the CPU required to achieve
this throughput. In principle, we want the client libraries to achieve high
throughput with minimal CPU utilization.

The benchmark uploads an object of random size, measures the throughput and CPU
usage, then downloads the same object and measures the throughput and CPU
utilization. The results are reported to the standard output as a CSV file.

Create a bucket to store the benchmark results:

```console
$ LOGGING_BUCKET=... # e.g. ${PROJECT_ID}-benchmark-logs
$ gsutil mb -l us gs://${LOGGING_BUCKET}
```
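
An optional check that the bucket exists and shows its location:

```console
$ gsutil ls -L -b "gs://${LOGGING_BUCKET}"
```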

Start the deployment:

```console
$ gcloud container clusters get-credentials \
--zone "${STORAGE_BENCHMARKS_ZONE}" storage-benchmarks-cluster
$ VERSION=$(git rev-parse --short HEAD) # The image version.
$ sed -e "s/@PROJECT_ID@/${PROJECT_ID}/" \
-e "s/@LOGGING_BUCKET@/${LOGGING_BUCKET}/" \
-e "s/@REGION@/${STORAGE_BENCHMARKS_REGION}/" \
-e "s/@VERSION@/${VERSION}/" \
ci/benchmarks/storage/throughput-vs-cpu-job.yaml | kubectl apply -f -
```
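
To watch the benchmark pods come up and to sample their output (standard `kubectl`; the pod names are generated, so list them first):

```console
$ kubectl get pods
$ kubectl logs $(kubectl get pods -o name | head -1)
```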

To upgrade the deployment to a new image:

```console
$ VERSION=... # The new version
$ kubectl set image deployment/storage-throughput-vs-cpu \
"benchmark-image=gcr.io/${PROJECT_ID}/google-cloud-cpp-benchmarks:${VERSION}"
$ kubectl rollout status -w deployment/storage-throughput-vs-cpu
```

To restart the deployment:

```console
$ sed -e "s/@PROJECT_ID@/${PROJECT_ID}/" \
-e "s/@LOGGING_BUCKET@/${LOGGING_BUCKET}/" \
-e "s/@REGION@/${STORAGE_BENCHMARKS_REGION}/" \
-e "s/@VERSION@/${VERSION}/" \
ci/benchmarks/storage/throughput-vs-cpu-job.yaml | \
kubectl replace --force -f -
```

## Analyze Throughput-vs-CPU results.

If you haven't already, create a BigQuery dataset to hold the data:

```console
$ bq mk --dataset \
--description "Holds data and results from GCS C++ Client Library Benchmarks" \
"${PROJECT_ID}:storage_benchmarks"
```
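
You can list the datasets in the project to confirm it was created:

```console
$ bq ls "${PROJECT_ID}:"
```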

Create a table in this dataset to hold the benchmark results:

```console
$ TABLE_COLUMNS=(
"op:STRING"
"object_size:INT64"
"chunk_size:INT64"
"buffer_size:INT64"
"elapsed_us:INT64"
"cpu_us:INT64"
"status:STRING"
"version:STRING"
)
$ printf -v schema ",%s" "${TABLE_COLUMNS[@]}"
$ schema=${schema:1}
$ bq mk --table --description "Raw Data from throughput-vs-cpu benchmark" \
"${PROJECT_ID}:storage_benchmarks.throughput_vs_cpu_data" \
"${schema}"
```
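
To confirm the table's schema matches the columns above:

```console
$ bq show --schema --format=prettyjson \
    "${PROJECT_ID}:storage_benchmarks.throughput_vs_cpu_data"
```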

Count the objects with benchmark results and derive a bound on tolerable load
errors:

```console
$ objects=$(gsutil ls gs://${PROJECT_ID}-benchmark-logs/throughput-vs-cpu/ | wc -l)
$ max_errors=$(( ${objects} * 2 ))
```

Upload them to BigQuery:

```console
$ gsutil ls gs://${PROJECT_ID}-benchmark-logs/throughput-vs-cpu/ | \
xargs -I{} bq load --noreplace --skip_leading_rows=16 --max_bad_records=2 \
"${PROJECT_ID}:storage_benchmarks.throughput_vs_cpu_data" {}
```
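
As a starting point for the analysis, here is a sketch of a query over the loaded data; the `status = 'OK'` filter assumes the benchmark reports successful operations with an `OK` status string:

```console
$ bq query --use_legacy_sql=false \
    "SELECT op, COUNT(*) AS samples, AVG(elapsed_us) AS avg_elapsed_us
       FROM \`${PROJECT_ID}.storage_benchmarks.throughput_vs_cpu_data\`
      WHERE status = 'OK'
      GROUP BY op"
```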
48 changes: 48 additions & 0 deletions ci/benchmarks/storage/throughput-vs-cpu-driver.sh
@@ -0,0 +1,48 @@
#!/usr/bin/env bash
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -eu

echo "# ================================================================"

if [[ $# -lt 1 ]]; then
  echo "Usage: $(basename "$0") <logging-bucket> [benchmark args...]"
  exit 1
fi

readonly LOGGING_BUCKET="$1"
shift 1

# Disable -e because we always want to upload the log; even partial results
# are useful.
set +e
/r/storage_throughput_vs_cpu_benchmark "$@" 2>&1 | tee tp-vs-cpu.txt
# Capture the benchmark's exit status; plain $? would report the exit status
# of tee, the last command in the pipeline.
exit_status=${PIPESTATUS[0]}

# We need to upload the results to a uniquely named object in GCS. The
# benchmark creates a uniquely named bucket for each run, so we can reuse that
# bucket name as part of the object name.
readonly B="$(sed -n 's/# Running test on bucket: \(.*\)/\1/p' tp-vs-cpu.txt)"
# While the bucket name (which is randomly generated and fairly long) is almost
# guaranteed to be a unique name, we also want to easily search for "all the
# uploads around this time". Using a timestamp as a prefix makes that relatively
# easy.
readonly D="$(date -u +%Y-%m-%dT%H:%M:%S)"
readonly OBJECT_NAME="throughput-vs-cpu/${D}-${B}.txt"

/r/storage_object_samples upload-file tp-vs-cpu.txt \
"${LOGGING_BUCKET}" "${OBJECT_NAME}" >/dev/null

exit ${exit_status}
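
A sketch of how one might exercise this driver locally with Docker; the image tag, the key mount, and the assumption that a service account key file is available in the current directory are illustrative, not something the script prescribes:

```console
$ docker run --rm \
    -v "$PWD/key.json:/k/key.json:ro" \
    -e GOOGLE_APPLICATION_CREDENTIALS=/k/key.json \
    "gcr.io/${PROJECT_ID}/google-cloud-cpp-benchmarks:${VERSION}" \
    /r/throughput-vs-cpu-driver.sh "${LOGGING_BUCKET}"
```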
