Michal Kuratczyk edited this page Jan 30, 2024 · 5 revisions

Bazel, BuildBuddy, and GitHub Actions

Overview

Bazel, BuildBuddy & GitHub Actions replaced Concourse as our primary test suite execution mechanism in 2021.

GitHub Actions is GitHub's automation/continuous integration feature. It allows workflows, vaguely similar to Concourse pipelines, to be defined in YAML; when committed to the appropriate location in a repository, they can be triggered by various events, such as new commits to a repo. However, unlike Concourse, a single workflow cannot "watch" multiple repos jointly for changes. For most workflows in rabbitmq-server, Bazel is used to build and test.

Bazel is a build and test tool. It is useful to us because it supports caching of test results and remote parallel execution. Once we merged the broker and all of the tier-1 plugin repositories into a monorepo, the set of tests naively triggered by every commit to the monorepo was simply too large to ignore. Unfortunately, Bazel does not have built-in Erlang support, but since it is extensible, we wrote rules_erlang.

BuildBuddy is a hosted Bazel Remote Build Execution service. Most of our actual execution of test cases occurs on buildbuddy workers.

In graphical form, we have:

push new commit -> Test Workflow -> `bazel test //...` (Erlang 23) -> a_SUITE (executed by BuildBuddy)
                                                                   -> b_SUITE (executed by BuildBuddy)
                                 -> `bazel test //...` (Erlang 24) -> a_SUITE (executed by BuildBuddy)
                                                                   -> b_SUITE (executed by BuildBuddy)
                -> Test (Mixed Versions) Workflow -> ...
                -> ...

Using Bazel

Local Dependencies

Bazel rules used by RabbitMQ assume that a number of developer tools are available locally:

  • Modern C++ compiler toolchain (clang, g++): comes from the build-essential package on Debian-based Linux and the Xcode command line tools on macOS
  • sha256sum: comes from coreutils on Linux and the sha3sum formula via Homebrew on macOS

Using Bazel Locally (without RBE)

Bazel can be used to run tests locally, just like make. First, install Bazelisk, a user-friendly launcher for Bazel that also respects the .bazelversion file in the repository. The Erlang and Elixir installations used will be picked up from your PATH, or they can be specified by exporting the ERLANG_HOME and ELIXIR_HOME environment variables. One should also copy user-template.bazelrc to user.bazelrc or $HOME/.bazelrc:

# rabbitmqctl wait shells out to 'ps', which is broken in the bazel macOS
# sandbox (https://github.com/bazelbuild/bazel/issues/7448)
# adding "--spawn_strategy=local" to the invocation is a workaround
build --spawn_strategy=local

# don't re-run flakes automatically on the local machine
build --flaky_test_attempts=1

build:buildbuddy --remote_header=x-buildbuddy-api-key=YOUR_API_KEY

# cross compile for linux (if on macOS) with rbe
build:rbe --host_cpu=k8
build:rbe --cpu=k8
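The environment-variable approach mentioned above can be made explicit before invoking Bazel. A minimal sketch, where the paths are placeholders you should adjust to your local installations:

```shell
# Pin the Erlang/Elixir installations Bazel should use.
# These paths are placeholders; point them at your own installs.
export ERLANG_HOME="${ERLANG_HOME:-/usr/local/lib/erlang}"
export ELIXIR_HOME="${ELIXIR_HOME:-/usr/local/lib/elixir}"
echo "ERLANG_HOME=$ERLANG_HOME"
```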

Once the above is complete, you should be able to run some tests with

bazel test //deps/rabbit_common:all

So what is a test label? Bazel has a notion of repositories and packages, and the name of a target within that hierarchy is its label.

So, for instance, the label for the backing_queue_SUITE for the rabbit application from rabbitmq-server, found at deps/rabbit/test/backing_queue_SUITE.erl is //deps/rabbit:backing_queue_SUITE. And, if you want to run that suite, you can do so with

bazel test //deps/rabbit:backing_queue_SUITE

The complete label is actually @rabbitmq-server//deps/rabbit:backing_queue_SUITE, but the head can be left off if rabbitmq-server is the current repository.
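The path-to-label mapping above is mechanical, so it can be sketched as a small (hypothetical, not part of the repo) shell helper that assumes the standard deps/<app>/test/<suite>.erl layout:

```shell
# Hypothetical helper: derive the Bazel label for a Common Test suite
# from its source path, assuming the deps/<app>/test/<suite>.erl layout.
suite_label() {
  local path="$1"                         # e.g. deps/rabbit/test/backing_queue_SUITE.erl
  local app="${path#deps/}"               # strip the deps/ prefix
  app="${app%%/*}"                        # keep the application name only
  local suite
  suite="$(basename "$path" .erl)"        # suite module name without extension
  echo "//deps/${app}:${suite}"
}

suite_label deps/rabbit/test/backing_queue_SUITE.erl
# prints //deps/rabbit:backing_queue_SUITE
```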

You can also run tests matching a label pattern, so if you wanted to run all of the tests for the rabbit application (this will take a while and consume all available local CPU cores!), do it with

# runs ALL RabbitMQ server core test suites
bazel test //deps/rabbit:all

To build everything in rabbitmq-server, use

# builds everything in the repository
bazel build //...

Finally, to run all test suites in the repository:

# runs ALL test suites in the repository
bazel test //...

To learn more about what :all or //... means, see the Bazel documentation on target patterns.

Local Bazel Command Alternatives to Common Erlang.mk Tasks

Run All Test Suites in a Subproject

To execute all tests in a plugin, say, rabbitmq_shovel:

# erlang.mk
cd deps/rabbitmq_shovel
gmake tests
# Bazel
bazel test //deps/rabbitmq_shovel:all

Run a Single Test Suite in a Subproject

To execute one test in a plugin, say, rabbitmq_auth_backend_oauth2:

# erlang.mk
cd deps/rabbitmq_auth_backend_oauth2
gmake ct-unit
# Bazel
bazel test //deps/rabbitmq_auth_backend_oauth2:unit_SUITE

Run a Single Test

To execute one test case, say, test_successful_token_refresh in group basic_happy_path of test suite system_SUITE in subproject rabbitmq_auth_backend_oauth2:

# erlang.mk
gmake -C deps/rabbitmq_auth_backend_oauth2 ct-system t=basic_happy_path:test_successful_token_refresh
# Bazel
bazel test //deps/rabbitmq_auth_backend_oauth2:system_SUITE --test_env FOCUS="-group basic_happy_path -case test_successful_token_refresh"
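The FOCUS value is just a string of Common Test group/case selectors. A hypothetical helper (not part of the repo) that composes it and prints the full command rather than running it:

```shell
# Hypothetical helper: compose the FOCUS value shown above from a
# group name and a case name, then print the resulting command.
focus() { printf -- '-group %s -case %s' "$1" "$2"; }

FOCUS_VALUE="$(focus basic_happy_path test_successful_token_refresh)"
echo "bazel test //deps/rabbitmq_auth_backend_oauth2:system_SUITE --test_env FOCUS=\"$FOCUS_VALUE\""
```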

Run Tests Repeatedly Even If Previous Results are Cached

bazel test --cache_test_results=no //deps/rabbitmq_auth_backend_oauth2:all

Open Common Test Log Results

To consult Common Test logs after running a test suite (or all test suites):

# erlang.mk
cd deps/rabbitmq_auth_backend_oauth2
gmake ct-unit
open logs/index.html
# Bazel
bazel test //deps/rabbitmq_auth_backend_oauth2:unit_SUITE
bazel run test-logs //deps/rabbitmq_auth_backend_oauth2:unit_SUITE

Open Node Data Directories After a Run

To inspect node data directories after a test run:

# erlang.mk
cd deps/rabbitmq_auth_backend_oauth2
gmake ct-unit
# opens top level directory for all test run data
open logs
# Bazel
bazel test //deps/rabbit:maintenance_mode_SUITE
# opens test run data directory of the last run
bazel run test-node-data //deps/rabbit:maintenance_mode_SUITE

Code coverage

# erlang.mk
make -C deps/rabbitmq_amqp1_0 ct FULL=1 COVER=1
open deps/rabbitmq_amqp1_0/logs/index.html

In the browser, click on the test name, then on Coverage log.

# Bazel
bazel coverage //deps/rabbitmq_amqp1_0:all -t-
genhtml --output genhtml "$(bazel info output_path)/_coverage/_coverage_report.dat"
open genhtml/index.html

where genhtml is https://github.com/linux-test-project/lcov/blob/master/bin/genhtml and can be installed with brew install lcov on macOS.

Run a Node from Source with Several Plugins

# erlang.mk
gmake run-broker PLUGINS="rabbitmq_management rabbitmq_shovel rabbitmq_shovel_management rabbitmq_top" RABBITMQ_CONFIG_FILE=/path/to/rabbitmq.conf
# Bazel
bazel run broker RABBITMQ_ENABLED_PLUGINS="rabbitmq_management,rabbitmq_shovel,rabbitmq_shovel_management,rabbitmq_top" RABBITMQ_CONFIG_FILE=/path/to/rabbitmq.conf

Use CLI Tools Built from Source

# erlang.mk, from the directory used to run 'gmake run-broker'
./sbin/rabbitmq-diagnostics status
# Bazel, from the directory used to run 'bazel run broker'
bazel run rabbitmq-diagnostics status

# Running the CLIs through bazel is pretty slow. You can use the CLIs directly, once they are built:
./bazel-bin/broker-home/sbin/rabbitmqctl

# For even more convenience, just add this folder to your PATH (you may need to adjust it of course)
export PATH=$PATH:~/rabbitmq-server/bazel-bin/broker-home/sbin

Run a Cluster from Source

# erlang.mk
gmake start-cluster NODES=5 TEST_TMPDIR="$HOME"/scratch/myrabbit
# Bazel
bazel run start-cluster NODES=5 TEST_TMPDIR="$HOME"/scratch/myrabbit

Stop the cluster:

# erlang.mk
gmake stop-cluster NODES=5 TEST_TMPDIR="$HOME"/scratch/myrabbit
# Bazel
bazel run stop-cluster NODES=5 TEST_TMPDIR="$HOME"/scratch/myrabbit

Build Docker image

# erlang.mk
gmake package-generic-unix
gmake docker-image
# Bazel
bazel run //packaging/docker-image:rabbitmq

Run XRef and Dialyzer

# erlang.mk
cd deps/rabbitmq_shovel
gmake xref
gmake dialyze
# Bazel
bazel test //deps/rabbitmq_shovel:xref
bazel test //deps/rabbitmq_shovel:dialyze
# to skip xref
bazel test //deps/rabbitmq_shovel:all --test_tag_filters="-xref"
# to skip Dialyzer
bazel test //deps/rabbitmq_shovel:all --test_tag_filters="-dialyze"
# to skip both xref and Dialyzer
bazel test //deps/rabbitmq_shovel:all --test_tag_filters="-xref,-dialyze"

Clean

# erlang.mk
gmake clean
# Bazel
bazel clean

Produce a Generic Binary Package (Generic UNIX Package)

# Bazel
bazel build "//:package-generic-unix"

Generate Language Server Files

To generate language server files for ./deps directories, run

bazel run //tools:symlink_deps_for_erlang_ls

Using Bazel with RBE

To use Bazel and leverage remote build execution with BuildBuddy, you need to create a BuildBuddy account and fill in the token value in the user.bazelrc file. Then, you can run tests with the rbe-23 or rbe-24 configurations active, such as bazel test //... --config=rbe-24.

When tests are run with (or without) RBE, the logs can be found under the bazel-testlogs directory. This directory mirrors the package structure of the repo, so the backing_queue_SUITE logs will be found in the bazel-testlogs/deps/rabbit/backing_queue_SUITE directory.
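Since bazel-testlogs mirrors the package structure, the log directory for any suite can be derived from its label. A hypothetical helper (not part of the repo) sketching that mapping:

```shell
# Hypothetical helper: map a test label to its bazel-testlogs directory,
# following the package-mirroring layout described above.
testlog_dir() {
  local label="${1#//}"                              # drop the leading //
  echo "bazel-testlogs/$(printf '%s' "$label" | tr ':' '/')"
}

testlog_dir //deps/rabbit:backing_queue_SUITE
# prints bazel-testlogs/deps/rabbit/backing_queue_SUITE
```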

A note on the RBE "environment"

The RBE configuration is such that remote execution uses the pivotalrabbitmq/rabbitmq-server-buildenv:linux-rbe docker image. The Dockerfile for that image can be found at https://github.com/rabbitmq/rabbitmq-ci/blob/main/docker/rabbitmq-server-buildenv/linux-rbe/Dockerfile. A Hush House pipeline watches that repo and will rebuild the image automatically when it changes. However, that is not enough for the change to propagate to RBE. After the image has been rebuilt, the nightly (also manually triggerable) https://github.com/rabbitmq/rbe-erlang-platform/actions/workflows/rbe_configs_gen.yaml GitHub Actions workflow must see the new image and create a corresponding PR. Only once that PR is merged will RBE pick up the new image.

Debugging failures and flakes in CI

By default, bazel does not stream test outputs, and typically there are many different tests running in parallel. In fact, streaming output with the --test_output=streamed flag disables remote execution and runs tests in sequence.

Therefore, when tests are run in CI, most of the logs are not visible until you follow the link in the GitHub Actions log to the run in BuildBuddy. It will look something like:

INFO: Streaming build results to: https://app.buildbuddy.io/invocation/43e8a3a0-78db-444d-8316-737426d65de1

From there, you can see the results of the run and click through to the logs of a failing test. Even then, to see the Erlang Common Test HTML logs, you will have to scroll to the bottom of the page and download the test.outputs__outputs.zip file. Furthermore, it is currently a limitation of BuildBuddy that if a test is sharded, it is not clear in the UI which log file goes with which shard, as they all have the same name. I've raised this with them and they have said that they will fix it.

Thankfully, these outputs are distinguishable on the machine that ran bazelisk test. When you see a failure in CI, remember that running the same suite from your local machine using remote execution is an option. Additionally, if the test is flaky, one can easily run 10 or 20 copies (or even 100) in parallel using the same approach and a few more flags. For example,

bazel test //deps/rabbitmq_federation:exchange_SUITE --config=rbe-24 -t- --runs_per_test=10

The -t- flag tells Bazel to ignore cached test results.

The exchange_SUITE is actually an interesting example, as it is also currently a sharded test. When executed with the above flags, part of the output is:

//deps/rabbitmq_federation:exchange_SUITE                                PASSED in 144.6s
  Stats over 60 runs: max = 144.6s, min = 26.7s, avg = 91.7s, dev = 31.8s

This is correct, since each of the 6 shards is run 10 times, for a total of 60 runs (and 60 parallelizable jobs).
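The run count is simply shards times runs_per_test, which a one-liner can sanity-check:

```shell
# Sanity check: total test executions = shards * runs_per_test
shards=6
runs_per_test=10
echo $(( shards * runs_per_test ))
# prints 60
```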

Using Locally Modified Dependencies with Bazel

Sometimes you might want to test a coordinated change between something like osiris and rabbitmq-server. In that case, you can clone osiris next to rabbitmq-server, and add the additional --override_repository rules_erlang~3.8.5~erlang_package~osiris=$PWD/../osiris flag to bazel commands in rabbitmq-server. RBE still works, and local changes are honored. Unfortunately at this point the rules_erlang version needs to be included in the flag, so you will need to keep this command up to date as we upgrade rules_erlang. To get the exact repository name to override, you can ls bazel-rabbitmq-server/external/ and use the folder/link name as displayed there.

Hot Code Reloading

You can use hot code reloading to quickly iterate on code changes - no need to restart the broker/cluster to check if things work as expected.

  1. Start a local RabbitMQ with -c dbg:
# single node
bazel run -c dbg broker

# cluster
bazel run -c dbg start-cluster
  2. Uncomment the code_reload section in erlang_ls.config and set the hostname to your local name (as used by RabbitMQ nodes). For example:
code_reload:
  node: rabbit@mymachine

and use a text editor with LSP support (pretty much any editor). When you save a file, it will be reloaded by erlang_ls.

Starting with erlang_ls 0.50.0, you can configure multiple nodes for code reload, so you can develop against a cluster:

code_reload:
  node: [rabbit-1@mymachine, rabbit-2@mymachine, rabbit-3@mymachine]

and have your local changes reloaded on all nodes immediately.

Here's a demo of what that looks like (in this case using neovim and ToggleTerm but you can use it with other editors and ways of calling functions):

  1. Open a terminal with rabbitmq-diagnostics remote_shell
  2. Use (neo)vim autocmd to automatically run a command in the terminal on buffer save (since reloading on 3 nodes takes a moment, I use an artificial delay before running the function)
  3. When the function is modified to return a new value and the buffer gets saved, a function is called. In this case I specifically use rabbit_misc:append_rpc_all_nodes to show that all 3 nodes return the new value.

(Demo GIF: code_reload-cluster)