A service to expose Prometheus metrics of your CI’s validation of Pull Requests (PRs), using GitHub webhooks.
webhook +------------+ events +-----------+ | github.com +----------->+/ | +------------+ | | proxy +--------+ | +<------->+ pulley | | nginx | +--------+ +------------+ | | | prometheus +------------+/metrics | +------------+ +-----------+
The best way to have service level objectives (SLOs) in place is to measure the same way the other party is observing. Sometimes your CI is providing you metrics, but they are not sufficient or do not match the reality. Pulley uses GitHub’s webhooks to produce metrics as observed by your developers. That way, you can optimize your CI’s pipelines in a way it matters.
Pulley assumes that a PR is mergeable when a required status check is successful. Thus, it lets you configure which status checks to monitor, and only treat the PR as validated (mergeable) when that status check is green. Just exactly like how developers (or even other automation) observe and act on PRs.
The basic metrics Pulley exposes are (all times are histograms):
-
The time it takes for the CI to send the first
pending
status for a PR -
The time it takes for the CI to send the
success
/failure
/error
for the required status check, from the time PR has been open -
The time it takes for a PR to be merged since it got open
-
The build duration on the CI, per build
-
How many PRs have been open/closed
-
How many times branches have been rebased
-
The total number of status checks received
-
The total number of
success
/failure
/error
status checks received, without a precedingpending
status check
Pulley can track any repository on GitHub, as long as that repository is configured to send webhook events to it.
Pulley is a service that should run listening on a public IP (as an endpoint is needed to be accessible by GitHub’s servers). It is completely configurable via environment variables.
Pulley understands the following environment variables:
Environment Variable | Description |
---|---|
PULLEY_HOST |
The hostname to which Pulley will bind. Defaults to |
PULLEY_PORT |
The port number to which Pulley will bind. Defaults to |
PULLEY_WEBHOOK_PATH |
URL path on which Pulley receives webhooks. Defaults to an empty string,
meaning that webhook events are handled on the root path (that is,
|
PULLEY_WEBHOOK_TOKEN |
A base64 encoded string representing a secret token, used to validate the events coming from GitHub. Defaults to an empty string. More details on https://developer.github.com/webhooks/securing/. |
PULLEY_METRICS_PATH |
URL path on which Pulley exposes Prometheus metrics. Defaults to |
PULLEY_PR_TIMING_STRATEGY |
Which strategy Pulley should use to time the PRs. That is, how to detect when
PR building started and ended. Currently, the only available one is |
PULLEY_STRATEGY_AGGREGATE_REPO_REGEX_<int> |
Set of regular expressions defining contexts to monitor for matching
repository names. Defaults to regex |
PULLEY_STRATEGY_AGGREGATE_CONTEXT_REGEX_<int> |
See above. |
PULLEY_TRACK_BUILD_TIMES |
If true, Pulley will track and export build times for each build (that is,
different context it sees). Accepted range of values is |
A PR timing strategy is how Pulley determines when the CI started building the PR, and when CI finished. It deals with received status checks coming from GitHub.
At the moment, Pulley supports only a single PR timing strategy, aggregate
.
This strategy assumes that each repository has one aggregate job defined, that finishes when all other jobs are done and emits a corresponding status check. This job determines if the PR is to be merged. Another use of this strategy is for the case when there is a single required status check defined in a repo.
To provide versatility in configuration, while keeping it simple and configurable via the environment variables, Pulley resorts to using the regular expressions (regexes) and the following scheme for the environment variables when encoding that information:
PULLEY_STRATEGY_AGGREGATE_REPO_REGEX_<int> = $repo_name_regex PULLEY_STRATEGY_AGGREGATE_CONTEXT_REGEX_<int> = $status_check_name_regex
That is, we mimic a prioritized list of regexes to match between the repository name and the status check name.
The list of regexes is processed in order from the smallest number towards the
highest, looking first for the _REPO
variant. If there is a match on the
repository name, but not on the status, the search will NOT continue.
For example, the following configuration:
PULLEY_STRATEGY_AGGREGATE_REPO_REGEX_0=-deployment$ PULLEY_STRATEGY_AGGREGATE_CONTEXT_REGEX_0=^terraform-validate PULLEY_STRATEGY_AGGREGATE_REPO_REGEX_1=^knl/pulley$ PULLEY_STRATEGY_AGGREGATE_CONTEXT_REGEX_1=build PULLEY_STRATEGY_AGGREGATE_REPO_REGEX_100=.* PULLEY_STRATEGY_AGGREGATE_CONTEXT_REGEX_100=:all-jobs$
would check for status checks whose names begin with terraform-validate
for
all repositories whose names end with -deployment
. Then it would check for
build
status check on repositories whose names begin with knl/pulley
.
Finally, it would default (due to .*
regex on the repository name) to status
check whose name ends with :all-jobs
for all other repositories.
Several things are important to note here:
-
Repository names are always in the form:
$OWNER/$REPOSITORY
, that is, they represent the repository’s full name. -
The first entry that matches the repository name is considered only.
-
When there are multiple matching status check names for a repository, only the first one that shows up will be considered.
Set the environment variables and run:
./pulley
The best is to place Pulley behind a reverse proxy (for example, Nginx) that terminates HTTPS traffic.
To build the code, simply run:
make build
Similarly, the tests are executed via:
make test
Prior to committing the code, you could run
make
to properly format and lint the code
Releases are managed with goreleaser.
To create a new release, push a tag (for example, a version 0.1.0):
git tag -a v0.1.0 -m "First release" git push origin v0.1.0
To build a test release, without publishing, run:
make test-release