85 changes: 57 additions & 28 deletions docs/advanced-topics/running_an_experiment.md
---
layout: default
title: Running an experiment
parent: Advanced topics
nav_order: 3
permalink: /advanced-topics/running-an-experiment/
---

# Running an experiment

**NOTE**: Most users of FuzzBench should simply [add a fuzzer]({{ site.baseurl }}/getting-started/adding-a-new-fuzzer/)
and use the FuzzBench service; this document isn't needed for that. This
document explains how to run an [experiment]({{ site.baseurl }}/reference/glossary/#Experiment)
on your own. We don't recommend this for most users, although validating
results from the FuzzBench service is a good reason to do it.

This document assumes a certain level of knowledge about Google Cloud and
FuzzBench. Because running an experiment requires Google Cloud, it also assumes
you have already set up a Google Cloud Project. If you haven't, please follow the
[guide on setting up a Google Cloud Project]({{ site.baseurl }}/advanced-topics/setting-up-a-google-cloud-project/)
first.

- TOC
{:toc}

This page walks you through how to use `run_experiment.py`.

Experiments are started by the `run_experiment.py` script. The script creates a
dispatcher instance on Google Compute Engine, which runs the experiment,
including:

1. Building the desired fuzzer-benchmark combinations.
1. Starting instances to run fuzzing trials with the fuzzer-benchmark
   builds, and stopping them when they are done.
1. Measuring the coverage from these trials.
1. Generating reports based on these measurements.

The rest of this document assumes all commands are run from the root of
FuzzBench.

# run_experiment.py

## Experiment configuration file

You need to create an experiment configuration YAML file.
This file contains the configuration parameters for experiments that do not
change very often.
Below is an example configuration file with explanations of each required
parameter.

```yaml
# The number of trials of a fuzzer-benchmark pair to run.
trials: 5

# The amount of time in seconds that each trial is run for.
# 1 day = 24 * 60 * 60 = 86400
max_total_time: 86400

# The name of your Google Cloud project.
cloud_project: $PROJECT_NAME

# The Google Compute Engine zone to run the experiment in. This must be a zone
# within $PROJECT_REGION, e.g. "us-central1-a" for zone "a" in "us-central1".
cloud_compute_zone: $PROJECT_REGION-a

# The Google Cloud Storage bucket that will store most of the experiment data.
cloud_experiment_bucket: gs://$DATA_BUCKET_NAME

# The bucket where HTML reports and summary data will be stored.
cloud_web_bucket: gs://$REPORT_BUCKET_NAME

# The connection to use to connect to the Google Cloud SQL instance.
cloud_sql_instance_connection_name: "$PROJECT_NAME:$PROJECT_REGION:$POSTGRES_INSTANCE=tcp:5432"
```

**NOTE:** The values `$PROJECT_NAME`, `$PROJECT_REGION`, `$DATA_BUCKET_NAME`,
`$REPORT_BUCKET_NAME`, and `$POSTGRES_INSTANCE` refer to the values of those
environment variables that were set in the [guide on setting up a Google Cloud
Project]({{ site.baseurl }}/advanced-topics/setting-up-a-google-cloud-project/).
Write the actual values into the file: for example, if `$PROJECT_NAME` is
`my-fuzzbench-project`, use `my-fuzzbench-project` and not `$PROJECT_NAME`.
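If those environment variables are set in your current shell, a small helper can write the file with the values already substituted. This is a minimal sketch: `write_experiment_config` is a hypothetical helper, not part of FuzzBench, and the `-a` zone suffix assumes you followed the setup guide's recommendation of zone "a".

```bash
# Hypothetical helper (not part of FuzzBench): writes an experiment config to
# the path given as $1. The unquoted EOF delimiter makes the shell expand the
# ${...} references, so the file contains the actual values.
write_experiment_config() {
  cat > "$1" <<EOF
# The number of trials of a fuzzer-benchmark pair to run.
trials: 5

# The amount of time in seconds that each trial is run for (1 day).
max_total_time: 86400

cloud_project: ${PROJECT_NAME}
# Assumes zone "a" in your region, as recommended in the setup guide.
cloud_compute_zone: ${PROJECT_REGION}-a
cloud_experiment_bucket: gs://${DATA_BUCKET_NAME}
cloud_web_bucket: gs://${REPORT_BUCKET_NAME}
cloud_sql_instance_connection_name: "${PROJECT_NAME}:${PROJECT_REGION}:${POSTGRES_INSTANCE}=tcp:5432"
EOF
}
```

For example, run `write_experiment_config experiment-config.yaml` and inspect the resulting file before starting an experiment.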

## Setting the database password

Find the password for the PostgreSQL instance you are using in your
experiment config.
Set it using the environment variable `POSTGRES_PASSWORD` like so:

```bash
export POSTGRES_PASSWORD="my-super-secret-password"
```
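Typing the password directly into an `export` command usually leaves it in your shell history. A minimal alternative, sketched here as a hypothetical helper (`set_postgres_password` is not part of FuzzBench), reads the password without echoing it and then exports it:

```bash
# Hypothetical helper: prompt for the database password without echoing it,
# then export it so that commands started from this shell can see it.
set_postgres_password() {
  # -r: keep backslashes literal; -s: don't echo typed characters.
  read -r -s -p "PostgreSQL password: " POSTGRES_PASSWORD
  echo  # move to a new line after the hidden input
  export POSTGRES_PASSWORD
}
```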

## Benchmarks

Pick the benchmarks you want to use from the `benchmarks/` directory.

For example: `freetype2-2017` and `bloaty_fuzz_target`.

## Fuzzers

Pick the fuzzers you want to use from the `fuzzers/` directory.
For example: `libfuzzer` and `afl`.
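Since benchmarks and fuzzers are picked from the `benchmarks/` and `fuzzers/` directories, a typo in a name can be caught before starting an experiment. A minimal sketch, assuming each name corresponds to a subdirectory and that you run it from the root of FuzzBench; `check_names` is a hypothetical helper:

```bash
# Hypothetical helper: report any names that are not subdirectories of DIR.
# Usage: check_names DIR NAME...
check_names() {
  local dir="$1"
  shift
  local missing=0
  local name
  for name in "$@"; do
    if [ ! -d "${dir}/${name}" ]; then
      echo "error: '${name}' not found in ${dir}/" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_names benchmarks freetype2-2017 bloaty_fuzz_target && check_names fuzzers afl libfuzzer`.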

## Executing run_experiment.py

Now that everything is ready, execute `run_experiment.py`:

```bash
PYTHONPATH=. python3 experiment/run_experiment.py \
--experiment-config experiment-config.yaml \
--benchmarks freetype2-2017 bloaty_fuzz_target \
--experiment-name $EXPERIMENT_NAME \
--fuzzers afl libfuzzer
```

where `$EXPERIMENT_NAME` is the name you want to give the experiment.
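The steps above expect `POSTGRES_PASSWORD` to be set, and the command uses `$EXPERIMENT_NAME`. A minimal sketch of a hypothetical `require_env` helper that checks both before you launch anything:

```bash
# Hypothetical helper: fail if any of the named environment variables is unset
# or empty. ${!var} is bash indirect expansion (the value of the variable whose
# name is stored in $var).
require_env() {
  local var missing=0
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then
      echo "error: \$${var} is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `require_env POSTGRES_PASSWORD EXPERIMENT_NAME && PYTHONPATH=. python3 experiment/run_experiment.py ...` with the remaining arguments as shown above.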

## Viewing reports

You should eventually be able to see reports from your experiment, which are
updated periodically throughout the experiment. However, you may have to wait a
while before they first appear, since a lot must happen before there is data to
generate a report. Once they are available, you can view them at:
`https://storage.googleapis.com/$REPORT_BUCKET_NAME/$EXPERIMENT_NAME/index.html`
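A minimal sketch with two hypothetical helpers, both assuming `REPORT_BUCKET_NAME` and `EXPERIMENT_NAME` are set: one prints the report URL, and the other uses `gsutil -q stat`, which exits with status 0 only if the object exists, to check whether the first report has been generated.

```bash
# Hypothetical helpers; both assume REPORT_BUCKET_NAME and EXPERIMENT_NAME are set.
report_url() {
  echo "https://storage.googleapis.com/${REPORT_BUCKET_NAME}/${EXPERIMENT_NAME}/index.html"
}

# Succeeds (exit status 0) once the report's index.html exists in the bucket.
report_ready() {
  gsutil -q stat "gs://${REPORT_BUCKET_NAME}/${EXPERIMENT_NAME}/index.html"
}
```

For example: `report_ready && echo "Report available at $(report_url)"`.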

# Advanced usage

## Fuzzer configuration files
192 changes: 192 additions & 0 deletions docs/advanced-topics/setting_up_a_google_cloud_project.md
---
layout: default
title: Setting up a Google Cloud Project
parent: Advanced topics
nav_order: 2
permalink: /advanced-topics/setting-up-a-google-cloud-project/
---

# Setting up a Google Cloud Project

**NOTE**: Most users of FuzzBench should simply [add a fuzzer]({{ site.baseurl }}/getting-started/adding-a-new-fuzzer/)
and use the FuzzBench service; this document isn't needed for that. This
document explains how to set up a Google Cloud project for running an
[experiment]({{ site.baseurl }}/reference/glossary/#Experiment) for the first
time. We don't recommend running experiments on your own for most users,
although validating results from the FuzzBench service is a good reason to do
so.

Currently, FuzzBench requires Google Cloud to run experiments (though this may
change, see
[FAQ]({{ site.baseurl }}/faq/#how-can-i-reproduce-the-results-or-run-fuzzbench-myself)).

The rest of this document will assume all commands are run from the root of
FuzzBench.

## Create the Project

* [Create a new Google Cloud Project](https://console.cloud.google.com/projectcreate).

* Enable billing when prompted on the Google Cloud website.

* Set `$PROJECT_NAME` in the environment:

```bash
export PROJECT_NAME=<your-project-name>
```

For the rest of this document, `$PROJECT_NAME` refers to the project you
created. Use its project ID, which can differ from its display name, since
`gcloud` identifies projects by ID.

* [Install Google Cloud SDK](https://cloud.google.com/sdk/docs/install).

* Set your default project using gcloud:

```bash
gcloud config set project $PROJECT_NAME
```
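To confirm that `gcloud` now points at the right project, a hypothetical helper (not part of FuzzBench) can compare the output of `gcloud config get-value project` with `$PROJECT_NAME`:

```bash
# Hypothetical helper: fail unless gcloud's default project is $PROJECT_NAME.
check_gcloud_project() {
  local current
  current="$(gcloud config get-value project 2>/dev/null)"
  if [ "$current" != "$PROJECT_NAME" ]; then
    echo "error: gcloud project is '${current}', expected '${PROJECT_NAME}'" >&2
    return 1
  fi
}
```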

## Set up the database

* [Enable the Compute Engine API](https://console.cloud.google.com/apis/library/compute.googleapis.com?q=compute%20engine)

* Create a PostgreSQL instance (we use PostgreSQL 11) using
[Google Cloud SQL](https://console.cloud.google.com/sql/create-instance-postgres).
This will take a few minutes.
We recommend using "us-central1" as the region and "a" as the zone, since
certain links in this document assume "us-central1".
Use the same region later when running experiments.

* For the rest of this document, we will use `$PROJECT_REGION`,
`$POSTGRES_INSTANCE`, and `$POSTGRES_PASSWORD` to refer to the region of the
PostgreSQL instance you created, its name, and its password. Set them in your
environment:

```bash
export PROJECT_REGION=<your-postgres-region>
export POSTGRES_INSTANCE=<your-postgres-instance-name>
export POSTGRES_PASSWORD=<your-postgres-password>
```

* [Download and install cloud_sql_proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy)

```bash
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
# Make the downloaded binary executable.
chmod +x cloud_sql_proxy
```

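* Connect to your PostgreSQL instance using `cloud_sql_proxy`:

```bash
./cloud_sql_proxy -instances=$PROJECT_NAME:$PROJECT_REGION:$POSTGRES_INSTANCE=tcp:5432
```

Keep this process running (for example, in a separate terminal) while you
complete the steps below.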

* (optional, but recommended) Connect to your instance to ensure you
have all of the details right:

```bash
psql "host=127.0.0.1 sslmode=disable user=postgres"
```

Use `$POSTGRES_PASSWORD` when prompted.

* Initialize the PostgreSQL database:

```bash
PYTHONPATH=. alembic upgrade head
```

If this command fails, double-check that `POSTGRES_PASSWORD` is set correctly
and that `cloud_sql_proxy` is still running.
Once the command succeeds, you can kill the `cloud_sql_proxy` process.
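If you prefer to use a single terminal, a sketch of a hypothetical `with_sql_proxy` helper starts the proxy in the background, runs a command, and then stops the proxy. It assumes `cloud_sql_proxy` is in the current directory and that five seconds is enough for it to start listening; adjust as needed.

```bash
# Hypothetical helper: run "$@" while cloud_sql_proxy runs in the background.
with_sql_proxy() {
  # Send proxy output to a log file so it doesn't mix with the command's output.
  ./cloud_sql_proxy \
    -instances="${PROJECT_NAME}:${PROJECT_REGION}:${POSTGRES_INSTANCE}=tcp:5432" \
    > cloud_sql_proxy.log 2>&1 &
  local proxy_pid=$!
  sleep 5  # crude: give the proxy time to start listening on port 5432
  "$@"
  local status=$?
  kill "$proxy_pid"
  return "$status"
}
```

For example: `with_sql_proxy env PYTHONPATH=. alembic upgrade head`. The `env` prefix is needed because a `VAR=value` word passed as an argument is not treated as a variable assignment.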

## Google Cloud Storage buckets

* Set up Google Cloud Storage Buckets by running the commands below:

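```bash
# Pick any globally unique names for the two buckets.
export DATA_BUCKET_NAME=<your-data-bucket-name>
export REPORT_BUCKET_NAME=<your-report-bucket-name>

# Bucket for storing experiment artifacts such as corpora, coverage binaries,
# crashes, etc.
gsutil mb gs://$DATA_BUCKET_NAME

# Bucket for storing HTML reports.
gsutil mb gs://$REPORT_BUCKET_NAME
```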

You can pick any (globally unique) names you'd like for `$DATA_BUCKET_NAME` and
`$REPORT_BUCKET_NAME`.

* Make the report bucket public so it can be viewed from your browser:

```bash
gsutil iam ch allUsers:objectViewer gs://$REPORT_BUCKET_NAME
```

## Dispatcher image and container registry setup

* Build the dispatcher image:

```bash
docker build -f docker/dispatcher-image/Dockerfile \
-t gcr.io/$PROJECT_NAME/dispatcher-image docker/dispatcher-image/
```

FuzzBench uses an instance running this image to manage most of the experiment.

* [Enable Google Container Registry API](https://console.cloud.google.com/apis/api/containerregistry.googleapis.com/overview)
to use the container registry.

* Push `dispatcher-image` to the docker registry:

```bash
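# One-time setup: let docker authenticate to gcr.io with your gcloud credentials.
gcloud auth configure-docker
docker push gcr.io/$PROJECT_NAME/dispatcher-image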
```

* [Switch the registry's visibility to public](https://console.cloud.google.com/gcr/settings).

## Enable required APIs

* [Enable the IAM API](https://console.cloud.google.com/apis/api/iam.googleapis.com/landing)
so that FuzzBench can authenticate to Google Cloud APIs and services.

* [Enable the error reporting API](https://console.cloud.google.com/apis/library/clouderrorreporting.googleapis.com)
so that FuzzBench can report errors to the
[Google Cloud error reporting dashboard](https://console.cloud.google.com/errors).

* [Enable Cloud Build API](https://console.cloud.google.com/apis/library/cloudbuild.googleapis.com)
so that FuzzBench can build docker images using Google Cloud Build, a platform
optimized for doing so.

* [Enable Cloud SQL Admin API](https://console.cloud.google.com/apis/library/sqladmin.googleapis.com)
so that FuzzBench can connect to the database.
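Alternatively, you can enable these APIs, along with the Compute Engine and Container Registry APIs from earlier in this guide, from the command line with `gcloud services enable`. A minimal sketch, with service names taken from the console links in this guide; `enable_fuzzbench_apis` is a hypothetical helper:

```bash
# Hypothetical helper: enable every API this guide asks for in one command.
# Service names come from the console URLs linked in this guide.
enable_fuzzbench_apis() {
  gcloud services enable \
    compute.googleapis.com \
    containerregistry.googleapis.com \
    iam.googleapis.com \
    clouderrorreporting.googleapis.com \
    cloudbuild.googleapis.com \
    sqladmin.googleapis.com
}
```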

## Configure networking

* Go to the networking page for the network you want to run your experiment in.
[This](https://console.cloud.google.com/networking/subnetworks/details/us-central1/default)
is the networking page for the default network in "us-central1". It is best to
use `$PROJECT_REGION` for this.

* Click the edit icon. Turn "Private Google access" to "On". Press "Save".

* This allows the trial runner instances to use Google Cloud APIs since they do
not have external IP addresses.
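The same setting can also be changed from the command line. A minimal sketch, assuming you use the default network, whose subnet in each region is named `default`; `enable_private_google_access` is a hypothetical helper:

```bash
# Hypothetical helper: turn on Private Google Access for the default subnet
# in $PROJECT_REGION.
enable_private_google_access() {
  gcloud compute networks subnets update default \
    --region="${PROJECT_REGION}" \
    --enable-private-ip-google-access
}
```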

## Request CPU quota increase

* FuzzBench uses a 96-core Google Compute Engine instance for measuring trials
and a single-core instance for each trial in your experiment.

* Go to the quotas page for the region you will use for experiments.
[This](https://console.cloud.google.com/iam-admin/quotas?location=us-central1)
is the quotas page for the "us-central1" region.

* Select the "Compute Engine API" "CPUs" quota, fill out the contact details, and
request a quota increase. We recommend requesting a quota limit of "1000", as it
will probably be approved and is large enough to run experiments in a
reasonable amount of time.

* Wait until you receive an email confirming the quota increase.
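To get a feel for how many CPUs an experiment uses, count one core per trial plus the 96-core measurer instance. The sketch below assumes all trials run at the same time and ignores the dispatcher instance and image builds, so treat it only as a rough estimate of peak usage; `estimate_cpus` is a hypothetical helper.

```bash
# Hypothetical helper: rough peak CPU estimate for an experiment.
# Usage: estimate_cpus TRIALS NUM_FUZZERS NUM_BENCHMARKS
estimate_cpus() {
  local trials="$1" fuzzers="$2" benchmarks="$3"
  # One single-core instance per trial, plus the 96-core measurer instance.
  echo $(( trials * fuzzers * benchmarks + 96 ))
}
```

For example, 5 trials of 2 fuzzers on 2 benchmarks gives `estimate_cpus 5 2 2`, which prints `116`.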

## Run an experiment

* Follow the [guide on running an experiment]({{ site.baseurl }}/advanced-topics/running-an-experiment/)
2 changes: 1 addition & 1 deletion docs/advanced-topics/statistical_analysis.md
---
layout: default
title: Statistical Analysis
parent: Advanced topics
nav_order: 1
permalink: /getting-started/statistical-analysis/
---

16 changes: 16 additions & 0 deletions docs/reference/glossary.md
or a custom one where you explicitly define the steps to check out code and build
the fuzz target
([example integration](https://github.com/google/fuzzbench/blob/master/benchmarks/vorbis-2017-12-11/build.sh)).

### Trial

A single fuzzing run of one fuzzer on a particular benchmark. For example, we might compare
AFL and honggfuzz by running 20 trials of each fuzzer on the libxml2-v2.9.2
benchmark.

### Experiment

A group of [trials](#trial) that are run together to compare fuzzer performance.
This usually includes trials from multiple benchmarks and multiple fuzzers. For
example, to compare libFuzzer, AFL and honggfuzz, we might run an experiment
where each of them fuzzes every benchmark. Experiments use the same number of
trials for each fuzzer-benchmark pair and a specific amount of time for each
trial (typically, 24 hours) so that results are comparable. FuzzBench generates
reports for experiments while they are running and after they complete.

[fuzzing]: https://en.wikipedia.org/wiki/Fuzzing
[fuzz target]: https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzz-target