128 changes: 124 additions & 4 deletions benchmark/README.md
# BigQuery Benchmark
This directory contains benchmark scripts for the BigQuery client. They are intended primarily for project
maintainers to measure library performance.

## Usage
`python benchmark.py`

The BigQuery service caches requests, so the benchmark should be run
at least twice, disregarding the first result.

### Flags
Run `python benchmark.py -h` for detailed information on available flags.

`--reruns` overrides the number of times each query is rerun. The value must be a positive
integer; the default is 3.

`--projectid` can be used to run the benchmarks in a different project. If unset, the `GOOGLE_CLOUD_PROJECT`
environment variable is used.

`--queryfile` can be used to override the default file that contains the queries to benchmark.

`--table` can be used to specify a table to which benchmarking results should be streamed. The table is
given in BigQuery standard SQL notation without escapes, e.g. `projectid.datasetid.tableid`.
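To make the expected `--table` format concrete, here is a hypothetical validator. `split_table_id` is illustrative only and is not part of `benchmark.py` or the client library; it assumes the project ID contains no `.`, so legacy domain-scoped project IDs are rejected. Methods of the `google-cloud-bigquery` client such as `Client.get_table` also accept this string form directly.

```python
from typing import Tuple


def split_table_id(table_id: str) -> Tuple[str, str, str]:
    """Split an unescaped 'projectid.datasetid.tableid' string into its parts.

    Assumption: the project ID contains no '.', so legacy domain-scoped
    project IDs (e.g. 'example.com:project') are rejected by this sketch.
    """
    parts = table_id.split(".")
    # Exactly three non-empty components are required.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected projectid.datasetid.tableid, got {table_id!r}")
    project_id, dataset_id, table_name = parts
    return project_id, dataset_id, table_name
```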

`--create_table` can be used to have the benchmarking tool create the destination table before streaming.
When this flag is set, the name of the new table must also be given with `--table`.

`--tag` allows arbitrary `key:value` pairs to be set. This flag can be specified multiple times.
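The flag rules above can be summarized as a small `argparse` configuration. This is a hypothetical sketch, not the actual `benchmark.py` source: the helper names `positive_int`, `build_parser`, and `parse_flags` are illustrative, and the default query file is deliberately left unset because its name is not documented here.

```python
import argparse
import os
from typing import List, Optional


def positive_int(value: str) -> int:
    """argparse type that accepts only integers >= 1, as --reruns requires."""
    number = int(value)  # non-numeric input raises ValueError, which argparse reports
    if number < 1:
        raise argparse.ArgumentTypeError(f"must be a positive integer, got {value!r}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="BigQuery benchmark (illustrative sketch)")
    parser.add_argument("--reruns", type=positive_int, default=3,
                        help="times each query is rerun (default: 3)")
    parser.add_argument("--projectid", default=os.environ.get("GOOGLE_CLOUD_PROJECT"),
                        help="project to run in; falls back to GOOGLE_CLOUD_PROJECT")
    # The real default query file is not reproduced here, so this sketch leaves it unset.
    parser.add_argument("--queryfile", default=None)
    parser.add_argument("--table", help="destination table as projectid.datasetid.tableid")
    parser.add_argument("--create_table", action="store_true")
    parser.add_argument("--tag", action="append", default=None,
                        help="key:value pair; may be repeated")
    return parser


def parse_flags(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    args.tag = args.tag or []  # no --tag flags at all -> empty list
    if args.create_table and not args.table:
        parser.error("--create_table requires --table")  # exits with status 2
    return args
```

Validating `--create_table` after parsing, rather than inside `argparse`, keeps the cross-flag rule in one explicit place.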

### Example invocations

Setting all the flags except `--queryfile`:
```
python benchmark.py \
  --reruns 5 \
  --projectid test_project_id \
  --table logging_project_id.querybenchmarks.measurements \
  --create_table \
  --tag source:myhostname \
  --tag somekeywithnovalue \
  --tag experiment:special_environment_thing
```

Or, a more realistic invocation using shell substitutions:
```
python benchmark.py \
  --reruns 5 \
  --table $BENCHMARK_TABLE \
  --tag origin:$(hostname) \
  --tag branch:$(git branch --show-current) \
  --tag latestcommit:$(git log --pretty=format:'%H' -n 1)
```
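Each `--tag` value, such as `source:myhostname` or the bare `somekeywithnovalue` above, ends up as one entry in the repeated `tags` record (`key`, `value`) described in the schema section. The sketch below shows one plausible way to split them. `parse_tags` is a hypothetical helper, and both splitting on the first `:` and storing `None` for a missing value are assumptions, not documented behavior of `benchmark.py`.

```python
from typing import Dict, List, Optional


def parse_tags(raw_tags: List[str]) -> List[Dict[str, Optional[str]]]:
    """Convert repeated --tag strings into {"key": ..., "value": ...} records.

    Assumption: split on the first ':' only, so values may themselves contain ':'.
    Assumption: a tag with no ':' is stored with a value of None (NULL).
    """
    records = []
    for raw in raw_tags:
        key, sep, value = raw.partition(":")
        records.append({"key": key, "value": value if sep else None})
    return records
```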

## Stream Results To A BigQuery Table

When streaming benchmarking results to a BigQuery table, the table schema is as follows:
```
[
  {
    "name": "groupname",
    "type": "STRING"
  },
  {
    "name": "name",
    "type": "STRING"
  },
  {
    "name": "tags",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "key",
        "type": "STRING"
      },
      {
        "name": "value",
        "type": "STRING"
      }
    ]
  },
  {
    "name": "SQL",
    "type": "STRING"
  },
  {
    "name": "runs",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "errorstring",
        "type": "STRING"
      },
      {
        "name": "start_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "query_end_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "first_row_returned_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "all_rows_returned_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "total_rows",
        "type": "INTEGER"
      }
    ]
  },
  {
    "name": "event_time",
    "type": "TIMESTAMP"
  }
]
```
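As an illustration of what one streamed row might look like, the sketch below builds a JSON-serializable dict that follows this schema. `make_result_row` is a hypothetical helper, the per-run dict keys simply mirror the schema's field names, and emitting TIMESTAMP values as ISO 8601 strings is an assumption about serialization, not a description of how `benchmark.py` does it.

```python
from datetime import datetime, timezone


def make_result_row(groupname, name, sql, tags, runs, event_time=None):
    """Assemble one JSON-serializable row that follows the schema above.

    `tags` is a list of {"key", "value"} dicts; `runs` is a list of dicts keyed
    by the six per-run field names. Missing fields become None (NULL).
    """
    def ts(value):
        # TIMESTAMP columns are sent as ISO 8601 strings (assumption).
        return value.isoformat() if value is not None else None

    return {
        "groupname": groupname,
        "name": name,
        "tags": tags,
        "SQL": sql,
        "runs": [
            {
                "errorstring": run.get("errorstring"),
                "start_time": ts(run.get("start_time")),
                "query_end_time": ts(run.get("query_end_time")),
                "first_row_returned_time": ts(run.get("first_row_returned_time")),
                "all_rows_returned_time": ts(run.get("all_rows_returned_time")),
                "total_rows": run.get("total_rows"),
            }
            for run in runs
        ],
        "event_time": ts(event_time or datetime.now(timezone.utc)),
    }
```

A row built this way could then be streamed with the `google-cloud-bigquery` client, e.g. `errors = bigquery.Client().insert_rows_json("projectid.datasetid.tableid", [row])`, where an empty `errors` list indicates that every row was accepted.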

The table schema is the same as that of the [Go benchmark](https://github.com/googleapis/google-cloud-go/tree/main/bigquery/benchmarks),
so results from both languages can be streamed to the same table.

## BigQuery Benchmarks In Other Languages
* Go: https://github.com/googleapis/google-cloud-go/tree/main/bigquery/benchmarks
* Java: https://github.com/googleapis/java-bigquery/tree/main/benchmark