# BigQuery Concurrency Tester

This notebook runs a concurrency test on BigQuery. It executes a set of queries concurrently to simulate a real-world workload, then reads job metadata from the Information Schema to report slot usage, bytes processed, and estimated cost.

In [None]:
!pip install google-cloud-bigquery PyYAML numpy

## Parameters

The following cell defines the parameters for the test. You can change these values to match your environment.

In [None]:
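project_id = "your-project-id"  # Google Cloud project that runs the queries
location = "US"  # Job location; the analysis query reads from `region-us`, so keep the two in sync
num_threads = 5  # Repetitions of the query list; the test starts num_threads * len(queries) threads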
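The analysis query at the end of this notebook hard-codes the `region-us` qualifier, which matches `location = "US"`. If you change `location`, the qualifier has to change with it. A minimal sketch of deriving one from the other, assuming the qualifier is simply `region-` followed by the lowercased location name (the helper `region_qualifier` is hypothetical and not part of the notebook):

```python
def region_qualifier(location):
    """Return the assumed INFORMATION_SCHEMA region qualifier for a job location."""
    # Assumption: the qualifier is "region-" plus the lowercased location name.
    return f"region-{location.lower()}"


location = "US"
jobs_view = f"`{region_qualifier(location)}`.INFORMATION_SCHEMA.JOBS_BY_USER"
print(jobs_view)
```

You would then paste the printed view name into the `FROM` clause of the analysis query.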

## Queries

The following cells define the queries to be executed.

In [None]:
query1 = "SELECT COUNT(*) FROM `bigquery-public-data.usa_names.usa_1910_2013`;"

In [None]:
query2 = """SELECT name, SUM(number) as total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE gender = 'F'
GROUP BY name
ORDER BY total DESC
LIMIT 10;"""

In [None]:
query3 = """SELECT
    t1.name,
    t1.total,
    t2.year,
    t2.number AS number_in_year
FROM (
    SELECT name, SUM(number) as total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE gender = 'M'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
) AS t1
JOIN `bigquery-public-data.usa_names.usa_1910_2013` AS t2
ON t1.name = t2.name
WHERE t2.year > 2000
ORDER BY t2.year, t2.number DESC;"""

In [None]:
queries = [query1, query2, query3]

## Concurrency Test Functions

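The following cell contains the Python functions that run the concurrency test. `run_query` submits a single query and waits for it to finish; `run_concurrency_test` starts one thread per query for each repetition and waits for all of them. The cell is hidden by default.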

In [None]:
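from google.cloud import bigquery
import threading
import time

def run_query(query, client):
    """Run one query, wait for it to finish, and report the outcome."""
    # Disable the query cache so repeated runs of the same query do real work
    # instead of returning cached results.
    job_config = bigquery.QueryJobConfig(use_query_cache=False)
    try:
        start = time.perf_counter()
        query_job = client.query(query, job_config=job_config)
        query_job.result()  # Block until the job completes; the rows are discarded.
        elapsed = time.perf_counter() - start
        print(f"Query finished successfully in {elapsed:.2f}s. Job ID: {query_job.job_id}")
    except Exception as e:
        print(f"Query failed: {e}")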

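def run_concurrency_test(queries, num_threads, client):
    """Run every query num_threads times, each run in its own thread."""
    threads = []
    # One thread per (repetition, query) pair: num_threads * len(queries) in total.
    for _ in range(num_threads):
        for query in queries:
            thread = threading.Thread(target=run_query, args=(query, client))
            threads.append(thread)
            thread.start()

    # Wait for every query to finish before returning.
    for thread in threads:
        thread.join()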
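The functions above start an unbounded number of threads and only print their results. As a possible variation, the sketch below runs the same (repetition, query) pairs on a bounded thread pool and returns per-call latencies so they can be summarized. The names `timed_call`, `run_timed_test`, and `fake_query` are hypothetical, and `fake_query` only sleeps so that the example runs without BigQuery credentials.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def timed_call(task, arg):
    """Call task(arg) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    task(arg)
    return time.perf_counter() - start


def run_timed_test(task, queries, repetitions, max_workers):
    """Run task once per (repetition, query) pair on a bounded thread pool.

    Returns one latency in seconds per call, in submission order. The
    latency covers only the call itself, not time spent waiting in the
    pool's queue.
    """
    work = [query for _ in range(repetitions) for query in queries]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(timed_call, task, query) for query in work]
        # result() re-raises any exception raised inside the worker thread.
        return [future.result() for future in futures]


def fake_query(query):
    """Hypothetical stand-in for a BigQuery call; sleeps instead of querying."""
    time.sleep(0.01)


latencies = run_timed_test(fake_query, ["q1", "q2", "q3"], repetitions=5, max_workers=4)
print(f"calls: {len(latencies)}")
print(f"median latency: {median(latencies):.3f}s, max: {max(latencies):.3f}s")
```

To time real queries, you could pass something like `lambda q: client.query(q).result()` as `task`, with the notebook's `queries` and `num_threads` as the other arguments. `max_workers` then caps how many queries are in flight at once.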

## Run Concurrency Test

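This cell creates the BigQuery client and runs the concurrency test. Each thread prints its job ID on success or the error message on failure.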

In [None]:
client = bigquery.Client(project=project_id, location=location)
run_concurrency_test(queries, num_threads, client)

## Analyze Results

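This cell uses the `%%bigquery` cell magic to query the Information Schema and analyze the results of the concurrency test. It sums the slot milliseconds and bytes processed, then estimates what the queries would cost under on-demand pricing and under the Standard and Enterprise editions. If the magic is not already registered in your environment, run `%load_ext google.cloud.bigquery` in a separate cell first.

Keep three limits in mind when reading the output. The query covers every query job you created in the last hour, not only the jobs from this test. It reads from `region-us`, so it only matches jobs that ran in the `US` location. The costs are estimates: on-demand charges are based on billed bytes (`total_bytes_billed`), which can differ from bytes processed, and the rates are US list prices hard-coded in the query.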

In [None]:
%%bigquery

WITH
  jobs AS (
  SELECT
    job_id,
    total_slot_ms,
    total_bytes_processed
  FROM
    `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
  WHERE
    creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
    AND job_type = 'QUERY'
    AND statement_type != 'SCRIPT')
SELECT
  SUM(total_slot_ms) AS total_slot_ms,
  SUM(total_bytes_processed) AS total_bytes_processed,
  -- On-demand pricing (US multi-region): $6.25 per TiB (1 TiB = 1024^4 bytes)
  (SUM(total_bytes_processed) / 1024 / 1024 / 1024 / 1024) * 6.25 AS on_demand_cost,
  -- Standard edition pricing (US multi-region): $0.04 per slot hour
  (SUM(total_slot_ms) / 1000 / 60 / 60) * 0.04 AS standard_edition_cost,
  -- Enterprise edition pricing (US multi-region): $0.06 per slot hour
  (SUM(total_slot_ms) / 1000 / 60 / 60) * 0.06 AS enterprise_edition_cost
FROM
  jobs;
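The unit conversions in the query can be checked in plain Python. The sketch below repeats the same arithmetic with the rates copied from the query; the function `estimate_costs` is hypothetical, and the inputs are chosen as exactly one slot-hour and one TiB so that each estimated cost should equal its rate.

```python
# Rates copied from the query above (US multi-region list prices).
ON_DEMAND_USD_PER_TIB = 6.25
STANDARD_USD_PER_SLOT_HOUR = 0.04
ENTERPRISE_USD_PER_SLOT_HOUR = 0.06

BYTES_PER_TIB = 1024 ** 4
MS_PER_HOUR = 1000 * 60 * 60


def estimate_costs(total_slot_ms, total_bytes_processed):
    """Convert slot milliseconds and bytes into estimated USD costs."""
    slot_hours = total_slot_ms / MS_PER_HOUR
    tib_processed = total_bytes_processed / BYTES_PER_TIB
    return {
        "on_demand_cost": tib_processed * ON_DEMAND_USD_PER_TIB,
        "standard_edition_cost": slot_hours * STANDARD_USD_PER_SLOT_HOUR,
        "enterprise_edition_cost": slot_hours * ENTERPRISE_USD_PER_SLOT_HOUR,
    }


# One slot-hour of compute and one TiB scanned.
costs = estimate_costs(total_slot_ms=MS_PER_HOUR, total_bytes_processed=BYTES_PER_TIB)
for name, usd in costs.items():
    print(f"{name}: ${usd:.2f}")
```

Feeding the `total_slot_ms` and `total_bytes_processed` values returned by the query into `estimate_costs` should reproduce the three cost columns.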