description |
---|
The TPC-H Benchmark sample data |
As described in the TPC Benchmark™ H (TPC-H) specification:
“TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.”
TPC-H comes with various data set sizes to test different scaling factors. Spice.ai TPC-H dataset consists of the base scale factor (several million elements) and is available under tpch
schema . The columns and their schema available for each table can be viewed with the describe
command:
DESCRIBE tpch.customer;
DESCRIBE tpch.lineitem;
DESCRIBE tpch.part;
DESCRIBE tpch.partsupp;
DESCRIBE tpch.orders;
DESCRIBE tpch.nation;
DESCRIBE tpch.region
DESCRIBE tpch.supplier;
The components of TPC-H consist of eight separate and individual tables (the Base Tables). The relationships between columns in these tables are illustrated in the following ER diagram:
The TPC-H Schema. Source: TPC Benchmark H Standard Specification
TPC-H comes with pre-defined queries to ask different business questions. Query below reports the amount of business that was billed, shipped, and returned. More information about TPC-H and all the queries involved can be found in the official TPC Benchmark H Standard Specification.
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
tpch.lineitem
where
l_shipdate <= date '1998-12-01' - interval '90' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in ascending order of RETURNFLAG and LINESTATUS.