The UPDATE data contains some unnecessary data that we should drop. See the details in the Specification (v3.2.0) PDF, page 7 of 141:
remove tables that are no longer part of the data maintenance refresh in TPC-DS v2.0 A-1
(s_zip_to_cmt) A-3 (s_customer) A-7 (s_item) A-10 (s_store) A-11 (s_call_center) A-12
(s_web_site) A-13 (s_warehouse) A-14 (s_web_page) A-15 (s_promotion) A-20 (s_catalog_page)
(FogBugz 2178)
The Specification does not explicitly provide the DELETE query strings, so I came up with the following SQL statements.
See Section 5.3.11
-- DF_CS
-- for catalog_returns (run before the catalog_sales delete, because the subquery reads catalog_sales)
delete from catalog_returns where cr_order_number in (
select cs_order_number from catalog_sales where cs_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2)
);
-- for catalog_sales
delete from catalog_sales where cs_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
-- DF_SS
-- for store_returns (run before the store_sales delete, because the subquery reads store_sales)
delete from store_returns where sr_ticket_number in (
select ss_ticket_number from store_sales where ss_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2)
);
-- for store_sales
delete from store_sales where ss_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
-- DF_WS
-- for web_returns (run before the web_sales delete, because the subquery reads web_sales)
delete from web_returns where wr_order_number in (
select ws_order_number from web_sales where ws_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2)
);
-- for web_sales
delete from web_sales where ws_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
-- DF_I
-- for inventory
delete from inventory where inv_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
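As a rough illustration of how the DATE1 and DATE2 placeholders might be filled in, here is a minimal Python sketch. The helper name `build_delete_statements`, the lookup table, and the `demo` function are hypothetical and not part of the TPC-DS kit; only the table and column names come from the statements above. The sketch emits the returns delete before the sales delete, since the returns subquery needs the sales rows to still exist. The demo uses an in-memory SQLite database with dates stored as text purely to show the ordering; Spark behavior is assumed to be analogous but is not exercised here.

```python
import sqlite3

# Hypothetical lookup: (sales table, sales date column, sales key,
#                       returns table, returns key) per delete function.
SALES_RETURNS = {
    "DF_CS": ("catalog_sales", "cs_sold_date_sk", "cs_order_number",
              "catalog_returns", "cr_order_number"),
    "DF_SS": ("store_sales", "ss_sold_date_sk", "ss_ticket_number",
              "store_returns", "sr_ticket_number"),
    "DF_WS": ("web_sales", "ws_sold_date_sk", "ws_order_number",
              "web_returns", "wr_order_number"),
}


def build_delete_statements(func, date1, date2):
    """Return [returns delete, sales delete] for one delete function."""
    sales, date_col, key, returns, ret_key = SALES_RETURNS[func]
    dates = ("select d_date_sk from date_dim "
             f"where d_date between '{date1}' and '{date2}'")
    return [
        # Returns first: this subquery must still see the sales rows.
        f"delete from {returns} where {ret_key} in "
        f"(select {key} from {sales} where {date_col} in ({dates}))",
        f"delete from {sales} where {date_col} in ({dates})",
    ]


def demo():
    """Run DF_SS against a tiny SQLite database; return remaining row counts."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table date_dim (d_date_sk int, d_date text);
        create table store_sales (ss_sold_date_sk int, ss_ticket_number int);
        create table store_returns (sr_ticket_number int);
        insert into date_dim values (1, '2000-01-01'), (2, '2000-02-01');
        insert into store_sales values (1, 100), (2, 200);
        insert into store_returns values (100), (200);
    """)
    for stmt in build_delete_statements("DF_SS", "2000-01-01", "2000-01-15"):
        con.execute(stmt)
    sales = con.execute("select count(*) from store_sales").fetchone()[0]
    returns = con.execute("select count(*) from store_returns").fetchone()[0]
    return sales, returns
```

In the demo, only the January sale and its matching return fall inside the date range, so one row should remain in each table afterwards.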
Data Maintenance is explained in detail in Section 5 of the TPC-DS Specification. In short, it takes the following 3 steps:
[...] tests folder in DSGen-software-code-3.2.0rc1, to update table data.
[...] nds_power.py [...] to read SQL stream files) to read these new SQL files to update the existing table data.
$TPCDS_HOME/tools/tpcds_source.sql
Step 2 is called "Data Maintenance". Its SQL files are "lf_*.sql" and "dm_*.sql". Note that we need to make them Spark-compatible.
Steps 1, 2 and 3 together are called a Refresh Run.
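To make the Data Maintenance step more concrete, the following is a minimal sketch of a loader that collects the "lf_*.sql" and "dm_*.sql" files from a folder and hands each statement to an executor. The function names `load_statements` and `run_maintenance`, and the idea of passing in an `execute` callback (for example a Spark session's `sql` method), are assumptions for illustration and not how nds_power.py actually works. The split on `;` is deliberately naive and assumes no semicolons appear inside string literals or comments.

```python
from pathlib import Path

# File name patterns for the Data Maintenance SQL, in the order they are run.
PATTERNS = ("lf_*.sql", "dm_*.sql")


def load_statements(sql_dir):
    """Return {file name: [statements]} for every matching SQL file."""
    result = {}
    for pattern in PATTERNS:
        for path in sorted(Path(sql_dir).glob(pattern)):
            text = path.read_text()
            # Naive split: assumes ';' only ever terminates a statement.
            result[path.name] = [s.strip() for s in text.split(";") if s.strip()]
    return result


def run_maintenance(sql_dir, execute):
    """Pass every statement to `execute`; return how many were run."""
    count = 0
    for statements in load_statements(sql_dir).values():
        for stmt in statements:
            execute(stmt)
            count += 1
    return count
```

With a Spark session named `spark`, a call might look like `run_maintenance("/path/to/sql", spark.sql)`, assuming the files have already been made Spark-compatible.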