Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UNION ALL is not supported #122

Closed
dongyoungy opened this issue Apr 2, 2018 · 2 comments
Closed

UNION ALL is not supported #122

dongyoungy opened this issue Apr 2, 2018 · 2 comments
Assignees
Labels

Comments

@dongyoungy
Copy link
Contributor

Currently, Verdict parses queries containing UNION ALL without errors. However, the parse tree at the end only contains the last SELECT statement of the query.

For example, parsing the following SQL query:

select  
 'web' as channel
 ,web.item
 ,web.return_ratio
 ,web.return_rank
 ,web.currency_rank
 from (
 	select 
 	 item
 	,return_ratio
 	,currency_ratio
 	,rank() over (order by return_ratio) as return_rank
 	,rank() over (order by currency_ratio) as currency_rank
 	from
 	(	select ws.ws_item_sk as item
 		,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/
 		cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as return_ratio
 		,(cast(sum(coalesce(wr.wr_return_amt,0)) as decimal(15,4))/
 		cast(sum(coalesce(ws.ws_net_paid,0)) as decimal(15,4) )) as currency_ratio
 		from 
 		 web_sales ws left outer join web_returns wr 
 			on (ws.ws_order_number = wr.wr_order_number and 
 			ws.ws_item_sk = wr.wr_item_sk)
                 ,date_dim
 		where 
 			wr.wr_return_amt > 10000 
 			and ws.ws_net_profit > 1
                         and ws.ws_net_paid > 0
                         and ws.ws_quantity > 0
                         and ws.ws_sold_date_sk = date_dim.d_date_sk
                         and d_year = 2000
                         and d_moy = 12
 		group by ws.ws_item_sk
 	) in_web
 ) web
 where 
 (
 web.return_rank <= 10
 or
 web.currency_rank <= 10
 )
 union all
 select 
 'catalog' as channel
 ,catalog.item
 ,catalog.return_ratio
 ,catalog.return_rank
 ,catalog.currency_rank
 from (
 	select 
 	 item
 	,return_ratio
 	,currency_ratio
 	,rank() over (order by return_ratio) as return_rank
 	,rank() over (order by currency_ratio) as currency_rank
 	from
 	(	select 
 		cs.cs_item_sk as item
 		,(cast(sum(coalesce(cr.cr_return_quantity,0)) as decimal(15,4))/
 		cast(sum(coalesce(cs.cs_quantity,0)) as decimal(15,4) )) as return_ratio
 		,(cast(sum(coalesce(cr.cr_return_amount,0)) as decimal(15,4))/
 		cast(sum(coalesce(cs.cs_net_paid,0)) as decimal(15,4) )) as currency_ratio
 		from 
 		catalog_sales cs left outer join catalog_returns cr
 			on (cs.cs_order_number = cr.cr_order_number and 
 			cs.cs_item_sk = cr.cr_item_sk)
                ,date_dim
 		where 
 			cr.cr_return_amount > 10000 
 			and cs.cs_net_profit > 1
                         and cs.cs_net_paid > 0
                         and cs.cs_quantity > 0
                         and cs_sold_date_sk = d_date_sk
                         and d_year = 2000
                         and d_moy = 12
                 group by cs.cs_item_sk
 	) in_cat
 ) catalog
 where 
 (
 catalog.return_rank <= 10
 or
 catalog.currency_rank <=10
 )
 union all
 select 
 'store' as channel
 ,store.item
 ,store.return_ratio
 ,store.return_rank
 ,store.currency_rank
 from (
 	select 
 	 item
 	,return_ratio
 	,currency_ratio
 	,rank() over (order by return_ratio) as return_rank
 	,rank() over (order by currency_ratio) as currency_rank
 	from
 	(	select sts.ss_item_sk as item
 		,(cast(sum(coalesce(sr.sr_return_quantity,0)) as decimal(15,4))/cast(sum(coalesce(sts.ss_quantity,0)) as decimal(15,4) )) as return_ratio
 		,(cast(sum(coalesce(sr.sr_return_amt,0)) as decimal(15,4))/cast(sum(coalesce(sts.ss_net_paid,0)) as decimal(15,4) )) as currency_ratio
 		from 
 		store_sales sts left outer join store_returns sr
 			on (sts.ss_ticket_number = sr.sr_ticket_number and sts.ss_item_sk = sr.sr_item_sk)
                ,date_dim
 		where 
 			sr.sr_return_amt > 10000 
 			and sts.ss_net_profit > 1
                         and sts.ss_net_paid > 0 
                         and sts.ss_quantity > 0
                         and ss_sold_date_sk = d_date_sk
                         and d_year = 2000
                         and d_moy = 12
 		group by sts.ss_item_sk
 	) in_store
 ) store
 where  (
 store.return_rank <= 10
 or 
 store.currency_rank <= 10
 )
 order by 1,4,5
 limit 100;

results in the following parse tree:

LimitedRelation(vt37) [100]
  OrderedRelation(vt37) [1 ASC, 4 ASC, 5 ASC]
    ProjectedRelation(vt37) ['store' AS `channel`, store.`item` AS `item`, store.`return_ratio` AS `return_ratio`, store.`return_rank` AS `return_rank`, store.`currency_rank` AS `currency_rank`]
      FilteredRelation(store) [(store.`return_rank` <= 10) OR (store.`currency_rank` <= 10)]
        ProjectedRelation(store) [in_store.`item` AS `item`, in_store.`return_ratio` AS `return_ratio`, in_store.`currency_ratio` AS `currency_ratio`, rank() OVER (order by `return_ratio`) AS `return_rank`, rank() OVER (order by `currency_ratio`) AS `currency_rank`]
          AggregatedRelation(in_store) [sts.`ss_item_sk` AS `item`, (cast(sum(coalesce(sr.`sr_return_quantity`,0)) as decimal(15,4)) / cast(sum(coalesce(sts.`ss_quantity`,0)) as decimal(15,4))) AS `return_ratio`, (cast(sum(coalesce(sr.`sr_return_amt`,0)) as decimal(15,4)) / cast(sum(coalesce(sts.`ss_net_paid`,0)) as decimal(15,4))) AS `currency_ratio`]
            GroupedRelation(sts-sr-vt30) [sts.`ss_item_sk`]
              FilteredRelation(sts-sr-vt30) [(((((sr.`sr_return_amt` > 10000) AND (sts.`ss_net_profit` > 1)) AND (sts.`ss_net_paid` > 0)) AND (sts.`ss_quantity` > 0)) AND (vt30.`d_year` = 2000)) AND (vt30.`d_moy` = 12)]
                JoinedRelation(sts-sr-vt30) [(sts.`ss_sold_date_sk`,vt30.`d_date_sk`)]
                  JoinedRelation(sts-sr) [(sts.`ss_ticket_number`,sr.`sr_ticket_number`), (sts.`ss_item_sk`,sr.`sr_item_sk`)]
                    SingleRelation(tpcds_bin_partitioned_orc_2.store_sales, sts)
                    SingleRelation(tpcds_bin_partitioned_orc_2.store_returns, sr)
                  SingleRelation(tpcds_bin_partitioned_orc_2.date_dim, vt30)

The result parse tree is the last SELECT statement, missing all other previous SELECT statements linked with UNION ALL.

@dongyoungy dongyoungy added the bug label Apr 2, 2018
@dongyoungy dongyoungy self-assigned this Apr 2, 2018
@dongyoungy
Copy link
Contributor Author

dongyoungy commented Apr 2, 2018

The fix to this problem should aim to support all of UNION (ALL), EXCEPT, and INTERSECT clauses.

@dongyoungy
Copy link
Contributor Author

Pull request #125 fixes this issue. I will close this issue once the patch gets merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant