**SET OPERATORS**

A set operator compares complete rows between the results of its two input queries. The input queries cannot have their own ORDER BY clauses; an optional ORDER BY can be applied only to the result of the operator. The two input queries must return the same number of columns, and corresponding columns must have compatible data types. The column names of the result are taken from the first input query.

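Set operators do not compare rows with the equality operator; they use the distinct predicate (the semantics of IS [NOT] DISTINCT FROM). Under this predicate two NULLs are not distinct from each other, so comparing two NULLs for "sameness" produces TRUE rather than UNKNOWN. T-SQL does not let you specify DISTINCT explicitly with a set operator; it is implied whenever you don't specify ALL.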
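The two comparison semantics can be seen side by side in a minimal Python sketch. It uses the standard-library `sqlite3` module as a stand-in engine; this is an assumption for illustration only. SQLite is not SQL Server, but it applies the same NULL rules here: an equality comparison of two NULLs yields UNKNOWN, while a set operator treats the two NULLs as the same value.

```python
import sqlite3

# In-memory SQLite database, used only as a runnable stand-in for SQL Server.
conn = sqlite3.connect(":memory:")

# Equality predicate: NULL = NULL is UNKNOWN, which SQLite returns as NULL (None).
eq_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Set operator: the two NULLs are not distinct, so the row survives INTERSECT.
intersect_rows = conn.execute("SELECT NULL INTERSECT SELECT NULL").fetchall()

print(eq_result)       # None
print(intersect_rows)  # [(None,)]
conn.close()
```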

**SET vs. Multiset**

Set: contains only distinct members.

Multiset: may contain duplicates; every occurrence is kept.
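The distinction can be modeled in a minimal Python sketch. A built-in `set` plays the role of a set, and `collections.Counter` stands in for a multiset by mapping each member to its number of occurrences. The city values are hypothetical.

```python
from collections import Counter

# Hypothetical city values; London occurs three times.
cities = ["London", "Seattle", "London", "Tacoma", "London"]

as_set = set(cities)           # set: each distinct member appears once
as_multiset = Counter(cities)  # multiset: member -> number of occurrences

print(sorted(as_set))          # ['London', 'Seattle', 'Tacoma']
print(as_multiset["London"])   # 3
```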

**The UNION operator**

It unifies the results of its two input queries. T-SQL supports both UNION ALL, which keeps duplicates (multiset semantics), and UNION, which removes them (implicit DISTINCT, set semantics).
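The following minimal Python sketch (toy rows, not the Northwinds data) shows UNION ALL as plain concatenation and UNION as the distinct rows of that concatenation. `None` stands in for a NULL region. Python tuple equality treats `None == None` as true, which here matches how set operators compare NULLs.

```python
from collections import Counter

# Hypothetical (country, region, city) rows; None models a NULL region.
employees = [("USA", "WA", "Seattle"), ("UK", None, "London"), ("UK", None, "London")]
customers = [("UK", None, "London"), ("Germany", None, "Berlin")]

# UNION ALL: concatenate the inputs, keeping every occurrence.
union_all = employees + customers

# UNION: the distinct rows of the concatenation (implied DISTINCT).
union = set(employees) | set(customers)

print(len(union_all))                              # 5
print(len(union))                                  # 3
print(Counter(union_all)[("UK", None, "London")])  # 3
```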

Proposition: Return the country, region, and city of every employee and customer, first preserving duplicates (UNION ALL) and then removing them (UNION).

In [1]:
use Northwinds2022TSQLV7;
--UNION ALL (Multiset)
SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity
FROM HumanResources.Employee AS e

UNION ALL

SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity
FROM Sales.Customer AS c

--UNION (DISTINCT, Set)- duplicates removed.
SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity
FROM HumanResources.Employee AS e

UNION

SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity
FROM Sales.Customer AS c

EmployeeCountry,EmployeeRegion,EmployeeCity
USA,WA,Seattle
USA,WA,Tacoma
USA,WA,Kirkland
USA,WA,Redmond
UK,,London
UK,,London
UK,,London
USA,WA,Seattle
UK,,London
Germany,,Berlin


EmployeeCountry,EmployeeRegion,EmployeeCity
Argentina,,Buenos Aires
Austria,,Graz
Austria,,Salzburg
Belgium,,Bruxelles
Belgium,,Charleroi
Brazil,RJ,Rio de Janeiro
Brazil,SP,Campinas
Brazil,SP,Resende
Brazil,SP,Sao Paulo
Canada,BC,Tsawassen


**The INTERSECT operator**

INTERSECT has an implied DISTINCT: it returns only the distinct rows that appear in both input query results.

You can achieve the same result with an inner join or a correlated subquery. However, you then have to handle NULLs as a special case, because an equality predicate never matches two NULLs, and you have to remove duplicates yourself. The set operator is simpler to write.
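The special case can be reproduced in a minimal sketch using Python's `sqlite3` as a stand-in engine. The `Employee` and `Customer` tables below are hypothetical toy tables, not the Northwinds schema. A naive equality join loses the row whose region is NULL and repeats rows with multiple matches. A corrected join needs explicit IS NULL handling and DISTINCT to match INTERSECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (country TEXT, region TEXT, city TEXT);
    CREATE TABLE Customer (country TEXT, region TEXT, city TEXT);
    INSERT INTO Employee VALUES ('UK', NULL, 'London'), ('USA', 'WA', 'Seattle');
    INSERT INTO Customer VALUES ('UK', NULL, 'London'), ('USA', 'WA', 'Seattle'),
                                ('USA', 'WA', 'Seattle');
""")

# INTERSECT: NULL regions compare as equal and duplicates are removed.
via_intersect = conn.execute("""
    SELECT country, region, city FROM Employee
    INTERSECT
    SELECT country, region, city FROM Customer
""").fetchall()

# Naive join: e.region = c.region is never TRUE for NULLs, so London is lost,
# and Seattle appears twice because Customer holds two matching rows.
via_naive_join = conn.execute("""
    SELECT e.country, e.region, e.city
    FROM Employee AS e
    JOIN Customer AS c
      ON e.country = c.country AND e.region = c.region AND e.city = c.city
""").fetchall()

# Corrected join: treat two NULL regions as a match and add DISTINCT.
via_fixed_join = conn.execute("""
    SELECT DISTINCT e.country, e.region, e.city
    FROM Employee AS e
    JOIN Customer AS c
      ON e.country = c.country
     AND (e.region = c.region OR (e.region IS NULL AND c.region IS NULL))
     AND e.city = c.city
""").fetchall()
conn.close()
```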

Proposition: Return the country, region, and city combinations that appear in both the Employee and Customer tables. Remove duplicates.

In [2]:
use Northwinds2022TSQLV7;
SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity
FROM HumanResources.Employee AS e

INTERSECT

SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity
FROM Sales.Customer AS c

EmployeeCountry,EmployeeRegion,EmployeeCity
UK,,London
USA,WA,Kirkland
USA,WA,Seattle


**INTERSECT ALL (keep all duplicates)**

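INTERSECT ALL, defined in standard SQL, returns each row as many times as the minimum of its occurrence counts in the two inputs. If row R occurs 4 times in Employee and 6 times in Customer, R is returned min(4, 6) = 4 times. T-SQL does not support INTERSECT ALL, but you can emulate it with the ROW\_NUMBER function: number the occurrences of each row in both inputs, intersect on (row number, row), and then discard the row number.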
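The min-count rule and the ROW\_NUMBER emulation can be checked against each other in a minimal Python sketch. `Counter &` computes the multiset intersection directly. The same data is then run through the row-numbering approach in SQLite, used as a stand-in engine (window functions need SQLite 3.25 or later). Tables `A` and `B` and the values are hypothetical. Unlike T-SQL, SQLite accepts ROW_NUMBER() without a window ORDER BY, so no ORDER BY (SELECT 0) trick is needed there.

```python
import sqlite3
from collections import Counter

# Hypothetical inputs: R occurs 4 times in the first and 6 times in the second.
first = [("R",)] * 4 + [("S",)]
second = [("R",)] * 6 + [("T",)] * 2

# Multiset semantics: Counter & keeps min(count_in_first, count_in_second).
expected = Counter(first) & Counter(second)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (v TEXT)")
conn.execute("CREATE TABLE B (v TEXT)")
conn.executemany("INSERT INTO A VALUES (?)", first)
conn.executemany("INSERT INTO B VALUES (?)", second)

# Number each occurrence within its group, intersect on (rownum, v), drop rownum.
rows = conn.execute("""
    WITH NumberedA AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY v) AS rownum, v FROM A
    ),
    NumberedB AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY v) AS rownum, v FROM B
    )
    SELECT v FROM (
        SELECT rownum, v FROM NumberedA
        INTERSECT
        SELECT rownum, v FROM NumberedB
    )
""").fetchall()
conn.close()

print(Counter(rows) == expected)  # True
```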

Proposition: Return the country, region, and city combinations that appear in both the Employee and Customer tables. Preserve the duplicates.

In [3]:
use Northwinds2022TSQLV7;
--ROW_NUMBER requires a window ORDER BY clause; ORDER BY (SELECT 0) satisfies it when the numbering order doesn't matter.
SELECT 
	ROW_NUMBER() OVER (PARTITION BY e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity ORDER BY (SELECT 0)) AS rownum
	,e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity

FROM HumanResources.Employee AS e

INTERSECT

SELECT 
	ROW_NUMBER() OVER (PARTITION BY c.CustomerCountry, c.CustomerRegion, c.CustomerCity ORDER BY (SELECT 0)) AS rownum
	,c.CustomerCountry, c.CustomerRegion, c.CustomerCity
FROM sales.Customer AS c;

--A real INTERSECT ALL would not return the row numbers. To hide them, define a table expression and select every column except rownum.
WITH INTERSECT_ALL
AS
(
	SELECT 
		ROW_NUMBER() OVER (PARTITION BY e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity ORDER BY (SELECT 0)) AS rownum
		,e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity

	FROM HumanResources.Employee AS e

	INTERSECT

	SELECT 
		ROW_NUMBER() OVER (PARTITION BY c.CustomerCountry, c.CustomerRegion, c.CustomerCity ORDER BY (SELECT 0)) AS rownum
		,c.CustomerCountry, c.CustomerRegion, c.CustomerCity
	FROM sales.Customer AS c
)

SELECT INTERSECT_ALL.EmployeeCountry, INTERSECT_ALL.EmployeeRegion, INTERSECT_ALL.EmployeeCity
FROM INTERSECT_ALL

rownum,EmployeeCountry,EmployeeRegion,EmployeeCity
1,UK,,London
1,USA,WA,Kirkland
1,USA,WA,Seattle
2,UK,,London
3,UK,,London
4,UK,,London


EmployeeCountry,EmployeeRegion,EmployeeCity
UK,,London
USA,WA,Kirkland
USA,WA,Seattle
UK,,London
UK,,London
UK,,London


**The EXCEPT operator**

EXCEPT implements set difference: it returns the rows that appear in the first input but not in the second. A row is returned once in the output as long as it appears at least once in the first input multiset and zero times in the second. EXCEPT is noncommutative, so the order in which you specify the two input queries matters.
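Noncommutativity can be shown with a minimal Python sketch using built-in sets. Sets are used because EXCEPT removes duplicates. The rows are hypothetical, and `None` models a NULL region.

```python
# Hypothetical (country, region, city) locations; None models a NULL region.
employees = {("USA", "WA", "Seattle"), ("USA", "WA", "Tacoma"), ("USA", "WA", "Redmond")}
customers = {("USA", "WA", "Seattle"), ("Germany", None, "Berlin")}

# Employee EXCEPT Customer: employee-only locations.
emp_minus_cust = employees - customers

# Customer EXCEPT Employee: customer-only locations (a different result).
cust_minus_emp = customers - employees

print(sorted(emp_minus_cust))  # [('USA', 'WA', 'Redmond'), ('USA', 'WA', 'Tacoma')]
print(cust_minus_emp)          # {('Germany', None, 'Berlin')}
```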

Here, too, you can achieve a similar result with an outer join (keeping only the non-matches) or with NOT EXISTS. However, those approaches require you to compare every column explicitly and to handle NULLs yourself, because an equality predicate never matches two NULLs.

Proposition: Return the country, region, and city combinations that appear in the Employee table but not in the Customer table.

In [4]:
use Northwinds2022TSQLV7;
--Notice the difference in the result set depending on which input you place first.
--Outputs 2 rows
SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity FROM HumanResources.Employee AS e

EXCEPT

SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity FROM sales.Customer AS c;

--Reversed inputs: outputs 66 rows.

SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity FROM sales.Customer AS c

EXCEPT

SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity FROM HumanResources.Employee AS e;

EmployeeCountry,EmployeeRegion,EmployeeCity
USA,WA,Redmond
USA,WA,Tacoma


CustomerCountry,CustomerRegion,CustomerCity
Argentina,,Buenos Aires
Austria,,Graz
Austria,,Salzburg
Belgium,,Bruxelles
Belgium,,Charleroi
Brazil,RJ,Rio de Janeiro
Brazil,SP,Campinas
Brazil,SP,Resende
Brazil,SP,Sao Paulo
Canada,BC,Tsawassen


**EXCEPT ALL operator**

EXCEPT ALL returns only the occurrences of a row from the first multiset that have no corresponding occurrence in the second. If row R appears 6 times in the first multiset and 4 times in the second, R appears 6 - 4 = 2 times in the result. If R appears at least as often in the second multiset, it does not appear at all.

T-SQL does not provide an EXCEPT ALL operator, but you can emulate it with the ROW\_NUMBER function, as with INTERSECT ALL.
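The arithmetic and the emulation can be compared in a minimal Python sketch. `Counter` subtraction gives the multiset difference directly; it keeps only positive counts. The helper `number_occurrences` is a hypothetical stand-in for ROW_NUMBER() partitioned by the whole row. A plain set difference on the numbered pairs, with the numbers dropped afterwards, reproduces the same result. The data is made up for illustration.

```python
from collections import Counter

def number_occurrences(rows):
    """Pair each row with its occurrence number 1, 2, ... (like ROW_NUMBER per row value)."""
    seen = Counter()
    numbered = []
    for row in rows:
        seen[row] += 1
        numbered.append((seen[row], row))
    return numbered

# Hypothetical multisets: R occurs 6 times in the first input and 4 in the second.
first = ["R"] * 6 + ["S"]
second = ["R"] * 4 + ["T"]

# EXCEPT ALL by multiset arithmetic: max(count_first - count_second, 0).
expected = Counter(first) - Counter(second)

# Emulation: distinct EXCEPT on (occurrence number, row), then drop the number.
numbered_diff = set(number_occurrences(first)) - set(number_occurrences(second))
emulated = Counter(row for _, row in numbered_diff)

print(expected["R"], emulated == expected)  # 2 True
```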

In [5]:
use Northwinds2022TSQLV7;
WITH EXCEPT_ALL
AS
(
	SELECT 
		ROW_NUMBER() OVER (PARTITION BY e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity ORDER BY (SELECT 0)) AS rownum
		,e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity

	FROM HumanResources.Employee AS e

	EXCEPT

	SELECT 
		ROW_NUMBER() OVER (PARTITION BY c.CustomerCountry, c.CustomerRegion, c.CustomerCity ORDER BY (SELECT 0)) AS rownum
		,c.CustomerCountry, c.CustomerRegion, c.CustomerCity
	FROM sales.Customer AS c
)

SELECT EXCEPT_ALL.EmployeeCountry, EXCEPT_ALL.EmployeeRegion, EXCEPT_ALL.EmployeeCity 
FROM EXCEPT_ALL;

EmployeeCountry,EmployeeRegion,EmployeeCity
USA,WA,Redmond
USA,WA,Tacoma
USA,WA,Seattle


**Precedence**

INTERSECT has higher precedence than UNION and EXCEPT.

UNION and EXCEPT have equal precedence and are evaluated in order of appearance (left to right).
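The effect of precedence can be traced with a minimal Python sketch using sets of hypothetical city names. The parentheses are written out explicitly on both lines. Python's own operator precedence, where `-` binds tighter than `&`, is the opposite of T-SQL's, so relying on it would silently compute the other reading.

```python
# Hypothetical location sets.
supplier = {"Paris", "Berlin", "Sydney"}
employee = {"London", "Seattle", "Berlin"}
customer = {"Paris", "Berlin", "London"}

# T-SQL reading of: Supplier EXCEPT Employee INTERSECT Customer
# INTERSECT is evaluated first -> Supplier EXCEPT (Employee INTERSECT Customer)
default_order = supplier - (employee & customer)

# With parentheses: (Supplier EXCEPT Employee) INTERSECT Customer
forced_order = (supplier - employee) & customer

print(sorted(default_order))  # ['Paris', 'Sydney']
print(forced_order)           # {'Paris'}
```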

Proposition: Return locations that are supplier locations but not locations that are both employee and customer locations.

In [6]:
use Northwinds2022TSQLV7;
--Intersect happens before except. 

SELECT s.SupplierCountry, s.SupplierRegion, s.SupplierCity FROM Production.Supplier AS s
EXCEPT
SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity  FROM HumanResources.Employee AS e
INTERSECT
SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity  FROM sales.Customer AS c;

--To control the order of evaluation, use parentheses, which have the highest precedence.
--This returns supplier locations that are not employee locations but are customer locations.
(SELECT s.SupplierCountry, s.SupplierRegion, s.SupplierCity FROM Production.Supplier AS s
EXCEPT
SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity  FROM HumanResources.Employee AS e)
INTERSECT
SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity  FROM sales.Customer AS c 


SupplierCountry,SupplierRegion,SupplierCity
Australia,NSW,Sydney
Australia,Victoria,Melbourne
Brazil,,Sao Paulo
Canada,Québec,Montréal
Canada,Québec,Ste-Hyacinthe
Denmark,,Lyngby
Finland,,Lappeenranta
France,,Annecy
France,,Montceau
France,,Paris


SupplierCountry,SupplierRegion,SupplierCity
Canada,Québec,Montréal
France,,Paris
Germany,,Berlin


**Circumventing unsupported logical phases**

The logical query processing phases in question are WHERE, GROUP BY, HAVING, and ORDER BY.

Only ORDER BY can be applied directly to the result of a set operator. WHERE, GROUP BY, and HAVING cannot, although each input query can still have its own. To get around this, define a table expression based on the set operator query and apply the logical query processing phase in the outer query.
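The table-expression workaround can be run end to end in a minimal sketch, with Python's `sqlite3` as a stand-in engine and hypothetical toy tables. The UNION sits in a derived table, and GROUP BY is applied by the outer query. The two London rows collapse into one location only because the set operator treats their NULL regions as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (country TEXT, region TEXT, city TEXT);
    CREATE TABLE Customer (country TEXT, region TEXT, city TEXT);
    INSERT INTO Employee VALUES ('UK', NULL, 'London'), ('USA', 'WA', 'Seattle'),
                                ('USA', 'WA', 'Tacoma');
    INSERT INTO Customer VALUES ('UK', NULL, 'London'), ('UK', NULL, 'Cowes'),
                                ('USA', 'WA', 'Seattle'), ('USA', 'WA', 'Kirkland');
""")

# GROUP BY cannot be attached to the UNION itself, so the outer query applies it.
rows = conn.execute("""
    SELECT u.country, COUNT(*) AS numlocations
    FROM (SELECT country, region, city FROM Employee
          UNION
          SELECT country, region, city FROM Customer) AS u
    GROUP BY u.country
    ORDER BY u.country
""").fetchall()
conn.close()

print(rows)  # [('UK', 2), ('USA', 3)]
```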

Proposition: Return the number of distinct locations in each country.

In [7]:
use Northwinds2022TSQLV7;
SELECT u.EmployeeCountry, COUNT(*) AS numlocations
FROM (SELECT e.EmployeeCountry, e.EmployeeRegion, e.EmployeeCity FROM HumanResources.Employee AS e
	   UNION
       SELECT c.CustomerCountry, c.CustomerRegion, c.CustomerCity FROM sales.Customer AS c) AS u
GROUP BY u.EmployeeCountry;

EmployeeCountry,numlocations
Argentina,1
Austria,2
Belgium,2
Brazil,4
Canada,3
Denmark,2
Finland,2
France,9
Germany,11
Ireland,1


**Circumventing unsupported logical phases**

Using TOP and OFFSET-FETCH with set operators. 

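The input queries of a set operator cannot have an ORDER BY clause, and a table expression can have one only in support of TOP or OFFSET-FETCH. To restrict each input with TOP and ORDER BY, wrap each input query in a table expression (here, a derived table) and apply the set operator to the outer queries.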
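The "top N per input, then combine" pattern can be expressed in a minimal Python sketch over hypothetical order rows. `heapq.nlargest` with a `(date, id)` key mirrors `TOP (2) ... ORDER BY OrderDate DESC, OrderId DESC`. List concatenation mirrors UNION ALL. ISO date strings are assumed, so they sort correctly as text.

```python
import heapq

# Hypothetical orders: (employee_id, order_id, order_date as ISO string).
orders = [
    (3, 101, "2016-04-28"), (3, 102, "2016-04-30"), (3, 103, "2016-04-29"),
    (5, 201, "2016-03-17"), (5, 202, "2016-04-22"), (5, 203, "2016-01-05"),
]

def top2_for(emp_id):
    """Like TOP (2) ... WHERE EmployeeId = emp_id ORDER BY OrderDate DESC, OrderId DESC."""
    rows = [o for o in orders if o[0] == emp_id]
    return heapq.nlargest(2, rows, key=lambda o: (o[2], o[1]))

# Like UNION ALL over the two derived tables D1 and D2.
result = top2_for(3) + top2_for(5)

for row in result:
    print(row)
```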

Proposition: Return the two most recent orders for each of employees 3 and 5.

In [8]:
use Northwinds2022TSQLV7;
SELECT D1.EmployeeId, D1.OrderId, D1.OrderDate 
FROM (SELECT TOP (2) o.EmployeeId, o.OrderId, o.OrderDate
	  FROM Sales.[Order] AS o
	  WHERE o.EmployeeId = 3
	  ORDER BY o.OrderDate DESC, o.OrderId DESC) AS D1

UNION ALL

SELECT D2.EmployeeId, D2.OrderId, D2.OrderDate 
FROM (SELECT TOP (2) o.EmployeeId, o.OrderId, o.OrderDate
	  FROM Sales.[Order] AS o
	  WHERE o.EmployeeId = 5
	  ORDER BY o.OrderDate DESC, o.OrderId DESC) AS D2;


EmployeeId,OrderId,OrderDate
3,11063,2016-04-30
3,11057,2016-04-29
5,11043,2016-04-22
5,10954,2016-03-17


**Exercise 1**

Explain the difference between the UNION ALL and UNION operators. In what cases are the two equivalent? When they are equivalent, which one should you use?

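UNION removes duplicates from the result set, while UNION ALL preserves them. The two are equivalent when the unified result cannot contain duplicates, that is, when neither input has duplicate rows and no row appears in both inputs. In that case use UNION ALL: it returns the same result without the extra cost that UNION may incur to check for and remove duplicates.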

**Exercise 2**

Write a query that generates a virtual auxiliary table of 10 numbers in the range 1 through 10 without using a looping construct. You do not need to guarantee any order of the rows in the output of your solution.

In [9]:
use Northwinds2022TSQLV7;
SELECT 1 AS n 
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10

n
1
2
3
4
5
6
7
8
9
10


**Exercise 3**

Write a query that returns customer and employee pairs that had order activity in January 2016 but not in February 2016.

In [10]:
use Northwinds2022TSQLV7;
SELECT  o.CustomerId, o.EmployeeId FROM sales.[Order] AS [o] WHERE o.OrderDate >='20160101' AND o.OrderDate < '20160201'

EXCEPT

SELECT  o.CustomerId, o.EmployeeId  FROM sales.[Order] AS [o] WHERE o.OrderDate >='20160201' AND o.OrderDate < '20160301'

CustomerId,EmployeeId
1,1
3,3
5,8
5,9
6,9
7,6
9,1
12,2
16,7
17,1


**Exercise 4**

Write a query that returns customer and employee pairs that had order activity in both January 2016 and February 2016.

In [11]:
use Northwinds2022TSQLV7;
SELECT  o.CustomerId, o.EmployeeId FROM sales.[Order] AS [o] WHERE o.OrderDate >='20160101' AND o.OrderDate < '20160201'

INTERSECT

SELECT  o.CustomerId, o.EmployeeId  FROM sales.[Order] AS [o] WHERE o.OrderDate >='20160201' AND o.OrderDate < '20160301'

CustomerId,EmployeeId
20,3
39,9
46,5
67,1
71,4


**Exercise 5**

Write a query that returns customer and employee pairs that had order activity in both January 2016 and February 2016 but not in 2015. The parentheses are optional because INTERSECT has higher precedence than EXCEPT, but they make the intended order of evaluation explicit.

In [12]:
use Northwinds2022TSQLV7;
(SELECT  o.CustomerId, o.EmployeeId FROM sales.[Order] AS [o] WHERE o.OrderDate >='20160101' AND o.OrderDate < '20160201'

INTERSECT

SELECT  o.CustomerId, o.EmployeeId  FROM sales.[Order] AS [o] WHERE o.OrderDate >='20160201' AND o.OrderDate < '20160301')

EXCEPT

SELECT  o.CustomerId, o.EmployeeId  FROM sales.[Order] AS [o] WHERE o.OrderDate >='20150101' AND o.OrderDate < '20160101'

CustomerId,EmployeeId
67,1
46,5


**Exercise 6** 

Given a query that unifies employee locations and supplier locations with UNION ALL, add logic that guarantees the rows from Employee are returned before the rows from Supplier, and that within each segment the rows are sorted by country, region, and city.
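The sort-column idea can be mirrored in a minimal Python sketch with hypothetical rows. Each row is tagged with a segment number (1 for employees, 2 for suppliers), and the sort runs on (segment, country, region, city). SQL Server places NULLs first in ascending order. The key models this with the hypothetical helper tuple `(region is not None, region or "")`, since Python cannot compare `None` with a string directly.

```python
# Hypothetical (country, region, city) rows; None models a NULL region.
employees = [("USA", "WA", "Seattle"), ("UK", None, "London")]
suppliers = [("Canada", "Québec", "Montréal"), ("Australia", "NSW", "Sydney"),
             ("Canada", None, "Toronto")]

# Tag each row with its segment, mirroring the sortcol column.
tagged = [(1, row) for row in employees] + [(2, row) for row in suppliers]

def sort_key(item):
    sortcol, (country, region, city) = item
    # (False, "") sorts before (True, value), so NULL regions come first, as in SQL Server.
    return (sortcol, country, (region is not None, region or ""), city)

# Sort, then drop the segment tag (the outer SELECT omits sortcol).
ordered = [row for _, row in sorted(tagged, key=sort_key)]

for row in ordered:
    print(row)
```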

In [13]:
use Northwinds2022TSQLV7;
SELECT D.EmployeeCountry, D.EmployeeRegion, D.EmployeeCity
FROM 
	(   SELECT 1 AS sortcol, EmployeeCountry, EmployeeRegion, EmployeeCity
		FROM HumanResources.Employee

		UNION ALL

		SELECT 2 AS sortcol, SupplierCountry, SupplierRegion, SupplierCity
		FROM Production.Supplier) AS D
ORDER BY sortcol, D.EmployeeCountry, D.EmployeeRegion, D.EmployeeCity

EmployeeCountry,EmployeeRegion,EmployeeCity
UK,,London
UK,,London
UK,,London
UK,,London
USA,WA,Kirkland
USA,WA,Redmond
USA,WA,Seattle
USA,WA,Seattle
USA,WA,Tacoma
Australia,NSW,Sydney
