**Definitions**

<u>self-contained subquery</u>: has no dependency on tables from the outer query, so it can be run on its own

<u>correlated subquery</u>: depends on tables from the outer query, so it cannot be run on its own

**Self-Contained Scalar Subquery**

Returns a single value, so it can be compared with the equality operator (=).

Proposition: Return the order with the maximum OrderId

In [None]:
--First obtain the result in two steps, using a variable. Then we will see that the same thing can be done in one step using a subquery.
USE Northwinds2022TSQLV7;
DECLARE @maxid AS INT = (SELECT MAX (OrderId) FROM Sales.[Order]);

SELECT orderId, OrderDate, EmployeeId, CustomerId
From Sales.[Order]
WHERE OrderId = @maxid;

--Same result in one step, using a subquery. The alias O in the subquery refers to the same table as the outer query; the alias makes clear which instance of the table each column belongs to.
SELECT orderId, OrderDate, EmployeeId, CustomerId
FROM Sales.[Order]
WHERE OrderId =(SELECT MAX (O.OrderId) 
				FROM Sales.[Order] AS O);

--For a scalar subquery to be valid, it must return at most one value.
--The following query runs without error only because exactly one employee happens to have a last name starting with 'C'.
--Because the subquery could return more than one value, the query is fragile and should not be written this way.
SELECT orderId, EmployeeId
FROM Sales.[Order]
WHERE EmployeeId = (SELECT E.EmployeeId
					FROM HumanResources.Employee AS E
					WHERE E.EmployeeLastName LIKE N'C%');

--The same query with a condition that matches more than one employee (last names starting with 'D': Davis and Doyle). It fails at run time with error 512 because the subquery returns more than one value.
SELECT orderId
FROM Sales.[Order]
WHERE EmployeeId = (SELECT E.EmployeeId
					FROM HumanResources.Employee AS E
					WHERE E.EmployeeLastName LIKE N'D%');

--In the following query, no employee matches, so the scalar subquery returns NULL. The outer query returns an empty set because an equality comparison with NULL yields UNKNOWN.
SELECT orderID
FROM Sales.[Order]
WHERE EmployeeId = (SELECT E.EmployeeId
					FROM HumanResources.Employee AS E
					WHERE E.EmployeeLastName LIKE N'A%');



<span style="color: #608b4e;"><b>Self-contained&nbsp;multivalued&nbsp;subquery</b></span>

<span style="color: #608b4e;">Proposition: Return orders placed by employees whose last name starts with 'D'</span>

In [None]:

--The subquery returns multiple values in a single column, so use the IN predicate.
USE Northwinds2022TSQLV7;
SELECT orderId, EmployeeId
FROM Sales.[Order]
WHERE EmployeeId IN (SELECT E.EmployeeId
					FROM HumanResources.Employee AS E
					WHERE E.EmployeeLastName LIKE N'D%');

--IN works whether the subquery returns zero, one, or many values.

--Notice that you can accomplish the same thing with an inner join:
SELECT orderID, o.employeeid, E.employeeLastName
FROM Sales.[Order] O
	INNER JOIN HumanResources.Employee AS E
		ON O.EmployeeId= E.EmployeeId
WHERE E.EmployeeLastName LIKE N'D%';
--Several query forms can produce the same answer. Depending on the database engine and the data, a join may perform better in some cases and a subquery in others.
--Compare the alternatives on your own database and pick the faster one.

--Adding GROUP BY to count the orders per employee.
SELECT E.EmployeeLastName, count(O.orderID) as [Number of Orders]
FROM Sales.[Order] O
	INNER JOIN HumanResources.Employee AS E
		ON O.EmployeeId= E.EmployeeId
WHERE E.EmployeeLastName LIKE N'D%'
GROUP BY E.EmployeeLastName
ORDER BY E.EmployeeLastName;

--Another example of a multivalued subquery
--Proposition: Return orders placed by customers from the United States.
SELECT CustomerId, orderid, orderdate, EmployeeId
FROM Sales.[Order]
WHERE CustomerId IN (SELECT C.CustomerId
					 FROM Sales.Customer AS C
					 WHERE C.CustomerCountry = N'USA');


<span style="color: #608b4e;"><b>Demonstration&nbsp;of&nbsp;Multiple&nbsp;Self-Contained&nbsp;Subqueries</b></span>

<span style="color: #608b4e;">Proposition:&nbsp;Write&nbsp;a&nbsp;query&nbsp;that&nbsp;returns&nbsp;all&nbsp;individual&nbsp;order&nbsp;IDs&nbsp;that&nbsp;are&nbsp;missing&nbsp;between&nbsp;the&nbsp;minimum&nbsp;and&nbsp;maximum&nbsp;orderid in&nbsp;the&nbsp;table.&nbsp;In&nbsp;other&nbsp;words,&nbsp;return&nbsp;the&nbsp;odd-numbered&nbsp;orderids</span>

In [None]:

--Demonstration of Multiple Self-Contained Subqueries
DROP TABLE IF EXISTS dbo.Orders;
CREATE TABLE dbo.Orders (
	orderId INT NOT NULL CONSTRAINT PK_Orders PRIMARY KEY
);

--Populate the table with the even-numbered order IDs.
INSERT INTO dbo.Orders (orderId)
	SELECT orderId
	FROM Sales.[Order]
	WHERE orderid % 2 = 0;

--Proposition: Write a query that returns all individual order IDs that are missing between the minimum and maximum ones in the table. In other words, return the odd-numbered orderids
SELECT n
FROM dbo.Nums
WHERE n BETWEEN (SELECT MIN(O.OrderId) FROM dbo.Orders AS O)
			AND (SELECT MAX(O.OrderId) FROM dbo.Orders AS O)
  AND n NOT IN (SELECT O.orderId FROM dbo.Orders AS O);
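
The same proposition can also be written with a correlated NOT EXISTS in place of NOT IN. This is only a sketch that reuses the `dbo.Nums` and `dbo.Orders` tables from the cell above; the alias `N` is introduced here for illustration. Because `orderId` is declared NOT NULL, both forms are expected to return the same rows.

```sql
-- Missing order IDs between the minimum and maximum, using NOT EXISTS.
-- Assumes dbo.Nums and dbo.Orders exist as created in the cell above.
SELECT N.n
FROM dbo.Nums AS N
WHERE N.n BETWEEN (SELECT MIN(O.orderId) FROM dbo.Orders AS O)
              AND (SELECT MAX(O.orderId) FROM dbo.Orders AS O)
  AND NOT EXISTS (SELECT *
                  FROM dbo.Orders AS O
                  WHERE O.orderId = N.n);
```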


**Left outer join with self-contained subqueries**

<span style="color: #608b4e;">Proposition:&nbsp;Return&nbsp;customers&nbsp;with&nbsp;no&nbsp;orders</span>

In [None]:
USE Northwinds2022TSQLV7;
--You can negate the IN predicate with NOT

SELECT CustomerId, CustomerCompanyName
FROM Sales.Customer
WHERE CustomerId NOT IN (SELECT O.CustomerId
						 FROM Sales.[Order] AS O);

--You can do the same thing with a left outer join
SELECT C.CustomerId, C.CustomerCompanyName
FROM Sales.Customer AS C
	LEFT OUTER JOIN Sales.[Order] AS O
		ON C.CustomerId = O.CustomerId
WHERE O.OrderId IS NULL; -- or O.CustomerId IS NULL
--It is best practice to exclude NULLs from the subquery when using NOT IN. That filter was left out above to keep the demonstration of negation simple. More on this in the "NULL trouble" section at the bottom.

**Correlated subqueries**

A correlated subquery refers to attributes from the outer query. It depends on the outer query and cannot be invoked independently.  

**The subquery is logically evaluated separately for EACH outer row**

Proposition: Return the order with the maximum OrderId for each customer.

In [19]:
USE Northwinds2022TSQLV7
SELECT customerId, OrderId, OrderDate, EmployeeId
FROM Sales.[Order] as O1
WHERE orderId = (SELECT MAX (O2.OrderId)
				  FROM Sales.[Order] as O2
				  WHERE O2.CustomerId = O1.CustomerId);

--Debugging correlated subqueries is harder because the inner query depends on the outer query and cannot run on its own. To test it, substitute the outer reference (O1.CustomerId) with a constant.
SELECT MAX(O2.OrderId)
FROM Sales.[Order] AS O2
WHERE O2.CustomerId = 85;

customerId,OrderId,OrderDate,EmployeeId
91,11044,2016-04-23,4
90,11005,2016-04-07,2
89,11066,2016-05-01,7
88,10935,2016-03-09,4
87,11025,2016-04-15,6
86,11046,2016-04-23,8
85,10739,2015-11-12,3
84,10850,2016-01-23,1
83,10994,2016-04-02,2
82,10822,2016-01-08,6


(No column name)
10739


<span style="color: #608b4e;"><b>CORRELATED SUBQUERY IN THE SELECT CLAUSE</b></span>

<span style="color: rgb(96, 139, 78);">A correlated subquery can also appear in the SELECT clause instead of the WHERE clause. There it computes a value for each outer row, based on that row's attributes.</span><span style="color: #608b4e;"><br></span>

<span style="color: #608b4e;">Proposition:&nbsp;Return&nbsp;for&nbsp;each&nbsp;order&nbsp;the&nbsp;percentage&nbsp;of&nbsp;the&nbsp;current&nbsp;order&nbsp;value&nbsp;out&nbsp;of&nbsp;the&nbsp;customer&nbsp;total.&nbsp;</span>

In [20]:
--The inner query returns the total value of all orders placed by the current customer. The outer query divides each order's value by that total to get the order's percentage of the customer total.
USE TSQLV4;
SELECT orderId, custid, o1.val,
		CAST(100. * val/ (SELECT SUM(val) 
			 FROM Sales.OrderValues AS O2
			 WHERE O2.custid = O1.custid)
		 AS NUMERIC (5,2)) AS Ptc
FROM Sales.OrderValues AS O1
ORDER BY custid, orderid;

orderId,custid,val,Ptc
10643,1,814.5,19.06
10692,1,878.0,20.55
10702,1,330.0,7.72
10835,1,845.8,19.79
10952,1,471.2,11.03
11011,1,933.5,21.85
10308,2,88.8,6.33
10625,2,479.75,34.2
10759,2,320.0,22.81
10926,2,514.4,36.67
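
For comparison, the same percentage can be sketched with a window aggregate instead of a correlated subquery. This assumes the same `Sales.OrderValues` view in `TSQLV4`; the column alias `pct` is chosen here for illustration. `SUM(val) OVER (PARTITION BY custid)` computes the customer total once per partition and exposes it on every row of that customer.

```sql
USE TSQLV4;

-- Percentage of each order's value out of the customer total,
-- using a window aggregate rather than a correlated subquery.
SELECT orderid, custid, val,
       CAST(100. * val / SUM(val) OVER (PARTITION BY custid)
            AS NUMERIC(5,2)) AS pct
FROM Sales.OrderValues
ORDER BY custid, orderid;
```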


<span style="color: #608b4e;"><b>EXISTS&nbsp;predicate</b></span>

<span style="color: #608b4e;">Returns&nbsp;true&nbsp;if&nbsp;subquery&nbsp;returns&nbsp;any&nbsp;rows&nbsp;and&nbsp;false&nbsp;otherwise.</span>

<span style="color: #608b4e;">Proposition:&nbsp;Return&nbsp;customers&nbsp;from&nbsp;Spain&nbsp;who&nbsp;placed&nbsp;orders</span>

In [21]:
USE Northwinds2022TSQLV7
SELECT customerId, CustomerCompanyName
FROM Sales.Customer AS C
WHERE CustomerCountry = N'Spain'
	AND EXISTS
		(SELECT * FROM Sales.[Order] O
		WHERE C.customerid = O.CustomerId);

--Negation: Return customers from Spain who did NOT place any orders.
SELECT customerId, customerCompanyName
FROM Sales.Customer C
WHERE CustomerCountry = N'Spain'
	AND NOT EXISTS
		(SELECT * FROM Sales.[Order] O
		 WHERE C.CustomerId = O.CustomerId);
--EXISTS only needs to know whether at least one matching row exists; the database engine does not need to produce the rows themselves.
--EXISTS uses two-valued logic: the row either exists or it does not.
--For that reason SELECT * is acceptable here. The engine ignores the subquery's SELECT list for EXISTS, so listing specific columns or adding DISTINCT brings no benefit.

customerId,CustomerCompanyName
8,Customer QUHWH
29,Customer MDLWA
30,Customer KSLQF
69,Customer SIUIH


customerId,customerCompanyName
22,Customer DTDMN


<span style="color:#608b4e;">Returning the previous value relative to the current OrderId: the maximum of the OrderIds that are smaller than the current one.</span>

In [None]:
--MAX over all OrderIds smaller than the current one returns the OrderId immediately before it. The first order has no predecessor, so its PrevOrderId is NULL.
SELECT TOP (20) orderid, orderdate, employeeid, customerid,
	   (SELECT MAX(O2.orderid)
		FROM Sales.[Order] AS O2
		WHERE O2.orderid < O1.orderid) AS PrevOrderId
FROM Sales.[Order] AS O1
ORDER BY O1.orderid; --without ORDER BY, TOP returns an arbitrary 20 rows

--Returning the next value. MIN over all OrderIds greater than the current one returns the OrderId immediately after it.
SELECT TOP (20) orderid, orderdate, employeeid, customerid,
	   (SELECT MIN(O2.orderid)
		FROM Sales.[Order] AS O2
		WHERE O2.orderid > O1.orderid) AS NextOrderId
FROM Sales.[Order] AS O1
ORDER BY O1.orderid;
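
As an alternative to the two correlated subqueries, the previous and next OrderId can be sketched with the `LAG` and `LEAD` window functions (available in SQL Server 2012 and later). This assumes the same `Sales.[Order]` table in `Northwinds2022TSQLV7`. Window functions are evaluated before `TOP`, so the last returned row is still expected to show the next OrderId from the full table.

```sql
USE Northwinds2022TSQLV7;

-- Previous and next OrderId in one pass, ordered by orderid.
-- LAG returns NULL for the first order; LEAD returns NULL for the last one.
SELECT TOP (20) orderid, orderdate, employeeid, customerid,
       LAG(orderid)  OVER (ORDER BY orderid) AS PrevOrderId,
       LEAD(orderid) OVER (ORDER BY orderid) AS NextOrderId
FROM Sales.[Order]
ORDER BY orderid;
```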


<span style="color: #608b4e;"><b>Using&nbsp;running&nbsp;aggregates</b></span>

<span style="color: #608b4e;">Proposition:&nbsp;Use&nbsp;a&nbsp;correlated&nbsp;subquery&nbsp;against&nbsp;a&nbsp;second&nbsp;instance&nbsp;of&nbsp;the&nbsp;view&nbsp;to&nbsp;calculate&nbsp;the&nbsp;running-total&nbsp;quantity.&nbsp;</span>

In [2]:
--Using running aggregates
USE TSQLV4
SELECT orderyear, qty
FROM Sales.OrderTotalsByYear
ORDER BY orderyear;

--Proposition: Use a correlated subquery against a second instance of the view to calculate the running-total quantity. 
SELECT orderyear, qty,
	   (SELECT SUM (O2.qty)
		FROM Sales.OrderTotalsByYear AS O2
		WHERE O2.orderyear <= O1.orderyear) AS runqty
FROM Sales.OrderTotalsByYear AS O1
ORDER BY orderyear;

orderyear,qty
2014,9581
2015,25489
2016,16247


orderyear,qty,runqty
2014,9581,9581
2015,25489,35070
2016,16247,51317
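
The same running total can also be sketched with a window aggregate, which avoids rescanning the view for every outer row. This assumes the `Sales.OrderTotalsByYear` view used above, with one row per `orderyear`; the frame `ROWS UNBOUNDED PRECEDING` sums from the first year up to the current row.

```sql
USE TSQLV4;

-- Running-total quantity using a window aggregate.
-- Assumes orderyear is unique per row in the view.
SELECT orderyear, qty,
       SUM(qty) OVER (ORDER BY orderyear
                      ROWS UNBOUNDED PRECEDING) AS runqty
FROM Sales.OrderTotalsByYear
ORDER BY orderyear;
```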


<span style="color: #608b4e;"><b>NULL&nbsp;trouble</b>: problems that can arise when you forget about NULLs</span>

In [22]:
USE Northwinds2022TSQLV7
SELECT CustomerId, CustomerCompanyName
FROM Sales.Customer
WHERE CustomerId NOT IN(SELECT O.CustomerId
                    FROM Sales.[Order] AS O);
--This returns customers 22 and 57, who have not placed any orders. Now suppose an order is inserted with a NULL in the CustomerId attribute.
INSERT INTO Sales.[Order]
  (CustomerId, EmployeeId, orderdate, requireddate, ShipToDate, shipperid,
   freight, shipToName, shipToAddress, shipToCity, shipToRegion,
   shipToPostalcode, shipToCountry)
  VALUES(NULL, 1, '20160212', '20160212',
         '20160212', 1, 123.00, N'abc', N'abc', N'abc',
         N'abc', N'abc', N'abc');
--If you run the same query again, it returns an empty set. For a customer with no orders, CustomerId NOT IN (..., NULL) evaluates to UNKNOWN rather than TRUE, so the row is filtered out.
--When you use the NOT IN predicate against a subquery that returns at least one NULL, the query always returns an empty set.

SELECT CustomerId, CustomerCompanyName
FROM Sales.Customer
WHERE CustomerId NOT IN(SELECT O.CustomerId
                    FROM Sales.[Order] AS O);

--When using NOT IN, you can explicitly exclude NULLs in the subquery so that the outer query again returns the customers with no orders.
SELECT CustomerId, CustomerCompanyName
FROM Sales.Customer
WHERE CustomerId NOT IN(SELECT O.CustomerId
                        FROM Sales.[Order] AS O
                        WHERE O.CustomerId IS NOT NULL);
--Alternatively, you can use the COALESCE function to replace NULLs with a value that is never a real CustomerId (here -1) and remove the WHERE O.CustomerId IS NOT NULL filter.
SELECT CustomerId, CustomerCompanyName
FROM Sales.Customer
WHERE CustomerId NOT IN(SELECT COALESCE (O.CustomerId, -1) 
                        FROM Sales.[Order] AS O);
--EXISTS uses two-valued logic, so NOT EXISTS is a safer alternative to NOT IN because it avoids this NULL trouble.
SELECT CustomerId, CustomerCompanyName
FROM Sales.Customer AS C
WHERE NOT EXISTS
  (SELECT * 
   FROM Sales.[Order] AS O
   WHERE O.CustomerId = C.CustomerId);


--We delete the null row.
DELETE FROM Sales.[Order] WHERE CustomerId IS NULL;

CustomerId,CustomerCompanyName
22,Customer DTDMN
57,Customer WVAXS


CustomerId,CustomerCompanyName


CustomerId,CustomerCompanyName
22,Customer DTDMN
57,Customer WVAXS


CustomerId,CustomerCompanyName
22,Customer DTDMN
57,Customer WVAXS
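
The three-valued logic behind this behavior can be seen without any tables. The following is a minimal sketch with made-up constants and column aliases, assuming the default `ANSI_NULLS ON` setting. `22 NOT IN (1, NULL)` expands to `NOT (22 = 1 OR 22 = NULL)`, which is `NOT (FALSE OR UNKNOWN)`, which is UNKNOWN, so the second CASE expression falls through to its ELSE branch.

```sql
-- Compare NOT IN against a list without NULL and a list with NULL.
-- The constants 22, 1, 2 are illustrative only.
SELECT
  CASE WHEN 22 NOT IN (1, 2)    THEN 'TRUE' ELSE 'not TRUE' END AS without_null,
  CASE WHEN 22 NOT IN (1, NULL) THEN 'TRUE' ELSE 'not TRUE' END AS with_null;
```

A WHERE clause keeps only rows for which the predicate is TRUE, which is why a single NULL in the subquery result empties the NOT IN query.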


**Substitution error in a subquery column name**

In [24]:
USE Northwinds2022TSQLV7
DROP TABLE IF EXISTS Sales.MyShipper;

CREATE TABLE Sales.MyShipper
(
  shipper_id  INT          NOT NULL,
  companyname NVARCHAR(40) NOT NULL,
  phone       NVARCHAR(24) NOT NULL,
  CONSTRAINT PK_MyShippers PRIMARY KEY(shipper_id)
);

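INSERT INTO Sales.MyShipper(shipper_id, companyname, phone)
  VALUES(1, N'Shipper GVSUA', N'(503) 555-0137'),
        (2, N'Shipper ETYNR', N'(425) 555-0136'),
        (3, N'Shipper ZHISN', N'(415) 555-0138');

--Only two shippers are expected, but all three are returned. Sales.[Order] has no column named shipper_id (its column is shipperid), so the name resolves to shipper_id in the outer table and the subquery silently becomes correlated.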
SELECT shipper_id, companyname
FROM Sales.MyShipper
WHERE shipper_id IN
  (SELECT shipper_id
   FROM Sales.[Order]
   WHERE CustomerId = 43);

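--The lesson is to name attributes consistently throughout the database (shipper_id vs. shipperid) and to prefix column names in subqueries with the table alias. With the alias, O.shipper_id is flagged with a red underline in the editor, and the query fails with an invalid column name error instead of returning wrong results: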
/*SELECT shipper_id, companyname
FROM Sales.MyShipper
WHERE shipper_id IN
  (SELECT O.shipper_id
   FROM Sales.[Order] AS O
   WHERE O.CustomerId = 43);*/

--Prefixing every attribute with its alias (O....) exposes the error. The correct attribute name is used below:
SELECT shipper_id, companyname
FROM Sales.MyShipper
WHERE shipper_id IN
  (SELECT O.shipperid
   FROM Sales.[Order] AS O
   WHERE O.CustomerId = 43);

-- Cleanup
DROP TABLE IF EXISTS Sales.MyShipper;


shipper_id,companyname
1,Shipper GVSUA
2,Shipper ETYNR
3,Shipper ZHISN


shipper_id,companyname
2,Shipper ETYNR
3,Shipper ZHISN
