**Chapter 3: JOINS.**

**CROSS JOINS**

Proposition: I would like a Cartesian product of customers and employees.

There are 91 customers and 9 employees. There should be 91 \* 9 = 819 rows in the result.

Logically speaking, every other join type starts as a cross join (Cartesian product) and then filters the result with a predicate on the join columns. Physical query processing may take a different route.

In [None]:
USE Northwinds2022TSQLV7

SELECT C.CustomerId, E.EmployeeId
FROM Sales.Customer AS C
    CROSS JOIN HumanResources.Employee AS E;
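
The logical relationship between a cross join and the other joins can be checked with a small self-contained sketch. The table variables @Cust and @Ord and their contents are hypothetical, made up for illustration; they are not part of the Northwinds database. Under that assumption, a cross join filtered in WHERE and an inner join with the same predicate in ON should return the same rows.

```sql
-- Hypothetical sample data held in table variables (not Northwinds tables).
DECLARE @Cust TABLE (CustomerId INT NOT NULL PRIMARY KEY);
DECLARE @Ord TABLE (OrderId INT NOT NULL PRIMARY KEY, CustomerId INT NOT NULL);

INSERT INTO @Cust (CustomerId) VALUES (1), (2);
INSERT INTO @Ord (OrderId, CustomerId) VALUES (10, 1), (11, 1), (12, 2);

-- Cross join: 2 * 3 = 6 combinations, then filtered down to the 3 matching rows.
SELECT C.CustomerId, O.OrderId
FROM @Cust AS C
    CROSS JOIN @Ord AS O
WHERE C.CustomerId = O.CustomerId;

-- Inner join with the same predicate in ON: the same 3 rows.
SELECT C.CustomerId, O.OrderId
FROM @Cust AS C
    INNER JOIN @Ord AS O
        ON C.CustomerId = O.CustomerId;
```

Both queries return the pairs (1, 10), (1, 11), and (2, 12), possibly in a different order since neither has an ORDER BY.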

**SELF CROSS JOINS**

Proposition: Produce all possible combinations of pairs of employees. This may include an employee being paired with him/herself. 

There are 9 employees, so the result should have 9 \* 9 = 81 rows.

In [None]:
USE Northwinds2022TSQLV7
SELECT E1.EmployeeId, E2.EmployeeId
FROM HumanResources.Employee AS E1
    CROSS JOIN HumanResources.Employee AS E2;

**Producing Tables of Numbers** (application of self cross join)

Proposition: Write a query that produces a sequence of integers in the range 1 through 1000. 

A sequence of numbers is a powerful tool for many purposes. First, create a digits table with values from 0 to 9. Then apply the table whenever necessary.

In [None]:
USE Northwinds2022TSQLV7
DROP TABLE IF EXISTS dbo.Digit;
CREATE TABLE dbo.Digit (digit INT NOT NULL PRIMARY KEY);
INSERT INTO dbo.Digit (digit)
    VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

SELECT digit FROM dbo.Digit;

-- Each digit is weighted by a power of 10 (hundreds, tens, ones), giving 0 through 999.
-- Adding 1 shifts the range to 1 through 1000.
SELECT D3.digit * 100 + D2.digit * 10 + D1.digit + 1 AS n
FROM dbo.Digit AS D1
    CROSS JOIN dbo.Digit AS D2
    CROSS JOIN dbo.Digit AS D3
ORDER BY n;


**INNER JOINS or JOINS**

Proposition: Produce a list of orderID, employee first and last names, and employeeid. "Match each employee row with all order rows that have the same employeeID as in the employee row."

Since these attributes come from two different tables, a join is necessary. The Cartesian product is filtered by a match on EmployeeId.

In [None]:
USE Northwinds2022TSQLV7
SELECT E.EmployeeId, E.EmployeeFirstName, E.EmployeeLastName, O.OrderId
FROM HumanResources.Employee AS E
    INNER JOIN Sales.[Order] AS O
        ON E.EmployeeId = O.EmployeeId;

**COMPOSITE JOINS** (application of join)

When tables are related through a primary key-foreign key relationship that is based on more than one attribute, you need a composite join: the join condition matches multiple attributes from each side.

Proposition: Return the current value from the OrderDetails table together with the values before and after each change from the OrderDetailsAudit table.

Primary key of the OrderDetails table is composed of two attributes: productid, and orderid.

In [None]:
/*
    General form of a composite join:
    FROM dbo.Table1 AS T1
        INNER JOIN dbo.Table2 AS T2
            ON T1.col1 = T2.col1
            AND T1.col2 = T2.col2
*/

-- The table created below, Sales.OrderDetailsAudit, shares two columns with Sales.OrderDetail: orderid and productid. Joining the two tables therefore requires a composite join on both columns.
USE Northwinds2022TSQLV7
DROP TABLE IF EXISTS Sales.OrderDetailsAudit;

CREATE TABLE Sales.OrderDetailsAudit(
    lsn INT NOT NULL IDENTITY,
    orderid INT NOT NULL,
    productid INT NOT NULL,
    dt DATETIME NOT NULL,
    loginname sysname NOT NULL,
    columnname sysname NOT NULL,
    oldval SQL_VARIANT,
    newval SQL_VARIANT,
    CONSTRAINT PK_OrderDetailsAudit PRIMARY KEY (lsn),
    CONSTRAINT FK_OrderDetailsAudit_OrderDetails
        FOREIGN KEY(orderid, productid)
        REFERENCES Sales.OrderDetail (OrderId, ProductId)
);

SELECT OD.OrderId, OD.ProductId, OD.Quantity,
    ODA.dt, ODA.loginname, ODA.oldval, ODA.newval 
FROM Sales.OrderDetail AS OD
    INNER JOIN Sales.OrderDetailsAudit AS ODA
        ON OD.orderid = ODA.orderid
        AND OD.productid = ODA.productid
WHERE ODA.columnname = N'qty';

**NON-EQUI JOINS**

Equi join: a join whose condition uses only the equality operator (=).

Non-equi join: a join whose condition uses any operator other than equality (\>, \<, and so on).

Proposition: Join two instances of the Employees table to produce unique pairs of employees. This is an example of non-equi join and self join.

This result excludes self pairs (1 with 1) and mirrored pairs (1 with 2 and also 2 with 1). The purpose is to produce unique pairs.

In [None]:
USE Northwinds2022TSQLV7
SELECT
    E1.EmployeeId, E1.EmployeeFirstName, E1.EmployeeLastName,
    E2.EmployeeId, E2.EmployeeFirstName, E2.EmployeeLastName
FROM HumanResources.Employee AS E1
    INNER JOIN HumanResources.Employee AS E2
        ON E1.EmployeeId < E2.EmployeeId
ORDER BY E1.EmployeeId;
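
As a quick sanity check on the row count: with 9 employees, the predicate E1.EmployeeId < E2.EmployeeId should keep 9 \* 8 / 2 = 36 of the 81 combinations. The sketch below uses an inline list of the values 1 through 9 as a stand-in for the employee IDs. This is an assumption made for illustration; the real IDs need not be consecutive, although the count only depends on there being nine distinct values.

```sql
-- Stand-in for the nine employee IDs (hypothetical inline data).
WITH E AS
(
    SELECT EmployeeId
    FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS V(EmployeeId)
)
SELECT COUNT(*) AS numpairs   -- 36 unique pairs
FROM E AS E1
    INNER JOIN E AS E2
        ON E1.EmployeeId < E2.EmployeeId;
```

Changing the operator to <> would keep the mirrored pairs and return 72 rows, which is why the strict inequality is the one that yields unique pairs.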

**Multi-Join Queries**

When more than one table operator appears in the FROM clause, the table operators are logically processed from left to right. The result of the first table operator is treated as the left input to the second table operator. The first join operates on two base tables, but every subsequent join gets the result of the preceding join as its left input.

Proposition: Join the Customers and Orders tables to match customers with their orders, and then join this result with the OrderDetails table.

In [None]:
USE Northwinds2022TSQLV7
SELECT
    C.CustomerId, C.CustomerCompanyName,
    O.OrderId,
    OD.ProductId, OD.Quantity
FROM Sales.Customer AS C
    INNER JOIN Sales.[Order] AS O
        ON C.CustomerId = O.CustomerId
    INNER JOIN Sales.OrderDetail AS OD
        ON O.OrderId = OD.OrderId;

**OUTER JOIN: LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN**

The keyword is preserve. LEFT OUTER JOIN preserves and returns all data from the left table even when there is no match on the attribute. 

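First phase: cross join (Cartesian product of the two tables).

Second phase: filter by the ON predicate, which leaves the matching rows (the inner join).

Third phase: add back the rows from the left (preserved) table that found no match, with NULLs in the attributes of the right table. These are the outer rows.

Proposition: Return customers and their orders. Include the customers who did not place any orders.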

In [None]:
USE Northwinds2022TSQLV7
SELECT C.CustomerId,C.CustomerCompanyName,
    O.orderid
FROM Sales.Customer AS C
    LEFT OUTER JOIN Sales.[Order] as O 
        ON C.CustomerId = O.CustomerId
ORDER BY C.CustomerID;

--Notice that the output includes NULL under OrderId. 

--The following only identifies the customers with no orders:
SELECT C.CustomerId,C.CustomerCompanyName,
    O.orderid
FROM Sales.Customer AS C
    LEFT OUTER JOIN Sales.[Order] as O 
        ON C.CustomerId = O.CustomerId
WHERE O.OrderId IS NULL;

--To identify the outer rows, filter on an attribute from the nonpreserved side that is NULL only when the row is an outer row. Here, OrderId is NULL only for customers who did not place any orders.
--Three kinds of columns are safe to use: a primary key column, a join column, and a column defined as NOT NULL.
--Primary key column: a primary key cannot be NULL, so a NULL here can only mean that the row is an outer row.
--Join column: a row with a NULL in the join column is filtered out by the second phase of the join, so a NULL in such a column can only mean that the row is an outer row.
--NOT NULL column: a NULL in a column defined as NOT NULL can only mean that the row is an outer row.

**Using Outer joins to identify and include the missing values**

Proposition: Query all orders from the Orders table from Jan 1, 2014 to Dec 31, 2016. Include the dates with no orders with NULLs as place holders in the order attribute.

First, write a query that returns the sequence of all dates. Then perform a left outer join between the dates result and the Orders table.

In [None]:
--This query returns a sequence of all dates in the range Jan 1, 2014 through Dec 31 2016.
USE Northwinds2022TSQLV7
-- Starting from Jan 1, 2014, add n - 1 days to produce one date per value of n.
SELECT DATEADD(day, n - 1, CAST('20140101' AS DATE)) AS orderdate
FROM dbo.Nums
-- DATEDIFF returns the number of days between the two dates; + 1 makes the range inclusive of Dec 31, 2016.
WHERE n <= DATEDIFF(day, '20140101', '20161231') + 1
ORDER BY orderdate;
-- This works like a for loop in which n is the iteration counter.

-- Then perform a left outer join between the dates and the orders.
-- The same WHERE filter on n is needed here; without it, every row of dbo.Nums would produce a date.

SELECT DATEADD(day, Nums.n - 1, CAST('20140101' AS DATE)) AS orderdate,
    O.orderId, O.CustomerId, O.EmployeeId
FROM dbo.Nums
    LEFT OUTER JOIN Sales.[Order] AS O
        ON DATEADD(day, Nums.n - 1, CAST('20140101' AS DATE)) = O.orderdate
WHERE Nums.n <= DATEDIFF(day, '20140101', '20161231') + 1
ORDER BY orderdate;

**OUTER JOINS: Mistake to Avoid**

The WHERE clause of the following query filters out all outer rows, which nullifies the outer join: for an outer row, O.OrderDate is NULL, the predicate evaluates to UNKNOWN, and the row is discarded. In effect, the join becomes an inner join, so the programmer made a mistake either in the join type or in the predicate.

In [None]:
USE Northwinds2022TSQLV7
SELECT C.CustomerId, C.CustomerCompanyName, O.OrderId, O.OrderDate
FROM Sales.Customer AS C
    LEFT OUTER JOIN Sales.[Order] AS O
        ON C.CustomerId = O.CustomerId
WHERE O.OrderDate >= '20160101';
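
To see the effect on a small scale, here is a minimal sketch using hypothetical table variables (@Cust and @Ord and their rows are invented for illustration, not taken from Northwinds). It contrasts the predicate in WHERE with the same predicate in ON, assuming the intent was to keep every customer.

```sql
-- Hypothetical sample data held in table variables (not Northwinds tables).
DECLARE @Cust TABLE (CustomerId INT NOT NULL PRIMARY KEY);
DECLARE @Ord TABLE (OrderId INT NOT NULL PRIMARY KEY,
                    CustomerId INT NOT NULL,
                    OrderDate DATE NOT NULL);

INSERT INTO @Cust (CustomerId) VALUES (1), (2), (3);
INSERT INTO @Ord (OrderId, CustomerId, OrderDate)
    VALUES (10, 1, '20151201'), (11, 1, '20160115'), (12, 2, '20150301');

-- Predicate in WHERE: outer rows are discarded, only (1, 11) is returned.
SELECT C.CustomerId, O.OrderId, O.OrderDate
FROM @Cust AS C
    LEFT OUTER JOIN @Ord AS O
        ON C.CustomerId = O.CustomerId
WHERE O.OrderDate >= '20160101';

-- Predicate in ON: all three customers are preserved.
-- Customer 1 is matched with order 11; customers 2 and 3 get NULLs.
SELECT C.CustomerId, O.OrderId, O.OrderDate
FROM @Cust AS C
    LEFT OUTER JOIN @Ord AS O
        ON C.CustomerId = O.CustomerId
        AND O.OrderDate >= '20160101';
```

If the intent was instead to return only customers with recent orders, the WHERE version is fine, but then an inner join states that intent more honestly.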

**Using outer joins in a multi-join query**

FROM evaluates table operators from left to right. If the predicate in the inner join's ON clause compares an attribute from the nonpreserved side of the outer join with an attribute from the third table, all outer rows are discarded. Outer rows have NULLs in the attributes from the nonpreserved side of the join, and comparing a NULL with anything yields UNKNOWN.

Proposition: Return CustomerId, OrderId, ProductId, and Quantity. Be sure to include customers with no orders.

In [19]:
--The following query is wrong for the reason given above: the inner join discards the outer rows.
USE Northwinds2022TSQLV7
SELECT C.CustomerId, O.OrderId, OD.ProductId, OD.Quantity
FROM Sales.Customer AS C 
    LEFT OUTER JOIN Sales.[Order] AS O
        ON C.CustomerId = O.CustomerId
    INNER JOIN Sales.OrderDetail AS OD 
        ON O.OrderId = OD.OrderId;

--Generally, outer rows are dropped whenever any kind of outer join (left, right, or full) is followed by an inner join or right outer join whose ON predicate compares the NULLs from the outer rows with something from the right side.

--Workaround 1 (both left outer joins): This may not be what you want. It also preserves rows in Orders that have no matches in OrderDetails, which you may have wanted to discard.
SELECT C.CustomerId, O.OrderId, OD.ProductId, OD.Quantity
FROM Sales.Customer AS C 
    LEFT OUTER JOIN Sales.[Order] AS O
        ON C.CustomerId = O.CustomerId
    LEFT OUTER JOIN Sales.OrderDetail AS OD 
        ON O.OrderId = OD.OrderId;

--Workaround 2: Perform the inner join before the outer join. Inner join Orders with OrderDetails first, then right outer join the result with the Customers table.
SELECT C.CustomerId, O.OrderId, OD.ProductId, OD.Quantity
FROM Sales.[Order] AS O 
    INNER JOIN Sales.OrderDetail AS OD 
        ON O.OrderId = OD.OrderId
    RIGHT OUTER JOIN Sales.Customer AS C
        ON C.CustomerId = O.CustomerId;

--Workaround 3: Use parentheses. Similar to workaround 2; the parentheses ensure that the inner join happens before the outer join.
SELECT C.CustomerId, O.OrderId, OD.ProductId, OD.Quantity
FROM Sales.Customer AS C
    LEFT OUTER JOIN 
        (Sales.[Order] AS O
            INNER JOIN Sales.OrderDetail AS OD 
                ON O.OrderID = OD.OrderId)
    ON O.CustomerId = C.CustomerId;
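
The difference between these variants is easier to see on a tiny data set. The sketch below uses hypothetical table variables (@Cust, @Ord, @OrdDetail and their rows are invented for illustration): customer 1 has an order with one detail line, customer 2 has an order with no detail lines, and customer 3 has no orders.

```sql
-- Hypothetical sample data held in table variables (not Northwinds tables).
DECLARE @Cust TABLE (CustomerId INT NOT NULL PRIMARY KEY);
DECLARE @Ord TABLE (OrderId INT NOT NULL PRIMARY KEY, CustomerId INT NOT NULL);
DECLARE @OrdDetail TABLE (OrderId INT NOT NULL, ProductId INT NOT NULL,
                          Quantity INT NOT NULL,
                          PRIMARY KEY (OrderId, ProductId));

INSERT INTO @Cust (CustomerId) VALUES (1), (2), (3);
INSERT INTO @Ord (OrderId, CustomerId) VALUES (10, 1), (11, 2);
INSERT INTO @OrdDetail (OrderId, ProductId, Quantity) VALUES (10, 100, 5);

-- Outer join followed by inner join: 1 row, customers 2 and 3 are lost.
SELECT C.CustomerId, O.OrderId, OD.ProductId, OD.Quantity
FROM @Cust AS C
    LEFT OUTER JOIN @Ord AS O
        ON C.CustomerId = O.CustomerId
    INNER JOIN @OrdDetail AS OD
        ON O.OrderId = OD.OrderId;

-- Two left outer joins: 3 rows, and order 11 (no detail lines) is shown.
SELECT C.CustomerId, O.OrderId, OD.ProductId, OD.Quantity
FROM @Cust AS C
    LEFT OUTER JOIN @Ord AS O
        ON C.CustomerId = O.CustomerId
    LEFT OUTER JOIN @OrdDetail AS OD
        ON O.OrderId = OD.OrderId;

-- Parenthesized inner join first: 3 rows, but order 11 is discarded,
-- so customer 2 appears with NULLs just like customer 3.
SELECT C.CustomerId, O.OrderId, OD.ProductId, OD.Quantity
FROM @Cust AS C
    LEFT OUTER JOIN
        (@Ord AS O
            INNER JOIN @OrdDetail AS OD
                ON O.OrderId = OD.OrderId)
    ON C.CustomerId = O.CustomerId;
```

Which variant is right depends on whether orders without detail lines should be visible in the result; the join order only decides which of those two behaviors you get.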


**Using the COUNT aggregate with outer joins**

The COUNT(\*) aggregate counts rows regardless of their contents, so it also counts outer rows. When outer rows should not be counted, use COUNT(column) with a column from the nonpreserved side of the join, because COUNT(column) ignores NULLs.

Proposition: Return the count of orders for each customer.

In [None]:
--The following query is supposed to return the count of orders for each customer. However, customers 22 and 57, who placed no orders, are each reported as having one order, because COUNT(*) also counts their outer rows.
USE Northwinds2022TSQLV7
SELECT C.CustomerId, COUNT(*) AS numorders
FROM Sales.Customer AS C 
    LEFT OUTER JOIN Sales.[Order] AS O
        ON C.CustomerId = O.CustomerId
GROUP BY C.CustomerId;

--Rewrite this query like this:
SELECT C.CustomerId, COUNT(O.OrderId) AS numorders
FROM Sales.Customer AS C 
    LEFT OUTER JOIN Sales.[Order] AS O
        ON C.CustomerId = O.CustomerId
GROUP BY C.CustomerId;
    

Exercise 1

Write a query that generates five copies of each employee row.

In [None]:
USE TSQLV4

SELECT E.empid, E.firstname, E.lastname, N.n 
FROM HR.Employees E
    CROSS JOIN dbo.Nums N
WHERE n <=5
ORDER BY n, empid;

Exercise 1-2

Proposition: Write a query that returns a row for each employee and day in the range June 12, 2016 through June 16, 2016.

In [None]:
USE TSQLV4
SELECT E.empid, DATEADD(day, D.n - 1, CAST('20160612' AS DATE)) AS dt
FROM HR.Employees AS E
    CROSS JOIN dbo.Nums AS D
WHERE D.n <= DATEDIFF(day, '20160612', '20160616') + 1 -- + 1 includes 20160616 in the results.
ORDER BY empid, dt;

Exercise 2 

Proposition: Explain what's wrong in the following query and provide a correct alternative.

In [None]:
USE TSQLV4
/*
SELECT Customers.CustomerId, Customers.companyname, Orders.orderid, Orders.orderdate
FROM Sales.Customers AS C 
    INNER JOIN Sales.Orders AS O
        ON Customers.custid = Orders.customerid;*/

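-- The tables were aliased as C and O, so the original names Customers and Orders can no longer be used as column prefixes; the aliases must be used consistently.
-- The join column is also named custid in both tables, as the corrected query shows.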
SELECT C.Custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C 
    INNER JOIN Sales.Orders AS O
        ON C.custid = O.custid;

Exercise 3

Proposition: Return US customers, and for each customer return the total number of orders and total quantities.

In [None]:
USE TSQLV4

SELECT TOP(5) * FROM sales.OrderDetails;
SELECT TOP(5) * FROM sales.orders;

SELECT C.custid, COUNT (DISTINCT O.orderid) AS numorders, SUM(OD.qty) AS totalqty
FROM Sales.Customers AS C 
    INNER JOIN Sales.Orders AS O
        ON C.custid = O.custid
    INNER JOIN Sales.OrderDetails AS OD 
        ON O.orderid = OD.orderid
WHERE C.country = N'USA'
GROUP BY C.custid
ORDER BY C.custid;

-- DISTINCT is needed because of the join between Orders and OrderDetails: OrderDetails has multiple lines per order, so each order appears once per order line.

Exercise 4

Proposition: Return customers and their orders, including customers who placed no orders.

In [None]:
USE TSQLV4
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C 
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid;


Exercise 5

Proposition: Return customers who placed no orders

In [None]:
USE TSQLV4
SELECT C.custid, C.companyname, O.orderid
FROM Sales.Customers AS C 
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid 
WHERE O.orderid IS NULL;

Exercise 6

Proposition: Return customers with orders placed on Feb 12, 2016 along with their orders.

In [None]:
USE TSQLV4
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C 
    INNER JOIN Sales.Orders AS O
        ON C.custid = O.custid 
WHERE O.orderdate = '20160212';

Exercise 7 & 8

Proposition: Write a query that returns all customers in the output, but matches them with their respective orders only if they were placed on Feb 12, 2016:

In [None]:
--WHERE is a final filter: a row that fails it is removed from the result. The predicate on orderdate is a nonfinal matching condition, so it belongs in the ON clause, not in the WHERE clause.
--The following query is incorrect: customers who placed orders, but none on Feb 12, 2016, are filtered out entirely.
USE TSQLV4
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C 
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid 
WHERE O.orderdate = '20160212'
    OR O.orderid IS NULL;

--this query is correct.
USE TSQLV4
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C 
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid 
        AND O.orderdate = '20160212';

Exercise 9

Proposition: Return all customers, and for each return a Yes/No value depending on whether the customer placed orders on Feb 12, 2016

In [None]:
USE TSQLV4
-- An equivalent test is CASE WHEN O.orderid IS NOT NULL THEN 'Yes' ELSE 'No' END.
-- A customer with several orders on that date appears once per order; SELECT DISTINCT would collapse these to one row.
SELECT C.custid, C.companyname, O.orderdate,
    CASE O.orderdate
        WHEN '20160212' THEN 'Yes'
        ELSE 'No'
    END AS HasOrderOn20160212
FROM Sales.Customers AS C 
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid 
        AND O.orderdate = '20160212';