# HW2. SQL and Relational Algebra

## Objectives

In this assignment, you will write more complex SQL queries. You will practice the following:
 - How to use **Set Operators** to union/intersect multiple tables
 - How to use **Join Operator** to join multiple tables
 - How to use **Aggregations** and **Group By** to aggregate data
 - How to write **Subqueries** in SQL 
 - How to use **Relational Algebra** to describe the SQL queries you have previously written
 

### Background

We will use the same database [bank.db](bank.db) that we used in Homework 1. The database has six tables. The following shows their schemas. Primary key attributes are underlined and foreign keys are noted in superscript.
 - Customer = {<span style="text-decoration:underline">customerID</span>, firstName, lastName, income, birthDate}
 - Account = {<span style="text-decoration:underline">accNumber</span>, type, balance, branchNumber<sup>FK-Branch</sup>}
 - Owns = {<span style="text-decoration:underline">customerID</span><sup>FK-Customer</sup>, <span style="text-decoration:underline">accNumber</span><sup>FK-Account</sup>}
 - Transactions = {<span style="text-decoration:underline">transNumber</span>, <span style="text-decoration:underline">accNumber</span><sup>FK-Account</sup>, amount}
 - Employee = {<span style="text-decoration:underline">sin</span>, firstName, lastName, salary, branchNumber<sup>FK-Branch</sup>}
 - Branch = {<span style="text-decoration:underline">branchNumber</span>, branchName, managerSIN<sup>FK-Employee</sup>, budget}

### Notes
 - The *customerID* attribute (*Customer*) is a unique number that represents a customer; it is *not* a customer's SIN
 - The *accNumber* attribute (*Account*) represents the account number
 - The *balance* attribute (*Account*) represents the total amount in an account
 - The *type* attribute (*Account*) represents the type of an account: chequing (type CHQ), saving (type SAV), or business (type BUS)
 - The *Owns* relation represents a many-to-many relationship (between *Customer* and *Account*)
 - The *transNumber* attribute (*Transactions*) represents a transaction number; combined with the account number, it uniquely identifies a transaction
 - The *branchNumber* attribute (*Branch*) uniquely identifies a branch
 - The *managerSIN* attribute (*Branch*) represents the SIN of the branch manager
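
The composite key of *Transactions* can be illustrated with a minimal sketch on an in-memory toy database. The column types, the CHECK constraint and the sample rows are assumptions made for illustration; the real bank.db may declare its tables differently, and the foreign key to *Branch* is left out to keep the sketch short.

```python
import sqlite3

# Toy, in-memory version of two of the relations above (not bank.db itself).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Account (
    accNumber    INTEGER PRIMARY KEY,
    type         TEXT CHECK (type IN ('CHQ', 'SAV', 'BUS')),
    balance      REAL,
    branchNumber INTEGER
);
CREATE TABLE Transactions (
    transNumber INTEGER,
    accNumber   INTEGER REFERENCES Account(accNumber),
    amount      REAL,
    PRIMARY KEY (transNumber, accNumber)  -- composite key
);
INSERT INTO Account VALUES (1, 'CHQ', 100.0, 1), (2, 'SAV', 50.0, 1);
-- transNumber 1 may repeat across different accounts ...
INSERT INTO Transactions VALUES (1, 1, 100.0), (1, 2, 50.0);
""")

# ... but the same (transNumber, accNumber) pair cannot appear twice.
try:
    conn.execute("INSERT INTO Transactions VALUES (1, 1, 25.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

This is why queries over *Transactions* join on *accNumber* and never on *transNumber* alone.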
 


### Important Note


Please note that you may use only what you have learned in class. For example, you can use INNER JOIN and LEFT OUTER JOIN, but you should not use CROSS JOIN or NATURAL JOIN because we have not discussed them in class yet. You can also use subqueries in the FROM or WHERE clauses, but you cannot use WITH. You will be penalized **(by -2 points)** for each violation.
 

## Part 1 (25 points): SQL Questions 

Write SQL queries to return the data specified in questions 1 to 19.

**Query Requirement**
 - The answer to each question should be a single SQL query
 - You must order each result as described in the question; order is always ascending unless specified otherwise
 - Every column in the result should have a descriptive name, so include the required AS clause to name each column
 - While your queries will not be assessed on their efficiency, marks may be deducted if unnecessary tables are included in the query (for example, including both Owns and Customer when you only require the customerID of customers who own accounts)

 

**Execute the next two cells**

In [2]:
%load_ext sql

In [3]:
%sql sqlite:///bank.db

'Connected: @bank.db'

**Queries**

**1.** (1 point) *First name, last name, income* of customers whose income is within [60,000, 70,000], order by *income* (desc), *lastName*, *firstName*.

In [161]:
%%sql
SELECT firstName "First name", lastName "Last name", income Income
FROM Customer
WHERE income >= 60000 AND income <= 70000
ORDER BY income DESC, lastName, firstName

 * sqlite:///bank.db
Done.


First name,Last name,Income
Steven,Johnson,69842
Bonnie,Johnson,69198
Larry,Murphy,69037
Evelyn,Scott,68832
Jeffrey,Griffin,68812
Randy,Mitchell,67895
Anna,Cooper,67275
Kimberly,Powell,65555
Mildred,Reed,64499
Helen,Sanchez,63333


**2.** (1 point) *SIN, branch name, salary and manager’s salary - salary* (that is, the salary of the employee’s manager minus salary of the employee) of all employees in New York, London or Berlin, order by ascending (manager salary - salary).

In [183]:
%%sql
SELECT E.sin SIN, B.branchName "Branch name", E.salary Salary,
       (SELECT M.salary FROM Employee M WHERE M.sin = B.managerSIN) - E.salary "Manager's salary - salary"
FROM Employee AS E
JOIN Branch AS B ON E.branchNumber = B.branchNumber
WHERE B.branchName IN ('New York', 'London', 'Berlin')
ORDER BY "Manager's salary - salary" ASC

 * sqlite:///bank.db
Done.


SIN,Branch name,Salary,Manager's salary - salary
23528,New York,94974,-4491
11285,New York,93779,-3296
31964,New York,90483,0
55700,London,99289,0
99537,Berlin,90211,0
38351,New York,86093,4390
97216,London,89746,9543
40900,New York,77533,12950
58707,London,85934,13355
57796,New York,75896,14587


**3.** (1 point) *First name, last name, and income* of customers whose income is at least twice the income of any customer whose lastName is Butler, order by last name then first name. 


In [163]:
%%sql
SELECT firstName "First name", lastName "Last name", income Income
FROM Customer
WHERE income >= 2 * (SELECT MIN(income) FROM Customer WHERE lastName = 'Butler')
ORDER BY lastName, firstName

 * sqlite:///bank.db
Done.


First name,Last name,Income
Ernest,Adams,75896
Stephanie,Adams,46486
William,Adams,77570
Carol,Alexander,56145
Jack,Anderson,35755
Anthony,Bailey,72328
Henry,Barnes,50640
Laura,Barnes,41159
Ruby,Barnes,84562
Louis,Bell,50159


**4.** (1 point) *Customer ID, income, account numbers and branch numbers* of customers with income greater than 90,000 who own an account at both London and Latveria branches, order by customer ID then account number. The result should contain all the account numbers of customers who meet the criteria, even if the account itself is not held at London or Latveria.

In [157]:
%%sql
SELECT C.customerID, C.income, A.accNumber, A.branchNumber
FROM Customer C
JOIN Owns O ON C.customerID = O.customerID
JOIN Owns O1 ON C.customerID = O1.customerID
JOIN Owns O2 ON C.customerID = O2.customerID
JOIN Account A ON O.accNumber = A.accNumber
JOIN Account A1 ON O1.accNumber = A1.accNumber
JOIN Account A2 ON O2.accNumber = A2.accNumber
JOIN Branch B1 ON A1.branchNumber = B1.branchNumber
JOIN Branch B2 ON A2.branchNumber = B2.branchNumber
WHERE C.income > 90000 AND B1.branchName = "London" AND B2.branchName = "Latveria"
ORDER BY C.customerID, A.accNumber

 * sqlite:///bank.db
Done.


customerID,income,accNumber,branchNumber
27954,94777,10,1
27954,94777,68,3
27954,94777,239,2
51850,97412,35,1
51850,97412,35,1
51850,97412,129,1
51850,97412,129,1
51850,97412,161,3
51850,97412,161,3
51850,97412,182,2
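
The repeated rows for customer 51850 above are what a join-based branch filter produces: each account row is repeated once per (London account, Latveria account) pair the customer owns. A minimal sketch on made-up rows (not bank.db) compares that shape with a membership test through IN subqueries in the WHERE clause. The helper `owners_at` and all sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch  (branchNumber INTEGER PRIMARY KEY, branchName TEXT);
CREATE TABLE Account (accNumber INTEGER PRIMARY KEY, branchNumber INTEGER);
CREATE TABLE Owns    (customerID INTEGER, accNumber INTEGER);
INSERT INTO Branch  VALUES (1, 'London'), (2, 'Latveria'), (3, 'Berlin');
INSERT INTO Account VALUES (10, 1), (11, 1), (12, 2), (13, 3), (20, 1);
-- customer 1: two London accounts, one Latveria, one Berlin; customer 2: London only
INSERT INTO Owns VALUES (1, 10), (1, 11), (1, 12), (1, 13), (2, 20);
""")

# Join-based filter: every account row repeats once per (London, Latveria) pair.
join_rows = conn.execute("""
SELECT O.customerID, O.accNumber
FROM Owns O
JOIN Owns O1 ON O.customerID = O1.customerID
JOIN Account A1 ON O1.accNumber = A1.accNumber
JOIN Branch B1 ON A1.branchNumber = B1.branchNumber
JOIN Owns O2 ON O.customerID = O2.customerID
JOIN Account A2 ON O2.accNumber = A2.accNumber
JOIN Branch B2 ON A2.branchNumber = B2.branchNumber
WHERE B1.branchName = 'London' AND B2.branchName = 'Latveria'
""").fetchall()

def owners_at(branch_name):
    # Hypothetical helper: subquery text for customers owning an account at a branch.
    # The name is a fixed literal here, so plain string formatting is acceptable.
    return f"""SELECT OX.customerID FROM Owns OX
               JOIN Account AX ON OX.accNumber = AX.accNumber
               JOIN Branch BX ON AX.branchNumber = BX.branchNumber
               WHERE BX.branchName = '{branch_name}'"""

# Membership-based filter: one row per owned account, no repetition.
in_rows = conn.execute(f"""
SELECT O.customerID, O.accNumber
FROM Owns O
WHERE O.customerID IN ({owners_at('London')})
  AND O.customerID IN ({owners_at('Latveria')})
ORDER BY O.customerID, O.accNumber
""").fetchall()

print(len(join_rows), in_rows)
```

On this toy data the join form returns eight rows for four accounts, while the IN form returns each account once.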


**5.** (1 point) *Customer ID, types, account numbers and balances* of business (type *BUS*) and savings (type *SAV*) accounts owned by customers who own at least one business account or at least one savings account, order by customer ID, then type, then account number.

In [176]:
%%sql
SELECT O.customerID "Customer ID", A.type Types, O.accNumber "Account numbers", balance Balance
FROM Owns O
JOIN Account A ON O.accNumber = A.accNumber
WHERE A.type IN ('BUS', 'SAV')
ORDER BY customerID, Types, A.accNumber;


 * sqlite:///bank.db
Done.


Customer ID,Types,Account numbers,Balance
11790,BUS,150,77477.04
11790,SAV,1,118231.13
11799,BUS,174,23535.33
13230,SAV,137,76535.96
13697,SAV,251,33140.3
13874,SAV,82,29525.31
14295,BUS,106,102297.76
14295,BUS,273,65213.27
14295,SAV,245,95413.18
16837,BUS,197,19495.5


**6.** (1 point) *Branch name, account number and balance* of accounts with balances greater than $110,000 held at the branch managed by Phillip Edwards, order by account number.

In [177]:
%%sql
SELECT B.branchName "Branch name", A.accNumber "Account number", A.balance Balance
FROM Branch AS B
JOIN Account AS A ON A.branchNumber = B.branchNumber
JOIN Employee AS E ON B.managerSIN = E.sin
WHERE A.balance > 110000 AND E.firstName = 'Phillip' AND E.lastName = 'Edwards'
ORDER BY A.accNumber

 * sqlite:///bank.db
Done.


Branch name,Account number,Balance
London,1,118231.13
London,8,121267.54
London,9,132271.23
London,13,112505.84
London,26,112046.36
London,28,112617.97
London,31,111209.89
London,119,113473.16


**7.** (1 point) Customer ID of customers who have an account at the *New York* branch, who do not own an account at the London branch and who do not co-own an account with another customer who owns an account at the *London* branch, order by customer ID. The result should not contain duplicate customer IDs.

In [199]:
%%sql
SELECT DISTINCT O.customerID "Customer ID"
FROM Owns O
JOIN Owns O1 ON O.customerID = O1.customerID
JOIN Account A ON O.accNumber = A.accNumber
JOIN Account A1 ON O1.accNumber = A1.accNumber
JOIN Branch B ON A1.branchNumber = B.branchNumber
JOIN Branch B1 ON A1.branchNumber = B1.branchNumber
JOIN Owns O2 ON O1.accNumber = O2.accNumber AND NOT O1.customerID = O2.customerID
JOIN Owns O3 ON O2.customerID = O2.customerID
JOIN Account A2 ON O3.accNumber = A2.accNumber
JOIN Branch B2 ON A2.branchNumber = B2.branchNumber
WHERE B.branchName = "New York" AND NOT B1.branchName = "London" AND NOT B2.branchName = "London"
ORDER BY O.customerID

 * sqlite:///bank.db
Done.


Customer ID
29474
30622
30807
38351
44922
47953
52622
59366
61379
61976
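
The two "do not" conditions in question 7 are exclusions over a whole customer, which a row-level test such as `NOT branchName = 'London'` cannot express on its own. A minimal sketch on made-up rows (not bank.db) phrases them as NOT IN subqueries; the constant `LONDON_OWNERS` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch  (branchNumber INTEGER PRIMARY KEY, branchName TEXT);
CREATE TABLE Account (accNumber INTEGER PRIMARY KEY, branchNumber INTEGER);
CREATE TABLE Owns    (customerID INTEGER, accNumber INTEGER);
INSERT INTO Branch  VALUES (1, 'London'), (2, 'New York');
INSERT INTO Account VALUES (10, 2), (11, 1), (12, 2), (13, 2), (14, 1);
-- 1 and 5 co-own New York account 10; 2 owns New York 12 and London 11;
-- 3 and 4 co-own New York account 13; 4 also owns London 14
INSERT INTO Owns VALUES (1, 10), (5, 10), (2, 12), (2, 11), (3, 13), (4, 13), (4, 14);
""")

# Subquery text: customers who own at least one London account.
LONDON_OWNERS = """SELECT OL.customerID FROM Owns OL
                   JOIN Account AL ON OL.accNumber = AL.accNumber
                   JOIN Branch BL ON AL.branchNumber = BL.branchNumber
                   WHERE BL.branchName = 'London'"""

rows = conn.execute(f"""
SELECT DISTINCT O.customerID
FROM Owns O
JOIN Account A ON O.accNumber = A.accNumber
JOIN Branch B ON A.branchNumber = B.branchNumber
WHERE B.branchName = 'New York'
  AND O.customerID NOT IN ({LONDON_OWNERS})          -- owns no London account
  AND O.customerID NOT IN (                          -- co-owns with no London owner
        SELECT O1.customerID
        FROM Owns O1
        JOIN Owns O2 ON O1.accNumber = O2.accNumber
                    AND O1.customerID <> O2.customerID
        WHERE O2.customerID IN ({LONDON_OWNERS}))
ORDER BY O.customerID
""").fetchall()

print(rows)  # [(1,), (5,)]
```

Customer 2 is excluded for owning a London account, customer 3 for co-owning with customer 4, and customer 4 for both reasons.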


**8.** (1 point) *SIN, first name, last name, and salary* of employees who earn more than $70,000, if they are managers show the branch name of their branch in a fifth column (which should be NULL/NONE for most employees), order by branch name. You must use an outer join in your solution (which is the easiest way to do it).

In [178]:
%%sql
SELECT sin SIN, firstName "First name", lastName "Last name", salary Salary, branchName "Branch name"
FROM Employee E
LEFT OUTER JOIN Branch B ON E.sin = B.managerSIN
WHERE salary > 70000
ORDER BY branchName

 * sqlite:///bank.db
Done.


SIN,First name,Last name,Salary,Branch name
11285,Rebecca,Simmons,93779,
23528,Lisa,Russell,94974,
28453,Margaret,White,75146,
30513,Timothy,Perez,78839,
33743,Jacqueline,Scott,70396,
38351,Victor,Perez,86093,
40900,Chris,Garcia,77533,
57796,Ernest,Adams,75896,
58707,Clarence,Watson,85934,
63772,Mary,Powell,74194,


**9.** (1 point) Exactly as question eight, except that your query cannot include any join operation.

In [180]:
%%sql
SELECT sin SIN, firstName "First name", lastName "Last name", salary Salary, NULL "Branch name"
FROM Employee E
WHERE salary > 70000
UNION
SELECT sin SIN, firstName "First name", lastName "Last name", salary Salary, branchName "Branch name"
FROM Employee E, Branch B
WHERE E.sin = B.managerSIN AND salary > 70000
EXCEPT
SELECT sin SIN, firstName "First name", lastName "Last name", salary Salary, NULL "Branch name"
FROM Employee E, Branch B
WHERE E.sin = B.managerSIN AND salary > 70000
ORDER BY branchName

 * sqlite:///bank.db
Done.


SIN,First name,Last name,Salary,Branch name
11285,Rebecca,Simmons,93779,
23528,Lisa,Russell,94974,
28453,Margaret,White,75146,
30513,Timothy,Perez,78839,
33743,Jacqueline,Scott,70396,
38351,Victor,Perez,86093,
40900,Chris,Garcia,77533,
57796,Ernest,Adams,75896,
58707,Clarence,Watson,85934,
63772,Mary,Powell,74194,


**10.**  (1 point) *SIN, first name, last name and salary* of the lowest paid employee (or employees) of the *London* branch, order by sin.

In [181]:
%%sql
SELECT E.sin SIN, E.firstName "First name", E.lastName "Last name", E.salary "Lowest salary"
FROM Employee E
JOIN Branch B ON E.branchNumber = B.branchNumber
WHERE B.branchName = 'London'
  AND E.salary = (SELECT MIN(E2.salary)
                  FROM Employee E2
                  WHERE E2.branchNumber = B.branchNumber)
ORDER BY E.sin;

 * sqlite:///bank.db
Done.


SIN,First name,Last name,Lowest salary
24469,Frank,Rodriguez,13950
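
Question 10 asks for the lowest paid employee or employees, so ties matter. A minimal sketch on made-up rows (not bank.db, and with a reduced *Employee* table) compares each salary with the branch minimum through a scalar subquery, which returns every tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch   (branchNumber INTEGER PRIMARY KEY, branchName TEXT);
CREATE TABLE Employee (sin INTEGER PRIMARY KEY, salary INTEGER, branchNumber INTEGER);
INSERT INTO Branch   VALUES (1, 'London'), (2, 'Berlin');
-- two London employees share the lowest London salary
INSERT INTO Employee VALUES (100, 20000, 1), (200, 20000, 1), (300, 50000, 1), (400, 10000, 2);
""")

rows = conn.execute("""
SELECT E.sin, E.salary
FROM Employee E
JOIN Branch B ON E.branchNumber = B.branchNumber
WHERE B.branchName = 'London'
  AND E.salary = (SELECT MIN(E2.salary) FROM Employee E2
                  WHERE E2.branchNumber = B.branchNumber)
ORDER BY E.sin
""").fetchall()

print(rows)  # [(100, 20000), (200, 20000)]
```

The lower Berlin salary is ignored because the subquery is correlated on the branch number.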


**11.**  (1 point) *Branch name, and the difference of maximum and minimum (salary gap) and average salary* of the employees at each branch, order by branch name.

In [186]:
%%sql
SELECT branchName "Branch name", MAX(salary) - MIN(salary) "Salary gap", AVG(salary) "Average salary"
FROM Branch
JOIN Employee ON Employee.branchNumber = Branch.branchNumber
GROUP BY branchName
ORDER BY branchName;

 * sqlite:///bank.db
Done.


Branch name,Salary gap,Average salary
Berlin,86862,34714.8125
Latveria,89282,56143.46153846154
London,85339,50813.80952380953
Moscow,58759,49065.71428571428
New York,84021,48649.90476190476


**12.** (1 point) *Count* of the number of employees working at the *New York* branch and *Count* of the number of different last names of employees working at the *New York* branch (two numbers in a single row).

In [187]:
%%sql
SELECT COUNT(*) "Number of employees", COUNT(DISTINCT lastName) "Number of different last names" 
FROM Employee
JOIN Branch ON Employee.branchNumber = Branch.branchNumber
WHERE branchName = 'New York';

 * sqlite:///bank.db
Done.


Number of employees,Number of different last names
21,20


**13.** (1 point) *Sum* of the employee salaries (a single number) at the *New York* branch.

In [188]:
%%sql
SELECT SUM(salary) "Sum of salaries"
FROM Employee
JOIN Branch ON Employee.branchNumber = Branch.branchNumber
WHERE branchName = 'New York'

 * sqlite:///bank.db
Done.


Sum of salaries
1021648


**14.** (1 point) *Customer ID, first name and last name* of customers who own accounts at four different branches, order by last name then first name.

In [108]:
%%sql
SELECT C.customerID "Customer ID", C.firstName "First name", C.lastName "Last name"
FROM Customer C
JOIN Owns O ON C.customerID = O.customerID
JOIN Account A ON O.accNumber = A.accNumber
GROUP BY C.customerID
HAVING COUNT(DISTINCT A.branchNumber) = 4
ORDER BY lastName, firstName

 * sqlite:///bank.db
Done.


Customer ID,First name,Last name
44922,Dennis,Flores
73386,Arthur,Jones
62312,Phyllis,Lopez
90667,Carl,Murphy
92389,Amy,Ross
65441,Arthur,Thompson


**15.** (2 points) *Average income* of customers older than 60 on June 12, 2023 and average income of customers younger than 20 on June 12, 2023. The result must have two named columns, with one row, in one result set (hint: look up [SQLite time and date functions](https://www.sqlite.org/lang_datefunc.html)).

In [121]:
%%sql
SELECT AVG(income) "Average income > 60yrs",
(SELECT AVG(income) FROM Customer WHERE date('2023-06-12', '-20 years') < birthDate) AS "Average income < 20yrs"
FROM Customer
WHERE date('2023-06-12', '-60 years') > birthDate

 * sqlite:///bank.db
Done.


Average income > 60yrs,Average income < 20yrs
52583.0,56570.0
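
The age tests in question 15 reduce to comparing *birthDate* with a cutoff date. A minimal sketch of how SQLite's `date()` modifiers produce those cutoffs, assuming *birthDate* is stored as ISO-8601 text (`YYYY-MM-DD`), so that plain string comparison matches chronological order. The sample birth dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cutoff birth dates for "older than 60" and "younger than 20" on 2023-06-12.
older_cutoff, younger_cutoff = conn.execute(
    "SELECT date('2023-06-12', '-60 years'), date('2023-06-12', '-20 years')"
).fetchone()
print(older_cutoff, younger_cutoff)  # 1963-06-12 2003-06-12

# Assumption: birthDate is ISO-8601 text, which sorts chronologically as a string.
flags = conn.execute(
    "SELECT '1950-01-01' < ?, '2010-05-05' > ?", (older_cutoff, younger_cutoff)
).fetchone()
print(flags)  # (1, 1)
```

A customer born before the first cutoff is older than 60 on that day, and one born after the second cutoff is younger than 20.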


**16.** (2 points) *Customer ID, last name, first name, income, and average account balance* of customers who have at least three accounts, and whose last names begin with *S* and contain an *e* (e.g. **S**t**e**ve) **or** whose first names begin with *A* and have the letter *n* just before the last 2 letters (e.g. **An**ne), order by customer ID. Note that to appear in the result customers must have at least three accounts and satisfy one (or both) of the name conditions.

In [134]:
%%sql
SELECT C1.customerID "Customer ID", C1.lastName "Last name", C1.firstName "First name", C1.income "Income", C2.avgBalance "Average account balance"
FROM Customer C1 JOIN 
(SELECT O.customerID, AVG(balance) avgBalance
 FROM Account A 
 JOIN Owns O ON A.accNumber = O.accNumber 
 GROUP BY O.customerID 
 HAVING COUNT(*) >= 3) AS C2 ON C1.customerID = C2.customerID
WHERE C1.lastName LIKE 'S%e%' OR C1.firstName LIKE 'A%n__'
ORDER BY C1.customerID

 * sqlite:///bank.db
Done.


Customer ID,Last name,First name,Income,Average account balance
14295,Ramirez,Anne,44495,87641.40333333334
29474,White,Amanda,59360,68591.57333333335
52189,Sanders,Shawn,13615,68936.21166666666
79601,Sanders,Joe,95144,58843.438
81263,Cooper,Anna,67275,68895.66333333333
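
The two LIKE patterns in question 16 can be checked in isolation. In a pattern, `%` matches any run of characters (including none) and `_` matches exactly one character. The sketch below uses a hypothetical helper `matches` and made-up names; it also shows that SQLite's LIKE is case-insensitive for ASCII letters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def matches(name, pattern):
    # Hypothetical helper: evaluate "name LIKE pattern" inside SQLite.
    return bool(conn.execute("SELECT ? LIKE ?", (name, pattern)).fetchone()[0])

# Last name begins with S and contains an e somewhere after it.
print(matches("Sanders", "S%e%"))  # True
print(matches("Smith", "S%e%"))    # False

# First name begins with A and has an n just before the last two letters.
print(matches("Anne", "A%n__"))    # True
print(matches("Amanda", "A%n__"))  # True
print(matches("Andrew", "A%n__"))  # False

# Default LIKE ignores ASCII case.
print(matches("sanders", "S%e%"))  # True
```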


**17.** (2 points) *Account number, balance, sum of transaction amounts, and balance - transaction sum* for accounts in the *London* branch that have at least 15 transactions, order by transaction sum.

In [191]:
%%sql
SELECT A.accNumber "Account number", A.balance, SUM(T.amount) "Sum of transaction amounts", A.balance - SUM(T.amount) "Balance - transaction sum"
FROM Account A
JOIN Transactions T ON A.accNumber = T.accNumber
JOIN Branch B ON A.branchNumber = B.branchNumber
WHERE branchName = 'London'
GROUP BY A.accNumber
HAVING COUNT(*) >= 15
ORDER BY "Sum of transaction amounts";

 * sqlite:///bank.db
Done.


Account number,balance,Sum of transaction amounts,Balance - transaction sum
113,82792.58,82792.58,0.0
9,132271.23,132271.22999999998,2.9103830456733704e-11
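
The value 2.9103830456733704e-11 in the last column is binary floating-point residue from summing REAL amounts, not a real difference between the balance and the transactions. A minimal sketch on made-up rows (not bank.db) of rounding the difference to cents with SQLite's `ROUND`; whether rounding is wanted in the graded answer is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account      (accNumber INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE Transactions (transNumber INTEGER, accNumber INTEGER, amount REAL);
INSERT INTO Account      VALUES (1, 0.3);
INSERT INTO Transactions VALUES (1, 1, 0.1), (2, 1, 0.2);
""")

raw, rounded = conn.execute("""
SELECT A.balance - SUM(T.amount),
       ROUND(A.balance - SUM(T.amount), 2)
FROM Account A
JOIN Transactions T ON A.accNumber = T.accNumber
GROUP BY A.accNumber
""").fetchone()

# raw may carry a tiny residue; rounded is zero to the cent.
print(raw, rounded)
```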


**18.** (2 points) *Branch name, account type, and average transaction amount* of each account type for each branch for branches that have at least 50 accounts combined, order by branch name, then account type.

In [200]:
%%sql
SELECT B1.branchName "Branch name", A.type "Account type", AVG(T.amount) "Average transaction amount" 
FROM Branch B1
JOIN 
(SELECT branchNumber
 FROM Account
 GROUP BY branchNumber
 HAVING COUNT(*) >= 50) AS B2 ON B1.branchNumber = B2.branchNumber
JOIN Account A ON B2.branchNumber = A.branchNumber
JOIN Transactions T ON A.accNumber = T.accNumber 
GROUP BY B1.branchName, A.type
ORDER BY B1.branchName, A.type

 * sqlite:///bank.db
Done.


Branch name,Account type,Average transaction amount
Latveria,BUS,6323.264077253221
Latveria,CHQ,6950.850576923073
Latveria,SAV,6925.2736708860775
London,BUS,9334.790548780493
London,CHQ,8947.788654970751
London,SAV,8281.66272727273
New York,BUS,7533.197088607597
New York,CHQ,7541.038226950345
New York,SAV,5932.801875000004


**19.** (3 points) *Branch name, account type, account number, transaction number and amount* of transactions of accounts where the average transaction amount is greater than three times the (overall) average transaction amount of accounts of that type. For example, if the average transaction amount of all business accounts is 2,000 then return transactions from business accounts where the average transaction amount for that account is greater than 6,000. Order by branch name, then account type, account number and finally transaction number. Note that all transactions of qualifying accounts should be returned even if they are less than the average amount of the account type.

In [193]:
%%sql
SELECT branchName "Branch name", A1.type "Account type", A1.accNumber "Account number", transNumber "Transaction number", T.amount "Amount of transactions"
FROM Branch B
JOIN Account A1 ON B.branchNumber = A1.branchNumber
JOIN Transactions T ON A1.accNumber = T.accNumber
JOIN 
(SELECT accNumber, AVG(amount) AS avgAcc
 FROM Transactions
 GROUP BY accNumber) AS AAs ON AAs.accNumber = A1.accNumber
JOIN
(SELECT A2.type, AVG(amount) AS avgType
 FROM Transactions
 JOIN Account A2 ON Transactions.accNumber = A2.accNumber
 GROUP BY A2.type) AS ATS ON A1.type = ATS.type
WHERE avgAcc > 3 * avgType

 * sqlite:///bank.db
Done.


Branch name,Account type,Account number,Transaction number,Amount of transactions
London,CHQ,13,1,108440.2
London,CHQ,13,2,1770.56
London,CHQ,13,3,2587.99
London,CHQ,13,4,-292.91
London,BUS,18,1,103802.18
London,BUS,18,2,1588.38
London,BUS,18,3,-1161.43
London,BUS,18,4,-649.44
London,SAV,121,1,98101.36
London,SAV,121,2,-524.42
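
Question 19 asks for the rows ordered by branch name, then account type, account number and transaction number; without an ORDER BY clause the row order is whatever the engine happens to produce. A minimal sketch on made-up rows (not bank.db) combines the per-account and per-type averages as derived tables and adds the requested ordering. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch       (branchNumber INTEGER PRIMARY KEY, branchName TEXT);
CREATE TABLE Account      (accNumber INTEGER PRIMARY KEY, type TEXT, branchNumber INTEGER);
CREATE TABLE Transactions (transNumber INTEGER, accNumber INTEGER, amount REAL);
INSERT INTO Branch  VALUES (1, 'London'), (2, 'Berlin');
INSERT INTO Account VALUES (1, 'BUS', 1), (2, 'BUS', 1), (5, 'SAV', 2), (6, 'SAV', 2);
""")
# Accounts 1 and 5 have large transactions; accounts 2 and 6 have eight small ones each.
txns = [(1, 1, 1000.0), (2, 1, 200.0), (1, 5, 900.0), (2, 5, 300.0)]
txns += [(n, acc, 10.0) for acc in (2, 6) for n in range(1, 9)]
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?)", txns)

rows = conn.execute("""
SELECT B.branchName, A1.type, A1.accNumber, T.transNumber, T.amount
FROM Branch B
JOIN Account A1 ON B.branchNumber = A1.branchNumber
JOIN Transactions T ON A1.accNumber = T.accNumber
JOIN (SELECT accNumber, AVG(amount) AS avgAcc
      FROM Transactions
      GROUP BY accNumber) AS AAs ON AAs.accNumber = A1.accNumber
JOIN (SELECT A2.type, AVG(T2.amount) AS avgType
      FROM Transactions T2
      JOIN Account A2 ON T2.accNumber = A2.accNumber
      GROUP BY A2.type) AS ATS ON A1.type = ATS.type
WHERE AAs.avgAcc > 3 * ATS.avgType
ORDER BY B.branchName, A1.type, A1.accNumber, T.transNumber
""").fetchall()

for row in rows:
    print(row)
```

Here each type averages 128 per transaction, so only accounts 1 and 5 (average 600) qualify, and the Berlin rows sort ahead of the London rows.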


## Part 2 (5 points): Relational Algebra Questions

### Preparation 

To write a relational algebra query in a cell, the cell should be a [Markdown cell](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html). You can use [LaTeX equations](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html#LaTeX-equations) in a markdown cell for the required algebraic notation. Double-click on this cell to see the source code for each operator. Here is a list of the main operators:

* Selection ($\sigma$)
* Projection ($\pi$)
* Union ($\cup$)
* Intersect ($\cap$)
* Set Difference ($-$) 
* Cross Product ($\times$)
* Rename ($\rho$)
* Join ($\bowtie$)
* Conjunction ($\wedge$)
* Disjunction ($\vee$)
* Greater Than or Equal To ($\geq$)
* Less Than or Equal To ($\leq$)
* Semijoin ($\ltimes$)
* Antijoin ($\bar{\ltimes}$)

You may also need $_{Subscript}$ and $^{Superscript}$ in the notations you use.

Consider the same bank database you have used before.
 - Customer = {<span style="text-decoration:underline">customerID</span>, firstName, lastName, income, birthDate}
 - Account = {<span style="text-decoration:underline">accNumber</span>, type, balance, branchNumber<sup>FK-Branch</sup>}
 - Owns = {<span style="text-decoration:underline">customerID</span><sup>FK-Customer</sup>, <span style="text-decoration:underline">accNumber</span><sup>FK-Account</sup>}
 - Transactions = {<span style="text-decoration:underline">transNumber</span>, <span style="text-decoration:underline">accNumber</span><sup>FK-Account</sup>, amount}
 - Employee = {<span style="text-decoration:underline">sin</span>, firstName, lastName, salary, branchNumber<sup>FK-Branch</sup>}
 - Branch = {<span style="text-decoration:underline">branchNumber</span>, branchName, managerSIN<sup>FK-Employee</sup>, budget}
 




**For each question below, write the relational algebra expression for the described query. For these questions, we use relational algebra on sets.**

**20.1** (1 point) Find out names of the bank branches and first name and last name of their managers.

\begin{equation}
\pi_{branchName, firstName, lastName}(Employee \bowtie_{sin=managerSIN} Branch)
\end{equation}

**20.2** (1 point) Show account number, account type, account balance, and transaction amount of the accounts with balance higher than 100,000 and transaction amounts higher than 15,000.

\begin{equation}
\pi_{accNumber, type, balance, amount} (Account \bowtie_{(Account.accNumber=Transactions.accNumber) \wedge (balance>100000) \wedge (amount>15000)} Transactions)
\end{equation}
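
The join condition above mixes the key match with two filters. As a sketch, an equivalent expression applies each selection to its own relation first and joins on the key only. This relies on the standard equivalence that a selection mentioning attributes of a single operand can be pushed below the join.

```latex
\begin{equation}
\pi_{accNumber, type, balance, amount} \big( \sigma_{balance>100000}(Account) \bowtie_{Account.accNumber=Transactions.accNumber} \sigma_{amount>15000}(Transactions) \big)
\end{equation}
```

Both forms return the same set of tuples; the second makes it clearer which condition restricts which relation.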

**20.3** (1 point) Show first name, last name, and income of customers whose income is at least twice the income of any customer whose lastName is Butler. 

\begin{equation}
\epsilon(\pi_{firstName, lastName, income} (Customer \bowtie_{income \geq 2*income'} \rho_{firstName \rightarrow firstName', lastName \rightarrow lastName', income \rightarrow income'}(\sigma_{lastName="Butler"}(Customer))))
\end{equation}

**20.4** (2 points) Show Customer ID, income, account numbers and branch numbers of customers with income greater than 90,000 who own an account at both London and Latveria branches. The result should contain all the account numbers of customers who meet the criteria, even if the account itself is not held at London or Latveria.

$R = \pi_{customerID, income, accNumber, branchNumber, branchName}(Customer \bowtie Owns \bowtie Account \bowtie Branch)$

$S = \epsilon(\pi_{customerID}(\sigma_{branchName='London'}(R)) \cap \pi_{customerID}(\sigma_{branchName='Latveria'}(R)))$

$Answer = \pi_{customerID, income, accNumber, branchNumber}(R \ltimes_{(income>90000) \wedge (customerID=customerID')} \rho_{customerID \rightarrow customerID'}(S))$
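
The three steps can be traced on a toy instance using Python sets, which mirror the set semantics used in this part. This is a sketch under assumptions: the rows of `R` are made up, `R` is assumed to carry *income* alongside the other attributes, and the helper `customers_at` is hypothetical.

```python
# Rows of R: (customerID, income, accNumber, branchNumber, branchName); made-up data.
R = {
    (1, 95000, 10, 1, "London"),
    (1, 95000, 12, 2, "Latveria"),
    (1, 95000, 13, 3, "Berlin"),
    (2, 99000, 20, 1, "London"),    # London only
    (3, 50000, 30, 1, "London"),    # both branches, but income too low
    (3, 50000, 31, 2, "Latveria"),
}

def customers_at(branch):
    # pi_customerID(sigma_branchName=branch(R)); a set has no duplicates.
    return {cid for (cid, _inc, _acc, _bno, name) in R if name == branch}

# S: customers with an account at London AND at Latveria (set intersection).
S = customers_at("London") & customers_at("Latveria")

# Answer: keep rows of R with a match in S and income > 90000, then project.
answer = {
    (cid, inc, acc, bno)
    for (cid, inc, acc, bno, _name) in R
    if inc > 90000 and cid in S
}

print(sorted(S))
print(sorted(answer))
```

Customer 1's Berlin account stays in the answer, matching the requirement that all accounts of a qualifying customer are returned.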

## Submission

Complete the code and markdown cells in this notebook and submit it to the Canvas activity Homework 2.