# HW2. SQL

## Objectives

In this assignment, you will write more complex SQL queries. You will practice the following:
 - How to use **Set Operators** to union/intersect multiple tables
 - How to use the **Join Operator** to join multiple tables
 - How to use **Aggregations** and **Group By** to aggregate data
 - How to write **Subqueries** in SQL 
 
Please note that you may use only what you have learned in class. For example, you can use INNER JOIN and LEFT OUTER JOIN, but you should not use CROSS JOIN or NATURAL JOIN because we have not discussed them in class yet. You can also use subqueries in the FROM or WHERE clauses, but you cannot use WITH. You will be penalized **(by -2 points)** for each violation.

## Background

We will use the same database [bank.db](bank.db) that we used in homework assignment (1). The database has six tables. The following shows their schemas. Primary key attributes are underlined and foreign keys are noted in superscript.
 - Customer = {<span style="text-decoration:underline">customerID</span>, firstName, lastName, income, birthDate}
 - Account = {<span style="text-decoration:underline">accNumber</span>, type, balance, branchNumber<sup>FK-Branch</sup>}
 - Owns = {<span style="text-decoration:underline">customerID</span><sup>FK-Customer</sup>, <span style="text-decoration:underline">accNumber</span><sup>FK-Account</sup>}
 - Transactions = {<span style="text-decoration:underline">transNumber</span>, <span style="text-decoration:underline">accNumber</span><sup>FK-Account</sup>, amount}
 - Employee = {<span style="text-decoration:underline">sin</span>, firstName, lastName, salary, branchNumber<sup>FK-Branch</sup>}
 - Branch = {<span style="text-decoration:underline">branchNumber</span>, branchName, managerSIN<sup>FK-Employee</sup>, budget}

**Notes**
 - The *customerID* attribute (*Customer*) is a unique number that represents a customer; it is *not* the customer's SIN
 - The *accNumber* attribute (*Account*) represents the account number
 - The *balance* (*Account*) attribute represents the total amount in an account
 - The *type* (*Account*) attribute represents the type of an account: chequing, saving, or business
 - The *Owns* relation represents a many-to-many relationship (between *Customer* and *Account*)
 - The *transNumber* attribute (*Transactions*) represents a transaction number; combined with the account number, it uniquely identifies a transaction
 - The *branchNumber* attribute (*Branch*) uniquely identifies a branch
 - The *managerSIN* attribute (*Branch*) represents the SIN of the branch manager
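
The schema listed above could be declared roughly as follows. This is a minimal sketch against an in-memory SQLite database: the column types are assumptions, since the handout gives only attribute names and keys, and the real bank.db may be defined differently.

```python
import sqlite3

# Assumed column types; the handout specifies only names, primary keys and foreign keys.
SCHEMA = """
CREATE TABLE Branch (
    branchNumber INTEGER PRIMARY KEY,
    branchName   TEXT,
    managerSIN   INTEGER REFERENCES Employee(sin),
    budget       REAL
);
CREATE TABLE Employee (
    sin          INTEGER PRIMARY KEY,
    firstName    TEXT,
    lastName     TEXT,
    salary       INTEGER,
    branchNumber INTEGER REFERENCES Branch(branchNumber)
);
CREATE TABLE Customer (
    customerID INTEGER PRIMARY KEY,
    firstName  TEXT,
    lastName   TEXT,
    income     INTEGER,
    birthDate  TEXT
);
CREATE TABLE Account (
    accNumber    INTEGER PRIMARY KEY,
    type         TEXT,
    balance      REAL,
    branchNumber INTEGER REFERENCES Branch(branchNumber)
);
CREATE TABLE Owns (
    customerID INTEGER REFERENCES Customer(customerID),
    accNumber  INTEGER REFERENCES Account(accNumber),
    PRIMARY KEY (customerID, accNumber)
);
CREATE TABLE Transactions (
    transNumber INTEGER,
    accNumber   INTEGER REFERENCES Account(accNumber),
    amount      REAL,
    PRIMARY KEY (transNumber, accNumber)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

The composite primary keys on *Owns* and *Transactions* mirror the two underlined attributes in each of those relations.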

## Questions 

Write SQL queries to return the data specified in questions 1 to 19.

**Query Requirement**
 - The answer to each question should be a single SQL query
 - You must order each query result as described in the question; the order is always ascending unless specified otherwise
 - Every column in the result should have a descriptive name, so include the AS clause needed to name each column
 - While your queries will not be assessed on their efficiency, marks may be deducted if unnecessary tables are included in the query (for example including Owns and Customer when you only require the customerID of customers who own accounts)

 

**Execute the next two cells**

In [58]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [59]:
%sql sqlite:///bank.db

'Connected: @bank.db'

**Queries**

**1.** (1 point) *First name, last name, income* of customers whose income is within [60,000, 70,000], order by *income* (desc), *lastName*, *firstName*.

In [60]:
%%sql
SELECT  firstName, lastName, income
FROM  Customer 
WHERE  income >= 60000 AND income <= 70000 
ORDER BY income DESC , lastName, firstName

 * sqlite:///bank.db
Done.


firstName,lastName,income
Steven,Johnson,69842
Bonnie,Johnson,69198
Larry,Murphy,69037
Evelyn,Scott,68832
Jeffrey,Griffin,68812
Randy,Mitchell,67895
Anna,Cooper,67275
Kimberly,Powell,65555
Mildred,Reed,64499
Helen,Sanchez,63333


**2.** (1 point) *SIN, branch name, salary and manager’s salary - salary* (that is, the salary of the employee’s manager minus salary of the employee) of all employees in New York, London or Berlin, order by ascending (manager salary - salary).

In [61]:
%%sql 

SELECT employee1.sin, Branch.branchName, employee1.salary , 
                                        manager.salary - employee1.salary as 'manager’s salary - salary'
FROM Employee employee1 , Branch, Employee manager
WHERE employee1.branchNumber == Branch.branchNumber AND (Branch.branchName == 'New York' or
                                                        Branch.branchName == 'London' or 
                                                        Branch.branchName == 'Berlin') AND 
      Branch.managerSIN == manager.sin

ORDER BY manager.salary - employee1.salary ASC

 * sqlite:///bank.db
Done.


sin,branchName,salary,manager’s salary - salary
23528,New York,94974,-4491
11285,New York,93779,-3296
31964,New York,90483,0
55700,London,99289,0
99537,Berlin,90211,0
38351,New York,86093,4390
97216,London,89746,9543
40900,New York,77533,12950
58707,London,85934,13355
57796,New York,75896,14587


**3.** (1 point) *First name, last name, and income* of customers whose income is at least twice the income of any customer whose lastName is Butler, order by last name then first name. 


In [62]:
%%sql

SELECT firstname, lastName, income
FROM Customer
WHERE income >= 2*
(   SELECT MIN(income)
    FROM Customer
    WHERE lastName = 'Butler'
)
ORDER BY lastName, firstName

 * sqlite:///bank.db
Done.


firstName,lastName,income
Ernest,Adams,75896
Stephanie,Adams,46486
William,Adams,77570
Carol,Alexander,56145
Jack,Anderson,35755
Anthony,Bailey,72328
Henry,Barnes,50640
Laura,Barnes,41159
Ruby,Barnes,84562
Louis,Bell,50159


**4.** (1 point) *Customer ID, income, account numbers and branch numbers* of customers with income greater than 90,000 who own an account at both London and Latveria branches, order by customer ID then account number. The result should contain all the account numbers of customers who meet the criteria, even if the account itself is not held at London or Latveria.

In [63]:
%%sql
SELECT c2.customerID , c2.income , a2.accNumber, b2.branchNumber
FROM Customer c2 , Branch b2 , Owns o2 , Account a2
INNER JOIN
( 
  SELECT c.customerID  
  FROM Customer c, Branch b, Owns o, Account a
  WHERE c.customerID == o.customerID and o.accNumber == a.accNumber and a.branchNumber == b.branchNumber 
  and c.income > 90000 and b.branchName = 'London'

  INTERSECT

  SELECT c.customerID 
  FROM Customer c, Branch b, Owns o, Account a
  WHERE c.customerID == o.customerID and o.accNumber == a.accNumber and a.branchNumber == b.branchNumber 
  and c.income > 90000 and b.branchName = 'Latveria'
) AS c1
  ON c2.customerID = c1.customerID
WHERE c2.customerID == o2.customerID and o2.accNumber == a2.accNumber and a2.branchNumber == b2.branchNumber
ORDER BY c2.customerID, a2.accNumber

 * sqlite:///bank.db
Done.


customerID,income,accNumber,branchNumber
27954,94777,10,1
27954,94777,68,3
27954,94777,239,2
51850,97412,35,1
51850,97412,129,1
51850,97412,161,3
51850,97412,182,2
62312,92919,61,3
62312,92919,116,1
62312,92919,219,2
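
The "account at both branches" condition in question 4 is a set intersection over customer IDs. The following is a minimal sketch on toy data, assuming a simplified two-table layout in which the branch name sits directly on Account (the real database keeps it in Branch); the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical layout: branchName stored on Account for brevity.
CREATE TABLE Account (accNumber INTEGER PRIMARY KEY, branchName TEXT);
CREATE TABLE Owns (customerID INTEGER, accNumber INTEGER);
INSERT INTO Account VALUES (1, 'London'), (2, 'Latveria'), (3, 'Berlin'), (4, 'London');
-- Customer 10 owns London, Latveria and Berlin accounts; customer 20 owns London only.
INSERT INTO Owns VALUES (10, 1), (10, 2), (10, 3), (20, 4);
""")

query = """
SELECT o.customerID, o.accNumber
FROM Owns o
WHERE o.customerID IN (
    SELECT o1.customerID
    FROM Owns o1 INNER JOIN Account a1 ON o1.accNumber = a1.accNumber
    WHERE a1.branchName = 'London'
    INTERSECT
    SELECT o2.customerID
    FROM Owns o2 INNER JOIN Account a2 ON o2.accNumber = a2.accNumber
    WHERE a2.branchName = 'Latveria'
)
ORDER BY o.customerID, o.accNumber
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 10 qualifies through the intersection, so all three of that customer's accounts are returned, including the Berlin one; customer 20 is dropped.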


**5.** (1 point) *Customer ID, types, account numbers and balances* of business (type *BUS*) and savings (type *SAV*) accounts owned by customers who own at least one business account or at least one savings account, order by customer ID, then type, then account number.

In [64]:
%%sql

SELECT c.customerID, a.type, a.accNumber, a.balance
FROM Customer c, Owns o, Account a
WHERE c.customerID = o.customerID and o.accNumber = a.accNumber and a.type IN ('BUS', 'SAV') 
ORDER BY c.customerID, a.type, a.accNumber

 * sqlite:///bank.db
Done.


customerID,type,accNumber,balance
11790,BUS,150,77477.04
11790,SAV,1,118231.13
11799,BUS,174,23535.33
13230,SAV,137,76535.96
13697,SAV,251,33140.3
13874,SAV,82,29525.31
14295,BUS,106,102297.76
14295,BUS,273,65213.27
14295,SAV,245,95413.18
16837,BUS,197,19495.5


**6.** (1 point) *Branch name, account number and balance* of accounts with balances greater than $110,000 held at the branch managed by Phillip Edwards, order by account number.

In [65]:
%%sql

SELECT b.branchName, a.accNumber, a.balance
FROM Branch b, Account a, Employee e
WHERE b.branchNumber = e.branchNumber AND a.branchNumber = e.branchNumber
  AND b.managerSIN = e.sin
  AND e.firstName = 'Phillip' AND e.lastName = 'Edwards'
  AND a.balance > 110000
ORDER BY a.accNumber

 * sqlite:///bank.db
Done.


branchName,accNumber,balance
London,1,118231.13
London,8,121267.54
London,9,132271.23
London,13,112505.84
London,26,112046.36
London,28,112617.97
London,31,111209.89
London,119,113473.16


**7.** (1 point) Customer ID of customers who have an account at the *New York* branch, who do not own an account at the London branch and who do not co-own an account with another customer who owns an account at the *London* branch, order by customer ID. The result should not contain duplicate customer IDs.

In [66]:
%%sql

SELECT DISTINCT o.customerID
FROM Owns o, Account a, Branch b
WHERE o.accNumber = a.accNumber AND a.branchNumber = b.branchNumber
AND b.branchName = 'New York' AND o.customerID NOT IN
  (SELECT o1.customerID
  FROM  Owns o1, Account a1, Branch b1
  WHERE o1.accNumber = a1.accNumber AND a1.branchNumber = b1.branchNumber AND a1.accNumber IN
      (SELECT a2.accNumber
      FROM  Owns o2, Account a2, Branch b2
      WHERE o2.accNumber = a2.accNumber AND a2.branchNumber = b2.branchNumber AND o2.customerID IN
          (SELECT o3.customerID
          FROM Owns o3, Account a3, Branch b3
          WHERE o3.accNumber = a3.accNumber AND a3.branchNumber = b3.branchNumber AND b3.branchName = 'London')))

ORDER BY o.customerID

 * sqlite:///bank.db
Done.


customerID
11696
13874
16837
38602
44637
46630
57796
61976
64063
87013


**8.** (1 point) *SIN, first name, last name, and salary* of employees who earn more than $70,000, if they are managers show the branch name of their branch in a fifth column (which should be NULL/NONE for most employees), order by branch name. You must use an outer join in your solution (which is the easiest way to do it).

In [67]:
%%sql
SELECT Employee.sin , Employee.firstName , Employee.lastName , Employee.salary , Branch.branchName 
FROM Employee 
LEFT OUTER JOIN 
Branch 
ON Employee.sin == Branch.managerSIN  
WHERE Employee.salary > 70000
ORDER BY Branch.branchName

 * sqlite:///bank.db
Done.


sin,firstName,lastName,salary,branchName
11285,Rebecca,Simmons,93779,
23528,Lisa,Russell,94974,
28453,Margaret,White,75146,
30513,Timothy,Perez,78839,
33743,Jacqueline,Scott,70396,
38351,Victor,Perez,86093,
40900,Chris,Garcia,77533,
57796,Ernest,Adams,75896,
58707,Clarence,Watson,85934,
63772,Mary,Powell,74194,


**9.** (1 point) Exactly as question eight, except that your query cannot include any join operation.

In [68]:
%%sql
SELECT Employee.sin , Employee.firstName , Employee.lastName , Employee.salary , NULL AS branchName
FROM Employee, Branch 
WHERE Employee.salary > 70000 AND Employee.branchNumber = Branch.branchNumber AND Employee.sin != Branch.managerSIN 
UNION
SELECT Employee.sin , Employee.firstName , Employee.lastName , Employee.salary , Branch.branchName 
FROM Employee, Branch 
WHERE Employee.salary > 70000 AND Employee.sin = Branch.managerSIN  

ORDER BY branchName

 * sqlite:///bank.db
Done.


sin,firstName,lastName,salary,branchName
11285,Rebecca,Simmons,93779,
23528,Lisa,Russell,94974,
28453,Margaret,White,75146,
30513,Timothy,Perez,78839,
33743,Jacqueline,Scott,70396,
38351,Victor,Perez,86093,
40900,Chris,Garcia,77533,
57796,Ernest,Adams,75896,
58707,Clarence,Watson,85934,
63772,Mary,Powell,74194,
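
Questions 8 and 9 ask for the same result with and without an outer join. The sketch below, on hypothetical rows, compares a LEFT OUTER JOIN with a UNION of two SELECTs and checks that they agree. The trimmed-down tables and the NOT IN formulation are assumptions made for illustration; NOT IN behaves as intended only if *managerSIN* contains no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (sin INTEGER PRIMARY KEY, lastName TEXT, salary INTEGER);
CREATE TABLE Branch (branchNumber INTEGER PRIMARY KEY, branchName TEXT, managerSIN INTEGER);
-- Hypothetical rows: Edwards manages London, Perez manages nothing, Scott earns too little.
INSERT INTO Employee VALUES (1, 'Edwards', 90000), (2, 'Perez', 80000), (3, 'Scott', 50000);
INSERT INTO Branch VALUES (1, 'London', 1);
""")

with_outer_join = conn.execute("""
SELECT e.sin AS sin, e.lastName AS lastName, b.branchName AS branchName
FROM Employee e LEFT OUTER JOIN Branch b ON e.sin = b.managerSIN
WHERE e.salary > 70000
ORDER BY branchName, sin
""").fetchall()

with_union = conn.execute("""
SELECT e.sin AS sin, e.lastName AS lastName, NULL AS branchName
FROM Employee e
WHERE e.salary > 70000
  AND e.sin NOT IN (SELECT managerSIN FROM Branch)  -- assumes no NULL managerSIN
UNION
SELECT e.sin, e.lastName, b.branchName
FROM Employee e, Branch b
WHERE e.salary > 70000 AND e.sin = b.managerSIN
ORDER BY branchName, sin
""").fetchall()

print(with_outer_join)
print(with_union)
```

SQLite sorts NULL before any text value in ascending order, so the non-manager row comes first in both results.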


**10.**  (1 point) *SIN, first name, last name and salary* of the lowest paid employee (or employees) of the *London* branch, order by sin.

In [69]:
%%sql
SELECT Employee.sin , Employee.firstName , Employee.lastName , Min(Employee.salary)  
FROM Employee, Branch 
WHERE Employee.branchNumber = Branch.branchNumber AND Branch.branchName = 'London'

ORDER BY Employee.sin

 * sqlite:///bank.db
Done.


sin,firstName,lastName,Min(Employee.salary)
24469,Frank,Rodriguez,13950
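
Question 10 asks for the lowest paid employee "or employees", so ties matter. The sketch below uses hypothetical rows with a deliberate tie to compare two formulations: selecting bare columns next to `MIN(...)`, which SQLite accepts but which yields a single row, and comparing against a scalar subquery, which returns every employee at the minimum. The single-table layout is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (sin INTEGER PRIMARY KEY, lastName TEXT, salary INTEGER);
-- Hypothetical data: two employees share the lowest salary.
INSERT INTO Employee VALUES (1, 'Rodriguez', 13950), (2, 'Watson', 13950), (3, 'Perez', 86093);
""")

# Aggregate without GROUP BY: the whole table collapses into one row.
bare_min = conn.execute(
    "SELECT sin, lastName, MIN(salary) FROM Employee").fetchall()

# Scalar subquery: every row whose salary equals the minimum is kept.
with_subquery = conn.execute("""
SELECT sin, lastName, salary
FROM Employee
WHERE salary = (SELECT MIN(salary) FROM Employee)
ORDER BY sin
""").fetchall()

print(len(bare_min), with_subquery)
```

On real data without ties the two forms return the same single row, which is why the difference is easy to miss.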


**11.**  (1 point) *Branch name, and the difference of maximum and minimum (salary gap) and average salary* of the employees at each branch, order by branch name.

In [70]:
%%sql
SELECT Branch.branchName , Max(Employee.salary) - Min(Employee.salary) as 'salary gap', Avg(Employee.salary) as 'average salary'  
FROM Employee, Branch 
WHERE Employee.branchNumber == Branch.branchNumber
GROUP BY Branch.branchName
ORDER BY Branch.branchName

 * sqlite:///bank.db
Done.


branchName,salary gap,average salary
Berlin,86862,34714.8125
Latveria,89282,56143.46153846154
London,85339,50813.80952380953
Moscow,58759,49065.71428571428
New York,84021,48649.90476190476


**12.** (1 point) *Count* of the number of employees working at the *New York* branch and *Count* of the number of different last names of employees working at the *New York* branch (two numbers in a single row).

In [71]:
%%sql
SELECT COUNT(Employee.sin) as 'Employees at NY Branch', 
COUNT(DISTINCT Employee.lastName) as 'Different Last Names'
FROM Employee, Branch
WHERE Employee.branchNumber = Branch.branchNumber and Branch.branchName = 'New York'

 * sqlite:///bank.db
Done.


Employees at NY Branch,Different Last Names
21,20


**13.** (1 point) *Sum* of the employee salaries (a single number) at the *New York* branch.

In [72]:
%%sql
SELECT Sum(Employee.salary) as 'Sum of the employee salaries'
FROM Employee, Branch
WHERE Employee.branchNumber = Branch.branchNumber and Branch.branchName == 'New York'

 * sqlite:///bank.db
Done.


Sum of the employee salaries
1021648


**14.** (1 point) *Customer ID, first name and last name* of customers who own accounts at no more than four different branches, order by last name, then first name.

In [73]:
%%sql
SELECT c.customerID, c.firstName, c.lastName
FROM Customer c, Owns o, Account a
WHERE c.customerID == o.customerID and a.accNumber = o.accNumber 
Group by c.customerID
HAVING COUNT(a.branchNumber) <= 4
ORDER BY c.lastName, c.firstName

 * sqlite:///bank.db
Done.


customerID,firstName,lastName
57796,Ernest,Adams
66418,Stephanie,Adams
98826,William,Adams
86858,Carol,Alexander
77100,Laura,Alexander
89197,Lawrence,Anderson
41545,Terry,Bailey
33133,Henry,Barnes
64055,Laura,Barnes
18166,Ruby,Barnes
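
Question 14 counts *different* branches, and `COUNT(a.branchNumber)` counts one per owned account rather than one per branch. The sketch below shows the difference on hypothetical rows; the helper `customers` and the two-table layout are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (accNumber INTEGER PRIMARY KEY, branchNumber INTEGER);
CREATE TABLE Owns (customerID INTEGER, accNumber INTEGER);
-- Five accounts spread over only two branches.
INSERT INTO Account VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
-- Customer 10 owns all five accounts; customer 20 owns one.
INSERT INTO Owns VALUES (10, 1), (10, 2), (10, 3), (10, 4), (10, 5), (20, 1);
""")

def customers(count_expr):
    """Return customer IDs whose count expression is at most four."""
    sql = f"""
    SELECT o.customerID
    FROM Owns o INNER JOIN Account a ON o.accNumber = a.accNumber
    GROUP BY o.customerID
    HAVING {count_expr} <= 4
    ORDER BY o.customerID
    """
    return [row[0] for row in conn.execute(sql)]

by_accounts = customers("COUNT(a.branchNumber)")           # counts accounts
by_branches = customers("COUNT(DISTINCT a.branchNumber)")  # counts branches
print(by_accounts, by_branches)
```

Customer 10 has five accounts at two branches, so only the `DISTINCT` version keeps that customer.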


**15.** (2 points) *Average income* of customers older than 60 and average income of customers younger than 20, the result must have two named columns, with one row, in one result set (hint: look up [SQLite time and date functions](https://www.sqlite.org/lang_datefunc.html)).

In [74]:
%%sql
SELECT AVG(sixty.income) as 'Average income of customers older than 60', 
      AVG(twenty.income) as 'Average income of customers younger than 20'
FROM  
(SELECT income
 FROM Customer
 WHERE (date('now') - birthDate) > 60 
)as sixty,

(SELECT income
 FROM Customer  
 WHERE (date('now') - birthDate) < 20  
)as twenty


 * sqlite:///bank.db
Done.


Average income of customers older than 60,Average income of customers younger than 20
55879.379310344826,41888.333333333336
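
Subtracting two date strings in SQLite compares only their leading year digits, so an age computed that way can be off by one before the birthday has passed. The sketch below contrasts that with a year difference adjusted by month and day, using `strftime` from the date functions linked in the question. It assumes *birthDate* is stored as ISO text (`YYYY-MM-DD`), which the handout does not state, and uses a fixed reference date and a hypothetical birth date so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

REF = "2024-01-15"    # fixed stand-in for date('now')
BIRTH = "1963-12-31"  # hypothetical customer, birthday not yet reached in 2024

# Text operands are converted using their numeric prefix: 2024 - 1963.
year_only = conn.execute("SELECT ? - ?", (REF, BIRTH)).fetchone()[0]

# Year difference minus 1 when the month-day of REF precedes the birthday.
exact_age = conn.execute("""
SELECT CAST(strftime('%Y', :ref) AS INTEGER)
     - CAST(strftime('%Y', :birth) AS INTEGER)
     - (strftime('%m-%d', :ref) < strftime('%m-%d', :birth))
""", {"ref": REF, "birth": BIRTH}).fetchone()[0]

print(year_only, exact_age)
```

With these inputs the year-only difference would classify the customer as older than 60, while the adjusted age would not.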


**16.** (2 points) *Customer ID, last name, first name, income, and average account balance* of customers who have at least three accounts, and whose last names begin with *S* and contain an *e* (e.g. **S**t**e**ve) **or** whose first names begin with *A* and have the letter *n* just before the last 2 letters (e.g. **An**ne), order by customer ID. Note that to appear in the result customers must have at least three accounts and satisfy one (or both) of the name conditions.

In [75]:
%%sql
SELECT c.customerID, c.lastName, c.firstName, c.income, Avg(a.balance)
FROM Customer c, Owns o, Account a
WHERE c.customerID == o.customerID and a.accNumber = o.accNumber 
and (c.lastName like 'S%e%' or c.firstName like 'A%n__')
Group by c.customerID
HAVING COUNT(o.accNumber) >= 3
ORDER BY c.customerID

 * sqlite:///bank.db
Done.


customerID,lastName,firstName,income,Avg(a.balance)
14295,Ramirez,Anne,44495,87641.40333333334
29474,White,Amanda,59360,68591.57333333335
52189,Sanders,Shawn,13615,68936.21166666666
79601,Sanders,Joe,95144,58843.438
81263,Cooper,Anna,67275,68895.66333333333


**17.** (2 points) *Account number, balance, sum of transaction amounts, and balance - transaction sum* for accounts in the *London* branch that have at least 15 transactions, order by transaction sum.

In [76]:
%%sql
SELECT a.accNumber, a.balance, Sum(t.amount) as 'Sum of Transaction Amounts', 
balance - Sum(t.amount) as 'Balance - Transaction Sum'
FROM Account a, Transactions t, Branch b
WHERE a.accNumber == t.accNumber and a.branchNumber == b.branchNumber 
and b.branchName == 'London'
Group By a.accNumber
HAVING COUNT(t.transNumber) >= 15
ORDER BY Sum(t.amount)

 * sqlite:///bank.db
Done.


accNumber,balance,Sum of Transaction Amounts,Balance - Transaction Sum
113,82792.58,82792.58,0.0
9,132271.23,132271.22999999998,2.9103830456733704e-11
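
The tiny non-zero difference in the last column comes from binary floating-point arithmetic on REAL values. One possible way to present such differences is to round them to cents with SQLite's `ROUND`. The sketch below uses hypothetical amounts and a trimmed-down schema; whether rounding is wanted in the assignment is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (accNumber INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE Transactions (transNumber INTEGER, accNumber INTEGER, amount REAL);
-- Hypothetical values chosen because 0.1 + 0.2 is not exactly 0.3 in binary.
INSERT INTO Account VALUES (9, 0.3);
INSERT INTO Transactions VALUES (1, 9, 0.1), (2, 9, 0.2);
""")

raw, rounded = conn.execute("""
SELECT a.balance - SUM(t.amount),
       ROUND(a.balance - SUM(t.amount), 2)
FROM Account a INNER JOIN Transactions t ON a.accNumber = t.accNumber
GROUP BY a.accNumber, a.balance
""").fetchone()

print(raw, rounded)
```

The raw difference may be zero or a value far below one cent depending on the SQLite version, while the rounded value is zero either way.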


**18.** (2 points) *Branch name, account type, and average transaction amount* of each account type for each branch for branches that have at least 50 accounts of any type, order by branch name, then account type.

In [77]:
%%sql
SELECT b.branchName, a.type, Avg(t.amount) as 'Average Transaction Amount'
FROM Account a, Transactions t, Branch b
WHERE a.accNumber == t.accNumber and a.branchNumber == b.branchNumber 
and b.branchNumber in
    (SELECT a1.branchNumber
     FROM Account a1
     GROUP BY a1.branchNumber  
     HAVING COUNT(a1.branchNumber) >= 50)
Group By b.branchNumber, a.type
ORDER BY b.branchName , a.type

 * sqlite:///bank.db
Done.


branchName,type,Average Transaction Amount
Latveria,BUS,6323.264077253221
Latveria,CHQ,6950.850576923073
Latveria,SAV,6925.2736708860775
London,BUS,9334.790548780493
London,CHQ,8947.788654970751
London,SAV,8281.66272727273
New York,BUS,7533.197088607597
New York,CHQ,7541.038226950345
New York,SAV,5932.801875000004


**19.** (3 points) *Branch name, account type, account number, transaction number and amount* of transactions of accounts where the average transaction amount is greater than three times the (overall) average transaction amount of accounts of that type. For example, if the average transaction amount of all business accounts is 2,000 then return transactions from business accounts where the average transaction amount for that account is greater than 6,000. Order by branch name, then account type, account number and finally transaction number. Note that all transactions of qualifying accounts should be returned even if they are less than the average amount of the account type.

In [78]:
%%sql
SELECT b.branchName, a.type, a.accNumber, t.transNumber, t.amount
FROM Account a, Transactions t, Branch b
WHERE a.accNumber = t.accNumber AND a.branchNumber = b.branchNumber 
AND t.accNumber IN
    (SELECT t2.accNumber
     FROM Account a2, Transactions t2
     WHERE a2.accNumber = t2.accNumber 
     GROUP BY t2.accNumber
     HAVING Avg(t2.amount) > 
        (SELECT Avg(t1.amount) * 3
         FROM Account a1, Transactions t1
         WHERE a1.accNumber = t1.accNumber AND a1.type = a2.type
        )
    )
ORDER BY b.branchName, a.type, a.accNumber, t.transNumber

 * sqlite:///bank.db
Done.


branchName,type,accNumber,transNumber,amount
Latveria,CHQ,206,1,80371.46
Latveria,CHQ,206,2,3639.13
Latveria,CHQ,206,3,-196.5
London,BUS,18,1,103802.18
London,BUS,18,2,1588.38
London,BUS,18,3,-1161.43
London,BUS,18,4,-649.44
London,CHQ,13,1,108440.2
London,CHQ,13,2,1770.56
London,CHQ,13,3,2587.99
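
Question 19 compares each account's average against an average computed per account type, which is a correlated subquery: the inner query refers to the type of the account being examined by the outer one. The following is a minimal sketch on hypothetical rows, with Branch left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (accNumber INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE Transactions (transNumber INTEGER, accNumber INTEGER, amount REAL,
                           PRIMARY KEY (transNumber, accNumber));
INSERT INTO Account VALUES (1, 'BUS'), (2, 'BUS'), (3, 'BUS'), (4, 'BUS'), (5, 'SAV');
""")

# Hypothetical transactions as (transNumber, accNumber, amount).
rows = [(1, 1, 1000.0), (2, 1, 200.0)]                             # account 1: average 600
rows += [(n, acc, 10.0) for acc in (2, 3, 4) for n in (1, 2, 3)]   # small BUS accounts
rows += [(1, 5, 50.0), (2, 5, 50.0)]                               # the only SAV account
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?)", rows)

# BUS overall average is (1200 + 90) / 11, about 117.3, so the threshold is about 351.8.
result = conn.execute("""
SELECT a.type, a.accNumber, t.transNumber, t.amount
FROM Account a INNER JOIN Transactions t ON a.accNumber = t.accNumber
WHERE a.accNumber IN (
    SELECT a2.accNumber
    FROM Account a2 INNER JOIN Transactions t2 ON a2.accNumber = t2.accNumber
    GROUP BY a2.accNumber, a2.type
    HAVING AVG(t2.amount) > 3 * (
        SELECT AVG(t1.amount)
        FROM Account a1 INNER JOIN Transactions t1 ON a1.accNumber = t1.accNumber
        WHERE a1.type = a2.type      -- correlation with the outer account's type
    )
)
ORDER BY a.type, a.accNumber, t.transNumber
""").fetchall()

print(result)
```

Both transactions of account 1 are returned, including the 200.0 one that is itself below the threshold, which matches the note at the end of the question.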


## Submission

Complete the code in this notebook [hw2.ipynb](hw2.ipynb) and submit it to the Canvas activity Homework (2).