In this part, we'll focus on joining tables with themselves, also known as self joins!

We'll also learn how to do subqueries and correlated subqueries!



We're going to be using `DB Fiddle` for this course. 

Navigate to: https://www.db-fiddle.com/

In the top right corner of the webpage, be sure to select `Database: PostgreSQL 13`

Now, in the `Schema SQL` pane on the left, copy and paste the following, or just open this link: https://www.db-fiddle.com/f/xfR3hUvokfSSiQHshPbNGL/2

```

CREATE TABLE IF NOT EXISTS "workshop_workers" (
    "id" INT,
    "name" TEXT,
    "specialization" TEXT,
    "master_id" INT,
    "experience" INT,
    "project_id" INT
);
INSERT INTO "workshop_workers" VALUES
    (1,'Mathew Conn','woodworking',null,20,1),
    (2,'Kate Brown','woodworking',1,4,1),
    (3,'John Doe','incrusting',5,3,1),
    (4,'John Kowalsky','watchmaking',7,2,3),
    (5,'Suzan Gregowitch','incrusting',null,15,4),
    (6,'Peter Parker','watchmaking',7,3,2),
    (7,'Joe Darrington','watchmaking',null,13,2),
    (8,'Mary Smith','woodworking',1,4,4),
    (9,'Carlos Bell','incrusting',5,1,4),
    (10,'Dennis Wright','watchmaking',7,3,3);

```

Let's consider the following situation: we have information about employees and their supervisors in a single table called `employee`.

Suppose it holds four people: John is the supervisor of both Casper and Kate, and Casper is Peter's supervisor.

The employee table stores data in a hierarchical structure: employees and their supervisors. Storing a structure like this in a table is quite common. Imagine you want to list each employee's name along with the name of their supervisor. That's where JOINing a table with itself comes in handy:

```
SELECT
  emp.name as employee_name,
  supervisor.name as supervisor_name
FROM employee as emp
JOIN employee as supervisor
  ON emp.supervisor_id = supervisor.id

```

When you join a table with itself, you must give at least one occurrence of the table an alias, and aliasing both, as above, is the clearest option. Moreover, each column name you refer to must be preceded by the alias of the copy you want.

This way, the database can distinguish which copy of the table you want to select a particular column from.
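To try the query above yourself, here is a minimal sketch of an `employee` table that matches the description. The table definition, the `id` values, and the `experience` values are assumptions made up for illustration; only the four names and who supervises whom come from the text. Paste the `CREATE TABLE` and `INSERT` statements into the `Schema SQL` pane of a fresh fiddle and the `SELECT` into the query pane.

```sql
-- Hypothetical sample data: ids and experience values are invented.
CREATE TABLE employee (
    id INT,
    name TEXT,
    supervisor_id INT,
    experience INT
);
INSERT INTO employee VALUES
    (1, 'John',   NULL, 12),
    (2, 'Casper', 1,    6),
    (3, 'Kate',   1,    3),
    (4, 'Peter',  2,    1);

-- Each row of "emp" is matched with the row of "supervisor"
-- whose id equals the employee's supervisor_id.
SELECT
  emp.name AS employee_name,
  supervisor.name AS supervisor_name
FROM employee AS emp
JOIN employee AS supervisor
  ON emp.supervisor_id = supervisor.id
ORDER BY emp.id;
-- Casper | John
-- Kate   | John
-- Peter  | Casper
```

John does not appear in the `employee_name` column: his `supervisor_id` is `NULL`, so the inner join finds no matching supervisor row for him.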



## Exercise
The workshop_workers table consists of a few columns:

- `id` – the ID of a given worker,
- `name` – the first and last name of a given worker,
- `specialization` – a given worker's specialization,
- `master_id` – the ID of a given worker's supervisor,
- `experience` – the number of years of experience a given worker has, and
- `project_id` – the ID of the project a given worker is currently assigned to.

Show all workers' names together with the names of their direct supervisors.

Rename columns to apprentice_name and master_name, respectively.


### Show me the answer 

```
SELECT
  apprentice.name as apprentice_name,
  master.name as master_name
FROM workshop_workers as apprentice
JOIN workshop_workers as master
  ON master.id = apprentice.master_id
```

# Filtering self-joined tables

When self-joining tables, we can still filter the results with the `WHERE` clause, just as we can when JOINing two different tables.

Suppose that we want to show each employee's name along with the name of their supervisor for only those employees who have less than five years of experience:

```
SELECT
  e.name AS employee_name,
  s.name AS supervisor_name
FROM employee AS e
JOIN employee AS s
  ON e.supervisor_id = s.id
WHERE e.experience < 5
```

Note that when self-joining, we always have to put the table alias before each column name.

Otherwise, the column name would be ambiguous, as there would be two columns with the same name: one from the `e` copy of the employee table and one from the `s` copy.
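The examples above use the illustrative `employee` table, but the same pattern works on the `workshop_workers` table from the exercises. The sketch below assumes that schema is loaded; the threshold of four years is chosen here only for illustration. The commented-out query shows what goes wrong without aliases.

```sql
-- Without a qualifier, PostgreSQL cannot tell which copy "name" belongs to
-- and rejects the query with: column reference "name" is ambiguous
-- SELECT name
-- FROM workshop_workers AS apprentice
-- JOIN workshop_workers AS master
--   ON apprentice.master_id = master.id;

-- With every column qualified, the query runs.
SELECT
  apprentice.name AS apprentice_name,
  master.name AS master_name
FROM workshop_workers AS apprentice
JOIN workshop_workers AS master
  ON apprentice.master_id = master.id
WHERE apprentice.experience < 4
ORDER BY apprentice.id;
-- John Doe      | Suzan Gregowitch
-- John Kowalsky | Joe Darrington
-- Peter Parker  | Joe Darrington
-- Carlos Bell   | Suzan Gregowitch
-- Dennis Wright | Joe Darrington
```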

## Exercise

Show only each worker's `name` and `specialization`. Alias the name column as `apprentice`. Consider only those workers whose supervisors have more than 14 years of experience.

```
SELECT
  apprentice.name apprentice,
  apprentice.specialization
FROM workshop_workers apprentice
JOIN workshop_workers master
  ON apprentice.master_id = master.id
WHERE master.experience > 14
```

### Exercise with Filtering


Show the `name` of each apprentice, their `specialization`, and their `project_id`, but only if the project they are working on is not directly supervised by their master, i.e. the master currently works on another project.



```
SELECT
  apprentice.name,
  apprentice.specialization,
  apprentice.project_id
FROM workshop_workers apprentice
JOIN workshop_workers master
  ON apprentice.master_id = master.id
WHERE apprentice.project_id != master.project_id
```

### The Dictionary Table

Let's try another table, shall we?

In the next couple of exercises, we'll work with words in a dictionary. Did you know that linguists classify words?

Our table contains some English words and their hypernyms.

A hypernym is a term with a broad meaning that "contains" other, more specific terms. For example, the hypernym of "sparrow" is "bird".


- `entry_id` – the ID of an entry in the dictionary,
- `word` – the word associated with a particular entry,
- `hypernym_id` – the ID of an entry that is a hypernym for a particular word.

Check out the fiddle here: https://www.db-fiddle.com/f/cBTrWusP7nkM4nF8Ja4snB/3

Or you can copy and paste the schema into the fiddle:

```
CREATE TABLE IF NOT EXISTS "dictionary" (
    "entry_id" INT,
    "word" TEXT,
    "hypernym_id" INT
);
INSERT INTO "dictionary" VALUES
    (1,'bear',2),
    (2,'carnivore',16),
    (3,'color',null),
    (4,'red',3),
    (5,'blue',3),
    (6,'yellow',3),
    (7,'gemstone',null),
    (8,'diamond',7),
    (9,'ruby',7),
    (10,'emerald',7),
    (11,'bird',16),
    (12,'pelican',11),
    (13,'sparrow',11),
    (14,'cyan',5),
    (15,'marine',5),
    (16,'animal',null),
    (17,'fish',16),
    (18,'salmon',17),
    (19,'trout',18);

```

### Exercises

Show each dictionary `word` (name the column `entry`) together with the name of its direct hypernym (name the column `hypernym`). Don't show the entries with no hypernym.

```
SELECT
  entries.word AS entry,
  hypernym.word AS hypernym
FROM dictionary AS entries
JOIN dictionary AS hypernym
  ON entries.hypernym_id = hypernym.entry_id
```

## Exercise

Show each dictionary entry (name the column entry) together with the name of its direct hypernym (name the column hypernym) and the name of that hypernym's hypernym (name the column grandhypernym).

Only include words that have both a direct hypernym and a "grandhypernym".

```
SELECT
  entry.word AS entry,
  hypernym.word AS hypernym,
  super_hypernym.word AS grandhypernym
FROM dictionary AS entry
JOIN dictionary AS hypernym
  ON entry.hypernym_id = hypernym.entry_id
JOIN dictionary AS super_hypernym
  ON hypernym.hypernym_id = super_hypernym.entry_id
```

## Exercise

For each entry in the dictionary show the word, its direct hypernym and its hypernym's hypernym. Name the columns entry, hypernym and grandhypernym, respectively. Show all such entries, even those that don't have direct hypernyms or grandhypernyms.

```
SELECT
  entry.word AS entry,
  hypernym.word AS hypernym,
  super_hypernym.word AS grandhypernym
FROM dictionary AS entry
LEFT JOIN dictionary AS hypernym
  ON entry.hypernym_id = hypernym.entry_id
LEFT JOIN dictionary AS super_hypernym
  ON hypernym.hypernym_id = super_hypernym.entry_id
```

# Subqueries

Today we'll work with subqueries! We'll start with simple, uncorrelated subqueries. (We'll revisit correlated subqueries later in this part.) Here's a brief reminder:

A subquery is a query within another query.

We can use subqueries in the WHERE clause to compare a given column with the result of a whole query. 

When the subquery returns exactly one value, you can compare against it using a comparison operator by itself:

```
SELECT cat_id
FROM cats
WHERE age > (SELECT age FROM cats WHERE cat_name = 'Kitty')
```

or comparison operators with the `ANY` or `ALL` keywords, if your subquery can return multiple rows:

```
SELECT cat_id
FROM cats
WHERE age > ANY (SELECT age FROM cats WHERE cat_name = 'Kitty')
```

or the `IN` operator, if the value of the column has to appear among the values returned by the subquery, e.g.

```
SELECT cat_id
FROM cats
WHERE age IN (SELECT age FROM cats WHERE cat_name LIKE 'K%')
```
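The lesson mentions `ALL` but only shows `ANY`, so here is a minimal sketch that contrasts the two. The `cats` table is not defined anywhere in this lesson; the definition and rows below are hypothetical and meant for a fresh fiddle. Because this sample contains two cats named Kitty, the plain `age > (SELECT ...)` comparison would fail on it with an error saying that more than one row was returned by a subquery used as an expression, which is exactly the case `ANY` and `ALL` are for.

```sql
-- Hypothetical sample data; the cats table is not part of the lesson's fiddles.
CREATE TABLE cats (
    cat_id INT,
    cat_name TEXT,
    age INT
);
INSERT INTO cats VALUES
    (1, 'Kitty', 3),
    (2, 'Kitty', 7),
    (3, 'Tom',   5),
    (4, 'Luna',  9);

-- > ANY: older than at least one cat named Kitty, i.e. age > 3
SELECT cat_id
FROM cats
WHERE age > ANY (SELECT age FROM cats WHERE cat_name = 'Kitty')
ORDER BY cat_id;
-- returns 2, 3, 4

-- > ALL: older than every cat named Kitty, i.e. age > 7
SELECT cat_id
FROM cats
WHERE age > ALL (SELECT age FROM cats WHERE cat_name = 'Kitty')
ORDER BY cat_id;
-- returns 4
```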

We can also use subqueries in the FROM clause and then query their result like a regular table. A subquery in the FROM clause has to have an alias.

```
SELECT MAX(number_of_cats)
FROM 
  (SELECT breed, COUNT(*) AS number_of_cats
  FROM cats
  GROUP BY breed) breed_count
```
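The cat tables are not part of any fiddle in this lesson, so here is a sketch of the same pattern on the `workshop_workers` table from the self-join exercises. It assumes that schema is loaded; the names `worker_count` and `spec_count` are made up for this example.

```sql
-- Inner query: one row per specialization with its number of workers.
-- Outer query: the largest of those counts.
SELECT MAX(worker_count) AS largest_specialization
FROM
  (SELECT specialization, COUNT(*) AS worker_count
   FROM workshop_workers
   GROUP BY specialization) AS spec_count;
-- returns 4 (watchmaking has four workers, the other two have three each)
```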


## Tables

Check out the tables here: https://www.db-fiddle.com/f/dBL61FGbW6Qt2UKFBoyefy/0



### Exercises

Show the names of orchestras that were created after the 'Chamber Orchestra' and have a rating greater than 7.5.

```
SELECT name
FROM orchestras 
WHERE year > (SELECT year FROM orchestras WHERE name = 'Chamber Orchestra') 
  AND rating > 7.5
```


# Exercise

Select the names of all orchestras that have the same city of origin as any city in which any orchestra performed in 2013.

```
SELECT name
FROM orchestras
WHERE city_origin IN (SELECT city FROM concerts WHERE year = 2013)
```

# Exercise

Show the names of the orchestras that have the same number of members as the 'Musical Orchestra'. Show the number of members in the second column and name it `members_count`.



```
SELECT
  o.name,
  COUNT(m.id) AS members_count
FROM orchestras o
JOIN member m
  ON o.id = m.orchestra_id
GROUP BY o.name
HAVING COUNT(m.id) = (
  -- number of members in the 'Musical Orchestra'
  SELECT COUNT(m2.id)
  FROM orchestras o2
  JOIN member m2
    ON o2.id = m2.orchestra_id
  WHERE o2.name = 'Musical Orchestra'
)
```

## Exercise
Find the average number of members of all orchestras in the table.

Remember this query. You'll need it in the next exercise!



```
SELECT
  AVG(d.member_count)
FROM
  (SELECT orchestra_id, COUNT(id) AS member_count FROM member GROUP BY orchestra_id) d;
```

## Exercise

Show the name and number of members for each orchestra that has more members than the average membership of all orchestras in the table.

```
SELECT
  o.name,
  COUNT(m.id)
FROM orchestras o
JOIN member m
  ON o.id = m.orchestra_id
GROUP BY o.name
HAVING COUNT(m.id) > (
  -- average number of members per orchestra
  SELECT AVG(d.member_count)
  FROM (SELECT orchestra_id, COUNT(id) AS member_count FROM member GROUP BY orchestra_id) AS d
)
```

## Correlated subqueries

Correlated subqueries refer to the outer query. Take a look:
```
SELECT cat_id
FROM cats c
WHERE cat_id IN 
  (SELECT owned_cat_id 
   FROM owner 
   WHERE wage > 5000 AND owned_cat_id = c.cat_id)
```

In the above query we selected the IDs of cats that belong to owners who earn more than 5,000. 

Note that the subquery refers to the table cats in the outer query: for each cat in the cats table the subquery is processed separately. The subquery can refer to tables in the outer query, but the outer query cannot refer to tables in the subquery. It's often helpful to give aliases to tables in both queries.
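Here is a minimal sketch for running the query above. The `cats` and `owner` tables are not defined in this lesson, so the definitions and rows below are hypothetical and meant for a fresh fiddle. The second query uses `EXISTS`, a common alternative way to write the same correlated check; it is shown as an option, not as part of the original example.

```sql
-- Hypothetical sample data for the correlated-subquery example.
CREATE TABLE cats (
    cat_id INT,
    cat_name TEXT
);
CREATE TABLE owner (
    owner_id INT,
    wage INT,
    owned_cat_id INT
);
INSERT INTO cats VALUES (1, 'Kitty'), (2, 'Tom'), (3, 'Luna');
INSERT INTO owner VALUES (10, 6000, 1), (11, 4000, 2), (12, 7500, 3);

-- The subquery is evaluated per cat: c.cat_id comes from the outer query.
SELECT cat_id
FROM cats c
WHERE cat_id IN
  (SELECT owned_cat_id
   FROM owner
   WHERE wage > 5000 AND owned_cat_id = c.cat_id)
ORDER BY cat_id;
-- returns 1, 3

-- Same result: EXISTS only checks whether the subquery finds any row.
SELECT cat_id
FROM cats c
WHERE EXISTS
  (SELECT 1
   FROM owner o
   WHERE o.wage > 5000 AND o.owned_cat_id = c.cat_id)
ORDER BY cat_id;
-- returns 1, 3
```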

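Just like uncorrelated subqueries, correlated subqueries can be used in the WHERE or HAVING clause of the query. In the FROM clause, PostgreSQL lets a subquery refer to the other tables listed before it only when the subquery is marked with the `LATERAL` keyword.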
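A minimal sketch of a correlated subquery in the FROM clause, which PostgreSQL accepts once it is marked `LATERAL`. It assumes the `workshop_workers` schema from the self-join exercises is loaded; the names `stats` and `apprentice_count` are made up for this example.

```sql
-- For each master (a worker without a master of their own),
-- the LATERAL subquery counts the workers who report to them.
SELECT
  master.name,
  stats.apprentice_count
FROM workshop_workers AS master
CROSS JOIN LATERAL (
  SELECT COUNT(*) AS apprentice_count
  FROM workshop_workers AS apprentice
  WHERE apprentice.master_id = master.id  -- refers to the outer table
) AS stats
WHERE master.master_id IS NULL
ORDER BY master.id;
-- Mathew Conn      | 2
-- Suzan Gregowitch | 2
-- Joe Darrington   | 3
```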

It is also possible to have a subquery in the SELECT clause. Such a subquery has to return exactly one column and at most one row (if it returns no rows, the result is NULL). Here's an example:

```
SELECT
   c1.cat_name,
   (SELECT AVG(c2.age) FROM cats c2 WHERE c2.cat_name = c1.cat_name) AS avg_age
FROM cats c1
```

# Exercise

Select the name of each orchestra that held a concert in its country of origin in 2003.

```
SELECT 
  name
FROM orchestras
WHERE country_origin IN (
  SELECT country
  FROM concerts
  WHERE orchestras.id = concerts.orchestra_id
    AND concerts.year = 2003
  )
```

Correlated subqueries can be used to find the best object in a certain category. Here we select orchestras which have the best rating among orchestras coming from the same city:

```
SELECT
  name,
  city_origin,
  rating
FROM orchestras o1
WHERE rating = (SELECT MAX(rating) FROM orchestras o2 WHERE o1.city_origin = o2.city_origin)
```

In the subquery, we select the maximum rating among all orchestras that come from the same city as the orchestra in the outer query.

Note that we use the aliases `o1` and `o2` to distinguish the orchestra in the outer query from the orchestras in the subquery.

In the outer query, we select the orchestras whose rating equals the rating found in the subquery.
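One consequence of comparing with `=` is that ties are all returned. The sketch below shows this on a small hypothetical table; `orchestras_sample` and its rows are invented for illustration, so it does not clash with the `orchestras` table in the fiddle.

```sql
-- Hypothetical sample data: two Vienna orchestras share the top rating.
CREATE TABLE orchestras_sample (
    name TEXT,
    city_origin TEXT,
    rating NUMERIC
);
INSERT INTO orchestras_sample VALUES
    ('Alpha Strings',  'Vienna', 8.9),
    ('Beta Brass',     'Vienna', 8.9),
    ('Gamma Winds',    'Vienna', 7.2),
    ('Delta Ensemble', 'Berlin', 8.1);

SELECT
  name,
  city_origin,
  rating
FROM orchestras_sample o1
WHERE rating = (SELECT MAX(rating)
                FROM orchestras_sample o2
                WHERE o2.city_origin = o1.city_origin)
ORDER BY city_origin, name;
-- Delta Ensemble | Berlin | 8.1
-- Alpha Strings  | Vienna | 8.9
-- Beta Brass     | Vienna | 8.9
```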

## Exercise

Select the name, wage, and experience of all members who earned the most within each orchestra.



```
SELECT
  name,
  wage,
  experience
FROM member m1
WHERE wage = (SELECT MAX(wage) FROM member m2 WHERE m1.orchestra_id = m2.orchestra_id)
```

## Exercise

Show the names of the most experienced members of each orchestra and the name of that orchestra. Rename the columns to member and orchestra, respectively.

```
SELECT
  m1.name AS member,
  o.name AS orchestra
FROM member m1
JOIN orchestras o
  ON m1.orchestra_id = o.id
WHERE m1.experience = (SELECT MAX(m2.experience) FROM member m2 WHERE m1.orchestra_id = m2.orchestra_id)
```

## Exercise

Show the names of orchestra members who earn more than the average wage of the violinists in their orchestra.



```
SELECT
  m1.name
FROM member m1
WHERE wage > (
  SELECT
    AVG(m2.wage)
  FROM member m2
  WHERE m2.position = 'violin'
    AND m2.orchestra_id = m1.orchestra_id
);
```

## Exercise

For each orchestra that originated in Germany, select its name, rating, city of origin, and the total number of concerts it held in Ukraine. Name the last column `count`.



```
SELECT 
  name,
  rating,
  city_origin,
  (SELECT COUNT(*)
   FROM concerts
   WHERE orchestras.id = concerts.orchestra_id
     AND concerts.country = 'Ukraine') AS count
FROM orchestras
WHERE country_origin = 'Germany'
```