## Lecture 6 Combining data

**Combine data**:
- Vertically: `UNION`
- Horizontally: `JOIN`

### `UNION` and `UNION ALL`

#### Combining query results with the `UNION` operator

```MySQL
-- Combine two result sets that have the same schema
-- Otherwise, you need to use column aliases and/or casting
SELECT id, name FROM fun.games
UNION ALL
SELECT id, name FROM toy.toys;

-- column alias
SELECT name, list_price AS price FROM fun.games
UNION ALL
SELECT name, price FROM toy.toys;

-- casting
SELECT `year` FROM fly.flights
UNION DISTINCT
SELECT CAST(`year` AS INT) AS `year` FROM fun.games;

-- UNION and UNION DISTINCT both remove duplicate rows
SELECT country FROM customers
UNION DISTINCT
SELECT country FROM offices;
```
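
The difference between `UNION ALL` and the deduplicating form can be checked on a toy dataset. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine, which is an assumption: the queries above target a different SQL dialect, and SQLite spells the deduplicating form as plain `UNION`. The table contents are made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables that share a schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (country TEXT);
    CREATE TABLE offices (country TEXT);
    INSERT INTO customers VALUES ('US'), ('FR'), ('US');
    INSERT INTO offices VALUES ('US'), ('JP');
""")

# UNION ALL keeps every row from both inputs, duplicates included.
all_rows = conn.execute(
    "SELECT country FROM customers UNION ALL SELECT country FROM offices"
).fetchall()

# UNION removes duplicate rows from the combined result.
distinct_rows = conn.execute(
    "SELECT country FROM customers UNION SELECT country FROM offices"
).fetchall()

print(len(all_rows))                        # 5
print(sorted(r[0] for r in distinct_rows))  # ['FR', 'JP', 'US']
conn.close()
```

Because `UNION ALL` skips the deduplication step, it is usually the cheaper choice when duplicates are impossible or harmless.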

### Joins

- `INNER JOIN`
- `LEFT OUTER JOIN`
- `RIGHT OUTER JOIN`
- `FULL OUTER JOIN`

```MySQL
SELECT t.name AS toy, m.name AS maker
  FROM makers m INNER JOIN toys t
  ON t.maker_id = m.id;

-- The order of tables matters in an outer join
SELECT empl_id, first_name, o.office_id AS office_id, city
  FROM employees e LEFT OUTER JOIN offices o
  ON e.office_id = o.office_id;
```

When using an `OUTER JOIN`, think about the question you are answering and pick a *main table* whose rows must all appear in the result.
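
To see how the choice of join type changes which rows survive, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in engine (an assumption, not the course's database). The rows in `employees` and `offices` are invented; the employee with `office_id = 'z'` has no matching office.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offices (office_id TEXT, city TEXT);
    CREATE TABLE employees (empl_id INTEGER, first_name TEXT, office_id TEXT);
    INSERT INTO offices VALUES ('a', 'Istanbul'), ('b', 'Chicago');
    INSERT INTO employees VALUES (1, 'Ana', 'a'), (2, 'Ben', 'b'), (3, 'Cy', 'z');
""")

# INNER JOIN: only employees whose office_id matches an office.
inner_rows = conn.execute("""
    SELECT first_name, city
      FROM employees e INNER JOIN offices o
      ON e.office_id = o.office_id
      ORDER BY e.empl_id
""").fetchall()

# LEFT OUTER JOIN: employees is the main table, so all of its rows appear;
# columns from offices are NULL (None in Python) where nothing matched.
left_rows = conn.execute("""
    SELECT first_name, city
      FROM employees e LEFT OUTER JOIN offices o
      ON e.office_id = o.office_id
      ORDER BY e.empl_id
""").fetchall()

print(inner_rows)  # [('Ana', 'Istanbul'), ('Ben', 'Chicago')]
print(left_rows)   # [('Ana', 'Istanbul'), ('Ben', 'Chicago'), ('Cy', None)]
conn.close()
```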

**QUIZ**: Which `FROM` clauses could you use to return data about all the customers, even the ones who have not placed any orders?

```MySQL
✅ FROM customers c LEFT OUTER JOIN orders o ON c.cust_id = o.cust_id;
❌ FROM customers c RIGHT OUTER JOIN orders o ON c.cust_id = o.cust_id;
✅ FROM orders o RIGHT OUTER JOIN customers c ON c.cust_id = o.cust_id;
❌ FROM orders o LEFT OUTER JOIN customers c ON c.cust_id = o.cust_id;
```

#### Use `WHERE` with Joins to identify non-matches

Write a query to find only the employees whose office IDs do not match any office ID in the *offices* table.
```MySQL
SELECT empl_id, first_name, last_name
  FROM employees e LEFT JOIN offices o
  ON e.office_id = o.office_id
  WHERE o.office_id IS NULL;
```
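
The same pattern can be tried on a small made-up dataset. This sketch again assumes Python's built-in `sqlite3` as a stand-in engine, and all names and rows are hypothetical. Note that an employee whose own `office_id` is `NULL` also has no match, so that row is returned as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offices (office_id TEXT, city TEXT);
    CREATE TABLE employees (
        empl_id INTEGER, first_name TEXT, last_name TEXT, office_id TEXT
    );
    INSERT INTO offices VALUES ('a', 'Istanbul'), ('b', 'Chicago');
    INSERT INTO employees VALUES
        (1, 'Ana', 'Ruiz', 'a'),
        (2, 'Ben', 'Cole', 'b'),
        (3, 'Cy',  'Park', 'z'),    -- office 'z' does not exist
        (4, 'Dee', 'Moss', NULL);   -- no office recorded at all
""")

# Keep every employee, then filter to the rows where the office side is NULL,
# i.e. the rows for which the join found no matching office.
unmatched = conn.execute("""
    SELECT empl_id, first_name, last_name
      FROM employees e LEFT JOIN offices o
      ON e.office_id = o.office_id
      WHERE o.office_id IS NULL
      ORDER BY empl_id
""").fetchall()

print(unmatched)  # [(3, 'Cy', 'Park'), (4, 'Dee', 'Moss')]
conn.close()
```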


### Advanced Joins

#### Handling `NULL` values in join key column, `NULL`-safe join

An equality comparison yields `NULL` when either side (or both sides) of `=` is `NULL`, so a join on `=` never matches rows that have `NULL` in the key column.

If you want `NULL` to be treated as equal to `NULL`, use the `NULL`-safe equality operator `<=>` instead of `=` in the join condition.
```MySQL
SELECT c.cust_id, name, total
  FROM customers_with_null c JOIN orders_with_null o
  ON c.cust_id <=> o.cust_id;
```
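
A small experiment makes the difference visible. One loud caveat: the sketch below runs on Python's built-in `sqlite3`, which does not support `<=>`; SQLite's `IS` operator is swapped in here because it compares the same way (two `NULL`s count as equal). The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers_with_null (cust_id TEXT, name TEXT);
    CREATE TABLE orders_with_null (cust_id TEXT, total INTEGER);
    INSERT INTO customers_with_null VALUES ('a', 'Ann'), (NULL, 'Unknown');
    INSERT INTO orders_with_null VALUES ('a', 10), (NULL, 25);
""")

# Ordinary equality: NULL = NULL evaluates to NULL, so the NULL rows never match.
eq_rows = conn.execute("""
    SELECT c.cust_id, name, total
      FROM customers_with_null c JOIN orders_with_null o
      ON c.cust_id = o.cust_id
      ORDER BY total
""").fetchall()

# NULL-safe comparison (SQLite's IS, standing in for <=>): NULL matches NULL.
safe_rows = conn.execute("""
    SELECT c.cust_id, name, total
      FROM customers_with_null c JOIN orders_with_null o
      ON c.cust_id IS o.cust_id
      ORDER BY total
""").fetchall()

print(eq_rows)    # [('a', 'Ann', 10)]
print(safe_rows)  # [('a', 'Ann', 10), (None, 'Unknown', 25)]
conn.close()
```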

#### Join on non-unique keys

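When the join key is not unique, every row on one side matches every row on the other side that shares its key, so the result can contain more rows than either input. In the pandas example below, key `A` appears twice in each table and therefore yields 2 × 2 = 4 rows.

```python
import pandas as pd

this = pd.DataFrame({'key': ['A', 'A', 'B', 'B', 'C', 'D'],
                     'val': [1, 2, 4, 4, 9, 2]})
last = pd.DataFrame({'key': ['A', 'A', 'B', 'C', 'D', 'E', 'F', 'C'],
                     'val': [1, 99, 2, 3, 4, 5, 6, 7]})
```

`this`:

|   | key | val |
|---|-----|-----|
| 0 | A   | 1   |
| 1 | A   | 2   |
| 2 | B   | 4   |
| 3 | B   | 4   |
| 4 | C   | 9   |
| 5 | D   | 2   |

`last`:

|   | key | val |
|---|-----|-----|
| 0 | A   | 1   |
| 1 | A   | 99  |
| 2 | B   | 2   |
| 3 | C   | 3   |
| 4 | D   | 4   |
| 5 | E   | 5   |
| 6 | F   | 6   |
| 7 | C   | 7   |

```python
# Left join on the non-unique key
pd.merge(this, last, on='key', how='left')
```

|   | key | val_x | val_y |
|---|-----|-------|-------|
| 0 | A   | 1     | 1     |
| 1 | A   | 1     | 99    |
| 2 | A   | 2     | 1     |
| 3 | A   | 2     | 99    |
| 4 | B   | 4     | 2     |
| 5 | B   | 4     | 2     |
| 6 | C   | 9     | 3     |
| 7 | C   | 9     | 7     |
| 8 | D   | 2     | 4     |

```python
# Average both value columns per key after the join
pd.merge(this, last, on='key', how='left').groupby('key').mean()
```

| key | val_x | val_y |
|-----|-------|-------|
| A   | 1.5   | 50.0  |
| B   | 4.0   | 2.0   |
| C   | 9.0   | 5.0   |
| D   | 2.0   | 4.0   |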
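
The same row multiplication happens in SQL. As a rough sketch, the data from `this` and `last` is loaded below into two tables with the hypothetical names `this_tbl` and `last_tbl`, using Python's built-in `sqlite3` as an assumed stand-in engine. The first query counts how many joined rows each key produces; the second shows one possible way to avoid the fan-out, by collapsing the right-hand table to one row per key before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE this_tbl (key TEXT, val INTEGER);
    CREATE TABLE last_tbl (key TEXT, val INTEGER);
    INSERT INTO this_tbl VALUES
        ('A', 1), ('A', 2), ('B', 4), ('B', 4), ('C', 9), ('D', 2);
    INSERT INTO last_tbl VALUES
        ('A', 1), ('A', 99), ('B', 2), ('C', 3),
        ('D', 4), ('E', 5), ('F', 6), ('C', 7);
""")

# Rows per key after a left join on the non-unique key.
rows_per_key = conn.execute("""
    SELECT t.key, COUNT(*)
      FROM this_tbl t LEFT JOIN last_tbl l
      ON t.key = l.key
      GROUP BY t.key
      ORDER BY t.key
""").fetchall()

# Aggregating last_tbl to one row per key first keeps the join from adding rows.
one_per_key = conn.execute("""
    SELECT t.key, t.val, l.avg_val
      FROM this_tbl t LEFT JOIN
           (SELECT key, AVG(val) AS avg_val FROM last_tbl GROUP BY key) l
      ON t.key = l.key
      ORDER BY t.key, t.val
""").fetchall()

print(rows_per_key)      # [('A', 4), ('B', 2), ('C', 2), ('D', 1)]
print(len(one_per_key))  # 6, the same as the number of rows in this_tbl
conn.close()
```

Whether to pre-aggregate, deduplicate, or accept the extra rows depends on the question being asked; the point is to check row counts before and after a join on a non-unique key.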
