# Explore how JOINs work

## Overview

The purpose of this activity is to explore the main types of SQL joins and how two tables are connected through a primary key in one table and a foreign key in the other. A **primary key** is a column whose values are unique within its table, so each value identifies exactly one row. A **foreign key** is a column whose values match primary key values in another table, which creates a link between the two tables. The four main types of SQL joins are:

| JOIN | Definition |
| --- | --- |
| **INNER** | Returns only the records with matching values in both tables |
| **LEFT** | Returns all records from the left table (first mentioned) and only the matching records from the right table (second mentioned) |
| **RIGHT** | Returns all records from the right table (second mentioned) and only the matching records from the left table (first mentioned) |
| **FULL OUTER** | Combines LEFT JOIN and RIGHT JOIN to return all records from both tables, matched where possible; columns with no match are filled with NULL |
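
The following is a minimal sketch of how the two tables in this activity could be declared with explicit keys. It makes the primary key / foreign key relationship concrete. It is written in generic SQL and runs in SQLite or PostgreSQL.

The column names follow the CSV layouts described in the Dataset section. The data types and constraints are assumptions, because the BigQuery tables here are created with schema auto-detection. BigQuery itself only accepts primary and foreign keys as `NOT ENFORCED` constraints.

```sql
-- Sketch only: column names follow the activity's CSV files;
-- types and constraints are assumptions (generic SQL, not BigQuery DDL).
CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,  -- primary key: one row per department
    name          TEXT NOT NULL
);

CREATE TABLE employees (
    name          TEXT NOT NULL,
    -- foreign key: values should match departments.department_id (NULL allowed)
    department_id INTEGER REFERENCES departments (department_id),
    role          TEXT
);
```

Note that SQLite only enforces the `REFERENCES` constraint after running `PRAGMA foreign_keys = ON;`.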

## Dataset

I received two .csv files containing the employees and departments data:

- the **Employees table** can be viewed in [Google Sheets](https://drive.google.com/file/d/12BSu6-yywMPOddbe4tX3APbPxyY9_7jk/view?usp=drive_link) or as a [.csv file](/activities/sql/c05m03-explore-joins/c05m03-joins-employees-data.csv), and each record has the columns `name,department_id,role`
- the **Departments table** can be viewed in [Google Sheets](https://drive.google.com/file/d/1bGRPRjU5IE6MwSd2gCLvBgm2F8AMviiP/view?usp=drive_link) or as a [.csv file](/activities/sql/c05m03-explore-joins/c05m03-joins-departments-data.csv), and each record has the columns `name,department_id`

Below is a preview of both tables in .csv format:

![Data in csv format](c05m03-joins-tables-data.png 'Data in csv format')

## Importing the data in BigQuery

The following steps import the employees and departments data into BigQuery:

- **Create dataset** with **Dataset ID** `employee_data`
- In the **Dataset info** window, select the **CREATE TABLE** button
- In the **Source** section, select the ***Upload*** option in **Create table from**
- Browse to the `c05m03-joins-employees-data.csv` file and open it
- Set the **File format** to `CSV`
- In the **Destination** section, name the table `employees`
- In the **Schema** section, select **Auto detect**
- Finally, select **Create table**

A new table `employees` is created and appears in the Explorer pane under the dataset `employee_data`. Repeating the above steps with the file `c05m03-joins-departments-data.csv` creates a second table, `departments`. A preview of both BigQuery tables is shown below:

![Data in BigQuery](c05m03-joins-tables-bigquery.png 'Data in BigQuery')

## Exploring: INNER JOIN

In BigQuery, I execute the following query using an INNER JOIN to return records with matching values in both tables:

```sql
SELECT
    employees.name AS employee_name,
    employees.role AS employee_role,
    departments.name AS department_name
FROM
    `plucky-aegis-427011-v5.employee_data.employees` AS employees
INNER JOIN
    `plucky-aegis-427011-v5.employee_data.departments` AS departments
    ON employees.department_id = departments.department_id;
```

The output lists only the employees whose `department_id` matches a department. Employees without a matching department are excluded:

![Inner Join Results](c05m03-query-inner-join.png 'Inner Join Results')
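
An INNER JOIN silently drops rows that have no match. The following is a minimal sketch for listing the employees it would exclude. It keeps every employee with a LEFT JOIN, then filters to rows where no department matched (often called an anti-join).

The `WITH` clause holds hypothetical sample rows, not the course CSV data. To run it against the BigQuery tables, drop the `WITH` clause and use the full table paths from the query above.

```sql
-- Hypothetical sample rows standing in for the real tables.
WITH departments AS (
    SELECT 1 AS department_id, 'Sales' AS name
    UNION ALL SELECT 2, 'Engineering'
    UNION ALL SELECT 3, 'Legal'
),
employees AS (
    SELECT 'Alice' AS name, 1 AS department_id, 'Manager' AS role
    UNION ALL SELECT 'Bob', 2, 'Developer'
    UNION ALL SELECT 'Carol', NULL, 'Intern'  -- no department recorded
)
-- Employees that the INNER JOIN leaves out of its result.
SELECT
    employees.name AS employee_name,
    employees.role AS employee_role
FROM employees
LEFT JOIN departments
    ON employees.department_id = departments.department_id
WHERE departments.department_id IS NULL;  -- no department matched
```

With these sample rows the query returns only Carol. Comparing a NULL `department_id` with any value evaluates to unknown rather than true, so it never matches a department.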

## Exploring: LEFT JOIN

I execute the following query in BigQuery using a LEFT JOIN to return all the records from the left (first) table and only the matching records from the right (second) table:

```sql
SELECT
    employees.name AS employee_name,
    employees.role AS employee_role,
    departments.name AS department_name
FROM
    `plucky-aegis-427011-v5.employee_data.employees` AS employees
LEFT JOIN
    `plucky-aegis-427011-v5.employee_data.departments` AS departments
    ON employees.department_id = departments.department_id;
```

The output lists all employees regardless of whether their department has been indicated. For employees without a matching department, `department_name` is `NULL`:

![Left Join Results](c05m03-query-left-join.png 'Left Join Results')
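
When the LEFT JOIN finds no matching department, the `department_name` column is `NULL`. The following is a minimal sketch for replacing those NULLs with a readable label using `COALESCE`.

It runs over hypothetical sample rows. The label `'Unassigned'` is an arbitrary choice and not part of the activity.

```sql
-- Hypothetical sample rows standing in for the real tables.
WITH departments AS (
    SELECT 1 AS department_id, 'Sales' AS name
    UNION ALL SELECT 2, 'Engineering'
    UNION ALL SELECT 3, 'Legal'
),
employees AS (
    SELECT 'Alice' AS name, 1 AS department_id, 'Manager' AS role
    UNION ALL SELECT 'Bob', 2, 'Developer'
    UNION ALL SELECT 'Carol', NULL, 'Intern'  -- no department recorded
)
SELECT
    employees.name AS employee_name,
    employees.role AS employee_role,
    -- COALESCE returns its first non-NULL argument
    COALESCE(departments.name, 'Unassigned') AS department_name
FROM employees
LEFT JOIN departments
    ON employees.department_id = departments.department_id
ORDER BY employee_name;
```

With the sample rows, Alice and Bob keep their department names and Carol is labelled `Unassigned`.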

## Exploring: RIGHT JOIN

In BigQuery, I execute the following query using a RIGHT JOIN to return all the records from the right (second) table and only the matching records from the left (first) table:

```sql
SELECT
    employees.name AS employee_name,
    employees.role AS employee_role,
    departments.name AS department_name
FROM
    `plucky-aegis-427011-v5.employee_data.employees` AS employees
RIGHT JOIN
    `plucky-aegis-427011-v5.employee_data.departments` AS departments
    ON employees.department_id = departments.department_id;
```

The output lists all departments regardless of whether any employees have been assigned to them. For departments with no employees, the employee columns are `NULL`:

![Right Join Results](c05m03-query-right-join.png 'Right Join Results')
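
A RIGHT JOIN returns the same rows as a LEFT JOIN with the two tables swapped. This is handy in databases without RIGHT JOIN support, such as SQLite before version 3.39.

The following is a minimal sketch over hypothetical sample rows. Only the order of the tables in `FROM` and `LEFT JOIN` differs from the RIGHT JOIN query above.

```sql
-- Hypothetical sample rows standing in for the real tables.
WITH departments AS (
    SELECT 1 AS department_id, 'Sales' AS name
    UNION ALL SELECT 2, 'Engineering'
    UNION ALL SELECT 3, 'Legal'
),
employees AS (
    SELECT 'Alice' AS name, 1 AS department_id, 'Manager' AS role
    UNION ALL SELECT 'Bob', 2, 'Developer'
    UNION ALL SELECT 'Carol', NULL, 'Intern'  -- no department recorded
)
-- Same rows as: FROM employees RIGHT JOIN departments ON ...
SELECT
    employees.name AS employee_name,
    employees.role AS employee_role,
    departments.name AS department_name
FROM departments
LEFT JOIN employees
    ON employees.department_id = departments.department_id
ORDER BY department_name;
```

With the sample rows it returns three rows:

- Bob in Engineering
- a row for Legal whose employee columns are NULL
- Alice in Sales

Carol does not appear because she matches no department.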

## Exploring: FULL OUTER JOIN

I execute the following query in BigQuery using a FULL OUTER JOIN to return all the records from both tables, matched where possible:

```sql
SELECT
    employees.name AS employee_name,
    employees.role AS employee_role,
    departments.name AS department_name
FROM
    `plucky-aegis-427011-v5.employee_data.employees` AS employees
FULL OUTER JOIN
    `plucky-aegis-427011-v5.employee_data.departments` AS departments
    ON employees.department_id = departments.department_id;
```

The output contains every record from the employees table and every record from the departments table, regardless of whether there are matching values:

![Outer Join Results](c05m03-query-outer-join.png 'Outer Join Results')
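
Because a FULL OUTER JOIN mixes matched and unmatched rows, it can help to label each row. The following is a minimal sketch over hypothetical sample rows that tags every row by which side found no match.

It assumes every employee has a non-NULL `name`, so a NULL `employees.name` can only come from a department with no employees. FULL OUTER JOIN is supported in BigQuery and PostgreSQL, but not in MySQL or in SQLite before 3.39.

```sql
-- Hypothetical sample rows standing in for the real tables.
WITH departments AS (
    SELECT 1 AS department_id, 'Sales' AS name
    UNION ALL SELECT 2, 'Engineering'
    UNION ALL SELECT 3, 'Legal'
),
employees AS (
    SELECT 'Alice' AS name, 1 AS department_id, 'Manager' AS role
    UNION ALL SELECT 'Bob', 2, 'Developer'
    UNION ALL SELECT 'Carol', NULL, 'Intern'  -- no department recorded
)
SELECT
    employees.name AS employee_name,
    departments.name AS department_name,
    CASE
        -- assumes employee names are never NULL in the source data
        WHEN employees.name IS NULL THEN 'department without employees'
        WHEN departments.department_id IS NULL THEN 'employee without department'
        ELSE 'matched'
    END AS match_status
FROM employees
FULL OUTER JOIN departments
    ON employees.department_id = departments.department_id
ORDER BY match_status;
```

With the sample rows this returns four rows:

- Alice and Bob are `matched`
- Carol is an `employee without department`
- Legal is a `department without employees`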