## Setup

In [43]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

## Writing readable queries


Writing queries so that they are easy to understand takes a little extra time now, but it saves time when you come back to old queries, and it helps your colleagues when you work in a data team.

One obvious place to start is capitalization and whitespace. Because extra whitespace has no meaning in SQL, you can use it to convey the structure of a complex query. Let's compare the same query written twice, first without whitespace and capitalization:


![img](img/nocap.png)

And now, with whitespace and capitalization:

![img](img/cap.png)

As you can see, a little time put into whitespace and capitalization pays off. A few tips to help make your queries more readable:

- If a SELECT statement has more than one column, put each on a new line, indented from the SELECT keyword.
- Always capitalize SQL function names and keywords.
- Put each clause of your query on a new line.
- Use indenting to make subqueries appear logically separate.

Another important consideration when writing readable queries is the use of aliases and shortcuts. Aliases should be clear: a common convention is to use the first letter of the table name, but if a query is complex, consider using more explicit aliases. Similarly, shortcuts like GROUP BY 1 can be confusing, and explicitly naming the column will make your query more readable.
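
To make the alias and GROUP BY 1 advice concrete, here is a minimal sketch that runs the same aggregation twice, once tersely and once explicitly, and checks that both return the same rows. It uses Python's built-in sqlite3 module with an in-memory database; the two tiny tables and their rows are hypothetical stand-ins for the Chinook tables, not real data.

```python
import sqlite3

# Hypothetical toy tables standing in for Chinook's genre and track tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, genre_id INTEGER);
    INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO track VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
""")

# Terse version: one-letter aliases, positional GROUP BY, everything crammed together.
terse = """
SELECT g.name, COUNT(*) FROM track t
INNER JOIN genre g ON g.genre_id = t.genre_id GROUP BY 1 ORDER BY 1;
"""

# Explicit version: named output columns, one clause per line, columns named in GROUP BY.
explicit = """
SELECT
    genre.name AS genre_name,
    COUNT(*) AS number_of_tracks
FROM track
INNER JOIN genre ON genre.genre_id = track.genre_id
GROUP BY genre_name
ORDER BY genre_name;
"""

terse_rows = conn.execute(terse).fetchall()
explicit_rows = conn.execute(explicit).fetchall()
print(explicit_rows)  # [('Jazz', 1), ('Rock', 2)]
```

Both queries do the same work; the only difference is how much a reader has to decode before understanding them.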

If you work in a team, consider adopting a SQL style guide (a great guide is available at SQL style guide), but remember that readability is more important than consistency. If you have a complex query and you think breaking the style guide will make it more readable, you should do it.


Let's now learn another way to make your queries more readable: named subqueries.

## The WITH clause

When constructing complex queries, it's useful to create an intermediate table from which to produce the final results. You can use subqueries to create these intermediate tables. Unfortunately, the way subqueries are written makes the query harder to read: the reader has to find the subquery and read from the inside out.

One way to alleviate this is to use a **WITH clause**. WITH clauses allow you to define one or more named subqueries before the start of the main query. The main query then refers to the subquery by its alias, just as if it were a table in the database.

The syntax for the WITH clause is relatively straightforward.

```
WITH [alias_name] AS ([subquery])
SELECT [main_query]
``` 

Let's look at a simple example, a query designed to gather some info about the tracks from a single album. First, here's our query written with a standard subquery and **no** WITH clause:

In [44]:
%%sql

SELECT * FROM
    (
     SELECT
         t.name,
         ar.name artist,
         al.title album_name,
         mt.name media_type,
         g.name genre,
         t.milliseconds length_milliseconds
     FROM track t
     INNER JOIN media_type mt ON mt.media_type_id = t.media_type_id
     INNER JOIN genre g ON g.genre_id = t.genre_id
     INNER JOIN album al ON al.album_id = t.album_id
     INNER JOIN artist ar ON ar.artist_id = al.artist_id
    )
WHERE album_name = 'Jagged Little Pill';

 * sqlite:///chinook.db
Done.


name,artist,album_name,media_type,genre,length_milliseconds
All I Really Want,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,284891
You Oughta Know,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,249234
Perfect,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,188133
Hand In My Pocket,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,221570
Right Through You,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,176117
Forgiven,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,300355
You Learn,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,239699
Head Over Feet,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,267493
Mary Jane,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,280607
Ironic,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,229825


By moving the subquery before the main query using a WITH clause, the intent of the main query becomes much easier to understand.

In [45]:
%%sql

WITH track_info AS
    (                
     SELECT
         t.name,
         ar.name artist,
         al.title album_name,
         mt.name media_type,
         g.name genre,
         t.milliseconds length_milliseconds
     FROM track t
     INNER JOIN media_type mt ON mt.media_type_id = t.media_type_id
     INNER JOIN genre g ON g.genre_id = t.genre_id
     INNER JOIN album al ON al.album_id = t.album_id
     INNER JOIN artist ar ON ar.artist_id = al.artist_id
    )

    
SELECT * FROM track_info
  WHERE album_name = 'Jagged Little Pill';

 * sqlite:///chinook.db
Done.


name,artist,album_name,media_type,genre,length_milliseconds
All I Really Want,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,284891
You Oughta Know,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,249234
Perfect,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,188133
Hand In My Pocket,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,221570
Right Through You,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,176117
Forgiven,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,300355
You Learn,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,239699
Head Over Feet,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,267493
Mary Jane,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,280607
Ironic,Alanis Morissette,Jagged Little Pill,MPEG audio file,Rock,229825
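
If you'd like to convince yourself that the two forms are interchangeable, the sketch below runs an inline subquery and its WITH equivalent against a tiny in-memory SQLite database and compares the results. The album and track rows are made up for illustration, and the query is a cut-down version of the one above rather than the full Chinook query.

```python
import sqlite3

# Hypothetical toy data; only the shape mirrors the Chinook album and track tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, album_id INTEGER);
    INSERT INTO album VALUES (1, 'First Album'), (2, 'Second Album');
    INSERT INTO track VALUES (1, 'Intro', 1), (2, 'Outro', 1), (3, 'Single', 2);
""")

# Standard subquery: the reader has to look inside the parentheses first.
inline_subquery = """
SELECT * FROM
    (
     SELECT t.name, al.title album_name
     FROM track t
     INNER JOIN album al ON al.album_id = t.album_id
    )
WHERE album_name = 'First Album'
ORDER BY name;
"""

# Same subquery, named up front with WITH; the main query now reads top to bottom.
with_clause = """
WITH track_info AS
    (
     SELECT t.name, al.title album_name
     FROM track t
     INNER JOIN album al ON al.album_id = t.album_id
    )
SELECT * FROM track_info
WHERE album_name = 'First Album'
ORDER BY name;
"""

inline_rows = conn.execute(inline_subquery).fetchall()
with_rows = conn.execute(with_clause).fetchall()
print(inline_rows == with_rows)  # True
```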


While the difference in this example is subtle, the WITH clause helps a lot as soon as your main query has even slight complexity. Let's get some practice using WITH in a more complex example.

*Create a query that shows summary data for every playlist in the Chinook database:*

- Use a WITH clause to create a named subquery with the following info:
    - The unique ID for the playlist.
    - The name of the playlist.
    - The name of each track from the playlist.
    - The length of each track in seconds.
- Your final table should have the following columns, in order:
    - playlist_id - the unique ID for the playlist.
    - playlist_name - The name of the playlist.
    - number_of_tracks - A count of the number of tracks in the playlist.
    - length_seconds - The total length of the playlist in seconds. This column should be an integer.
- The results should be sorted by playlist_id in ascending order.

In [46]:
%%sql

WITH sub_playlist AS
    (
      SELECT pt.track_id,
             pt.playlist_id,
             p.name playlist_name,
             t.name,
             t.milliseconds
        FROM playlist_track pt 
            INNER JOIN playlist p ON pt.playlist_id = p.playlist_id
            INNER JOIN track t ON pt.track_id = t.track_id
    ) 
    
 SELECT subp.playlist_id playlist_id,
        subp.playlist_name playlist_name,
        COUNT(subp.track_id) number_of_tracks,
        CAST(SUM(subp.milliseconds) / 1000 AS INTEGER) length_seconds
        
   FROM sub_playlist subp
   GROUP BY playlist_id
   ORDER BY playlist_id


 * sqlite:///chinook.db
Done.


playlist_id,playlist_name,number_of_tracks,length_seconds
1,Music,3290,877683
3,TV Shows,213,501094
5,90’s Music,1477,398705
8,Music,3290,877683
9,Music Videos,1,294
10,TV Shows,213,501094
11,Brazilian Music,39,9486
12,Classical,75,21770
13,Classical 101 - Deep Cuts,25,6755
14,Classical 101 - Next Steps,25,7575
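
A side note on the length_seconds column: in SQLite, dividing one integer by another gives an integer, with the fractional part dropped. The sketch below shows how integer division, float division, truncation, and rounding differ. It assumes milliseconds is stored as an INTEGER column, and the two track rows are hypothetical.

```python
import sqlite3

# Hypothetical track lengths: 1500 ms + 2300 ms = 3800 ms in total.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, milliseconds INTEGER);
    INSERT INTO track VALUES (1, 1500), (2, 2300);
""")

row = conn.execute("""
    SELECT
        SUM(milliseconds) / 1000                           AS integer_division,
        SUM(milliseconds) / 1000.0                         AS float_division,
        CAST(SUM(milliseconds) / 1000.0 AS INTEGER)        AS truncated,
        CAST(ROUND(SUM(milliseconds) / 1000.0) AS INTEGER) AS rounded
    FROM track;
""").fetchone()

print(row)  # (3, 3.8, 3, 4)
```

Whether to truncate or round is a judgment call; the point is to make the choice deliberately rather than by accident.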


## Creating views

When we use the WITH clause, we're creating a temporary named subquery that we can use only within that query. But what if we find ourselves using the same named subquery in lots of different queries? It would be nice to permanently define a subquery that we can use again and again.

We do this by creating a **view**, which we can then use in all future queries. An easy way to think of this is that the WITH clause creates a temporary view. The syntax for creating a view is:

```
CREATE VIEW database.view_name AS
    SELECT * FROM database.table;
```

We'll be specifying the database name using `[database name].[view or table name]` syntax instead of just `[view or table name]`. You'll need to do this with any views because we have manually attached the database. If you're working with SQLite on your local machine, or in one of our Jupyter projects, you can omit the database name, as in the following example:

```
CREATE VIEW view_name AS
    SELECT * FROM table;
```

Here's an example of how to create a view called customer_2, identical to the existing customer table:

In [47]:
%%sql

CREATE VIEW customer_2 AS
    SELECT * FROM customer;

 * sqlite:///chinook.db
Done.


[]

In [48]:
%%sql

SELECT * FROM customer_2
LIMIT 3;

 * sqlite:///chinook.db
Done.


customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
2,Leonie,Köhler,,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,+49 0711 2842222,,leonekohler@surfeu.de,5
3,François,Tremblay,,1498 rue Bélanger,Montréal,QC,Canada,H2G 1A7,+1 (514) 721-4711,,ftremblay@gmail.com,3


If we wanted to modify this view, and tried to redefine it, we'd get an error:

In [49]:
%%sql

CREATE VIEW customer_2 AS
    SELECT
        customer_id,
        first_name || last_name name,
        phone,
        email,
        support_rep_id
    FROM customer;

 * sqlite:///chinook.db
(sqlite3.OperationalError) table customer_2 already exists
[SQL: CREATE VIEW customer_2 AS
    SELECT
        customer_id,
        first_name || last_name name,
        phone,
        email,
        support_rep_id
    FROM customer;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)


If we wish to redefine a view, we first have to delete, or **drop**, the existing view:

In [50]:
%%sql

DROP VIEW customer_2;

 * sqlite:///chinook.db
Done.


[]
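
The drop-then-create pattern can be sketched end to end as follows. This is a minimal illustration on an in-memory SQLite database with a hypothetical one-row customer table. It also uses `DROP VIEW IF EXISTS`, which SQLite supports and which avoids an error when the view doesn't exist yet.

```python
import sqlite3

# Hypothetical toy customer table with a view defined on top of it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES (1, 'Ada', 'Lovelace');
    CREATE VIEW customer_2 AS SELECT * FROM customer;
""")

# Creating a view whose name is already taken raises an error.
try:
    conn.execute("CREATE VIEW customer_2 AS SELECT customer_id FROM customer;")
    redefine_failed = False
except sqlite3.OperationalError as err:
    redefine_failed = True
    print("Error:", err)

# Drop the old definition first, then create the new one.
conn.executescript("""
    DROP VIEW IF EXISTS customer_2;
    CREATE VIEW customer_2 AS
        SELECT
            customer_id,
            first_name || ' ' || last_name AS name
        FROM customer;
""")

rows = conn.execute("SELECT * FROM customer_2;").fetchall()
print(rows)  # [(1, 'Ada Lovelace')]
```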

We're going to create two views, each a version of the customer table restricted to customers who meet specific criteria. The first is a view of all customers who live in the USA.

In [51]:
%%sql

CREATE VIEW customer_usa AS 
     SELECT * FROM customer
     WHERE country = 'USA';

 * sqlite:///chinook.db
Done.


[]

Once a view is created, you query it exactly as you would a table: you don't need to specify that it's a view, and any SELECT that works on a table works on a view (keeping in mind that in our interface you'll have to use [database name].[view_name]).
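
One detail worth seeing in action: a view stores the query, not a copy of the data, so it always reflects the current contents of the underlying table. The sketch below demonstrates this on a hypothetical toy customer table. It also shows that SQLite rejects an INSERT aimed directly at a view, since SQLite views are read-only.

```python
import sqlite3

# Hypothetical toy data; the view mirrors the customer_usa definition above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, country TEXT);
    INSERT INTO customer VALUES (1, 'Ada', 'USA'), (2, 'Luis', 'Brazil');
    CREATE VIEW customer_usa AS
        SELECT * FROM customer
        WHERE country = 'USA';
""")

select_usa = "SELECT first_name FROM customer_usa ORDER BY customer_id;"
before = conn.execute(select_usa).fetchall()

# Insert into the underlying table; the view picks up the new row automatically.
conn.execute("INSERT INTO customer VALUES (3, 'Grace', 'USA');")
after = conn.execute(select_usa).fetchall()

# Writing through the view itself is not allowed in SQLite.
try:
    conn.execute("INSERT INTO customer_usa VALUES (4, 'Alan', 'USA');")
    view_is_writable = True
except sqlite3.Error:
    view_is_writable = False

print(before, after, view_is_writable)
```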

*Let's create a second view of customers that have purchased more than $90 from our store.*

1. *Create a view called customer_gt_90_dollars:*
    - *The view should contain the columns from customer, in their original order.*
    - *The view should contain only customers who have purchased more than $90 in tracks from the store.*
2. *After the SQL query that creates the view, write a second query to display your newly created view:* `SELECT * FROM chinook.customer_gt_90_dollars;`
    - *Make sure you use a semicolon (;) to indicate the end of each query.*

In [52]:
%%sql

CREATE VIEW customer_gt_90_dollars AS 
    SELECT c.*
    FROM customer c INNER JOIN invoice i ON c.customer_id = i.customer_id
    GROUP BY i.customer_id
    HAVING SUM(i.total) > 90;

 * sqlite:///chinook.db
Done.


[]

In [54]:
%%sql

SELECT * FROM customer_gt_90_dollars LIMIT 4

 * sqlite:///chinook.db
Done.


customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
3,François,Tremblay,,1498 rue Bélanger,Montréal,QC,Canada,H2G 1A7,+1 (514) 721-4711,,ftremblay@gmail.com,3
5,František,Wichterlová,JetBrains s.r.o.,Klanova 9/506,Prague,,Czech Republic,14700,+420 2 4172 5555,+420 2 4172 5555,frantisekw@jetbrains.com,4
6,Helena,Holý,,Rilská 3174/6,Prague,,Czech Republic,14300,+420 2 4177 0449,,hholy@gmail.com,5


## Combining rows with union

We have now created two views: customer_usa and customer_gt_90_dollars. How can we find the customers in each of the following combinations of these two views?

- Customers in the USA **or have** spent more than \$90
- Customers in the USA **and have** spent more than \$90
- Customers in the USA **and have not** spent more than \$90

These scenarios require a different kind of operation, because we want to combine rows rather than columns. Let's start by looking at just the first scenario, where we want to combine rows that exist in either view.

Where regular joins are used to join columns, the union operator is used to join rows from tables and/or views.

![img](img/union.png)

The syntax for the union operator is composed of two or more SELECT statements:

```
[select_statement_one]
UNION
[select_statement_two]
```

Rather than using the ON keyword, UNION requires that the statements before and after it have the **same number of columns**, with **compatible types in order**. We'll learn more about types in a later mission, but as an example, FLOAT and INT are compatible types, while FLOAT and TEXT are not.

![img](img/compat.png)

Because we created customer_usa and customer_gt_90_dollars with identical column names, order, and type as customer, we can safely use UNION.
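
Here is a small sketch of these rules on two hypothetical two-column tables. It shows that UNION returns each distinct row once, contrasts that with UNION ALL (which keeps duplicates and is not otherwise covered here), and shows that SQLite raises an error when the column counts differ.

```python
import sqlite3

# Hypothetical tables a and b; the row (2, 'y') appears in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES (2, 'y'), (3, 'z');
""")

# UNION removes the duplicate row; UNION ALL keeps it.
union_rows = conn.execute(
    "SELECT * FROM a UNION SELECT * FROM b ORDER BY id;"
).fetchall()
union_all_rows = conn.execute(
    "SELECT * FROM a UNION ALL SELECT * FROM b ORDER BY id;"
).fetchall()

# Two columns on the left, one on the right: the query is rejected.
try:
    conn.execute("SELECT id, name FROM a UNION SELECT id FROM b;")
    mismatch_rejected = False
except sqlite3.Error as err:
    mismatch_rejected = True
    print("Error:", err)

print(union_rows)      # [(1, 'x'), (2, 'y'), (3, 'z')]
print(union_all_rows)  # [(1, 'x'), (2, 'y'), (2, 'y'), (3, 'z')]
```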

To achieve our first scenario (identify customers who are in the USA **or have spent** more than $90), the two SELECT statements will be very simple - we can just select all columns and rows from each of the two views.

*Use UNION to produce a table of customers who are in the USA or have spent more than \$90, using the customer_usa and customer_gt_90_dollars views:*
- The result should contain the columns from customer, in their original order.


In [57]:
%%sql

SELECT * FROM customer_usa

UNION

SELECT * FROM customer_gt_90_dollars

LIMIT 3;

 * sqlite:///chinook.db
Done.


customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
3,François,Tremblay,,1498 rue Bélanger,Montréal,QC,Canada,H2G 1A7,+1 (514) 721-4711,,ftremblay@gmail.com,3
5,František,Wichterlová,JetBrains s.r.o.,Klanova 9/506,Prague,,Czech Republic,14700,+420 2 4172 5555,+420 2 4172 5555,frantisekw@jetbrains.com,4


## Combining rows using intersect and except

The three scenarios we discussed at the start of the previous section were:

- Customers who are in the USA **or have** spent more than \$90
- Customers who are in the USA **and have** spent more than \$90
- Customers who are in the USA **and have not** spent more than \$90

We just successfully used UNION for the first, but what about the other two? There are two other operators that will help us with these - **intersect** and **except**. Combined, these three operators allow us to perform set operations in SQL. Here's a diagram and explanation of how these compare with union.

![img](img/sets.png)

![img](img/table_sets.png)
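
The same comparison can be sketched in code. The example below applies all three operators to two tiny, hypothetical single-column tables (stand-ins for the two views) so the differences are easy to see. The `run` helper is just a convenience for this illustration.

```python
import sqlite3

# Hypothetical stand-ins for the two views: customer 3 appears in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usa (customer_id INTEGER);
    CREATE TABLE gt_90 (customer_id INTEGER);
    INSERT INTO usa VALUES (1), (2), (3);
    INSERT INTO gt_90 VALUES (3), (4);
""")

def run(operator):
    """Combine the two tables with the given set operator and return the ids."""
    query = (
        f"SELECT customer_id FROM usa {operator} "
        "SELECT customer_id FROM gt_90 ORDER BY customer_id;"
    )
    return [row[0] for row in conn.execute(query)]

results = {op: run(op) for op in ("UNION", "INTERSECT", "EXCEPT")}

print(results["UNION"])      # [1, 2, 3, 4]  in either table
print(results["INTERSECT"])  # [3]           in both tables
print(results["EXCEPT"])     # [1, 2]        in the first but not the second
```

Note that EXCEPT is not symmetric: swapping the two SELECT statements would return the rows that are only in the second table.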

Both the syntax and the rules about column number and ordering of similar types are the same for INTERSECT and EXCEPT as they are for UNION. This means that identifying customers who are in the USA and have spent more than $90 can be done with the following query:

In [59]:
%%sql

SELECT * FROM customer_usa

INTERSECT

SELECT * FROM customer_gt_90_dollars

LIMIT 5;

 * sqlite:///chinook.db
Done.


customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
17,Jack,Smith,Microsoft Corporation,1 Microsoft Way,Redmond,WA,USA,98052-8300,+1 (425) 882-8080,+1 (425) 882-8081,jacksmith@microsoft.com,5
20,Dan,Miller,,541 Del Medio Avenue,Mountain View,CA,USA,94040-111,+1 (650) 644-3358,,dmiller@comcast.com,4
21,Kathy,Chase,,801 W 4th Street,Reno,NV,USA,89503,+1 (775) 223-7665,,kachase@hotmail.com,5
22,Heather,Leacock,,120 S Orange Ave,Orlando,FL,USA,32801,+1 (407) 999-7788,,hleacock@gmail.com,4


Identifying customers who are in the USA and have not spent more than $90 can be done with the following query:

In [62]:
%%sql

SELECT * FROM customer_usa

EXCEPT

SELECT * FROM customer_gt_90_dollars

LIMIT 5;

 * sqlite:///chinook.db
Done.


customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
16,Frank,Harris,Google Inc.,1600 Amphitheatre Parkway,Mountain View,CA,USA,94043-1351,+1 (650) 253-0000,+1 (650) 253-0000,fharris@google.com,4
18,Michelle,Brooks,,627 Broadway,New York,NY,USA,10012-2612,+1 (212) 221-3546,+1 (212) 221-4679,michelleb@aol.com,3
19,Tim,Goyer,Apple Inc.,1 Infinite Loop,Cupertino,CA,USA,95014,+1 (408) 996-1010,+1 (408) 996-1011,tgoyer@apple.com,3
23,John,Gordon,,69 Salem Street,Boston,MA,USA,2113,+1 (617) 522-1333,,johngordon22@yahoo.com,4
24,Frank,Ralston,,162 E Superior Street,Chicago,IL,USA,60611,+1 (312) 332-3232,,fralston@gmail.com,3


The results of UNION, INTERSECT and EXCEPT conform to the 'everything in SQL is a table' concept we learned in the SQL fundamentals course. The results of these operations can be used in subqueries and joined to other tables for more complex analysis. Let's look at a scenario where we'll need to join the results of a set operation to another table:

*Write a query that works out how many customers that are in the USA and have purchased more than \$90 are assigned to each sales support agent. For the purposes of this exercise, no two employees have the same name.*

- *Your result should have the following columns, in order:*

    - *employee_name - The first_name and last_name of the employee separated by a space, eg Luke Skywalker.*
    - *customers_usa_gt_90 - The number of customers assigned to that employee that are both from the USA and have purchased more than $90 worth of tracks.*
- *The result should include all employees with the title "Sales Support Agent", but not employees with any other title.*
- *Order your results by the employee_name column.*

In [65]:
%%sql

WITH usa_90 AS (
    SELECT * FROM customer_usa
    INTERSECT
    SELECT * FROM customer_gt_90_dollars)

-- Filter on title with WHERE (before grouping) rather than HAVING,
-- since title is a plain column and not an aggregate.
SELECT e.first_name || ' ' || e.last_name employee_name,
       COUNT(u.customer_id) customers_usa_gt_90
    FROM employee e LEFT JOIN usa_90 u ON u.support_rep_id = e.employee_id
    WHERE e.title = 'Sales Support Agent'
    GROUP BY e.employee_id
    ORDER BY 1

 * sqlite:///chinook.db
Done.


employee_name,customers_usa_gt_90
Jane Peacock,0
Margaret Park,2
Steve Johnson,2
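
The zero for Jane Peacock comes from pairing LEFT JOIN with COUNT on a column from the right-hand table. The LEFT JOIN keeps employees with no matching customers, and COUNT(column) skips the NULLs that fill those unmatched rows. The sketch below shows the effect on hypothetical toy tables, with COUNT(*) alongside for contrast.

```python
import sqlite3

# Hypothetical data: Rep One has two customers, Rep Two has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, support_rep_id INTEGER);
    INSERT INTO employee VALUES (1, 'Rep One'), (2, 'Rep Two');
    INSERT INTO customer VALUES (10, 1), (11, 1);
""")

query = """
SELECT
    e.name,
    COUNT(c.customer_id) AS matched_customers,  -- ignores NULLs from unmatched rows
    COUNT(*) AS joined_rows                     -- counts the NULL-filled row too
FROM employee e
LEFT JOIN customer c ON c.support_rep_id = e.employee_id
GROUP BY e.employee_id
ORDER BY e.name;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('Rep One', 2, 2), ('Rep Two', 0, 1)]
```

With an INNER JOIN instead, Rep Two would disappear from the result entirely.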


## Multiple named subqueries

When we learned about WITH, we said WITH clauses allow you to define one or more named subqueries, but we didn't show you the syntax for creating more than one named subquery. To do this, you use a single WITH clause and multiple, comma-separated alias/subquery pairs:

```
WITH
    [alias_name] AS ([subquery]),
    [alias_name_2] AS ([subquery_2]),
    [alias_name_3] AS ([subquery_3])

SELECT [main_query]
```

While each subquery can be independent, we can actually use the result of the first subquery in subsequent subqueries, and so on. **This can be a useful way of building readable complex queries.**

Let's look at a simple example where we create three named subqueries that build on each other.



In [66]:
%%sql

WITH
    usa AS
        (
        SELECT * FROM customer
        WHERE country = 'USA'
        ),
    last_name_g AS
        (
        SELECT * FROM usa
        WHERE last_name LIKE 'G%'
        ),
    state_ca AS
        (
        SELECT * FROM last_name_g
        WHERE state = 'CA'
        )

SELECT
    first_name,
    last_name,
    country,
    state
FROM state_ca

 * sqlite:///chinook.db
Done.


first_name,last_name,country,state
Tim,Goyer,USA,CA
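
Because each named subquery only narrows the one before it, the chain should return the same rows as a single query that combines all three conditions. Here is a sketch that checks this on a hypothetical four-row customer table.

```python
import sqlite3

# Hypothetical customers; only Ann Gray satisfies all three conditions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (first_name TEXT, last_name TEXT, country TEXT, state TEXT);
    INSERT INTO customer VALUES
        ('Ann', 'Gray',  'USA',   'CA'),
        ('Bob', 'Gold',  'USA',   'NY'),
        ('Cy',  'Hall',  'USA',   'CA'),
        ('Di',  'Gomez', 'Chile', NULL);
""")

# Three named subqueries, each filtering the result of the previous one.
chained = """
WITH
    usa AS (SELECT * FROM customer WHERE country = 'USA'),
    last_name_g AS (SELECT * FROM usa WHERE last_name LIKE 'G%'),
    state_ca AS (SELECT * FROM last_name_g WHERE state = 'CA')
SELECT first_name, last_name, country, state
FROM state_ca;
"""

# The same filters combined in one WHERE clause.
single = """
SELECT first_name, last_name, country, state
FROM customer
WHERE country = 'USA'
  AND last_name LIKE 'G%'
  AND state = 'CA';
"""

chained_rows = conn.execute(chained).fetchall()
single_rows = conn.execute(single).fetchall()
print(chained_rows)  # [('Ann', 'Gray', 'USA', 'CA')]
```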


In reality, we'd usually write this as a single query using multiple AND operators in our WHERE clause, but it helps us demonstrate how multiple subqueries can be defined with a single WITH clause. Let's use a more 'real life' example to gather total sales data on customers from India.

*Write a query that uses multiple named subqueries in a WITH clause to gather total sales data on customers from India:*

- *The first named subquery should return all customers that are from India.*
- *The second named subquery should calculate the sum total for every customer.*
- *The main query should join the two named subqueries, resulting in the following final columns:*
    - *customer_name - The first_name and last_name of the customer, separated by a space, eg Luke Skywalker.*
    - *total_purchases - The total amount spent on purchases by that customer.*
- *The results should be sorted by the customer_name column in alphabetical order.*

In [68]:
%%sql

WITH c_india AS (SELECT * FROM customer WHERE country = 'India'),
     total_customer AS (SELECT c.customer_id, 
                               SUM(i.total) total
                        FROM customer c INNER JOIN invoice i ON c.customer_id = i.customer_id
                        GROUP BY 1)
    
SELECT c_i.first_name || ' ' || c_i.last_name customer_name,
       t_c.total total_purchases
    FROM c_india c_i INNER JOIN total_customer t_c ON c_i.customer_id = t_c.customer_id
    ORDER BY 1

 * sqlite:///chinook.db
Done.


customer_name,total_purchases
Manoj Pareek,111.87
Puja Srivastava,71.28


## Challenge: each country's best customer

It's time to bring together everything we've learned in the course so far to write a complex query. This query will be a bit harder than anything we've written so far, so don't be discouraged if this challenge takes you a while. Write your query in steps, running it as you go to check your results; this will make troubleshooting much easier.

We will be writing a query to **find the customer from each country that has spent the most money at our store**. In our database there are no 'ties' for best customer in each country, and we will ignore this case for the exercise.

Our final results will look like this. For expected results, we rounded to two decimal places; however, when running your query, don't worry about rounding the total_purchased column.

![img](img/result.png)


To help you out, the query you will write will include:

- One or more named subqueries defined in a WITH clause
- Aggregate functions like SUM() and MAX()
- Several INNER JOINs
- A subquery to define a column
- GROUP BY and ORDER BY clauses

Remember that there are multiple ways to write this query, and the list above is based on the approach we took in our solution.

*Create a query to find the customer from each country that has spent the most money at our store, ordered alphabetically by country. Your query should return the following columns, in order:*

- *country - The name of each country that we have a customer from.*
- *customer_name - The first_name and last_name of the customer from that country with the most total purchases, separated by a space, eg Luke Skywalker.*
- *total_purchased - The total dollar amount that customer has purchased.*

In [81]:
%%sql

WITH
    customer_country_purchases AS
        (
         SELECT
             i.customer_id,
             c.country,
             SUM(i.total) total_purchases
         FROM invoice i
         INNER JOIN customer c ON i.customer_id = c.customer_id
         GROUP BY 1
        ),
    country_max_purchase AS
        (
         SELECT
             country,
             MAX(total_purchases) max_purchase
         FROM customer_country_purchases
         GROUP BY 1
        ),
    country_best_customer AS
        (
         SELECT
            cmp.country,
            cmp.max_purchase,
            (
             SELECT ccp.customer_id
             FROM customer_country_purchases ccp
             WHERE ccp.country = cmp.country 
                AND cmp.max_purchase = ccp.total_purchases
            ) customer_id
         FROM country_max_purchase cmp
        )
       
                 
SELECT cbc.country country,
       c.first_name || ' ' || c.last_name customer_name,
       cbc.max_purchase total_purchased
FROM customer c INNER JOIN country_best_customer cbc 
      ON cbc.customer_id = c.customer_id
ORDER BY country ASC
     

 * sqlite:///chinook.db
Done.


country,customer_name,total_purchased
Argentina,Diego Gutiérrez,39.6
Australia,Mark Taylor,81.18
Austria,Astrid Gruber,69.3
Belgium,Daan Peeters,60.38999999999999
Brazil,Luís Gonçalves,108.89999999999998
Canada,François Tremblay,99.99
Chile,Luis Rojas,97.02
Czech Republic,František Wichterlová,144.54000000000002
Denmark,Kara Nielsen,37.61999999999999
Finland,Terhi Hämäläinen,79.2
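
As noted earlier, there is more than one way to write this query. The sketch below shows one alternative, not the approach used in the solution above: instead of a subquery that defines the customer_id column, it joins the per-customer totals back to the per-country maximums. It runs on hypothetical toy customer and invoice tables. One difference in behavior is that if two customers tied for a country's maximum, this version would return both.

```python
import sqlite3

# Hypothetical data: Ann (15.0) beats Bob (12.0) in the USA; Cy is alone in Chile.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, country TEXT
    );
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES
        (1, 'Ann', 'Gray', 'USA'),
        (2, 'Bob', 'Gold', 'USA'),
        (3, 'Cy',  'Hall', 'Chile');
    INSERT INTO invoice VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 12.0), (4, 3, 7.0);
""")

query = """
WITH
    customer_country_purchases AS
        (
         SELECT
             c.customer_id,
             c.country,
             c.first_name || ' ' || c.last_name customer_name,
             SUM(i.total) total_purchases
         FROM invoice i
         INNER JOIN customer c ON i.customer_id = c.customer_id
         GROUP BY c.customer_id
        ),
    country_max_purchase AS
        (
         SELECT
             country,
             MAX(total_purchases) max_purchase
         FROM customer_country_purchases
         GROUP BY country
        )

SELECT
    ccp.country,
    ccp.customer_name,
    ccp.total_purchases total_purchased
FROM customer_country_purchases ccp
INNER JOIN country_max_purchase cmp
        ON cmp.country = ccp.country
       AND cmp.max_purchase = ccp.total_purchases
ORDER BY ccp.country;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('Chile', 'Cy Hall', 7.0), ('USA', 'Ann Gray', 15.0)]
```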
