# Intermediate Lesson on Geospatial Data 

## Structured Query Language (SQL)

<strong>Lesson Developers:</strong> Jayakrishnan Ajayakumar, Shana Crosson, Mohsen Ahmadkhani

#### Part 3 of 5

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
# sys.path.append('supplementary')
import hourofci
try:
    import os
    os.chdir('supplementary')
except OSError:
    # The folder may not exist relative to the current working directory; carry on.
    pass

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

## But how do we talk to the DBMS?
We now know that the DBMS acts as an interface between us and the database and translates our requests into a form the database can understand. We use a special language called **SQL** to fetch data from the database.

>**Structured Query Language (SQL)** is a programming language designed to get information out of, and put information into, a relational database.

<img src = "supplementary/images/sql_funny.png" width = "600px">


Run the cell below to see two databases and corresponding tables. Use the dropdowns to select the database and tables. What do you notice about the data types and data content in each table?

In [None]:
import displaydatabases
from questiondisplay import QueryWindow
disp = displaydatabases.Display()
disp.displayDatabases

## Talking to a Database using SQL

### 1. Selecting records from one or more tables in a database

>**Select statement** is used to **retrieve data from a Database** 

The general **syntax*** for the **select statement** is

```mysql
select column1,column2..columnN from table_name
```

where column1, column2, ..., columnN are the columns that you want to select from the table named table_name.

*The syntax of a programming language is the set of rules that defines which combinations of symbols and characters are valid and what they mean to the computer.

#### 1.1 Select All Columns from a Table

The syntax for selecting all columns from a table is

```mysql
select * from table_name
```
The <b>*</b> symbol indicates that we want all the columns from the table.

Let's look at a concrete example. 




<img src = "supplementary/images/SELECT_STAR.png" width = "400px">

We have a table BigCats which contains five records (rows) and has four columns. The query
```mysql
select * from BigCats
```
will retrieve all the rows and columns from the BigCats table.
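
If you would like to try this query outside the lesson widgets, here is a minimal sketch using Python's built-in `sqlite3` module. The rows and the column names `id`, `continent`, and `weight_kg` are invented for illustration (only the table name and the `name` column come from the lesson), so the output will not match the figure exactly.

```python
import sqlite3

# In-memory database; the rows and most column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BigCats (id INTEGER, name TEXT, continent TEXT, weight_kg REAL)"
)
conn.executemany(
    "INSERT INTO BigCats VALUES (?, ?, ?, ?)",
    [
        (1, "Lion", "Africa", 190.0),
        (2, "Tiger", "Asia", 220.0),
        (3, "Jaguar", "South America", 95.0),
        (4, "Leopard", "Africa", 60.0),
        (5, "Snow Leopard", "Asia", 45.0),
    ],
)

# select * returns every column of every row.
cursor = conn.execute("select * from BigCats")
column_names = [col[0] for col in cursor.description]
all_rows = cursor.fetchall()

print(column_names)
for row in all_rows:
    print(row)

conn.close()
```

Each row comes back as a tuple with one value per column, in the order the columns were defined.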

Now try the exercises below, which are based on the select * query.

**1. Select all columns from actor table**

In [None]:
from IPython.display import display
import displaydatabases
from questiondisplay import QueryWindow

display(QueryWindow(1).display())

**2. Select all columns from staff table**

In [None]:
display(QueryWindow(2).display())

#### 1.2 Select a subset of columns from a table

The syntax for selecting a subset of columns from a table is

```mysql
select column1,column2...columnN from table_name
```
Let's look at a concrete example.


<img src = "supplementary/images/SELECT_COLUMN.png" width = "500px">

The query
```mysql
select name from BigCats
```
will retrieve the single column 'name' and the 5 records associated with it.

Try out more examples as given in the next slides. 

**3. Select staff_id,first_name and last_name from staff table**

In [None]:
display(QueryWindow(3).display())

**4. Select first_name and last_name from actor table**

In [None]:
display(QueryWindow(4).display())

#### 1.3 Select distinct values from a column

The **syntax** for **select distinct statement** is 

```mysql
select distinct column1,column2..columnN from table_name
```

where column1, column2, ..., columnN are the columns that you want to select from the table named table_name. Only distinct rows are returned: if you list several columns, each distinct combination of their values appears once.

Let's see a concrete example.


<img src = "supplementary/images/distinct.png" width = "500px">

The query
```mysql
select distinct state from Salary
```
will retrieve the unique states from the state column.
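
The sketch below shows select distinct on one column and on two columns, using Python's built-in `sqlite3` module. The Salary rows are made up for illustration (they are not the lesson's actual table), and only the column names `Job_Title`, `state`, and `salary` are taken from the queries in this lesson.

```python
import sqlite3

# Made-up Salary rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (Job_Title TEXT, state TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [
        ("Doctor", "New York", 250000),
        ("Computer Scientist", "New York", 150000),
        ("Doctor", "Ohio", 180000),
        ("Computer Scientist", "California", 210000),
        ("Doctor", "California", 190000),
        ("Computer Scientist", "Texas", 110000),
    ],
)

# One column: each state appears once, however many rows mention it.
unique_states = conn.execute("select distinct state from Salary").fetchall()

# Two columns: distinct applies to the (state, Job_Title) combination,
# so a state can appear again when it is paired with a different job title.
unique_pairs = conn.execute(
    "select distinct state, Job_Title from Salary"
).fetchall()

print(sorted(unique_states))
print(sorted(unique_pairs))

conn.close()
```

With this sample data the first query returns four rows and the second returns six, because New York and California each appear with two different job titles.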

Try out the following example!

**5. Select the unique set of cities from employees table**

In [None]:
display(QueryWindow(5).display())

### 2. Filter Records with where clause

The **where clause** is used to extract records that meet a specified condition. It is one of the most powerful features of an SQL query.

The syntax for the where clause is

```mysql
SELECT column1, column2, ..columnN
FROM table_name
WHERE condition;
```

Let's look at some concrete examples


<img src = "supplementary/images/whereI.png" width = "500px">
The query


```mysql
select * from Salary where salary>200000
```
will retrieve every column (since we are using \*) of the rows that match the condition that salary is greater than 200000 (2 rows here).

Let's look at another example


<img src = "supplementary/images/whereII.png" width = "500px">

The query
```mysql
select * from Salary where Job_Title = 'Doctor'
```
will retrieve every column (since we are using \*) of the rows where Job_Title is 'Doctor'.

**Note:** since Job_Title is of data type Text, we enclose the value in single quotes in the query. Since salary is numerical, it needs no quotes.
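
Here is a small sketch of both kinds of condition, run with Python's built-in `sqlite3` module. The Salary rows are invented for illustration; they were chosen so that the number of salaries above 200000 agrees with the example above, but they are not the lesson's actual data.

```python
import sqlite3

# Made-up Salary rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (Job_Title TEXT, state TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [
        ("Doctor", "New York", 250000),
        ("Computer Scientist", "New York", 150000),
        ("Doctor", "Ohio", 180000),
        ("Computer Scientist", "California", 210000),
        ("Doctor", "California", 190000),
        ("Computer Scientist", "Texas", 110000),
    ],
)

# Numeric comparison: the value is written without quotes.
high_earners = conn.execute(
    "select * from Salary where salary > 200000"
).fetchall()

# Text comparison: the value is written inside single quotes.
doctors = conn.execute(
    "select * from Salary where Job_Title = 'Doctor'"
).fetchall()

print(high_earners)
print(doctors)

conn.close()
```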

Try out more examples as given below

**6. From film table select films having length more than 100 minutes**

In [None]:
display(QueryWindow(6).display())

#### 2.1 AND, OR, NOT for filtering based on multiple conditions 

The AND and OR operators are used for filtering records based on more than one condition.  <br/>
The AND operator displays a record if **all the conditions separated by AND are True**

The **syntax for AND operator** is 

```mysql
SELECT column1, column2, ..columnN
FROM table_name
WHERE condition1 AND condition2 AND condition3...
```

The OR operator displays a record if **any of the conditions separated by OR is True**.

The **syntax for OR operator** is 

```mysql
SELECT column1, column2, ..columnN
FROM table_name
WHERE condition1 OR condition2 OR condition3...
```

The NOT operator displays a record if the **condition is not true**.

The **syntax for NOT operator** is 

```mysql
SELECT column1, column2, ..columnN
FROM table_name
WHERE NOT condition;
```
Let's look at some examples.


<img src = "supplementary/images/AND.png" width = "700px">

The query
```mysql
select * from Salary where state = 'New York' and salary>200000
```
will retrieve every column (since we are using \*) of the rows where state is 'New York' (again, this is Text) and salary is greater than 200000 (we have only one such record).


<img src = "supplementary/images/OR.png" width = "700px">
In this example we are using the OR operator

The query
```mysql
select * from Salary where state = 'New York' or state = 'Ohio'
```
will retrieve all rows and columns that match the condition that state is 'New York' **OR** state is 'Ohio'. We have three such records (2 from New York and 1 from Ohio).


<img src = "supplementary/images/NOT.png" width = "700px">

In this example we are using the NOT operator

The query
```mysql
select * from Salary where not Job_Title = 'Doctor'
```
will retrieve all rows and columns that match the condition that Job_Title is not 'Doctor'. We have three such records.
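
The three operators can be tried together in the sketch below, which uses Python's built-in `sqlite3` module and made-up Salary rows (chosen to agree with the record counts quoted above, but not the lesson's actual table). The last two queries are an addition: they show that when AND and OR are mixed, AND is evaluated first unless parentheses say otherwise.

```python
import sqlite3

# Made-up Salary rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (Job_Title TEXT, state TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [
        ("Doctor", "New York", 250000),
        ("Computer Scientist", "New York", 150000),
        ("Doctor", "Ohio", 180000),
        ("Computer Scientist", "California", 210000),
        ("Doctor", "California", 190000),
        ("Computer Scientist", "Texas", 110000),
    ],
)


def run(query):
    """Run a query and return all of its rows as a list of tuples."""
    return conn.execute(query).fetchall()


both = run("select * from Salary where state = 'New York' and salary > 200000")
either = run("select * from Salary where state = 'New York' or state = 'Ohio'")
not_doctor = run("select * from Salary where not Job_Title = 'Doctor'")

# AND binds more tightly than OR, so this reads as
# state = 'Ohio' OR (state = 'New York' AND salary > 200000).
without_parens = run(
    "select * from Salary "
    "where state = 'Ohio' or state = 'New York' and salary > 200000"
)

# Parentheses change the grouping to (Ohio OR New York) AND salary > 200000.
with_parens = run(
    "select * from Salary "
    "where (state = 'Ohio' or state = 'New York') and salary > 200000"
)

print(len(both), len(either), len(not_doctor))
print(len(without_parens), len(with_parens))

conn.close()
```

When a condition mixes AND and OR, adding parentheses makes the intended grouping explicit and is usually the safer habit.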

Try out the examples given below

**9. Retrieve employees with title as Sales Manager *and* city as Calgary**

In [None]:
display(QueryWindow(9).display())

**10. Retrieve all films from film table with length greater than 100 minutes and rating equal to PG**

In [None]:
display(QueryWindow(10).display())

**11. Retrieve all films from film table with length greater than 100 minutes *or* rating equal to PG**

In [None]:
display(QueryWindow(11).display())

**12. Select all invoices from invoices table where BillingCountry is either Canada or USA**

In [None]:
display(QueryWindow(12).display())

**13. Select all invoices from invoices table where Total is greater than 1 and less than 5, or greater than 10 and less than 100**

In [None]:
display(QueryWindow(13).display())

**14. Select all invoices from invoices table that have BillingCountry other than Canada**

In [None]:
display(QueryWindow(14).display())

### 3. Order records using Order By

The Order By keyword is used to **sort query results in ascending or descending order**.

The syntax for Order By is

```mysql
SELECT column1, column2, ..columnN
FROM table_name
ORDER BY column1 ASC|DESC, column2 ASC|DESC, ...;
```

Let's look at an example.


<img src = "supplementary/images/desc.png" width = "700px">

The query
```mysql
select * from Salary order by salary desc
```
will retrieve all rows and columns and order them by salary in decreasing order (highest first).

If you want to order the table by salary in ascending order (lowest first), the query will be

```mysql
select * from Salary order by salary asc
```

or you can even write

```mysql
select * from Salary order by salary
```

because ascending is the default order.
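
The following sketch checks these orderings with Python's built-in `sqlite3` module. The Salary rows are made up for illustration. The last query is an extra example with two sort keys, where each column gets its own direction.

```python
import sqlite3

# Made-up Salary rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (Job_Title TEXT, state TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [
        ("Doctor", "New York", 250000),
        ("Computer Scientist", "New York", 150000),
        ("Doctor", "Ohio", 180000),
        ("Computer Scientist", "California", 210000),
        ("Doctor", "California", 190000),
        ("Computer Scientist", "Texas", 110000),
    ],
)


def salaries(query):
    """Run a one-column query and return the values as a flat list."""
    return [row[0] for row in conn.execute(query)]


descending = salaries("select salary from Salary order by salary desc")
ascending = salaries("select salary from Salary order by salary asc")
default_order = salaries("select salary from Salary order by salary")

# Two sort keys: states from A to Z, then highest salary first within each state.
by_state_then_salary = conn.execute(
    "select state, salary from Salary order by state asc, salary desc"
).fetchall()

print(descending)
print(ascending == default_order)
print(by_state_then_salary)

conn.close()
```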

Try out the examples given below

**15. Select first_name and last_name from actor table and sort by actor first_name ascending**

In [None]:
display(QueryWindow(15).display())

**16. Sort payment table by amount in descending order**

In [None]:
display(QueryWindow(16).display())

**17. Select first_name and last_name from actor table and sort by actor first_name ascending and actor last_name descending**

In [None]:
display(QueryWindow(17).display())

### 4. Finding Minimum, Maximum, Average, Sum and Count for Columns

The aggregate functions **min(), max(), avg(), sum(), and count()** can be used to find the minimum, maximum, average, sum and count of a selected column.

Syntax for **min()**

```mysql
SELECT MIN(column_name) FROM table_name
```

Syntax for **max()**

```mysql
SELECT MAX(column_name) FROM table_name
```

Syntax for **avg()**

```mysql
SELECT AVG(column_name) FROM table_name
```

Syntax for **sum()**

```mysql
SELECT SUM(column_name) FROM table_name
```

Syntax for **count()**

```mysql
SELECT COUNT(column_name) FROM table_name
```

Let's look at some concrete examples


<img src = "supplementary/images/count.png" width = "700px">

count(\*) returns the number of rows returned by the select statement. In this example the query

```mysql
SELECT count(*) FROM Salary where salary>200000
```

will return a single row with one column, named count(\*), whose value is 2 (because only 2 records have a salary greater than 200000). If the query were

```mysql
SELECT count(*) FROM Salary where state = 'Ohio'
```

the value would be 1.

Now let's look at an example of the min() function.


<img src = "supplementary/images/min.png" width = "700px">

This query

```mysql
SELECT min(salary) FROM Salary
```

returns the minimum value from the salary column in the Salary table (which is 110000).


<img src = "supplementary/images/max.png" width = "700px">
Similarly, the max() function returns the maximum value of a column (here salary) in a table (here Salary).


<img src = "supplementary/images/avg.png" width = "700px">

As the name implies, the avg() function returns the average of the values in a column. For example, the query

```mysql
SELECT avg(salary) FROM Salary where Job_Title = 'Computer Scientist'
```

returns the average salary of the records where the Job_Title is 'Computer Scientist'.


<img src = "supplementary/images/sum.png" width = "700px">

The sum() function returns the sum of the values in the particular column. The query

```mysql
SELECT sum(salary) FROM Salary where Job_Title = 'Doctor'
```
returns the sum of the values in the salary column for the records where the Job_Title is 'Doctor'.
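
All five aggregate functions can be tried in one go with the sketch below, which uses Python's built-in `sqlite3` module. The Salary rows are invented (chosen so that the count and the minimum agree with the examples above), and the helper `scalar` is a hypothetical convenience function, not part of any library.

```python
import sqlite3

# Made-up Salary rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (Job_Title TEXT, state TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [
        ("Doctor", "New York", 250000),
        ("Computer Scientist", "New York", 150000),
        ("Doctor", "Ohio", 180000),
        ("Computer Scientist", "California", 210000),
        ("Doctor", "California", 190000),
        ("Computer Scientist", "Texas", 110000),
    ],
)


def scalar(query):
    """Run a query that returns one row with one column and return that value."""
    return conn.execute(query).fetchone()[0]


high_count = scalar("SELECT count(*) FROM Salary where salary > 200000")
lowest = scalar("SELECT min(salary) FROM Salary")
highest = scalar("SELECT max(salary) FROM Salary")
cs_average = scalar(
    "SELECT avg(salary) FROM Salary where Job_Title = 'Computer Scientist'"
)
doctor_total = scalar("SELECT sum(salary) FROM Salary where Job_Title = 'Doctor'")

print(high_count, lowest, highest)
print(round(cs_average, 2), doctor_total)

conn.close()
```

An aggregate query without a group by clause always returns a single row, which is why the helper reads only the first row and first column.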

Try out the examples to get a better understanding of the aggregate functions

**18. Select minimum amount from payment table.**

In [None]:
display(QueryWindow(18).display())

### 5. Grouping Records Together with the Group By Statement

The Group By statement is used to **arrange identical data into groups**. The group by clause follows the where clause (if present) and precedes the order by clause (if present). For example, you might want to total the invoices by country, or count the employees holding each title (how many General Managers, Sales Managers, etc.).

The Group By statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more columns.

The syntax for the Group By statement is

```mysql
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s)
```

The where condition and the order by clause are optional and depend on the use case.

Let's look at some examples


<img src = "supplementary/images/groupbycount.png" width = "700px">

In this example we are grouping by the column State and then applying the aggregate function count()

The query 

```mysql
SELECT State,count(*) from Salary group by State
```

returns the number of records for each state in the table. You can think of count(\*) as being applied to each group of rows that share the same state. We have four different states: two with a count of 2 and two with a count of 1.

**Point to Ponder**

This gives just the state name and the count. What if we were able to map the counts? Then we could see whether there are any spatial patterns. This is the power of spatial data: the ability to map gives you the flexibility to look at the data from a completely new perspective. We will look at how this is possible in the next chapter.


<img src = "supplementary/images/groupbyavg.png" width = "700px">

In this query

```mysql
SELECT Job_Title,avg(salary) from Salary group by Job_Title
```
we are calculating the average salary for each Job_Title. We have two Job_Title values (Computer Scientist and Doctor).


<img src = "supplementary/images/groupbysum.png" width = "700px">

This query

```mysql
SELECT Job_Title,sum(salary) from Salary group by Job_Title
```

is very similar to the avg() query, but here we are calculating the sum of salaries (which might not be a useful result in reality).


<img src = "supplementary/images/groupbymax.png" width = "700px">

By this query
```mysql
SELECT Job_Title,max(salary) from Salary group by Job_Title
```
we select the maximum salary for each Job_Title.


<img src = "supplementary/images/groupbymin.png" width = "700px">

and by this query

```mysql
SELECT Job_Title,min(salary) from Salary group by Job_Title
```
we select the minimum salary for each Job_Title.
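
To see grouping in action, here is a sketch using Python's built-in `sqlite3` module. The Salary rows are made up; they were chosen so that the per-state counts agree with the description above (two states with 2 records, two states with 1), but the state names other than New York and Ohio, and all salary values, are assumptions.

```python
import sqlite3

# Made-up Salary rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (Job_Title TEXT, state TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [
        ("Doctor", "New York", 250000),
        ("Computer Scientist", "New York", 150000),
        ("Doctor", "Ohio", 180000),
        ("Computer Scientist", "California", 210000),
        ("Doctor", "California", 190000),
        ("Computer Scientist", "Texas", 110000),
    ],
)

# Each query returns one row per group: (group value, aggregate value).
state_counts = dict(
    conn.execute("SELECT State, count(*) from Salary group by State").fetchall()
)
avg_by_title = dict(
    conn.execute(
        "SELECT Job_Title, avg(salary) from Salary group by Job_Title"
    ).fetchall()
)

print(state_counts)
print(avg_by_title)

conn.close()
```

Turning the two-column result into a dictionary is just a convenience here; it makes it easy to look up the aggregate value for a particular group.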

Now you can try out the queries given below 

**26. Select sum Total of invoices for each BillingCountry**

In [None]:
display(QueryWindow(26).display())

**27. Select number of invoices for each BillingCountry**

In [None]:
display(QueryWindow(27).display())

**28. Select average length of films from film table grouped on rating.**

In [None]:
display(QueryWindow(28).display())

**29. Select maximum rental_rate of films from film table grouped on rating.**

In [None]:
display(QueryWindow(29).display())

**30. Select average rental_rate for each rating for each release_year.**

In [None]:
display(QueryWindow(30).display())

**31. Select total number of invoices for each BillingCity for the BillingCountry Germany.**

In [None]:
display(QueryWindow(31).display())

**32. Select sum total of Total for invoices for each BillingCity for the BillingCountry Germany and order the records by the sum total in descending order.**

In [None]:
display(QueryWindow(32).display())

### 6. Aliases for Providing Temporary Names

Aliases are used to give a **table or a column in a table a temporary name**. Most of the time aliases are used to make the **query more readable**. Aliases **exist only for the duration of the query**.

The syntax for a column alias is

```mysql
SELECT column_name AS alias_name
FROM table_name;
```

And the syntax for a table alias is

```mysql
SELECT column_name
FROM table_name AS alias_name;
```

Let's look at some examples


<img src = "supplementary/images/columnalias.png" width = "700px">

In this query

```mysql
SELECT Job_Title,min(salary) as MIN_SALARY from Salary group by Job_Title
```
we give the result column min(salary) the alias MIN_SALARY. Even though the alias does not change the values in the result, it makes the result more readable.

Next we look at how to use a table alias effectively.


<img src = "supplementary/images/groupbyavg.png" width = "700px">

In this query 

```mysql
SELECT s.Job_Title,s.salary from Salary as s where s.state = 'Ohio'
```
we are setting an alias name 's' for the table Salary. Notice how we use the '.' operator to access the columns. This is particularly useful when we are joining multiple tables (about which we will learn in the last section). 
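
Both kinds of alias can be tried in the sketch below, which uses Python's built-in `sqlite3` module and made-up Salary rows. The condition `s.state = 'Ohio'` in the second query is an assumed example filter, chosen only to show the alias being used in a where clause.

```python
import sqlite3

# Made-up Salary rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (Job_Title TEXT, state TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [
        ("Doctor", "New York", 250000),
        ("Computer Scientist", "New York", 150000),
        ("Doctor", "Ohio", 180000),
        ("Computer Scientist", "California", 210000),
        ("Doctor", "California", 190000),
        ("Computer Scientist", "Texas", 110000),
    ],
)

# Column alias: the result column is reported under the name MIN_SALARY.
cursor = conn.execute(
    "SELECT Job_Title, min(salary) as MIN_SALARY from Salary group by Job_Title"
)
result_columns = [col[0] for col in cursor.description]
min_by_title = dict(cursor.fetchall())

# Table alias: 's' stands for Salary everywhere in this one query.
ohio_rows = conn.execute(
    "SELECT s.Job_Title, s.salary from Salary as s where s.state = 'Ohio'"
).fetchall()

print(result_columns)
print(min_by_title)
print(ohio_rows)

conn.close()
```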

Try out the examples given below

**33. Select sum total of Total for invoices for each BillingCity for the BillingCountry Germany and order the records by the sum total in descending order. Name the sum total of Total as TotalAmount**

In [None]:
display(QueryWindow(33).display())

**34. Select total number of invoices for each BillingCity for the BillingCountry Germany. Name the total number of invoices column as TotalInvoices**

In [None]:
display(QueryWindow(34).display())

### 7. Join for Combining Multiple Tables

A join clause is used to **combine records from multiple tables using related columns between them.** It is one of the most powerful concepts in a relational database (joining based on relations). There are four ways of combining tables:

1. Inner Join
2. Left Join
3. Right Join
4. Outer Join

We will cover only Inner Join here.

#### 7.1 Inner Join
An inner join retrieves **only the records that have matching values in both tables**.

![innerjoin](supplementary/images/innerjoin.png)

The syntax for inner join is 

```mysql
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name
```

Inner join can also be written without an explicit inner join clause

```mysql
SELECT column_name(s)
FROM table1, table2
WHERE table1.column_name = table2.column_name
```

We will mostly use the second form of inner join in this section.

Let's look at some examples based on the customer and orders tables that we saw in the Foreign key section.


<img src = "supplementary/images/join.png" width = "500px">

1. Select customer names along with the number of orders they made.

```mysql
select c.name,count(*) as total_orders from customer as c,orders as o where c.CustomerId = o.CustId group by c.name
```

This should return


<img src = "supplementary/images/join_res1.png" width = "300px">


To get a better understanding let's see a pictorial representation of how this all works.

When you join the two tables using a **where clause (c.CustomerId = o.CustId)**, you get a result containing the matching records from both tables.

<img src = "supplementary/images/join_partials.png" width = "1000px">

Now, when you apply group by on name together with count(\*), you get the total number of records for each unique name.

If you simply write the query

```mysql
select * from customer as c,orders as o where c.CustomerId = o.CustId
```

you will get the table just shown above.
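
To experiment with this join yourself, here is a sketch using Python's built-in `sqlite3` module. The rows are invented, and the column `OrderId` is an assumption (the other column names are taken from the queries above), so the output will not match the figures. The sketch also checks that the explicit inner join form and the comma-and-where form return the same rows.

```python
import sqlite3

# Made-up customer and orders rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (CustomerId INTEGER, name TEXT, state TEXT)")
conn.execute("CREATE TABLE orders (OrderId INTEGER, CustId INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ana", "Ohio"), (2, "Ben", "Texas"), (3, "Cy", "Ohio")],
)
# Customer 3 has no orders, so it has no matching row in the orders table.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(101, 1), (102, 1), (103, 2)],
)

explicit_join = conn.execute(
    "select c.name, o.OrderId from customer as c "
    "inner join orders as o on c.CustomerId = o.CustId"
).fetchall()

implicit_join = conn.execute(
    "select c.name, o.OrderId from customer as c, orders as o "
    "where c.CustomerId = o.CustId"
).fetchall()

# Grouping the joined rows by name counts the orders for each customer.
orders_per_customer = dict(
    conn.execute(
        "select c.name, count(*) as total_orders "
        "from customer as c, orders as o "
        "where c.CustomerId = o.CustId group by c.name"
    ).fetchall()
)

print(sorted(explicit_join))
print(sorted(explicit_join) == sorted(implicit_join))
print(orders_per_customer)

conn.close()
```

The results are sorted before comparing because SQL does not guarantee any row order unless the query includes an order by clause.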

Now what if we want to count the orders by state?

2. Select state along with the total number of orders from each state

The query 
```mysql
select c.state,count(*) as total_orders from customer as c,orders as o where c.CustomerId = o.CustId group by c.state
```

should give 


<img src = "supplementary/images/join2.png" width = "300px">
Look at the syntax for the query. We are using two tables, customer and orders, and we separate them with ','. We also know that CustomerId is the primary key of the customer table and that it is a foreign key in the orders table under the name CustId. We use this relationship in our **where** clause to join the two tables. Notice that we have used the aliases 'c' and 'o' for the customer and orders tables respectively.

Now one final example, based on the tables shown below.

<img src = "supplementary/images/kinder_shooting.png" width = "600px">

3. Select school names and the number of shootings in the same neighborhood as the schools

We can join these two tables based on the Neighborhood column common in both the tables

The query 
```mysql
select k.name,count(*) as total_shootings from Kindergarten as k,Shootings as s where k.Neighborhood = s.Neighborhood group by k.name
```
should give

<img src = "supplementary/images/kinder_shooting_results.png" width = "300px">

Notice that the kindergarten 'Kindercare' is not in the result. This is because there is no matching record for the neighborhood 'Shaker Square' in the Shootings table.
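
The sketch below reproduces this behavior with Python's built-in `sqlite3` module. Only 'Kindercare' and 'Shaker Square' come from the lesson; the other school names, the neighborhood names, the `id` column, and the number of shootings are placeholders invented for illustration.

```python
import sqlite3

# Placeholder rows held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Kindergarten (name TEXT, Neighborhood TEXT)")
conn.execute("CREATE TABLE Shootings (id INTEGER, Neighborhood TEXT)")
conn.executemany(
    "INSERT INTO Kindergarten VALUES (?, ?)",
    [
        ("School A", "Neighborhood A"),
        ("School B", "Neighborhood B"),
        ("Kindercare", "Shaker Square"),
    ],
)
# No shooting is recorded for 'Shaker Square'.
conn.executemany(
    "INSERT INTO Shootings VALUES (?, ?)",
    [(1, "Neighborhood A"), (2, "Neighborhood A"), (3, "Neighborhood B")],
)

shootings_per_school = dict(
    conn.execute(
        "select k.name, count(*) as total_shootings "
        "from Kindergarten as k, Shootings as s "
        "where k.Neighborhood = s.Neighborhood group by k.name"
    ).fetchall()
)

print(shootings_per_school)
print("Kindercare" in shootings_per_school)

conn.close()
```

Because an inner join keeps only rows that have a match in both tables, a school with no matching shootings is absent from the result rather than listed with a count of 0.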
 
Now try out the examples given below to strengthen your understanding of joins.

**35. Display the city along with country from city and country table.**

In [None]:
display(QueryWindow(35).display())

**36. Select title and actor first_name and actor last_name for all the films from film, actor and film_actor tables**

In [None]:
display(QueryWindow(36).display())

**37. Select title of all English movies from film table**

In [None]:
display(QueryWindow(37).display())

**38. Select artist Name and the total number of albums composed by them as TotalAlbums**

In [None]:
display(QueryWindow(38).display())

**39. Select customers' first_name and last_name and the total amount they spent as TotalAmount. Sort the results by total amount in descending order.**

In [None]:
display(QueryWindow(39).display())

**40. Select customers' first_name and last_name and the total number of rentals they had as TotalRentals. Sort the results by total rentals in descending order.**

In [None]:
display(QueryWindow(40).display())

Now that you have a good idea about writing SQL queries, what do you think is missing?

Let's look again at the Kindergarten and Shootings tables.

<img src = "supplementary/images/kinder_shooting.png" width = "600px">
What if we want to ask questions such as:

1. Which schools have at least one shooting event **within 100 meters** of their location?

2. How many shooting events occurred **within 1000 meters** of each school?

3. How many shooting events occurred **within each census tract**?

4. How many shooting events occurred **within each zip code**?

5. What is the **distance** to the closest shooting event for each school?

Can we answer such questions with our tables? We seem to have a location field, but currently it is just text, and we can't ask spatial questions of a Text column. So how do we solve this?

The answer to these questions will become clear when we learn about spatial databases and spatial queries in the next section. So stay tuned!

Click the link below to move on


<br>
<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="gd-4.ipynb">Click here to go to the next notebook.</a></font>