# Intermediate Lesson on Geospatial Data 

## Spatial Databases

<strong>Lesson Developers:</strong> Jayakrishnan Ajayakumar, Shana Crosson, Mohsen Ahmadkhani

#### Part 4 of 5

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display, clear_output
from ipywidgets import interactive, Textarea, HBox, Button, Layout
import ipywidgets as widgets
import sqlite3
import spatialite
import pandas as pd
import geopandas as gpd

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
# sys.path.append('supplementary')
import hourofci
try:
    import os
    os.chdir('supplementary')
except OSError:
    # The directory may not exist relative to this location; stay where we are.
    pass

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

## Making the database spatially-enabled

To perform spatial queries we should **spatially enable our database**. This is exactly what a spatial database is built for. 

> A spatial database is a database that has been extended to include spatial data that represents objects defined in a geographic space, along with tools for querying and analyzing such data.

Remember: 
<ul>
    <li>A spatial database is a database, so we can still leverage all the functionalities of a traditional non-spatial database.</li>
    <li>A spatial database includes a new data type called <b>Geometry</b> that enables spatial operations on and between objects (i.e., points, lines, and/or polygons).</li>
</ul>    

Let's look at three example relations (tables) that have spatial data in the form of geometry. 

<table style="background: #fff; font-size:25px; text-align:left">
    <tr>
        <td style="background: #fff; font-size:25px; text-align:left">As shown below, each of the three tables has a special column with the <b>geometry</b> data type. 
    <br/>This column stores spatial information in a form that is not human-readable: it looks like a series of numbers, symbols, and letters, like the column on the right. <br/>
        <i style="font-size:20px">*Please note that the graphics in the geometry column of the three tables below are for illustration only. </i></td>
        <td style="background: #fff; font-size:25px; text-align:left">  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </td>
<td><img src = "supplementary/images/geom_col.jpg" width = "500px"></td>
    </tr>
</table>
<center><img src = "supplementary/images/geometry_types.png" width = "600px" height = "100px"></center>


### The three basic geometry types are 

1. **Points** (schools, shootings, earthquakes, your location)
2. **Lines** (rivers, streets, roads, railway lines)
3. **Polygons** (countries, states, census tracts, zip codes)
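
As a rough mental model (plain Python, not how the database stores geometry internally), the three types can be pictured as coordinate structures. The variable names, the helper `is_closed_ring`, and all coordinates below are made up for illustration.

```python
# Hypothetical (longitude, latitude) coordinates, for illustration only.

# A point is a single coordinate pair.
point = (-93.27, 44.98)

# A line is an ordered sequence of two or more points.
line = [(-93.30, 44.95), (-93.27, 44.98), (-93.20, 45.00)]

# A polygon is a closed ring: the first and last points are the same.
polygon = [(-94.0, 44.0), (-93.0, 44.0), (-93.0, 45.0), (-94.0, 45.0), (-94.0, 44.0)]


def is_closed_ring(coords):
    """Return True if coords could be a polygon ring (closed, at least 4 points)."""
    return len(coords) >= 4 and coords[0] == coords[-1]


print(is_closed_ring(polygon))  # True
print(is_closed_ring(line))     # False
```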


Apart from supporting geometry types, spatial databases also support operations on geometries (e.g., intersection of two polygons).

Queries that involve geometry types are called spatial queries, which we will cover in this lesson. 

## Spatial Queries

> **Spatial queries are queries in a spatial database** that can be answered on the **basis of geometric information only,** i.e., the spatial position and extent of the objects involved.

Spatial functions start with <b> `ST_` </b> prefix and perform specific spatial operations. 

There are many `ST_` functions available for spatial queries. We will introduce a few of them here and in the next segment. 

But before we start, let's take a look at our `us_states` table that will be used in this lesson. 

### Fetching and Visualizing our data 
Our spatial database has a table named `us_states` that holds information about the contiguous US states. 

In the module below: <br/>
<ul>
    <li>
        To fetch all rows from the table as a <b>DataFrame</b>, click the <i>Execute!</i> button.
    </li>
    <li>
        To <b>plot</b> it, click the <i>Plot!</i> button. 
    </li>
</ul>


*Please note that in this query we use `ST_AsBinary(geom)` instead of `geom`. This function converts the geometry column to a binary format that Python can read. You won't use it very often, so don't be intimidated!*
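
`ST_AsBinary` returns geometry in the Well-Known Binary (WKB) format, which libraries such as GeoPandas know how to parse. As a minimal sketch of what those bytes contain, the snippet below encodes and decodes a WKB point by hand with the standard `struct` module. The helper names and the coordinates are hypothetical, and only the simplest case (a 2D point) is handled.

```python
import struct


def encode_wkb_point(x, y):
    """Build a little-endian WKB point: byte order, geometry type 1, then x and y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(blob):
    """Read a 2D WKB point and return (x, y)."""
    # First byte: 1 means little-endian, 0 means big-endian.
    endian = "<" if blob[0] == 1 else ">"
    (geom_type,) = struct.unpack(endian + "I", blob[1:5])
    if geom_type != 1:
        raise ValueError("this sketch only understands WKB points (type 1)")
    return struct.unpack(endian + "dd", blob[5:21])


blob = encode_wkb_point(-93.27, 44.98)  # hypothetical coordinates
print(len(blob))               # 21 bytes: 1 + 4 + 8 + 8
print(decode_wkb_point(blob))  # (-93.27, 44.98)
```

Real geometries such as state polygons use the same idea with longer byte sequences, which is why the raw column is unreadable to humans.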


In [None]:
q1 = """SELECT pk_uid, statefp, geoid, name, aland , awater, ST_AsBinary(geom) as geom 
FROM us_states
"""
inp1 = Textarea(description='<b>Query:</b>', value= q1, layout=Layout(width='40%', height='120px'))
button1 = Button(description="Execute!")
plot1 = Button(description="Plot!")
Box1 = HBox([inp1, button1, plot1])

db = spatialite.connect('databases/spatialDB.sqlite')

def execute_query1(b): 
    clear_output()
    button1.on_click(execute_query1)
    plot1.on_click(plot_query1)
    display(Box1)
    print('Please wait...')
    gdff1 = gpd.GeoDataFrame.from_postgis(inp1.value, db,crs = 'EPSG:3857')
    clear_output()
    button1.on_click(execute_query1)
    plot1.on_click(plot_query1)
    display(Box1)
    return display(gdff1)

def plot_query1(b): 
    clear_output()
    button1.on_click(execute_query1)
    plot1.on_click(plot_query1)
    display(Box1)
    print('Please wait...')
    gdff1 = gpd.GeoDataFrame.from_postgis(inp1.value, db,crs = 'EPSG:3857')
    clear_output()
    button1.on_click(execute_query1)
    plot1.on_click(plot_query1)
    display(Box1)
    return display(gdff1.plot())

button1.on_click(execute_query1)
plot1.on_click(plot_query1)
display(Box1)


### Calculate Area
The function **st_area(geom)** calculates the area for each polygon object (e.g., a US state) in the table.

The syntax is as follows:

```SQL
select st_area(geom)
from tableName
where condition
```
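
To build intuition for what an area function computes on planar coordinates, here is a minimal sketch of the standard shoelace formula in plain Python. This is not the database's implementation (the lesson does not show it); the function name `polygon_area` and the rectangle are hypothetical.

```python
def polygon_area(ring):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A hypothetical 2000 m by 1000 m rectangle in a coordinate system measured in meters.
rectangle = [(0, 0), (2000, 0), (2000, 1000), (0, 1000)]
print(polygon_area(rectangle))  # 2000000.0
```

The result is expressed in the square of whatever unit the coordinates use, which is why the coordinate system matters when computing areas.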

On the next slide, we put this function into action!


#### Query: What are the areas of the five largest states in the contiguous US, in square kilometers?
Here is the query:

```sql
SELECT name, ST_AREA(ST_TRANSFORM(geom,3857))/1000000 AS Area_Squared_KMs 
FROM us_states 
ORDER BY Area_Squared_KMs DESC 
LIMIT 5
```

#### Let's dismantle it!
Note that to calculate the area in **square meters**, we first need to transform the geometry (`geom`) to the Web Mercator projection system using the `st_transform(geom, 3857)` function.

In this example, we 
<ul>
    <li>
    First transform the geometry (geom) to the Web Mercator projection system using the <b>st_transform(geom, 3857)</b> function to set <i><b>meters</b></i> as the measurement unit.
    </li>
    <li>
    Calculate the area using the <b>st_area(geom)</b> function. This returns area values in square meters, which are too large to read comfortably, so we divide them by 1000000 to get values in square kilometers. 
    </li>
    <li>
    Use <b>ORDER BY</b> clause to sort the results by the area in descending order (from largest to smallest).
    </li>
    <li>
    Filter the top five rows using <b>LIMIT 5</b> clause to get the five largest states in the contiguous US. 
    </li>
</ul>
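
The unit conversion, sorting, and limiting steps can be tried without any spatial extension. The sketch below uses Python's built-in `sqlite3` module with an in-memory table; the table `shapes`, its column `area_m2`, and every value in it are hypothetical stand-ins for areas that `ST_AREA` would otherwise compute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shapes (name TEXT, area_m2 REAL)")

# Hypothetical shapes with made-up areas in square meters.
rows = [("A", 5.0e9), ("B", 1.2e10), ("C", 7.5e8), ("D", 3.0e10),
        ("E", 9.9e9), ("F", 2.0e6), ("G", 4.4e9)]
conn.executemany("INSERT INTO shapes VALUES (?, ?)", rows)

# Same pattern as the lesson query: convert to square kilometers, sort, keep five.
query = """
SELECT name, area_m2 / 1000000 AS Area_Squared_KMs
FROM shapes
ORDER BY Area_Squared_KMs DESC
LIMIT 5
"""
top_five = conn.execute(query).fetchall()
conn.close()

for name, area_km2 in top_five:
    print(name, area_km2)
```

Note that the alias `Area_Squared_KMs` defined in the `SELECT` clause can be reused in `ORDER BY`, exactly as in the lesson query.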

EASY PEASY!!!

In [None]:

inp8 = Textarea(description='<b>Query:</b>', value="SELECT name, ST_AREA(ST_TRANSFORM(geom,3857))/1000000 AS Area_Squared_KMs \nFROM us_states \nORDER BY Area_Squared_KMs DESC \nLIMIT 5" , layout=Layout(width='40%', height='120px'))
button8 = Button(description="Execute!")          
Box8 = HBox([inp8, button8])

db = spatialite.connect('databases/spatialDB.sqlite')


def execute_query8(b):
    clear_output()
    button8.on_click(execute_query8)
    display(Box8)
    print('Please wait...')
    table18 = pd.read_sql_query(inp8.value,db)
    clear_output()
    button8.on_click(execute_query8)
    display(Box8)
    return display(table18)

button8.on_click(execute_query8)
display(Box8)



## Generating Centroids
Calculating the centroid of a set of polygons is indeed a **spatial** operation. 

Performing this operation is as easy as calling the `st_centroid(geom)` function! This function takes the geometry column as its only parameter and returns the centroids (points). 
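
For intuition, the centroid of a simple polygon can be computed with the standard area-weighted formula. The sketch below is plain Python under that assumption; the lesson does not say which algorithm the database uses, and the function name and rectangle are hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    cross_sum = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x1 * y2 - x2 * y1
        cross_sum += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if cross_sum == 0:
        raise ValueError("polygon has zero area")
    # cross_sum equals twice the signed area, so 6 * area == 3 * cross_sum.
    return cx / (3.0 * cross_sum), cy / (3.0 * cross_sum)


rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]  # hypothetical shape
print(polygon_centroid(rectangle))  # (2.0, 1.0)
```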

In the example below you can generate the centroids of the US states yourself! 

In [None]:

q2 = "select geoid, name, ST_AsBinary(ST_CENTROID(geom)) as geom from us_states"

inp2 = Textarea(description='<b>Query:</b>', value= q2, layout=Layout(width='40%', height='120px'))
button2 = Button(description="Execute!")
plot2 = Button(description="Plot!")
Box2 = HBox([inp2, button2, plot2])

db = spatialite.connect('databases/spatialDB.sqlite')

def execute_query2(b): 
    clear_output()
    button2.on_click(execute_query2)
    plot2.on_click(plot_query2)
    display(Box2)
    print('Please wait...')
    gdff2 = gpd.GeoDataFrame.from_postgis(inp2.value, db,crs = 'EPSG:3857')
    clear_output()
    button2.on_click(execute_query2)
    plot2.on_click(plot_query2)
    display(Box2)
    return display(gdff2)

def plot_query2(b): 
    clear_output()
    button2.on_click(execute_query2)
    plot2.on_click(plot_query2)
    display(Box2)
    print('Please wait...')
    gdff2 = gpd.GeoDataFrame.from_postgis(inp2.value, db,crs = 'EPSG:3857')
    clear_output()
    button2.on_click(execute_query2)
    plot2.on_click(plot_query2)
    display(Box2)
    return display(gdff2.plot())

button2.on_click(execute_query2)
plot2.on_click(plot_query2)
display(Box2)






### Within-Distance Queries

Within-distance queries are used to find geometric objects that are within a specific distance of a particular geometric object.


<img src = "supplementary/images/withindistance.png" width = "500px">

<img src = "supplementary/images/distance_within_example.png" width = "500px">
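
The idea can be sketched for the simplest case, points in a projected coordinate system measured in meters. The function `within_distance`, the place names, and the coordinates below are hypothetical, and straight-line (Euclidean) distance is assumed.

```python
import math


def within_distance(points, target, max_distance):
    """Return the names of points no farther than max_distance from target."""
    tx, ty = target
    return [
        name
        for name, (x, y) in points.items()
        if math.hypot(x - tx, y - ty) <= max_distance
    ]


# Hypothetical places with coordinates in meters.
places = {"a": (0, 0), "b": (3000, 4000), "c": (10000, 0)}
print(within_distance(places, (0, 0), 5000))  # ['a', 'b']
```

For lines and polygons the distance test is more involved, which is one reason spatial databases provide dedicated functions for it.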

#### To write queries of this type, we first need to learn the buffer function! 

## Generating Buffers

<img src = "supplementary/images/buffer.png" width = "600px"> 

Creating buffers of any radius is another common **spatial** operation, and it has a built-in function called `st_buffer(geom, radius)`. 
This function takes the geometry column along with the radius of the buffer. The unit of the radius depends on the coordinate system used. 
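
To picture what a buffer is, consider the simplest case: the buffer of a single point is a disc, which can be approximated by a polygon. The sketch below is an illustration only, not how the database builds buffers; the function `buffer_point` and its `segments` parameter are hypothetical.

```python
import math


def buffer_point(x, y, radius, segments=32):
    """Approximate a circular buffer around (x, y) as a closed polygon ring."""
    ring = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        ring.append((x + radius * math.cos(angle), y + radius * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


# A 2000 m buffer around a hypothetical point in a meter-based coordinate system.
ring = buffer_point(500000.0, 5000000.0, 2000.0)
print(len(ring))  # 33 vertices: 32 around the circle plus the closing one
```

Buffers around lines and polygons follow the same idea: every location within the given radius of the original geometry ends up inside the result.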

The following query creates a 30 km buffer around the state of Minnesota:

```sql
select st_buffer(st_transform(geom, 3857), 30000) as geom 
from us_states 
where name='Minnesota'
```
Notice that in this query we use `st_transform(geom, 3857)` function to transform the geometry to be represented in the Web Mercator spatial reference system that uses **meters** as the length unit.

The number 3857 is the spatial reference system identifier of <a href="https://en.wikipedia.org/wiki/Web_Mercator_projection">Web Mercator projection system</a>. 
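
For a single coordinate, the transformation to Web Mercator can be sketched with the standard spherical Mercator formulas. This assumes the input is longitude and latitude in degrees (the lesson does not state the coordinate system in which `geom` is stored); the function name and the sample coordinate are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS 84 semi-major axis)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# A hypothetical coordinate, roughly in Minnesota.
x, y = lonlat_to_web_mercator(-93.27, 44.98)
print(round(x), round(y))
```

The outputs are in meters, which is why a radius such as 30000 passed to `st_buffer` after the transformation means 30 km.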

Run this query on the next slide and change the buffer radius to see how it looks! 

In [None]:

q3 = "select ST_AsBinary(st_buffer(ST_Transform(geom, 3857), 30000)) as geom from us_states where name='Minnesota'"


inp3 = Textarea(description='<b>Query:</b>', value= q3, layout=Layout(width='40%', height='120px'))
button3 = Button(description="Execute!")
plot3 = Button(description="Plot!")
Box3 = HBox([inp3, button3, plot3])

db = spatialite.connect('databases/spatialDB.sqlite')

def execute_query3(b): 
    clear_output()
    button3.on_click(execute_query3)
    plot3.on_click(plot_query3)
    display(Box3)
    print('Please wait...')
    gdff3 = gpd.GeoDataFrame.from_postgis(inp3.value, db,crs = 'EPSG:3857')
    clear_output()
    button3.on_click(execute_query3)
    plot3.on_click(plot_query3)
    display(Box3)
    return display(gdff3)

def plot_query3(b): 
    clear_output()
    button3.on_click(execute_query3)
    plot3.on_click(plot_query3)
    display(Box3)
    print('Please wait...')
    gdff3 = gpd.GeoDataFrame.from_postgis(inp3.value, db,crs = 'EPSG:3857')
    clear_output()
    button3.on_click(execute_query3)
    plot3.on_click(plot_query3)
    display(Box3)
    return display(gdff3.plot())

button3.on_click(execute_query3)
plot3.on_click(plot_query3)
display(Box3)



## Challenge!
#### Can you make a 2 km buffer around the centroid of Minnesota?

You can check your solution by clicking the *Reveal the SQL code!* button!

In [None]:

q4 = "SELECT \nFROM \nWHERE"


inp4 = Textarea(description='<b>Query:</b>', value= q4, layout=Layout(width='40%', height='120px'))
button4 = Button(description="Execute!")
plot4 = Button(description="Plot!")
Box4 = HBox([inp4, button4, plot4])

db = spatialite.connect('databases/spatialDB.sqlite')

def execute_query4(b): 
    clear_output()
    button4.on_click(execute_query4)
    plot4.on_click(plot_query4)
    display(Box4)
    print('Please wait...')
    if inp4.value != q4:
        gdff4 = gpd.GeoDataFrame.from_postgis(inp4.value, db,crs = 'EPSG:3857')
        clear_output()
        button4.on_click(execute_query4)
        plot4.on_click(plot_query4)
        display(Box4)
        return display(gdff4)
    else:
        clear_output()
        button4.on_click(execute_query4)
        plot4.on_click(plot_query4)
        display(Box4)
        return print('Wrong query! Please try again!')

def plot_query4(b): 
    clear_output()
    button4.on_click(execute_query4)
    plot4.on_click(plot_query4)
    display(Box4)
    print('Please wait...')
    if inp4.value != q4:
        gdff4 = gpd.GeoDataFrame.from_postgis(inp4.value, db,crs = 'EPSG:3857')
        clear_output()
        button4.on_click(execute_query4)
        plot4.on_click(plot_query4)
        display(Box4)
        return display(gdff4.plot())
    else:
        clear_output()
        button4.on_click(execute_query4)
        plot4.on_click(plot_query4)
        display(Box4)
        return print('Wrong query! Please try again!')

button4.on_click(execute_query4)
plot4.on_click(plot_query4)
display(Box4)


In [None]:
button5 = Button(description="Reveal the SQL code!")
Box5 = HBox([button5])
query5 = "SELECT ST_AsBinary(st_buffer(st_centroid(ST_Transform(geom, 3857)), 2000)) as geom \nFROM us_states \nWHERE name='Minnesota'"

def on_click5(b):
    clear_output()
    display(Box5)
    return print(query5)

button5.on_click(on_click5)
display(Box5)

Click the link below to move on


<br>
<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="gd-5.ipynb">Click here to go to the next notebook.</a></font>