## Intermediate Parallel Computing

### Segment 6 of 6

# Exploration

>### I'm a coffeeholic; where can I drink an Espresso in St. Paul with a Mississippi river view?!


*Lesson Developer: Mohsen Ahmadkhani, ahmad178@umn.edu*

## Reminder
<a href="#/slide-2-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

<br>
</br>
<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')


# Demo Problem:
>## Which cafes in St. Paul, MN are within 50 meters of the Mississippi River?

To solve this problem, we will need the following datasets: 
1. Rivers and lake centerlines
2. Cafes in St. Paul


We downloaded the first dataset in the previous segment, so we only need to load it here. We will fetch the second dataset from OpenStreetMap.


First, let's import the required packages and build our spatially enabled Spark session.

In [None]:
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer
import geopandas as gpd
from ipyleaflet import Map, GeoData

In [None]:
spark = SparkSession.\
    builder.\
    master("local[*]").\
    appName("Spatial Spark Demo").\
    config("spark.serializer", KryoSerializer.getName).\
    config("spark.kryo.registrator", SedonaKryoRegistrator.getName) .\
    config("spark.jars.packages", "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating,org.datasyslab:geotools-wrapper:1.1.0-25.2") .\
    getOrCreate()

SedonaRegistrator.registerAll(spark)
sc = spark.sparkContext


Let's read the rivers shapefile using GeoPandas.

In [None]:
rivers = gpd.read_file('ne_10m_rivers_lake_centerlines.shp')
rivers = rivers[['featurecla', 'name', 'name_alt', 'rivernum', 'geometry']]
rivers_layer = GeoData(geo_dataframe = rivers, style={'color':'blue'})

Now, we download the point dataset of coffee shops in the city of St. Paul using the `osmnx` package.  

In [None]:
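import osmnx as ox

place = 'St Paul, MN'
# osmnx returns the union of features that match any of these tags
tags = {'amenity': 'cafe', 'cuisine': 'coffee_shop'}  # OSM spells this value with an underscore
# Note: newer osmnx releases rename this function to features_from_place
coffee_shops = ox.geometries_from_place(place, tags)
coffee_shops = coffee_shops.to_crs('epsg:4326')[['name', 'geometry']]
# Keep only point features; cafes mapped as building outlines are dropped
coffee_shops = coffee_shops[coffee_shops.geometry.geom_type == 'Point']
coffee_shops.head()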

Let's look at the coffee shops.

In [None]:
coffee_shops_layer = GeoData(geo_dataframe = coffee_shops, point_style={'color': 'black'})
mymap3 = Map(center=(44.96,-93.13), zoom = 11)
mymap3.add_layer(coffee_shops_layer)
mymap3

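### Converting GeoPandas to Apache Sedona

Sedona runs its spatial queries on Spark DataFrames, so we convert each GeoDataFrame with `spark.createDataFrame`. Printing the schema lets us check that the `geometry` column was carried over as a geometry type.
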
In [None]:
rivers_spdf = spark.createDataFrame(rivers)
rivers_spdf.printSchema()

In [None]:
coffee_shops_spdf = spark.createDataFrame(coffee_shops)
coffee_shops_spdf.printSchema()

### Creating SQL Views

Next, we create two temporary views named `rivers` and `cafes`. We will query these two views.

In [None]:
rivers_spdf.createOrReplaceTempView("rivers")
coffee_shops_spdf.createOrReplaceTempView("cafes")

### Solution
One way to answer the question is with the following spatial query. We first build a buffer of 50 meters around each cafe and then check which of those buffers intersect the Mississippi River.

```sql
SELECT c.name cafe, c.geometry as geom
FROM cafes c, rivers r
WHERE r.name = 'Mississippi' and ST_INTERSECTS(ST_TRANSFORM(r.geometry, 'epsg:4326','epsg:2180'), ST_BUFFER(ST_TRANSFORM(c.geometry, 'epsg:4326','epsg:2180'), 50))
```


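In this query we use the `ST_TRANSFORM` function to reproject our data from EPSG:4326, whose coordinates are in degrees, to a projected coordinate system that uses meters as its length unit. Then we apply `ST_BUFFER` to build a buffer of 50 meters around each cafe. Finally, we use `ST_INTERSECTS` to check whether the Mississippi River intersects each buffer. Note that EPSG:2180 is a projection designed for Poland, so distances measured with it in Minnesota may be heavily distorted; a local meter-based system such as EPSG:26915 (NAD83 / UTM zone 15N) would likely be a better fit for St. Paul. Let's execute this query in the next cell.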
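
Before running the query, here is a rough sketch of why the reprojection step matters. It estimates how many meters one degree spans at the latitude used for the map center in this lesson. The spherical Earth radius and the helper name `meters_per_degree` are assumptions made for illustration; they are not part of Sedona.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation (assumption)

def meters_per_degree(lat_deg):
    """Return approximate (north-south, east-west) meters per degree at a latitude."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    # Meridians converge toward the poles, so a degree of longitude shrinks with cos(latitude)
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon

lat_m, lon_m = meters_per_degree(44.96)  # latitude of the map center used in this lesson
print(f"1 degree of latitude  ~ {lat_m / 1000:.1f} km")
print(f"1 degree of longitude ~ {lon_m / 1000:.1f} km")
print(f"50 m ~ {50 / lon_m:.5f} degrees of longitude")
```

Under this approximation, a degree covers roughly 111 km north-south but only about 79 km east-west near St. Paul. A buffer of "50" applied directly to degree coordinates would therefore be neither 50 meters nor circular on the ground, which is why both geometries are transformed to a meter-based system first.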

In [None]:
riverside_cafes = spark.sql("""
SELECT c.name cafe, c.geometry as geom
FROM cafes c, rivers r
WHERE r.name = 'Mississippi' and ST_INTERSECTS(ST_TRANSFORM(r.geometry, 'epsg:4326','epsg:2180'), ST_BUFFER(ST_TRANSFORM(c.geometry, 'epsg:4326','epsg:2180'), 50))
""")

riverside_cafes.show()

Sounds good! Now you know which cafe to go to when you crave a triple-shot espresso with a view of the Mississippi River in St. Paul!

Do you want to see them on the map? Run the cell below.

The cafes returned by the query are shown in red and the others remain black. The Mississippi River is displayed in blue.

In [None]:

riverside_cafesdf = riverside_cafes.toPandas()
riverside_cafesdf = gpd.GeoDataFrame(riverside_cafesdf, geometry="geom")
riverside_cafes_layer = GeoData(geo_dataframe = riverside_cafesdf, point_style={'color': 'red'})

coffee_shops_layer = GeoData(geo_dataframe = coffee_shops, point_style={'color': 'black'})

mymap4 = Map(center=(44.96,-93.13), zoom = 11)
mymap4.add_layer(coffee_shops_layer)
mymap4.add_layer(rivers_layer)
mymap4.add_layer(riverside_cafes_layer)
mymap4

## What Else?

Apache Sedona provides dozens of other spatial functions, such as centroid, distance, transformation, and buffer, and many more than we can cover here.

You can see a list of all available spatial functions at https://sedona.apache.org/api/sql/Function. 
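
As one illustration of how these functions relate, the buffer-and-intersect filter from the solution is close to asking whether the distance from a cafe to the river is at most 50 meters, which in Sedona could be phrased with `ST_Distance`. The sketch below shows that idea in plain Python rather than Sedona. The function names and coordinates are hypothetical, and the coordinates are assumed to already be in a meter-based projected system.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b (all (x, y) in meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, then clamp to the segment's end points
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_distance(point, polyline, limit_m):
    """True if the point lies within limit_m of any segment of the polyline."""
    return any(
        point_segment_distance(point, a, b) <= limit_m
        for a, b in zip(polyline, polyline[1:])
    )

# Hypothetical projected coordinates (meters), not real St. Paul data
river = [(0.0, 0.0), (100.0, 0.0), (200.0, 100.0)]
cafes = {"near_cafe": (50.0, 30.0), "far_cafe": (50.0, 80.0)}

riverside = [name for name, xy in cafes.items() if within_distance(xy, river, 50)]
print(riverside)
```

A buffer produced by a function like `ST_BUFFER` approximates a circle with a polygon, so results from the two formulations can differ slightly for points that sit almost exactly 50 meters from the line.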

# Congratulations!


**You have finished an Hour of CI!**


But, before you go ... 

1. Please fill out a very brief questionnaire to provide feedback and help us improve the Hour of CI lessons. It is fast and your feedback is very important to let us know what you learned and how we can improve the lessons in the future.
2. If you would like a certificate, then please type your name below and click "Create Certificate" and you will be presented with a PDF certificate.

<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="https://forms.gle/JUUBm76rLB8iYppN7">Take the questionnaire and provide feedback</a></font>

In [None]:

# This code cell loads the Interact Textbox that will ask users for their name
# Once they click "Create Certificate" then it will add their name to the certificate template
# And present them a PDF certificate
from PIL import Image
from PIL import ImageFont
from PIL import ImageDraw

from ipywidgets import interact

def make_cert(learner_name, lesson_name):
    cert_filename = 'hourofci_certificate.pdf'

    img = Image.open("../../supplementary/hci-certificate-template.jpg")
    draw = ImageDraw.Draw(img)

    cert_font   = ImageFont.truetype('../../supplementary/cruft.ttf', 150)
    cert_fontsm = ImageFont.truetype('../../supplementary/cruft.ttf', 80)
    
    _,_,w,h = cert_font.getbbox(learner_name)  
    draw.text( xy = (1650-w/2,1100-h/2), text = learner_name, fill=(0,0,0),font=cert_font)
    
    _,_,w,h = cert_fontsm.getbbox(lesson_name)
    draw.text( xy = (1650-w/2,1100-h/2 + 750), text = lesson_name, fill=(0,0,0),font=cert_fontsm)
    
    img.save(cert_filename, "PDF", resolution=100.0)   
    return cert_filename


interact_cert=interact.options(manual=True, manual_name="Create Certificate")

@interact_cert(name="Your Name")
def f(name):
    print("Congratulations",name)
    filename = make_cert(name, 'Intermediate Parallel Computing')
    print("Download your certificate by clicking the link below.")


<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="hourofci_certificate.pdf?download=1" download="hourofci_certificate.pdf">Download your certificate</a></font>