In [2]:
import pandas as pd 
import geopandas as gpd

# In 1 hour around the world of spatial data

What are we going to talk about? 
- Spatial data 101 
- Spatial data wrangling 
- Use cases 

What are we not going to talk about? 
- GIS systems 
- Spatial data science 
- Advanced stuff (geocoding, routing) 

# Introduction: What is spatial data? 

Data with a spatial component: objects that reference a position on the Earth's surface. 

Spatial data consists of spatial information (**where**), attributes (**what**) and sometimes temporal information (**when**).

Examples: 
- satellite imagery 
- topography
- power lines and substations 
- exercise tracking (e.g. running, biking)

# Types of spatial data

Depending on the use case, geographic data can be represented in two ways: 

- raster data (grid)
 
![RasterData](./imgs/raster.PNG)

- vector data (point representation) 

![Vector Data](./imgs/vector.PNG)

## Raster representation

Geographic space is divided into cells. 

Each cell is geographically located and receives attributes or properties. 

Often generated from satellite or flyover sensor data.

Formats: .jpg, .png, .tiff

![Raster_Example](imgs/raster_example.PNG)
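To make "each cell is geographically located" concrete, here is a minimal sketch. It assumes a north-up grid described by a top-left origin and a fixed cell size; the names `origin_x`, `origin_y`, `cell_size` and all values are illustrative assumptions, not taken from the data used in this notebook.

```python
# A tiny raster: each cell holds one attribute value (e.g. elevation in metres).
# All values below are invented for illustration.
grid = [
    [12.0, 13.5, 14.1],
    [11.2, 12.8, 13.9],
]

origin_x, origin_y = 360000.0, 5700000.0  # top-left corner in map units (assumed)
cell_size = 10.0                          # cell edge length in map units (assumed)


def cell_center(row, col):
    """Return the map coordinates of the centre of cell (row, col)."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size  # rows count downwards from the top
    return x, y


def cell_at(x, y):
    """Return the (row, col) of the cell that contains map coordinate (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


row, col = cell_at(360025.0, 5699988.0)
print(row, col, grid[row][col], cell_center(row, col))
```

Real raster formats store this origin and cell size as metadata next to the pixel values, which is what turns an image into spatial data.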


## Vector representation

Points with X, Y, (Z) coordinates, which can be connected to form more complex geometry types such as Lines or Polygons. 

Formats: Shapefile (.shp), GeoJSON (.geojson), GeoPackage (.gpkg)

![Vector Example](imgs/vector_example.png)
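As a rough illustration of a vector format, the sketch below builds a single GeoJSON feature with the standard library only. The coordinate and property values are invented; only the overall structure (a `Feature` with `geometry` and `properties`, coordinates in longitude/latitude order) follows the GeoJSON format.

```python
import json

# A GeoJSON Feature: geometry ("where") plus properties ("what").
# Coordinates are [longitude, latitude]; the values are invented for illustration.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [7.0, 51.45]},
    "properties": {"category": "substation", "city": "Essen"},
}

collection = {"type": "FeatureCollection", "features": [feature]}

# Serialise to a GeoJSON string and read it back.
text = json.dumps(collection)
parsed = json.loads(text)

lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(lon, lat, parsed["features"][0]["properties"]["category"])
```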


In [5]:
gdf_substations = gpd.read_file("data/substation.geojson")
gdf_substations.sample(20)

Unnamed: 0,osm_id,category,longitude,latitude,city,geometry
386,node/6310881189,substation,7.000345,51.388091,Essen,POINT (360865.054 5694880.986)
99,node/1251811580,substation,7.083524,51.445307,Essen,POINT (366818.497 5701088.737)
246,node/3805700299,substation,6.981876,51.4568,Essen,POINT (359790.533 5702556.398)
442,node/9319154501,substation,7.025626,51.45098,Essen,POINT (362812.337 5701826.395)
457,way/59925390,substation,6.99607,51.514257,Essen,"POLYGON ((360704.497 5708993.154, 361040.214 5..."
424,node/7880732562,substation,7.098473,51.447314,Essen,POINT (367863.063 5701284.848)
516,way/422075523,substation,7.01437,51.460608,Essen,"POLYGON ((362054.324 5702924.205, 362063.536 5..."
83,node/837693800,substation,7.037883,51.454251,Essen,POINT (363673.711 5702167.278)
389,node/6370454785,substation,7.010086,51.399231,Essen,POINT (361576.340 5696101.372)
343,node/4902669563,substation,6.949137,51.362595,Essen,POINT (357222.990 5692144.451)


# Geometric representation

- Points, LineStrings, Polygons (plus their Multi- variants and GeometryCollections) 
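A minimal sketch of how these geometry types relate, using plain coordinate tuples instead of a geometry library. The helper names `line_length` and `polygon_area` are made up for this example, and the coordinates are invented; in practice a library such as shapely would handle this.

```python
import math

# Geometries as plain coordinate sequences (values invented for illustration).
point = (0.0, 0.0)
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # ring, implicitly closed


def line_length(coords):
    """Sum of the straight segment lengths between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Planar area of a simple polygon ring via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(line), polygon_area(polygon))
```

Both helpers assume planar (projected) coordinates; lengths and areas computed directly on longitude/latitude degrees would not be meaningful.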


# Projections and coordinate reference systems

Geospatial data is multidimensional (X, Y, *Z* coordinates).
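Coordinates only have meaning together with a coordinate reference system. As a hedged illustration, the sketch below converts longitude/latitude degrees to spherical Web Mercator metres (the system behind SRID 3857) with the textbook formula. The function name is made up for this example; for real work a library such as pyproj is the safer choice.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The same place, expressed in two different reference systems.
x, y = lonlat_to_web_mercator(7.0, 51.45)
print(f"lon/lat: (7.0, 51.45)  ->  Web Mercator: ({x:.1f}, {y:.1f})")
```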

# Spatial data wrangling

## Geopandas 

https://geopandas.org/en/stable/docs/user_guide/io.html

## PostGIS

https://postgis.net/

```
CREATE EXTENSION postgis;
-- enable raster support (for PostGIS 3+)
CREATE EXTENSION postgis_raster;
```

## Spatial Relationships

## Spatial Indices

Spatial indices make answering questions like 
- is X **inside of** Y 
- does X **intersect** Y 
- is X **near** Y 

go from 🐌 to 🚀!

At a high level, spatial indices work by: 
1. representing objects by **2D bounding boxes** (4 values regardless of geometry)
2. making spatial comparisons between bounding boxes (**cheap!**) 
3. reserving expensive (but accurate) geometry comparisons for objects whose **bounding boxes matched**  

![Spatial Index Example](./imgs/spatial_index.png)
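The three steps above can be sketched in plain Python as a filter-and-refine search. This is an illustration under assumptions, not how PostGIS or rtree are implemented: a real index also organises the bounding boxes in a tree so that most boxes are never looked at, whereas this sketch still scans all of them. All names and coordinates are invented.

```python
def bbox(ring):
    """Step 1: reduce a polygon ring to 4 values (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_contains(box, point):
    """Step 2: cheap test, four comparisons regardless of polygon complexity."""
    min_x, min_y, max_x, max_y = box
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def polygon_contains(ring, point):
    """Step 3: exact but more expensive ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


polygons = {
    "triangle": [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
    "square": [(20.0, 20.0), (30.0, 20.0), (30.0, 30.0), (20.0, 30.0)],
}
boxes = {name: bbox(ring) for name, ring in polygons.items()}

query = (8.0, 8.0)  # inside the triangle's bounding box, but outside the triangle
candidates = [name for name, box in boxes.items() if bbox_contains(box, query)]
matches = [name for name in candidates if polygon_contains(polygons[name], query)]
print(candidates, matches)
```

The query point shows why the refine step is needed: the bounding box reports a candidate that the exact test then rejects.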

## Spatial Joins and neighbor searches

The spatial join is the bread and butter of any spatial analysis.

Instead of join keys, the join condition is a spatial relationship between the geometries of two tables.

Which objects are near other objects? 

Naive approach: 
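The notebook leaves the naive approach open. One plausible reading, stated here as an assumption, is to compute the distance for every pair of objects and filter afterwards, which cannot benefit from a spatial index. The sketch below uses invented point data and a hypothetical helper `naive_neighbors`.

```python
import math

# Hypothetical sample points in a projected CRS (units: metres); values invented.
locations = {"a": (0.0, 0.0), "b": (100.0, 100.0)}
other_locations = {"x": (3.0, 4.0), "y": (50.0, 50.0), "z": (100.0, 90.0)}
max_distance = 20.0


def naive_neighbors(left, right, max_dist):
    """Return (left_id, right_id, distance) for all pairs within max_dist."""
    pairs = []
    for left_id, left_pt in left.items():
        for right_id, right_pt in right.items():  # len(left) * len(right) checks
            d = math.dist(left_pt, right_pt)
            if d <= max_dist:
                pairs.append((left_id, right_id, d))
    return pairs


print(naive_neighbors(locations, other_locations, max_distance))
```

The number of distance computations grows with the product of both table sizes, which is the part a spatial index avoids.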

Best practice: 

- Spatial functions like
    - `ST_Intersects`,
    - `ST_Overlaps`,
    - `ST_Within`
    - or `ST_DWithin`

  leverage spatial indices for computation.

- Signature: `ST_DWithin(geometry g1, geometry g2, double precision distance_of_srid)`

Simple example query (`max_distance` is a placeholder, given in the units of the SRID): 

`SELECT id, geometry FROM some_table WHERE ST_DWithin(geometry, 'SRID=3857;POINT(3072163.4 7159374.1)', max_distance)`

Neighbor query between two tables:

```
SELECT
    id_loc,
    geom_loc,
    id_other_loc,
    geom_other_loc
    -- , ST_Distance(geom_loc, geom_other_loc) AS distance_m
FROM locations
CROSS JOIN LATERAL (
    SELECT
        id_other_loc,
        geom_other_loc
    FROM other_locations
    -- max_distance is a placeholder, in the units of the geometries' SRID
    WHERE ST_DWithin(geom_loc, geom_other_loc, max_distance)
) subq
```

Spatial query with distance filter, returning the distance:

```
SELECT
    id_loc,
    geom_loc,
    id_other_loc,
    geom_other_loc,
    ST_Distance(geom_loc, geom_other_loc) AS distance_m
FROM locations
CROSS JOIN LATERAL (
    SELECT
        id_other_loc,
        geom_other_loc
    FROM other_locations
    WHERE ST_DWithin(geom_loc, geom_other_loc, max_distance) -- spatial index is used
) subq
```

Spatial query limiting the number of neighbors:

```
SELECT
    id_loc,
    geom_loc,
    id_other_loc,
    geom_other_loc,
    ST_Distance(geom_loc, geom_other_loc) AS distance_m
FROM locations
CROSS JOIN LATERAL (
    SELECT
        id_other_loc,
        geom_other_loc
    FROM other_locations
    ORDER BY geom_loc <-> geom_other_loc -- spatial index is used
    LIMIT 1
) subq
```

Spatial query with distance filter and a limit on the number of neighbors:

```
SELECT
    id_loc,
    geom_loc,
    id_other_loc,
    geom_other_loc,
    ST_Distance(geom_loc, geom_other_loc) AS distance_m
FROM locations
CROSS JOIN LATERAL (
    SELECT
        id_other_loc,
        geom_other_loc
    FROM other_locations
    WHERE ST_DWithin(geom_loc, geom_other_loc, max_distance) -- spatial index is used
    ORDER BY geom_loc <-> geom_other_loc -- spatial index is used
    LIMIT 1
) subq
```

# Use cases: 

## Which substations are at risk of flooding? 

## Which fire department will put the fire out at my house? 

## Where can I build my house? 

# Tableau and PostGIS

In [None]:
- Spatial Clustering 

# Spatial technology stack (Python)

- Data Wrangling: 
    - geopandas (pandas extension for geometric data)
    - pyproj (geographic reference systems) 
    - rtree (spatial indexing) 
    - pygeos (spatial indexing)
    - shapely (geometry data)
    - scipy/scikit-learn (neighbor searches) 
    - networkx 
    
- Plotting: 
    - matplotlib (+ descartes) 
    - folium (Python leaflet wrapper) 
    - altair (interactive visualizations) 
    - datashader (plotting extremely large datasets) 
 
- Misc: 
    - osmnx, osmium (OpenStreetMap data)

# References

- https://github.com/martinchristen/bigdatabbq2021
- [PostGIS Feature Overview](https://www.youtube.com/watch?v=g4DgAVCmiDE)
- [Introduction to Spatial Indexing](https://blog.crunchydata.com/blog/the-many-spatial-indexes-of-postgis)
- [PostGIS Day 2021](https://www.youtube.com/playlist?list=PLesw5jpZchudjKjwvFks-gAbz9ZZzysFm)