In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import warnings
warnings.simplefilter("ignore")

<div class="main-title">
<h1>Geospatial data</h1>
<p>Introduction</p>
</div>

## Before we begin

We encourage you to check out the tutorial by Joris Van den Bossche:  
[Introduction to geospatial data analysis with GeoPandas](https://github.com/jorisvandenbossche/geopandas-tutorial)  

<div class="center-content">
    <a href="https://t.ly/agtgJ">https://t.ly/agtgJ</a>
    <img src="assets/geospatial_intro.png" style="height: 300px; width: 300px; margin: auto;"/>
</div>

## What is this part for?

- introduce a couple of basic concepts
- build a shared "vocabulary"
- understand the SRAI APIs
- the tools shown here are mostly used by SRAI under the hood
- they are also useful for pre- and post-processing, data preparation, analysis and visualization

## SRAI utilizes [GeoPandas](https://geopandas.org/)

- one of the main libraries SRAI uses under the hood
- SRAI builds on top of GeoPandas
- most functions accept, return, or otherwise work with GeoDataFrames
- this makes it easy to use existing GeoPandas functionality
  - pre-processing, post-processing, data preparation, visualization, etc.

## What is GeoPandas?

- open source
- simplifies working with geospatial data
- extends pandas with spatial operations
- uses Shapely for geometric operations
- uses fiona (or pyogrio, the default engine since GeoPandas 1.0) for file access and matplotlib for plotting

## What are GeoDataFrames?

- an extension of pandas DataFrames
- consist of:
  - **geometries**: the column where spatial objects are stored
  - **properties**: the rest of the columns, describing the geometries
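A GeoDataFrame can also be built directly from Python objects, which makes the geometry/properties split explicit. A minimal sketch; the column names, values and coordinates are made up for illustration:

```python
import geopandas as gpd
from shapely.geometry import Point

# Properties: ordinary pandas columns ("name", "value")
# Geometry: one Shapely object per row, passed via the `geometry` argument
gdf = gpd.GeoDataFrame(
    {"name": ["A", "B"], "value": [1, 2]},
    geometry=[Point(0, 0), Point(1, 1)],  # (x, y) = (lon, lat)
    crs="EPSG:4326",  # WGS 84 longitude/latitude
)

print(gdf.geometry.name)  # geometry
print(gdf.crs)
```

The geometry column is appended after the property columns and carries the coordinate reference system (CRS).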

## Let's load some data

GeoPandas can read data from a number of sources:
- files in formats supported by fiona
- PostGIS databases
- Feather and Parquet files

We'll use a zipped shapefile with countries from [Natural Earth](https://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/).

## Read the shapefile

In [None]:
import geopandas as gpd
countries = gpd.read_file("data/ne_110m_admin_0_countries.zip")
countries = countries[["ISO_A3", "NAME", "CONTINENT", "POP_EST", "geometry"]]
countries.head(5)

## Visualize the geometries

We can use:
- `.plot()` to plot the geometries on a static map (matplotlib)
- `.explore()` to view them on an interactive map (Folium / Leaflet.js)
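`.plot()` also accepts a `column` argument to colour each geometry by one of its properties (a choropleth map). A minimal sketch on synthetic boxes; the `value` column is made up:

```python
import geopandas as gpd
from shapely.geometry import box

gdf = gpd.GeoDataFrame(
    {"value": [1, 5, 10]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Colour each geometry by "value" and draw a colour bar as the legend
ax = gdf.plot(column="value", cmap="viridis", legend=True)
ax.set_title("Synthetic choropleth")
```

`.explore()` accepts the same `column` argument for the interactive map.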

`.plot()`

In [None]:
countries.plot()

`.explore()`

In [None]:
countries.explore()

## We are working with a DataFrame

In [None]:
type(countries)

In [None]:
import pandas as pd
isinstance(countries, pd.DataFrame)

In [None]:
countries.columns

## Pandas operations

In [None]:
countries['POP_EST'].mean()

In [None]:
countries['CONTINENT'].value_counts()
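Plain pandas aggregations discard the geometry. GeoPandas' `dissolve` is the spatial counterpart of `groupby`: it aggregates the properties and unions the geometries of each group. A sketch on made-up data that only mimics the `CONTINENT` and `POP_EST` columns:

```python
import geopandas as gpd
from shapely.geometry import box

gdf = gpd.GeoDataFrame(
    {"CONTINENT": ["A", "A", "B"], "POP_EST": [10, 20, 5]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# Group rows by CONTINENT, sum POP_EST and merge the geometries per group
by_continent = gdf.dissolve(by="CONTINENT", aggfunc="sum")
print(by_continent)
```

The two adjacent unit boxes of group "A" become a single 2 x 1 rectangle.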

## The geometry column

In [None]:
type(countries["POP_EST"]), type(countries.geometry)

In [None]:
countries.geometry

## Calculating the area

In [None]:
countries.geometry.area
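The Natural Earth data uses geographic coordinates (longitude/latitude, EPSG:4326), so `.area` returns square degrees, which shrink in real size towards the poles. GeoPandas normally warns about this; the `warnings.simplefilter("ignore")` cell at the top hides that warning. A common fix is to reproject to an equal-area CRS first. A sketch, assuming EPSG:6933 (a global cylindrical equal-area projection in metres) is acceptable for your data:

```python
import geopandas as gpd
from shapely.geometry import box

# Two 1 x 1 degree boxes: one at the equator, one at 60 degrees north
cells = gpd.GeoSeries(
    [box(0, 0, 1, 1), box(0, 60, 1, 61)],
    crs="EPSG:4326",
)

print(cells.area)  # square degrees: both exactly 1.0, which is misleading

# Reproject to an equal-area CRS (units: metres), then convert m^2 to km^2
area_km2 = cells.to_crs("EPSG:6933").area / 1e6
print(area_km2)
```

In km², the box at 60°N comes out at roughly half the area of the equatorial one.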

## Using GeoDataFrames in SRAI

In [None]:
poland_gdf = countries[countries["NAME"] == "Poland"]
poland_gdf

In [None]:
from srai.regionalizers import AdministrativeBoundaryRegionalizer

regionalizer = AdministrativeBoundaryRegionalizer(admin_level=4)
regions_gdf = regionalizer.transform(poland_gdf)
regions_gdf.head(5)

In [None]:
regions_gdf.explore()

## Let's go deeper - [Shapely](https://shapely.readthedocs.io/en/stable/manual.html) objects
- GeoPandas stores Shapely objects in the geometry column
- Shapely implements the geometric operations

In [None]:
type(regions_gdf.iloc[2].geometry)

In [None]:
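print(regions_gdf.iloc[2].name)  # the row's index label (its region_id)
voivodeship_region = regions_gdf.iloc[2:3]  # slicing keeps a one-row GeoDataFrame
voivodeship_geom = voivodeship_region.geometry.iloc[0]  # the Shapely geometry itself
voivodeship_geom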

## Geometry properties

In [None]:
voivodeship_geom.area

In [None]:
# bounding box: (minx, miny, maxx, maxy)
voivodeship_geom.bounds
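The same properties exist on every Shapely geometry. A self-contained sketch with a hypothetical L-shaped polygon in plain planar coordinates, where the numbers are easy to check by hand:

```python
from shapely.geometry import Polygon, box

# Hypothetical L-shaped polygon; coordinates are (x, y) pairs
poly = Polygon([(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)])

print(poly.area)    # 6.0 (a 4x1 strip plus a 1x2 strip)
print(poly.length)  # 14.0 (the perimeter)
print(poly.bounds)  # (0.0, 0.0, 4.0, 3.0) = (minx, miny, maxx, maxy)

# The bounding box as a geometry of its own
print(poly.envelope.equals(box(0, 0, 4, 3)))  # True
```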

## Let's create a geometry object

In [None]:
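from shapely.geometry import LineString

bounds = voivodeship_geom.bounds  # (minx, miny, maxx, maxy)
# Diagonal of the bounding box, from the lower-left to the upper-right corner
line = LineString([
    (bounds[0], bounds[1]),
    (bounds[2], bounds[3]),
])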
line

## View it

In [None]:
gpd.GeoSeries([line, voivodeship_geom]).plot(cmap='tab10')

## Spatial operations

In [None]:
line.within(voivodeship_geom)

In [None]:
line.intersects(voivodeship_geom)
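`within` and `intersects` answer different questions: `a.within(b)` is true only if no part of `a` lies outside `b`, while `a.intersects(b)` is true as soon as they share at least one point. A sketch with a square and three hypothetical lines:

```python
from shapely.geometry import LineString, box

square = box(0, 0, 2, 2)
inside = LineString([(0.5, 0.5), (1.5, 1.5)])  # entirely inside the square
crossing = LineString([(1, 1), (3, 3)])        # starts inside, then leaves
outside = LineString([(5, 5), (6, 6)])         # shares no points with it

print(inside.within(square), inside.intersects(square))      # True True
print(crossing.within(square), crossing.intersects(square))  # False True
print(outside.within(square), outside.intersects(square))    # False False

# contains is the converse of within
print(square.contains(inside))  # True
```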

## Spatial operations on GeoDataFrames
The spatial operations available in Shapely can also be applied to entire GeoDataFrames, row by row.

Let's prepare some data to show it.
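A minimal sketch of the idea on synthetic data: called with a single Shapely geometry, a predicate such as `intersects` is evaluated for every row and returns a boolean Series that can be used as a filter. The cell names and coordinates are hypothetical:

```python
import geopandas as gpd
from shapely.geometry import LineString, box

cells = gpd.GeoDataFrame(
    {"cell": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)
line = LineString([(0.5, 0.5), (1.5, 0.5)])

# One boolean per row: does this row's geometry intersect the line?
cells["intersects"] = cells.intersects(line)
print(cells[cells["intersects"]])  # rows "a" and "b"
```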

## One last concept - Spatial indexes
- A tool to divide and index space
- Examples include [H3](https://github.com/uber/h3), [S2](https://s2geometry.io/about/), [Geohash](https://en.wikipedia.org/wiki/Geohash)

<div class="image-container">
    <figure>
      <img src="https://s2geometry.io/devguide/img/s2hierarchy.gif">
      <figcaption><a href="https://s2geometry.io/">S2</a></figcaption>
    </figure>
    <figure>
      <img src="https://h3geo.org/images/neighbors.png">
        <figcaption><a href="https://h3geo.org/docs/highlights/aggregation">H3</a></figcaption>
    </figure>
    <figure>
      <img src="https://upload.wikimedia.org/wikipedia/commons/3/3d/Geohash-grid.png">
      <figcaption><a href="https://en.wikipedia.org/wiki/Geohash">Geohash</a></figcaption>
    </figure>
</div>
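To make "divide and index space" concrete, here is a from-scratch sketch of Geohash encoding (an illustration, not a library API): the longitude and latitude ranges are repeatedly halved, the resulting bits are interleaved starting with longitude, and every 5 bits become one base-32 character. Longer hashes identify smaller cells, and nearby points usually share a prefix.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a (lat, lon) point as a Geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0        # bits accumulated for the current character
    n_bits = 0      # number of bits currently in `bits`
    use_lon = True  # Geohash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the range
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the range
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6))  # ezs42
```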

## H3 - Hexagonal hierarchical geospatial indexing system
- hexagonal grid
- can be (approximately) subdivided into finer and finer hexagonal grids

## Revisiting the previous example

In [None]:
gpd.GeoSeries([line, voivodeship_geom]).plot(cmap='tab10')

In [None]:
from srai.regionalizers import H3Regionalizer

# Cover the voivodeship with H3 cells at resolution 6
regionized = H3Regionalizer(resolution=6).transform(voivodeship_region)
# Flag the cells that the bounding-box diagonal passes through
regionized["intersects"] = regionized.intersects(line)
regionized.explore("intersects")

## Spatial joins

In [None]:
from srai.regionalizers import geocode_to_region_gdf, H3Regionalizer

# Geocode Prague's boundary and cover it with H3 cells at resolution 7
prague_gdf = geocode_to_region_gdf("Prague, Czech Republic")
regionized = H3Regionalizer(resolution=7).transform(prague_gdf)
regionized.explore()

## Get bicycle data for Prague

In [None]:
from srai.loaders import OSMOnlineLoader

loader = OSMOnlineLoader()
# Fetch OpenStreetMap features tagged amenity=bicycle_rental within Prague
prague_bikes = loader.load(prague_gdf, {"amenity": "bicycle_rental"})
prague_bikes.explore(tiles="CartoDB Positron")

## Perform the join

In [None]:
# Inner spatial join: one row per intersecting (H3 cell, bike station) pair
joint_gdf = regionized.sjoin(prague_bikes)
joint_gdf

## Count bike stations

In [None]:
# Number of bike stations in each H3 cell
regionized.sjoin(prague_bikes).groupby("region_id").size()
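An inner `sjoin` keeps only regions that matched at least one station, so cells with zero stations are missing from the count above. A sketch of one way to keep them, on hypothetical regions and stations, using `reindex` with `fill_value=0`:

```python
import geopandas as gpd
from shapely.geometry import Point, box

regions = gpd.GeoDataFrame(
    {"region_id": ["r1", "r2", "r3"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
).set_index("region_id")
stations = gpd.GeoDataFrame(
    {"station": ["s1", "s2", "s3"]},
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# One output row per (region, station) pair whose geometries intersect
joined = regions.sjoin(stations, how="inner", predicate="intersects")

# Count matches per region (index level 0 is region_id), then add the
# regions without any station back in with a count of 0
counts = joined.groupby(level=0).size().reindex(regions.index, fill_value=0)
print(counts)  # r1: 2, r2: 1, r3: 0
```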

## To sum up

- GeoPandas
    - a spatial extension of pandas
    - very useful tool for working with geospatial data
    - used by SRAI internally
- Shapely
    - used by GeoPandas
    - implements geometries and spatial operations
- Spatial indexes
    - e.g. H3, S2, Geohash
- Spatial operations
    - both on Shapely objects and GeoDataFrames
    - relationships such as `within`, `intersects`
    - spatial joins