In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import warnings
warnings.simplefilter("ignore")

<div class="main-title">
<h1>Geospatial data</h1>
<p>Introduction</p>
</div>

## Before we begin

We encourage you to check out the tutorial by Joris Van den Bossche:  
[Introduction to geospatial data analysis with GeoPandas](https://github.com/jorisvandenbossche/geopandas-tutorial)  

<div class="center-content">
    <a href="https://t.ly/agtgJ">https://t.ly/agtgJ</a>
    <img src="../../assets/geospatial_intro.png" style="height: 300px; width: 300px; margin: auto;"/>
</div>

## What is this part for?

- introduce basic concepts related to geospatial data analysis
- build a common "vocabulary"
- understand the APIs of geospatial libraries

## How is it structured?
- intro to geopandas,
- shapely - geometry library,
- map projections and coordinate reference systems,
- grid systems,
- spatial operations.

## What is GeoPandas?

- open source
- simplifies working with geospatial data
- extends pandas with spatial operations
- relies on Shapely for geometric operations
- relies on Fiona for file access and matplotlib for plotting

## Let's load some data

GeoPandas implements reading from a number of sources:
- files in formats supported by fiona
- PostGIS databases
- Feather and Parquet files

We'll use a zipped shapefile of countries from [Natural Earth](https://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/).

## Read the shapefile

In [None]:
import geopandas as gpd
countries = gpd.read_file("data/ne_110m_admin_0_countries.zip")
countries = countries[["ISO_A3", "NAME", "CONTINENT", "POP_EST", "geometry"]]
countries.head(5)

## GeoDataFrames

- an extension of Pandas DataFrames
- consist of:
  - **geometries**: the column where spatial objects are stored
  - **properties**: the rest of the columns, describing the geometries
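To make the split between geometries and properties concrete, here is a minimal sketch that builds a tiny GeoDataFrame by hand. The city names and their approximate coordinates are illustrative values and are not part of the Natural Earth data used in this notebook.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative data: two cities with approximate (longitude, latitude) coordinates
cities = gpd.GeoDataFrame(
    {"name": ["Warsaw", "Prague"]},
    geometry=[Point(21.01, 52.23), Point(14.42, 50.09)],
    crs="EPSG:4326",
)

# The geometry column holds the spatial objects ...
geometry_column = cities.geometry.name
# ... and every other column is a property describing them
property_columns = [col for col in cities.columns if col != geometry_column]

print(geometry_column, property_columns)
```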

## Let's visualize it

We can use:
- `.plot()` to plot the geometries on a static map (matplotlib)
- `.explore()` to view them on an interactive map (Folium / Leaflet.js)

`.plot()`

In [None]:
countries.plot()

`.explore()`

In [None]:
countries.explore()

## We are working with a DataFrame

In [None]:
type(countries)

In [None]:
import pandas as pd
isinstance(countries, pd.DataFrame)

In [None]:
countries.columns

## Pandas operations

In [None]:
countries['POP_EST'].mean()

In [None]:
countries['CONTINENT'].value_counts()

## The geometry column

In [None]:
type(countries["POP_EST"]), type(countries.geometry)

In [None]:
countries.geometry

## Calculating the area

In [None]:
countries.geometry.area

<div class="alert alert-warning">
Note: GeoPandas assumes a 2D Cartesian plane, so the result is only meaningful in a projected coordinate reference system. In a geographic CRS such as WGS 84, the area is expressed in squared degrees.
</div>
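The sketch below illustrates why squared degrees are not a useful unit of area. It assumes a perfectly spherical Earth with a mean radius of 6371 km, and the helper `cell_area_km2` is a hypothetical function written for this example. Two cells that both measure 1 by 1 degree (and so have the same planar "area" of 1) cover very different areas on the ground.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def cell_area_km2(lat_south, lat_north, lon_west, lon_east):
    """Area of a latitude/longitude cell on a sphere, in square kilometers."""
    d_lon = math.radians(lon_east - lon_west)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM**2 * d_lon * band


# Both cells span exactly one degree in each direction
equator_cell = cell_area_km2(0, 1, 0, 1)
northern_cell = cell_area_km2(60, 61, 0, 1)

print(f"1x1 degree cell at the equator: {equator_cell:,.0f} km2")
print(f"1x1 degree cell at 60N:         {northern_cell:,.0f} km2")
```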

## Let's go deeper - [Shapely](https://shapely.readthedocs.io/en/stable/manual.html) objects
- GeoPandas stores Shapely objects in the geometry column
- Shapely also provides the geometric operations

In [None]:
pl_gdf = countries[countries["NAME"] == "Poland"]
pl_gdf

In [None]:
pl_geom = pl_gdf.iloc[0].geometry
pl_geom

In [None]:
type(pl_geom)

## Geometry's properties

In [None]:
pl_geom.area

In [None]:
# Minimum bounding box as (minx, miny, maxx, maxy)
pl_geom.bounds

## Creating a geometry manually

In [None]:
from shapely.geometry import LineString

# Diagonal of Poland's bounding box: (minx, miny) -> (maxx, maxy)
bounds = pl_geom.bounds
minx, miny, maxx, maxy = bounds
line = LineString([(minx, miny), (maxx, maxy)])
line
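Other geometry types are created the same way. The following sketch uses made-up coordinates to build a point, a line and a polygon, and reads a few of their properties.

```python
from shapely.geometry import LineString, Point, Polygon

# Made-up coordinates, chosen so the results are easy to check by hand
point = Point(1.0, 1.0)
diagonal = LineString([(0, 0), (3, 4)])
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

print(point.x, point.y)  # 1.0 1.0
print(diagonal.length)   # 5.0
print(square.area)       # 4.0
print(square.bounds)     # (0.0, 0.0, 2.0, 2.0)
```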

## View it

In [None]:
gpd.GeoSeries([line, pl_geom]).plot(cmap='tab10')

## Coordinate Reference Systems

A coordinate reference system (CRS) defines how the two-dimensional, projected map in your GIS relates to real places on the earth.
For a detailed description, see e.g. https://docs.qgis.org/3.28/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

**Map projections** are a complex topic: even professionals who have studied geography, geodesy, or another GIS-related science often struggle to define map projections and coordinate reference systems correctly. When you work with GIS, you usually start with data that is already projected in a certain CRS, so in most cases you don't have to create a new CRS or even reproject the data from one CRS to another. That said, it is always useful to have an idea of what map projections and CRSs mean.

![Projection families](../../assets/projection_families.png)

### Geographic Coordinate Systems
The use of Geographic Coordinate Reference Systems is very common. They use degrees of latitude and longitude and sometimes also a height value to describe a location on the earth’s surface. The most popular is called WGS 84.
![Geographic coordinate system](../../assets/geographic_crs.png)

<div class="alert alert-success">
Note: Throughout the tutorial and the SRAI library, we default to the WGS 84 coordinate system.
</div>

## Projected coordinate reference systems
Another type of CRS is a projected coordinate reference system.
In this type of CRS, (x, y) values usually represent linear units such as meters or feet, which makes calculating distances or areas much easier.
![Projected CRS](../../assets/projected_crs.png)

## CRS in Python

### Check crs in GeoPandas

In [None]:
countries.crs

### Let's fix Shapely plotting

The Shapely geometry we displayed earlier looked off because the degrees were interpreted as Cartesian coordinates. Let's fix that by reprojecting to EPSG:2180, a projected CRS for Poland.

In [None]:
pl_geom_reprojected = pl_gdf.to_crs(2180).geometry.iloc[0]
pl_geom_reprojected
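Under the hood, reprojection is a coordinate transformation. As a rough sketch, the same conversion can be done for a single point with `pyproj`, the library GeoPandas relies on for CRS handling. The point below is an approximate location of Warsaw, used purely for illustration.

```python
from pyproj import Transformer

# always_xy=True keeps the (longitude, latitude) axis order used by GeoPandas
to_cs92 = Transformer.from_crs("EPSG:4326", "EPSG:2180", always_xy=True)

# Approximate location of Warsaw (illustrative value)
lon, lat = 21.01, 52.23
x, y = to_cs92.transform(lon, lat)

# The result is expressed in meters instead of degrees
print(f"lon={lon}, lat={lat} -> x={x:.0f} m, y={y:.0f} m")
```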

### Calculate area of Poland properly

The boundaries from the shapefile are not perfect but we should be able to get a rough estimate of the area using a proper CRS.

In [None]:
pl_geom_reprojected

In [None]:
area_km2 = pl_geom_reprojected.area / 10**6
print(f"Rough estimate of Poland's area: {area_km2:.2f} km2")

### What happens if we use .area in WGS 84?

In [None]:
pl_gdf.area
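The value above is in squared degrees, which has no physical meaning. If reprojecting is not an option, one possible alternative is to compute a geodesic area directly on the ellipsoid with `pyproj.Geod`. The sketch below uses an illustrative box spanning 14 to 24°E and 49 to 55°N rather than Poland's actual boundary.

```python
from pyproj import Geod
from shapely.geometry import box

# Illustrative box in lon/lat degrees, not Poland's real border
region = box(14, 49, 24, 55)

# Planar area in squared degrees: 10 degrees x 6 degrees
planar_area = region.area

# Geodesic area on the WGS 84 ellipsoid, in square meters.
# The sign depends on the ring orientation, so take the absolute value.
geod = Geod(ellps="WGS84")
area_m2, perimeter_m = geod.geometry_area_perimeter(region)
geodesic_area_km2 = abs(area_m2) / 10**6

print(f"Planar: {planar_area} deg2, geodesic: {geodesic_area_km2:,.0f} km2")
```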

## Grid systems
- A tool to divide and index space
- Examples include [H3](https://github.com/uber/h3), [S2](https://s2geometry.io/about/), [Geohash](https://en.wikipedia.org/wiki/Geohash)

<div class="image-container">
    <figure>
      <img src="https://s2geometry.io/devguide/img/s2hierarchy.gif">
      <figcaption><a href="https://s2geometry.io/">S2</a></figcaption>
    </figure>
    <figure>
      <img src="https://h3geo.org/images/neighbors.png">
        <figcaption><a href="https://h3geo.org/docs/highlights/aggregation">H3</a></figcaption>
    </figure>
    <figure>
      <img src="https://upload.wikimedia.org/wikipedia/commons/3/3d/Geohash-grid.png">
      <figcaption><a href="https://en.wikipedia.org/wiki/Geohash">Geohash</a></figcaption>
    </figure>
</div>
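To show how a grid system turns a location into an index, here is a minimal pure-Python sketch of Geohash encoding. It is an illustration of the idea rather than a replacement for a dedicated library, and `geohash_encode` is a name chosen for this example. The encoder repeatedly halves the longitude and latitude ranges and emits one base-32 character for every five bits.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude

    while len(chars) < precision:
        value, rng = (lon, lon_range) if use_lon else (lat, lat_range)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0

    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=3)
fine = geohash_encode(57.64911, 10.40744, precision=8)
print(coarse, fine)
```

A longer hash describes a smaller cell nested inside the cell of any of its prefixes, which is what makes the index hierarchical.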

## H3 - Hexagonal hierarchical geospatial indexing system
- hexagonal grid
- can be (approximately) subdivided into finer and finer hexagonal grids

## Spatial operations

In [None]:
gpd.GeoSeries([line, pl_geom]).plot(cmap='tab10')

In [None]:
line.within(pl_geom)

In [None]:
line.intersects(pl_geom)
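The difference between the two predicates is easier to verify on a toy example. The sketch below uses a made-up square and a line that crosses it: the line intersects the square but is not within it, because part of the line lies outside.

```python
from shapely.geometry import LineString, Point, box

# Made-up geometries: a 2x2 square and a horizontal line crossing it
square = box(0, 0, 2, 2)
crossing_line = LineString([(-1, 1), (3, 1)])
inner_point = Point(1, 1)

print(crossing_line.within(square))      # False
print(crossing_line.intersects(square))  # True
print(inner_point.within(square))        # True

# Only the part of the line inside the square survives the intersection
print(crossing_line.intersection(square).length)  # 2.0
```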

## Spatial operations on GeoDataFrames
The same spatial operations that Shapely offers can be applied to entire GeoDataFrames.

In [None]:
pl_de_gdf = countries[countries["NAME"].isin(["Poland", "Germany"])]
pl_de_gdf

In [None]:
merged_geom = pl_de_gdf.unary_union
merged_geom

In [None]:
gpd.GeoSeries([line, merged_geom]).plot(cmap='tab10')

In [None]:
from srai.regionalizers import H3Regionalizer

# Cover both countries with H3 cells and flag the cells crossed by the line
regionized = H3Regionalizer(resolution=3).transform(pl_de_gdf)
regionized["intersects"] = regionized.intersects(line)
regionized.explore("intersects")

## Spatial joins

In [None]:
from srai.regionalizers import geocode_to_region_gdf, H3Regionalizer
from utils import CB_SAFE_PALLETE

prague_gdf = geocode_to_region_gdf("Prague, Czech Republic")
regionized = H3Regionalizer(resolution=7).transform(prague_gdf)
regionized.explore()

## Get bicycle data for Prague

In [None]:
from srai.loaders import OSMOnlineLoader

loader = OSMOnlineLoader()
prague_bikes = loader.load(prague_gdf, {"amenity": "bicycle_rental"})
prague_bikes.explore(tiles="CartoDB Positron")

## Perform the join

In [None]:
joint_gdf = regionized.sjoin(prague_bikes)
joint_gdf

## Count bike stations

In [None]:
regionized.sjoin(prague_bikes).groupby("region_id").size()
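The join-then-count pattern can be tried offline on toy data. In the sketch below, the three square regions and three station points are made up for illustration. Note that the default inner join drops regions without any match, so the counts are reindexed to bring the empty regions back with a count of zero.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Made-up regions: three unit squares side by side
regions = gpd.GeoDataFrame(
    {"region_id": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
).set_index("region_id")

# Made-up stations: two fall in region "a", one in "b", none in "c"
stations = gpd.GeoDataFrame(
    {"name": ["s1", "s2", "s3"]},
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# One output row per (region, station) pair whose geometries intersect
joined = gpd.sjoin(regions, stations)

# Count rows per region and restore the regions without stations
counts = joined.groupby(level=0).size().reindex(regions.index, fill_value=0)
print(counts)
```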

## OpenStreetMap demo

OpenStreetMap (OSM) is a free, open source of map data.  
It is a collaborative project to create a free editable map of the world.  
It is built using vector data with optional tags to describe the features.  
The main page of the project is https://www.openstreetmap.org/  
You can find example map features here: https://wiki.openstreetmap.org/wiki/Map_Features and here: https://taginfo.openstreetmap.org/


### Elements
Elements are the basic components of OpenStreetMap's conceptual data model of the physical world.
Elements are of three types:
- nodes (defining points in space),
- ways (defining linear features and area boundaries), and
- relations (which are sometimes used to explain how other elements work together).

All of the above can have one or more associated tags (key:value pairs) which describe the meaning of a particular element.
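The three element types can be pictured as simple data structures. The sketch below is a simplified illustration of the data model, not the OSM API or the representation used by SRAI. All class names, IDs and coordinates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point in space."""
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node IDs: a line, or an area boundary if closed."""
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)

    def is_closed(self):
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    """A group of elements; members are (element type, element id, role)."""
    id: int
    members: list
    tags: dict = field(default_factory=dict)


# Hypothetical elements with made-up IDs and coordinates
station = Node(id=1, lat=50.09, lon=14.42, tags={"amenity": "bicycle_rental"})
building = Way(id=10, node_ids=[2, 3, 4, 5, 2], tags={"building": "yes"})
route = Relation(id=100, members=[("way", 11, ""), ("way", 12, "")],
                 tags={"type": "route", "route": "bicycle"})

print(station.tags["amenity"], building.is_closed(), len(route.members))
```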

## To sum up

- GeoPandas
    - Pandas spatial extension
    - very useful tool for working with geospatial data
    - used by SRAI internally
- Shapely
    - used by GeoPandas
    - implements geometries and spatial operations
- Coordinate reference systems
  - how the two-dimensional, projected map relates to real places on the earth
  - "basic" but non-trivial
- Grid systems
- Spatial operations
    - both on Shapely objects and GeoDataFrames
    - relationships such as `within`, `intersects`
    - spatial joins
- OpenStreetMap
    - open source map of the world
    - vector data (points, lines, polygons)
    - described by key:value tags