# Working with Geometries

## Introduction

This notebook demonstrates how to work with geometries in DuckDB.

## Installation

Run the following cells to install the required packages if they are not already available.

In [3]:
%pip install duckdb leafmap


In [1]:
%pip install duckdb duckdb-engine jupysql


## Library Import and Configuration

In [1]:
import duckdb
import leafmap
import os

## Sample Data

The datasets in the database are in NAD83 / UTM zone 18N projection, EPSG:26918.

In [2]:
url = "https://storage.googleapis.com/qm2/CASA0025/nyc_data.db.zip"
leafmap.download_file(url, unzip=True)

Downloading...
From: https://storage.googleapis.com/qm2/CASA0025/nyc_data.db.zip
To: /content/nyc_data.db.zip
100%|██████████| 8.60M/8.60M [00:01<00:00, 4.80MB/s]

Extracting files...





'/content/nyc_data.db.zip'

In [3]:
print("/content/nyc_data.db.zip", os.listdir())

/content/nyc_data.db.zip ['.config', 'nyc_data.db.zip', 'nyc_data.db', 'sample_data']


In [4]:
%load_ext sql


%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

In [5]:
%sql duckdb:///nyc_data.db

In [6]:
%%sql

INSTALL httpfs;
LOAD httpfs;

Unnamed: 0,Success


In [7]:
%%sql
INSTALL spatial;

LOAD spatial;


Unnamed: 0,Success


## Connecting to DuckDB

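Connect jupysql to DuckDB using a SQLAlchemy-style connection string. You can connect either to an in-memory DuckDB database or to a file-backed one; the `%sql duckdb:///nyc_data.db` cell above opens the file-backed `nyc_data.db`. The commented-out cells below show the equivalent setup through the DuckDB Python API; the later `con.sql(...)` cells rely on the `con` object they create.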

In [None]:
#con = duckdb.connect("nyc_data.db")

In [None]:
#con.install_extension("spatial")
#con.load_extension("spatial")

In [None]:
#con.sql("SHOW TABLES;")

## Creating samples

In [None]:
#con.sql("""
#
#CREATE OR REPLACE TABLE samples (name VARCHAR, geom GEOMETRY);
#
#INSERT INTO samples VALUES
#  ('Point', ST_GeomFromText('POINT(-100 40)')),
#  ('Linestring', ST_GeomFromText('LINESTRING(0 0, 1 1, 2 1, 2 2)')),
#  ('Polygon', ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')),
#  ('PolygonWithHole', ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0),(1 1, 1 2, 2 2, 2 1, 1 1))')),
#  ('Collection', ST_GeomFromText('GEOMETRYCOLLECTION(POINT(2 0),POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))'));
#
#SELECT * FROM samples;
#
#""")

In [8]:
%sql CREATE or REPLACE TABLE samples (name VARCHAR, geom GEOMETRY);

Unnamed: 0,Success


In [10]:
%%sql
insert into samples values
  ('Point', ST_GeomFromText('POINT(-100 40)')),
  ('Linestring', ST_GeomFromText('LINESTRING(0 0, 1 1, 2 1, 2 2)')),
  ('Polygon', ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')),
  ('PolygonWithHole', ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0),(1 1, 1 2, 2 2, 2 1, 1 1))')),
  ('Collection', ST_GeomFromText('GEOMETRYCOLLECTION(POINT(2 0),POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))'));

Unnamed: 0,Success


In [11]:
%%sql
select * from samples

Unnamed: 0,name,geom
0,Point,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
1,Linestring,"[1, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
2,Polygon,"[2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,PolygonWithHole,"[2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
4,Collection,"[6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [None]:
con.sql("SELECT name, ST_AsText(geom) AS geometry FROM samples;")

In [12]:
%%sql
select name, ST_ASTEXT(geom) as geometry from samples;

Unnamed: 0,name,geometry
0,Point,POINT (-100 40)
1,Linestring,"LINESTRING (0 0, 1 1, 2 1, 2 2)"
2,Polygon,"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"
3,PolygonWithHole,"POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, ..."
4,Collection,"GEOMETRYCOLLECTION (POINT (2 0), POLYGON ((0 0..."


In [None]:
con.sql("""

COPY samples TO 'samples.geojson' (FORMAT GDAL, DRIVER GeoJSON);

""")

In [13]:
%%sql
copy samples to 'samples.geojson'(format GDAL, driver geojson);

Unnamed: 0,Success


## Points

![](https://postgis.net/workshops/postgis-intro/_images/points.png)

A spatial point represents a single location on the Earth. It is stored as a single coordinate with 2, 3, or 4 dimensions. Points are used to represent objects when the exact details, such as shape and size, are not important at the target scale. For example, cities on a map of the world can be described as points, while a map of a single state might represent cities as polygons.


In [None]:
con.sql("""

SELECT ST_AsText(geom)
  FROM samples
  WHERE name = 'Point';

""")

In [14]:
%%sql
select ST_ASTEXT(geom)
from samples
where name = 'Point';

Unnamed: 0,st_astext(geom)
0,POINT (-100 40)


Some of the specific spatial functions for working with points are:

- **ST_X(geom)** returns the X ordinate
- **ST_Y(geom)** returns the Y ordinate

So, we can read the ordinates from a point like this:

In [None]:
con.sql("""

SELECT ST_X(geom), ST_Y(geom)
  FROM samples
  WHERE name = 'Point';

""")

In [15]:
%%sql

SELECT ST_X(geom), ST_Y(geom)
  FROM samples
  WHERE name = 'Point';

Unnamed: 0,st_x(geom),st_y(geom)
0,-100.0,40.0
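
To make the idea of reading ordinates concrete, here is a minimal pure-Python sketch that pulls the ordinates out of the WKT text returned by `ST_AsText`. The helper name `parse_wkt_point` is hypothetical and is not part of DuckDB; it only illustrates what `ST_X` and `ST_Y` return for a point.

```python
import re


def parse_wkt_point(wkt: str) -> tuple[float, ...]:
    """Parse a WKT point such as 'POINT (-100 40)' into a tuple of ordinates.

    Hypothetical helper for illustration only; it accepts an optional
    Z, M, or ZM tag so that 3- and 4-dimensional points also parse.
    """
    match = re.fullmatch(
        r"\s*POINT\s*(?:ZM|Z|M)?\s*\(([^()]*)\)\s*", wkt, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return tuple(float(token) for token in match.group(1).split())


# The sample point from the `samples` table.
x, y = parse_wkt_point("POINT (-100 40)")
print(x, y)  # -100.0 40.0
```

The first ordinate corresponds to `ST_X(geom)` and the second to `ST_Y(geom)`.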


In [None]:
con.sql("""

SELECT * FROM nyc_subway_stations

""")

In [16]:
%%sql
select * from nyc_subway_stations;

Unnamed: 0,OBJECTID,ID,NAME,ALT_NAME,CROSS_ST,LONG_NAME,LABEL,BOROUGH,NGHBHD,ROUTES,TRANSFERS,COLOR,EXPRESS,CLOSED,geom
0,1.0,376.0,Cortlandt St,,Church St,"Cortlandt St (R,W) Manhattan","Cortlandt St (R,W)",Manhattan,,"R,W","R,W",YELLOW,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
1,2.0,2.0,Rector St,,,Rector St (1) Manhattan,Rector St (1),Manhattan,,1,1,RED,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
2,3.0,1.0,South Ferry,,,South Ferry (1) Manhattan,South Ferry (1),Manhattan,,1,1,RED,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
3,4.0,125.0,138th St,Grand Concourse,Grand Concourse,"138th St / Grand Concourse (4,5) Bronx","138th St / Grand Concourse (4,5)",Bronx,,45,45,GREEN,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
4,5.0,126.0,149th St,Grand Concourse,Grand Concourse,149th St / Grand Concourse (4) Bronx,149th St / Grand Concourse (4),Bronx,,4,245,GREEN,express,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
486,487.0,909.0,JFK Terminal 8,,,"JFK Terminal 8, Queens",JFK Terminal 8,Queens,,,,AIR-BLUE,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
487,488.0,903.0,Federal Circle,Rental Car,,"Federal Circle / Rental Car, Queens",Federal Circle / Rental Car,Queens,,,,AIR-BLUE,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
488,489.0,902.0,Long Term Parking,,,"Long Term Parking, Queens",Long Term Parking,Queens,,,,AIR-BLUE,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
489,490.0,901.0,Howard Beach,,159th Ave,"Howard Beach, Queens",Howard Beach,Queens,,,A,AIR-BLUE,,,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."


In [None]:
con.sql("""

SELECT name, ST_AsText(geom)
  FROM nyc_subway_stations
  LIMIT 10;

""")

In [17]:
%%sql
select name, ST_ASTEXT(geom)
from nyc_subway_stations
limit 10;

Unnamed: 0,NAME,st_astext(geom)
0,Cortlandt St,POINT (583521.854408956 4507077.862599085)
1,Rector St,POINT (583324.4866324601 4506805.373160211)
2,South Ferry,POINT (583304.1823994748 4506069.654048115)
3,138th St,POINT (590250.10594797 4518558.019924332)
4,149th St,POINT (590454.7399891173 4519145.719617855)
5,149th St,POINT (590465.8934191109 4519168.697483203)
6,161st St,POINT (590573.169495527 4520214.766177284)
7,167th St,POINT (591252.8314104103 4520950.353355553)
8,167th St,POINT (590946.3972262995 4521077.318976877)
9,170th St,POINT (591583.6111452815 4521434.846626811)


## Linestrings

![](https://postgis.net/workshops/postgis-intro/_images/lines.png)


A **linestring** is a path between locations. It takes the form of an
ordered series of two or more points. Roads and rivers are typically
represented as linestrings. A linestring is said to be **closed** if it
starts and ends on the same point. It is said to be **simple** if it
does not cross or touch itself (except at its endpoints if it is
closed). A linestring can be both **closed** and **simple**.
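
The two definitions can be sketched in plain Python on a list of `(x, y)` vertices. This is an illustrative sketch, not how DuckDB implements these checks: the function names are hypothetical, and the simplicity test deliberately skips neighbouring segments, so it does not detect a segment that doubles back over its neighbour.

```python
from itertools import combinations


def orientation(a, b, c):
    """Cross product: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def within_box(a, b, c):
    """True if c lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_touch(p1, p2, p3, p4):
    """True if segment p1-p2 crosses or touches segment p3-p4."""
    d1, d2 = orientation(p3, p4, p1), orientation(p3, p4, p2)
    d3, d4 = orientation(p1, p2, p3), orientation(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # the segments properly cross each other
    return ((d1 == 0 and within_box(p3, p4, p1))
            or (d2 == 0 and within_box(p3, p4, p2))
            or (d3 == 0 and within_box(p1, p2, p3))
            or (d4 == 0 and within_box(p1, p2, p4)))


def is_closed(coords):
    """A linestring is closed if it starts and ends on the same point."""
    return coords[0] == coords[-1]


def is_simple(coords):
    """A linestring is simple if no two non-neighbouring segments meet."""
    segments = list(zip(coords, coords[1:]))
    last = len(segments) - 1
    for i, j in combinations(range(len(segments)), 2):
        if j == i + 1:
            continue  # neighbours always share exactly one vertex
        if is_closed(coords) and i == 0 and j == last:
            continue  # first and last segment of a ring share the start point
        if segments_touch(*segments[i], *segments[j]):
            return False
    return True


sample_line = [(0, 0), (1, 1), (2, 1), (2, 2)]    # the sample linestring
ring = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]   # closed and simple
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]         # crosses itself

print(is_closed(sample_line), is_simple(sample_line))  # False True
print(is_closed(ring), is_simple(ring))                # True True
print(is_closed(bowtie), is_simple(bowtie))            # False False
```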

The street network for New York (`nyc_streets`) is part of the database
downloaded earlier in this notebook. This dataset contains details such
as name and type. A single real-world street may consist of many
linestrings, each representing a segment of road with different attributes.

The following SQL query will return the geometry associated with one
linestring (in the `ST_AsText` column).

In [None]:
con.sql("""

SELECT ST_AsText(geom)
  FROM samples
  WHERE name = 'Linestring';

""")

In [18]:
%%sql
select ST_ASTEXT(geom)
from samples
where name = 'Linestring';

Unnamed: 0,st_astext(geom)
0,"LINESTRING (0 0, 1 1, 2 1, 2 2)"


Some of the specific spatial functions for working with linestrings are:

-   `ST_Length(geom)` returns the length of the linestring
-   `ST_StartPoint(geom)` returns the first coordinate as a point
-   `ST_EndPoint(geom)` returns the last coordinate as a point
-   `ST_NPoints(geom)` returns the number of coordinates in the
    linestring

So, the length of our linestring is:

In [None]:
con.sql("""

SELECT ST_Length(geom)
  FROM samples
  WHERE name = 'Linestring';

""")

In [20]:
%%sql
select st_length(geom)
from samples
where name = 'Linestring';

Unnamed: 0,st_length(geom)
0,3.414214
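
The result can be checked by hand. The sketch below, written in plain Python with hypothetical helper names, sums the straight-line distance between consecutive vertices and also mirrors what `ST_StartPoint`, `ST_EndPoint`, and `ST_NPoints` report for the sample linestring. It assumes planar coordinates, as in the sample data.

```python
import math


def linestring_length(coords):
    """Sum the Euclidean length of each segment between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


# Vertices of LINESTRING (0 0, 1 1, 2 1, 2 2)
line = [(0, 0), (1, 1), (2, 1), (2, 2)]

length = linestring_length(line)    # sqrt(2) + 1 + 1
start_point = line[0]               # counterpart of ST_StartPoint
end_point = line[-1]                # counterpart of ST_EndPoint
n_points = len(line)                # counterpart of ST_NPoints

print(round(length, 6))                  # 3.414214
print(start_point, end_point, n_points)  # (0, 0) (2, 2) 4
```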


## Polygons

![](https://postgis.net/workshops/postgis-intro/_images/polygons.png)

A polygon is a representation of an area. The outer boundary of the
polygon is represented by a ring. This ring is a linestring that is both
closed and simple as defined above. Holes within the polygon are also
represented by rings.

Polygons are used to represent objects whose size and shape are
important. City limits, parks, building footprints or bodies of water
are all commonly represented as polygons when the scale is sufficiently
high to see their area. Roads and rivers can sometimes be represented as
polygons.

The following SQL query will return the geometry associated with one
polygon (in the `ST_AsText` column).

In [None]:
con.sql("""

SELECT ST_AsText(geom)
  FROM samples
  WHERE name LIKE 'Polygon%';

""")

In [22]:
%%sql
select ST_ASTEXT(geom)
from samples
where name like 'Polygon%';

Unnamed: 0,st_astext(geom)
0,"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"
1,"POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, ..."


Some of the specific spatial functions for working with polygons are:

-   `ST_Area(geom)` returns the area of the polygons
-   `ST_NRings(geom)` returns the number of rings (usually 1, more
    if there are holes)
-   `ST_ExteriorRing(geom)` returns the outer ring as a linestring
-   `ST_InteriorRingN(geometry,n)` returns a specified interior ring as
    a linestring
-   `ST_Perimeter(geom)` returns the length of all the rings

We can calculate the area of our polygons using the area function:

In [None]:
con.sql("""

SELECT name, ST_Area(geom)
  FROM samples
  WHERE name LIKE 'Polygon%';

""")

In [23]:
%%sql
select name, ST_AREA(geom)
from samples
where name like 'Polygon%';

Unnamed: 0,name,st_area(geom)
0,Polygon,1.0
1,PolygonWithHole,99.0
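
These areas can be reproduced with the shoelace formula: the area of a polygon is the area of its exterior ring minus the area of each hole. The following is a minimal sketch in plain Python with hypothetical function names, assuming planar coordinates; it is not DuckDB's implementation. The perimeter helper follows the description above and adds up the length of all rings.

```python
import math


def ring_area(ring):
    """Unsigned area of a closed ring, computed with the shoelace formula."""
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2


def polygon_area(rings):
    """Area of the exterior ring minus the area of every interior ring."""
    exterior, *holes = rings
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


def polygon_perimeter(rings):
    """Total length of all rings, exterior and interior."""
    return sum(
        math.dist(a, b) for ring in rings for a, b in zip(ring, ring[1:])
    )


# Rings of the two sample polygons; the first ring is always the exterior.
polygon = [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
polygon_with_hole = [
    [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
    [(1, 1), (1, 2), (2, 2), (2, 1), (1, 1)],
]

print(polygon_area(polygon))                 # 1.0
print(polygon_area(polygon_with_hole))       # 99.0
print(polygon_perimeter(polygon_with_hole))  # 44.0
```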


## Collections

There are four collection types, which group multiple simple geometries
into sets.

-   **MultiPoint**, a collection of points
-   **MultiLineString**, a collection of linestrings
-   **MultiPolygon**, a collection of polygons
-   **GeometryCollection**, a heterogeneous collection of any geometry
    (including other collections)

Collections are another concept that shows up in GIS software more than
in generic graphics software. They are useful for directly modeling real
world objects as spatial objects. For example, how would you model a lot
that is split by a right-of-way? As a **MultiPolygon**, with one part on
either side of the right-of-way.

Our example collection contains a polygon and a point:

In [None]:
con.sql("""

SELECT name, ST_AsText(geom)
  FROM samples
  WHERE name = 'Collection';

""")

In [24]:
%%sql
select name, ST_ASTEXT(geom)
from samples
where name = 'Collection';

Unnamed: 0,name,st_astext(geom)
0,Collection,"GEOMETRYCOLLECTION (POINT (2 0), POLYGON ((0 0..."
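
As a rough illustration of how collections nest, the sketch below writes the sample collection as a GeoJSON-style Python dictionary (the same structure used by the GeoJSON format that `samples.geojson` is exported in) and walks it recursively. The `count_types` helper and the coordinates of the split lot are hypothetical and only serve the example.

```python
from collections import Counter

# GEOMETRYCOLLECTION(POINT(2 0), POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))
collection = {
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Point", "coordinates": [2, 0]},
        {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
        },
    ],
}

# A lot split by a right-of-way, modeled as one MultiPolygon with two parts.
# The coordinates are made up for illustration.
split_lot = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]],
        [[[5, 0], [9, 0], [9, 3], [5, 3], [5, 0]]],
    ],
}


def count_types(geometry, counts=None):
    """Count member geometry types, descending into nested collections."""
    counts = Counter() if counts is None else counts
    if geometry["type"] == "GeometryCollection":
        for member in geometry["geometries"]:
            count_types(member, counts)
    else:
        counts[geometry["type"]] += 1
    return counts


type_counts = dict(count_types(collection))
print(type_counts)                         # {'Point': 1, 'Polygon': 1}
print(len(split_lot["coordinates"]))       # 2
```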


## Data Visualization

In [None]:
con.sql("SHOW TABLES;")

In [27]:
%%sql
SHOW TABLES;

Unnamed: 0,Success


In [32]:
# Store the query result in a DataFrame through the jupysql connection (autopandas is enabled above)
%sql subway_stations_df << SELECT * EXCLUDE geom, ST_AsText(geom) AS geometry FROM nyc_subway_stations
subway_stations_df.head()

In [None]:
subway_stations_gdf = leafmap.df_to_gdf(subway_stations_df, src_crs="EPSG:26918", dst_crs="EPSG:4326")
subway_stations_gdf.head()

In [None]:
subway_stations_gdf.explore()

In [None]:
# Store the query result in a DataFrame through the jupysql connection (autopandas is enabled above)
%sql nyc_streets_df << SELECT * EXCLUDE geom, ST_AsText(geom) AS geometry FROM nyc_streets
nyc_streets_df.head()

In [None]:
nyc_streets_gdf = leafmap.df_to_gdf(nyc_streets_df, src_crs="EPSG:26918", dst_crs="EPSG:4326")
nyc_streets_gdf.head()

In [None]:
nyc_streets_gdf.explore()