# Download Overture Maps with DuckDB 

## What is DuckDB?
DuckDB is an in-process database designed for analytical query workloads, also known as online analytical processing (OLAP). It uses a columnar, vectorized query execution engine that processes data in batches of values rather than one row at a time, which typically makes analytical queries faster than in row-oriented systems such as PostgreSQL, MySQL, or SQLite. Many extensions are available: with them you can read and write data in environments such as Amazon S3, Google Cloud Storage, or PostgreSQL, and the Spatial extension adds spatial analysis.

### Creating a database

In [1]:
import duckdb
import geopandas as gpd
db = duckdb.connect("data.db")

### Installing extensions for data access and spatial analysis
We install the "spatial" extension to perform spatial analysis and the "httpfs" extension to read the Overture data from Amazon S3. We then set the S3 region to "us-west-2".

In [2]:
db.sql("""
INSTALL spatial;
INSTALL httpfs;
LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';
""")


In [3]:
# Parameters
release = "2024-03-12-alpha.0"
theme = "transportation"
ptype = "segment"
bbFile = "C:\\Data\\GitHub\\jetgeo\\OM2UML\\Data\\StorOslo.geojson"

Get the bounding box to search within. You can use https://geojson.io/ to draw a bounding box for your area of interest and save it as a GeoJSON file.

In [4]:
# Bounding box
import geojson
from shapely.geometry import shape

def get_bbox(geometry):
    polygon = shape(geometry)
    return polygon.bounds

with open(bbFile) as f:
    gj = geojson.load(f)

# Use the first feature in the file as the area of interest
feature = gj['features'][0]
bbox = get_bbox(feature['geometry'])
print("Bounding Box (minx, miny, maxx, maxy):", bbox)


Bounding Box (minx, miny, maxx, maxy): (10.387707, 59.818286, 11.103814, 60.021827)
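
As a cross-check that does not depend on shapely, the bounds of a simple GeoJSON polygon can also be computed directly from its coordinate list. The sketch below is illustrative only: `polygon_bounds` and the sample geometry are hypothetical, and it assumes a single `Polygon` geometry (no `MultiPolygon`).

```python
def polygon_bounds(geometry):
    """Return (minx, miny, maxx, maxy) for a GeoJSON Polygon geometry dict."""
    if geometry["type"] != "Polygon":
        raise ValueError("this sketch only handles Polygon geometries")
    # The first ring is the exterior; interior rings (holes) cannot extend the bounds
    exterior = geometry["coordinates"][0]
    xs = [pt[0] for pt in exterior]
    ys = [pt[1] for pt in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical rectangle roughly covering the Oslo area
sample = {
    "type": "Polygon",
    "coordinates": [[
        [10.4, 59.8], [11.1, 59.8], [11.1, 60.0], [10.4, 60.0], [10.4, 59.8]
    ]],
}
print(polygon_bounds(sample))
```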


Get data and count the number of items

In [5]:
# db.sql("""
#   create table places as 
#   select * from read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*')
# """)
# Drop any existing view with this name so the cell can be re-run
db.sql("DROP VIEW IF EXISTS " + ptype)

strSql= """CREATE VIEW """ + ptype + """ AS 
       SELECT * FROM read_parquet('s3://overturemaps-us-west-2/release/""" + release + """/theme="""+ theme + """/type=""" + ptype + """/*', filename=true, hive_partitioning=1)
       WHERE 
              bbox.minx > """ + str(bbox[0])+ """ AND 
              bbox.miny > """ + str(bbox[1])+ """ AND
              bbox.maxx < """ + str(bbox[2])+ """ AND 
              bbox.maxy < """ + str(bbox[3])+ """ ;
       """
# print(strSql)
db.sql(strSql)
db.sql("""select count(*) as count from """ + ptype).show()


┌────────┐
│ count  │
│ int64  │
├────────┤
│ 130269 │
└────────┘
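
The string concatenation above is easy to get wrong, so the same statement could be assembled by a small helper. This is a sketch and not part of the original notebook: `build_view_sql` is a hypothetical name, and the identifier check is a simple precaution because the theme and type are pasted into the SQL text. As in the cell above, the filter keeps only features whose own bounding box lies entirely inside the search box; features that merely cross its edge are excluded.

```python
import re


def build_view_sql(release, theme, ptype, bbox):
    """Build the CREATE VIEW statement for one Overture theme/type and bbox."""
    # theme and ptype end up in the SQL text, so restrict them to simple names
    for name in (theme, ptype):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unexpected identifier: {name!r}")
    minx, miny, maxx, maxy = bbox
    path = (
        f"s3://overturemaps-us-west-2/release/{release}"
        f"/theme={theme}/type={ptype}/*"
    )
    return (
        f"CREATE OR REPLACE VIEW {ptype} AS\n"
        f"SELECT * FROM read_parquet('{path}', filename=true, hive_partitioning=1)\n"
        f"WHERE bbox.minx > {minx} AND bbox.miny > {miny}\n"
        f"  AND bbox.maxx < {maxx} AND bbox.maxy < {maxy};"
    )


sql = build_view_sql(
    "2024-03-12-alpha.0", "transportation", "segment",
    (10.387707, 59.818286, 11.103814, 60.021827),
)
print(sql)
```

The resulting string can then be passed to `db.sql(...)` in place of `strSql`.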



Print all column names in the data set 

In [6]:
res = db.sql('SELECT * FROM ' + ptype)
# Get the column names
column_names = res.columns

# Print the list of column names
print("Column Names:")
for name in column_names:
    print(name)

Column Names:
id
geometry
bbox
version
update_time
sources
subtype
names
level
connector_ids
road
filename
theme
type


Write to json

In [7]:

# Convert the result to a Pandas DataFrame
df = res.df()
# Remove some columns before exporting in order to reduce the file size
columns_to_drop = ['geometry', 'bbox','filename','theme', 'type','sources','version']
df.drop(columns=columns_to_drop, inplace=True)

# Write the DataFrame to a nicely formatted JSON file
output_file = ptype + ".json"
# An indented JSON array; lines=True (JSON Lines) is not combined with indent
df.to_json(output_file, orient="records", indent=2)

print(f"Data written to {output_file} in nicely formatted JSON format.")


Data written to segment.json in nicely formatted JSON format.
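
The two common layouts for such an export are a single indented JSON array and JSON Lines (one compact object per line); they are best not mixed in one file. The sketch below uses only the standard library and hypothetical sample records to show the difference.

```python
import json

# Hypothetical records standing in for rows of the DataFrame
records = [
    {"id": "a1", "subtype": "road", "level": 0},
    {"id": "b2", "subtype": "road", "level": 1},
]

# Indented JSON: one array, easy to read, parsed as a whole
pretty = json.dumps(records, indent=2)

# JSON Lines: one compact object per line, easy to stream
json_lines = "\n".join(json.dumps(rec) for rec in records)

print(pretty)
print(json_lines)
```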


Convert geometry and write to GeoJSON with only a few attributes

In [8]:
db.sql("""COPY (
    SELECT ST_GeomFromWKB(""" + ptype + """.geometry) as geometry, id, subtype FROM """ + ptype + """
) TO '""" + ptype + """.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');""")


Next: How to flatten the RoadType.class attribute?

In [None]:
#Flatten the RoadType.class attribute to Segment.class
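
One possible approach, sketched here under assumptions: in this release the `road` column appears to hold a JSON-encoded string, with the road class stored under a `class` key. The helper name `road_class` and the sample values are hypothetical; check the actual column content before relying on this.

```python
import json


def road_class(road_value):
    """Return the 'class' entry of a JSON-encoded road value, or None."""
    if road_value is None:
        return None
    try:
        road = json.loads(road_value)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON
        return None
    if isinstance(road, dict):
        return road.get("class")
    return None


# Hypothetical sample values for the road column
samples = [
    '{"class": "residential", "surface": "paved"}',
    '{"surface": "gravel"}',
    None,
]
print([road_class(value) for value in samples])
```

With pandas, the helper could be applied as `df["class"] = df["road"].apply(road_class)`, assuming the `road` column is still present in `df`.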



The content below is from https://github.com/Youssef-Harby/OvertureMapsDownloader 

In [22]:
db.sql("""
    select * from """ + ptype + """ limit 25
""").show()



┌──────────────────────┬──────────────────────┬──────────────────────┬───┬────────────────┬─────────┐
│          id          │       geometry       │         bbox         │ … │     theme      │  type   │
│       varchar        │         blob         │ struct(minx double…  │   │    varchar     │ varchar │
├──────────────────────┼──────────────────────┼──────────────────────┼───┼────────────────┼─────────┤
│ 08a08a4aa6b17fff04…  │ \x00\x00\x00\x00\x…  │ {'minx': 11.061659…  │ … │ transportation │ segment │
│ 08b08a4aa6b16fff04…  │ \x00\x00\x00\x00\x…  │ {'minx': 11.062199…  │ … │ transportation │ segment │
│ 08b08a4aa44abfff04…  │ \x00\x00\x00\x00\x…  │ {'minx': 11.056290…  │ … │ transportation │ segment │
│ 08808a4aa45fffff04…  │ \x00\x00\x00\x00\x…  │ {'minx': 11.055904…  │ … │ transportation │ segment │
│ 08808a4aa45fffff04…  │ \x00\x00\x00\x00\x…  │ {'minx': 11.055904…  │ … │ transportation │ segment │
│ 08a08a4aa44e7fff04…  │ \x00\x00\x00\x00\x…  │ {'minx': 11.056651…  │ … │ transpo

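You can find the schema of the POI data [here](https://docs.overturemaps.org/reference/places/place/). Some columns need preprocessing before they can be used. The queries below assume a `places` table, like the one created by the commented-out `create table places` statement earlier in this notebook.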

In [8]:
db.sql("""
    select names, categories, confidence,brand,addresses from places limit 5
""").show()

┌──────────────────────┬──────────────────────┬─────────────────────┬──────────────────────┬───────────────────────────┐
│        names         │      categories      │     confidence      │        brand         │         addresses         │
│ map(varchar, map(v…  │ struct(main varcha…  │       double        │ struct("names" map…  │  map(varchar, varchar)[]  │
├──────────────────────┼──────────────────────┼─────────────────────┼──────────────────────┼───────────────────────────┤
│ {common=[{value=Br…  │ {'main': veterinar…  │  0.5989174246788025 │ {'names': NULL, 'w…  │ [{postcode=LL57 2NX, fr…  │
│ {common=[{value=Tr…  │ {'main': park, 'al…  │  0.9108787178993225 │ {'names': NULL, 'w…  │ [{country=AR}]            │
│ {common=[{value=St…  │ {'main': beauty_sa…  │  0.9628990292549133 │ {'names': NULL, 'w…  │ [{locality=São Paulo, p…  │
│ {common=[{value=เต้…  │ {'main': thai_rest…  │ 0.47563284635543823 │ {'names': NULL, 'w…  │ [{locality=Thap Sakae, …  │
│ {common=[{value=ร้า…  │ {'mai

For example, the main category of a place is stored inside the `categories` column, which has the "struct" type, so it has to be read from the struct field (`categories.main`). See the DuckDB documentation for details on its data types.
Similarly, to extract the country code from the first entry of the `addresses` column:

In [9]:
db.sql("""
    select replace(json_extract(CAST(addresses AS JSON), '$[0].country')::varchar,'"','') as country from places limit 5
""").show()

┌─────────┐
│ country │
│ varchar │
├─────────┤
│ GB      │
│ AR      │
│ BR      │
│ TH      │
│ TH      │
└─────────┘
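
The `replace(..., '"', '')` call is needed because `json_extract` returns a JSON value, and a JSON string keeps its surrounding double quotes when cast to `varchar`. The plain-Python sketch below mimics that behaviour with hypothetical sample data; it is an illustration, not DuckDB code.

```python
import json

# Hypothetical addresses value, shaped like the column shown above
addresses = [{"country": "GB", "postcode": "LL57 2NX"}]

# A JSON string value serialises with its quotes, like json_extract(...)::varchar
as_json_text = json.dumps(addresses[0]["country"])
print(as_json_text)  # "GB"

# Stripping the quotes mirrors replace(..., '"', '') in the query
country = as_json_text.replace('"', '')
print(country)  # GB
```

DuckDB's JSON functions also include `json_extract_string`, which returns the unquoted text directly and may be a simpler alternative.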




Next we add a column called “country” to the table and fill it with the extracted country codes.

In [10]:
try:
    db.sql("""
        ALTER TABLE places ADD COLUMN country VARCHAR;
        UPDATE places SET country = replace(json_extract(CAST(places.addresses AS JSON), '$[0].country')::varchar,'"','')
    """)
except duckdb.Error as e:
    # Raised, for example, when the cell is re-run and the column already exists
    print(e)

Catalog Error: Column with name country already exists!


The following query copies the POIs in Turkey into a separate table and keeps the address, category, name, and geometry information.

In [11]:
db.sql("""
       create or replace table turkey_places as (
              select
                     replace(json_extract(places.addresses::json,'$[0].locality'),'"','')::varchar as locality,
                     replace(json_extract(places.addresses::json,'$[0].region'),'"','')::varchar as region,
                     replace(json_extract(places.addresses::json,'$[0].postcode'),'"','')::varchar as postcode,
                     replace(json_extract(places.addresses::json,'$[0].freeform'),'"','')::varchar as freeform,

                     categories.main as categories_main,

                     replace(json_extract(places.names::json,'$.common[0].value'),'"','')::varchar as names,
                     confidence,
                     bbox,
                     st_transform(st_point(st_y(st_geomfromwkb(geometry)),st_x(st_geomfromwkb(geometry))),'EPSG:4326','EPSG:3857') as geom


              from places 
                     where country ='TR' 
       )


""")
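
The `st_transform` call converts longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857). The x and y coordinates are swapped in the `st_point` call, presumably because the transformation expects latitude first for EPSG:4326. As a rough illustration of what the projection does, the spherical Web Mercator formulas can be written in plain Python; `to_web_mercator` and `mercator_scale` are hypothetical helpers, not DuckDB functions.

```python
import math

EARTH_RADIUS = 6378137.0  # sphere radius used by Web Mercator, in metres


def to_web_mercator(lon, lat):
    """Project lon/lat in degrees to EPSG:3857 x/y."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def mercator_scale(lat):
    """Factor by which Web Mercator stretches distances at a given latitude."""
    return 1.0 / math.cos(math.radians(lat))


x, y = to_web_mercator(28.98, 41.01)  # approximate location of Istanbul
print(round(x), round(y))
print(round(mercator_scale(41.01), 3))
```

At Istanbul's latitude the scale factor is roughly 1.3, so a distance of 500 units in EPSG:3857 corresponds to noticeably less than 500 m on the ground.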


Created table:

In [12]:
db.sql("""
select * from turkey_places limit 5
""").df()

Unnamed: 0,locality,region,postcode,freeform,categories_main,names,confidence,bbox,geom
0,,,,,public_plaza,Elite Mamak Society,0.630477,"{'minx': 32.919501, 'maxx': 32.919501, 'miny':...","[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
1,Konya,42.0,42120.0,İlkay Sokak 8,professional_services,AGRO TÖKE,0.655315,"{'minx': 32.522156, 'maxx': 32.522156, 'miny':...","[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
2,Kayseri,,38010.0,Ufuk Sokak 1,local_and_state_government_offices,Sosyal Güvenlik Kurumu İl Müdürlüğü,0.606282,"{'minx': 35.48468, 'maxx': 35.48468, 'miny': 3...","[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
3,Ankara,6.0,6300.0,492. Cadde 31/B,car_dealer,EYMEN OTO Kiralama,0.593438,"{'minx': 32.8882484, 'maxx': 32.8882484, 'miny...","[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
4,Adana,1.0,1120.0,62017. Sokak 13,contractor,Gökpınar İnşaat,0.623355,"{'minx': 35.3315582, 'maxx': 35.3315582, 'miny...","[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."


As an example, we look at the POI data for Istanbul. Two tables are created, one for parks and one for all other POIs, so that we can find the POIs located within 500 m of a park.

In [13]:
db.sql("""
    create or replace table park_ist as (
        select * from turkey_places where locality = 'İstanbul' and categories_main='park'   
    );

    create or replace table poi_ist as (
        select * from turkey_places where locality = 'İstanbul' and categories_main <> 'park'
    )

""")

Number of POIs in Istanbul:

In [14]:
db.sql(
    """
select count(*) from poi_ist

"""
)
'''
count
  181959
'''


Number of POIs designated as Parks in Istanbul:

In [15]:
db.sql(
    """
select count(*) from park_ist

""")

'''
count
  492
'''



To query the POI points within 500 m of the points included in the park category:

In [16]:
df = db.sql("""
              select  poi_ist.region as poi_ist_region,poi_ist.freeform as poi_ist_freeform,poi_ist.categories_main as poi_ist_categori ,
              park_ist.categories_main as park_categori , park_ist.names as park_names, park_ist.freeform as park_ist_freeform,

              st_distance(poi_ist.geom,park_ist.geom) as dist,
              ST_AsText(poi_ist.geom) as geom,
              ST_AsText(park_ist.geom) as geom2

              from poi_ist, park_ist 

              where ST_DWithin(poi_ist.geom, park_ist.geom,500) 
       """).to_df()

gdf = gpd.GeoDataFrame(df,geometry= gpd.GeoSeries.from_wkt(df['geom']),crs="EPSG:3857")
gdf.to_file("poi.geojson",driver="GeoJSON")
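
To cross-check the projected distances against ground distances, individual pairs can be compared with a great-circle (haversine) distance on the original longitude/latitude values. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km; the function name and the sample coordinates are hypothetical.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius=6371000.0):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * radius * math.asin(math.sqrt(a))


# Two hypothetical points in Istanbul, 0.005 degrees of longitude apart
d = haversine_m(28.980, 41.010, 28.985, 41.010)
print(round(d), d <= 500)
```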
