# Spatial query of the Overture Buildings dataset with Fused

<a href="https://githubtocolab.com/fusedio/udfs/blob/main/public/Overture_Buildings/overture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>


Welcome! This notebook performs a spatial query on the Overture Buildings dataset.

The original dataset is several GB in size. The Fused User Defined Function (UDF) below fetches only the buildings that fall within the area of interest, so you load just the fraction of the data you care about instead of the whole dataset.

The Notebook structure is:

1. Define an area of interest by drawing a polygon on a Leaflet map
2. Perform a spatial query over the dataset with a UDF
3. View the output on a map

Let's begin!

In [10]:
# Uncomment this line to install packages if needed
# !pip3 install fused folium geopandas

In [11]:
import fused
import geopandas as gpd
import folium
from folium.plugins import Draw

# Output file where the drawn polygon is saved as GeoJSON
FILENAME = 'draw.geojson'

# Map center as (latitude, longitude); this point is in San Francisco
MAP_LOCATION = (37.7749, -122.4194)

# 1. Define an area of interest

Draw a polygon around the area of interest on the map, then click the "Export" button (top right) to save it as a GeoJSON file.

Note: the workflow is designed to work with a single polygon.

In [4]:
# Create a new map
m = folium.Map(location = MAP_LOCATION, tiles='OpenStreetMap', zoom_start=16)

# Add the draw control to the map
Draw(export=True, filename=FILENAME).add_to(m)
m

# 2. Perform a spatial query

This User Defined Function (UDF) queries a geo-partitioned version of the Overture Buildings dataset (hosted in an S3 bucket) for the area covered by an input GeoDataFrame. It returns a GeoDataFrame containing only the matching subset of the data, which is cached in the local environment for added speed.
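To make the idea of querying a geo-partitioned table more concrete, here is a minimal sketch of bounding-box pruning in plain Python. It is not how Fused or `table_to_tile` is implemented. The `Partition` class, the `overlaps` and `select_partitions` helpers, and the sample file names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical metadata record: a file plus the bounds of the rows it holds."""
    path: str
    minx: float
    miny: float
    maxx: float
    maxy: float


def overlaps(part, bounds):
    """Return True if the partition bounds intersect (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    return not (
        part.maxx < minx or part.minx > maxx or part.maxy < miny or part.miny > maxy
    )


def select_partitions(partitions, bounds):
    """Keep only the partitions whose bounds intersect the area of interest."""
    return [p.path for p in partitions if overlaps(p, bounds)]


# Illustrative partitions, not real Overture files
partitions = [
    Partition("sf.parquet", -122.52, 37.70, -122.35, 37.83),
    Partition("nyc.parquet", -74.26, 40.49, -73.70, 40.92),
    Partition("london.parquet", -0.51, 51.28, 0.33, 51.69),
]

# Bounds of a small area of interest in San Francisco
aoi = (-122.4416, 37.7680, -122.4367, 37.7846)
selected = select_partitions(partitions, aoi)
print(selected)
```

Only the partition whose bounds intersect the area of interest would need to be read, which is why a spatially partitioned layout can avoid downloading the full dataset.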

In [12]:
@fused.udf
def udf(
    bbox,
    release="2024-02-15-alpha.0",
    theme="buildings",
    type=None,
    use_columns=None,
    num_parts=None,
    min_zoom=None,
):
    import concurrent.futures
    import pandas as pd

    # Load utility functions
    utils = fused.load(
        "https://github.com/fusedio/udfs/tree/f8f0c0f/public/common/"
    ).utils 

    # Set defaults according to zoom level (to avoid fetching too much data)
    if min_zoom:
        min_zoom = int(min_zoom)
    elif theme == "admins":
        min_zoom = 7
    elif theme == "base":
        min_zoom = 9
    else:
        min_zoom = 12

    # Parameters for the overture table
    default_type_per_theme = {
        "buildings": "building",
        "admins": "administrativeBoundary",
        "places": "place",
        "base": "landUse",
        "transportation": "segment",
    }
    if not type:
        type = default_type_per_theme[theme]

    # Remote table with partitioned data
    table_path = f"s3://fused-asset/overture/{release}/theme={theme}/type={type}"
    table_path = table_path.rstrip("/")

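    # Partitions: respect an explicit num_parts, otherwise default per theme
    if num_parts is None:
        num_parts = 1 if theme != "buildings" else 5
    else:
        num_parts = int(num_parts)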

    # Get data from each partition
    def get_part(part):
        part_path = f"{table_path}/part={part}/" if num_parts != 1 else table_path
        try:
            return utils.table_to_tile(
                bbox, table=part_path, use_columns=use_columns, min_zoom=min_zoom
            )
        except ValueError:
            return None


    # Use thread pool if multi-part
    if num_parts > 1:
        with concurrent.futures.ThreadPoolExecutor(max_workers=num_parts) as pool:
            dfs = list(pool.map(get_part, range(num_parts)))
    else:
        dfs = [get_part(0)]

    # Concatenate results
    dfs = [df for df in dfs if df is not None]

    if len(dfs):
        df = pd.concat(dfs)
        print(df.columns)
        for col in df.columns:
            # Some Overture columns do not serialize nicely and can have compatibility
            # issues with some Parquet implementations.
            # Here we coerce to string to work around that.
            if col != "geometry":
                df[col] = df[col].apply(str)
        return df
    else:
        print("No data found.")
        return None
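The fan-out pattern used in the UDF (one thread per partition, failures turned into `None`, empty results dropped) can be tried in isolation. The sketch below uses only the standard library. `fetch_part` is a hypothetical stand-in for the remote read and does not call Fused or S3.

```python
import concurrent.futures


def fetch_part(part):
    """Stand-in for a remote read: odd-numbered parts have no data here."""
    if part % 2:
        raise ValueError(f"no rows in part {part}")
    return [f"row-{part}-a", f"row-{part}-b"]


def safe_fetch(part):
    """Mirror the UDF's get_part: turn a ValueError into None."""
    try:
        return fetch_part(part)
    except ValueError:
        return None


num_parts = 5
with concurrent.futures.ThreadPoolExecutor(max_workers=num_parts) as pool:
    # pool.map returns results in input order, regardless of completion order
    results = list(pool.map(safe_fetch, range(num_parts)))

# Drop the empty parts, then flatten what is left
non_empty = [part_rows for part_rows in results if part_rows is not None]
all_rows = [row for part_rows in non_empty for row in part_rows]
print(len(non_empty), all_rows)
```

Threads suit this kind of work because each task spends most of its time waiting on I/O rather than computing.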

In [13]:
# Use only the first bbox element
gdf_bbox = gpd.read_file(FILENAME).iloc[:1]

print(gdf_bbox)

                                            geometry
0  POLYGON ((-122.43671 37.78455, -122.44160 37.7...
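If you want to inspect the drawn area without GeoPandas, the standard library is enough. The sketch below assumes the exported file is a standard GeoJSON `FeatureCollection` of polygons. The embedded text and its coordinates are made up for illustration and are not the contents of `draw.geojson`.

```python
import json

# Illustrative FeatureCollection; the coordinates are made up for this sketch
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon", "coordinates": [[
       [-122.4367, 37.7846], [-122.4416, 37.7846],
       [-122.4416, 37.7680], [-122.4367, 37.7680],
       [-122.4367, 37.7846]]]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon", "coordinates": [[
       [-122.40, 37.80], [-122.41, 37.80], [-122.41, 37.79],
       [-122.40, 37.80]]]}}
  ]
}
"""

collection = json.loads(geojson_text)

# Keep only the first feature, as the notebook does with .iloc[:1]
first = collection["features"][0]
exterior = first["geometry"]["coordinates"][0]  # outer ring of the polygon

# Bounding box of the outer ring as (minx, miny, maxx, maxy)
xs = [x for x, y in exterior]
ys = [y for x, y in exterior]
bounds = (min(xs), min(ys), max(xs), max(ys))
print(bounds)
```

Any additional polygons in the file are ignored, which matches the single-polygon assumption of this workflow.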


In [14]:
# Run the UDF on your local machine
gdf_buildings = udf(bbox=gdf_bbox).run_local()
gdf_buildings.head()

Index(['fused_index', 'id', 'geometry', 'bbox', 'names', 'version',
       'updateTime', 'sources', 'class', 'hasParts', 'height', 'numFloors',
       'facadeColor', 'facadeMaterial', 'roofMaterial', 'roofShape',
       'roofDirection', 'roofOrientation', 'roofColor', 'eaveHeight', 'level'],
      dtype='object')


| | fused_index | id | geometry | bbox | names | version | updateTime | sources | class | hasParts | ... | numFloors | facadeColor | facadeMaterial | roofMaterial | roofShape | roofDirection | roofOrientation | roofColor | eaveHeight | level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7253 | 7253 | 08b283082d604fff0200111b8e8f2d04 | POLYGON ((-122.43899 37.76801, -122.43889 37.7... | {'maxx': -122.4388613, 'maxy': 37.7682067, 'mi... | | 0 | 2019-08-24T04:47:17.000Z | [{'confidence': None, 'dataset': 'OpenStreetMa... | | False | ... | | | | | | | | | | |
| 7255 | 7255 | 08b283082d631fff020062f72432bf3d | POLYGON ((-122.43869 37.76821, -122.43881 37.7... | {'maxx': -122.4384127, 'maxy': 37.7682224, 'mi... | | 0 | 2019-08-24T04:37:52.000Z | [{'confidence': None, 'dataset': 'OpenStreetMa... | | False | ... | | | | | | | | | | |
| 7256 | 7256 | 08b283082d631fff0200af15925cd49f | POLYGON ((-122.43868 37.76827, -122.43867 37.7... | {'maxx': -122.4384174, 'maxy': 37.768283, 'min... | | 0 | 2019-08-24T04:37:52.000Z | [{'confidence': None, 'dataset': 'OpenStreetMa... | | False | ... | | | | | | | | | | |
| 7294 | 7294 | 08b283082d631fff0200e001388b3b4c | POLYGON ((-122.43843 37.76808, -122.43856 37.7... | {'maxx': -122.4384113, 'maxy': 37.7680847, 'mi... | | 0 | 2019-08-24T04:37:52.000Z | [{'confidence': None, 'dataset': 'OpenStreetMa... | | False | ... | | | | | | | | | | |
| 7296 | 7296 | 08b283082d631fff0200e17fce87859e | POLYGON ((-122.43857 37.76815, -122.43857 37.7... | {'maxx': -122.4384213, 'maxy': 37.7681541, 'mi... | | 0 | 2019-08-24T04:37:52.000Z | [{'confidence': None, 'dataset': 'OpenStreetMa... | | False | ... | | | | | | | | | | |
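Because the UDF coerces every non-geometry column to a string, nested fields such as `bbox` and `sources` come back as text. One way to recover the structure is `ast.literal_eval`, sketched below. This assumes the string is a valid Python literal, which may not hold for values that originated as array types. The sample strings mimic the shape of the output but the numbers are made up.

```python
import ast

# Illustrative values in the shape of the UDF output; the numbers are made up
bbox_text = "{'maxx': -122.4388, 'maxy': 37.7682, 'minx': -122.4390, 'miny': 37.7680}"
sources_text = "[{'confidence': None, 'dataset': 'OpenStreetMap'}]"

# literal_eval only accepts Python literals, so it is safer than eval
bbox = ast.literal_eval(bbox_text)
sources = ast.literal_eval(sources_text)

# The parsed values behave like ordinary dicts and lists again
width = bbox["maxx"] - bbox["minx"]
print(bbox["minx"], sources[0]["dataset"], round(width, 4))
```

With a DataFrame, the same function could be applied per column, for example `gdf_buildings["bbox"].apply(ast.literal_eval)`, wrapped in error handling for values that do not parse.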


# 3. View the output on a map

In [15]:
# Create a new map
m = folium.Map(location = MAP_LOCATION, tiles='OpenStreetMap', zoom_start=16)

# Add the area of interest to the map
folium.GeoJson(
    gdf_bbox.to_json(),
    name='area of interest',
    style_function=lambda x: {'color': 'red', 'weight': 10, 'fillOpacity': 0}
).add_to(m)

# Add the buildings to the map (distinct layer names keep the layer control unambiguous)
folium.GeoJson(
    gdf_buildings.to_json(),
    name='buildings'
).add_to(m)

# Add a layer control panel to the map.
folium.LayerControl().add_to(m)

# Display the map
m