Merged
25 commits
916158c
#361 Remove ITileService contract
jathavaan May 27, 2026
39bee54
#361 Remove ITileApiService contract
jathavaan May 27, 2026
1efc415
#361 Remove IMVTService contract
jathavaan May 27, 2026
e9a217b
#361 Remove tile re-exports from contracts __init__
jathavaan May 27, 2026
c42e9db
#361 Remove convert_pmtiles_to_bytes from IBytesService
jathavaan May 27, 2026
0373b9e
#361 Remove TileService implementation
jathavaan May 27, 2026
de58ddf
#361 Remove TileApiService implementation
jathavaan May 27, 2026
b0b8751
#361 Remove MVTService implementation
jathavaan May 27, 2026
266e6a8
#361 Remove tile re-exports from services __init__
jathavaan May 27, 2026
17a3f9e
#361 Remove convert_pmtiles_to_bytes from BytesService
jathavaan May 27, 2026
0de92b1
#361 Remove tile/MVT service wiring from DI container
jathavaan May 27, 2026
71ae5d8
#361 Remove tile_server.py and endpoints directory
jathavaan May 27, 2026
997a0b9
#361 Remove tile_server from DI wire modules
jathavaan May 27, 2026
700f79b
#361 Remove PMTiles/MVT setup functions from benchmarking framework
jathavaan May 27, 2026
f3bc681
#361 Remove tile-related config vars from Config
jathavaan May 27, 2026
c2439e0
#361 Remove VECTOR_TILE enum members from BenchmarkIteration
jathavaan May 27, 2026
0c82e7d
#361 Remove TILES from StorageContainer enum
jathavaan May 27, 2026
89375dc
#361 Remove Api.Dockerfile
jathavaan May 27, 2026
ef565e7
#361 Remove vmt-api-server from docker-compose
jathavaan May 27, 2026
af2aaec
#361 Remove publish-api workflow
jathavaan May 27, 2026
2de48cd
#361 Remove VMT API build job from PR tests workflow
jathavaan May 27, 2026
c016ae0
#361 Remove tile-related dependencies from requirements
jathavaan May 27, 2026
afc845a
#361 Remove tile/VMT references from README
jathavaan May 27, 2026
3f5888a
#361 Remove tile references from CLAUDE.md
jathavaan May 27, 2026
6b2efcc
#361 Remove Publish APIs badge from README
jathavaan May 27, 2026
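
A removal spread over this many commits is easy to sanity-check by scanning the tree for leftover mentions of the deleted names. The following is a minimal sketch and not part of the PR: `find_leftover_references` and `REMOVED_NAMES` are hypothetical helpers, and the identifiers listed are taken from the commit titles above.

```python
from pathlib import Path

# Identifiers named in the commit titles above; extend the tuple as needed.
REMOVED_NAMES = (
    "ITileService",
    "ITileApiService",
    "IMVTService",
    "convert_pmtiles_to_bytes",
    "tile_server",
)


def find_leftover_references(
    root: Path,
    names=REMOVED_NAMES,
    suffixes=(".py", ".yml", ".md", ".txt"),
):
    """Return (relative path, line number, name) for each remaining mention under root."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Only inspect regular text-like files with one of the given suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in names:
                if name in line:
                    hits.append((path.relative_to(root).as_posix(), lineno, name))
    return hits
```

Run against a checkout of the merged branch, an empty result would suggest that no source, workflow, or documentation file still mentions the removed services.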
15 changes: 0 additions & 15 deletions .docker/Api.Dockerfile

This file was deleted.

132 changes: 0 additions & 132 deletions .github/workflows/publish-api.yml

This file was deleted.

26 changes: 0 additions & 26 deletions .github/workflows/pull-request-tests.yml
@@ -31,7 +31,6 @@ jobs:
pull-requests: read
outputs:
orchestrator: ${{ steps.filter.outputs.orchestrator }}
-api: ${{ steps.filter.outputs.api }}
benchmarks: ${{ steps.filter.outputs.benchmarks }}
steps:
- uses: actions/checkout@v4
@@ -57,13 +56,6 @@ jobs:
- '.docker/Setup.Dockerfile'
- 'requirements.txt'
- 'docker-compose.yml'
-api:
-- 'src/**'
-- '!src/presentation/entrypoints/**'
-- '!src/presentation/databricks/**'
-- '.docker/Api.Dockerfile'
-- 'requirements.txt'
-- 'docker-compose.yml'

compile:
name: Check Python syntax
@@ -109,24 +101,6 @@ jobs:
- name: Build Container Orchestrator from docker-compose
run: docker compose build container-orchestrator

-build-api-image:
-name: Build VMT API Server
-runs-on: ubuntu-latest
-needs:
-- compile
-- detect-changes
-if: needs.detect-changes.outputs.api == 'true'
-
-steps:
-- name: Checkout code
-uses: actions/checkout@v4
-
-- name: Set up Docker Buildx
-uses: docker/setup-buildx-action@v3
-
-- name: Build VMT API Server from docker-compose
-run: docker compose build vmt-api-server
-
build-benchmark-images:
name: Build ${{ matrix.display_name }}
runs-on: ubuntu-latest
4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -4,14 +4,14 @@ Reproducible benchmarking framework comparing cloud-native (DuckDB + GeoParquet,

## Stack

-Python, DuckDB (spatial), PostGIS on Azure Database for PostgreSQL, Apache Sedona on Databricks, Azure Blob Storage, Azure Container Instances, `dependency_injector`, FastAPI, PMTiles/MVT. See `requirements.txt` for versions.
+Python, DuckDB (spatial), PostGIS on Azure Database for PostgreSQL, Apache Sedona on Databricks, Azure Blob Storage, Azure Container Instances, `dependency_injector`. See `requirements.txt` for versions.

## Layout (Clean Architecture)

- `src/domain/` — enums only; no dependencies on other layers.
- `src/application/` — `contracts/` (service interfaces), `dtos/`, `common/` (logger, monitor).
- `src/infra/` — `infrastructure/services/` (contract impls), `infrastructure/containers.py` (DI wiring), `persistence/context/` (DuckDB, Postgres, Blob clients).
-- `src/presentation/` — `entrypoints/` (one file per benchmark), `configuration/app_config.py` (`initialize_dependencies`), `databricks/` (notebook script), `endpoints/tile_server.py` (FastAPI VMT server).
+- `src/presentation/` — `entrypoints/` (one file per benchmark), `configuration/app_config.py` (`initialize_dependencies`), `databricks/` (notebook script).
- `main.py` — outside-ACI orchestrator. Reads `benchmarks.yml`, launches one ACI per experiment.
- `benchmark_runner.py` — in-container dispatcher. Matches `--script-id` to a function in `src/presentation/entrypoints/`.
- `benchmarks.yml` — experiment manifest. Each entry: `id`, `image`, `cpu`, `memory_gb`, `related_script_ids`.
41 changes: 5 additions & 36 deletions README.md
@@ -2,7 +2,7 @@

doppa is a reproducible benchmarking framework for evaluating traditional geospatial query stacks
(PostGIS, shapefiles) against cloud-native geospatial (CNG) alternatives (DuckDB over GeoParquet in
-blob storage, PMTiles/MVT vector tiles, and Apache Sedona on Databricks) across a range of real-world
+blob storage and Apache Sedona on Databricks) across a range of real-world
spatial query patterns: point-in-polygon lookups, k-nearest-neighbour search, bounding-box filtering,
and a national-scale spatial join.

@@ -13,7 +13,7 @@ measurable and reproducible on identical datasets and hardware.

<div align="center">

-[![Push containers to Azure Container Registry](https://github.com/kartAI/doppa-data/actions/workflows/push-containers-to-acr.yml/badge.svg)](https://github.com/kartAI/doppa-data/actions/workflows/push-containers-to-acr.yml) [![Publish APIs](https://github.com/kartAI/doppa-data/actions/workflows/publish-api.yml/badge.svg)](https://github.com/kartAI/doppa-data/actions/workflows/publish-api.yml)
+[![Push containers to Azure Container Registry](https://github.com/kartAI/doppa-data/actions/workflows/push-containers-to-acr.yml/badge.svg)](https://github.com/kartAI/doppa-data/actions/workflows/push-containers-to-acr.yml)

</div>

@@ -60,7 +60,7 @@ format internals to client-observed cost is measured end to end.
**Cloud-native vector formats vs. traditional formats on cloud storage.** Empirical comparisons in the literature
(Holmes 2023; Flatgeobuf 2024) measure write times and file sizes on local disk and do not place cloud-native and
traditional formats side by side on cloud storage. doppa benchmarks GeoParquet over Azure Blob Storage (via DuckDB)
-against PostGIS on Azure Database for PostgreSQL, and PMTiles against WMS-style vector tiles, across the active
+against PostGIS on Azure Database for PostgreSQL, across the active
catalog of query patterns: point-in-polygon lookups, k-nearest-neighbour search, bounding-box filtering, and a
national-scale spatial join. The local-Shapefile entrypoints sit on the side as a laptop-workflow reference, with the
Shapefile downloaded ahead of the timed scope to emulate that workflow rather than to bench the format on cloud
@@ -167,8 +167,6 @@ to the elapsed-time distribution.
| PostGIS | Single-node, managed service | Azure Database for PostgreSQL Flexible Server |
| GeoPandas + Shapefile | Single-node, local-disk baseline | Shapefile pre-downloaded to the container before the timed scope |
| Apache Sedona | Distributed | Azure Databricks, 2 / 4 / 8 / 12 / 16 `Standard_D4s_v3` workers, reading GeoParquet via ABFS |
-| PMTiles | Cloud-native vector tiles | PMTiles archive in blob storage, accessed via HTTP range reads |
-| WMS-style vector tiles | Traditional vector tiles | `doppa-vmt` web app for containers, tiles assembled on demand |

DuckDB and PostGIS each run inside an Azure Container Instance with 4 vCPU and 16 GB RAM, so CPU and memory baselines
match between the single-node engines.
@@ -348,17 +346,16 @@ so.
#### Resource naming

The resource names used throughout this section (`doppa`, `doppabs`, `doppaacr`, `doppa-uami`,
-`doppa-db`, `doppa-vmt`, `doppa-databricks`) are baked into source and configuration. Keep them
+`doppa-db`, `doppa-databricks`) are baked into source and configuration. Keep them
as-is for the simplest setup; this is also what the thesis deployment uses, so reproducing the
published results requires these exact names.

If you need to rename a resource, the following references must be updated together:

| Location | What is hardcoded |
|-------------------------------------|--------------------------------------------------------------------------------|
-| `src/config.py` | Default values for resource group, blob URL/account, VMT URL, STAC container |
+| `src/config.py` | Default values for resource group, blob URL/account, STAC container |
| `benchmarks.yml` | ACR image references (`doppaacr.azurecr.io/<image>:latest`) for every benchmark |
-| `.github/workflows/publish-api.yml` | `webapp_name: doppa-vmt` |

`src/config.py` defaults can also be overridden via the corresponding environment variables
(see [Local development](#local-development) and [GitHub Actions](#github-actions)) without
@@ -464,34 +461,6 @@ same setting change the following:
- `effective_cache_size`: `6291456`
- `work_mem`: `65536`

-#### Web app for containers
-
-Create
-a [web app for containers](https://portal.azure.com/#view/Microsoft_Azure_Marketplace/GalleryItemDetailsBladeNopdl/id/Microsoft.AppSvcLinux/selectionMode~/false/resourceGroupId//resourceGroupLocation//dontDiscardJourney~/false/selectedMenuId/home/launchingContext~/%7B%22galleryItemId%22%3A%22Microsoft.AppSvcLinux%22%2C%22source%22%3A%5B%22GalleryFeaturedMenuItemPart%22%2C%22VirtualizedTileDetails%22%5D%2C%22menuItemId%22%3A%22home%22%2C%22subMenuItemId%22%3A%22Search%20results%22%2C%22telemetryId%22%3A%22135c4e97-6a92-446e-aa0a-3f2201ddfdb1%22%7D/searchTelemetryId/c154ee0a-06d6-49e4-a17f-3820937e6335)
-The process is the same for each of the following API servers:
-
-- `doppa-vmt`
-
-Under *Basics*:
-
-- Resource group: `doppa`
-- Name: `<name-from-list-above>`
-- Publish: `Container`
-- Operating system: `Linux`
-- Pricing plan: `Premium V4 P0V4`
-
-Under *Container*:
-
-- Image source: `Azure Container Registry`
-- Registry: `doppaacr`
-- Authentication: `Managed identity`
-- Identity: `doppa-uami`
-- Image: `<select the image that matches with the name>`
-- Tag: `latest`
-- Startup command `uvicorn src.presentation.endpoints.<API server script>:app --host 0.0.0.0 --port 8000`
-
-Navigate to *Review + create* and create the resource. Repeat this process for each name in the list.
-
#### Databricks

The national-scale spatial join benchmarks run on Azure Databricks using Apache Sedona. A separate Databricks
11 changes: 0 additions & 11 deletions docker-compose.yml
@@ -217,14 +217,3 @@ services:
dockerfile: .docker/Query.Dockerfile
image: national-scale-spatial-join-databricks-partitioned-16-nodes:latest
command: python benchmark_runner.py --script-id national-scale-spatial-join-databricks-partitioned-16-nodes --benchmark-run 1 --run-id ABCDEF
-
-vmt-api-server:
-env_file:
-- .env
-build:
-context: .
-dockerfile: .docker/Api.Dockerfile
-ports:
-- "8000:8000"
-image: vmt-api-server:latest
-command: uvicorn src.presentation.endpoints.tile_server:app --host 0.0.0.0 --port 8000
5 changes: 0 additions & 5 deletions requirements.txt
@@ -44,7 +44,6 @@ dependency-injector==4.48.2
dotenv==0.9.9
duckdb==1.4.0
executing==2.2.1
-fastapi==0.135.1
fastjsonschema==2.21.2
fiona==1.10.1
folium==0.20.0
@@ -89,7 +88,6 @@ lark==1.3.0
MarkupSafe==3.0.3
matplotlib==3.10.6
matplotlib-inline==0.1.7
-mercantile==1.2.1
mistune==3.1.4
msal==1.34.0
msal-extensions==1.3.1
@@ -111,7 +109,6 @@ parso==0.8.5
pexpect==4.9.0
pillow==11.3.0
platformdirs==4.4.0
-pmtiles==3.7.0
prometheus_client==0.23.1
prompt_toolkit==3.0.52
propcache==0.4.1
@@ -152,7 +149,6 @@ sniffio==1.3.1
soupsieve==2.8
SQLAlchemy==2.0.47
stack-data==0.6.3
-starlette==0.52.1
terminado==0.18.1
tinycss2==1.4.0
tornado==6.5.2
@@ -164,7 +160,6 @@ typing_extensions==4.15.0
tzdata==2025.2
uri-template==1.3.0
urllib3==2.5.0
-uvicorn==0.41.0
viztracer==1.1.1
watchfiles==1.1.1
wcwidth==0.2.14
3 changes: 0 additions & 3 deletions src/application/contracts/__init__.py
@@ -12,13 +12,10 @@
from .file_path_service_interface import IFilePathService
from .fkb_service_interface import IFKBService
from .monitoring_storage_service import IMonitoringStorageService
-from .mvt_service_interface import IMVTService
from .open_street_map_file_service_interface import IOpenStreetMapFileService
from .open_street_map_service_interface import IOpenStreetMapService
from .release_service_interface import IReleaseService
from .stac_io_service_interface import IStacIOService
from .stac_service_interface import IStacService
from .test_dataset_service_interface import ITestDatasetService
-from .tile_api_service_interface import ITileApiService
-from .tile_service_interface import ITileService
from .vector_service_interface import IVectorService
11 changes: 0 additions & 11 deletions src/application/contracts/bytes_service_interface.py
@@ -1,5 +1,4 @@
from abc import ABC, abstractmethod
-from pathlib import Path

import pandas as pd
import geopandas as gpd
@@ -62,13 +61,3 @@ def convert_df_to_parquet_bytes(df: pd.DataFrame | gpd.GeoDataFrame) -> bytes:
"""
raise NotImplementedError

-@staticmethod
-@abstractmethod
-def convert_pmtiles_to_bytes(path: Path) -> bytes:
-"""
-Converts a PMTiles file to a byte array.
-:param path: Path to the PMTiles file.
-:return: Byte array representation of the PMTiles file.
-:rtype: bytes
-"""
-raise NotImplementedError
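
The commit list removes `convert_pmtiles_to_bytes` from the contract (c42e9db) before removing it from the implementation (17a3f9e). The sketch below illustrates why that order is the safe one for an `ABC`-based contract, assuming the implementation class is instantiated somewhere, for example by the DI container. All class names here are hypothetical stand-ins, not the project's real classes.

```python
from abc import ABC, abstractmethod


class ContractBefore(ABC):
    """Stand-in for the contract while it still declares the PMTiles method."""

    @staticmethod
    @abstractmethod
    def convert_pmtiles_to_bytes(path) -> bytes:
        raise NotImplementedError


class ContractAfter(ABC):
    """Stand-in for the contract once the PMTiles method is removed."""


class ImplDroppedFirst(ContractBefore):
    """Implementation lost the method while the contract still requires it."""


class ImplDroppedLast(ContractAfter):
    """Implementation still carries the method after the contract dropped it."""

    @staticmethod
    def convert_pmtiles_to_bytes(path) -> bytes:
        return b""


def can_instantiate(cls) -> bool:
    """Report whether cls() succeeds; abstract classes raise TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

With these stand-ins, `can_instantiate(ImplDroppedFirst)` is `False` and `can_instantiate(ImplDroppedLast)` is `True`, so trimming the contract first keeps every intermediate commit instantiable.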