alsdb

Airborne Laser Scanning point clouds - from LAZ to TileDB at scale



alsDB reads LAZ/LAS files via PDAL, stores them in a TileDB sparse array (locally or on S3-compatible object storage), and provides a full processing pipeline for forest structure and biomass products.

The package is dataset-agnostic — CRS, bounding box, and acquisition year are read directly from the LAZ file header, so any national or global ALS dataset works without custom parsers.

Features

Scalable ingestion: parallel batch ingest; each tile becomes a TileDB fragment
Multi-temporal: X / Y / Year dimensions; repeated surveys stored and queryable independently
Ingestion manifest: tracks CRS, bbox, point count and status; re-ingestion is a no-op by default
CRS-aware schema: domain bounds selected automatically from the tile CRS, with a global fallback
Local + S3 storage: identical API for filesystem paths and s3:// URIs (tested on Ceph / RadosGW)
Zarr gridded output: CHM, DTM, DSM, AGB, gap fraction, LAI, and LiDAR metrics written directly to Zarr v3; no GeoTIFF intermediates, no mosaic step
Tiled processing: all products support tile_size / n_workers for large areas; parallel writes go to non-overlapping Zarr chunks
GEDI simulation: full-waveform simulation and batch RH metric extraction at GEDI footprint scale
CLI: alsdb ingest and alsdb info commands
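
The manifest-driven no-op behaviour can be sketched in a few lines. This is an illustrative stand-in, not the real alsdb manifest API; the field names and values are assumptions:

```python
# Illustrative sketch of the ingestion-manifest no-op check; not the real
# alsdb manifest API (field names and values here are assumptions).
manifest = {}  # tile path -> metadata recorded on first ingest

def ingest(path: str, manifest: dict) -> bool:
    """Return True if the tile was ingested, False if skipped as a no-op."""
    if manifest.get(path, {}).get("status") == "done":
        return False  # already in the array: re-ingestion is a no-op
    # ... real code would read the LAZ header and write a TileDB fragment ...
    manifest[path] = {"status": "done", "crs": "EPSG:32633", "points": 1_000_000}
    return True

print(ingest("tile_001.laz", manifest))  # True  (first ingest runs)
print(ingest("tile_001.laz", manifest))  # False (skipped as a no-op)
```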

Installation

pixi is the only supported install method: pdal and python-pdal are conda-only packages and cannot be installed via pip alone.

git clone https://github.com/simonbesnard1/alsdb.git
cd alsdb
pixi install          # resolves all conda-forge + pip dependencies in one step
pixi shell            # activate the environment

Note: pip install alsdb installs the pure-Python dependencies, but the package will fail to import without pdal present in your environment.

Dependencies

Package              Purpose
pdal / python-pdal   LAZ/LAS reading, HAG filter
tiledb               Sparse point-cloud storage (local + S3)
zarr                 Zarr v3 store for gridded products
numpy / scipy        Array operations, rasterisation, peak detection
pandas / xarray      Query result formats
pyarrow              Parquet I/O for waveform batch results
pyproj               CRS parsing from WKT
rioxarray            CRS attachment on to_dataset() output (optional)
matplotlib / plotly  Visualisation
click                CLI

Quick start

import alsdb
from alsdb import ALSDatabase, ALSProvider

alsdb.setup_logging()

# Ingest
db = ALSDatabase(storage_type="local", uri="my_array")
db.ingest("path/to/tile.laz")                        # year, CRS, bbox from LAZ header
db.ingest_many(paths, max_workers=8)                 # batch — already-ingested files skipped

# Query
reader = ALSProvider(storage_type="local", uri="my_array")
df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000, year=2021)
print(reader.available_years())                      # [2019, 2021, 2023]
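
Conceptually, query_bbox slices the sparse array on its three dimensions. A pure-numpy analogue on synthetic points shows the semantics (illustrative only; the real read goes through TileDB's multi-dimensional sparse indexing):

```python
import numpy as np

# Synthetic point coordinates standing in for the array's X / Y / Year
# dimensions; duplicate (X, Y) pairs are allowed, mirroring
# allows_duplicates=True for multiple returns per location.
rng = np.random.default_rng(42)
n = 50_000
x = rng.uniform(300_000, 320_000, n)      # UTM easting (m)
y = rng.uniform(4_680_000, 4_700_000, n)  # UTM northing (m)
year = rng.choice([2019, 2021, 2023], n)  # survey year per point

# Analogue of query_bbox(308_000, 4_688_000, 310_000, 4_690_000, year=2021)
mask = (
    (x >= 308_000) & (x <= 310_000)
    & (y >= 4_688_000) & (y <= 4_690_000)
    & (year == 2021)
)
subset = np.column_stack([x[mask], y[mask], year[mask]])
print(subset.shape[1])  # 3 columns: X, Y, Year
```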

Examples

Runnable Jupyter notebooks are in the examples/ folder:

Notebook                What it covers
01_ingest.ipynb         Single tile, batch ingest, manifest, S3
02_processing.ipynb     CHM / DTM / DSM, gap / LAI, metrics, biomass (Næsset + sklearn), xarray, change detection
03_waveform.ipynb       Single footprint, batch simulation, 3-D waterfall, GEDI comparison
04_visualisation.ipynb  Point cloud plots, waveform plots, gridded product plots

Full API documentation and a conceptual guide are on Read the Docs.


CLI

# Ingest a tile
alsdb ingest tile.laz my_array

# Ingest to S3
alsdb ingest tile.laz s3://owner.bucket/als_array \
    --storage-type s3 \
    --s3-url https://s3.example.com \
    --s3-access-key KEY --s3-secret-key SECRET

# Show file metadata
alsdb info tile.laz

# Filter to ground + vegetation classes only
alsdb ingest tile.laz my_array -c 2 -c 3 -c 4 -c 5

Architecture

LAZ file
   │
   ▼
PDAL (readers.las)
   │  year / bbox / CRS ← LAZ header
   ▼
ALSTile.iter_chunks()           ← optional classification filter
   │  X, Y, attrs numpy arrays (1 M pts/chunk)
   ▼
ALSDatabase.write()
   │  TileDB sparse array  (X × Y × Year)
   │  ByteShuffle+ZSTD for float64 X/Y; DoubleDelta+ZSTD for int16 Year
   │  allows_duplicates=True  (multiple returns per XY)
   ▼
TileDB array (local  /  s3://)
   │
   ├── ALSProvider.query_bbox()          → pandas / xarray
   │
   ├── processing.chm / gap / biomass
   │      │  PDAL hag_delaunay + scipy binned_statistic_2d
   │      ▼
   │   ALSZarrStore  (Zarr v3, local / s3://)
   │      ├── 1m/   chm, dtm, dsm          (T × ny × nx) float32
   │      └── 10m/  gap, lai, biomass,
   │                h50…density            (T × ny × nx) float32
   │      store.to_dataset(resolution)     → xarray.Dataset (CRS-aware)
   │
   └── processing.waveform               → GEDI-like RH metrics
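
The processing.chm step above pairs PDAL's hag_delaunay (height above ground) with scipy's binned_statistic_2d. The per-cell maximum it computes can be sketched with plain numpy on synthetic heights (grid extent and resolution here are illustrative):

```python
import numpy as np

# Synthetic height-above-ground points over a 100 m x 100 m tile.
rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(308_000, 308_100, n)
y = rng.uniform(4_688_000, 4_688_100, n)
hag = rng.uniform(0.0, 30.0, n)  # height above ground (m), as from hag_delaunay

res, nx, ny = 1.0, 100, 100      # 1 m grid, matching the 1m/ chm group
ix = np.clip(((x - 308_000) / res).astype(np.intp), 0, nx - 1)
iy = np.clip(((y - 4_688_000) / res).astype(np.intp), 0, ny - 1)

# Scatter the per-cell maximum into a flat grid; cells with no points
# stay at -inf and are converted to NaN, like nodata in the Zarr output.
flat = np.full(ny * nx, -np.inf, dtype=np.float32)
np.maximum.at(flat, iy * nx + ix, hag.astype(np.float32))
chm = np.where(np.isneginf(flat), np.nan, flat).reshape(ny, nx)
print(chm.shape)  # (100, 100)
```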

TileDB schema

Dimension  Type     Role
X          float64  UTM easting (m)
Y          float64  UTM northing (m)
Year       int16    Survey year

18 LAS attributes (Z, Intensity, ReturnNumber, Classification, RGB, …) are stored as TileDB attributes with ZSTD-9 compression. Spatial tile size defaults to 500 × 500 m; domain bounds are selected automatically per CRS.

Each ingested tile becomes a new TileDB fragment. Fragments are consolidated every consolidate_every tiles (default 50) to keep read performance healthy as the array grows.
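
The consolidation cadence is simple modular counting over ingested tiles; a minimal sketch, where the consolidation itself is elided (consolidate_every defaults to 50 per the text above):

```python
# Sketch of consolidating every `consolidate_every` ingested tiles;
# the actual TileDB fragment consolidation call is elided.
consolidate_every = 50
consolidated_at = []

for tile_count in range(1, 121):  # pretend 120 tiles are ingested
    # ... each ingest writes one new TileDB fragment ...
    if tile_count % consolidate_every == 0:
        consolidated_at.append(tile_count)  # merge fragments at this point

print(consolidated_at)  # [50, 100]
```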


License

EUPL-1.2 — see LICENSE.

© 2026 Simon Besnard, Helmholtz Centre Potsdam – GFZ German Research Centre for Geosciences.

About

alsDB is a Python package for ingesting, storing, and processing Airborne Laser Scanning (ALS/LiDAR) point clouds at scale.
