# Sparkgeo GoGeomatics Workshop


Welcome to this workshop on Cloud Optimized Point Clouds (COPC), the SpatioTemporal Asset Catalog (STAC), and the Point Data Abstraction Library (PDAL). We will cover the role of COPC and STAC in geospatial cloud systems and dive into using PDAL for creating cloud-native metadata. Through demonstrations, you'll learn practical methods for managing and processing large point cloud datasets in the cloud.


### Cloud Optimized Point Cloud (COPC)

A Cloud Optimized Point Cloud (COPC) is a point cloud data format engineered for efficient access and processing in cloud-based systems. A COPC file is a valid LAZ 1.4 file whose points are organized into an octree, so, unlike a plain LAS or LAZ file, it carries a hierarchical structure that enables faster and more precise spatial queries. This hierarchical metadata allows you to read specific subsets of large datasets without scanning the entire file.
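
To make the idea of a hierarchical spatial index concrete, here is a minimal sketch of octree addressing in plain Python. It is a simplification for illustration, not the logic of any COPC reader: the function name `octree_keys_for_query` and the cube values are hypothetical, and a real reader would also consult the file's hierarchy pages to learn which nodes actually exist.

```python
import math


def octree_keys_for_query(center, halfsize, depth, query_min, query_max):
    """Return (depth, x, y, z) keys of octree cells that overlap a query box.

    center: (x, y, z) of the root cube; halfsize: half of its edge length.
    query_min / query_max: (x, y, z) corners of an axis-aligned query box.
    """
    cells_per_axis = 2 ** depth
    cell_size = (2.0 * halfsize) / cells_per_axis
    index_ranges = []
    for axis in range(3):
        root_min = center[axis] - halfsize
        lo = math.floor((query_min[axis] - root_min) / cell_size)
        hi = math.floor((query_max[axis] - root_min) / cell_size)
        # Clamp to the cells that exist at this depth
        lo = max(lo, 0)
        hi = min(hi, cells_per_axis - 1)
        index_ranges.append(range(lo, hi + 1))
    return [
        (depth, x, y, z)
        for x in index_ranges[0]
        for y in index_ranges[1]
        for z in index_ranges[2]
    ]


# Hypothetical root cube spanning 0..1000 on every axis
keys = octree_keys_for_query(
    center=(500.0, 500.0, 500.0),
    halfsize=500.0,
    depth=2,
    query_min=(100.0, 600.0, 0.0),
    query_max=(300.0, 700.0, 1000.0),
)
print(f"{len(keys)} of {8 ** 2} depth-2 cells overlap the query")
```

Only the cells returned here would need to be fetched to answer the query, which is the property that makes range requests against a remote file worthwhile.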

COPC files are especially advantageous for large-scale, distributed computing scenarios where data is stored in cloud infrastructure. The format optimizes both storage and query performance, making it an ideal choice for applications requiring real-time data retrieval and analysis of massive point cloud datasets.

![COPC chunk table](./reference_material/copc-vlr-chunk-table-illustration.png)

### PDAL (Point Data Abstraction Library)

PDAL (Point Data Abstraction Library) is an open-source library designed for translating and processing point cloud data. It provides a standardized way to handle various point cloud formats, including LAS, LAZ, and more recently, Cloud Optimized Point Clouds (COPC).

Comparable to GDAL in the raster and vector data domains, PDAL serves as a crucial tool for geospatial applications involving point clouds. It allows for easy customization and extension, enabling the development of tailored point cloud processing workflows.
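
As a rough illustration of what a tailored workflow looks like, the sketch below describes a three-stage PDAL pipeline (read, filter, write) as JSON. The stage names `readers.las`, `filters.range`, and `writers.copc` are PDAL stages; the helper names and the file names `input.laz` and `ground.copc.laz` are hypothetical placeholders. Running it requires the `pdal` Python package, so execution is kept in a separate function that is not called here.

```python
import json


def build_ground_pipeline(src, dst):
    """Describe a read -> filter -> write PDAL pipeline as a JSON string."""
    stages = [
        {"type": "readers.las", "filename": src},
        # Keep only points classified as ground (ASPRS class 2)
        {"type": "filters.range", "limits": "Classification[2:2]"},
        {"type": "writers.copc", "filename": dst},
    ]
    return json.dumps({"pipeline": stages}, indent=2)


def run_pipeline(pipeline_json):
    """Execute the pipeline and return the number of points processed."""
    import pdal  # imported lazily so the definition can be built without PDAL

    return pdal.Pipeline(pipeline_json).execute()


# Hypothetical input and output file names
pipeline_json = build_ground_pipeline("input.laz", "ground.copc.laz")
print(pipeline_json)
```

Calling `run_pipeline(pipeline_json)` on a machine with PDAL installed and a real input file would write the filtered COPC output.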

# Workshop Material

We will use the NRCan COPC datasets for the examples in this notebook.
Let's have a look at LAS vs COPC datasets.

See the NRCAN download site for some [COPC datasets around the Oil Sands](https://download-telecharger.services.geo.ca/pub/elevation/pointclouds_nuagespoints/NRCAN/Oil_Sands_2017_2/).

In [1]:
import json
import pdal
import time

# Define the PDAL pipeline to read from the NRCan COPC repository (stored on AWS S3).
# This example uses a Fort McMurray COPC file.
pipeline = {
    "pipeline": [
        {
            "type": "readers.copc",
            "filename": "https://ftp-maps-canada-ca.s3.amazonaws.com/pub/elevation/pointclouds_nuagespoints/NRCAN/Fort_McMurray_2018/AB_FortMcMurray2018_20180518_NAD83CSRS_UTMZ12_1km_E4760_N62940_CQL1_CLASS.copc.laz"
        },
        {
            "type": "filters.head",
            "count": 0
        }
    ]
}

# Convert the Python dictionary to a JSON string
pipeline_json = json.dumps(pipeline)

# Create the PDAL pipeline object (it is executed in a later cell)
pipeline = pdal.Pipeline(pipeline_json)

Compare the time needed to read the whole file with `pipeline.execute()` against reading only the file header with `pipeline.quickinfo`.

In [2]:
# Time a header-only read (quickinfo) against a full read (execute)
start_time_quickinfo = time.time()
quickinfo = pipeline.quickinfo
end_time_quickinfo = time.time()

time_taken_quickinfo = round(end_time_quickinfo - start_time_quickinfo, 1)

start_time_execute = time.time()
info = pipeline.execute()
end_time_execute = time.time()

time_taken_execute = round(end_time_execute - start_time_execute, 1)

print(f"Time taken by pdal.Pipeline.execute(): {time_taken_execute} seconds")
print(f"Time taken by pdal.quickinfo: {time_taken_quickinfo} seconds")

Time taken by pdal.Pipeline.execute(): 150.5 seconds
Time taken by pdal.quickinfo: 5.2 seconds


Curl failure: Timeout was reached
Curl failure: Timeout was reached
Curl failure: Timeout was reached
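
The long `execute()` time and the `Curl failure` messages suggest that the full read pulls a lot of data over HTTP. One way to reduce that, sketched below, is to ask `readers.copc` for a spatial window and a coarser resolution through its `bounds` and `resolution` options. The helper name and the window size are assumptions for illustration; the window centre is chosen to fall inside this tile. No timing is claimed here, since it depends on the network.

```python
import json

COPC_URL = (
    "https://ftp-maps-canada-ca.s3.amazonaws.com/pub/elevation/"
    "pointclouds_nuagespoints/NRCAN/Fort_McMurray_2018/"
    "AB_FortMcMurray2018_20180518_NAD83CSRS_UTMZ12_1km_E4760_N62940_CQL1_CLASS.copc.laz"
)


def windowed_copc_pipeline(url, center_x, center_y, half_width, resolution=None):
    """Build a pipeline that reads only a square window of a COPC file."""
    xmin, xmax = center_x - half_width, center_x + half_width
    ymin, ymax = center_y - half_width, center_y + half_width
    reader = {
        "type": "readers.copc",
        "filename": url,
        # PDAL bounds syntax: ([xmin, xmax], [ymin, ymax])
        "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])",
    }
    if resolution is not None:
        # Limit the read to octree levels at or coarser than this spacing
        reader["resolution"] = resolution
    return json.dumps({"pipeline": [reader]})


# Illustrative 200 m x 200 m window near the tile centre, at 2 m resolution
windowed_json = windowed_copc_pipeline(
    COPC_URL, 476500.0, 6294500.0, 100.0, resolution=2.0
)
print(windowed_json)
```

With PDAL installed, `pdal.Pipeline(windowed_json).execute()` runs the windowed read, and it can be timed in the same way as the full read above.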


In [3]:
# Read some basic stats
copc_metadata = pipeline.metadata['metadata']['readers.copc']['copc']
count_metadata = pipeline.metadata['metadata']['readers.copc']['count']
srs_metadata = pipeline.metadata['metadata']['readers.copc']['srs']['json']['name']

# Display the metadata
print(f"COPC Metadata: {copc_metadata}")
print(f"Point Count: {count_metadata}")
print(f"Spatial Reference System: {srs_metadata}")

COPC Metadata: True
Point Count: 12180055
Spatial Reference System: NAD83(CSRS) / UTM zone 12N
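
The point count becomes easier to interpret as a density. The short calculation below assumes the tile covers a nominal 1 km by 1 km footprint, as the `1km` in the file name suggests; the true footprint may differ slightly.

```python
point_count = 12_180_055  # "Point Count" printed above
tile_side_m = 1000.0      # assumption: nominal 1 km tile, taken from the file name

# Average number of points per square metre over the whole tile
density = point_count / (tile_side_m ** 2)
print(f"Approximate density: {density:.1f} points per square metre")
```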


In [4]:
# What other properties are available?
pipeline.metadata['metadata']['readers.copc']

{'comp_spatialreference': 'PROJCS["NAD83(CSRS) / UTM zone 12N",GEOGCS["NAD83(CSRS)",DATUM["NAD83_Canadian_Spatial_Reference_System",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6140"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4617"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-111],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","2956"]]',
 'compressed': True,
 'copc': True,
 'copc_info': {'center_x': 476499.995,
  'center_y': 6294499.995,
  'center_z': 830.715,
  'gpstime_maximum': 2.390892103e-258,
  'gpstime_minimum': 210693071.1,
  'halfsize': 499.995,
  'root_hier_offset': 82152445,
  'root_hier_size': 11424,
  'spacing': 6.802653061},
 'count': 12180055,
 '
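
The `copc_info` block describes the octree: `center_*` and `halfsize` define the root cube, and `spacing` is the point spacing at the root node, which the COPC specification halves at each deeper level. The sketch below works that out for the values shown above; the function name `spacing_at_depth` is a hypothetical helper.

```python
root_spacing = 6.802653061  # 'spacing' from copc_info above
halfsize = 499.995          # 'halfsize' from copc_info above


def spacing_at_depth(root_spacing, depth):
    """Nominal point spacing at an octree depth (halved at each level)."""
    return root_spacing / (2 ** depth)


print(f"Root cube edge length: {2 * halfsize:.2f} m")
for depth in range(5):
    print(f"depth {depth}: spacing {spacing_at_depth(root_spacing, depth):.3f} m")
```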


That gives us enough metadata to understand the COPC data we are working with. Let's visualize its footprint.

In [5]:
# !pip install pyproj
import pyproj

# Create a geojson bbox from the PDAL bounds
def generate_geojson_bbox(metadata):
    # Extract bounds and source CRS
    minx = metadata['minx']
    miny = metadata['miny']
    maxx = metadata['maxx']
    maxy = metadata['maxy']
    source_crs_wkt = metadata['comp_spatialreference']
    
    # Initialize coordinate transformation
    source_crs = pyproj.CRS.from_wkt(source_crs_wkt)
    target_crs = pyproj.CRS.from_epsg(4326)  # WGS84
    transformer = pyproj.Transformer.from_crs(source_crs, target_crs, always_xy=True)
    
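    # Transform the lower-left and upper-right corners to WGS84.
    # This approximates the footprint: a projected box is not exactly axis-aligned in WGS84.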
    minx, miny = transformer.transform(minx, miny)
    maxx, maxy = transformer.transform(maxx, maxy)
    
    # Generate GeoJSON Polygon to represent the bounding box
    geojson_bbox = {
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [minx, miny],
                [maxx, miny],
                [maxx, maxy],
                [minx, maxy],
                [minx, miny]
            ]]
        }
    }
    
    return json.dumps(geojson_bbox, indent=4)
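
Transforming only two corners can slightly misplace the box, because a rectangle in a projected CRS is generally not axis-aligned in longitude and latitude. A possible refinement, sketched here, transforms all four corners and takes their envelope. The function name `wgs84_envelope` is hypothetical, and `toy_transform` is a stand-in so the example runs without pyproj; in the notebook you would pass `transformer.transform` instead.

```python
def wgs84_envelope(minx, miny, maxx, maxy, transform):
    """Transform all four corners and return (min_lon, min_lat, max_lon, max_lat).

    transform: any callable mapping (x, y) -> (lon, lat).
    """
    corners = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy)]
    lons, lats = zip(*(transform(x, y) for x, y in corners))
    return min(lons), min(lats), max(lons), max(lats)


def toy_transform(x, y):
    """Stand-in for a real reprojection: skews the box so it is no longer axis-aligned."""
    return (x - 0.5 * y, y + 0.5 * x)


print(wgs84_envelope(0.0, 0.0, 10.0, 10.0, toy_transform))
```

With the toy transform, the two-corner approach would give a box from (0, 0) to (5, 15), while the four-corner envelope also covers the corners that land outside it.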

In [8]:
# !pip install folium
import folium

# Build the GeoJSON bounding box from the reader metadata, then parse it into a dictionary
copc_metadata = pipeline.metadata['metadata']['readers.copc']
pdal_bounds = generate_geojson_bbox(copc_metadata)
pdal_bounds_dict = json.loads(pdal_bounds)

# GeoJSON stores positions as [lon, lat]; folium expects [lat, lon]
coordinates = pdal_bounds_dict['geometry']['coordinates'][0]
center_lat = (coordinates[0][1] + coordinates[2][1]) / 2
center_lon = (coordinates[0][0] + coordinates[2][0]) / 2
center = [center_lat, center_lon]

m = folium.Map(location=center, zoom_start=11)
folium.GeoJson(pdal_bounds).add_to(m)
m
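
A fixed `zoom_start` may not frame every tile well. As an alternative, the map can be fitted to the footprint with folium's `fit_bounds`, which takes `[[south, west], [north, east]]`. The helper `folium_bounds` below is a hypothetical name, and the ring coordinates are made-up values for illustration rather than the tile's real footprint.

```python
def folium_bounds(ring):
    """Convert a GeoJSON ring of [lon, lat] pairs to [[south, west], [north, east]]."""
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


# Made-up ring in [lon, lat] order, closed like a GeoJSON polygon ring
ring = [
    [-111.4, 56.7],
    [-111.3, 56.7],
    [-111.3, 56.8],
    [-111.4, 56.8],
    [-111.4, 56.7],
]
bounds = folium_bounds(ring)
print(bounds)
```

In the notebook, `m.fit_bounds(folium_bounds(coordinates))` would apply this to the map created above.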