# Marker Clustering with Large Datasets

This notebook demonstrates MapLibreum's marker clustering capabilities with large datasets.

## Overview

MapLibreum can efficiently cluster tens of thousands (or even hundreds of thousands) of markers,
making it practical to visualize large datasets on interactive maps.


In [None]:
import random
from maplibreum import Map
from maplibreum.cluster import MarkerCluster
from maplibreum.core import Marker

## Example 1: Clustering 10,000 Random Points

Let's start with a manageable dataset of 10,000 random points across the United States.

In [None]:
# Create map centered on the US
m1 = Map(center=[-98, 39], zoom=4)

# Create a marker cluster
cluster1 = MarkerCluster(
    name="random_10k",
    radius=50,
    max_zoom=14
)

# Add 10,000 random markers
for _ in range(10_000):
    lng = random.uniform(-125, -65)  # US longitude range
    lat = random.uniform(25, 50)     # US latitude range
    marker = Marker(coordinates=[lng, lat], color="#007cbf")
    cluster1.add_marker(marker)

cluster1.add_to(m1)
m1

## Example 2: Using GeoJSON for Larger Datasets

For larger datasets, it is more efficient to pass GeoJSON directly to the `add_clustered_geojson` method than to add markers one at a time.

Let's create a map with 50,000 points.

In [None]:
# Generate GeoJSON data with 50,000 points
geojson_data = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    random.uniform(-125, -65),
                    random.uniform(25, 50)
                ]
            },
            "properties": {"id": i, "name": f"Point {i}"}
        }
        for i in range(50_000)
    ]
}

# Create map with clustered GeoJSON
m2 = Map(center=[-98, 39], zoom=4)
m2.add_clustered_geojson(
    geojson_data,
    radius=60,
    max_zoom=12
)
m2
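
When the points come from real records rather than a random generator, a small helper can build the same `FeatureCollection` structure used above. The following is a minimal sketch: `points_to_feature_collection` is a hypothetical helper name, not part of MapLibreum, and the range check is an added safeguard against swapped or invalid coordinates.

```python
from typing import Dict, Iterable, Tuple


def points_to_feature_collection(points: Iterable[Tuple[float, float]]) -> Dict:
    """Build a GeoJSON FeatureCollection from (lng, lat) pairs.

    Hypothetical helper for illustration; GeoJSON expects longitude first.
    """
    features = []
    for i, (lng, lat) in enumerate(points):
        # Reject values outside the valid longitude/latitude ranges,
        # which usually indicates swapped or corrupted coordinates.
        if not (-180 <= lng <= 180 and -90 <= lat <= 90):
            raise ValueError(f"Point {i} has invalid coordinates: {lng}, {lat}")
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lng, lat]},
                "properties": {"id": i},
            }
        )
    return {"type": "FeatureCollection", "features": features}


sample = points_to_feature_collection([(-122.42, 37.77), (-73.99, 40.73)])
print(len(sample["features"]), "features built")
```

The resulting dictionary has the same shape as `geojson_data`, so it can be passed to `add_clustered_geojson` in the same way.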

## Example 3: Tuning Clustering Parameters

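You can adjust the clustering behavior using the `radius` and `max_zoom` parameters:

- **radius** (cluster radius): Controls how large an area, in pixels, is merged into a single cluster; smaller values give more, tighter clusters (default: 50 pixels)
- **max_zoom** (cluster max zoom): Maximum zoom level at which points are clustered; beyond it, every point is drawn individually (default: 14)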

Let's compare different settings with the same dataset.

In [None]:
# Generate a common dataset
test_data = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    random.uniform(-125, -65),
                    random.uniform(25, 50)
                ]
            },
            "properties": {"id": i}
        }
        for i in range(20_000)
    ]
}

# Map with tight clustering (smaller radius)
m3a = Map(center=[-98, 39], zoom=4)
m3a.add_clustered_geojson(test_data, radius=30, max_zoom=14)
print("Map with tight clustering (radius=30):")
m3a

In [None]:
# Map with loose clustering (larger radius)
m3b = Map(center=[-98, 39], zoom=4)
m3b.add_clustered_geojson(test_data, radius=80, max_zoom=14)
print("Map with loose clustering (radius=80):")
m3b
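
To build intuition for why the larger radius produces fewer clusters, the sketch below bins points into square cells in Web Mercator pixel space and counts the occupied cells. This is a deliberately simplified stand-in, not the algorithm MapLibre GL JS actually runs; the function names are hypothetical, and the 512-pixel tile size is the convention MapLibre GL JS uses for its cluster radius.

```python
import math
import random

TILE_SIZE = 512  # MapLibre GL JS measures cluster radius against 512-unit tiles


def to_world_pixels(lng, lat, zoom):
    """Project lng/lat to Web Mercator pixel coordinates at a zoom level."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lng + 180.0) / 360.0 * scale
    sin_lat = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * scale
    return x, y


def count_grid_clusters(points, radius, zoom):
    """Count occupied radius-sized grid cells (a crude proxy for clusters)."""
    cells = set()
    for lng, lat in points:
        x, y = to_world_pixels(lng, lat, zoom)
        cells.add((int(x // radius), int(y // radius)))
    return len(cells)


rng = random.Random(42)  # seeded so repeated runs give the same counts
points = [(rng.uniform(-125, -65), rng.uniform(25, 50)) for _ in range(2_000)]

tight = count_grid_clusters(points, radius=30, zoom=4)
loose = count_grid_clusters(points, radius=80, zoom=4)
print(f"radius=30: {tight} cells, radius=80: {loose} cells")
```

Because the pixel scale doubles with each zoom level, the same radius covers a smaller geographic area as you zoom in, which is why clusters split apart until `max_zoom` is reached.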

## Performance Notes

Based on benchmarking (see  for details):

- **10,000 points**: ~0.008s (instant)
- **50,000 points**: ~0.038s (nearly instant)
- **100,000 points**: ~0.078s (very fast)
- **200,000 points**: ~0.153s (fast)
- **500,000 points**: ~0.380s (good)

MapLibreum can handle very large datasets efficiently thanks to MapLibre GL JS's optimized clustering engine.
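
The figures above are the project's own, and this notebook does not state exactly which step they time. To get numbers for your own machine, a rough harness like the one below can help. It is only a sketch: `time_geojson_build` is a hypothetical name, and it measures just the Python-side cost of generating and serializing the GeoJSON, not clustering or rendering in the browser.

```python
import json
import random
import time


def time_geojson_build(n_points, seed=0):
    """Return (seconds, payload_bytes) for building and serializing n points."""
    rng = random.Random(seed)
    start = time.perf_counter()
    data = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [rng.uniform(-125, -65), rng.uniform(25, 50)],
                },
                "properties": {"id": i},
            }
            for i in range(n_points)
        ],
    }
    payload = json.dumps(data)
    elapsed = time.perf_counter() - start
    return elapsed, len(payload.encode("utf-8"))


for n in (1_000, 10_000):
    seconds, size = time_geojson_build(n)
    print(f"{n:>7,} points: {seconds:.3f}s, {size / 1e6:.2f} MB of JSON")
```

Timings vary between machines and runs, so compare trends across dataset sizes rather than absolute values.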

## Tips for Working with Large Datasets

1. **Use GeoJSON format**: More efficient than adding individual markers
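2. **Adjust the cluster radius (`radius`)**: Larger values create fewer clusters (better performance)
3. **Set the cluster max zoom (`max_zoom`) deliberately**: Clustering stops above this zoom level and every point is drawn individually, so a higher value keeps dense areas clustered longer as users zoom in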
4. **Consider server-side clustering**: For datasets over 1 million points

## Saving the Map

You can save any of these maps to an HTML file:

In [None]:
# Save the 50k point map
m2.save("large_cluster_map.html")
print("Map saved to large_cluster_map.html")
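
The saved file size presumably grows with the point count, assuming the GeoJSON is embedded in the HTML (this notebook does not confirm how MapLibreum serializes the data). One way to shrink the payload before saving is sketched below: round coordinates and drop unused properties. `compact_feature_collection` is a hypothetical helper; five decimal places corresponds to roughly one metre of precision.

```python
import json


def compact_feature_collection(data, precision=5, keep_properties=("id",)):
    """Return a copy with rounded coordinates and only the listed properties.

    Hypothetical helper for illustration; handles Point features only.
    """
    features = []
    for feature in data["features"]:
        lng, lat = feature["geometry"]["coordinates"]
        props = feature.get("properties") or {}
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [round(lng, precision), round(lat, precision)],
                },
                "properties": {k: props[k] for k in keep_properties if k in props},
            }
        )
    return {"type": "FeatureCollection", "features": features}


original = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-98.123456789, 39.987654321]},
            "properties": {"id": 0, "name": "Point 0"},
        }
    ],
}
compact = compact_feature_collection(original)
print(len(json.dumps(original)), "->", len(json.dumps(compact)), "characters")
```

The input dictionary is left unchanged, so both versions can be compared before deciding which one to pass to the map.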