# Hexagonal indexing

https://eng.uber.com/h3/

## Uber's H3

* Uber works with a lot of geographical data. They open-sourced one of their geo-indexing libraries, which divides a 2D space into hexagons at multiple granularities:
  * A Python wrapper is available with an intuitive API, as well as JavaScript bindings
  * Each lat, lng can be mapped in O(1) time to a hexagon id (hash) at a given resolution
  * Hexagon traversal is fast, O(1) (operations like parent hexagon, child hexagons or adjacent hexagons)
  * Only 2-dimensional indexing over a sphere (planet Earth), although nothing prevents using the same method over other 2-dimensional spaces
  * In-memory indexing (the index is not stored on disk), unlike the tree-based indexes found in DBMSs such as PostGIS. But since a hash id is produced, it can be used with any conventional index in any database
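
Because a cell id is just a 64-bit integer, it fits in an ordinary indexed column. The following is a minimal sketch using Python's built-in `sqlite3`; the table and column names (`points`, `cell`) and the coordinates are placeholders made up for illustration, while the cell ids are ones that appear in this notebook.

```python
import sqlite3

# Hypothetical schema: one row per point, keyed by its H3 cell id stored as an integer
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (cell INTEGER, lat REAL, lng REAL)")
conn.execute("CREATE INDEX idx_points_cell ON points (cell)")  # ordinary B-tree index

# Placeholder coordinates, not the real positions of these cells
rows = [
    (int("8a1faec4d347fff", 16), 49.9991, 8.0011),
    (int("8a1faec4d347fff", 16), 49.9992, 8.0012),
    (int("8a1faec4d34ffff", 16), 49.9995, 8.0020),
]
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)


def points_in_cells(connection, cell_ids):
    """Return the (lat, lng) rows whose cell id is in cell_ids (hex strings)."""
    keys = [int(c, 16) for c in cell_ids]
    placeholders = ",".join("?" for _ in keys)
    sql = f"SELECT lat, lng FROM points WHERE cell IN ({placeholders}) ORDER BY lat"
    return connection.execute(sql, keys).fetchall()


matches = points_in_cells(conn, ["8a1faec4d347fff"])
print(matches)
```

A neighbourhood query then becomes a plain `IN (...)` lookup over the cell ids of the surrounding hexagons, with no spatial extension required.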

<img src="img/h3.png"></img>
<img src="img/h3splitting.png"></img>

## Why Hexagons?

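* Only three regular polygons (triangles, squares and hexagons) tile the plane without gaps
* Ability to naturally divide a sphere's surface
* Adjacent hexagons are equally far apart (the centres of all six neighbours are at the same distance)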
* Traversal using only bitwise operations
* They are not perfect though:
  * A hexagon cannot be perfectly divided into smaller hexagons
  * Hexagons alone cannot perfectly cover planet Earth, so H3 adds 12 pentagons at every resolution (squares can)
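
The equidistance point can be checked on a flat grid. This is a small sketch assuming unit-spaced axial `(q, r)` coordinates for the hexagonal grid, a common convention that H3 itself does not expose; on the sphere, H3 cells are only approximately equidistant.

```python
import math

# Six neighbour offsets of a hexagonal cell in axial (q, r) coordinates
HEX_NEIGHBOURS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# Eight neighbour offsets of a square cell (edge and corner neighbours)
SQUARE_NEIGHBOURS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def hex_center(q, r):
    """Cartesian centre of an axial hexagon coordinate with unit centre spacing."""
    return (q + r / 2, r * math.sqrt(3) / 2)


# Distinct centre-to-centre distances from a cell to each of its neighbours
hex_distances = sorted(
    {round(math.hypot(*hex_center(q, r)), 9) for q, r in HEX_NEIGHBOURS}
)
square_distances = sorted(
    {round(math.hypot(dx, dy), 9) for dx, dy in SQUARE_NEIGHBOURS}
)

print(hex_distances)     # a single distance: every neighbour is equally far
print(square_distances)  # two distances: edge neighbours and diagonal neighbours
```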

<img src="img/hexa.png"></img>

## Side note: Google's S2

https://s2geometry.io/

* Can be used for 2-dimensional indexing on a sphere
* In-memory

## Installation

In [1]:
!pip install h3



In [2]:
import h3

## Basic use

### Indexing

In [3]:
# Get the id of the hexagon containing a lat/lng at a given resolution
h3.geo_to_h3(lat=50, lng=8, resolution=9)

'891faec4d37ffff'

* **Notice** the string. It is actually a 64-bit number written in hexadecimal, so it can be indexed using conventional database tree indices.
* This enables fast prefix queries for different resolutions.
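
To make the integer form concrete, the sketch below decodes a few fields from the id above. The bit positions are an assumption based on the cell index layout described in the H3 documentation (4 mode bits, 4 resolution bits, 7 base-cell bits, then fifteen 3-bit digits); `decode_h3` is a hypothetical helper, and real code should prefer the library's own accessors.

```python
def decode_h3(cell_hex):
    """Split an H3 cell id (hex string) into its main bit fields."""
    value = int(cell_hex, 16)           # the id as a 64-bit integer
    resolution = (value >> 52) & 0xF    # 4 bits
    # One 3-bit digit per resolution level, coarsest first
    digits = [(value >> (3 * (15 - r))) & 0x7 for r in range(1, resolution + 1)]
    return {
        "int": value,
        "mode": (value >> 59) & 0xF,        # 1 means "cell"
        "resolution": resolution,
        "base_cell": (value >> 45) & 0x7F,  # index of the resolution-0 ancestor
        "digits": digits,
    }


info = decode_h3("891faec4d37ffff")
print(info["resolution"])  # 9, matching the resolution requested above
print(info["digits"])
```

Cells that share an ancestor share the base cell and the leading digits, which is what makes coarse-to-fine lookups on the integer cheap.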

In [4]:
# Get the lat/lng of the center of a hexagon from its id
h3.h3_to_geo('891faec4d37ffff')

(49.99965438361455, 8.002389024210904)

In [5]:
# Get the polygon boundary of the hexagon from its id
h3.h3_to_geo_boundary('891faec4d37ffff', geo_json=False)

((50.00023096421835, 7.999905575883487),
 (49.998562822637005, 8.000148699309813),
 (49.99798621187932, 8.002632090638492),
 (49.999077737473364, 8.004872466948562),
 (50.000745899300895, 8.004629446418397),
 (50.001322515289395, 8.0021459466788))

### Traversal

In [6]:
# Get the id of the parent hexagon (the hexagon one resolution lower)
h3.h3_to_parent('881faec4d3fffff')

'871faec4dffffff'
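
The same parent id can be reproduced with plain bit operations. This is a sketch under the assumption that the id follows the H3 cell layout (resolution in bits 52 to 55, fifteen 3-bit digits in the low 45 bits, unused digits set to 7); `parent_bitwise` is a hypothetical helper, not part of the `h3` package.

```python
def parent_bitwise(cell_hex):
    """Parent of an H3 cell one resolution coarser, using only bit operations."""
    value = int(cell_hex, 16)
    resolution = (value >> 52) & 0xF
    if resolution == 0:
        raise ValueError("resolution 0 cells have no parent")
    shift = 3 * (15 - resolution)   # position of the finest digit in use
    value |= 0x7 << shift           # mark that digit as unused (all ones)
    # Overwrite the 4-bit resolution field with resolution - 1
    value = (value & ~(0xF << 52)) | ((resolution - 1) << 52)
    return format(value, "x")


print(parent_bitwise("881faec4d3fffff"))  # 871faec4dffffff
```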

In [7]:
# Get the ids of the child hexagons (the hexagons one resolution higher)
h3.h3_to_children('891faec4d37ffff')

{'8a1faec4d347fff',
 '8a1faec4d34ffff',
 '8a1faec4d357fff',
 '8a1faec4d35ffff',
 '8a1faec4d367fff',
 '8a1faec4d36ffff',
 '8a1faec4d377fff'}
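
Going down the hierarchy is the mirror image: raise the resolution field and fill in the next 3-bit digit with 0 to 6. The sketch assumes the H3 cell bit layout (resolution in bits 52 to 55, 3-bit digits in the low 45 bits) and a hexagonal cell; pentagons have fewer children and are not handled. `children_bitwise` is a hypothetical helper.

```python
def children_bitwise(cell_hex):
    """Children of a hexagonal H3 cell one resolution finer, via bit operations."""
    value = int(cell_hex, 16)
    resolution = (value >> 52) & 0xF
    if resolution >= 15:
        raise ValueError("resolution 15 cells have no children")
    child_res = resolution + 1
    shift = 3 * (15 - child_res)                        # position of the new digit
    base = (value & ~(0xF << 52)) | (child_res << 52)   # raise the resolution field
    base &= ~(0x7 << shift)                             # clear the new digit (it was 7)
    return {format(base | (digit << shift), "x") for digit in range(7)}


print(sorted(children_bitwise("891faec4d37ffff")))
```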

## Examples online

https://towardsdatascience.com/fast-geospatial-indexing-with-h3-90e862482585

https://observablehq.com/@nrabinowitz/h3-radius-lookup

How to query using h3 as an index

In [8]:
import pandas as pd
import numpy as np

res = 10

## Example: Search for nearest points

* Given 10,000,000 latitudes and longitudes (points on Earth)
* Find the closest points to one randomly chosen point

In [9]:
%%time

# Randomly generated data points
data = pd.DataFrame({
    "lat": np.random.uniform(-90, 90, 10000000),
    "lng": np.random.uniform(-180, 180, 10000000)
})
data["hexa"] = data.apply(lambda x:h3.geo_to_h3(resolution=res, **x), axis=1)
data = data.set_index("hexa")

hexas = set(data.index)

CPU times: user 2min 19s, sys: 2.23 s, total: 2min 21s
Wall time: 2min 23s


| hexa | lat | lng |
|---|---|---|
| 8a033028e49ffff | 87.57781 | -162.796147 |


In [10]:
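def nearest_point(dat, target):
    """Return the hexagon ids in dat closest to target (target's own cell included)"""
    radius = 1
    cross = dat & h3.k_ring(target, radius)
    # Double the ring radius until a cell other than the target's own is found.
    # Assumes dat holds at least two cells, otherwise the loop never ends.
    while len(cross) <= 1:
        radius *= 2
        cross = dat & h3.k_ring(target, radius)
    return cross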

In [11]:
def query(dat, target, radius=1):
    """Return the hexagon ids in dat within `radius` rings of target"""

    k_ring = h3.k_ring(target, radius)
    cross = dat & k_ring

    return cross
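
Both helpers return occupied cells around the target, not a single winner. To pick the actual nearest point, the points in those cells still need ranking by distance. Here is a sketch using the haversine formula; `closest`, the Earth radius constant and the sample coordinates are assumptions for illustration, not part of the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lng) pairs in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest(target, candidates):
    """Return the candidate (lat, lng) nearest to target."""
    return min(candidates, key=lambda point: haversine_km(target, point))


# Hypothetical candidates, e.g. the points stored in the cells a ring query returned
candidates = [(50.001, 8.001), (50.01, 8.0), (49.9, 8.2)]
best = closest((50.0, 8.0), candidates)
print(best)
```

Since the candidate set is already small after the ring lookup, the exact distance computation adds little to the query cost.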

In [37]:
# Target points to search around
targets = data.sample(100).index

In [38]:
%%time
for target in targets:
    nearest_point(hexas, target)

CPU times: user 707 ms, sys: 5.44 ms, total: 713 ms
Wall time: 712 ms


In [49]:
from scipy.spatial import KDTree

In [50]:
tree = KDTree(data.values)

In [59]:
target = (np.random.uniform(-90, 90), np.random.uniform(-180, 180))

In [65]:
%%time

d, r = tree.query(target)

#data.iloc[r, :], target

CPU times: user 278 µs, sys: 68 µs, total: 346 µs
Wall time: 320 µs
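
One caveat when reading the KD-tree result: `KDTree(data.values)` measures Euclidean distance on raw degrees, which misbehaves near the antimeridian and the poles. A common workaround, sketched here as an assumption rather than what the notebook does, is to index 3D unit vectors instead of lat/lng pairs. `to_unit_vector` is a hypothetical helper.

```python
import math


def to_unit_vector(lat, lng):
    """Convert latitude/longitude in degrees to a 3D point on the unit sphere."""
    lat_r, lng_r = math.radians(lat), math.radians(lng)
    return (
        math.cos(lat_r) * math.cos(lng_r),
        math.cos(lat_r) * math.sin(lng_r),
        math.sin(lat_r),
    )


# Two points on the equator, 0.2 degrees apart across the antimeridian
a, b = (0.0, 179.9), (0.0, -179.9)

degree_gap = math.dist(a, b)  # distance in raw degrees: looks enormous
chord_gap = math.dist(to_unit_vector(*a), to_unit_vector(*b))  # chord length: tiny

print(degree_gap, chord_gap)
```

Chord length grows monotonically with great-circle distance, so nearest-neighbour ordering on the unit vectors matches ordering on the sphere.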


## Comparisons

* https://h3geo.org/docs/comparisons/s2/ is a good source for exploring the latest in spatial indexing

## H3 Benefits
* Decoupling spatial indexing from database systems
* Streaming: [Here](https://eng.uber.com/building-scalable-streaming-pipelines/) Uber shows how they use hex ids in a streaming setup to calculate their pricing features in real time
* Geared towards specific applications (fault tolerance)
* Saving disk space
* Works with [kepler.gl](https://kepler.gl)
* Looks good when visualised