Skip to content

rmrschub/zCurve

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

zCurve

DOI

zCurve is a Python module with methods to efficiently map multidimensional data to a single dimension while preserving locality of the data points.

This mapping is commonly known as Z-order, Lebesgue curve, Morton space filling curve, Morton order or Morton code.

Image by David Eppstein, 2008

The Morton code of a multi-dimensional data point is calculated by bitwise interlacing the binary representations of its coordinate values.

zCurve provides two functions for handling the encoding and decoding of data points with arbitrary dimensionality and arbitrary coordinate size:

interlace(*data_point: int, dims: int = None, bits_per_dim: int = None) -> int
deinterlace(code_point: int, dims: int = 3) -> List[int]

When handling large multi-dimensional dataset (n > 10.000), zCurve offers some simple but convenient means of parallelizing the Morton encoding and decoding:

par_interlace(data_points: List[List[int]], dims: int = None, bits_per_dim: int = None) -> List[int]
par_deinterlace(code_points: List[int], dims: int = 3) -> List[List[int]]

Given the Morton codes of a multi-dimensional dataset, we can perform multi-dimensional range search using only a one-dimensional data structure. For range searching, zCurve offers two functions for calculating the necesaary LITMAX and BIGMIN values:

prev_morton(code_point: int, rmin_code: int, rmax_code: int, dims: int = 3) -> int
next_morton(code_point: int, rmin_code: int, rmax_code: int, dims: int = 3) -> int

This implementation is based on the following paper

Tropf, Herbert, and Helmut Herzog. "Multidimensional Range Search in Dynamically Balanced Trees." ANGEWANDTE INFO. 2 (1981): 71-77.

and it makes heavy use of the excellent gmpy2 module.

Installation

pip install zCurve

Usage

Basics

import zCurve as z 

imports the module.

code = z.interlace(2,16,8)

interlaces the 3D data point (2,16,8) into Morton code point 10248.

When explicitly specify dimensionality and bits per dimension of your data point

code = z.interlace(2,16,8, dims=3, bits_per_dim=5)

performance will benefit substantially.

z.deinterlace(4711)

deinterlaces the Morton code point 4711 into the 3D data point (29,1,3).

Parallel interlacing/deinterlacing

Given a potentially large list of n-dimensional data_points

from random import randrange

bit_size = 16
max_val = 2**bit_size - 1
no_samples = 10**6
data_points = [(randrange(0, max_val), randrange(0, max_val), randrange(0, max_val)) for i in range(no_samples)]

we can speed up things by using par_interlace and par_deinterlace

morton_codes = z.par_interlace(data_points, dims=3, bits_per_dim=16)
data_points == z.par_deinterlaces(morton_codes, dims=3)

Range searching

Image by Tropf and Herzog, 1981

When range searching, we can prune the search space by calculating BIGMIN (aka "GetNextZ-address") and LITMAX (aka "GetPrevZ-address") values.

point = z.interlace(6, 3, dims=2)  # => 30
rmin = z.interlace(5, 3, dims=2)   # => 27
rmax = z.interlace(10, 5, dims=2)  # => 102

BIGMIN = z.next_morton(point, rmin, rmax, dims=2) # => 31
LITMAX = z.prev_morton(point, rmin, rmax, dims=2) # => 27

In addition, we can easily check if a given Morton code point is within a specified range

z.in_range(58,27,102, dims=2) # => False
z.in_range(49,27,102, dims=2) # => True

Citation

@misc{rmrschub_2021_zCurve,
    author       = {René Schubotz},
    title        = {{zCurve: Multi-dimensional indexing using Morton space filling curves.}},
    month        = may,
    year         = 2021,
    doi          = {10.5281/zenodo.4777584},
    version      = {0.0.4},
    publisher    = {Zenodo},
    url          = {https://github.com/rmrschub/zCurve}
    }

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

About

zCurve maps multidimensional data to one dimension while preserving locality of the data points.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages