NearestNeighborsTest.jl

A Julia package for nearest-neighbor distance (NND) permutation tests — testing whether a subpopulation is more tightly clustered than expected by chance, using within-group k-NN distances.

Key feature: optional CUDA GPU acceleration for dramatic speedups (10–100×) on large problems.

Installation

using Pkg
Pkg.add(url="https://github.com/kchu25/NearestNeighborsTest.jl")

Quick Start

Single NND Permutation Test

using NearestNeighborsTest, Random

# Create some data: background of 50,000 points, 500 are clustered
rng = MersenneTwister(1)
n_bg = 50_000
background = rand(rng, n_bg)
subpop_positions = sort(randperm(rng, n_bg)[1:500])
for p in subpop_positions
    background[p] = 0.5 + 0.01 * randn(rng)  # tight cluster around 0.5
end

# CPU test
result = nnd_permutation_test_1d(subpop_positions, background; k=30, B=5_000)
println("k=$(result.k)  obs_mNND=$(result.obs_mNND)  p=$(result.p_value)")

# GPU test (requires NVIDIA GPU)
result_gpu = nnd_permutation_test_1d(subpop_positions, background; k=30, B=5_000, cuda=true)
println("GPU: k=$(result_gpu.k)  p=$(result_gpu.p_value)")

What Is the NND Test?

The nearest-neighbor distance (NND) permutation test asks: Is this subpopulation more tightly clustered than a random sample?

The test works on 1-D data and computes the mean within-group k-NN distance. Here's why it's useful:

Illustration: Clustered vs Random

Left: A clustered subpopulation (red dots) is packed together, giving small within-group k-NN distances → small p-value (significant).

Right: A random subpopulation (blue dots) is scattered throughout the background → large p-value (not significant).

Sensitivity Analysis (Multiple k Values)

ks = [5, 10, 15, 20, 25, 30]
results = nnd_sensitivity_batch_1d(subpop_positions, background; ks=ks, B=5_000)
for r in results
    println("k=$(r.k)  obs_mNND=$(round(r.obs_mNND; digits=6))  p=$(r.p_value)")
end

# GPU version
results_gpu = nnd_sensitivity_batch_1d(subpop_positions, background; ks=ks, B=5_000, cuda=true)

GPU Support

Pass cuda=true to either function to run on GPU. The GPU implementation:

Generates all random samples on-device via cuRAND (zero PCIe transfer)
Uses shared-memory bitonic sort + k-NN scan within each CUDA block
One block per permutation — massively parallel

Benchmarks

CPU vs GPU timing across different numbers of permutations (B) and subpopulation sizes (m):

GPU speedup (how many times faster the GPU is):

API Reference

`nnd_permutation_test_1d`

nnd_permutation_test_1d(subpop_positions, background; k=30, B=1_000, seed=42, cuda=false)

Test whether points at subpop_positions within background are more tightly clustered among themselves than a random draw of the same size.

Returns: NamedTuple with fields:

k — the actual k used (may be less than requested if k ≥ m)
obs_mNND — the observed mean k-th nearest-neighbor distance within the subpopulation
p_value — fraction of null permutations whose within-group mNND ≤ obs_mNND; small values indicate significant clustering

`nnd_sensitivity_batch_1d`

nnd_sensitivity_batch_1d(subpop_positions, background; ks, B=1_000, seed=42, cuda=false)

Batched NND test across multiple k values. Shares random draws across all k for efficiency.

Returns: Vector of NamedTuples, one per entry in ks, each with fields:

k — the actual k used (clamped to m-1 if needed)
obs_mNND — observed mean k-th nearest-neighbor distance within the subpopulation
p_value — fraction of null permutations whose within-group mNND ≤ obs_mNND

Method

For a subpopulation of size m drawn from a background of size N, the test statistic is the mean k-th nearest-neighbor distance within the group. The p-value is the fraction of B random null draws whose within-group mNND ≤ the observed value. A small p-value indicates the subpopulation is more clustered than expected by chance.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github		.github
assets		assets
docs		docs
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
Manifest.toml		Manifest.toml
Project.toml		Project.toml
README.md		README.md
make_illustrations.jl		make_illustrations.jl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NearestNeighborsTest.jl

Installation

Quick Start

Single NND Permutation Test

What Is the NND Test?

Illustration: Clustered vs Random

Sensitivity Analysis (Multiple k Values)

GPU Support

Benchmarks

API Reference

`nnd_permutation_test_1d`

`nnd_sensitivity_batch_1d`

Method

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NearestNeighborsTest.jl

Installation

Quick Start

Single NND Permutation Test

What Is the NND Test?

Illustration: Clustered vs Random

Sensitivity Analysis (Multiple k Values)

GPU Support

Benchmarks

API Reference

nnd_permutation_test_1d

nnd_sensitivity_batch_1d

Method

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`nnd_permutation_test_1d`

`nnd_sensitivity_batch_1d`

Packages