# Spatial Data Structures
- KD-Trees for nearest neighbor search
- Real examples: Location search, Recommendation systems

In [None]:
import numpy as np
from scipy.spatial import KDTree, cKDTree
import matplotlib.pyplot as plt
print('Spatial data structures module loaded')

## KD-Tree Basics
**Purpose**: Fast nearest neighbor search in multi-dimensional space

**Complexity**:
- Build: O(n log n)
- Query: O(log n) on average in low dimensions, degrading toward O(n) as dimensionality grows
- Brute force: O(n) per query

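**Use when**: You need repeated nearest neighbor queries against the same point set

**cKDTree**: The compiled implementation. Since SciPy 1.6, `KDTree` uses the same compiled code and the two are functionally identical; `cKDTree` remains for backward compatibility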
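
To make the complexity comparison concrete, here is a minimal sketch that answers the same nearest neighbor queries by brute force and with a KD-tree, then checks that both agree. The point set, the query set, and the names `pts`, `queries`, and `demo_tree` are illustrative assumptions, not part of the cells in this notebook.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
pts = rng.random((2000, 2)) * 100     # hypothetical data set
queries = rng.random((50, 2)) * 100   # hypothetical query points

# Brute force: distance from every query to every point, O(n) work per query
diffs = queries[:, None, :] - pts[None, :, :]
brute_idx = np.argmin(np.linalg.norm(diffs, axis=2), axis=1)

# KD-tree: build once, then each query only visits a small part of the tree
demo_tree = KDTree(pts)
tree_dist, tree_idx = demo_tree.query(queries, k=1)

print('Same answers:', np.array_equal(brute_idx, tree_idx))
```

The tree only pays off when its build cost is spread over many queries; for a handful of lookups the brute-force version is often just as fast.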

In [None]:
# Random points in 2D
np.random.seed(42)
points = np.random.rand(1000, 2) * 100

print(f'Building KD-Tree')
print(f'  Points: {len(points)}')
print(f'  Dimensions: {points.shape[1]}\n')

# Build tree
tree = KDTree(points)
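print('Tree built successfully')
print(f'  Tree size: {tree.n} points')
# Rough estimate: splitting stops once a node holds at most `leafsize` points
print(f'  Approx. depth: ~{int(np.ceil(np.log2(tree.n / tree.leafsize)))}')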

## Nearest Neighbor Query
Find k nearest neighbors to a query point

**Methods**:
- `query(x, k)`: k nearest neighbors
- `query_ball_point(x, r)`: All within radius r
- `query_pairs(r)`: All pairs within distance r
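
The three methods listed above can be compared on a tiny point set where the answers are easy to check by eye. This is a sketch with four hand-picked points; `small_points` and `small_tree` are names made up for the illustration.

```python
import numpy as np
from scipy.spatial import KDTree

# Four hand-picked points: three near the origin, one far away
small_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
small_tree = KDTree(small_points)

# query: the 2 nearest neighbors of the origin (the origin itself comes first)
d, i = small_tree.query([0.0, 0.0], k=2)

# query_ball_point: indices of all points within r of the origin
within = sorted(small_tree.query_ball_point([0.0, 0.0], r=1.5))

# query_pairs: set of (i, j) index pairs, i < j, at most r apart
pairs = small_tree.query_pairs(r=1.2)

print('k-NN distances:', d)
print('Within r=1.5:', within)
print('Pairs within r=1.2:', sorted(pairs))
```

Note that `query` returns distances and indices, while `query_ball_point` and `query_pairs` return indices only.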

In [None]:
# Query point
query_point = np.array([50, 50])
k = 5

print(f'Query: Find {k} nearest neighbors to {query_point}\n')

# k-NN search
distances, indices = tree.query(query_point, k=k)

print(f'Results:')
for i, (dist, idx) in enumerate(zip(distances, indices)):
    print(f'  {i+1}. Point {idx}: {points[idx]}, distance={dist:.2f}')

## Range Search
Find all points within radius r

In [None]:
# Range query
radius = 10
indices_range = tree.query_ball_point(query_point, r=radius)

print(f'Range search: radius={radius}')
print(f'  Found {len(indices_range)} points within range')
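
Because `query_ball_point` returns indices without distances, a common follow-up is to compute the distances yourself and sort the hits by them. The sketch below does that and cross-checks the result against a brute-force scan; the data and the names `pts`, `center`, and `ball_tree` are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(1)
pts = rng.random((500, 2)) * 100   # hypothetical points
center = np.array([50.0, 50.0])
r = 10.0

ball_tree = KDTree(pts)
hits = np.array(ball_tree.query_ball_point(center, r=r), dtype=int)

# Compute distances for the hits and order them nearest first
hit_dists = np.linalg.norm(pts[hits] - center, axis=1)
order = np.argsort(hit_dists)
hits_sorted, dists_sorted = hits[order], hit_dists[order]

# Brute-force cross-check: scan every point
brute_hits = np.flatnonzero(np.linalg.norm(pts - center, axis=1) <= r)

print(f'Tree found {len(hits)} points, brute force found {len(brute_hits)}')
```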

## Real Example: Store Locator
Find the stores nearest to a customer's location.

**Applications**: Retail, logistics, delivery

In [None]:
# Store locations as planar (x, y) coordinates in km
# (a simplification: Euclidean distance on raw lat/lon degrees is not a distance in km)
np.random.seed(42)
store_locations = np.random.rand(100, 2) * 50  # 100 stores
store_names = [f'Store_{i:03d}' for i in range(100)]

print('Store Locator System')
print(f'  Total stores: {len(store_locations)}\n')

# Build KD-Tree
store_tree = KDTree(store_locations)

# Customer location
customer_loc = np.array([25.5, 30.2])
print(f'Customer location: {customer_loc}')
print(f'Find 3 nearest stores:\n')

# Find nearest stores
dists, indices = store_tree.query(customer_loc, k=3)

for i, (dist, idx) in enumerate(zip(dists, indices)):
    print(f'  {i+1}. {store_names[idx]}')
    print(f'     Location: {store_locations[idx]}')
    print(f'     Distance: {dist:.2f} km\n')
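
The example above treats coordinates as points on a plane. With real latitude and longitude, one possible approach (a sketch, not the only option) is to map each location onto the unit sphere in 3D, build the tree there, and convert the resulting chord length back to a great-circle distance. The helper names `latlon_to_unit_xyz` and `chord_to_km` are hypothetical, the Earth is assumed to be a sphere of radius 6371 km, and the approximate city-center coordinates stand in for store locations.

```python
import numpy as np
from scipy.spatial import KDTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption

def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to 3D points on the unit sphere."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])

def chord_to_km(chord):
    """Convert a unit-sphere chord length to great-circle distance in km."""
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.clip(chord / 2, 0.0, 1.0))

# Stand-in store locations (lat, lon): Paris, London, Berlin, Madrid (approximate)
geo_stores = np.array([[48.8566, 2.3522], [51.5074, -0.1278],
                       [52.5200, 13.4050], [40.4168, -3.7038]])
geo_tree = KDTree(latlon_to_unit_xyz(geo_stores[:, 0], geo_stores[:, 1]))

# Stand-in customer location: Brussels (approximate)
customer = latlon_to_unit_xyz(np.array([50.8503]), np.array([4.3517]))
chords, nearest = geo_tree.query(customer, k=2)
km = chord_to_km(chords)

for idx, dist_km in zip(nearest[0], km[0]):
    print(f'Store {idx}: {dist_km:.0f} km')
```

Chord length grows monotonically with great-circle distance, so the nearest neighbor in 3D is also the nearest along the surface.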

## Real Example: Recommendation System
Find similar users or items, as in collaborative filtering.

In [None]:
# User feature vectors (100 users, 20 features)
np.random.seed(42)
n_users = 100
n_features = 20
user_features = np.random.randn(n_users, n_features)

# Normalize to unit length so Euclidean distance ranks neighbors the same way as cosine similarity
user_features = user_features / np.linalg.norm(user_features, axis=1, keepdims=True)

print('Recommendation System')
print(f'  Users: {n_users}')
print(f'  Features per user: {n_features}\n')

# Build tree
user_tree = KDTree(user_features)

# Find similar users
target_user = 42
k_similar = 5

print(f'Find users similar to User_{target_user}:')
dists, similar_users = user_tree.query(user_features[target_user], k=k_similar+1)

# Skip first (self)
for i, (dist, user_id) in enumerate(zip(dists[1:], similar_users[1:])):
    similarity = 1 - dist**2 / 2  # Unit vectors: d^2 = 2 - 2*cos, so cos = 1 - d^2/2
    print(f'  {i+1}. User_{user_id}: similarity={similarity:.3f}')

print('\nUse similar users to recommend items!')
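
For unit-length vectors, squared Euclidean distance and cosine similarity are related by d² = 2 − 2·cos, so the nearest neighbors by Euclidean distance are also the most cosine-similar. The sketch below checks both claims numerically on random data; the names `vecs`, `sim_tree`, and `target` are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(2)
vecs = rng.standard_normal((200, 20))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # scale rows to unit length

target, k = 0, 5
sim_tree = KDTree(vecs)
d, nbrs = sim_tree.query(vecs[target], k=k + 1)
d, nbrs = d[1:], nbrs[1:]  # drop the target itself (distance 0)

# Cosine similarity recovered from distance vs. computed directly
cos_from_dist = 1 - d**2 / 2
cos_direct = vecs[nbrs] @ vecs[target]

# Brute-force top-k by cosine similarity, excluding the target
all_cos = vecs @ vecs[target]
all_cos[target] = -np.inf
top_cos = np.argsort(-all_cos)[:k]

print('Formula matches:', np.allclose(cos_from_dist, cos_direct))
print('Same neighbors:', set(nbrs.tolist()) == set(top_cos.tolist()))
```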

## Summary

### KD-Tree Functions:
```python
from scipy.spatial import KDTree

# Build tree
tree = KDTree(points)

# k nearest neighbors
distances, indices = tree.query(query_point, k=5)

# Range search
indices = tree.query_ball_point(query_point, r=radius)

# All pairs within distance
pairs = tree.query_pairs(r=distance)
```

### When to Use:
✓ **Repeated queries**: Amortize build cost  
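✓ **Low to moderate dimensions**: Works best well below ~20 dims; beyond that the curse of dimensionality erodes the advantage over brute force  
✓ **Static data**: The tree has no insert or delete, so changed data means a rebuild  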

### cKDTree vs KDTree:
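- **SciPy 1.6 and later**: The two are functionally identical and share the same compiled implementation; prefer `KDTree`
- **Before SciPy 1.6**: `KDTree` was pure Python and slower; `cKDTree` was the faster compiled version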
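
A quick way to confirm that the two classes are interchangeable for a given workload is to run the same query through both and compare. This is a minimal sketch on random data; `pts` and `q` are made-up names, and the SciPy documentation describes the classes as functionally identical from version 1.6 onward.

```python
import numpy as np
from scipy.spatial import KDTree, cKDTree

rng = np.random.default_rng(3)
pts = rng.random((1000, 3))  # hypothetical 3D points
q = rng.random((20, 3))      # hypothetical query points

# Same data, same query, two class names
d_kd, i_kd = KDTree(pts).query(q, k=3)
d_ckd, i_ckd = cKDTree(pts).query(q, k=3)

print('Same indices:', np.array_equal(i_kd, i_ckd))
print('Same distances:', np.allclose(d_kd, d_ckd))
```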

### Applications:
✓ **GIS**: Nearest store, route planning  
✓ **Machine Learning**: k-NN classifier, outlier detection  
✓ **Recommendation**: Similar users/items  
✓ **Graphics**: Point cloud processing, collision detection  
✓ **Astronomy**: Star matching, galaxy clustering  