# Point distributions in n-dimensional spaces: the unusual behavior of distance metrics as dimensionality increases

Demonstration that the contrast between the distances to the nearest and farthest points diminishes as dimensionality increases. The motivating paper's abstract:
> In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from a efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics which are used to measure the similarity between objects. We specifically examine the behavior of the commonly used $L_k$ norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. For example, this means that the Manhattan distance metric ($L_1$ norm) is consistently more preferable than the Euclidean distance metric ($L_2$ norm) for high dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the $L_k$ norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results both from the theoretical and empirical perspective. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.

Aggarwal, Charu C., Alexander Hinneburg, and Daniel A. Keim. "On the surprising behavior of distance metrics in high dimensional space." In International conference on database theory, pp. 420-434. Springer, Berlin, Heidelberg, 2001.
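
For reference, the quantities the abstract refers to can be written out as follows. This is a sketch in notation chosen here for illustration: $x$ and $y$ are two points in $d$ dimensions, and $D_{\max}$ and $D_{\min}$ are assumed to denote the largest and smallest distances from a query point to the points in a data set.

$$
L_k(x, y) = \left( \sum_{i=1}^{d} \lvert x_i - y_i \rvert^{k} \right)^{1/k},
\qquad
\text{relative contrast} = \frac{D_{\max} - D_{\min}}{D_{\min}}
$$

Setting $k = 1$ gives the Manhattan distance and $k = 2$ the Euclidean distance; values $0 < k < 1$ give the fractional distance metrics mentioned in the abstract. A relative contrast close to zero means the nearest and farthest points are almost equally far away.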

## Imports

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from typing import List

## Distance calculations

In [8]:
def l_sub_p_norm(p1: np.ndarray, p2: np.ndarray, norm_p: float = 2) -> float:
    """
    Distance between two points under the l sub p norm.

    The L1 norm corresponds to the Manhattan distance (the absolute differences along each axis, summed),
    and the L2 norm corresponds to the common Euclidean distance. Higher-order and fractional (0 < p < 1)
    values use the identical calculation with a different power.
    
    :param p1: first point (numpy array where each value corresponds to an independent dimension)
    :param p2: second point (numpy array where each value corresponds to an independent dimension)
    :param norm_p: the 'p' of the l sub p norm (must be positive)
    
    :returns: the l sub p distance between p1 and p2 in the n-dimensional space
    """
    
    # make sure that the input points are of identical dimensionality
    if p1.shape != p2.shape:
        raise ValueError(f'points must have identical shapes, got {p1.shape} and {p2.shape}')
    if norm_p <= 0:
        raise ValueError('norm_p must be positive')
        
    # sum |difference|^p over every dimension, then take the p-th root
    return float(np.sum(np.abs(p1 - p2) ** norm_p) ** (1.0 / norm_p))
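
As a quick sanity check of the calculation described in the docstring above, the following minimal sketch evaluates the same formula directly with NumPy for three values of p. The two points and the variable names are illustrative assumptions, chosen only so the arithmetic is easy to follow by hand.

```python
import numpy as np

# Two illustrative points in 3-dimensional space (hypothetical values)
a = np.array([0.0, 0.0, 0.0])
b = np.array([3.0, 4.0, 0.0])

# Absolute per-axis differences: [3, 4, 0]
diff = np.abs(a - b)

# p = 1 (Manhattan): 3 + 4 + 0
manhattan = np.sum(diff)

# p = 2 (Euclidean): sqrt(3^2 + 4^2 + 0^2)
euclidean = np.sum(diff ** 2) ** 0.5

# p = 0.5 (fractional): (sqrt(3) + sqrt(4) + sqrt(0))^2
fractional = np.sum(diff ** 0.5) ** 2

print(manhattan, euclidean, fractional)
```

For the same pair of points, the distance value grows as p decreases, so distances computed with different p are not directly comparable; only comparisons within a single p are meaningful.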

## Generate normally distributed points in n-dimensional space

In [9]:
def point_generation(n_points: int, dimensions: List[int]) -> np.ndarray:
    pass
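
The stub above leaves the sampling details open. One possible approach, offered only as a sketch and not as the notebook's intended implementation, is shown below. The helper names `sample_normal_points` and `relative_contrast`, the fixed seed, the sample size of 1000, the choice of dimensionalities, and drawing the query point from the same standard normal distribution are all assumptions made for illustration.

```python
import numpy as np


def sample_normal_points(n_points: int, n_dims: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: draw n_points standard-normal points, one row per point."""
    return rng.standard_normal(size=(n_points, n_dims))


def relative_contrast(query: np.ndarray, points: np.ndarray, norm_p: float = 2.0) -> float:
    """Hypothetical helper: (D_max - D_min) / D_min for distances from query to points."""
    # L_p distance from the query to every row of points
    dists = np.sum(np.abs(points - query) ** norm_p, axis=1) ** (1.0 / norm_p)
    return float((dists.max() - dists.min()) / dists.min())


# Fixed seed so repeated runs give the same numbers (assumption for reproducibility)
rng = np.random.default_rng(0)

contrasts = {}
for n_dims in (2, 10, 100, 1000):
    points = sample_normal_points(1000, n_dims, rng)
    query = sample_normal_points(1, n_dims, rng)[0]
    contrasts[n_dims] = relative_contrast(query, points)
    print(f"dimensions={n_dims:5d}  relative contrast={contrasts[n_dims]:.3f}")
```

Under these assumptions the printed relative contrast should fall as the number of dimensions grows, which is the effect this notebook sets out to demonstrate. Passing a different `norm_p` allows a comparison between Manhattan, Euclidean, and fractional distances.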