## Defining a KDE Function to Evaluate Six Dimensional Position-Velocity Space

**What is a KDE?**

In kernel density estimation (KDE), each data point is replaced by a kernel, a smooth bump centred on that point, and the kernels are summed to give a smooth estimate of the probability density function of the input parameter(s). This is useful because a histogram-style density estimate is piecewise constant and therefore not differentiable, whereas a kernel density estimate built from a smooth kernel (such as the Gaussian) is.
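To make the "sum of kernels" idea concrete, here is a minimal one-dimensional sketch written directly in NumPy. It is an illustration only, not the implementation used by `sklearn`; the function name `gaussian_kde_1d`, the three sample values and the bandwidth of 0.5 are all made up for the example.

```python
import numpy as np

def gaussian_kde_1d(data, query, bandwidth):
    """Evaluate a 1D Gaussian KDE built from `data` at the points in `query`."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    # Scaled distance from every query point to every data point, shape (Q, N)
    u = (query[:, None] - data[None, :]) / bandwidth
    # One Gaussian kernel per data point, evaluated at every query point
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth so the result integrates to 1
    return kernels.mean(axis=1) / bandwidth

data = [1.0, 1.5, 4.0]                    # hypothetical sample
grid = np.linspace(-5.0, 10.0, 2001)      # points at which to evaluate the estimate
density = gaussian_kde_1d(data, grid, bandwidth=0.5)

# A simple Riemann sum over the grid should come out close to 1
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Because every kernel is a smooth function of the query point, so is their average, which is what makes the estimate differentiable.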

**Using `sklearn.neighbors.KernelDensity`:**

Six kernels are currently available in this module (gaussian, tophat, epanechnikov, exponential, linear, cosine). The methods from this module that are used here are as follows:

`fit` takes in an NxM matrix of N data points and M parameters and fits them to the specified kernel and bandwidth.

`score_samples` takes in an array of query points and evaluates the previously `fit` model at each of them. The input is a QxM matrix of Q query points and M parameters, and the output is a 1xQ array (in NumPy terms, a one-dimensional array of shape `(Q,)`) holding the logarithm of the probability density at each point.

For practical use, the function defined below converts these log densities to ordinary densities by exponentiating them. The complete original documentation is available at:

http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity
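The `fit` / `score_samples` workflow described above can be sketched on a small random data set to check the array shapes. The data, the seed, the Gaussian kernel and the bandwidth of 0.8 are arbitrary choices made for this illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
inputs = rng.normal(size=(50, 6))     # N = 50 data points, M = 6 parameters
queries = rng.normal(size=(4, 6))     # Q = 4 query points, same M

# Fit the estimator to the data, then evaluate it at the query points
kde = KernelDensity(kernel="gaussian", bandwidth=0.8).fit(inputs)
log_dens = kde.score_samples(queries)  # log density, one value per query point
dens = np.exp(log_dens)                # ordinary (non-log) density

print(log_dens.shape, dens.shape)
```

Both arrays have one entry per query point, and the exponentiated values are strictly positive for the Gaussian kernel.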

In [1]:
#Importing the required modules
import numpy as np
from sklearn.neighbors import KernelDensity

#Defining a KDE function to quickly compute probabilities for the data set
def KDE(inputs, samples, ker, bw):
    """
    Takes an NxM matrix for inputs and a QxM matrix for samples, a string for ker and a float for bw to output
    a 1xQ array of density values.
    
    Args:
        inputs (ndarray): NxM matrix, N = # of data points, M = # of parameters.
        samples (ndarray): QxM matrix, Q = # of points being evaluated, M = # of parameters.
        ker (string): One of the 6 available kernel types (gaussian, tophat, epanechnikov, exponential, linear, cosine)
        bw (float): Bandwidth of the kernel as a dimensionless float.
    Returns:
        dens (ndarray): 1xQ array of density values for Q data points.
    
    """
    kde = KernelDensity(kernel=ker, bandwidth=bw).fit(inputs) #Fit data points to selected kernel and bandwidth
    log_dens = kde.score_samples(samples)                     #Get the log density for the selected samples
    dens = np.exp(log_dens)                                   #Exponentiate to convert log density to density
    return dens                                               #Return a 1xQ array of density values

The following section tests this function on some mock data. For more details about the setup for the mock data, see `sampling_R^6_to_R^6.ipynb` by Michael Poon.

In [29]:
#Testing with mock data
import random

select_random = np.linspace(1.0, 10.0, 100) # grid of candidate values; 1.0 and 10.0 are arbitrary
mock_data3 = [] # list of points in 6 dimensions
for i in range(10):
    x4 = random.choice(select_random)
    x5 = random.choice(select_random)
    x6 = random.choice(select_random)
    x1 = 1
    x2 = 3 + x4 + 2*x5 + 3*x6
    x3 = -3 - 2*x2 - 3*x2 - 4*x2 # equivalent to -3 - 9*x2
    mock_data3.append([x1, x2, x3, x4, x5, x6])
print(mock_data3)

[[1, 38.72727272727273, -351.5454545454545, 3.7272727272727275, 1.2727272727272727, 9.818181818181818], [1, 44.0, -399.0, 1.0, 7.181818181818182, 8.545454545454547], [1, 47.90909090909091, -434.18181818181813, 1.8181818181818183, 9.545454545454545, 8.0], [1, 26.0, -237.0, 5.0, 3.8181818181818183, 3.4545454545454546], [1, 37.09090909090909, -336.81818181818187, 1.5454545454545454, 2.909090909090909, 8.90909090909091], [1, 44.45454545454545, -403.0909090909091, 7.090909090909091, 9.0, 5.454545454545455], [1, 32.81818181818182, -298.3636363636364, 1.0909090909090908, 4.272727272727273, 6.7272727272727275], [1, 35.81818181818182, -325.3636363636364, 2.7272727272727275, 9.454545454545455, 3.7272727272727275], [1, 23.272727272727273, -212.45454545454544, 8.09090909090909, 3.6363636363636362, 1.6363636363636362], [1, 26.81818181818182, -244.3636363636364, 7.2727272727272725, 1.7272727272727273, 4.363636363636363]]


In [33]:
a = np.array(mock_data3)             # 10x6 matrix of mock data points
samples_random = [[1, 2, 3, 4, 5, 6]] # a single arbitrary query point (not used below)
samples_all = mock_data3             # query the density at the mock data points themselves

In [35]:
KDE(a, samples_all, 'epanechnikov', 1)

array([0.07740368, 0.07740368, 0.07740368, 0.07740368, 0.07740368,
       0.07740368, 0.07740368, 0.07740368, 0.07740368, 0.07740368])
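All ten values are identical, and there is a simple reading of why. The Epanechnikov kernel is exactly zero beyond one bandwidth, and in this mock data set `x3 = -3 - 9*x2` stretches the points so far apart that no two of them lie within a distance of 1 of each other. Each query point therefore only "sees" its own kernel, and the density there is the kernel's peak value divided by the number of points. The sketch below computes that value from the standard closed form for a normalised Epanechnikov kernel in `d` dimensions, K(0) = (d + 2) / (2 V_d), where V_d is the volume of the unit ball. The helper name `epanechnikov_self_density` is made up for this illustration.

```python
import math

def epanechnikov_self_density(n_points, dim, bandwidth):
    """Density at a data point that has no other data point within one bandwidth."""
    # Volume of the unit ball in `dim` dimensions
    unit_ball_volume = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    # Peak of the normalised Epanechnikov kernel: K(0) = (dim + 2) / (2 * V_dim)
    peak = (dim + 2) / (2 * unit_ball_volume)
    # Each of the n_points kernels carries weight 1/n_points and is scaled by bandwidth**dim
    return peak / (n_points * bandwidth ** dim)

value = epanechnikov_self_density(n_points=10, dim=6, bandwidth=1.0)
print(value)
```

The result agrees with the 0.0774 returned above, which suggests that a bandwidth of 1 is too small for this data set: the estimate is a collection of isolated bumps rather than a smooth density.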