# Density Estimation

## Contents

* [Overview](#overview) 
* [Density estimation](#ekf)
    * [Parametric density estimation](#test_case_1)
    * [Non-parametric density estimation](#test_case_2)
* [References](#refs)

## <a name="ackw"></a> Acknowledgements

This notebook is based on <a href="https://machinelearningmastery.com/probability-density-estimation/">A Gentle Introduction to Probability Density Estimation</a> from <a href="https://machinelearningmastery.com/">Machine Learning Mastery</a>.

## <a name="overview"></a> Overview

In this section we review the concept of <a href="https://en.wikipedia.org/wiki/Density_estimation">density estimation</a>. Density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. The unobservable density function is thought of as the density according to which a large population is distributed; the data are usually thought of as a random sample from that population [1].



With density estimation methods, we can represent the data compactly using a parametric distribution, e.g. Gaussian or Beta. This representation allows us to work with huge datasets. However, a simple distribution may not be accurate enough to represent the data, for example when the sample has more than one peak.
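
A minimal sketch of this limitation, assuming a synthetic two-peaked sample (the sample sizes, means, and standard deviations are illustrative choices, not data from the original notebook). A single Gaussian is fitted to the whole sample, and the probability it assigns to a narrow interval around the taller peak is compared with the fraction of observations that actually fall there.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
# Hypothetical bimodal sample: 300 draws around 20 and 700 draws around 40.
sample = ([random.gauss(20, 5) for _ in range(300)]
          + [random.gauss(40, 5) for _ in range(700)])

# Summarise the whole sample with a single Gaussian (two numbers).
single = NormalDist(mean(sample), stdev(sample))

# Compare observed and modelled probability mass near the taller peak.
lo, hi = 38, 42
observed = sum(lo <= x <= hi for x in sample) / len(sample)
modelled = single.cdf(hi) - single.cdf(lo)
print(f"fraction of sample in [{lo}, {hi}]: {observed:.3f}")
print(f"single-Gaussian probability:      {modelled:.3f}")
```

The single Gaussian spreads its mass across both clusters, so it should noticeably understate how much of the data sits near the taller peak.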

## <a name="ekf"></a> Density estimation

The first step in density estimation is to create a histogram of the observations in the random sample.
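
As a rough sketch of that first step, the function below bins a sample and rescales the counts so that the bar areas sum to one, which makes the histogram directly comparable to a density. The helper name `histogram_density`, the number of bins, and the synthetic sample are assumptions made for illustration.

```python
import random

def histogram_density(sample, bins=10):
    """Return (edges, densities) such that the bar areas sum to one."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        # Clamp the index so the maximum value falls in the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(sample)
    # Divide by n * width so that sum(density * width) == 1.
    densities = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, densities

random.seed(1)
# Hypothetical sample: 1000 draws from a normal distribution (mean 50, std 5).
sample = [random.gauss(50, 5) for _ in range(1000)]
edges, densities = histogram_density(sample, bins=10)
bin_width = edges[1] - edges[0]

# Crude text plot: left bin edge, density, and a bar proportional to the density.
for left, d in zip(edges, densities):
    print(f"{left:7.2f}  {d:.4f}  {'#' * round(d * 500)}")
```

The number of bins matters: too few hides the shape of the distribution, too many makes the bars noisy, so it is worth trying several values before drawing conclusions.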

### <a name="test_case_1"></a> Parametric density estimation
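
A minimal sketch of the parametric approach, assuming the data are roughly Gaussian: choose a distribution family, estimate its parameters from the sample (here the mean and standard deviation), and use the fitted distribution as the density estimate. The synthetic sample and the variable names are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
# Hypothetical sample: 1000 draws from a normal distribution (mean 50, std 5).
sample = [random.gauss(50, 5) for _ in range(1000)]

# Estimate the two parameters of the Gaussian from the data.
mu_hat = mean(sample)
sigma_hat = stdev(sample)  # sample standard deviation (n - 1 denominator)

# The fitted distribution is the parametric density estimate.
fitted = NormalDist(mu_hat, sigma_hat)
print(f"estimated mean = {mu_hat:.3f}, estimated std = {sigma_hat:.3f}")
for x in (40, 45, 50, 55, 60):
    print(f"density at {x}: {fitted.pdf(x):.4f}")
```

Only two numbers are stored regardless of the sample size, which is what makes the parametric representation compact. It is sensible to check the fit by overlaying the fitted density on a histogram of the sample.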

### <a name="test_case_2"></a> Non-parametric density estimation

In some cases, a data sample may not resemble a common probability distribution or cannot be easily made to fit the distribution. This is often the case when the data has two peaks (bimodal distribution) or many peaks (multimodal distribution).

In this case, parametric density estimation is not feasible and alternative methods can be used that do not use a common distribution. Instead, an algorithm is used to approximate the probability distribution of the data without a pre-defined distribution, referred to as a nonparametric method. 

Perhaps the most common nonparametric approach for estimating the probability density function of a continuous random variable is called kernel smoothing, or <a href="https://en.wikipedia.org/wiki/Kernel_density_estimation">kernel density estimation</a>, KDE for short.

In this case, a kernel is a mathematical function that returns a probability density for a given value of a random variable. The kernel effectively smooths or interpolates the density across the range of outcomes for a random variable such that the estimated density integrates to one, a requirement of a valid probability density function.

The kernel function weights the contribution of observations from a data sample based on their relationship or distance to a given query sample for which the probability is requested.

A parameter, called the smoothing parameter or the bandwidth, controls the scope, or window of observations, from the data sample that contributes to estimating the probability for a given sample. As such, kernel density estimation is sometimes referred to as a Parzen-Rosenblatt window, or simply a Parzen window, after the developers of the method.

- Smoothing Parameter (bandwidth): Parameter that controls the number of samples or window of samples used to estimate the probability for a new point.
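
In symbols, the kernel density estimate is commonly written as below. The notation is introduced here for illustration and does not come from the original text: $x_1, \dots, x_n$ are the observations, $h > 0$ is the bandwidth, and $K$ is the kernel.

$$
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
$$

Each observation contributes a copy of $K$ centred on $x_i$ and stretched by $h$. Provided $K$ is non-negative and integrates to one, the same holds for $\hat{f}_h$.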

A large window may result in a coarse density with little detail, whereas a small window may have too much detail and not be smooth or general enough to correctly cover new or unseen examples. The contribution of samples within the window can be shaped using different functions, sometimes referred to as basis functions, e.g. uniform, normal, etc., with different effects on the smoothness of the resulting density function.

- Basis Function (kernel): The function chosen to control the contribution of samples in the dataset toward estimating the probability of a new point.
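
The roles of the bandwidth and the basis function can be sketched with a from-scratch estimator. This is a minimal illustration assuming a Gaussian kernel and a synthetic bimodal sample; the function names `gaussian_kernel` and `kde`, the sample, and the bandwidth values are hypothetical choices, not part of the original notebook.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density, used here as the basis function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, bandwidth):
    """Kernel density estimate at x: average of kernels centred on each observation."""
    n = len(sample)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
    return total / (n * bandwidth)

random.seed(1)
# Hypothetical bimodal sample: two normal clusters centred at 20 and 40.
sample = ([random.gauss(20, 5) for _ in range(300)]
          + [random.gauss(40, 5) for _ in range(700)])

# Evaluate the estimate on a coarse grid for a small, medium, and large bandwidth.
grid = list(range(0, 61, 5))
for h in (0.5, 2.0, 10.0):
    row = " ".join(f"{kde(x, sample, h):.3f}" for x in grid)
    print(f"h={h:>4}: {row}")
```

With the smallest bandwidth the estimate tends to follow individual observations closely, while the largest one tends to blur the two clusters together; an intermediate value usually shows both peaks. Swapping `gaussian_kernel` for another non-negative function that integrates to one changes the shape of each contribution without changing the rest of the code.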

## <a name="refs"></a> References

1. <a href="https://en.wikipedia.org/wiki/Density_estimation">Density estimation</a>.
2. *Mathematics for Machine Learning*.