Non-parametric Parzen Window density estimation on the `ted_main.csv` dataset, implemented both from scratch and with built-in functions. The `duration` column is extracted and its distribution is estimated using a Parzen Window with a Gaussian kernel.
Here is the from-scratch implementation of the Parzen Window estimator with a Gaussian kernel:

```python
from scipy.stats import norm

def ParzenWindow_est(x, x_samples, h):
    """Parzen Window density estimate at point x with window width h."""
    k = 0
    for xi in x_samples:
        # Standard-normal (Gaussian) kernel centered at each sample
        k += norm.pdf((x - xi) / h, loc=0, scale=1)
    # Normalize by the number of samples and the window width
    p = k / (len(x_samples) * h)
    return p
```
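As a quick sanity check, the estimator can be evaluated on a grid for a few window widths. Since the dataset itself is not included here, a synthetic sample stands in for the `duration` column; the numbers below are illustrative, not results from `ted_main.csv`:

```python
import numpy as np
from scipy.stats import norm

def ParzenWindow_est(x, x_samples, h):
    k = 0
    for xi in x_samples:
        k += norm.pdf((x - xi) / h, loc=0, scale=1)
    return k / (len(x_samples) * h)

# Synthetic stand-in for the duration column (seconds)
rng = np.random.default_rng(0)
x_samples = rng.normal(loc=800, scale=120, size=500)

# Evaluate the estimate on a coarse grid for several window widths
grid = np.linspace(400, 1200, 5)
for h in (10, 50, 100):
    density = [ParzenWindow_est(x, x_samples, h) for x in grid]
    print(h, [round(d, 5) for d in density])
```

Larger `h` averages over more distant samples, which is what produces the smoother curves in the table below.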
Below are the results of the distribution estimation for various window widths (h).
| h | 10 | 20 | 50 | 100 |
|---|---|---|---|---|
| Estimation | (plot) | (plot) | (plot) | (plot) |
Window width (h) is also known as the smoothing factor. As you can observe, increasing h leads to a smoother distribution.
The same estimation can also be performed with scikit-learn's built-in `KernelDensity` class (`from sklearn.neighbors import KernelDensity`).
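A minimal sketch of the built-in approach, again using a synthetic stand-in for the `duration` column (the bandwidth value here is illustrative):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for df['duration'].values
rng = np.random.default_rng(0)
durations = rng.normal(loc=800, scale=120, size=500)

# KernelDensity expects a 2-D array of shape (n_samples, n_features)
kde = KernelDensity(kernel='gaussian', bandwidth=50)
kde.fit(durations.reshape(-1, 1))

# score_samples returns the log-density; exponentiate to get the estimate
grid = np.linspace(400, 1200, 200).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))
```

The `bandwidth` parameter plays the same role as `h` in the from-scratch implementation.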
The tables below show the estimated distribution for growing sample sets, from 250 samples up to the entire dataset in increments of 250 (labeled as a percentage of the dataset).

| 10% of the Dataset | 20% of the Dataset | 29% of the Dataset | 39% of the Dataset | 49% of the Dataset |
|---|---|---|---|---|
| (plot) | (plot) | (plot) | (plot) | (plot) |

| 59% of the Dataset | 69% of the Dataset | 78% of the Dataset | 88% of the Dataset | 98% of the Dataset |
|---|---|---|---|---|
| (plot) | (plot) | (plot) | (plot) | (plot) |
It can be inferred that the larger the sample set, the smoother the distribution becomes, eventually converging to the actual distribution.
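This convergence can be sketched by fitting the estimator on nested subsets and comparing each estimate against the full-sample estimate. Synthetic data again stands in for the dataset, and the mean absolute deviation is used as an informal measure of closeness:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
full = rng.normal(loc=800, scale=120, size=2500)  # synthetic stand-in
grid = np.linspace(400, 1200, 200).reshape(-1, 1)

def estimate(samples, bandwidth=50):
    """Gaussian KDE evaluated on the fixed grid."""
    kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth)
    kde.fit(samples.reshape(-1, 1))
    return np.exp(kde.score_samples(grid))

reference = estimate(full)
# Deviation from the full-sample estimate shrinks as the subset grows
errors = {n: float(np.abs(estimate(full[:n]) - reference).mean())
          for n in (250, 500, 1000, 2000)}
```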
- Course: Machine Learning [ECE 501]
- Semester: Spring 2023
- Institution: School of Electrical & Computer Engineering, College of Engineering, University of Tehran
- Instructors: Dr. A. Dehaqani, Dr. Tavassolipour