#  Name -> Deven Chhajed
# Roll No-> 32
# Batch -> B1 (CSE)
# Prn -> 1032210789
# Noisy Data
# Data Smoothing: Binning Types

# Importing the Libraries

**import numpy as np:** NumPy provides n-dimensional arrays and vectorized mathematical functions. In this notebook it holds the bins as 2-D arrays.

**import pandas as pd:** pandas is a data manipulation and analysis library built around the `DataFrame` and `Series` data structures.

**import matplotlib.pyplot as plt:** `matplotlib.pyplot` is a plotting interface for line plots, bar charts, scatter plots, and histograms.

**import math:** The standard-library `math` module provides functions such as `floor`, logarithms, and trigonometric functions, along with constants such as `pi` and `e`. In this notebook it supplies `math.floor`.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

## Data Smoothing:
Data smoothing is a technique used to reduce noise in a dataset by applying mathematical methods. It involves creating a smoother version of the data to reveal underlying patterns or trends while removing random fluctuations. Common methods include moving averages, exponential smoothing, and filters, each with its own way of reducing data noise. Data smoothing is useful in various fields, such as signal processing and time series analysis, to improve data interpretation and analysis.
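
As a rough illustration of one of the methods named above, here is a minimal moving-average sketch. The function name `moving_average`, the window size of 3, and the sample values (the same list this notebook uses later) are choices made for this example only.

```python
import numpy as np

def moving_average(values, window=3):
    """Simple moving average over a sliding window (no padding at the ends)."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window  # equal weight for every point in the window
    # mode="valid" keeps only positions where the window fully overlaps the data,
    # so the result has len(values) - window + 1 entries
    return np.convolve(values, kernel, mode="valid")

noisy = [0, 4, 12, 16, 16, 18, 24, 26, 28]
smoothed = moving_average(noisy, window=3)
print(smoothed)
```

A larger window smooths more aggressively but also shortens the output and blurs genuine changes in the data.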

## Data Smoothing Importance:
* Noise Reduction: Reduces random fluctuations in data.

* Visualization: Enhances data visualization and interpretation.

* Analysis: Stabilizes statistical analyses and machine learning.

* Forecasting: Improves prediction accuracy in time series data.

* Control Systems: Essential for stabilizing control systems.

* Signal Processing: Filters out unwanted noise in signals.

* Market Analysis: Aids trend identification and decision-making.

* Data Quality: Dampens the effect of outliers and erroneous values.

* Sensor Data: Ensures accurate readings in IoT and sensors.

## Data Smoothing Techniques:

**1. Equal Width Binning:** Equal width binning is a data preprocessing technique that involves dividing continuous data into a specified number of equal-width intervals or bins.


**2. Binning by Mean:** Binning by mean (smoothing by bin means) first partitions the sorted data into bins and then replaces every value in a bin with the mean (average) of that bin. All values in a bin therefore collapse to a single representative value.

**3. Equal Frequency Binning:** Equal frequency binning, also known as equi-depth or quantile binning, is a data preprocessing technique used to discretize continuous numerical data into a set of bins or intervals such that each bin contains approximately the same number of data points. This technique is particularly useful when you want to ensure that each bin represents an equal portion of the dataset, making it suitable for handling skewed data distributions.


**4. Custom Binning:** Custom binning, also known as manual or user-defined binning, is a data preprocessing technique where you define bins or intervals based on your domain knowledge, specific requirements, or insights about the data. Unlike other binning techniques that use automated rules, custom binning allows you to group data points into bins according to your expertise and understanding of the data.

**5. Binning by Boundary:** Binning by boundary (smoothing by bin boundaries) first partitions the sorted data into bins and treats the minimum and maximum value of each bin as its boundaries. Every value in the bin is then replaced by whichever boundary is closer to it.

**6. Binning by Median:** Binning by median (smoothing by bin medians) first partitions the sorted data into bins and then replaces every value in a bin with the median of that bin, that is, the middle value of the sorted bin. Because the median is less sensitive to extreme values than the mean, this variant is more robust to outliers.
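
The first partitioning strategies in this list can be tried quickly with pandas. The sketch below uses `pd.cut` for equal-width and custom bins and `pd.qcut` for equal-frequency bins; the choice of 3 bins and the custom edges `[0, 10, 20, 30]` are hypothetical values picked for illustration, not taken from the notebook.

```python
import pandas as pd

values = pd.Series([0, 4, 12, 16, 16, 18, 24, 26, 28])

# Equal width: 3 intervals of the same width across the range [0, 28]
equal_width = pd.cut(values, bins=3)

# Equal frequency: 3 intervals holding roughly the same number of values
equal_freq = pd.qcut(values, q=3)

# Custom: hand-picked edges (hypothetical thresholds for this example)
custom = pd.cut(values, bins=[0, 10, 20, 30], include_lowest=True)

# .cat.codes gives the index of the bin each value fell into
print(equal_width.cat.codes.tolist())  # [0, 0, 1, 1, 1, 1, 2, 2, 2]
print(equal_freq.cat.codes.tolist())   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(custom.cat.codes.tolist())       # [0, 0, 1, 1, 1, 1, 2, 2, 2]
```

Equal-width bins end up with 2, 4, and 3 values here, while equal-frequency bins hold 3 values each, which is the difference the descriptions above point to.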




In [3]:
d = [0, 4, 12, 16, 16, 18, 24, 26, 28]
d

[0, 4, 12, 16, 16, 18, 24, 26, 28]

In [4]:
d.sort()
d

[0, 4, 12, 16, 16, 18, 24, 26, 28]

# Binning by Equal Frequency:

In [5]:
r=max(d)-min(d)
print('Range of list is: ',r)

Range of list is:  28


In [6]:
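# Bin size from the heuristic floor(range / n) = floor(28 / 9) = 3.
# With 9 values this gives 3 bins of 3 values each, so `bins` is used
# below as both the number of bins and the number of values per bin.
b=r/len(d)
bins=math.floor(b)
bins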

3

In [7]:
bin1=np.zeros((bins,bins))
bin1

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [9]:
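# Fill the grid row by row from the sorted list; each row is one bin.
# This assumes len(d) == bins * bins (9 values in a 3x3 grid).
a=0
for i in range(0,bins):
    for j in range(0, bins):
        bin1[i, j] = d[a]
        a += 1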

In [10]:
bin1

array([[ 0.,  4., 12.],
       [16., 16., 18.],
       [24., 26., 28.]])
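
The grid above works because 9 values divide evenly into 3 bins of 3. As a sketch of a more general partition, `np.array_split` handles lengths that do not divide evenly. The function name `equal_frequency_bins` and the extra tenth value (30) are assumptions added for this example.

```python
import numpy as np

def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins groups of near-equal size."""
    ordered = np.sort(np.asarray(values, dtype=float))
    # array_split tolerates lengths that are not a multiple of n_bins:
    # the first (len % n_bins) groups each receive one extra element
    return np.array_split(ordered, n_bins)

# Ten values (the notebook's nine plus a hypothetical 30) into 3 bins
groups = equal_frequency_bins([0, 4, 12, 16, 16, 18, 24, 26, 28, 30], n_bins=3)
for g in groups:
    print(g)
```

The groups come back as a list of arrays of sizes 4, 3, and 3 rather than one rectangular array, since the bins are no longer the same length.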

# Binning by Mean:

In [11]:
bin2=np.zeros((bins,bins))
bin2

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [12]:
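# Replace every value in a bin with that bin's mean.
for i in range(0,len(d),bins):
    mean=sum(d[i:i+bins])/bins   # mean of the current bin's values
    k=i//bins                    # row index of the current bin
    for j in range(0,bins):
        bin2[k,j]=mean
bin2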

array([[ 5.33333333,  5.33333333,  5.33333333],
       [16.66666667, 16.66666667, 16.66666667],
       [26.        , 26.        , 26.        ]])
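
The same smoothing can be written without explicit loops, and the identical pattern covers binning by median, which this notebook describes but does not implement. This is a minimal sketch: the function name `smooth_by_statistic` is hypothetical, and it assumes the number of values is an exact multiple of the bin size.

```python
import numpy as np

def smooth_by_statistic(values, bin_size, statistic=np.mean):
    """Replace every value with a statistic (mean or median) of its bin."""
    ordered = np.sort(np.asarray(values, dtype=float))
    # Assumption: len(values) is an exact multiple of bin_size
    bins = ordered.reshape(-1, bin_size)  # one row per bin
    # One statistic per row, kept as a column so it broadcasts across the row
    centers = statistic(bins, axis=1, keepdims=True)
    return np.broadcast_to(centers, bins.shape).copy()

d = [0, 4, 12, 16, 16, 18, 24, 26, 28]
by_mean = smooth_by_statistic(d, 3, np.mean)
by_median = smooth_by_statistic(d, 3, np.median)
print(by_mean)
print(by_median)
```

For this data the bin medians are 4, 16, and 26. The first bin shows the difference from the mean most clearly: its mean (about 5.33) is pulled upward by 12, while its median stays at 4.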

# Binning by Bin Boundary

In [13]:
bin3=np.zeros((bins,bins))
bin3

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [15]:
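# Replace every value with the nearer boundary (min or max) of its bin;
# a value exactly halfway between the two goes to the upper boundary.
for i in range (0,len(d),bins):
    k=i//bins
    lo, hi = d[i], d[i+bins-1]   # first and last value of the sorted bin
    for j in range (bins):
        if (d[i+j]-lo) < (hi-d[i+j]):
            bin3[k,j]=lo
        else:
            bin3[k,j]=hi
bin3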

array([[ 0.,  0., 12.],
       [16., 16., 18.],
       [24., 28., 28.]])
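
A vectorized version of the same idea can be sketched with `np.where`. The function name `smooth_by_boundary` is hypothetical, the input length is assumed to be an exact multiple of the bin size, and ties are sent to the upper boundary to match the loop above.

```python
import numpy as np

def smooth_by_boundary(values, bin_size):
    """Replace every value with the nearer of its bin's min and max."""
    ordered = np.sort(np.asarray(values, dtype=float))
    # Assumption: len(values) is an exact multiple of bin_size
    bins = ordered.reshape(-1, bin_size)  # one row per bin
    low = bins[:, [0]]    # lower boundary of each bin, as a column
    high = bins[:, [-1]]  # upper boundary of each bin, as a column
    # Strictly closer to the lower boundary -> low; otherwise (ties included) -> high
    return np.where(bins - low < high - bins, low, high)

d = [0, 4, 12, 16, 16, 18, 24, 26, 28]
result = smooth_by_boundary(d, 3)
print(result)
```

In the last bin, 26 is equally far from 24 and 28, so the tie rule decides the outcome; a different convention (ties to the lower boundary) would give 24 instead.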