
# **Binning in Python**
---
Binning is a method for smoothing data, or handling noisy data. The data is first sorted, and the sorted values are then distributed into a number of buckets, or bins. Because each value is smoothed using only the other values in its bin (its neighborhood), binning performs local smoothing.

There are two basic binning approaches:

* **Equal-width (or distance) binning**: The simplest approach partitions the range of the variable into $k$ intervals of equal width. If the variable takes values in $[A, B]$, the interval width is the length of that range divided by $k$:
$$w = \frac{B - A}{k}$$
Thus the $i$th interval is $[A + (i-1)w,\ A + iw]$ for $i = 1, 2, \ldots, k$. This method handles skewed data poorly: a few extreme values stretch the range, so most points crowd into a few bins while other bins stay nearly empty.

* **Equal-depth (or frequency) binning**: Equal-frequency binning divides the range $[A, B]$ of the variable into intervals that each contain (approximately) the same number of points. Exactly equal counts may not be possible when values repeat, because identical values belong in the same bin.
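The two partitioning rules can be checked numerically on the sample data used later in this notebook. This is a minimal sketch assuming NumPy is installed; `equal_width_edges` and `equal_depth_bins` are hypothetical helper names, while `np.linspace` (evenly spaced edges) and `np.array_split` (near-equal chunks) are standard NumPy functions.

```python
import numpy as np

def equal_width_edges(values, k):
    """Return the k + 1 edges A, A + w, ..., B with w = (B - A) / k."""
    a, b = min(values), max(values)
    return np.linspace(a, b, k + 1)  # linspace sets the last edge exactly to B

def equal_depth_bins(values, k):
    """Split the sorted values into k groups whose sizes differ by at most one."""
    return [part.tolist() for part in np.array_split(np.sort(values), k)]

data = [80, 10, 72, 5, 12, 55, 204, 18, 15, 35, 92, 216, 100, 108, 88]
edges = equal_width_edges(data, 3)  # A = 5, B = 216, so w = 211 / 3 ≈ 70.33
print(edges)
print(equal_depth_bins(data, 3))
# → [[5, 10, 12, 15, 18], [35, 55, 72, 80, 88], [92, 100, 108, 204, 216]]

# With repeated values, equal counts can split copies of one value across bins
print(equal_depth_bins([1, 1, 1, 1, 2, 3], 2))
# → [[1, 1, 1], [1, 2, 3]]
```

The equal-width edges come out as 5, 75.33, 145.67 and 216. The last call illustrates the caveat about repeated values: four copies of `1` cannot fit in a bin of three, so one of them spills into the second bin.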

Once the data is binned, it can be smoothed in three ways:

* **Smoothing by bin means:** each value in a bin is replaced by the mean of the bin.
* **Smoothing by bin medians:** each value in a bin is replaced by the median of the bin.
* **Smoothing by bin boundaries:** the minimum and maximum values in a bin are taken as the bin boundaries, and each value in the bin is replaced by the closer of the two.
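The three rules can be written as small functions that act on a single bin. This sketch uses only Python's standard `statistics` module; the function names are illustrative, and sending ties in boundary smoothing to the upper boundary is an assumed convention (other tie-breaking rules are equally valid). The sample bin is the middle equal-depth bin from the worked example in this notebook.

```python
import statistics

def smooth_by_mean(b):
    """Replace every value in bin b with the bin mean."""
    m = statistics.mean(b)
    return [m] * len(b)

def smooth_by_median(b):
    """Replace every value in bin b with the bin median."""
    m = statistics.median(b)
    return [m] * len(b)

def smooth_by_boundaries(b):
    """Replace every value with the closer of min(b) and max(b); ties go to max(b)."""
    lo, hi = min(b), max(b)
    return [lo if (v - lo) < (hi - v) else hi for v in b]

b = [35, 55, 72, 80, 88]
print(smooth_by_mean(b))        # [66, 66, 66, 66, 66]
print(smooth_by_median(b))      # [72, 72, 72, 72, 72]
print(smooth_by_boundaries(b))  # [35, 35, 88, 88, 88]
```

Note that `statistics.median` averages the two middle values when a bin has an even number of elements, which simply picking the middle index would not do.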

### **Steps**

1. Sort the data set.
2. Divide the sorted data into $m$ bins, using equal-depth or equal-width partitioning.
3. Smooth the data by replacing the values in each bin with the bin's mean, median, or boundaries.
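The three steps can also be combined into one function. This is a sketch assuming NumPy is installed; `bin_and_smooth` and its `scheme` and `method` parameters are hypothetical names for this example. Equal-width membership is computed with `np.digitize` against the interior edges, so a value lying exactly on an interior edge goes to the bin on its right, and the maximum always lands in the last bin.

```python
import numpy as np

def bin_and_smooth(data, k, scheme="depth", method="mean"):
    """Sort data, split it into k bins, and smooth each bin.

    scheme: "depth" (equal frequency) or "width" (equal width)
    method: "mean", "median" or "boundary"
    Returns a list of smoothed bins (lists of floats).
    """
    # Step 1: sort the data
    x = np.sort(np.asarray(data, dtype=float))

    # Step 2: partition into k bins
    if scheme == "depth":
        bins = np.array_split(x, k)
    elif scheme == "width":
        edges = np.linspace(x[0], x[-1], k + 1)
        # digitize against the k - 1 interior edges gives bin indices 0..k-1
        idx = np.digitize(x, edges[1:-1])
        bins = [x[idx == i] for i in range(k)]
    else:
        raise ValueError("scheme must be 'depth' or 'width'")

    # Step 3: smooth every bin
    smoothed = []
    for b in bins:
        if b.size == 0:              # an equal-width bin can be empty
            smoothed.append([])
        elif method == "mean":
            smoothed.append([float(b.mean())] * b.size)
        elif method == "median":
            smoothed.append([float(np.median(b))] * b.size)
        elif method == "boundary":
            lo, hi = b[0], b[-1]     # b is sorted, so these are its min and max
            smoothed.append([float(lo if v - lo < hi - v else hi) for v in b])
        else:
            raise ValueError("method must be 'mean', 'median' or 'boundary'")
    return smoothed

data = [80, 10, 72, 5, 12, 55, 204, 18, 15, 35, 92, 216, 100, 108, 88]
print(bin_and_smooth(data, 3, scheme="depth", method="boundary"))
# → [[5.0, 5.0, 18.0, 18.0, 18.0], [35.0, 35.0, 88.0, 88.0, 88.0], [92.0, 92.0, 92.0, 216.0, 216.0]]
```

Returning a list of lists rather than a 2-D array keeps the function usable when bins have different sizes, which is the usual case for equal-width binning.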





```python
# Equal-frequency (equal-depth) binning
def equidepth(arr1, m):
    """Split the sorted list arr1 into m bins whose sizes differ by at most one."""
    a = len(arr1)
    n, extra = divmod(a, m)  # base bin size; the first `extra` bins get one more value
    result = []
    start = 0
    for i in range(m):
        size = n + 1 if i < extra else n
        b = arr1[start:start + size]
        start += size
        print(b)
        result.append(b)
    return result

# Equal-width binning
def equiwidth(arr1, m):
    """Split arr1 into m bins of equal width spanning [min(arr1), max(arr1)]."""
    lo, hi = min(arr1), max(arr1)
    w = (hi - lo) / m  # keep w as a float; truncating it to an int can drop values near max(arr1)
    edges = [lo + w * i for i in range(m)] + [hi]  # m + 1 edges, the last exactly max(arr1)
    bins = []
    for i in range(m):
        if i < m - 1:
            # half-open [edges[i], edges[i+1]) so a value on an edge lands in one bin only
            b = [v for v in arr1 if edges[i] <= v < edges[i + 1]]
        else:
            # the last bin is closed so that max(arr1) is included
            b = [v for v in arr1 if edges[i] <= v <= edges[i + 1]]
        bins.append(b)
    print(bins)
    return bins

# Data to be binned
data = [80, 10, 72, 5, 12, 55, 204, 18, 15, 35, 92, 216, 100, 108, 88]
print('input: ', data)

# Step 1: sort the data
data.sort()
print('sorted input', data)

# Number of bins
m = 3

# Step 2: partition into equal-depth or equal-width bins
print("\nequal depth binning")
arr = equidepth(data, m)

print("\nequal width binning")
width_bins = equiwidth(data, m)
```

```
input:  [80, 10, 72, 5, 12, 55, 204, 18, 15, 35, 92, 216, 100, 108, 88]
sorted input [5, 10, 12, 15, 18, 35, 55, 72, 80, 88, 92, 100, 108, 204, 216]

equal depth binning
[5, 10, 12, 15, 18]
[35, 55, 72, 80, 88]
[92, 100, 108, 204, 216]

equal width binning
[[5, 10, 12, 15, 18, 35, 55, 72], [80, 88, 92, 100, 108], [204, 216]]
```


```python
import numpy as np

# Step 3: Data smoothing using the mean, median or boundaries of each bin.
# We use the equal-depth bins (arr) created above.
print('Bins: ')
for x in arr:
    print(x)

bin_mean, bin_boundary, bin_median = [], [], []
for b in arr:
    # Bin mean: every value is replaced by the bin mean
    # (rounded to the nearest integer to keep an integer display)
    mean = int(round(np.mean(b)))
    bin_mean.append([mean] * len(b))

    # Bin boundaries: every value is replaced by the closer of min(b) and max(b);
    # ties go to the upper boundary
    lo, hi = min(b), max(b)
    bin_boundary.append([lo if (v - lo) < (hi - v) else hi for v in b])

    # Bin median: np.median also averages the two middle values of an even-sized bin
    median = int(round(np.median(b)))
    bin_median.append([median] * len(b))

# Stacking into 2-D arrays assumes every bin has the same size (5 values each here)
print("\nBin Mean: \n", np.array(bin_mean))
print("\nBin Boundaries: \n", np.array(bin_boundary))
print("\nBin Median: \n", np.array(bin_median))
```


```
Bins: 
[5, 10, 12, 15, 18]
[35, 55, 72, 80, 88]
[92, 100, 108, 204, 216]

Bin Mean: 
 [[ 12  12  12  12  12]
 [ 66  66  66  66  66]
 [144 144 144 144 144]]

Bin Boundaries: 
 [[  5   5  18  18  18]
 [ 35  35  88  88  88]
 [ 92  92  92 216 216]]

Bin Median: 
 [[ 12  12  12  12  12]
 [ 72  72  72  72  72]
 [108 108 108 108 108]]
```
