# Data Collection and Binning

When collecting data with either deterministic or random sampling, it is important to bin the data so that it is not too sparse. This is especially important when the data come from a continuous distribution.
Binning means grouping the data into intervals, which reduces the number of unique values. Data from a continuous distribution can contain a very large number of unique values, each occurring only once or a few times, which makes the data sparse and difficult to work with and to visualize.
A bin edge is the boundary between two bins, while a bin center is the midpoint of a bin. The bin width is the difference between two consecutive bin edges, that is, the size of the interval a bin covers.
Bin edges and bin centers are each stored in their own array. For N bins there are N + 1 edges and N centers. With evenly spaced bins, the bin width is the difference between the first and second elements of the edge array.
The bin edges determine which bin a value belongs to, while the bin centers are used to label the bins.
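
To make the relationship between edges, widths, and centers concrete, here is a minimal sketch. The interval [0.0, 1.0] and the choice of 5 bins are assumptions made only for illustration, as are the variable names.

```python
import numpy as np

# Hypothetical example: 5 evenly spaced bins covering the interval [0.0, 1.0].
bin_edges = np.linspace(0.0, 1.0, 6)                  # 6 edges bound 5 bins
bin_widths = np.diff(bin_edges)                       # difference between consecutive edges
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2    # midpoint of each bin

print("edges:  ", bin_edges)
print("widths: ", bin_widths)
print("centers:", bin_centers)
```

Note that the centers array is always one element shorter than the edges array.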

The following example demonstrates how to bin data using NumPy. It bins the data into intervals of width 0.01 and counts the number of values in each bin. The counts can then be plotted with Matplotlib.

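```python
import numpy as np
import matplotlib.pyplot as plt

# `times` is assumed to be an existing sequence of measured values.
# Bin edges spaced 0.01 apart, starting at 0.59.
times_bins = np.arange(0.59, 0.70, 0.01)
# One count per bin: there is one fewer bin than there are edges.
times_counts = np.zeros(np.size(times_bins) - 1)

for time in times:
    for i in range(np.size(times_bins) - 1):
        # Half-open interval: left edge included, right edge excluded.
        if time >= times_bins[i] and time < times_bins[i + 1]:
            times_counts[i] += 1
```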
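
The explicit loop can be compared against `np.histogram`, which does the same counting in one call. The sketch below uses made-up sample values (`sample_times`) and builds its own edges with `np.linspace`. Both are assumptions for illustration, not the data from this exercise.

```python
import numpy as np

# Hypothetical sample data, used here only for illustration.
sample_times = np.array([0.601, 0.612, 0.618, 0.634, 0.655, 0.655, 0.672])

# 11 evenly spaced edges from 0.59 to 0.69, giving 10 bins of width 0.01.
sample_bins = np.linspace(0.59, 0.69, 11)

# Explicit loop over half-open intervals [left, right).
loop_counts = np.zeros(sample_bins.size - 1)
for t in sample_times:
    for i in range(sample_bins.size - 1):
        if sample_bins[i] <= t < sample_bins[i + 1]:
            loop_counts[i] += 1

# np.histogram returns the counts together with the edges it used.
hist_counts, hist_edges = np.histogram(sample_times, bins=sample_bins)

print("loop counts:     ", loop_counts)
print("histogram counts:", hist_counts)
```

One difference to keep in mind: `np.histogram` treats the last bin as closed on both sides, so a value exactly equal to the final edge is counted, whereas the loop above would skip it.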

For `plt.bar()`, the x and y arrays must have the same number of elements. Because there is one more bin edge than there are bins, we build a new array of bin centers and use it as the x-axis.

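```python
# Bin width, taken from the first two edges (the bins are evenly spaced).
times_shift = (times_bins[1] - times_bins[0])
# Centers start half a bin width after the first edge and stop before the last edge.
times_bin_centers = np.arange(times_bins[0] + times_shift / 2, times_bins[-1], times_shift)

print("times_bin_centers = ", times_bin_centers, ", with length = ", np.size(times_bin_centers))
```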
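
With centers and counts of equal length, the bar plot itself could look like the following sketch. The counts here are invented for illustration, and the names `example_edges`, `example_counts`, and the axis labels are assumptions rather than part of the original exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical edges and counts for 10 bins, for illustration only.
example_edges = np.linspace(0.59, 0.69, 11)
example_counts = np.array([0, 1, 2, 0, 1, 0, 2, 0, 1, 0])

# Bin width and centers, computed the same way as above.
example_width = example_edges[1] - example_edges[0]
example_centers = example_edges[:-1] + example_width / 2

fig, ax = plt.subplots()
# Setting width to the bin width makes adjacent bars touch, like a histogram.
bars = ax.bar(example_centers, example_counts, width=example_width, edgecolor="black")
ax.set_xlabel("time")
ax.set_ylabel("count")
```

Call `plt.show()` to display the figure interactively, or `fig.savefig(...)` to write it to a file.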

To find the mean of a given set of data, we can use the following code:

```python

mean = np.mean(times)

```

For the sake of the exercise, we can also compute the mean by hand:

```python

mean = sum(data) / len(data)

```

To calculate the standard deviation of a given set of data, we can use the following code:

```python

std = np.std(times)

```

For the sake of the exercise, we can also compute the sample standard deviation (with the n - 1 denominator) by hand:

```python

std = np.sqrt(1/(len(data) - 1) * sum([(x - mean)**2 for x in data]))

```
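
The hand-written formula divides by n - 1, while `np.std` divides by n by default (`ddof=0`), so the two only agree when `ddof=1` is passed. The sketch below checks this on a few made-up values. The `values` array is an assumption for illustration.

```python
import numpy as np

# Hypothetical measurements, for illustration only.
values = np.array([0.61, 0.63, 0.64, 0.66, 0.67])

# Hand-written sample standard deviation (divides by n - 1).
mean_value = values.sum() / values.size
manual_std = np.sqrt(np.sum((values - mean_value) ** 2) / (values.size - 1))

population_std = np.std(values)        # default ddof=0: divides by n
sample_std = np.std(values, ddof=1)    # ddof=1: divides by n - 1

print("manual:    ", manual_std)
print("ddof=1:    ", sample_std)
print("ddof=0:    ", population_std)
```

For small data sets the difference between the two conventions is noticeable, so it is worth stating which one a reported value uses.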

To calculate the median of a given set of data, we can use the following code:

```python

median = np.median(times)

```

For the sake of the exercise, we can also compute the median by hand:

```python

data.sort()
n = len(data)
if n % 2 == 0:
    median = (data[n//2 - 1] + data[n//2]) / 2
else:
    median = data[n//2]

```

Carrying out the above calculations with NumPy array operations is more efficient and less prone to errors:

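```python
data = np.asarray(data)  # the array arithmetic below needs a NumPy array
mean = np.sum(data) / np.size(data)
std = np.sqrt(np.sum((data - mean)**2) / (np.size(data) - 1))
data = np.sort(data)  # np.sort returns a sorted copy, so keep the result
n = np.size(data)
if n % 2 == 0:
    # Even count: average the two middle values.
    median = (data[n//2 - 1] + data[n//2]) / 2
else:
    # Odd count: take the single middle value.
    median = data[n//2]
```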
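
As a sanity check, the hand-written median can be wrapped in a function and compared with `np.median` for both an odd and an even number of values. This is a sketch only: the function name `manual_median` and the sample arrays are assumptions for illustration.

```python
import numpy as np

def manual_median(values):
    """Median computed from a sorted copy; the input is left unchanged."""
    ordered = np.sort(values)
    n = ordered.size
    if n % 2 == 0:
        # Even count: average the two middle values.
        return (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # Odd count: take the single middle value.
    return ordered[n // 2]

# Hypothetical samples with an odd and an even number of elements.
odd_sample = np.array([0.66, 0.61, 0.64])
even_sample = np.array([0.66, 0.61, 0.64, 0.63])

for sample in (odd_sample, even_sample):
    print(manual_median(sample), np.median(sample))
```

Sorting a copy rather than the original array keeps the data in its measured order for any later steps.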