## Bessel's Correction

**Bessel's correction** is the use of `(n-1)` instead of `n` as the divisor of the sum of squared deviations from the sample mean, where `n` is the sample size. It makes the sample variance an unbiased estimate of the population variance, and its square root, the sample standard deviation, a less biased estimate of the population standard deviation.
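Written out, with $x_1, \dots, x_n$ the observations and $\bar{x}$ their mean, the corrected estimate $s$ and the uncorrected one (labelled $s_n$ here only for comparison) are:

$$
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$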

The sample standard deviation measures how spread out the data are around the mean. It is the square root of the sum of squared deviations from the mean divided by the number of observations minus one, `(n-1)`, for a sample; for a whole population, the divisor is `n` instead.
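As a minimal sketch in plain Python (standard library only), the two divisors can be compared directly on the same nine values used later in this notebook:

```python
import math

# Same data as used in the notebook cells below
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the sample mean
ss = sum((x - mean) ** 2 for x in data)

# Bessel-corrected sample standard deviation: divide by n - 1
sample_std = math.sqrt(ss / (n - 1))

# Uncorrected (population-style) standard deviation: divide by n
population_std = math.sqrt(ss / n)

print(ss, sample_std, population_std)
```

Only the divisor changes; the sum of squared deviations (60 for this data) is shared by both formulas.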

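Bessel's correction is used because of the following fact:
>Dividing by `n` makes the sample variance, and therefore the sample standard deviation, tend to underestimate the population value, especially for `small sample sizes`. The deviations are measured from the sample mean, which is computed from the same data and therefore sits closer to them than the true population mean does. Using `(n-1)` instead of `n` in the denominator increases the result, which makes the sample variance an unbiased estimate of the population variance and the sample standard deviation a better (though still slightly low) estimate of the population standard deviation.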
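A quick Monte Carlo check can make the underestimation concrete. The sketch below assumes a normal population with variance 4 and samples of size 5 (both arbitrary choices for illustration) and averages the two estimates over many samples, using NumPy's `ndarray.var` with its `ddof` argument:

```python
import numpy as np

# Illustrative assumptions: normal population with variance 4 (std 2),
# many independent samples of size 5 each.
rng = np.random.default_rng(seed=0)
true_var = 4.0
n, trials = 5, 200_000

# Each row is one sample of n observations
samples = rng.normal(loc=10.0, scale=np.sqrt(true_var), size=(trials, n))

# Per-sample variance, dividing by n (ddof=0) and by n - 1 (ddof=1)
var_n = samples.var(axis=1, ddof=0)
var_n_minus_1 = samples.var(axis=1, ddof=1)

# Per-sample standard deviations are the square roots
std_n = np.sqrt(var_n)
std_n_minus_1 = np.sqrt(var_n_minus_1)

# Averaging over all samples approximates each estimator's expected value
print("true variance:      ", true_var)
print("mean of ddof=0 vars:", var_n.mean())          # theory: (n-1)/n * 4 = 3.2
print("mean of ddof=1 vars:", var_n_minus_1.mean())  # theory: 4
print("mean of ddof=0 stds:", std_n.mean())          # noticeably below 2
print("mean of ddof=1 stds:", std_n_minus_1.mean())  # closer to 2, still below
```

With `n = 5`, dividing by `n` should average roughly `(n-1)/n = 80%` of the true variance, while dividing by `n-1` should average close to the true value. The standard deviations, being square roots, stay slightly low even after the correction.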


```python
import numpy as np
import pandas as pd
```

```python
# Define the data
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

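```python
# Calculating the standard deviation using pandas (Bessel-corrected by default)
df = pd.DataFrame(data)
std_pandas = float(df[0].std())  # column 0 holds the data; avoids float() on a Series
std_pandas
```

```text
2.7386127875258306
```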

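```python
# Calculating the standard deviation using NumPy (no correction by default)
arr = np.array(data)
std_numpy = float(np.std(arr))  # convert the NumPy scalar to a plain Python float
std_numpy
```

```text
2.581988897471611
```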

```python
# Gap between the corrected (pandas) and uncorrected (NumPy) estimates
print(std_pandas - std_numpy)
```

```text
0.1566238900542194
```


The two results differ because pandas and NumPy use different default formulas for the standard deviation.

- By default, `pandas` calculates the `sample standard deviation` using `Bessel's correction`, which divides the sum of squared deviations by `(n-1)` instead of `n`. The correction compensates for the tendency of a sample-based estimate to underestimate the population standard deviation.

- On the other hand, numpy's `np.std()` function by default uses the population standard deviation formula, which divides the sum of squared deviations by `n`.

In the code above, the data contain only 9 values, so the correction is noticeable: dividing by 8 instead of 9 makes the standard deviation about 6% larger (a factor of √(9/8) ≈ 1.061). The value printed at the end is the gap between the two estimates.
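Because both formulas share the same sum of squared deviations, the divide-by-`(n-1)` result is always the divide-by-`n` result times the constant factor √(n/(n-1)). A small NumPy sketch reproduces the pandas value from the NumPy one and shows how quickly this factor approaches 1 as the sample size grows:

```python
import numpy as np

data = np.arange(1, 10)  # the same nine values, 1 through 9
n = data.size

std_pop = np.std(data)  # NumPy default: divides by n

# Rescaling by sqrt(n / (n - 1)) turns the divide-by-n result into the divide-by-(n-1) one
std_sample = std_pop * np.sqrt(n / (n - 1))
print(std_sample, std_sample - std_pop)

# The factor tends to 1 as n grows, so the correction matters most for small samples
for size in (2, 5, 9, 30, 100, 1000):
    print(size, round(float(np.sqrt(size / (size - 1))), 4))
```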


In Python, pandas calculates the standard deviation with Bessel's correction by default, while NumPy does not. Both libraries expose the `ddof` ("delta degrees of freedom") parameter, and the divisor is `n - ddof`: set `ddof=0` in pandas' `std()` to turn the correction off, or `ddof=1` in `np.std()` to turn it on.
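The sketch below makes the two libraries agree in both directions by setting `ddof` explicitly. It uses a `pd.Series` instead of a one-column DataFrame purely for brevity.

```python
import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
s = pd.Series(data)
arr = np.array(data)

# Without correction (divide by n): NumPy default vs. pandas with ddof=0
print(np.std(arr), s.std(ddof=0))

# With Bessel's correction (divide by n - 1): NumPy with ddof=1 vs. pandas default
print(np.std(arr, ddof=1), s.std())

# The same parameter controls the variance functions: 60 / 8 = 7.5 in both cases
print(np.var(arr, ddof=1), s.var())
```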