## Describing a Distribution

The goal of this notebook is to use the tools available in Python to describe the center, shape, and spread of a distribution.

In [None]:
import numpy as np
import pandas as pd

First, we will generate a sample of 100 values from a standard normal distribution using NumPy's random number generator. To make the results reproducible, so that everyone gets the same values on every run, we will use 2 as the seed.

In [None]:
n = 100
np.random.seed(2)
normal_sample = pd.DataFrame(np.random.normal(size=(n,1)))
normal_sample.head()

## Center and Spread

To describe the center and spread, we can report either the mean and standard deviation or the median and interquartile range (IQR). Let's start with the mean and standard deviation, since they are the more straightforward pair.

### Mean & Standard Deviation

In [None]:
mean = normal_sample.mean()
std = normal_sample.std()
print(mean)
print(std)

Calling `mean()` and `std()` on a DataFrame returns a Series with one entry per column. Let's extract the single value from each result and print them using formatting:

In [None]:
mean = mean[0]
std = std[0]
print(f"Mean = {mean:.3f}, Standard Deviation = {std:.3f}")
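
One detail worth knowing about the standard deviation above: pandas divides by n - 1 by default (`ddof=1`, the sample standard deviation), while NumPy's `np.std` divides by n by default (`ddof=0`). The cell below is a minimal sketch that recreates the same sample and compares the two; the variable names (`sample`, `values`, `manual_sample_std`, and so on) are ours and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Recreate the sample used in this notebook (same seed, same shape).
np.random.seed(2)
sample = pd.DataFrame(np.random.normal(size=(100, 1)))

values = sample[0].to_numpy()
n = len(values)

sample_std = sample[0].std()      # pandas default: ddof=1 (divide by n - 1)
population_std = np.std(values)   # NumPy default: ddof=0 (divide by n)

# The sample standard deviation written out by hand, for comparison.
manual_sample_std = np.sqrt(((values - values.mean()) ** 2).sum() / (n - 1))

print(f"pandas std (ddof=1) = {sample_std:.3f}")
print(f"NumPy std  (ddof=0) = {population_std:.3f}")
print(f"manual     (n - 1)  = {manual_sample_std:.3f}")
```

With n = 100 the two versions differ only slightly, but it is good practice to state which one you are reporting.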

### Median and IQR

Pandas doesn't have a method that returns the IQR directly, but it does have a `quantile` method: pass it the desired quantile as a fraction (for example, .25 for Q1) and it returns the value at that position in the distribution. We will use it to find Q1 and Q3, then take the difference to get the IQR.

In [None]:
median = normal_sample.median()[0]
Q1 = normal_sample.quantile(q=.25)[0]
Q3 = normal_sample.quantile(q=.75)[0]
IQR = Q3 - Q1

print(f"Median = {median:.3f}, Interquartile Range = {IQR:.3f}")
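
As a cross-check, the same quartiles can be read from `describe()`, which reports the count, mean, standard deviation, minimum, quartiles, and maximum in one call. The sketch below recreates the sample and compares the IQR from `describe()` with one computed from `np.percentile`; both use linear interpolation by default, so they should agree. The variable names here are ours, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Recreate the sample used in this notebook (same seed, same shape).
np.random.seed(2)
sample = pd.DataFrame(np.random.normal(size=(100, 1)))

# describe() on a single column returns a Series labeled
# count, mean, std, min, 25%, 50%, 75%, max.
summary = sample[0].describe()
print(summary)

iqr_from_describe = summary["75%"] - summary["25%"]

# The same quantity from NumPy, which takes percentiles on a 0-100 scale.
iqr_from_numpy = np.percentile(sample[0], 75) - np.percentile(sample[0], 25)

print(f"IQR (describe) = {iqr_from_describe:.3f}")
print(f"IQR (NumPy)    = {iqr_from_numpy:.3f}")
```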

## Shape and Outliers

The easiest way to describe the shape of a distribution is to visualize it, so we will generate a quick histogram, followed by a boxplot to check for outliers.

In [None]:
normal_sample.hist(bins=10)

In [None]:
normal_sample.boxplot(vert=False)
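
The points a boxplot draws beyond its whiskers can also be found numerically. By default the whiskers extend to the most extreme data points within 1.5 × IQR of the quartiles, so anything outside those fences is flagged as an outlier. The cell below is a minimal sketch of that rule, assuming the default whisker setting; `lower_fence`, `upper_fence`, and `outliers` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Recreate the sample used in this notebook (same seed, same shape).
np.random.seed(2)
sample = pd.DataFrame(np.random.normal(size=(100, 1)))
col = sample[0]

q1 = col.quantile(0.25)
q3 = col.quantile(0.75)
iqr = q3 - q1

# The 1.5 * IQR rule: values beyond these fences count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = col[(col < lower_fence) | (col > upper_fence)]

print(f"Fences: [{lower_fence:.3f}, {upper_fence:.3f}]")
print(f"Number of outliers: {len(outliers)}")
print(outliers)
```

Comparing this list with the boxplot is a quick way to confirm exactly which values are being flagged and where they fall.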

## Conclusion

Now that we have identified the center, shape, and spread, we should write up a quick conclusion:

The distribution is centered at a mean of -0.104, with a standard deviation of 1.042. The shape is roughly symmetric and bell-shaped. There is one outlier at approximately -2.8.