# 4.18 Intro to Data Science: Measures of Dispersion
* We’ve considered the measures of central tendency: mean, median and mode.
* These help us categorize typical values in a group.
* An entire group is called a **population**. 
* Sometimes a population is quite large, such as the people likely to vote in the next U.S. presidential election, which is a number in excess of 100,000,000 people. 
* For practical reasons, the polling organizations trying to predict who will become the next president work with carefully selected small subsets of the population known as **samples**. 
* Here we introduce **measures of dispersion** (also called **measures of variability**) that help you understand how **spread out** the values are.
* We’ll calculate each measure of dispersion both by hand and with functions from the module `statistics`, using the following population of 10 six-sided die rolls:
> 1, 3, 4, 2, 6, 5, 3, 4, 5, 2

### Variance
* To determine variance, begin with the mean of these values—3.5. 
* Next, subtract the mean from every die value:
> -2.5, -0.5, 0.5, -1.5, 2.5, 1.5, -0.5, 0.5, 1.5, -1.5
* Then, square each of these results (yielding only positives):
> 6.25, 0.25, 0.25, 2.25, 6.25, 2.25, 0.25, 0.25, 2.25, 2.25
* Finally, calculate the mean of these squares, which is 2.25 (22.5 / 10). This is the **population variance**.
* Squaring the difference between each die value and the mean of all die values emphasizes **outliers**—the values that are farthest from the mean—which can be important in data analysis.
* The following code uses the `statistics` module’s `pvariance` function to confirm our manual result:

```python
import statistics

statistics.pvariance([1, 3, 4, 2, 6, 5, 3, 4, 5, 2])  # 2.25
```
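
The manual steps above can also be traced in plain Python. The following is a minimal sketch rather than a replacement for `pvariance`; the variable names (`rolls`, `differences`, `squares`) are chosen here for illustration and do not come from the `statistics` module.

```python
# Population of 10 six-sided die rolls from the text
rolls = [1, 3, 4, 2, 6, 5, 3, 4, 5, 2]

# Step 1: calculate the mean of the values
mean = sum(rolls) / len(rolls)

# Step 2: subtract the mean from every die value
differences = [roll - mean for roll in rolls]

# Step 3: square each difference (yielding only non-negative values)
squares = [difference ** 2 for difference in differences]

# Step 4: the mean of the squares is the population variance
population_variance = sum(squares) / len(squares)

print(mean)
print(differences)
print(squares)
print(population_variance)
```

Each printed line should correspond to one of the intermediate results listed above, ending with the same value that `pvariance` returns.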

### Standard Deviation
* The standard deviation is the square root of the variance (in this case, 1.5), which tones down the effect of the outliers.
* The smaller the variance and standard deviation are, the closer the data values are to the mean and the less overall dispersion (that is, spread) there is between the values and the mean. 
* The following code calculates the population standard deviation with the `statistics` module’s `pstdev` function, confirming our manual result:

```python
statistics.pstdev([1, 3, 4, 2, 6, 5, 3, 4, 5, 2])  # 1.5

import math

# Equivalent: take the square root of the population variance
math.sqrt(statistics.pvariance([1, 3, 4, 2, 6, 5, 3, 4, 5, 2]))  # 1.5
```
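
To illustrate how a smaller standard deviation indicates values closer to the mean, the sketch below compares the die rolls with a second data set. The `clustered` list is hypothetical data invented for this comparison, not part of the original example; it has the same mean (3.5) but much less spread.

```python
import math
import statistics

# Population of 10 six-sided die rolls from the text
rolls = [1, 3, 4, 2, 6, 5, 3, 4, 5, 2]

# Manual population standard deviation: square root of the mean squared difference
mean = sum(rolls) / len(rolls)
variance = sum((roll - mean) ** 2 for roll in rolls) / len(rolls)
std_dev = math.sqrt(variance)

# Hypothetical data (an assumption for illustration): same mean, tightly clustered values
clustered = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4]
clustered_std_dev = statistics.pstdev(clustered)

print(f'rolls:     mean={mean}, pstdev={std_dev}')
print(f'clustered: mean={statistics.mean(clustered)}, pstdev={clustered_std_dev}')
```

Both lists share the same mean, so the difference in the reported standard deviations reflects only how far the values sit from that mean.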

### Advantage of Population Standard Deviation vs. Population Variance
* Suppose you’ve recorded the March Fahrenheit temperatures in your area. 
* You might have 31 numbers such as 19, 32, 28 and 35. 
* The units for these numbers are degrees.
* When you square the differences between your temperatures and their mean to calculate the population variance, the units of the population variance become **“degrees squared.”**
* When you take the square root of the population variance to calculate the population standard deviation, the units once again become **degrees**, which are the same units as your temperatures.
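
The units point can be made concrete with a short sketch. It uses only the four example temperatures mentioned above rather than a full month of 31 readings, so the numbers are illustrative and not a real March data set.

```python
import statistics

# Four example temperatures from the text, in degrees Fahrenheit
# (an illustrative subset, not a complete month of readings)
temperatures = [19, 32, 28, 35]

# Variance is built from squared differences, so its units are degrees squared
variance = statistics.pvariance(temperatures)

# The square root brings the units back to degrees
std_dev = statistics.pstdev(temperatures)

print(f'population variance: {variance} degrees squared')
print(f'population standard deviation: {std_dev:.2f} degrees')
```

Because the standard deviation is in the same units as the data, it can be compared directly with the temperatures themselves, which is not true of the variance.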