# Lesson 14: Chebyshev's Bounds

Welcome to Lesson 14! Throughout the course you will complete assignments like this one. You can't learn technical subjects without hands-on practice, so these assignments are an important part of the course.

Collaborating on labs is more than okay -- it's encouraged! You should rarely remain stuck for more than a few minutes on a question, so post to the discussion board or ask your instructor for help. Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it. You should **not** just copy/paste someone else's code, but rather work together to understand the task you need to complete.

To receive credit for this assignment, answer all questions correctly and submit before the deadline.

**Due Date:** 

**Collaboration Policy:** Data science is a collaborative activity. While you may talk with others about the labs, we ask that you **write your solutions individually**. If you do discuss the assignments with others **please include their names below** (it's a good way to learn your classmates' names).

**Collaborators:** 

List collaborators here.

## Today's Lesson

In today's lab, you'll learn about:

- Chebyshev's bounds.

Let's get started!

## Words of Caution

Remember to run the cell below. It's for setting up the environment so you can have access to what's needed for this lesson. For now, don't worry about what it means: we'll learn more about what's inside of it in the next few lessons.

In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

## Average (Mean) ##

In [None]:
values = make_array(2, 3, 3, 9)

**Question 1.** Calculate the average of the numbers in the `values` array.

In [None]:
sum(values)/len(values)

**Question 2.** Use a function to find the average of the numbers in the `values` array.

In [None]:
np.mean(values)

In [None]:
np.average(values)

You could even type out your data, but that's probably not wise.

In [None]:
(2 + 3 + 3 + 9)/4

In [None]:
values_table = Table().with_columns('value', values)
values_table

**Question 3.** Let's visualize our data set. The red dot marks the mean and the green dot marks the 50th percentile.

In [None]:
bins_for_display = np.arange(0.5, 10.6, 1)

values_table.hist(bins=bins_for_display)
plt.scatter(np.mean(values), -1e-2, color="red", marker="o", zorder=2);
plt.scatter(percentile(50, values), -1e-2, color="green", marker="o", zorder=2);
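
The red and green dots mark the mean and the median (50th percentile) of the data. As a rough sketch using only Python's standard `statistics` module (not the `datascience` functions used in the cell above), the same four numbers show how a single large value pulls the mean above the median. The names `data`, `mean_value`, and `median_value` are chosen here for illustration.

```python
import statistics

# Same four numbers as the `values` array in this lesson.
data = [2, 3, 3, 9]

mean_value = statistics.mean(data)      # balance point of the histogram
median_value = statistics.median(data)  # middle of the sorted values

# The single large value (9) pulls the mean to the right of the median.
print("mean:", mean_value)
print("median:", median_value)
```

Because the mean uses every value's size while the median only uses the ordering, the two can sit far apart when the histogram has a long tail.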

## NBA

In [None]:
nba = Table.read_table('data/nba2013.csv')
nba

**Question 4.** Make a histogram of the `Height` column in the `nba` table.

In [None]:
my_bins = np.arange(65.5, 90.5) 
nba.hist('Height', bins=my_bins)

**Question 5.** Find the 50th percentile of heights. 

In [None]:
heights = nba.column('Height')
percentile(50, heights)

**Question 6.** Find the average height and compare it with the median.

In [None]:
np.mean(heights), np.median(heights)

## Standard Deviation

In [None]:
sd_table = Table().with_columns('Value', values)
sd_table

In [None]:
average_value = np.average(values)
average_value

In [None]:
sum(sd_table.column('Value')-average_value)

In [None]:
deviations = (sd_table.column('Value')-average_value)**2
sd_table = sd_table.with_column('deviations', deviations)
sd_table

**Question 7.** What is the sum of all the squared deviations?

In [None]:
sum(deviations)

In [None]:
sd_table = ...
sd_table

**Question 8.** What is variance?

In [None]:
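# Variance is the mean of the squared deviations
variance = sum(deviations)/len(deviations)
variance

**Question 9.** What is standard deviation?

In [None]:
# Standard deviation is the square root of the variance
sd = variance**0.5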
sd

**Question 10.** Use the `numpy` function `std` to find the standard deviation of the `Value` column in `sd_table`.

In [None]:
np.std(sd_table.column('Value'))
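
The cells above build the standard deviation in steps: deviations from the average, squared deviations, their mean (the variance), and finally the square root. A minimal sketch of those steps in plain Python follows. The helper name `population_sd` is made up for illustration, and `statistics.pstdev` is used as a cross-check because it divides by the number of values, as `np.std` does by default.

```python
import math
import statistics

def population_sd(data):
    """Root mean square of deviations from the average (hypothetical helper)."""
    average = sum(data) / len(data)
    deviations = [x - average for x in data]
    squared_deviations = [d ** 2 for d in deviations]
    variance = sum(squared_deviations) / len(data)
    return math.sqrt(variance)

data = [2, 3, 3, 9]
sd_by_hand = population_sd(data)
print(sd_by_hand)

# Cross-check against the standard library's population SD.
print(statistics.pstdev(data))

# Deviations from the average cancel out, which is why we square them first.
average = sum(data) / len(data)
sum_of_deviations = sum(x - average for x in data)
print(sum_of_deviations)
```

Squaring before averaging keeps positive and negative deviations from cancelling, and the final square root returns the result to the original units.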

## Chebyshev's Inequality

[Chebyshev's Inequality](https://en.wikipedia.org/wiki/Chebyshev%27s_inequality)

In [None]:
births = Table.read_table('data/baby.csv').drop('Maternal Smoker')
births.show(3)

In [None]:
births.hist(overlay=False)

In [None]:
mpw = births.column('Maternal Pregnancy Weight')
mean = np.mean(mpw)
sd = np.std(mpw)
mean, sd

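**Question 11.** Which observations are within 3 standard deviations of the mean?

Chebyshev's inequality says that, for any distribution, the proportion of observations within $z$ standard deviations of the mean is at least

$$1-\frac{1}{z^2}, \text{ where } z > 1$$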

**Hint:** Use one of the `.are` methods.

In [None]:
lower_bound = mean-3*sd
upper_bound = mean+3*sd
within_3_SDs = births.where('Maternal Pregnancy Weight', are.between(lower_bound,upper_bound))
within_3_SDs.show(3) 

**Question 12.** What proportion of observations are within 3 standard deviations of the mean? Compare it with Chebyshev's lower bound for $z = 3$.

In [None]:
within_3_SDs.num_rows/births.num_rows

In [None]:
1-1/(3**2)
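
The value computed above is Chebyshev's lower bound for z = 3. As an illustration, the bound can be wrapped in a small function and tabulated for several values of z. The name `chebyshev_lower_bound` is chosen here; it is not a function from `datascience` or `numpy`.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of data within z SDs of the mean, for z > 1."""
    if z <= 1:
        # For z <= 1 the formula gives 0 or less, which says nothing useful.
        raise ValueError("the bound is only informative for z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4, 5):
    percent = round(100 * chebyshev_lower_bound(z), 2)
    print(z, "SDs: at least", percent, "% of the data")
```

The bound is a guarantee, not an estimate: the observed proportion for a particular data set is often much higher.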

In [None]:
births.labels

**Question 13.** Let's see if Chebyshev's bounds work for distributions with various shapes.

In [None]:
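for feature in births.labels:
    # Use loop-specific names so the earlier `values`, `mean`, and `sd` are not overwritten
    column_values = births.column(feature)
    column_mean = np.mean(column_values)
    column_sd = np.std(column_values)
    print()
    print(feature)
    for z in make_array(2, 3, 4, 5):
        # Keep the rows within z SDs of the mean and report their share of the data
        chosen = births.where(feature, are.between(column_mean - z*column_sd, column_mean + z*column_sd))
        proportion = chosen.num_rows / births.num_rows
        percent = round(proportion * 100, 2)
        print('Average plus or minus', z, 'SDs:', percent, '% of the data')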
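
The loop above checks real columns. As a further hedged sketch, the same check can be run on simulated data with very different shapes. Everything here is an assumption made for illustration: the three distributions, the sample size, the seed, and the helper `proportion_within`. The check uses the population SD (dividing by the number of values), which is what `np.std` computes by default.

```python
import random
import statistics

def proportion_within(data, z):
    """Fraction of values strictly within z population SDs of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mean) < z * sd]
    return len(inside) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = {
    "bell-shaped": [rng.gauss(0, 1) for _ in range(2000)],
    "right-skewed": [rng.expovariate(1.0) for _ in range(2000)],
    "flat": [rng.uniform(0, 1) for _ in range(2000)],
}

for name, data in samples.items():
    print(name)
    for z in (2, 3, 4, 5):
        observed = proportion_within(data, z)
        bound = 1 - 1 / z ** 2
        print("  z =", z, "observed:", round(observed, 4), "bound:", round(bound, 4))
```

Whatever the shape, the observed proportion should never fall below the bound, because Chebyshev's inequality makes no assumption about the distribution.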