In [1]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('seaborn-dark-palette')

**GOALS:**

* Model Arithmetic and Geometric Sequences with Python

* Use Pandas to Investigate Population Growth

## Population Growth and Sequences

According to a 1992 headline, *"This Decade Will Add a Billion"*, the world population was increasing by about a quarter of a million people per day, or 91.25 million people per year (250,000 × 365).

In 1990, the world population was approximately 5.23 billion people. At that rate, population estimates for the next few years would look like:

| Year | 1990 | 1991 | 1992 | 1993 | 1994 | 
| --- | --- | --- | --- | --- | --- | 
| **Population (millions)** | 5230 | 5321.25 | 5412.50 | 5503.75 | 5595.0 | 

The table here is built with an additive pattern, where each year is 91.25 million more than the year prior.  We will refer to this behavior as an **arithmetic sequence**.

Further, we will use the notation $a_n$ to denote the $n^{th}$ term of the sequence.  For example, here we would have:

$$a_0 = 5230$$

$$a_1 = 5321.25$$

$$a_2 = 5412.50$$

$$a_n = a_{n-1} + 91.25$$
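
Unrolling the recurrence gives the closed form $a_n = a_0 + 91.25\,n$. The following is a minimal sketch, assuming NumPy and the illustrative names `a0`, `d`, `n`, and `a` (none of which come from the notebook), that evaluates the closed form for the five years in the table:

```python
import numpy as np

a0 = 5230.0   # 1990 population in millions (from the table)
d = 91.25     # yearly increase in millions

n = np.arange(5)   # n = 0, 1, 2, 3, 4 (1990 through 1994)
a = a0 + d * n     # closed form a_n = a_0 + d*n, evaluated for every n at once

print(a)
```

The printed values should agree with the table row for 1990 through 1994.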

This is similar to our work with lists, and we can easily generate sequences with a `for` loop in Python.

#### Example 1: Arithmetic Sequence

There are many ways to do this, but for the purpose of simplicity we will only introduce one such method here.  The idea is this:

- create an array of zeros with 21 entries (a starting term plus 20 more)

- identify a starting term for our sequence (i.e. $a_0 = 5230$)

- construct the remaining terms by repeatedly adding 91.25

In [2]:
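N = 20                       # number of terms to generate after the first
x = np.zeros(N+1)            # array of N+1 zeros to hold the sequence
x[0] = 5320                  # starting value, set once before the loop
for i in range(N):
    x[i+1] = x[i] + 91.25    # each term adds 91.25 to the previous term
    print(x[i])              # prints the first N terms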

5320.0
5411.25
5502.5
5593.75
5685.0
5776.25
5867.5
5958.75
6050.0
6141.25
6232.5
6323.75
6415.0
6506.25
6597.5
6688.75
6780.0
6871.25
6962.5
7053.75


In [3]:
plt.plot(x, 'o', markersize = 10, label = 'Arithmetic')
plt.xticks([0, 5, 10, 15, 20], ['1990', '1995', '2000', '2005', '2010'])
plt.title('Population Growth')


In [140]:
x[-1]

7145.0

### Geometric Sequences

In [4]:
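N = 20                        # number of terms to generate after the first
x2 = np.zeros(N+1)            # array of N+1 zeros to hold the sequence
x2[0] = 5320                  # starting value, set once before the loop
for i in range(N):
    x2[i+1] = x2[i]*1.0174    # each term is 1.74% larger than the previous term
    print(x2[i])              # prints the first N terms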

5320.0
5412.568
5506.7466832
5602.56407549
5700.0486904
5799.22953761
5900.13613157
6002.79850026
6107.24719416
6213.51329534
6321.62842668
6431.6247613
6543.53503215
6657.39254171
6773.23117194
6891.08539433
7010.99028019
7132.98151106
7257.09538936
7383.36884913
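
Because each term is the previous term multiplied by 1.0174, the sequence also has the closed form $a_n = a_0 \cdot 1.0174^n$. The sketch below checks the closed form against the loop; the names `a0`, `r`, `closed`, and `looped` are illustrative and not part of the notebook.

```python
import numpy as np

a0 = 5320.0   # starting value used in the cell above
r = 1.0174    # common ratio (1.74% growth per step)
N = 20

n = np.arange(N + 1)
closed = a0 * r ** n          # closed form a_n = a_0 * r^n

looped = np.zeros(N + 1)      # same sequence built term by term
looped[0] = a0
for i in range(N):
    looped[i + 1] = looped[i] * r

# The two agree up to floating-point rounding
print(np.allclose(closed, looped))
```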


In [5]:
plt.plot(x2, 'o', color = 'orange', markersize = 10, label = 'Geometric')
plt.legend(loc = 'best')


### Problems

Ask any number of problems dealing with sequences, populations, or creating DataFrames from simple sequences or random numbers.
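
As one possible starting point, the sketch below collects both sequences into a Pandas DataFrame indexed by year. It uses the same starting value and growth figures as the cells above. The names `pop`, `years`, and the column labels are hypothetical choices, and the year index assumes term 0 corresponds to 1990.

```python
import numpy as np
import pandas as pd

N = 20
n = np.arange(N + 1)
years = 1990 + n                       # assumes term 0 is the year 1990

arithmetic = 5320 + 91.25 * n          # add 91.25 each year
geometric = 5320 * 1.0174 ** n         # multiply by 1.0174 each year

pop = pd.DataFrame(
    {'arithmetic': arithmetic, 'geometric': geometric},
    index=years,
)
# Gap between the two models in each year
pop['difference'] = pop['geometric'] - pop['arithmetic']

print(pop.tail())
```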

# Reading Files into Pandas
---

![](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1c/Robert_Mitchum_1949_%28no_signature%29.jpg/440px-Robert_Mitchum_1949_%28no_signature%29.jpg)

Below we load a CSV file into a Pandas DataFrame.  This is how we will load many of our files, and it is one of the easiest ways to get data into a Jupyter notebook so we can examine and plot it.  The following dataset comes from [*The Data and Story Library at StatLib*](http://lib.stat.cmu.edu/DASL/Datafiles/beefrefdat.html).

---

**Datafile Name**: Beef Council Check-off

**Datafile Subjects:** Agriculture, Consumer

**Story Names:** Beef Council Check-off

**Reference:** The Missoulian, Missoula, MT, May 28, 1988. U.S. Bureau of the Census, City and County Data Book, 1986

**Authorization:** free use

**Description:** A dollar-a-head promotion check-off for cattle growers supports the advertising campaigns of the American Beef Council. Their current TV campaign features a voice-over by the actor Robert Mitchum, using the theme "Beef - it's what's for dinner." The Missoulian reported the percent of growers voting "yes" for the check-off in each of Montana's 56 counties. Characteristics of farms by county were found in the City and County Data Book.

**Number of cases:** 56

---

**Variable Names:**

YES: Percent of farmers voting "yes" for check-off

BIG: Percent of farms with 500 acres or more

PRIN: Percent of operators whose principal income is farming

SIZE: Average size of farm (hundreds of acres)

VAL: Average value of products sold (\$ thousands)

LIVE: Percent of products sold from livestock and poultry

SALE: Percent of farms with sales of \$100,000 or more



<a href="http://www.youtube.com/watch?feature=player_embedded&v=XhxhiffTFwE" target="_blank"><img src="http://img.youtube.com/vi/XhxhiffTFwE/0.jpg" alt="YouTube video thumbnail" width="240" height="180" border="10" /></a>

In [28]:
import pandas as pd

In [31]:
df = pd.read_csv('BEEF.csv')

In [32]:
df.head()

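| Unnamed: 0 | YES | BIG | PRIN | SIZE | VAL | LIVE | SALE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 85.9 | 64.6 | 78.4 | 45.2 | 122 | 83.9 | 32.7 |
| 1 | 65.0 | 61.0 | 79.9 | 58.5 | 121 | 60.2 | 24.2 |
| 2 | 74.9 | 70.8 | 79.6 | 52.5 | 81 | 43.1 | 25.5 |
| 3 | 72.5 | 57.0 | 79.2 | 24.0 | 74 | 45.4 | 25.1 |
| 4 | 76.6 | 36.0 | 71.5 | 10.5 | 54 | 68.8 | 11.6 |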
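
The `Unnamed: 0` column suggests that the file stores row labels in an unnamed first column. If that is the case, passing `index_col=0` to `pd.read_csv` uses that column as the index instead of keeping it as data. The sketch below is an assumption about the file layout, not a description of `BEEF.csv` itself: `io.StringIO` stands in for the file and holds only the five rows shown above, and `beef` is an illustrative name.

```python
import io
import pandas as pd

# Stand-in for BEEF.csv: only the five displayed rows, with the
# unnamed first column assumed to hold the row labels.
sample = io.StringIO(
    ",YES,BIG,PRIN,SIZE,VAL,LIVE,SALE\n"
    "0,85.9,64.6,78.4,45.2,122,83.9,32.7\n"
    "1,65.0,61.0,79.9,58.5,121,60.2,24.2\n"
    "2,74.9,70.8,79.6,52.5,81,43.1,25.5\n"
    "3,72.5,57.0,79.2,24.0,74,45.4,25.1\n"
    "4,76.6,36.0,71.5,10.5,54,68.8,11.6\n"
)

# index_col=0 treats the first column as the index rather than as data
beef = pd.read_csv(sample, index_col=0)

print(beef.columns.tolist())
```

With the real file, the equivalent call would be `pd.read_csv('BEEF.csv', index_col=0)`.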


## Plotting from Pandas

We can use both Pandas and the familiar matplotlib commands to plot things from the DataFrame.  

To use Pandas directly, we call `df.plot()`.  We can further adjust and specify the kind of plot we would like.  Pandas plotting gives the following types as options:


    - 'line' : line plot (default)
    - 'bar' : vertical bar plot
    - 'barh' : horizontal bar plot
    - 'hist' : histogram
    - 'box' : boxplot
    - 'kde' : Kernel Density Estimation plot
    - 'density' : same as 'kde'
    - 'area' : area plot
    - 'pie' : pie plot
    - 'scatter' : scatter plot
    - 'hexbin' : hexbin plot
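
Some of these kinds need more than the `kind` argument. For instance, `'scatter'` requires the names of the columns to place on each axis. A small sketch follows, using only the `YES` and `BIG` values from the five rows displayed earlier; `sample` is an illustrative name, and the full dataset has 56 rows.

```python
import pandas as pd

# Only the five rows shown by df.head(); the full file has 56 counties.
sample = pd.DataFrame({
    'YES': [85.9, 65.0, 74.9, 72.5, 76.6],
    'BIG': [64.6, 61.0, 70.8, 57.0, 36.0],
})

# kind='scatter' needs explicit x and y column names
ax = sample.plot(kind='scatter', x='BIG', y='YES')
ax.set_title('YES vs. BIG')
```

`DataFrame.plot` returns a matplotlib `Axes` object, so the usual matplotlib methods such as `set_title` can be used to adjust the figure afterwards.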


In [33]:
df.plot()


In [45]:
plt.figure()
plt.plot(df['YES'], df['BIG'], 'o', c = 'orange', markersize = 10)


In [52]:
df.plot(kind = 'box')


In [51]:
plt.figure()
df['SIZE'].plot(kind = 'hist')


## Problems

Describe the outcome of the vote on the dollar-a-head check-off that funds the ad campaign with Robert Mitchum.  Did many farmers support it?  What kind of farms did they own?  Explore the data with a variety of visualizations and describe what you find.
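
One way to begin, sketched here under the assumption that summary statistics and pairwise correlations are a reasonable first look, is with `describe()` and `corr()`. The example uses only the five rows displayed earlier, which is far too few to support conclusions. With the full data, the same calls would be made on `df`. The name `sample` is illustrative.

```python
import pandas as pd

# Only the five rows shown by df.head(); replace with the full DataFrame.
sample = pd.DataFrame({
    'YES':  [85.9, 65.0, 74.9, 72.5, 76.6],
    'BIG':  [64.6, 61.0, 70.8, 57.0, 36.0],
    'PRIN': [78.4, 79.9, 79.6, 79.2, 71.5],
    'SIZE': [45.2, 58.5, 52.5, 24.0, 10.5],
    'VAL':  [122, 121, 81, 74, 54],
    'LIVE': [83.9, 60.2, 43.1, 45.4, 68.8],
    'SALE': [32.7, 24.2, 25.5, 25.1, 11.6],
})

# Distribution of the "yes" vote across counties
print(sample['YES'].describe())

# Correlation of every variable with the "yes" vote
print(sample.corr()['YES'].sort_values())
```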