<img src="https://datasciencedegree.wisconsin.edu/wp-content/themes/data-gulp/images/logo.svg" width="300">


# Assignment 4

This assignment has you play with such things as random data generation, and computation on data stored in a Python list.  
<img src="https://upload.wikimedia.org/wikipedia/commons/3/36/Two_red_dice_01.svg" width="150">

We're gonna use functions to do as much of it as we can, because functions are totally awesome.

Might I recommend the use of [Numpy](http://www.numpy.org/) for random numbers and other useful things?

## Problem 1.  Revenue models.

You're an analyst for *Farm2Table*, a chain restaurant that sources its food locally whenever possible.  You are tasked to forecast the financial payoff of an ad campaign.  

Currently, average monthly revenue at each store is \$100,000, with a standard deviation of \$12,000.  An advertising firm claims that with clever advertisement, they can increase monthly revenue for *Farm2Table* to \$120,000, but the standard deviation will be increased as well, to \$25,000.  Assume a **normal distribution**.

---

🎯 Write a Python function `simulate_revenue(average, std_dev, months)`.
- It produces simulated revenue data drawn from a normal distribution with mean `average` and standard deviation `std_dev`, for a given number of months.
- It returns a list of length `months`. 
- Round each item to the nearest cent.  No fractions of a cent allowed. 

Use `simulate_revenue` to generate two lists of random numbers which model potential revenue: 

1. one list `before` with 24 months of revenue using the current mean and standard deviation, 
2. another list `after` with 12 months of revenue using the predicted mean and standard deviation.

Then, concatenate `before` and `after` to produce a third list `all_months` containing the revenue of all 36 months.  

---

🎯 Write a function `print_monthly_revenue(revenue, name)` that prints an arbitrary list to the screen, with these formatting requirements:
- Round each number *when printing* to the nearest $100.
- Do not modify the original list.
- Print two-column output of the form `month: revenue` (the month is implied by the position in the list). Pad the month value so it is always of width 2.
- Right-align the revenue value.

Example: 
`print_monthly_revenue(before, "before")` produces
```
Revenue for period 'before'

Mo: revenue
-----------
01: 123100
02:  98288
...
```

Call `print_monthly_revenue` on each of your lists, including the concatenated one, and be sure to commit the output.

In [1]:
import numpy as np

In [2]:
# Simulate revenue function
def simulate_revenue(average, std_dev, months):
    # Start with an empty list
    revenue = []
    
    # For each month, draw one normal sample, round it to the nearest cent, and append it
    for i in range(months):
        monthlyRev = np.random.normal(average, std_dev)
        revenue.append(round(monthlyRev, 2))
    
    return revenue

# Before and after
before = simulate_revenue(100000,12000,24)
after = simulate_revenue(120000,25000,12)

print("Before: ",before,'\n')
print("After: ", after, '\n')

# All months
all_months = before + after
print("All: ", all_months, '\n')

Before:  [94071.76, 105524.39, 106181.89, 64638.02, 83864.45, 118771.92, 103673.42, 90878.22, 91168.22, 89659.26, 120043.4, 88851.75, 67372.53, 93454.46, 79333.96, 112085.09, 114737.91, 103260.34, 110746.77, 91210.2, 82209.84, 97440.12, 100791.18, 120705.34] 

After:  [90118.06, 122847.8, 117940.71, 101972.68, 148347.02, 111442.22, 102248.41, 103254.43, 110619.97, 156458.15, 91998.04, 118771.79] 

All:  [94071.76, 105524.39, 106181.89, 64638.02, 83864.45, 118771.92, 103673.42, 90878.22, 91168.22, 89659.26, 120043.4, 88851.75, 67372.53, 93454.46, 79333.96, 112085.09, 114737.91, 103260.34, 110746.77, 91210.2, 82209.84, 97440.12, 100791.18, 120705.34, 90118.06, 122847.8, 117940.71, 101972.68, 148347.02, 111442.22, 102248.41, 103254.43, 110619.97, 156458.15, 91998.04, 118771.79] 
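
The loop above draws one value per iteration. As a rough sketch of an alternative, NumPy can draw all months in a single call. The function name `simulate_revenue_vectorized` and the `seed` parameter are assumptions added for illustration; they are not part of the assignment.

```python
import numpy as np

def simulate_revenue_vectorized(average, std_dev, months, seed=None):
    """Hypothetical vectorized variant: draw every month in one call."""
    # default_rng(seed) gives a reproducible stream when a seed is supplied
    rng = np.random.default_rng(seed)
    draws = rng.normal(average, std_dev, size=months)
    # Round to the nearest cent and convert to a plain Python list of floats
    return np.round(draws, 2).tolist()

before_demo = simulate_revenue_vectorized(100000, 12000, 24, seed=0)
after_demo = simulate_revenue_vectorized(120000, 25000, 12, seed=1)
all_demo = before_demo + after_demo
print(len(before_demo), len(after_demo), len(all_demo))  # 24 12 36
```

Passing a seed makes the simulated months repeatable between runs, which can help when the output has to be committed.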



In [3]:
def print_monthly_revenue(revenue, name):
    # Dashed rule under the header, sized to the header text
    dash = '-' * (len(name) + 10)
    
    # Print the table header
    print(name, ":", "revenue")
    print(dash)
    # Round each value to the nearest 100 for display only; the input list is not modified
    for month, value in enumerate(revenue, start=1):
        monthlyRev = int(round(value, -2))
        print('{:<2}'.format(month), '{:>10}'.format(monthlyRev))

In [4]:
print_monthly_revenue(before, 'Before')

Before : revenue
----------------
1       94100
2      105500
3      106200
4       64600
5       83900
6      118800
7      103700
8       90900
9       91200
10      89700
11     120000
12      88900
13      67400
14      93500
15      79300
16     112100
17     114700
18     103300
19     110700
20      91200
21      82200
22      97400
23     100800
24     120700


In [5]:
print_monthly_revenue(after, 'After')

After : revenue
---------------
1       90100
2      122800
3      117900
4      102000
5      148300
6      111400
7      102200
8      103300
9      110600
10     156500
11      92000
12     118800


In [6]:
print_monthly_revenue(all_months, "All")

All : revenue
-------------
1       94100
2      105500
3      106200
4       64600
5       83900
6      118800
7      103700
8       90900
9       91200
10      89700
11     120000
12      88900
13      67400
14      93500
15      79300
16     112100
17     114700
18     103300
19     110700
20      91200
21      82200
22      97400
23     100800
24     120700
25      90100
26     122800
27     117900
28     102000
29     148300
30     111400
31     102200
32     103300
33     110600
34     156500
35      92000
36     118800
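
The tables above use a slightly different layout from the example in the problem statement. A minimal sketch of one way to get closer to that layout follows; the helper name `format_monthly_revenue` and the revenue column width of 7 (chosen so rows line up with the `Mo: revenue` header) are assumptions.

```python
def format_monthly_revenue(revenue, name):
    """Hypothetical helper: build the report lines instead of printing directly."""
    lines = ["Revenue for period '{}'".format(name), "", "Mo: revenue", "-----------"]
    for month, value in enumerate(revenue, start=1):
        # Round for display only; the input list is left untouched
        shown = int(round(value, -2))
        # Month zero-padded to width 2, revenue right-aligned in 7 columns
        lines.append("{:02d}: {:>7d}".format(month, shown))
    return lines

sample = [94071.76, 105524.39, 106181.89]  # first three values of `before` above
report = format_monthly_revenue(sample, "before")
print("\n".join(report))
```

Returning the lines rather than printing them keeps the formatting easy to check with known inputs; a thin wrapper can still print them.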


---

## Problem 2(a).  Bus arrival times.

Shuttle buses arrive at an airport to fetch passengers with an average interval of 15 minutes.  Their actual interarrival times follow an **exponential distribution**.  

---

🎯 Write a function `simulate_busses(mean, num_busses)` that simulates bus arrival times

- Use a ```random``` module to generate the exponentially distributed bus intervals.
- Round your raw data to the nearest tenth of a minute.  Realize that rounding is generally scary, and can cause serious problems downstream if not done only when appropriate.

Call your function to generate a list of 50 arrival times with mean 15; capture the result in a variable called `bus_times`.  Print your list, and be sure to commit the output.

For example, your list might begin ```[11.2, 34.1, 18.8, 23.5, ...```.  

---

🎯 Use Python to answer the following questions:  
1. What is the shortest waiting time in your list?  
2. What is the longest waiting time?  

These answers must be programmatically determined and the output that proves you computed them (namely, the values) must be committed.

---

🎯 When answering the previous question, did you write a function?  Why or why not?

In [7]:
import numpy as np

In [8]:
def simulate_busses(mean, num_busses):
    # Draw num_busses exponentially distributed intervals with the given mean
    busGen = np.random.exponential(mean, num_busses)
    # Round to the nearest tenth of a minute
    bus_times = np.round(busGen, 1)
    return bus_times

In [9]:
busList = simulate_busses(15,50)
print(busList)

[14.8 24.5  5.  14.   2.7  4.2 42.5 14.1  3.8  5.   9.2  7.   0.9 14.4
  7.8  6.7 11.   1.9 18.2  0.8  5.3  6.  39.2  6.1 10.6  6.8 12.3  8.4
  9.4  0.2  0.7  5.2  3.9 10.  21.1  0.3  3.  38.   8.5 14.8  1.3 15.5
  9.8  1.8  5.  28.9 22.   0.9 62.2 20.1]


In [10]:
np.min(busList)

0.2

In [11]:
np.max(busList)

62.2

I did not write a function. I figured since I was already using NumPy I should just have it do the legwork for me and not reinvent the wheel.
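
For comparison, here is a small sketch of what a named function for this step might look like. The name `shortest_and_longest` is hypothetical, and the built-in `min` and `max` work on both plain lists and NumPy arrays.

```python
def shortest_and_longest(times):
    """Hypothetical helper: return (shortest, longest) wait from a sequence of times."""
    return min(times), max(times)

sample_times = [14.8, 24.5, 5.0, 14.0, 2.7]  # first five values printed above
shortest, longest = shortest_and_longest(sample_times)
print("shortest:", shortest, "longest:", longest)  # shortest: 2.7 longest: 24.5
```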

## Problem 2(b).  Cumulative waiting times.

In this problem, you'll interpret the data you generated in Problem 2(a) as a sequence of consecutive arrival times.  Suppose the first bus arrives at the measured number of minutes after midnight.  The bus company wants to track the time each bus arrived, measured in minutes after midnight.  

🎯 Write a function that transforms the bus arrival times into cumulative times.  The function takes in a list of arrival times, and returns a list of the number of minutes after midnight that each bus arrived at, using the list you generated in part (a).

- This assumes the first bus arrived at the airport terminal at midnight *plus* its arrival time (the time at bus_times[0]).  The second arrived at the arrival time of the first *plus* its arrival time, etc.
- I'm deliberately not naming your function for you, here.  You get to choose!  Make it descriptive!

With the data from our example in part (a), the answer would start ```[11.2, 45.3, 64.1, 87.6, ...]```, where 45.3 = 11.2+34.1.  Call your function on your variable that you already have in memory.  Print your cumulative waiting time list.

---

🎯 Using the list generated in 2(a), at what time does the 50th bus arrive? Print the time in the format `HH:MM AM/PM` where `HH` is the hour and `MM` is the minute.  

###### On the printing of the times

* Note that HH should be between 01 and 12.
* You must print that leading 0 in the hour and minute, if it is less than 10.
* Do *not* generate a new list of times; be sure to re-use the list of intervals you already generated in 2(a).
* I strongly suggest you write a **function** to stringify the floating point number.  
  * It eats the floating point number, interpreted as number of minutes past midnight.  
  * It returns a string, composed of the hours, minutes, and morning/afternoon indicator.  
  * To solve this problem, take a random number of minutes, and do the computation yourself, and write down the steps you take to do it.  That's what you should make the computer do!
  * Test the function as you develop it, with some known inputs and the times they should map to; e.g., 125.0 is "02:05 AM", and so is 125+24*60.

In [12]:
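def BusTimeSum(bus_times):
    # Total of all intervals = minutes after midnight at which the last bus arrives
    timeInMinutes = sum(bus_times)
    # Round away floating-point drift from the summation
    timeInMinutes = np.round(timeInMinutes, 1)
    return timeInMinutes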

In [13]:
minutes = BusTimeSum(busList)
print(minutes)

585.8
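
The total above gives only the arrival time of the last bus. The first part of the problem asks for the full list of cumulative times, which could be sketched with `np.cumsum` as below. The function name `cumulative_arrival_times` is an assumption, and the demo uses the example intervals from part (a) rather than the generated data.

```python
import numpy as np

def cumulative_arrival_times(intervals):
    """Hypothetical name: running total of intervals, in minutes after midnight."""
    # Round after summing so floating-point drift does not show in the result
    return np.round(np.cumsum(intervals), 1).tolist()

example = [11.2, 34.1, 18.8, 23.5]  # the example intervals from part (a)
cumulative = cumulative_arrival_times(example)
print(cumulative)  # [11.2, 45.3, 64.1, 87.6]
```

The last element of the returned list is the arrival time of the final bus, so the same function also covers the 50th-bus question.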


In [14]:
def minutesToTime(minutes):
    # Wrap around midnight so a day or more of minutes still maps to a clock time
    totalMinutes = int(minutes) % (24 * 60)
    hours, mins = divmod(totalMinutes, 60)
    # Hours 12 through 23 are PM
    ap = "PM" if hours >= 12 else "AM"
    # Convert to a 12-hour clock, where hour 0 and hour 12 both display as 12
    hours = hours % 12
    if hours == 0:
        hours = 12
    # Zero-pad the hour and minute to two digits
    return "{:02d}:{:02d} {}".format(hours, mins, ap)

In [15]:
print(minutesToTime(minutes))

09:45 AM
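
As a cross-check on the hand-written arithmetic, the standard library's `datetime` module can do the same conversion. This is only a sketch: the helper name `minutes_to_clock` is hypothetical, and the `%p` marker is assumed to render as `AM`/`PM`, which holds in the default C/English locale.

```python
from datetime import datetime, timedelta

def minutes_to_clock(minutes_after_midnight):
    """Hypothetical helper: format minutes past midnight as 'HH:MM AM/PM'."""
    # Any date works as the anchor; only the time of day is printed
    midnight = datetime(2000, 1, 1)
    moment = midnight + timedelta(minutes=float(minutes_after_midnight))
    # %I is the zero-padded 12-hour clock, %M the minute, %p the AM/PM marker
    return moment.strftime("%I:%M %p")

# Known inputs, including the test values suggested in the problem statement
for known in (0, 125.0, 125 + 24 * 60, 12 * 60, 585.8):
    print(known, "->", minutes_to_clock(known))
```

Checking edge cases such as midnight (0) and noon (720) is a quick way to catch off-by-one mistakes in the AM/PM logic.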


---

## Problem 3.  Chocolate and the Nobel

This problem also uses generation of random numbers to simulate.  

Researchers have observed a (presumably spurious) correlation between per capita chocolate consumption and the rate of Nobel prize laureates: see [Chocolate Consumption, Cognitive Function, and Nobel Laureates](http://www.nejm.org/doi/full/10.1056/NEJMon1211064).  In this problem, we will create some sample data to simulate this relationship.

I have not told you what to name your functions, or even when to write them.  But know that the person who is authoring this assignment often writes one-line functions with descriptive names.  There's power in naming your actions, no matter how simple!!!

### Problem 3(a).  A first pass at simulation



🎯 Write Python code to produce a list of 50 ordered pairs $(c,n)$, where $c$ represents chocolate consumption in kg/year/person and $n$ represents the number of Nobel laureates per 10 million population, for some country.  The values for $c$ should be random numbers (not necessarily integers!) **uniformly distributed** between 0 and 15.  You may assume that $c$ and $n$ are related by

$n = 0.4\cdot c-0.8$.

However, it is impossible for a nation to have a negative number of Nobel laureates, so if your predicted value of $n$ is less than 0 for a country, replace that value by 0.

🎯 Print your list of ordered pairs; report your values of $c$ and $n$ to 2 decimal places.

In [16]:
import numpy as np

In [17]:
# Return a list of random uniformly distributed numbers
def RandomUniform(sample):
    numList = [] 
    # Create list with random uniformly distributed values between 0 and 15
    for i in range(sample):
        # Use a separate name so the loop does not overwrite the `sample` parameter
        value = np.random.uniform(0,15)
        numList.append(value)
    return numList

In [18]:
# Return a list of Nobel Prize winners based on how much chocolate was eaten
def NobelPerChocolate(c):
    nobel = []
    # Create a list by manipulating each element from RandomUniform
    for i in range(len(c)):
        n = 0.4 * c[i] - 0.8
        if n < 0:
            n = 0
        nobel.append(n)
    return nobel

In [19]:
# Function calls and print statements
c = RandomUniform(50)
n = NobelPerChocolate(c)

print("Chocolate:", c)
print("\nNobel: ", n)

Chocolate: [2.039420922408267, 0.4385412419685669, 3.829137820258577, 9.99116057546959, 6.72047403974242, 11.14717695303612, 4.157629123065477, 8.98275371528634, 13.671935542712697, 11.751879261969446, 3.590816168238276, 6.69340269611882, 2.2910644707361523, 6.052617729443012, 3.027037970285286, 2.7875267594195297, 8.31088030347022, 2.622396342278278, 8.259512072698382, 10.637032841214241, 7.028646596335229, 8.0787766709966, 3.982324822711333, 6.788736704314689, 8.461758461716641, 14.525217615621742, 7.417756704650074, 13.982115454264326, 7.419238277458218, 13.065932631843635, 3.7504994366403226, 3.242469405078325, 12.082597652420919, 13.477177034805695, 0.5571089378318572, 8.450124067171451, 2.6388793378983206, 11.659927624641774, 8.58077598336137, 5.164734682260544, 3.6424541304559805, 3.4182204022322416, 12.036560663444671, 12.208958666858932, 3.196101107200165, 7.7341345786567475, 4.56043035220268, 4.139055494687497, 6.996353310551985, 2.232535213551414]

Nobel:  [0.015768368963306

In [20]:
# Pair up my lists into (c, n) tuples
chocAndNob = list(zip(c,n))
print(chocAndNob)

[(2.039420922408267, 0.0157683689633068), (0.4385412419685669, 0), (3.829137820258577, 0.7316551281034307), (9.99116057546959, 3.1964642301878357), (6.72047403974242, 1.8881896158969684), (11.14717695303612, 3.658870781214448), (4.157629123065477, 0.8630516492261908), (8.98275371528634, 2.7931014861145362), (13.671935542712697, 4.66877421708508), (11.751879261969446, 3.900751704787779), (3.590816168238276, 0.6363264672953104), (6.69340269611882, 1.8773610784475283), (2.2910644707361523, 0.11642578829446093), (6.052617729443012, 1.621047091777205), (3.027037970285286, 0.41081518811411444), (2.7875267594195297, 0.3150107037678118), (8.31088030347022, 2.5243521213880884), (2.622396342278278, 0.2489585369113112), (8.259512072698382, 2.5038048290793524), (10.637032841214241, 3.454813136485697), (7.028646596335229, 2.0114586385340916), (8.0787766709966, 2.43151066839864), (3.982324822711333, 0.7929299290845333), (6.788736704314689, 1.9154946817258758), (8.461758461716641, 2.584703384686657),
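
The problem asks for the values to be reported to 2 decimal places, while the output above shows full precision. One possible way to format the pairs for display without changing the stored data is sketched here; `format_pairs` is a hypothetical helper name.

```python
pairs = [(2.039420922408267, 0.0157683689633068),
         (0.4385412419685669, 0),
         (3.829137820258577, 0.7316551281034307)]  # first three pairs printed above

def format_pairs(pairs):
    """Hypothetical helper: render each (c, n) pair with two decimal places."""
    return ["({:.2f}, {:.2f})".format(c, n) for c, n in pairs]

formatted = format_pairs(pairs)
print(", ".join(formatted))  # (2.04, 0.02), (0.44, 0.00), (3.83, 0.73)
```

Formatting at print time keeps the full-precision values available for parts (b) through (d).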

### Problem 3(b).  Error term.

Our list of data from part (a) is not a good simulation of real-world data, because it is perfectly linear.  That is, even though the per capita chocolate variable is random, the number of Nobel laureates is 100% predicted from that value.  So, we'll randomly perturb the number of laureates for each country.  

🎯 Using the 50 $c$ and $n$ values you generated in part (a), generate new $n_e$ values, using the following formula:

$n_e = n + \epsilon.$

Here $\epsilon$ should be a random variable with **normal distribution**, mean 0, and standard deviation 1.  Using the list of ordered pairs generated in 3(a), create a new list of 50 ordered pairs $(c,n_e)$.  Each $n$ should be perturbed by a distinct and randomly generated value -- do not use the same $\epsilon$ for all $n$.

Again, your simulated data should not predict negative numbers of Nobel laureates. Again, do *not* generate a new list of $(c,n)$ values; you must re-use the list of ordered pairs already generated in 3(a).  Data you create as the result of evaluating a cell is available for use in other cells.  Check it out -- try running the Python command `who` if you want to prove it to yourself (a good habit to be in!).

🎯 Print your new list of ordered pairs.

In [21]:
# Perturb each value in n
def NobError(n):
    nEps = []
    # for each value add(or subtract) a random epsilon value and append to list
    for i in range(len(n)):
        pert = n[i]+np.random.normal(0,1)
        if pert < 0:
            pert = 0
        nEps.append(pert)
    return nEps

In [22]:
# Function call, pair up the lists, and print
nEps = NobError(n)

pertChocNob = list(zip(c, nEps))
print(pertChocNob)

[(2.039420922408267, 0.5066008714880245), (0.4385412419685669, 0), (3.829137820258577, 0), (9.99116057546959, 1.9365673300435702), (6.72047403974242, 2.400725760894348), (11.14717695303612, 3.104993095670492), (4.157629123065477, 0), (8.98275371528634, 2.7642322839966735), (13.671935542712697, 5.465743598110979), (11.751879261969446, 3.809180446500627), (3.590816168238276, 0.8960144140048814), (6.69340269611882, 0.482704545977243), (2.2910644707361523, 0.7633135251814078), (6.052617729443012, 2.4876797208896457), (3.027037970285286, 1.143475660793561), (2.7875267594195297, 0), (8.31088030347022, 2.449513447862985), (2.622396342278278, 0), (8.259512072698382, 3.4488590971145743), (10.637032841214241, 3.5753786211498846), (7.028646596335229, 1.880323704989523), (8.0787766709966, 2.9237176165188443), (3.982324822711333, 1.028237892523525), (6.788736704314689, 2.021641875267559), (8.461758461716641, 3.34977080304668), (14.525217615621742, 6.758955510759703), (7.417756704650074, 3.023627400
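
The perturbation and the clamp at zero can also be written with array operations. The sketch below assumes a hypothetical function name `perturb_nonnegative` and an optional `seed` parameter; each value still receives its own independent $\epsilon$.

```python
import numpy as np

def perturb_nonnegative(n_values, seed=None):
    """Hypothetical helper: add N(0, 1) noise to each value, then clamp at zero."""
    rng = np.random.default_rng(seed)
    n_arr = np.asarray(n_values, dtype=float)
    # One independent epsilon per element
    noise = rng.normal(0.0, 1.0, size=n_arr.shape)
    # np.maximum replaces any negative result with 0
    return np.maximum(n_arr + noise, 0.0).tolist()

n_demo = [0.0157683689633068, 0, 0.7316551281034307]  # first three n values from part (a)
ne_demo = perturb_nonnegative(n_demo, seed=42)
print(len(ne_demo), min(ne_demo) >= 0)  # 3 True
```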

### Problem 3(c).  Winners and losers.

🎯 Make a new list consisting of all of the ordered pairs $(c,n_e)$ from your list from part (b) such that $n_e > 0.4\cdot c -0.8$ (the $n$ value increased upon perturbation). 

🎯 Print this new (shorter) list. 
Use Python to calculate how many items are in your list of winners.  Hint to help you know when you have the answer correct: since the perturbation has mean 0, we expect about half to have gone up...

In [23]:
# Determine which tuples saw an increase from perturbation and make a new list
def Winners(pertList):
    # Keep each (c, n_e) pair whose n_e exceeds the unperturbed prediction 0.4*c - 0.8
    winnerList = [i for i in pertList if i[1] > (0.4*i[0]-0.8)]
    return winnerList

In [24]:
# Function calls and print
winning = Winners(pertChocNob)
print(winning)
print(len(winning))

[(2.039420922408267, 0.5066008714880245), (0.4385412419685669, 0), (6.72047403974242, 2.400725760894348), (13.671935542712697, 5.465743598110979), (3.590816168238276, 0.8960144140048814), (2.2910644707361523, 0.7633135251814078), (6.052617729443012, 2.4876797208896457), (3.027037970285286, 1.143475660793561), (8.259512072698382, 3.4488590971145743), (10.637032841214241, 3.5753786211498846), (8.0787766709966, 2.9237176165188443), (3.982324822711333, 1.028237892523525), (6.788736704314689, 2.021641875267559), (8.461758461716641, 3.34977080304668), (14.525217615621742, 6.758955510759703), (7.417756704650074, 3.0236274002530115), (13.982115454264326, 5.579080403669058), (13.065932631843635, 5.299340749797961), (0.5571089378318572, 0.06867583247106165), (2.6388793378983206, 1.0208967031433567), (8.58077598336137, 2.8937973780031796), (5.164734682260544, 1.4226876655524117), (3.4182204022322416, 1.201771826553507), (12.036560663444671, 5.161932147318792), (12.208958666858932, 4.9456127379664
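
A list comprehension of the form `[item for item in items if condition]` is shorthand for a loop that appends every item passing the condition. The loop form is sketched below for comparison; the name `winners_with_loop` is hypothetical, and the demo pairs are copied from the part (b) output.

```python
def winners_with_loop(pert_list):
    """Hypothetical loop form of the comprehension, for comparison."""
    winner_list = []
    for c, n_e in pert_list:
        # Same test as the comprehension: did n_e end up above 0.4*c - 0.8?
        if n_e > 0.4 * c - 0.8:
            winner_list.append((c, n_e))
    return winner_list

demo = [(2.039420922408267, 0.5066008714880245),
        (3.829137820258577, 0),
        (9.99116057546959, 1.9365673300435702)]  # three pairs from part (b)
kept = winners_with_loop(demo)
print(kept)  # [(2.039420922408267, 0.5066008714880245)]
```

Both forms produce the same list; the comprehension simply folds the loop, the test, and the append into one expression.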

### Problem 3(d). Preparing data to transfer to R.

🎯 For future use, split your list of ordered pairs from Problem 3(b) (not 3(c)!) into two lists.  

1. The first list should contain the $c$ values, and 
2. the second list should contain the $n_e$ values.

To actually transfer data to R, we would write it to a file, perhaps as a csv file.  We'll leave that for later.

In [25]:
# zip(*...) unpacks the pairs into two tuples: the c values and the n_e values
c,ne = list(zip(*pertChocNob))

print("c: ",c)
print("\nne: ",ne)

c:  (2.039420922408267, 0.4385412419685669, 3.829137820258577, 9.99116057546959, 6.72047403974242, 11.14717695303612, 4.157629123065477, 8.98275371528634, 13.671935542712697, 11.751879261969446, 3.590816168238276, 6.69340269611882, 2.2910644707361523, 6.052617729443012, 3.027037970285286, 2.7875267594195297, 8.31088030347022, 2.622396342278278, 8.259512072698382, 10.637032841214241, 7.028646596335229, 8.0787766709966, 3.982324822711333, 6.788736704314689, 8.461758461716641, 14.525217615621742, 7.417756704650074, 13.982115454264326, 7.419238277458218, 13.065932631843635, 3.7504994366403226, 3.242469405078325, 12.082597652420919, 13.477177034805695, 0.5571089378318572, 8.450124067171451, 2.6388793378983206, 11.659927624641774, 8.58077598336137, 5.164734682260544, 3.6424541304559805, 3.4182204022322416, 12.036560663444671, 12.208958666858932, 3.196101107200165, 7.7341345786567475, 4.56043035220268, 4.139055494687497, 6.996353310551985, 2.232535213551414)

ne:  (0.5066008714880245, 0, 0, 1
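
The unpacking above yields tuples. If true lists are wanted, and as a rough preview of the CSV step the problem defers, something like the following could work. The variable names, the column headers `c` and `ne`, and the use of an in-memory buffer instead of a file are all assumptions made for illustration.

```python
import csv
import io

pairs = [(2.039420922408267, 0.5066008714880245),
         (0.4385412419685669, 0),
         (3.829137820258577, 0)]  # first three pairs from part (b)

# zip(*pairs) yields tuples; list() turns each into the list the problem asks for
c_list, ne_list = (list(values) for values in zip(*pairs))

# Write to an in-memory buffer here; a real transfer would open a file instead
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["c", "ne"])  # column names are an assumption
writer.writerows(zip(c_list, ne_list))
print(buffer.getvalue())
```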