# Problem Set 3

CBE 60258, University of Notre Dame. © Prof. Alexander Dowling, 2022

You may not distribute homework solutions without written permission from Prof. Alexander Dowling.

**Your Name and Email:**

## Import Libraries

In [None]:
import math
import numpy as np
import scipy.stats as stats
from scipy import optimize

## 1. Overflow and Algorithm Issues

Over the past few years, lottery jackpots have reached amazing heights, in part due to the decrease in the individual odds of any set of numbers winning from 1/175M to 1/292.2M, which began in the fall of 2015. The all-time record payout drawing was on 1/13/16 and was worth $\$1.586B$ which, after taxes, was roughly $\$590M$. About $\$1B$ in tickets were sold at $\$2$ per ticket.

What is the probability there will be exactly $k$ winners for the lottery? If we assume that everyone chooses to use the "computer option" to pick their numbers randomly, these probabilities can be computed from the binomial distribution.

Here is the probability mass function (PMF) of the binomial distribution:

\begin{equation}
x(p,k,N) = \binom{N}{k} p^{k}(1-p)^{N-k}
\end{equation}

It uses the binomial coefficient, which is another name for combinations:

\begin{equation}
\binom{N}{k} = \frac{N!}{k!(N-k)!} \label{eq:2}
\end{equation}

$N$ is the number of events (e.g., coin flips), $k$ is the number of "successes" (e.g., heads) and $p$ is the probability of success for a single event (e.g., heads for a single coin flip).
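
As a small sanity check of the PMF above, the sketch below evaluates it for a hypothetical fair-coin example ($N = 10$, $p = 0.5$). The helper name `binomial_pmf` and the example values are illustrative assumptions, not part of the assignment. It relies on `math.comb` (Python 3.8+), which uses exact integer arithmetic.

```python
import math

def binomial_pmf(k, N, p):
    """Probability of exactly k successes in N independent trials."""
    return math.comb(N, k) * p**k * (1 - p)**(N - k)

# Hypothetical example: 10 fair coin flips
pmf = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print("P(exactly 5 heads) =", pmf[5])
print("Sum over all k     =", sum(pmf))
```

The probabilities over all possible $k$ should sum to one, which is a quick way to catch mistakes in a hand-written PMF.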

In this problem, you are given a function to calculate n choose r, shown below.

In [None]:
# define a function to calculate n choose r
def nCr(n,r):
    ''' Calculates n choose r
    Args:
        n : total number of items to choose from
        r : number of items to choose
    
    Return:
        Total number of combinations, calculated with factorial formula
    
    '''
    f = math.factorial
    # Note: Using integer division, //, prevents overflow errors in Python 3
#     return f(n) // f(r) // f(n-r)
    return f(n) / f(r) / f(n-r)
# End: define a function to calculate n choose r

### Setup

In [None]:
# Given data
# Number of people that purchased tickets
N = 500e6;
# Odds of winning
p = 1/(292.2e6);
# Net payout
net_pay = 590e6 # $

### 1a. Naive Calculation

Let's start with something simple: calculate the probability of exactly 2 winners.

We will first start with computing the binomial coefficient.

\begin{equation}
\binom{N}{k} = \frac{N!}{k!(N-k)!} 
\end{equation}


Let's divide this calculation into three pieces.

In [None]:
k = 2 # exactly 2 winners
print("k! = ",math.factorial(k))

Good, that worked. For the two remaining factorial calculations, we are going to add some extra code that will time out our calculation after 10 seconds.

In [None]:
# Configure the timer. You do not need to worry about anything in this box.
import signal
MAX_TIME = 10 # seconds. You may change this to 1 so that "restart and run all" goes faster

def signal_handler(signum, frame):
    raise Exception("Timed out! This cell took longer than "+str(MAX_TIME)+ " seconds.")

signal.signal(signal.SIGALRM, signal_handler)

In [None]:
signal.alarm(MAX_TIME)
# math.factorial needs an integer argument (floats are rejected in Python 3.10+)
print("N! = ",math.factorial(int(N)))
signal.alarm(0)

Suggest at least one reason why `math.factorial` did not work for $N$ and $N-k$. Submit this discussion via **Gradescope**.

Do not fear, we can use Stirling's approximation (https://en.wikipedia.org/wiki/Stirling%27s_approximation).

$$
n! \approx \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n
$$
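
Taking the natural logarithm of both sides gives $\ln n! \approx \tfrac{1}{2}\ln(2 \pi n) + n(\ln n - 1)$, which stays finite even when $n!$ itself is far too large to store. The sketch below is one possible way to check this log form against `math.lgamma`, which returns $\ln \Gamma(n+1) = \ln n!$. The helper name `log_stirling` and the test values are assumptions chosen for illustration; this is not the `stirling` function requested below.

```python
import math

def log_stirling(n):
    """Stirling's approximation to ln(n!)."""
    return 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)

for n in (10, 100, 170):
    exact = math.lgamma(n + 1)   # ln(n!) from the log-gamma function
    approx = log_stirling(n)
    print(n, exact, approx, exact - approx)
```

The gap between the two columns shrinks roughly like $1/(12n)$, so the approximation improves as $n$ grows.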

Finish the function below.

In [None]:
def stirling(n):
    ''' Approximate n!
    
    Args:
        n
        
    Returns:
        n_factorial    
    '''
    
    # Add your solution here

Now test your code by computing $10!$. Store your answer in the dictionary `unit_test` below.

In [None]:
unit_test = {}
unit_test['math.factorial'] = 0
unit_test['stirling'] = 0

# Add your solution here

In [None]:
# Removed autograder test. You may delete this cell.

We can now try to calculate the probability of exactly 2 winners. We will break the calculation into several steps. Store your answer and the intermediates in the dictionary `naive_calc`.

In [None]:
## Naive calculation

k = 2

# Give our function stirling a nickname
f = stirling

naive_calc = {}
# k winners. Student update
naive_calc['p**k'] = 0

# (N-k) losers. Student update
naive_calc['(1-p)**(N-k)'] = 0

print(naive_calc)

# Binomial coefficient. Student update
naive_calc['nCk'] = 0

# Add your solution here

Alright, now let's compute $\binom{N}{k}$

In [None]:
# k!. Student update
naive_calc['k!'] = 0

# Give our function stirling a nickname
f = stirling

# N!. Student update
naive_calc['N!'] = 0

# (N-k)!. Student update
naive_calc['(N-k)!'] = 0

# Add your solution here

print(naive_calc)

In a sentence, describe what happens (for example: did you see any warning messages?).

**Discussion:**

This problem prompts a brief discussion about floating point numbers. Computers use binary (0 or 1) to store all numbers. A bit is a single binary digit.

A double-precision floating-point number uses 64 bits to store a decimal in (binary) scientific notation on a computer.

![floating_point](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/IEEE_754_Double_Floating_Point_Format.svg/1920px-IEEE_754_Double_Floating_Point_Format.svg.png)

1 bit is used for the sign, 11 bits are used for the exponent, and 52 bits are used for the fraction. Let's focus on the exponent for a minute.

$$2^{11} = 2048$$

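For many computers, this means that the magnitude of a (normal) double-precision floating-point number must lie between about $2^{-1022} \approx 2.2 \times 10^{-308}$ and $2^{1024} \approx 1.8 \times 10^{308}$. Here is the main takeaway: a computer cannot store arbitrarily large or small decimals with standard floating-point numbers. Python raises an overflow error (or returns `inf`) if a result exceeds the upper bound, and results far below the lower bound are rounded to zero.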
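
The limits of double precision can be inspected directly. The following is a minimal sketch using the standard-library `sys.float_info`; the variable names `overflowed`, `too_big`, and `too_small` are illustrative assumptions.

```python
import sys

print("largest double :", sys.float_info.max)   # about 1.8e308
print("smallest normal:", sys.float_info.min)   # about 2.2e-308

# Float exponentiation past the upper limit raises OverflowError
try:
    10.0 ** 400
    overflowed = False
except OverflowError:
    overflowed = True
print("10.0**400 overflowed:", overflowed)

# Multiplication past the upper limit returns inf instead of raising
too_big = sys.float_info.max * 10

# Results far below the lower limit silently become zero
too_small = sys.float_info.min * 1e-20
print(too_big, too_small)
```

Note that the failure mode depends on the operation: some operations raise an exception, while others quietly return `inf` or `0.0`, so it is worth checking intermediate values.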

Determine the maximum number of lottery tickets ($N$) before Stirling's approximation gives an overflow error. Store your answer as an integer in `N_max`. Hint: Start at 165 and increment by one.

In [None]:
# Add your solution here
print(N_max)

In [None]:
# Add your solution here

### 1b. First Approximation

Alright, we need a new strategy to approximate the binomial coefficient. We will now build an asymptotic approximation. You'll see this idea again in Transport.

A mathematician would write:

\begin{equation}
N \gg k \implies N - k \approx N
\end{equation}

Let's take a minute to unpack this notation:

\begin{equation}
\underbrace{N}_{\mathrm{lottery \ participants}} \qquad
\underbrace{\gg}_{\mathrm{much \ greater \ than}} \qquad
\underbrace{k}_{\mathrm{lottery \ winners}} \qquad
\underbrace{\implies}_{\mathrm{implies}} \qquad
N - k 
\underbrace{\approx}_{\mathrm{approximately \ equals}} 
N
\end{equation}

This allows us to make the following simplifications:

\begin{align}
(N-k)! &= (N-k) \times (N-k-1) \times ... \times 1 \\
& = (N-k) \times (N-k-1)  \times ... \times 1 \times
\left(\frac{N \times (N-1)  \times ... \times (N-k+1)}{N \times (N-1)  \times ... \times (N-k+1)} \right)\\
& = \frac{N \times (N-1) \times ... \times (N-k+1) \times (N-k) \times (N-k-1)  \times ... \times 1 }
{N \times (N-1) \times ... \times (N -k + 1)} 
\end{align}

Notice that the numerator above is the definition of $N!$. We can use $N-k \approx N$ to simplify the denominator to $N$ multiplied by itself $k$ times. Thus we have:

\begin{align}
(N-k)! \approx \frac{N!}{N^{k}} 
\end{align}

**Instructions:** Starting with the definition for the binomial coefficient, show that:

\begin{equation} \label{finapp1}
\binom{N}{k} \approx \frac{N^k}{k!}
\end{equation}

In other words, write out all of the mathematical steps to go from $\binom{N}{k}$ to $\frac{N^k}{k!}$, similar to a proof. Submit your written work via **Gradescope**.
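
A numerical check does not replace the written derivation, but it can build confidence in the result. The sketch below compares $N^k/k!$ with the exact binomial coefficient from `math.comb`, which works in exact integer arithmetic. The names `N_int` and `rel_errs` are assumptions introduced for this illustration.

```python
import math

N_int = 500_000_000  # number of tickets, stored as an exact integer

rel_errs = []
for k in range(5):
    exact = math.comb(N_int, k)              # exact integer result
    approx = N_int**k / math.factorial(k)    # N^k / k!
    rel_errs.append(abs(approx - exact) / exact)
    print(k, rel_errs[-1])
```

For $k \ll N$ the relative error is on the order of $k^2/N$, which is negligible here.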

### 1c. Second Approximation

Evaluation of $(1-p)^{N-k}$ can also be problematic. Construct a mathematical argument (i.e., proof) to establish that:

\begin{equation}
(1-p)^{N-k} \approx e^{-Np}
\end{equation}

*Hint*: Consider the following Taylor series expansion:

\begin{equation}
\log_e (1 - p) = -p - \frac{p^2}{2} - \frac{p^3}{3} - ...
\end{equation}

and truncate as follows:

\begin{equation} \label{tsep}
\log_e (1 - p) \approx -p
\end{equation}

The higher order terms ($p^2$, $p^3$, ...) are incredibly small and can be ignored with negligible consequence.

Submit your written work via **Gradescope**.
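
To see how small the truncation error is in practice, one option is to evaluate $(1-p)^{N-k}$ in log space and compare it with $e^{-Np}$. This is only a sketch: the names `N_tickets`, `p_win`, and `k_win` are illustrative, and `math.log1p(-p)` is used because it computes $\log_e(1-p)$ accurately for tiny $p$.

```python
import math

N_tickets = 500e6        # tickets sold
p_win = 1 / 292.2e6      # odds of a single ticket winning
k_win = 2                # number of winners considered

# (1-p)^(N-k) = exp((N-k) * ln(1-p))
exact = math.exp((N_tickets - k_win) * math.log1p(-p_win))
approx = math.exp(-N_tickets * p_win)
rel_err = abs(exact - approx) / exact

print("exact  :", exact)
print("approx :", approx)
print("rel err:", rel_err)
```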

### 1d. Take Two: Calculating the Odds

Use the approximations from 1b and 1c to estimate the probability of exactly 0, 1, 2, 3, and 4 winners of the jackpot.

Solve for the simplified, approximate governing formula on pencil and paper and submit via **Gradescope**.

Implement your equation in Python as a function `ref_prob_fun(k)` and save your answers for $k = 0, ..., 4$ as a list or numpy array named `ref_prob`.
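
One way to cross-check the approximate probabilities is to evaluate the full binomial PMF in log space, where the huge factorials become manageable sums. The sketch below does this with `math.lgamma`; it is an independent check under stated assumptions, not the approximation requested in this part. The names `log_binomial_pmf`, `N_tickets`, `p_win`, and `check_prob` are hypothetical. Because it subtracts log-gamma values near $10^{10}$, expect only about five to six reliable significant digits.

```python
import math

def log_binomial_pmf(k, N, p):
    """ln of the binomial PMF, using log-gamma for the factorials."""
    log_nCk = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
    return log_nCk + k * math.log(p) + (N - k) * math.log1p(-p)

N_tickets = 500e6
p_win = 1 / 292.2e6

check_prob = [math.exp(log_binomial_pmf(k, N_tickets, p_win)) for k in range(5)]
for k, prob in enumerate(check_prob):
    print("k = {0}: {1:0.4f}".format(k, prob))
```

If your `ref_prob` values differ from these beyond the fourth decimal place, revisit the derivation.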

In [None]:
## Refined calculation
# Add your solution here

for i in range(5):
    print("Probability of {0} winners: {1:0.4f}".format(i,ref_prob_fun(i)));

In [None]:
# Removed autograder test. You may delete this cell.

### 1e. Expected Return

What is your expected return (after taxes) from purchasing just one of the 500 million tickets? Assume you elect for the lump sum payment. If more than one ticket "hits", the jackpot is split evenly among the holders of the winning tickets.

*Hint*: You can calculate the expectation using $k=0$ through $k=4$ without introducing significant errors.

Solve for the governing formula on pencil and paper and submit via **Gradescope**.

Implement your equation in Python and save your answers as `exp_pay`.

In [None]:
## Expected return

# Add your solution here

# Display results
print('The expected return is $ {0:0.2f}'.format(exp_pay));

In [None]:
# Removed autograder test. You may delete this cell.

### 1f. Qualitative Analysis

Interestingly, research has shown that if people pick their own numbers rather than letting the computer do it, the choices are highly non-random. In particular, people usually pick numbers from 1-31 (corresponding to birthdays) while the lottery numbers go from 1-69. In view of this, qualitatively answer the following questions: 

1. Does the probability of any individual’s lottery ticket winning change?
2. Does the probability of getting multiple winners change (e.g., do you expect the payout to a winning ticket to decrease due to splitting the prize)?
3. Does the probability of no one winning the lottery change?
4. For financial reasons, why are you better off letting the computer pick the numbers than picking birthdays?
5. Can you think of an even better strategy which would lead to higher winnings?

Write 1-3 sentences for each question. Submit your answer via **Gradescope**.

**Discussion:**
1. Fill in here.
2. Fill in here.
3. Fill in here.
4. Fill in here.
5. Fill in here.

## 2. Good and Bad Days at the Factory

A bioreactor and separator produce 1000 batches of therapeutic proteins each day. Each batch has a 90% chance of meeting specifications. Using this information, answer the following:

### 2a. Probability Model

Which probability distribution can you use to model the manufacturing process? What assumptions do you need to make? 

**Answer:**

### 2b. Good Day

If more than 890 batches in a day meet specifications, we will call it a good day.

Compute the probability of a good day and store it as the variable `good`. *Hint*: First you will need to write a function which calculates the total number of combinations of good and bad batches in a day.
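
With 1000 batches per day, the intermediate factorials are enormous, so the choice of division operator matters. The sketch below illustrates the difference, assuming a hypothetical helper `n_choose_r_int` that uses integer (floor) division. Python integers have arbitrary precision, while true division `/` must produce a float.

```python
import math

def n_choose_r_int(n, r):
    """n choose r using integer division, so no float conversion occurs."""
    f = math.factorial
    return f(n) // f(r) // f(n - r)

# Integer arithmetic is exact, even for very large intermediate factorials
exact = n_choose_r_int(1000, 890)
print("matches math.comb:", exact == math.comb(1000, 890))

# True division converts the quotient to a float, which overflows here
try:
    math.factorial(1000) / math.factorial(890)
    float_ok = True
except OverflowError:
    float_ok = False
print("float division succeeded:", float_ok)
```

On Python 3.8 or newer, `math.comb(n, r)` gives the same exact result without computing the full factorials.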

In [None]:
# Define function
# Add your solution here

In [None]:
# probability of more than 890 batches meet specifications
# Add your solution here

In [None]:
# Removed autograder test. You may delete this cell.

### 2c. Bad Day

Likewise, we will call a day with 890 or fewer on-specification batches a bad day.

Compute the probability of a bad day and store your answer in the variable `bad`. *Hint*: How can you do this quickly using your previous answer?

In [None]:
# Add your solution here

In [None]:
# Removed autograder test. You may delete this cell.

### 2d. Bad Days in a Work Week

Using your answer from above, compute the probability of 0, 1, 2, ..., 5 bad days in a 5-day work week. Store your answer as a list called `weekly`.

In [None]:
# Add your solution here

In [None]:
# Removed autograder test. You may delete this cell.

## 3. Numeric Integration of Partial Differential Equations with Pyomo

In [None]:
%matplotlib inline

# Import plotting libraries
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D 

import sys
if "google.colab" in sys.modules:
    !wget "https://raw.githubusercontent.com/IDAES/idaes-pse/main/scripts/colab_helper.py"
    import colab_helper
    colab_helper.install_idaes()
    colab_helper.install_ipopt()

# Import Pyomo
import pyomo.environ as pyo

# Import Pyomo numeric integration features
from pyomo.dae import DerivativeVar, ContinuousSet

During your time at Notre Dame, you will likely want (or at least need) to solve a partial differential equation (PDE) system. In this problem, we will practice using Pyomo to numerically integrate a simple and common PDE. (Special thanks to Prof. Kantor for this problem.)

Transport of heat in a solid is described by the familiar thermal diffusion model:

$$
\begin{align*}
\rho C_p\frac{\partial T}{\partial t} & = \nabla\cdot(k\nabla T)
\end{align*}
$$

We will assume the thermal conductivity $k$ is a constant, and define thermal diffusivity in the conventional way

$$
\begin{align*}
\alpha & = \frac{k}{\rho C_p}
\end{align*}
$$

We will further assume symmetry with respect to all spatial coordinates except $x$ where $x$ extends from $-X$ to $+X$. The boundary conditions are

$$
\begin{align*}
T(t,X) & = T_{\infty} & \forall t > 0 \\
\nabla T(t,0) & = 0 & \forall t \geq 0 
\end{align*}
$$

where we have assumed symmetry about $x = 0$ and uniform initial conditions $T(0, x) = T_0$ for all $0 \leq x \leq X$. 

### 3a. Rescaling and Dimensionless Model

We would like a dimensionless model for two reasons: first, we only need to solve the dimensionless model once, i.e., it becomes independent of input data. Second, the dimensionless models are often scaled better for numerical solutions.

Let's consider the following proposed scaling procedure:

$$
\begin{align*}
T' & = \frac{T - T_0}{T_\infty - T_0} \\
x' & = \frac{x}{X} \\
t' & = t \frac{\alpha}{X^2}
\end{align*}
$$

Show this scaling procedure gives the following dimensionless system:

$$
\begin{align*}
\frac{\partial T'}{\partial t'} & = \nabla^2 T'
\end{align*}
$$

with auxiliary conditions

$$
\begin{align*}
T'(0, x') & = 0 & \forall 0 \leq x' \leq 1\\
T'(t', 1) & = 1 & \forall t' > 0\\
\nabla T'(t', 0) & = 0 & \forall t' \geq 0 \\
\end{align*}
$$

Turn in your work (pencil and paper) via **Gradescope**. *Important:* Here the prime $'$ indicates the scaled variables and coordinates. It does not indicate a derivative. Thus $T'$ is scaled temperature, NOT the derivative of temperature (which would raise the question of "with respect to what?").

### 3b. Numeric Integration via Pyomo

For simplicity, let's consider planar coordinates. For a slab geometry, we want to numerically integrate the following PDE:

$$
\begin{align*}
\frac{\partial T'}{\partial t'} & = \frac{\partial^2 T'}{\partial x'^2}
\end{align*}
$$

with auxiliary conditions

$$
\begin{align*}
T'(0, x') & = 0 & \forall 0 \leq x' \leq 1 \\
T'(t', 1) & = 1 & \forall t' > 0\\
\frac{\partial T'}{\partial x'} (t', 0) & = 0 & \forall t' \geq 0 \\
\end{align*}
$$
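
Before building the Pyomo model, it can help to have a rough reference solution. The sketch below integrates the same dimensionless problem with a simple explicit forward-time, centered-space finite-difference scheme in plain Python. This is a swapped-in technique for cross-checking only, not the Pyomo discretization requested in this problem. The grid size `nx`, the time step, and the names `T_fd` and `T_center_series` are assumptions made for illustration. The comparison value is the leading term of the separation-of-variables series solution at $x' = 0$.

```python
import math

nx = 21                      # grid points on 0 <= x' <= 1 (hypothetical choice)
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2             # explicit scheme is stable for dt <= 0.5*dx^2
t_end = 2.0
n_steps = round(t_end / dt)
r = dt / dx**2

T_fd = [0.0] * nx            # initial condition T'(0, x') = 0
T_fd[-1] = 1.0               # boundary condition T'(t', 1) = 1

for _ in range(n_steps):
    T_new = T_fd[:]
    # Symmetry at x' = 0: a mirrored ghost node enforces dT'/dx' = 0
    T_new[0] = T_fd[0] + 2 * r * (T_fd[1] - T_fd[0])
    for j in range(1, nx - 1):
        T_new[j] = T_fd[j] + r * (T_fd[j + 1] - 2 * T_fd[j] + T_fd[j - 1])
    T_fd = T_new             # T_new[-1] keeps the fixed boundary value

# Leading term of the series solution at x' = 0 and t' = t_end
T_center_series = 1 - (4 / math.pi) * math.exp(-math.pi**2 * t_end / 4)
print("finite difference:", T_fd[0])
print("series (1 term)  :", T_center_series)
```

By $t' = 2$ the slab is nearly at the boundary temperature, so the Pyomo solution should approach $T' \approx 1$ everywhere at the final time.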

Complete the following Pyomo code to integrate this PDE.

In [None]:
# Create Pyomo model
m = pyo.ConcreteModel()

# Define sets for spatial and temporal domains
m.x = ContinuousSet(bounds=(0,1))
m.t = ContinuousSet(bounds=(0,2))

# Define scaled temperature indexed by time and space
m.T = pyo.Var(m.t, m.x)

# Define variables for the derivatives
m.dTdt   = DerivativeVar(m.T, wrt=m.t)
m.dTdx   = DerivativeVar(m.T, wrt=m.x)
m.d2Tdx2 = DerivativeVar(m.T, wrt=(m.x, m.x))

# Define PDE equation
def pde(m, t, x):
    if t == 0:
        return pyo.Constraint.Skip
    elif x == 0 or x == 1:
        return pyo.Constraint.Skip
    # Add your solution here

m.pde = pyo.Constraint(m.t, m.x, rule=pde)

# Define first auxiliary condition
def initial_condition(m, x):
    if x == 0 or x == 1:
        return pyo.Constraint.Skip
    # Add your solution here

m.ic = pyo.Constraint(m.x, rule = initial_condition)

# Define second auxiliary condition
def boundary_condition1(m, t):
    # Add your solution here

m.bc1 = pyo.Constraint(m.t, rule = boundary_condition1)

# Define third auxiliary condition
def boundary_condition2(m, t):
    # Add your solution here  

m.bc2 = pyo.Constraint(m.t, rule=boundary_condition2)

# Define dummy objective
m.obj = pyo.Objective(expr=1)

# Discretize spatial coordinate with forward finite difference and 50 elements
pyo.TransformationFactory('dae.finite_difference').apply_to(m, nfe=50, scheme='FORWARD', wrt=m.x)

# Discretize time coordinate with forward finite difference and 50 elements
pyo.TransformationFactory('dae.finite_difference').apply_to(m, nfe=50, scheme='FORWARD', wrt=m.t)
pyo.SolverFactory('ipopt').solve(m, tee=True).write()

### 3c. Visualize Solution 

In [None]:
# Extract indices
x = sorted(m.x)
t = sorted(m.t)

# Create numpy arrays to hold the solution
xgrid = np.zeros((len(t), len(x)))
# Hint: define tgrid and Tgrid the same way
# Add your solution here

# Loop over time
for i in range(0, len(t)):
    # Loop over space
    for j in range(0, len(x)):
        # Copy values
        xgrid[i,j] = x[j]
        tgrid[i,j] = t[i]
        # Hint: how to access values from Pyomo variable m.T?
        # Add your solution here

# Create a 3D wireframe plot of the solution
# Hint: consult the matplotlib documentation
# https://matplotlib.org/stable/gallery/mplot3d/wire3d.html

# Add your solution here

**Submission Instructions and Tips:**
1. Answer discussion questions in this notebook.
2. When asked to store a solution in a specific variable, please also print that variable.
3. Turn in this notebook via Gradescope.
4. Also turn in written (pencil and paper) work via Gradescope.
5. Even if you are not required to turn in pseudocode, you should always write pseudocode. It takes only a few minutes and can save you *hours* of time.
6. For this assignment especially, read the problem statements twice. They contain important information and tips that are easy to miss on the first read.