# Workshop 02: Python Arithmetic, Functions, Numpy, and Plotting
Welcome to the 2nd Workshop for Physics 77/88! 

**In this workshop you will look at:**
 - Refresher of simple arithmetic with Python.
 - Python function definitions.
 - Python numpy array operations.
 - Python plotting.


## Submission
To submit this assignment, rerun the notebook from scratch (by selecting `Kernel` -> `Restart & Run all`), and then save it as a PDF (`File` -> `Download as` -> `PDF via LaTeX (.pdf)`) **AND** as an IPython Notebook (`File` -> `Download as` -> `Notebook (.ipynb)`). Submit **both** of these files to bCourses.

**For full credit, this assignment should be completed and submitted before Friday, September 9, 2022 at 11:59 PM PST.**

# Setup
Let's begin by importing the libraries we will use. You can find the documentation for the libraries here:
* math: https://docs.python.org/3/library/math.html 
* numpy: https://docs.scipy.org/doc/
* scipy: https://docs.scipy.org/doc/scipy/reference/index.html
* matplotlib: https://matplotlib.org/3.1.1/contents.html

In [None]:
import math
import numpy as np
import scipy.special
import matplotlib.pyplot as plt

# Question 1: Python `math` and the Poisson function.
A Poisson function is given by the expression below: $$Pois(k, \lambda) = \lambda^{k}\frac{e^{-\lambda}}{k!}$$

It has two input arguments: $k$ and $\lambda$. In a "counting" experiment, $k$ represents the number of occurrences of a certain experimental outcome, while $\lambda$ represents the expected number of occurrences of the same outcome. The output of the Poisson function is the probability of observing $k$ events when $\lambda$ events are expected.

For example, at a hypothetical airport coded ABC, the number of departing flights that are delayed by more than 30 minutes is 10 per day on average ($\lambda = 10$). One can then ask: what is the probability of having exactly ten flights ($k = 10$) delayed by over 30 minutes in a single day? What is the probability of having exactly zero flights ($k = 0$) delayed by over 30 minutes in a single day?
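
As a quick check of how the formula is evaluated, here is a worked example with illustrative values that are not part of the exercise (assumed here: $\lambda = 3$ and $k = 2$):

$$Pois(2, 3) = 3^{2}\frac{e^{-3}}{2!} = 4.5\,e^{-3} \approx 0.224$$

In words: if 3 events are expected, the chance of observing exactly 2 is roughly 22%.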

Before we answer those questions, let's check out a few functions that come with Python's `math` library.

In [None]:
# You do not need to change any code here; just run this cell
# Although you can always check documentation, in practice, it is
# nice to have some of the most useful functions memorized.
x = 1.5
y = math.exp(x)
z = math.factorial(4)
w = math.sqrt(x)

print("y: {}, z: {}, w: {}".format(y, z, w))

x = math.log(10)
y = math.cos(x)
z = math.log(100,10)
w = math.pi

print("x: {}, y: {}, z: {}, w: {}".format(x, y, z, w))

Fill in the following code cell to answer the two aforementioned questions:

Given that $\lambda = 10$

1. What is the probability of having exactly ten flights ($k = 10$) delayed by over 30 minutes in a single day?
    
2. What is the probability of having exactly zero flights ($k = 0$) delayed by over 30 minutes in a single day?

In [None]:
# TODO: Fill out this code cell to print out the probabilities to the questions above.

# Question 1
lambda_value = ...
k = ...
prob1 = ...
print(prob1)

# Question 2
lambda_value = ...
k = ...
prob2 = ...
print(prob2)

# Question 2: Python `for` loops and `numpy` arrays.
As a continuation to the flight delay problem in Question 1, keeping $\lambda = 10$, what is the **probability of having *at least eight* flights delayed over 30 minutes in a single day?**

We will determine the answer to this question in two steps:

1. Define a Poisson function that takes $k$ and $\lambda$ as input arguments and returns $Pois(k, \lambda)$, the Poisson probability (see Question 1 for the formula).

2. Since the question is about the probability of having at least 8 delayed flights, one would, in principle, need to do an integral from 8 to infinity, $\int_{8}^{+\infty} Pois(k, \lambda) \,dk$. The infinity is a problem, but we can use a probability trick and take the **complement**, allowing us to instead calculate $1 - \int_{0}^{7} Pois(k, \lambda) \,dk$.

Additionally, since we are only working with discrete values, $k = 0, 1, \dots, 7$, we can take the **sum** as opposed to the integral. This means all we need to do is write code to compute $1 - \sum_{k=0}^{7} Pois(k, 10)$.
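
The complement step relies on the Poisson probabilities adding up to one over all possible $k$. A short sketch of why this holds, using the Taylor series $e^{\lambda} = \sum_{k=0}^{\infty} \lambda^{k}/k!$:

$$\sum_{k=0}^{\infty} Pois(k, \lambda) = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!} = e^{-\lambda}\,e^{\lambda} = 1$$

Splitting this sum at $k = 8$ gives $\sum_{k=8}^{\infty} Pois(k, \lambda) = 1 - \sum_{k=0}^{7} Pois(k, \lambda)$, which is the finite expression used here.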

We will implement this expression in **two different ways**.

## Part 2.a: Calculate the probability using `for` loop
In this first approach, we will calculate the Poisson probability for each $k \in \{0, 1, \dots, 7\}$, and then use a `for` loop to sum up the Poisson probabilities. To start, we will define a function that takes single-valued quantities (expectation and observation) as input arguments and returns the Poisson probability.

In [None]:
# TODO: Define your own Poisson function. Do not use any pre-defined Poisson functions from Python libraries

# This function takes in x, y (expectation and observation) as the input arguments,
# and returns a Poisson probability.
def Poisson_function(x, y):
    ...

Before we write our own `for` loop to sum up the Poisson probabilities, let's first get an idea of how `for` loops are structured and how they work.

In [None]:
# Just run this cell. You are encouraged to tinker and change the inputs and see how that
# changes the values that are printed out.

# Basic for loop example
for i in range(8):
    print("Value within for loop:", i)
    
# How do you sum up values using a for loop?
# A common coding practice is to instantiate an initial variable to 0 outside of the for loop
# and then add to this variable while within the for loop.
mySum = 0 # initial variable
for i in range(8):
    mySum += i
    
print("Value of mySum:", mySum)

With the `Poisson_function` defined and a better idea of how `for` loops work in Python, let's compute the probability in question and display it using the `print` function.

In [None]:
# TODO: Find the probability of having at least 8 flights delayed by over 30 minutes in a single day.
prob = ...
for ... in range ...:
    prob += ...
    
print("The probability is:", ...)
print("The probability rounded to three decimal places is: {:5.3f}".format(...))
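
The format specifier `{:5.3f}` in the last `print` call may be unfamiliar. The short sketch below shows what it does on two unrelated numbers; the names `rounded` and `value` are illustrative only and are not part of the exercise.

```python
import math

# "{:5.3f}" means: minimum field width of 5 characters,
# 3 digits after the decimal point, fixed-point notation.
rounded = "{:5.3f}".format(math.pi)
print(rounded)

# An f-string accepts the same format specifier.
value = 2.0 / 3.0
print(f"{value:5.3f}")
```

Only the printed text is rounded; the underlying variable keeps its full precision.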

## Part 2.b: Calculate the probability using `numpy` arrays
Now, let's try to solve the same problem with a `numpy` array.

In [None]:
# Just run this cell.
# Here, we are creating our array of observations (our k values).
# To do so, we use the linspace function of numpy
k_s = np.linspace(0, 7, 8)
k_s
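
For comparison, here is a minimal sketch of two ways to build the values 0 through 7. It assumes nothing beyond `numpy` itself; the names `k_linspace` and `k_arange` are illustrative and not used elsewhere in this workshop.

```python
import numpy as np

# np.linspace(start, stop, num) includes both endpoints and returns floats.
k_linspace = np.linspace(0, 7, 8)

# np.arange(stop) counts 0, 1, ..., stop - 1 and returns integers here.
k_arange = np.arange(8)

print(k_linspace.dtype, k_arange.dtype)
# The values match even though the dtypes differ.
print(np.array_equal(k_linspace, k_arange))
```

The float versus integer distinction can matter for functions that only accept integers, such as `math.factorial`.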

In [None]:
# A numpy array has shape, dimension, and size.
# Let's print out these properties to take a closer look at them.
print("k_s array size: {:7}".format(k_s.size))
print("k_s array dimension: ", k_s.ndim)
print("k_s array shape:     ", k_s.shape)
print("k_s array value(s):  ", k_s)

It is often useful to know these properties for a given `numpy` array. Let's define a function that prints these properties for any given `numpy` array.

In [None]:
# TODO: Write a function that takes a numpy array as input argument
# and prints out these three properties as well as the array itself.

def np_analyzer(a):
    print("input array size: {:7}".format(...))
    print("input array dimension: ", ...)
    print("input array shape:     ", ...)
    print("input array value(s):  ", ...)

np_analyzer(k_s)

Now, let's create our array of expectations and inspect it with our newly defined `np_analyzer` function.

In [None]:
# Just run this cell
# Here, we are creating our array of expectations (our lambda values).
# To do so, we use the ones function of numpy

lambda_ones = np.ones(8) # what does the call to np.ones do?
np_analyzer(lambda_ones)

# How was lambda_ones created? What was passed in as the input argument?
print()

lambda_tens = 10*lambda_ones
np_analyzer(lambda_tens)
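
As an aside, scaling an array of ones is not the only way to build a constant array. The sketch below uses `np.full` as one possible alternative; the names `lambda_full` and `lambda_scaled` are illustrative only.

```python
import numpy as np

# np.full(shape, fill_value) builds an array filled with one constant.
lambda_full = np.full(8, 10.0)

# Same result as multiplying an array of ones by a scalar.
lambda_scaled = 10 * np.ones(8)

print(lambda_full)
print(np.array_equal(lambda_full, lambda_scaled))
```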

Now, with our array of observations `k_s` and array of expectations `lambda_tens`, are we able to pass these in as inputs to the previous function that we defined, `Poisson_function`?

In [None]:
# Just run this cell
Poisson_function(lambda_tens, k_s)

It doesn't work. But there is a way around this; we can **vectorize** the function, creating a new function that takes in arrays as inputs. We will do so by using the `vectorize` function of `numpy`.

**Note:** Ignore the intimidating red box of text that appears.

In [None]:
# Just run this cell.
# Here, we vectorize our Poisson_function, returning a new function that takes in numpy arrays as input.
vec_poisson = np.vectorize(Poisson_function)

# Calling vec_poisson on our two arrays will return a numpy array containing each Poisson probability computed from
# all eight pairs of k and lambda.
# The argument order follows Poisson_function: expectation first, then observation.
prob_s = vec_poisson(lambda_tens, k_s)
np_analyzer(prob_s)
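
To see what `np.vectorize` does in isolation, here is a small sketch on an unrelated scalar function. The function `scalar_hypotenuse` and the arrays `legs_a` and `legs_b` are made up for illustration.

```python
import math
import numpy as np

def scalar_hypotenuse(a, b):
    """Works on single numbers only, because math.sqrt expects a scalar."""
    return math.sqrt(a * a + b * b)

# np.vectorize wraps the scalar function so it is applied pair by pair.
vec_hypotenuse = np.vectorize(scalar_hypotenuse)

legs_a = np.array([3.0, 5.0, 8.0])
legs_b = np.array([4.0, 12.0, 15.0])

hyp = vec_hypotenuse(legs_a, legs_b)
print(hyp)
```

According to the NumPy documentation, `np.vectorize` is provided mainly for convenience and is essentially a loop under the hood, so it should not be expected to run faster than a hand-written `for` loop.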

With the `prob_s` array containing the Poisson probability of all eight pairs of $k$ and $\lambda$, we can now compute the probability in question.

In [None]:
# TODO: Find the probability of having at least 8 flights delayed by over 30 minutes in a single day using prob_s.
prob_numpy = np.sum(...)
print(...)

We have now calculated the probability in question in two different ways. Let's go even further and solve it a third way. This time, instead of writing a function of individual values and then vectorizing it, let's define a function that takes `numpy` arrays as *input* directly. But before that, let's familiarize ourselves with some more array operations.

In [None]:
# Just run this cell.
# Observe the output and try to understand what these different operations do.
a = np.ones((2,3))
np_analyzer(a)

b = np.eye(2,3)
np_analyzer(b)

c = 2*a + b
np_analyzer(c)

d = c**2
np_analyzer(d)

We can see that for `d = c**2`, each element in the output array `d` is the squared value of its corresponding element in array `c`. Now let's observe what happens with `e = c**b`.

In [None]:
# Just run this cell.
e = c**b
print(e)

The operation is done element-wise! What about for exponentials?

In [None]:
# Just run this cell.
f = np.exp(-1*d)
print(f)

And factorials?

In [None]:
# Just run this cell.
g = scipy.special.factorial(d)
print(g)
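
The reason for reaching for `scipy.special.factorial` rather than `math.factorial` can be sketched as follows. The array `n_values` is an illustrative input, not part of the exercise.

```python
import math
import numpy as np
import scipy.special

n_values = np.array([0, 1, 2, 3, 4])

# scipy.special.factorial is applied element-wise and returns floats by default.
fact_values = scipy.special.factorial(n_values)
print(fact_values)

# math.factorial only handles a single integer, so passing an array fails.
try:
    math.factorial(n_values)
except TypeError as err:
    print("math.factorial failed:", err)
```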

With all of this in mind, let's define our new Poisson function that takes in `numpy` arrays as inputs.

In [None]:
# TODO: Define your own Poisson function. Do not use any pre-defined Poisson functions from Python libraries

# This function takes in arrays exp, obs (expectation and observation) as the input arguments,
# and returns an array of Poisson probabilities.
def Poisson_function_with_arrays(exp, obs):
    return ...


result = Poisson_function_with_arrays(lambda_tens, k_s)
print("Array of Poisson Probabilities:\n", result, "\n")

prob_third = 1 - np.sum(result)
print("Probability in question:", prob_third)

# Question 3: Data Visualization with Python `matplotlib`.
Let's visualize the Poisson distribution. Assume the expectation $\lambda = 10$; let's draw the probability of observing outcome $k$ as a function of $k$.

In [None]:
# Just run this cell.
# We will reuse much of the code that we've written previously in this workshop.
# Let's start by creating an observation array from 0 to 20.
obs = np.linspace(0, 20, 21)

# We will also create a 1-D expectation array of shape (21,) with all elements = 10
exp = np.ones(21) * 10

# Now we compute the probability
q3_prob = Poisson_function_with_arrays(exp, obs)

print("Array of Poisson Probabilities:\n", q3_prob)

Incidentally, now that the probability array for observations spans from 0 to 20, we could also get the same sum of Poisson probabilities from Question 2 using array **slicing**:

In [None]:
# Just run this cell
1 - np.sum(q3_prob[:8])
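
If slice notation is new, the following sketch shows how `[:n]` and `[n:]` split an array. The array `demo` and the split point 3 are arbitrary choices made for illustration.

```python
import numpy as np

demo = np.arange(10)   # 0, 1, ..., 9

head = demo[:3]        # elements at indices 0, 1, 2 (index 3 is excluded)
tail = demo[3:]        # elements from index 3 to the end

print(head, tail)

# The two slices cover the array exactly once, so their sums add up to the total.
print(head.sum() + tail.sum() == demo.sum())
```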

With probability properties in mind, what is the relationship between `np.sum(q3_prob[8:])` and `1 - np.sum(q3_prob[:8])`, or is there even one at all?

In [None]:
# Just run this cell.
print(sum(q3_prob[8:]))
print(1 - sum(q3_prob[:8]))

Let's move on to making some plots. Let's start by making a scatter plot that depicts the relationship between our observations `obs` and our Poisson probabilities `q3_prob`.

In [None]:
# TODO: Add labels to the x and y axes. The x label should be "Outcome" and the
# y label should be "Probability." If you don't remember the functions, refer to Workshop 1.
plt.scatter(obs, q3_prob)
...

Let's create a **histogram** with the same distribution, but first, an example:

In [None]:
# Just run this cell.

# x values 
x = np.linspace(5.5,35.5,31)
print(x)

# Create a weight array; the values are dummy data for this example.
np.random.seed(1)
weights = np.random.normal(100, 5, 31)

print(weights)

# Plot the histogram.
plt.hist(x, bins=30, range=(5,35), weights=weights);

**Explanation:**

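Above is a weighted histogram. Array `x` has only 31 entries, and the last one (35.5) lies outside `range`, so it is not drawn.
The `weights` parameter of the `plt.hist` function assigns a weight to each entry; since each bin here holds a single entry, the weights set the bar heights along the y-axis.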

The `range` parameter is set to make sure the histogram has x-axis range from 5 to 35.

And finally, the `bins` parameter is used to create 30 uniform bins along the x-axis.
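
The same three parameters can be explored without drawing anything by calling `np.histogram`, which `plt.hist` uses to bin the data. This is a minimal sketch with made-up inputs (`demo_x` and `demo_weights` are illustrative names).

```python
import numpy as np

# Three entries, each sitting at the center of its own unit-wide bin.
demo_x = np.array([0.5, 1.5, 2.5])
demo_weights = np.array([10.0, 20.0, 30.0])

# bins=3 over range=(0, 3) gives bin edges 0, 1, 2, 3.
heights, edges = np.histogram(demo_x, bins=3, range=(0, 3), weights=demo_weights)

print("bin edges:  ", edges)
print("bin heights:", heights)
```

Because each bin contains exactly one entry, each bin height equals that entry's weight, which is the situation in the example above.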

With this example, it is now your turn to create a histogram for the Poisson distribution $Pois(k, 10)$.

In [None]:
# TODO: Plot the histogram of a Poisson(k, 10) distribution.
# Don't forget to label your x and y axes.
x = np.linspace(0.5,20.5,21)
...

plt.hist(..., bins=..., range=..., weights=...);

**Congratulations on finishing Workshop 2!**