# Probabilistic Programming

---

## **Chapter 1. Introduction**


---
Have you ever wondered about the probability of having a certain disease given specific symptoms, or how companies select interface designs for their websites? Questions like these are often addressed using complicated statistical calculations. This module introduces probabilistic programming, an approach to data analysis that is significantly easier to understand, interpret, and implement. We will cover introductory background information, including the basics of probability and statistics, and discuss how to create and program probabilistic models in Python.

Similar to aspects of human cognition, probabilistic programming relies on inference based on prior beliefs, i.e., what we know or think about something. Probabilistic programming can be used to predict future outcomes and is considered a generative approach, meaning that we reason about the problem in terms of a process that generates the data, which is the main topic discussed in this chapter. We also introduce some of the fundamental concepts of probabilistic programming using intuition, and later elaborate on these concepts.


## What is Probabilistic Programming?


---
<a id='prob prog'>*Probabilistic programming*</a> is a tool that allows one to answer a question using logic, intuition, and <a id='prob'>probability</a> (i.e., the chance of an event occurring, expressed as the number of successes over the total number of objects or measurements). It can be used for a variety of tasks, including prediction and anomaly detection. As a technique, probabilistic programming combines probability theory, statistical inference, and computer programming to model a system and solve a problem. One advantage of probabilistic programming is that it is more intuitive to learn and apply than traditional (or [frequentist](https://en.wikipedia.org/wiki/Frequentist_probability)) statistics, and it provides a more coherent understanding of the solutions than many of the popular black-box algorithms commonly used in machine learning.

<img src="media/chapter1/reference.jpg">

<img src="media/chapter1/kruschke.jpg">

<a id='example1.1'></a>
### Example 1.1: Constructing a Generative Model

A patient goes to the doctor with symptoms of an illness. We can determine the probability that the patient has a certain illness given their symptoms. For simplicity, let's assume that there are three possible illnesses a patient may have, and that a patient has exactly one of the three.

1. bronchitis
2. common cold
3. seasonal allergies


For these three illnesses, there are three possible symptoms which indicate one of the three illnesses.

1. fever
2. coughing
3. sneezing

Suppose we also know the degree to which these three diseases influence symptoms of fever, coughing, and sneezing. For instance, we know that patients with a cold have a fever 10% of the time, sneeze 50% of the time and cough 40% of the time. We can use this information to determine the probability that the patient has seasonal allergies or the common cold. The first step in tackling the problem is to create a generative model.

## Generative models
---
A model that describes the dependencies between previous knowledge and data with statistics and probability is called a <a id='gen model'>*generative model*</a>. A generative model takes the probability of an event occurring from known data and uses it to generate new data. The simplest representation of a generative model is a graphical one. A graphical generative model illustrates <a id='state'>*states*</a> as boxes, and these boxes are connected by **one-sided** arrows to illustrate <a id='dependency'>*dependencies*</a>.

Let's start by looking at a generative model that factors in only the common cold. In this example, out of 10 people with a cold, 1 has a fever, 4 have a cough, and 5 are sneezing. Each box represents a state (in this example, the boxes are the potential disease or symptoms), and the numbers next to the arrows give the probability that the illness results in that symptom. The generative model for the cold and its symptoms (assuming an individual with a cold has only one symptom) is below:

<img src="media/chapter1/gen_model_cold.jpg">

A generative model is typically inferred based on knowledge and understanding of the problem. Generative models are called 'generative' because they illustrate how the data is generated. The code below shows how one can generate a patient with a symptom in Python.

In [1]:
# generative model for cold example
import random


# number of people with symptom out of 10
# (f = fever, s = sneezing, c = coughing)

fever = 1   # number of people with a cold & fever
cough = 4   # number of people with a cold & a cough
sneeze = 5   # number of people with a cold & a sneeze

# create list based off of number of people with cold

cold = ['fever'] * fever + ['coughing'] * cough + ['sneezing'] * sneeze

patient_symptom = random.choice(cold)

print(patient_symptom)

sneezing


We used the `choice()` function from the <a href="https://docs.python.org/2/library/random.html">`random` module</a> to generate a patient with a symptom. `choice(cold)` returns a random element from the non-empty sequence `cold`. In our example, the sequence `cold` consists of 1 `'fever'`, 4 `'coughing'` and 5 `'sneezing'` entries, matching the known proportions of symptoms ($1/10$, $4/10$ and $5/10$ respectively).
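
As a quick check on this reasoning, the sketch below draws many patients from the same `cold` list and tallies their symptoms. The sample size `n_patients` and the fixed seed are arbitrary choices made for illustration. With enough draws, the observed proportions should settle near $1/10$, $4/10$ and $5/10$.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed (an arbitrary choice) so the run is repeatable

# same list as above: 1 fever, 4 coughing, 5 sneezing out of 10 people
cold = ['fever'] * 1 + ['coughing'] * 4 + ['sneezing'] * 5

n_patients = 10000  # illustrative sample size
counts = Counter(random.choice(cold) for _ in range(n_patients))

# proportion of generated patients showing each symptom
proportions = {symptom: counts[symptom] / n_patients for symptom in counts}
for symptom, share in sorted(proportions.items()):
    print(symptom, round(share, 3))
```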

Let's now return to our original <a href='#example1.1'>example</a> and create a generative model for it. Illnesses with symptoms similar to those arising in a patient with a cold can be illustrated using the diagram below:

![](media/chapter1/gen_model_illness.jpg)


As stated above, for the purposes of this problem, we know how much each illness influences the symptoms presented. For instance, we know the probability of a patient having a cold, as well as the probability of a patient who has a cold having sneezing as a symptom (we will discuss how these values, which are called parameters, are derived later).

The probability of a patient having the common cold is 4/10, that is, 4 out of every 10 patients who come to the doctor are diagnosed with a cold. Similarly, we know that 5 out of every 10 patients who come to the doctor have seasonal allergies and 1 out of every 10 has bronchitis.

We also know the probability that a certain illness will exhibit symptoms of fever, coughing, and sneezing (again, for the purposes of this example, a patient with an illness will only have one symptom).

**Bronchitis**: 2/10 patients with bronchitis have a fever, and 8/10 have a cough. Sneezing is, in this example, not a symptom (0/10 patients) of bronchitis.

**Common cold**: 1/10 patients with the common cold have a fever, 4/10 patients have a cough, and 5/10 patients are sneezing. 

**Seasonal allergies**: 3/10 patients with seasonal allergies have a cough, and 7/10 patients are sneezing. Fever is not a symptom (again, 0/10 patients) of seasonal allergies.

The following tables summarize the probability that a patient has a certain illness and the probability that that illness would cause a patient to have specific symptoms.

<html>
<table style="width:50%" align="center">
  <tr>
    <th>      </th>
    <th><b><center>Probability that patient has illness</center></b></th>
  </tr>
  <tr>
    <td><b><center>Bronchitis</center></b></td>
    <td><center>1/10</center></td>
  </tr>
  <tr>
    <td><b><center>Common cold</center></b></td>
    <td><center>4/10</center></td> 
  </tr>
    <tr>
     <td><b><center>Seasonal allergies</center></b></td>
    <td><center>5/10</center></td> 
  </tr>
</table>
<br>
<br>

<table style="width:50%">
  <tr>
    <th>      </th>
    <th><b><center>Probability that illness has symptoms of fever</center></b></th>
    <th><b><center>Probability that illness has symptoms of coughing</center></b></th> 
    <th><b><center>Probability that illness has symptoms of sneezing</center></b></th>
  </tr>
  <tr>
    <td><b><center>Bronchitis</center></b></td>
    <td><center>2/10</center></td>
    <td><center>8/10</center></td> 
    <td><center>0/10</center></td>
  </tr>
  <tr>
    <td><b><center>Common cold</center></b></td> 
    <td><center>1/10</center></td>
    <td><center>4/10</center></td>
     <td><center>5/10</center></td>
  </tr>
    <tr>
      <td><b><center>Seasonal allergies</center></b></td> 
    <td><center>0/10</center></td>
     <td><center>3/10</center></td>
     <td><center>7/10</center></td>
  </tr>
</table>
  </html>
    

With this information, we built the graphical generative model above. Now let's observe the data generated from our model, or more specifically, generate a patient with a symptom.


In [None]:
# @hidden_cell
#@title
#code will be hidden
# generative model for disease example
from ipywidgets import widgets, Output
import random
from IPython.display import HTML, display


button = widgets.Button(description = "Run the experiment!")
display(button)
out=Output()

bronch_link='media/chapter1/bronchitis.jpg'
cold_link='media/chapter1/cold.jpg'
al_link='media/chapter1/allergy.jpg'
bronch_guy="<td><img src="+bronch_link+"></td>"
cold_guy="<td><img src="+cold_link+"></td>"
al_guy="<td><img src="+al_link+"></td>"
black_link='media/chapter1/black.jpg'
black_guy="<td><img src="+black_link+"></td>"
arrow_link='media/chapter1/arrowside.jpg'
arrow="<td><img src="+arrow_link+"></td>"

b_c = 'media/chapter1/br_c.jpg'
b_f = 'media/chapter1/br_f.jpg'
b_s = ''
c_c = 'media/chapter1/cold_c.jpg'
c_f = 'media/chapter1/cold_f.jpg'
c_s = 'media/chapter1/cold_s.jpg'
a_c = 'media/chapter1/al_c.jpg'
a_f = ''
a_s = 'media/chapter1/al_s.jpg'
cough_dis = ["<td><img src="+b_c+"></td>","<td><img src="+c_c+"></td>","<td><img src="+a_c+"></td>"]
sneeze_dis = ["<td><img src="+b_s+"></td>","<td><img src="+c_s+"></td>","<td><img src="+a_s+"></td>"]
fever_dis = ["<td><img src="+b_f+"></td>", "<td><img src="+c_f+"></td>", "<td><img src="+a_f+"></td>"]




# number of people with illness out of 10 
#(br = bronchitis, cold = cold, al = allergies)

ppl_br = 1
ppl_cold = 4
ppl_al = 5

# create list based off of number of people with illness
illness = ['bronchitis'] * ppl_br + ['cold'] * ppl_cold + ['allergies'] * ppl_al


# number of people with illness with symptom 
# (f = fever, s = sneezing, c = coughing)

ppl_br_f = 2     # number of people with bronchitis & fever
ppl_br_c = 8     # number of people with bronchitis & a cough

ppl_cold_f = 1   # number of people with a cold & fever
ppl_cold_c = 4   # number of people with a cold & a cough
ppl_cold_s = 5   # number of people with a cold & a sneeze

ppl_al_c = 3     # number of people with allergies & a cough
ppl_al_s = 7     # number of people with allergies & a sneeze

bronchitis = ['fever'] * ppl_br_f + ['coughing'] * ppl_br_c
cold = ['fever'] * ppl_cold_f + ['coughing'] * ppl_cold_c + ['sneezing'] * ppl_cold_s
allergies = ['coughing'] * ppl_al_c + ['sneezing'] * ppl_al_s

def on_button_clicked(b):
    guys=black_guy+arrow
    ill = 0
    patient_illness = random.choice(illness)

    if patient_illness == 'bronchitis':
      patient_symptom = random.choice(bronchitis)
      guys = guys + bronch_guy
      ill = 0
    if patient_illness == 'cold':
      patient_symptom = random.choice(cold)
      guys = guys + cold_guy
      ill = 1
    if patient_illness == 'allergies':
      patient_symptom = random.choice(allergies)
      guys = guys + al_guy
      ill = 2

    guys = guys + arrow
  
    if patient_symptom == 'fever':
      guys = guys + fever_dis[ill]
    if patient_symptom == 'coughing':
      guys = guys + cough_dis[ill]
    if patient_symptom == 'sneezing':
      guys = guys + sneeze_dis[ill]
    
    out.clear_output()
    with out:
        display(HTML("<table><tr>"+guys+"</tr></table>"))
  
display(out)

button.on_click(on_button_clicked)

Later with the help of mathematics and probability theory we will be able to answer questions such as: 

* What is the probability that a patient has a cough?
* What is the probability that a person who is sneezing has seasonal allergies versus the common cold? 

Such computations can be very time consuming as the model gets increasingly complex. Instead, we can answer our questions with the help of a computer, or more specifically with probabilistic programming. In later chapters, we will discuss the specifics on how to model this in Python.
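
To give a flavor of what answering these questions with a computer can look like, here is a minimal sketch that simply runs the generative model many times and counts. It is not the probabilistic programming approach introduced later; the function name `generate_patient`, the sample size, and the seed are assumptions made for illustration, while the weights come from the two tables above.

```python
import random

random.seed(1)  # arbitrary seed for repeatability

# number of patients out of 10 with each illness (first table above)
illness_names = ['bronchitis', 'cold', 'allergies']
illness_weights = [1, 4, 5]

# number of patients out of 10 with each symptom, per illness
# (second table above), in the order fever, coughing, sneezing
symptom_names = ['fever', 'coughing', 'sneezing']
symptom_weights = {
    'bronchitis': [2, 8, 0],
    'cold': [1, 4, 5],
    'allergies': [0, 3, 7],
}

def generate_patient():
    """Draw an illness first, then a symptom given that illness."""
    illness = random.choices(illness_names, weights=illness_weights)[0]
    symptom = random.choices(symptom_names, weights=symptom_weights[illness])[0]
    return illness, symptom

n_patients = 100000  # illustrative sample size
patients = [generate_patient() for _ in range(n_patients)]

# share of all generated patients who cough
p_cough = sum(s == 'coughing' for _, s in patients) / n_patients

# among sneezing patients only, the share who have allergies
sneezers = [i for i, s in patients if s == 'sneezing']
p_allergies_given_sneeze = sneezers.count('allergies') / len(sneezers)

print(round(p_cough, 3), round(p_allergies_given_sneeze, 3))
```

For comparison, weighting the table entries by hand gives $0.1 \cdot 0.8 + 0.4 \cdot 0.4 + 0.5 \cdot 0.3 = 0.39$ for the probability of a cough, so the first estimate should land close to that value.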

The example above illustrates how to create a generative model. To solidify the material, try to answer the following questions.

### Quiz 1.1


---

1.   Which of the pictures below could be a graphical representation of a generative model?

<img src="media/chapter1/quiz1_1.jpg" width="400">


2.   New research has shown that out of $20$ people $2$ have a fever, $5$ have a cough, $6$ people are sneezing and $7$ have a sore throat. 


   *   What is the probability that a patient coming to a doctor has a cough?
   
          a)$2/20$   
          
          b)$5/20$   
          
          c)$6/20$   
          
          d)$7/20$
        
        
   *   What is the probability that a patient coming to a doctor has a sore throat?
   
          a)$2/20$
          
          b)$5/20$   
          
          c)$6/20$   
          
          d)$7/20$
        
        
   * Let's update the code from the example above to fit the new model. We had three variables representing the symptoms, and we need to add another variable `sorethroat`. What should be the values of these variables? Recall that they denote the number of patients having the symptom.
       

```
import random

fever =    
cough =    
sneeze =    
sorethroat = 

cold = (['fever'] * fever + ['coughing'] * cough + ['sneezing'] * sneeze
        + ['sore throat'] * sorethroat)

patient_symptom = random.choice(cold)

print(patient_symptom)
```



Let's now look at another example and actually solve it!

<a id="example1.2"></a>
### Example 1.2. A/B testing

---

A company has a website and they want visitors to sign up for email updates. They have hired you to determine which visual is optimal: Version A or Version B.

![](media/chapter1/ab.jpg)

Which version of the graphic will result in more people signing up for email updates? A/B testing is an approach that can be used to answer this question. The method used in A/B testing consists of testing Version A on one group of people and Version B on another, then estimating the number of people that sign up when presented with each version, and comparing them to see which one is better.



Before constructing a generative model for A/B testing, let's first try a simpler problem, namely one person that is tested with Version A. The generative model for one person can be described with the following illustration.

![](media/chapter1/ab_user.jpg)

For the time being, we fix the probability of one person signing up at $5/10$, because we don't yet have observations that suggest otherwise. A generative model creates data: each click of the button below runs the model once and generates one response. Try clicking it 5 times and keep track of the number of positive (green) and negative (red) responses. Each time we run the generative model, we can get a different outcome.

In [None]:
# @hidden_cell
#@title
#this will be hidden

from IPython.display import HTML, display
from ipywidgets import widgets, Output
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st

p=0.5

button = widgets.Button(description = "Run the experiment!")
display(button)
out=Output()
display(out)

green_link='media/chapter1/green.jpg'
red_link='media/chapter1/red.jpg'
black_link='media/chapter1/black.jpg'
arrow_link='media/chapter1/arrow.jpg'
green_guy="<td><img src="+green_link+"></td>"
red_guy="<td><img src="+red_link+"></td>"
black_guy="<td><img src="+black_link+"></td>"
arrow="<td><img src="+arrow_link+"></td>"



def on_button_clicked(b):
    number_guys = 0
    test=st.bernoulli.rvs(p, size=1)
    guys=""
    for item in test:
      if item==1:
        guys=guys+green_guy
      else:
        guys=guys+red_guy
      number_guys+=item
    
    out.clear_output()
    with out:
        display(HTML("<table><tr>"+black_guy+"</tr></table>"))
        display(HTML("<table><tr>"+arrow+"</tr></table>"))
        display(HTML("<table><tr>"+guys+"</tr></table>"))


button.on_click(on_button_clicked)
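
Without the button and images, the same one-person model can be sketched in a few lines of plain Python. The function name `show_version_a` and the seed are hypothetical choices for illustration; the value $p = 0.5$ is the one assumed above.

```python
import random

random.seed(2)  # arbitrary seed for repeatability

p = 0.5  # assumed probability that one person signs up

def show_version_a(p):
    """Return 1 if the simulated visitor signs up, 0 otherwise."""
    return 1 if random.random() < p else 0

# five runs of the one-person model, like five clicks of the button
responses = [show_version_a(p) for _ in range(5)]
print(responses, "sign-ups:", sum(responses))
```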

Let's return to the original problem. For this problem, we are making the following assumptions:

1.   For both Versions A and B, there is an associated probability that an individual will respond positively.
2.   The probability of a positive response does not vary between individuals.

These assumptions allow us to simulate people signing up using a generative model.

In order to perform A/B testing for Version A, we need to run this experiment on a testing group of say, 8 people. Instead of running the experiment introduced above on each person from a testing group, we can take an equivalent shortcut using the *binomial distribution*,  which arrives at the same answer, the number of sign ups. 

The <a id='bin dist'>*binomial distribution*</a> describes the probability of each possible outcome of the experiment, that is, of each possible number of sign-ups. It is specified by two parameters: the number of objects $n$ (the number of people in the testing group, in our example $n=8$) and the probability of success $p$ (the probability of a sign-up, which so far we have assumed to be $p=5/10=0.5$). Note that the probability of success $p$ is a number between $0$ and $1$.
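
For reference, the probability of exactly $k$ sign-ups out of $n$ people is given by the standard binomial formula $\binom{n}{k} p^k (1-p)^{n-k}$, which the text above does not spell out. The sketch below evaluates it for $n=8$ and $p=0.5$; the helper name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes among n objects."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# one line per possible number of sign-ups, 0 through 8
for k, prob in enumerate(pmf):
    print(k, round(prob, 4))
```

The probabilities over all nine possible outcomes sum to $1$, and for $p=0.5$ the middle outcome ($4$ sign-ups) is the most likely one.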


Let's see how the distribution can create data. By clicking the button once you can observe how one outcome of the binomial distribution is generated. That is, each of 8 people either signs up (green) or not (red) with probability $p$. The binomial distribution returns the number of green users, which is added to a histogram below. A <a id='hist'>*histogram*</a> is a type of graph which shows the frequency or number of counts for numerical data in bins. If you click the button several more times, you will see how different outcomes are added to a histogram. Note that some numbers are generated more often than others.



In [None]:
# @hidden_cell
#@title
#this will be hidden
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st
from ipywidgets import widgets, Output
from IPython.display import HTML, display

button = widgets.Button(description = "Run the experiment!")
display(button)
out=Output()
display(out)

dist_values=[0]*9

green_link='media/chapter1/green.jpg'
red_link='media/chapter1/red.jpg'
green_guy="<td><img src="+green_link+"></td>"
red_guy="<td><img src="+red_link+"></td>"

def on_button_clicked(b):
    p=0.5
    test=st.bernoulli.rvs(p, size=8)
    guys=""
    number_guys = 0
    for item in test:
      if item==1:
        guys=guys+green_guy
      else:
        guys=guys+red_guy
      number_guys+=item
    out.clear_output()
    dist_values[number_guys]+=1
    fig = plt.figure(figsize=(14,5))
    plt.style.use('seaborn-darkgrid')
    x = np.arange(0, 9)
    ns = [8]
    ps = [p]
    for n, p in zip(ns, ps):
        plt.bar(x, dist_values)
    plt.bar(x[number_guys],dist_values[number_guys],color='green')
    plt.xlabel('number of sign ups', fontsize=14)
    plt.ylabel('counts', fontsize=14)
    plt.xticks(range(0,9))
    with out:
        display(HTML("<table><tr>"+guys+"</tr></table>"))
        plt.show()
#plt.yticks([]);
#plt.legend(loc=1)


button.on_click(on_button_clicked)

If one repeats the experiment many times, the graph will look as follows in the long run.

<img src="media/chapter1/longrun.png">
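
The long-run behavior can also be approximated without clicking. The sketch below repeats the experiment many times in two ways, person by person and with a single binomial draw, and compares how often each number of sign-ups appears. The number of runs and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed for repeatability

n, p = 8, 0.5
n_runs = 100000  # illustrative number of repeated experiments

# person by person: 8 yes/no responses per experiment, then count sign-ups
per_person = rng.random((n_runs, n)) < p
signups_summed = per_person.sum(axis=1)

# shortcut: one binomial draw per experiment
signups_binomial = rng.binomial(n, p, size=n_runs)

# relative frequency of each possible outcome 0..8
freq_summed = np.bincount(signups_summed, minlength=n + 1) / n_runs
freq_binomial = np.bincount(signups_binomial, minlength=n + 1) / n_runs

for k in range(n + 1):
    print(k, round(freq_summed[k], 3), round(freq_binomial[k], 3))
```

The two columns should agree up to random noise, which is the sense in which the binomial distribution is an equivalent shortcut.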


### Quiz 1.2


---

1. A coin is flipped 260 times. It is known that the coin is not fair, and that the probability of getting heads is $0.3$, while the probability of getting tails is $0.7$. At the end of the experiment we count how many flips resulted in **heads**. Such an experiment can be described with the binomial distribution. What are the parameters of this distribution?

  * parameter $n$ that describes the number of objects in a distribution is:
          
       a) $n = 1$
          
       b) $n = 2$
          
       c) $n = 78$
          
       d) $n = 182$
          
       e) $n = 260$
                   
   * parameter $p$ that describes the probability of success is:

       a) $p = 0.3$
          
       b) $p = 0.5$
         
       c) $p = 0.7$




2. Which of the following graphs are histograms?

<img src="media/chapter1/quiz2_1.jpg" width="500">



We know how we can generate this data if we have knowledge about $p$. At this point, however, we don't yet know $p$ (the probability per individual of a positive response), which is precisely the information we need to decide which version is better!

## Observations

To address this unknown $p$ for our model, we can run an experiment to generate observed data. Version A was tested on $8$ people and after running the test, $5$ out of $8$ users shown the first version of the graphic signed up for email updates. 

We also tested Version B: the graphic was shown to $10$ people, and only $2$ of them signed up for email updates. Knowing these experimental results, which version is better at recruiting new users? Clearly, 5 out of 8 is a higher rate than 2 out of 10, but with so few users a single comparison is not enough: we would also like to quantify our confidence that Version A is better than Version B. Probabilistic programming can help with that.


![](media/chapter1/ab_test.jpg)





So, now we know how our model works. But we don't know anything about the parameter $p$ of the binomial distribution for each version, which is important for us to know in order to compare the two versions. We want to learn about this parameter. 


## Rejection Sampling

For simplicity, let's only consider Version A for the time being. How can we determine the most likely value of parameter $p$ for Version A? For that we will use a simple algorithm, or recipe, for Bayesian inference, called <a id='rej'>*rejection sampling*</a>. Rejection sampling can be applied using the following steps:

1.   Sample a parameter value from the set of possible values of $p$, or the prior (in our example, the set of possible values of $p$ is between $0$ and $1$).
2.   Generate an outcome, given that parameter value (in our example, run the test that is described by binomial distribution with parameter $p$ and number of people $8$)

3.  *   If the generated outcome is the observed data, record the value of the parameter $p$
   *   If the generated outcome is not the observed data, ignore the parameter $p$
4.   Repeat the procedure many times.

![](media/chapter1/rejection.jpg)

After we run the above procedure many times, say 1,000 times, we collect all the values of $p$ that we accepted (recorded) and plot their histogram. At first, we assumed that $p$ could be any number between $0$ and $1$, with no value preferred. Now, using rejection sampling, we can see that some values of $p$ are accepted more often than others once we take the observed data into account.
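
The four steps above can be sketched for Version A as follows. This is a minimal illustration rather than a definitive implementation: it draws all candidates at once with NumPy instead of looping, and the number of repeats and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed for repeatability

n_A = 8      # number of people Version A was tested on
obs_A = 5    # observed number of sign-ups for Version A
num_of_repeats = 200000  # illustrative; more repeats give a smoother histogram

# step 1: sample candidate values of p from the prior (uniform on [0, 1])
p_candidates = rng.uniform(0, 1, size=num_of_repeats)

# step 2: generate one outcome per candidate from the binomial distribution
outcomes = rng.binomial(n_A, p_candidates)

# step 3: keep only the candidates whose outcome equals the observed data
p_accepted = p_candidates[outcomes == obs_A]

print("accepted:", p_accepted.size, "of", num_of_repeats)
print("mean accepted p:", round(p_accepted.mean(), 3))
```

Most candidates are thrown away, and the ones that survive should be concentrated around the observed fraction of sign-ups, $5/8$.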

Now we know how to infer the parameter for one of the strategies. The generative model for the Version B is similar, except it was tested on $10$ people, and $2$ responded positively. So if we want to infer about parameter $p$ for Version B we have to follow the same algorithm, but with $10$ people instead of $8$ in the binomial distribution.

Since we want to compare the parameters for each of the strategies, we need to combine both procedures. Before, each procedure recorded possible parameter $p$ that satisfied the observed data. Now we are going to record a pair of parameters ($p_A$ and $p_B$) and the observed data becomes a pair of numbers ($5$ and $2$). The algorithm becomes the following.

1.   Sample a parameter value for Version A ($p_A$) and a parameter value for Version B ($p_B$) randomly between $0$ and $1$. In this example (and in the examples to follow where we generate data from a probability distribution), we use the [NumPy](http://www.numpy.org/) Python package.
2.   Generate an outcome for Version A, given that parameter value (in our example, run the test that is described by the binomial distribution with parameter $p_A$ and number of people $8$), and generate an outcome for Version B, given that parameter value (in our example, run the test that is described by the binomial distribution with parameter $p_B$ and number of people $10$).
3.    *   If both generated outcomes are the same as the observed data, record the parameter values $p_A$ and $p_B$
     *   Otherwise, throw the parameters away
4.   Repeat the procedure many times.

This gives us two histograms that describe the behavior of parameters for both strategies $p_A$ and $p_B$. 



In [None]:
# import packages
import numpy as np
import matplotlib.pyplot as plt

# Version A
n_A = 8        # number of people Version A was tested on
obs_A = 5      # observed number of sign ups for Version A
# Version B
n_B = 10       # number of people Version B was tested on
obs_B = 2      # observed number of sign ups for Version B

p_A_recorded = []
p_B_recorded = []

num_of_repeats = 10000

for i in range(num_of_repeats):
  # Sample parameter values p_A and p_B using numpy from the uniform distribution 
  p_A = np.random.uniform(0, 1)
  p_B = np.random.uniform(0, 1)

  # For each p_A and p_B generate an outcome from the binomial distribution using numpy
  v_A = np.random.binomial(n_A, p_A)
  v_B = np.random.binomial(n_B, p_B)

  # Compare generated outcomes to the observed data
  if v_A == obs_A and v_B ==obs_B:
    p_A_recorded.append(p_A)
    p_B_recorded.append(p_B)

nbins = 10

# plot histograms of the recorded parameter values
# (density replaces the normed argument, which newer matplotlib no longer accepts)
fig = plt.figure()
plt.xlim(0, 1)
plt.hist(p_A_recorded, histtype='stepfilled', bins = nbins, alpha = 0.75,
         label = "Version A", color = "#A60628", density = False)
plt.hist(p_B_recorded, histtype='stepfilled', bins = nbins, alpha = 0.75,
         label = "Version B", color = "#467821", density = False)
plt.title("Number of Recorded Values for Version A vs Version B")
plt.legend(loc = 'upper right', fontsize=14)
plt.show()


The histogram of recorded values for Version A is concentrated at higher values of $p$ than the histogram for Version B, which suggests that Version A is more likely to get a visitor to sign up for email updates. Based on this data, Version A is the better strategy.

In this example we were able to apply a very simple method of Bayesian inference to determine the optimal solution. We sampled candidate values for $p_A$ and $p_B$ uniformly between $0$ and $1$, used them to generate outcomes from the binomial distribution, kept only the candidates whose outcomes matched the observed data, and repeated the procedure many times (we chose 10,000) to arrive at our final answer.
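
One way to quantify how confident we can be that Version A is better is to look at the fraction of recorded pairs in which $p_A$ exceeds $p_B$. The sketch below repeats the paired rejection sampling in vectorized form and computes that fraction. The variable `prob_A_better`, the larger number of repeats, and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed for repeatability

n_A, obs_A = 8, 5     # Version A: people tested, observed sign-ups
n_B, obs_B = 10, 2    # Version B: people tested, observed sign-ups
num_of_repeats = 500000  # illustrative; only a small share of pairs is kept

# sample candidate pairs (p_A, p_B) uniformly between 0 and 1
p_A = rng.uniform(0, 1, size=num_of_repeats)
p_B = rng.uniform(0, 1, size=num_of_repeats)

# generate one outcome per candidate for each version
v_A = rng.binomial(n_A, p_A)
v_B = rng.binomial(n_B, p_B)

# record a pair only if both outcomes match the observed data
keep = (v_A == obs_A) & (v_B == obs_B)
p_A_recorded = p_A[keep]
p_B_recorded = p_B[keep]

# fraction of recorded pairs in which Version A has the higher sign-up rate
prob_A_better = np.mean(p_A_recorded > p_B_recorded)
print("recorded pairs:", p_A_recorded.size)
print("fraction with p_A > p_B:", round(prob_A_better, 3))
```

A fraction close to $1$ means that nearly all parameter pairs consistent with the data favor Version A.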

### Quiz 1.3


---



1.   According to the rejection sampling algorithm, when is the parameter $p$ recorded?

        a) we record all the parameters $p$ that were generated
        
        b) when generated data is the same as the observed data
        
        c) when generated data is not too far from the observed data
        
        
2.   A computer runs rejection sampling for Version A as described above. It generates the parameter value $p=0.187$ and then generates an outcome using the binomial distribution with parameters $n=8$ and $p=0.187$. The generated outcome is $4$. Recall that the observed data is $5$. According to the rejection sampling algorithm, does the computer keep (accept) the parameter value $p=0.187$ or ignore (reject) it?

        a) accept    
        
        b) reject
        
        
3.   On the second day of testing Version A and Version B it was observed that $10$ out of $30$ users signed up after looking at Version A while $158$ out of $200$ users signed up after looking at Version B. Adjust the code above to satisfy this new information and look at the histograms. Which version results in more sign-ups?

        a) Version A
        
        b) Version B
          



## **Bayesian Inference**

---
<img src="media/chapter1/bayesiansteps_intro-01.jpg" width="300" align="right">

<a id='bayes'>*Bayesian inference*</a> is a means to mathematically  calculate what is represented graphically in the generative model. Using probability theory, one can assign probabilities to <a id='param'>*parameters*</a> to arrive at an answer. This is exactly what we did above with rejection sampling.

<br>


**Application of the Bayesian approach can be summarized in five general steps:**

**(1)** Establish what data are relevant to answering your problem. Determine how these data relate to and influence one another: what are your predictors, and what are you predicting (i.e., your model parameters)?

**(2)** Describe your problem with a generative model. It is useful to start with a graphical approach and then fill in with a mathematical description (i.e. probability distributions for your data).

**(3)** Choose an appropriate distribution for your data, parameters, and problem. In the <a href='#example1.2'>A/B testing example</a> above, we used the binomial distribution.

**(4)** Run the Bayesian inference. 

**(5)** Check your model results with the previously observed data to see if the model is a valid representation of your problem. We did this as step 3 in the rejection sampling algorithm.




This module will describe how these steps can be achieved through probabilistic programming in Python and the PyMC3 Python package.



## PyMC3 as the probabilistic programming language


---


So far we have an idea about how to construct a generative model and how to use observed data to estimate the parameters, but how is this actually implemented? We will use PyMC3, which is a Python package for constructing generative models.

[PyMC3's documentation website](https://docs.pymc.io/) provides more information about [installation](http://docs.pymc.io/notebooks/getting_started#Installation) of the package.

Here is how you would install PyMC3 using [Anaconda](https://www.anaconda.com/):

```
conda install pymc3
```

Let's solve the A/B testing problem using PyMC3. First, we need to import the PyMC3 package.

In [None]:
import pymc3 as pm

Next we need to gather the information we have about our model:

*   data is generated by the binomial distribution (Version A: parameter `p_A`, number of people $8$; Version B: parameter `p_B`, number of people $10$)
*   observed data (Version A: 5 out of 8 signed up; Version B: 2 out of 10 signed up)
*   unknown parameters are `p_A` and `p_B` that at first are chosen randomly between $0$ and $1$

Let's feed all the information into a PyMC3 model.

In PyMC3, variables are usually handled within the context of a `Model` object. Any variables that are created within the `Model` context are assigned to the model. We can test our variables outside of the model context, but to add more variables to the model, we need to work within the model context.

Let's start with introducing unknown parameters `p_A` and `p_B`, that are generated randomly between $0$ and $1$.

<b> Quick note about PyMC3 vs. NumPy </b>: PyMC3 is used to construct models for analysis and generate results. We use PyMC3 when generating data within a probabilistic model, whereas NumPy is better suited for generating data outside of a model, as we did in previous examples.

In [None]:
with pm.Model() as AB_testing_model:
  #this means that p_A and p_B are chosen randomly between 0 and 1
  p_A = pm.Uniform('p_A', 0, 1)
  p_B = pm.Uniform('p_B', 0, 1)

Now we can add information about how the data is generated using the binomial distribution. Since the observed data is also generated by the same binomial distribution, we can add it to the model as well using the `observed` argument.

In [None]:
with AB_testing_model:
  test_A = pm.Binomial('test_A', n = 8, p = p_A, observed = 5)
  test_B = pm.Binomial('test_B', n = 10, p = p_B, observed = 2)

After feeding the information to the model, we can run Bayesian inference. PyMC3 supports various algorithms. The following code runs one such algorithm (selected by `pm.Metropolis()`), which is explained later in Chapter 3. The last line discards the first 1,000 samples.


In [None]:
with AB_testing_model:
  step = pm.Metropolis()
  trace = pm.sample(20000, step=step)
  # discard the first 1000 samples, which were drawn before the sampler settled
  burned_trace = trace[1000:]
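
For a rough idea of what a Metropolis-style sampler does before the full explanation in Chapter 3, the sketch below samples `p_A` with NumPy alone. It is a simplified illustration under stated assumptions: `log_posterior`, `metropolis_sketch`, the starting value, and the proposal width `step_size` are all hypothetical choices, and PyMC3's own implementation differs in its details.

```python
import numpy as np

def log_posterior(p, k, n):
    """Log of (uniform prior x binomial likelihood), up to an additive constant."""
    if p <= 0.0 or p >= 1.0:
        return -np.inf  # the Uniform(0, 1) prior gives zero density here
    return k * np.log(p) + (n - k) * np.log(1.0 - p)

def metropolis_sketch(k, n, n_samples=20000, step_size=0.2, seed=0):
    """Random-walk Metropolis for a single probability parameter."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    current = 0.5  # arbitrary starting value
    current_lp = log_posterior(current, k, n)
    for i in range(n_samples):
        proposal = current + rng.normal(0.0, step_size)  # small random move
        proposal_lp = log_posterior(proposal, k, n)
        # accept with probability min(1, posterior ratio); otherwise stay put
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples

# Version A: 5 sign-ups out of 8; drop the first 1000 draws as in the notebook
samples_A = metropolis_sketch(k=5, n=8)[1000:]
print(f"Approximate posterior mean of p_A: {samples_A.mean():.2f}")
```

The slice `[1000:]` plays the same role as `trace[1000:]` above: early draws depend on the arbitrary starting value, so they are discarded.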

### Quiz 1.4


---


1.   Which of the following options are correct?
             
        a)
```
import pymc3 as pm
with pm.Model() as AB_testing_model:
  p_A=pm.Uniform('p_A',0,1)
  p_B=pm.Uniform('p_B',0,1)
  test_A=pm.Binomial('test_A',n=8,p=p_A, observed=5)
  test_B=pm.Binomial('test_B',n=10,p=p_B, observed=2)
  step = pm.Metropolis()
  trace = pm.sample(20000, step=step)
  burned_trace=trace[1000:]
```
        
        b)

```
import pymc3 as pm
with pm.Model() as AB_testing_model:
  p_A=pm.Uniform('p_A',0,1)
  p_B=pm.Uniform('p_B',0,1)
test_A=pm.Binomial('test_A',n=8,p=p_A, observed=5)
test_B=pm.Binomial('test_B',n=10,p=p_B, observed=2)
step = pm.Metropolis()
trace = pm.sample(20000, step=step)
burned_trace=trace[1000:]
```
        c)  
          
```
import pymc3 as pm
with pm.Model() as AB_testing_model:
  p_A=pm.Uniform('p_A',0,1)
  p_B=pm.Uniform('p_B',0,1)
  
with AB_testing_model:
  test_A=pm.Binomial('test_A',n=8,p=p_A, observed=5)
  test_B=pm.Binomial('test_B',n=10,p=p_B, observed=2)
  
with model:
  step = pm.Metropolis()
  trace = pm.sample(20000, step=step)
  burned_trace=trace[1000:]
```
        d)  
          
```
import pymc3 as pm
with pm.Model() as AB_testing_model:
  p_A=pm.Uniform('p_A',0,1)
  p_B=pm.Uniform('p_B',0,1)
  
with AB_testing_model:
  test_A=pm.Binomial('test_A',n=8,p=p_A, observed=5)
  test_B=pm.Binomial('test_B',n=10,p=p_B, observed=2)
  step = pm.Metropolis()
  trace = pm.sample(20000, step=step)
  burned_trace=trace[1000:]
```


We collect the samples of the parameters `p_A` and `p_B` and visualize them using histograms.

In [None]:
p_A_samples = burned_trace["p_A"]
p_B_samples = burned_trace["p_B"]

In [None]:
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
%matplotlib inline
figsize(12.5, 10)

#histogram of posteriors

ax = plt.subplot(211)

plt.xlim(0, 1)
plt.hist(p_A_samples, histtype='stepfilled', bins=25, alpha=0.85,
         label="posterior of $p_A$", color="#A60628", density=False)
plt.legend(loc="upper right")
plt.title("Posterior distributions of $p_A$ and $p_B$")

ax = plt.subplot(212)

plt.xlim(0, 1)
plt.hist(p_B_samples, histtype='stepfilled', bins=25, alpha=0.85,
         label="posterior of $p_B$", color="#467821", density=False)
plt.legend(loc="upper right")


The histograms show the posterior distributions of the parameters `p_A` and `p_B`. The samples of `p_A` are concentrated at larger values than the samples of `p_B`. We can therefore conclude that Version A is more likely than Version B to result in a greater number of people signing up for emails.
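
The visual comparison can also be summarized as a single number: the fraction of samples in which `p_A` exceeds `p_B`. The sketch below runs without PyMC3, so it uses stand-in samples instead of the actual trace. This relies on an assumption stated here explicitly: with a uniform prior and a binomial likelihood, the exact posterior is a Beta distribution with parameters (1 + sign-ups, 1 + non-sign-ups). The names `p_A_draws`, `p_B_draws`, and `prob_A_better` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# Stand-in posterior samples (assumption): Beta(1 + sign-ups, 1 + non-sign-ups)
p_A_draws = rng.beta(1 + 5, 1 + 3, size=19000)  # Version A: 5 of 8 signed up
p_B_draws = rng.beta(1 + 2, 1 + 8, size=19000)  # Version B: 2 of 10 signed up

# Fraction of paired samples in which Version A has the higher sign-up probability
prob_A_better = np.mean(p_A_draws > p_B_draws)
print(f"Estimated P(p_A > p_B) = {prob_A_better:.3f}")
```

In the notebook itself, the same comparison should apply directly to the arrays from the trace, for example `np.mean(p_A_samples > p_B_samples)`.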

Inference can be modeled with statistics, and modern programming languages and computing power allow a wider audience to address more complex problems using these approaches. Simple algorithms, such as rejection sampling, can work for some problems, such as those that use the binomial distribution; however, different distributions and more complicated algorithms are often required. In the following chapters, we will introduce how to solve more complicated problems with probabilistic programming.

## <u>Definitions</u>

<a href='#prob prog'>**Probabilistic programming**</a>: a technique that combines probability theory, statistical inference, and computer programming to model a system and solve a problem. 

<a href='#prob'>**Probability**</a>: the odds of an event occurring, which is expressed as the number of successes over the total number of objects or measurements.

<a href='#gen model'>**Generative model**</a>: a model that describes the dependencies between data with statistics and probability. It is generative in that the model generates new data from the probability associated with previously known data.

<a href='#state'>**State**</a>: known events or outcomes. In a graphical generative model, these are illustrated as boxes.

<a href='#dependency'>**Dependency**</a>: describes how states are related to, or dependent on, one another. In a graphical generative model, these are illustrated as one-sided arrows.

<a href='#bin dist'>**Binomial distribution**</a>: describes a random experiment in which each of $n$ objects can have a positive or negative response, and the probability of a positive response for each object is $p$. It depends on two parameters: $n$ and $p$.

<a href='#hist'>**Histogram**</a>: a type of graph that shows the counts (or frequency of occurrence) of numerical data in bins.

<a href='#rej'>**Rejection sampling**</a>: an algorithm for Bayesian inference that compares generated data to observed data and, if they coincide, keeps the parameter values drawn from the priors that led to such data.

<a href='#bayes'>**Bayesian inference**</a>: a means to mathematically calculate what is represented graphically in the generative model. 

<a href='#param'>**Parameter**</a>: describes a probability distribution numerically.