### Defining Random Variables in `python`



In [None]:
# Define "tag": displaying its output adds a link that toggles (hides/shows) the input of the selected cell
from IPython.display import HTML
from IPython.display import display

# Taken from https://stackoverflow.com/questions/31517194/how-to-hide-one-specific-cell-input-or-output-in-ipython-notebook
def tag(marker):
    s = HTML('''<script>
                  code_show=true; 
                  function code_toggle() {
                     if (code_show){
                        $('div.cell.code_cell.rendered.selected div.input').hide();
                     } else {
                        $('div.cell.code_cell.rendered.selected div.input').show();
                     }
                     code_show = !code_show
                   } 
                  $( document ).ready(code_toggle);
                 </script>
<a href="javascript:code_toggle()">%s</a>.''' % marker)
    return s
display(tag(''))

The distinguishing feature of variables in a field such as the
reals or the complex numbers is their *value*; the distinguishing
feature of random variables is their *distribution*.  The `python`
package `scipy.stats` is well engineered and offers many different
distributions, along with tools to construct others.  The package
`pacal` is perhaps less well engineered, but it defines arithmetic
operations over random variables, which allows for more elegant
semantics.

There are two main classes of random variables to consider: discrete
and continuous.  The distinction is worth drawing because different
classes are handled differently in many mathematical operations.  

For example, here we instantiate a scalar continuous random variable $\mathsf{x}$ with a standard normal distribution:



In [None]:
# import "distributions" from the stats module of scipy
# this is literally a file inside a subfolder of the scipy package
# name it "iid" for our use here
# See list of available distributions here: https://docs.scipy.org/doc/scipy/reference/stats.html
from scipy.stats import distributions as iid

x = iid.norm() # "frozen" standard normal distribution (loc=0, scale=1)
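
Calling `iid.norm()` with no arguments relies on the defaults `loc=0` and `scale=1`. As a minimal sketch of how other members of the normal family could be instantiated, the variable `w` below (a hypothetical name, not used elsewhere in these notes) has mean 2 and standard deviation 3:

```python
from scipy.stats import distributions as iid

# Standard normal: the defaults are loc=0 (mean) and scale=1 (standard deviation).
x = iid.norm()

# A hypothetical second variable with mean 2 and standard deviation 3.
w = iid.norm(loc=2, scale=3)

print(x.mean(), x.std())
print(w.mean(), w.std())
```

Note that `scale` is the standard deviation, not the variance.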

And here we instantiate a discrete random variable $\mathsf{s}$ which is defined
over a sample space $\Omega=\{-1,0,1\}$ with corresponding probabilities $(1/3,1/2,1/6)$:



In [None]:
# values of the discrete distribution (in order!)
Omega = (-1,0,1)

# probabilities of the above values (in order!)
Pr = (1/3.,1/2.,1/6.)

s = iid.rv_discrete(values=(Omega,Pr))
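
Since the values and probabilities are matched by position, a quick sanity check can catch ordering or arithmetic slips. The sketch below rebuilds the same variable and compares the mean computed directly from the definition with the one `scipy` reports; the names `total` and `mean_by_hand` are introduced here only for illustration.

```python
import numpy as np
from scipy.stats import distributions as iid

Omega = (-1, 0, 1)
Pr = (1/3., 1/2., 1/6.)
s = iid.rv_discrete(values=(Omega, Pr))

# The probabilities must sum to one.
total = sum(Pr)

# Expected value computed directly from the definition:
# (-1)(1/3) + (0)(1/2) + (1)(1/6) = -1/6
mean_by_hand = np.dot(Omega, Pr)

print(total, mean_by_hand, s.mean())
```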

Now, here are some things we can do with these random variables.
 First, the continuous  $\mathsf{x}$:



In [None]:
print("E(x) = %6.4f" % x.mean()) # print the mean of our continuous normal dist, x

print("\nSome moments of x (raw; equal to central here since E(x) = 0):") # \n pushes us to a new line for printing!

# here we print the pair (moment #, value) for the moments 1, 2, 3, and 4 of the distribution
# essentially we loop over the values in [1,2,3,4], and print each one together with the value of that moment
# note: .moment(m) returns the raw (non-central) moment E(x^m), not the central moment
print([(m,x.moment(m)) for m in [1,2,3,4]])

print("\n95%% confidence interval: (%f,%f)\n" % x.interval(0.95)) 
# note that % is used to insert values into a string for printing, so to print the literal "%", 
# we put it 2x to tell python to not try to interpret it as a value

print(x.cdf(0),x.pdf(0)) # print cdf, pdf values
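
Two details of the calls above are easy to miss. First, `moment` returns raw (non-central) moments, which only agree with central moments when the mean is zero. Second, `interval(0.95)` is the equal-tailed interval between the 2.5% and 97.5% quantiles. The following sketch illustrates both, using a hypothetical shifted normal `w` that does not appear elsewhere in these notes:

```python
from scipy.stats import distributions as iid

x = iid.norm()
w = iid.norm(loc=1, scale=1)    # hypothetical normal with mean 1, variance 1

raw_second = w.moment(2)        # E(w^2) = Var(w) + E(w)^2 = 1 + 1 = 2
central_second = w.var()        # E[(w - E(w))^2] = 1
print(raw_second, central_second)

# The interval endpoints coincide with the quantile function (ppf).
lo, hi = x.interval(0.95)
print(lo, hi)
print(x.ppf(0.025), x.ppf(0.975))
```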

Next, the discrete r.v., $\mathsf{s}$:



In [None]:
print("E(s) = %6.4f" % s.mean())

print("\nSome moments of s:")
print([(m,s.moment(m)) for m in [1,2,3,4]])
print("\n95%% confidence interval: (%f,%f)\n" % s.interval(0.95))

# Note! Not pdf, but pmf for discrete rv.
print(s.cdf(0),s.pmf(0))
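
To see what lacking a density means in practice, it may help to evaluate the cdf of the discrete variable on a few points. In this sketch (which rebuilds `s` so that it runs on its own) the cdf jumps at each point of $\{-1,0,1\}$ and is flat in between, while the pmf is zero away from those points:

```python
from scipy.stats import distributions as iid

s = iid.rv_discrete(values=((-1, 0, 1), (1/3., 1/2., 1/6.)))

# The cdf jumps at each support point and is flat in between.
for z in (-1.5, -1, -0.5, 0, 0.5, 1):
    print(z, s.cdf(z))

# The pmf is zero away from the support points.
print(s.pmf(0.5))
```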

If we want *realizations* of these random variables:



In [None]:
N=3
print(x.rvs(N)) # N realizations; returned as a np array. no longer random
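
Realizations of the discrete variable can be drawn the same way. The sketch below also passes a seeded generator through the `random_state` argument so that the draws are reproducible; the seed `0` and the sample sizes are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import distributions as iid

x = iid.norm()
s = iid.rv_discrete(values=((-1, 0, 1), (1/3., 1/2., 1/6.)))

rng = np.random.default_rng(0)   # the seed 0 is arbitrary

draws_x = x.rvs(size=5, random_state=rng)
draws_s = s.rvs(size=10000, random_state=rng)

# Empirical frequencies of s should be close to (1/3, 1/2, 1/6).
values, counts = np.unique(draws_s, return_counts=True)
print(draws_x)
print(dict(zip(values.tolist(), (counts / counts.sum()).round(3).tolist())))
```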

We'd like to be able to combine different random variables, say
by addition, yielding a new random variable.  For instance, we'd like
to be able to construct



In [None]:
y = x + s

But this fails.  Can you explain why?  What do you suppose the cdf of
$\mathsf{y}$ looks like?  Does it have a density, or does the addition of a
random variable that *lacks* a density ($\mathsf{s}$) to a random variable
that has one ($\mathsf{x}$) mess things up?
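
One way to get a feel for the question is to work with realizations instead of distribution objects. The sketch below first confirms that adding the two `scipy.stats` objects raises a `TypeError`, then adds independent draws of the two variables, which are ordinary `numpy` arrays. The seed and the sample size `N` are arbitrary, and independence of the two sets of draws is an assumption of this illustration.

```python
import numpy as np
from scipy.stats import distributions as iid

x = iid.norm()
s = iid.rv_discrete(values=((-1, 0, 1), (1/3., 1/2., 1/6.)))

# The distribution objects themselves do not support "+".
try:
    y = x + s
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)

# Realizations are plain numpy arrays, so they add without trouble.
rng = np.random.default_rng(0)
N = 100000
draws_y = x.rvs(size=N, random_state=rng) + s.rvs(size=N, random_state=rng)

print(draws_y.mean())          # should be near E(x) + E(s) = -1/6
print(np.mean(draws_y <= 0))   # empirical estimate of Pr(y <= 0)
```

Simulation of this kind only approximates the distribution of the sum; it does not by itself settle whether a density exists.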



In [None]:
# display(tag("+"))
# Code to convolve (add) a discrete random variable having a pmf and a continuous one having a pdf
# Exploits the scipy.stats base rv_continuous class.

class ConvolvedContinuousAndDiscrete(iid.rv_continuous):

    """Convolve (add) a continuous rv x and a discrete rv s.
       The result is itself a continuous rv, with its own cdf and pdf."""

    # first create a "constructor" that instantiates the class w/ inputs f and s and inherits from another class
    # note that being inside __init__ makes these inputs instance-specific!
    def __init__(self,f,s):
        self.continuous_rv = f
        self.discrete_rv = s
        super(ConvolvedContinuousAndDiscrete, self).__init__(name="ConvolvedContinuousAndDiscrete")
        
    # define other callable characteristics of the class     
    def _cdf(self,z):
        # probability-weighted sum of shifted cdfs of the continuous rv
        s = self.discrete_rv
        x = self.continuous_rv
        return sum(x.cdf(z-sk)*pk for sk,pk in zip(s.xk,s.pk))

    def _pdf(self,z):
        # probability-weighted sum of shifted pdfs of the continuous rv
        s = self.discrete_rv
        x = self.continuous_rv
        return sum(x.pdf(z-sk)*pk for sk,pk in zip(s.xk,s.pk))

       
# Create new convolved rv:
y = ConvolvedContinuousAndDiscrete(x,s)

In [None]:
# See how we can now call the methods defined above! Cool!
y.cdf(0)
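
The value returned by `y.cdf(0)` can be checked by hand. The following sketch does not use the class at all; it evaluates the weighted sum of shifted normal cdfs directly, with `F0` a name introduced here for illustration:

```python
from scipy.stats import distributions as iid

x = iid.norm()
Omega = (-1, 0, 1)
Pr = (1/3., 1/2., 1/6.)

# Pr(y <= 0) = sum_k Pr(x <= 0 - s_k) * Pr(s = s_k)
F0 = sum(x.cdf(0 - sk) * pk for sk, pk in zip(Omega, Pr))
print(F0)   # approximately 0.5569
```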

In [None]:
import plotly.graph_objects as go
import numpy as np

# initiate a list that starts at -4, ends at 4, and has 100 observations 
# (output of np.linspace() is a np array; cast to list type)
X = np.linspace(-4,4,100).tolist()

# plot of the pdf of our new distribution plotted at the points in the vector X
# here we create the vector of points for the pdf of y calculated at each point in X right in the plot structure,
# no need to define it beforehand!
fig = go.Figure(data=go.Scatter(x=X, y=[y.pdf(z) for z in X])) 
fig.show()

#### Exercise



Prove that $\mathsf{y}$ is continuous (in the sense that it has a density),
as suggested by the figure, *or* establish that the figure is
wrong or misleading.



#### Proof



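Let $F_x$ denote the cdf of $\mathsf{x}$, and let $\pi_s=\Pr(\mathsf{s}=s)$
for each $s\in\Omega$.  We want to establish that the cdf
of $\mathsf{y}$, say $F_y(y)=\Pr(\mathsf{y}\leq y)$, is a continuously differentiable
function of $y$.  We use the fact that $\mathsf{x}$ and $\mathsf{s}$ are
independent, so that the distribution of $\mathsf{y}$ is a
convolution of the distributions of $\mathsf{x}$ and $\mathsf{s}$, and

\begin{equation} 
\begin{split}
    \Pr(\mathsf{y}\leq y) &= \Pr(\mathsf{s} + \mathsf{x}\leq y ) \\
                    &= \sum_{s\in\Omega}\Pr(\mathsf{x}\leq y-s\mid \mathsf{s}=s)\pi_s\\
                    &= \sum_{s\in\Omega}F_x(y-s)\pi_s,
\end{split}
\end{equation}

where the last equality uses independence.  Because $F_x$ (here the
standard normal cdf) is continuously differentiable and $\Omega$ is
finite, this sum is continuously differentiable in $y$, as required.
Its derivative, $f_y(y)=\sum_{s\in\Omega}f_x(y-s)\pi_s$ with $f_x=F_x'$,
is the density of $\mathsf{y}$.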
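
The argument can also be checked numerically. The sketch below is an illustration only, not part of the proof: it compares a central finite-difference derivative of the mixture cdf with the corresponding mixture of normal densities. The helper names `F_y` and `f_y`, the grid, and the step size `h` are choices made here for illustration.

```python
import numpy as np
from scipy.stats import distributions as iid

x = iid.norm()
Omega = (-1, 0, 1)
Pr = (1/3., 1/2., 1/6.)

def F_y(y):
    # cdf of y = x + s: probability-weighted sum of shifted normal cdfs
    return sum(x.cdf(y - sk) * pk for sk, pk in zip(Omega, Pr))

def f_y(y):
    # candidate density: probability-weighted sum of shifted normal pdfs
    return sum(x.pdf(y - sk) * pk for sk, pk in zip(Omega, Pr))

grid = np.linspace(-4, 4, 81)
h = 1e-5

# Central finite difference approximation to the derivative of the cdf.
numerical_derivative = (F_y(grid + h) - F_y(grid - h)) / (2 * h)

max_gap = np.max(np.abs(numerical_derivative - f_y(grid)))
print(max_gap)
```

A small value of `max_gap` is consistent with the claim that the derivative of the cdf is the weighted sum of shifted densities.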

