<a href="https://colab.research.google.com/github/kevindavisross/stat305-f20/blob/master/HW03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stat 305 HW3, Part II

**TYPE YOUR NAME(S) HERE.** I encourage you to work in a team of up to 3 people; submit one per team.

## Introduction

This Colab/Jupyter notebook provides a template for you to fill in. **Read the notebook from start to finish, completing the parts as indicated.**  To run a cell, make sure the cell is highlighted by clicking on it, then press SHIFT + ENTER on your keyboard.  (Alternatively, you can click the "play" button.)

Some code has already been provided.  Be sure to run this code and view the output to understand what it does.  In other parts, you will need to provide the code yourself; use the examples in the textbook as a guide. 

You will use the Python package [Symbulate](https://github.com/dlsun/symbulate).  A few specific links to the documentation are provided below.  **You should use only Symbulate commands, not general Python code.**

Be sure to run the following lines first.

In [None]:
# If in Colab, run this cell to install Symbulate (comment out the line below if Symbulate is already installed)
!pip install symbulate



In [None]:
from symbulate import *
%matplotlib inline

## A few notes about Colab/Jupyter notebooks

- Each cell contains either code or text (Markdown).
- You can run an individual cell by clicking on it and pressing SHIFT + ENTER or by clicking the play button.  Any output from the cell will be displayed underneath it.
- Cells are evaluated in the order in which you run them.  After a cell is run, a number such as `[xx]` appears to its left, indicating the order in which the cells were run.
- When you select "Run all", cells will be evaluated in sequence starting with the first cell at the top of the notebook.
- You can use objects evaluated in one cell in other cells.  Just keep in mind that cells are evaluated in the order you run them.  So if you call something `x` in one cell, but redefine `x` as something else in another cell, it's essential that you evaluate the cells in the proper order.
- Any plots should be displayed automatically (as long as you run the `%matplotlib inline` command).
- While all the code in a cell is evaluated, only the output of the last line of code in a cell is displayed automatically (aside from plots).  If you want to display multiple pieces of output, you can either use `print` or just add cells and put each line in a separate cell. 
- You can add cells with the Insert menu or with the + sign on the menu.


In this HW you will:

- Define and simulate values of random variables
- Display simulated values in plots to visualize distributions, in particular, **joint distributions**
- Take a closer look at covariance and correlation, numbers that measure the association between two random variables
- Investigate transformations of random variables

Hints for thinking about and sketching distributions:

1. Figure out an appropriate type of plot to sketch (Is the variable discrete or continuous?  One variable or two? Joint or marginal distribution?)
1. Consider just one possible example value.  Or a few possible values.  Remember to consider *pairs* of values for a joint distribution.
1. Determine all possible values and label the plot axes. Remember to consider possible *pairs* of values for a joint distribution.
1. Then start to think about which values will be more/less likely than others.  This step is the hardest, but just doing the first three steps gets you a long way toward the answer.

## 1) Two continuous random variables

Spin the Uniform(0, 1) spinner twice and let $U_1$ be the result of the first spin and $U_2$ the result of the second.  Define $X=-\log(U_1)$ and $Y=(2U_2-1)X$.

**Note:** This scenario also appears on the written homework.  Do that part first, sketching the plots by hand, before doing this part.

Define an appropriate probability space and RVs.  (Hint: remember `** 2` for "spin twice".)

In [None]:
# Enter your Symbulate code here

Simulate many $(X, Y)$ pairs, store the results, and summarize the simulated values in a plot.  Try a scatterplot first: the scale of the values in this problem tends to obscure the histogram and density plots.  Once you have the scatterplot, you can try other plots.

In [None]:
# Enter your Symbulate code here

Create a plot displaying both joint and marginal distributions.  Hint: add `'marginal'` to whatever plot you created above, e.g., `.plot(['scatter', 'marginal'])` or `.plot(['hist', 'marginal'])`

In [None]:
# Enter your Symbulate code here

Approximate $Cov(X, Y)$ and $Corr(X, Y)$.

In [None]:
# Enter your Symbulate code here

You already sketched these plots in the written HW.  Does the simulation match your sketches?  Write a few sentences describing your simulation results.  Consider:
    
- Can you explain why the plots look the way they do?
- Can you suggest an expression for the joint pdf? Would you say that the joint distribution is uniform over the range of possible values?
- Can you suggest expressions for the marginal pdfs?
- Can you explain the value of correlation?

**TYPE YOUR RESPONSE HERE.**

## 2) Standardization (Mostly just read this and run; this will preview some things we'll discuss in week 4.)


Assume that SAT Math ($X$) and Reading ($Y$) scores follow a Bivariate Normal distribution with mean 500 for each score, SD 100 for each score, and a correlation of 0.8.
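Correlation equals covariance divided by the product of the SDs. So the stated parameters already determine the covariance that the simulation below should approximate. This is a worked check using only the values given above:

$$
Cov(X, Y) = Corr(X, Y)\, SD(X)\, SD(Y) = 0.8 \times 100 \times 100 = 8000
$$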

In [None]:
P = BivariateNormal(mean1 = 500, sd1 = 100, mean2 = 500, sd2 = 100, corr = 0.8)
X, Y = RV(P)

Simulate many $(X, Y)$ pairs, summarize in a plot, and use the simulated results to estimate $\text{Cov}(X, Y)$.

In [None]:
xy = (X & Y).sim(10000)
xy.plot(['hist', 'marginal'])

In [None]:
xy.cov(), xy.corr()

`xy` stores (x, y) pairs.  The following just picks off the x values and y values separately.

In [None]:
x = xy[0]
y = xy[1]

We can compute $SD(X), SD(Y)$ and see that $Corr(X, Y) = \frac{Cov(X, Y)}{SD(X)SD(Y)}$.
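As an optional cross-check outside the Symbulate workflow, here is a minimal sketch of the same computation. It uses plain NumPy, which is for illustration only and should not be used in your answers. It assumes NumPy's `default_rng` generator with an arbitrary seed. It simulates the SAT pairs, computes the covariance as the average product of deviations from the means, and divides by the product of the SDs.

```python
import numpy as np

# Plain-NumPy sketch (not Symbulate): simulate SAT Math/Reading pairs from a
# Bivariate Normal with means 500, SDs 100, and correlation 0.8.
rng = np.random.default_rng(305)  # arbitrary seed, chosen for reproducibility
mean = [500, 500]
cov_matrix = [[100**2, 0.8 * 100 * 100],
              [0.8 * 100 * 100, 100**2]]
xy = rng.multivariate_normal(mean, cov_matrix, size=10000)
x, y = xy[:, 0], xy[:, 1]  # first column = Math, second column = Reading

# Covariance as the average product of deviations from the means
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Dividing by the product of the SDs gives the correlation
corr_xy = cov_xy / (x.std() * y.std())

print(cov_xy, corr_xy)
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in correlation, for comparison
```

The covariance and both SDs above use the same divisor (the number of pairs), so that divisor cancels in the ratio. As a result, `corr_xy` matches `np.corrcoef` up to floating-point error.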

In [None]:
x.sd(), y.sd()

In [None]:
xy.cov() / (x.sd() * y.sd())

A standardized RV is defined as $\frac{X-E(X)}{SD(X)}$.
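For example, take $E(X) = 500$ and $SD(X) = 100$ as in the SAT setup. A hypothetical Math score of 650 standardizes to 1.5, that is, 1.5 SDs above the mean:

$$
\frac{650 - 500}{100} = 1.5
$$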

**Thought questions:** Will standardizing a RV change the *shape* of its distribution?  What will the expected value and SD of a standardized RV be?

A standardized RV has expected value 0 and standard deviation 1.

In [None]:
zx = (x - x.mean()) / x.sd()
zx.plot()
zx.mean(), zx.sd()

In [None]:
zy = (y - y.mean()) / y.sd()
zy.plot()
zy.mean(), zy.sd()

Correlation is the covariance between the standardized versions of the RVs.

$$
Corr(X, Y) = Cov\left(\frac{X-E(X)}{SD(X)}, \frac{Y-E(Y)}{SD(Y)}\right)
$$
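The identity follows from two properties of covariance. First, adding constants does not change covariance. Second, $Cov(aX, bY) = ab\,Cov(X, Y)$ for constants $a$ and $b$. Applying them with $a = 1/SD(X)$ and $b = 1/SD(Y)$ gives:

$$
Cov\left(\frac{X-E(X)}{SD(X)}, \frac{Y-E(Y)}{SD(Y)}\right) = \frac{Cov\left(X-E(X),\, Y-E(Y)\right)}{SD(X)SD(Y)} = \frac{Cov(X, Y)}{SD(X)SD(Y)} = Corr(X, Y)
$$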

In [None]:
(zx * zy).mean() - (zx.mean() * zy.mean())
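The same identity can be checked numerically outside Symbulate. The sketch below is plain NumPy rather than Symbulate, so it is for illustration only and not for your answers. It assumes NumPy's `default_rng` generator with an arbitrary seed. Like the cell above, it standardizes each coordinate and then computes the covariance of the standardized values as $E(Z_X Z_Y) - E(Z_X)E(Z_Y)$.

```python
import numpy as np

# Plain-NumPy sketch (not Symbulate): simulate the SAT pairs, then standardize.
rng = np.random.default_rng(305)  # arbitrary seed
xy = rng.multivariate_normal([500, 500], [[10000, 8000], [8000, 10000]], size=10000)
x, y = xy[:, 0], xy[:, 1]

zx = (x - x.mean()) / x.std()  # standardized x values: mean 0, SD 1
zy = (y - y.mean()) / y.std()  # standardized y values: mean 0, SD 1

# Covariance of the standardized values: E(Z_X Z_Y) - E(Z_X) E(Z_Y)
cov_z = (zx * zy).mean() - zx.mean() * zy.mean()

print(cov_z)                    # covariance of the standardized values
print(np.corrcoef(x, y)[0, 1])  # correlation of the original values
```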

You can also use the `.standardize()` method.

In [None]:
xy.standardize().plot(['hist', 'marginal'])

In [None]:
xy.standardize().cov(), xy.corr()

## 3) Two more continuous random variables

Spin the Uniform(0, 1) spinner twice and let $U_1$ be the result of the first spin and $U_2$ the result of the second.  Define $X=-\log(U_1)$ and $Y=X + U_2$.

First, on your own:
- Sketch a plot of the joint distribution of $X$ and $Y$ and determine if the covariance will be positive, negative, or zero.
- Sketch the marginal distributions.


Define appropriate random variables and use simulation to approximate the joint distribution of $X$ and $Y$ and display it in a plot.

In [None]:
# Enter your Symbulate code here

Approximate the covariance and correlation of $X$ and $Y$.

In [None]:
# Enter your Symbulate code here

Use simulation to approximate the marginal distribution of $Y$ and its mean and standard deviation, and plot the distribution.

In [None]:
# Enter your Symbulate code here

Write a few sentences describing your simulation results.  Consider:
    
- Can you explain why the plots look the way they do?
- Can you suggest an expression for the joint pdf? Would you say that the joint distribution is uniform over the range of possible values?
- Can you suggest expressions for the marginal pdfs?
- Can you explain the value of correlation?

**TYPE YOUR RESPONSE HERE.**

## 4)

Reflection: Write a paragraph, or some bullet points, describing what you learned from this lab.

**TYPE YOUR RESPONSE HERE.**

## Submission Instructions

- After you have completed the notebook, select **Runtime > Run all**
- After the notebook finishes rerunning, check to make sure that there are no errors and that everything runs properly.  Fix any problems and repeat this step until it works.
    - Careful: there is a bug that sometimes causes 2-d histograms to throw errors.  The plot itself displays fine; it just triggers an error for some reason.  Unfortunately, that error might stop your notebook from running.  So if you use 2-d histograms, pay special attention to this step.
- Make sure you typed the names of any partners at the top of the notebook where it says "Type your names here".
- Click Share in the top right and share with stat305cp@gmail.com 
- Save a PDF version: File > Print > Save as PDF
- Download the notebook: File > Download .ipynb
- Submit the notebook and PDF in Canvas.  Remember, only one submission per team.  (Either partner can submit; put the names of the partners in the comments.)