<table align="left" style="border-style: hidden" class="table"> <tr><td class="col-md-2"><img style="float" src="../logo.png" alt="Data 88S Logo" style="width: 120px;"/></td><td><div align="left"><h3 style="margin-top: 0;">Probability for Data Science</h3><h4 style="margin-top: 20px;">UC Berkeley, Spring 2024</h4><p>Shobhana Stoyanov</p>CC BY-NC 4.0</div></td></tr></table><!-- not in pdf -->

This content is protected and may not be shared, uploaded, or distributed.

In [None]:
from datascience import *
import numpy as np
from scipy import stats

# Homework 3 #

### Instructions

In all Data 88S homework, you will be providing written answers, not code. We encourage you to hand-write your answers. It is your responsibility to ensure that the homework is submitted completely and properly to Gradescope. **Make sure to assign each page of your pdf to the correct question.** Refer to the bottom of the notebook for submission instructions.

In some of the problems below, you have to do a numerical calculation in a code cell. All the libraries you might need have been imported for you. Run the import cell at the top before you start the homework.

**Note about terminology:** Once and for all: If a question says "$k$ successes" it means exactly $k$ successes.

## 1. Balls in Boxes ##

Students in probability classes sometimes wonder why probabilists seem to have a peculiar fascination with balls being thrown at random into boxes or drawn at random from urns. It's not just because probabilists are weird. It's because the image of balls and boxes is one unified way of thinking about repeated trials in diverse settings.

For example, balls thrown at random into boxes independently of each other is a way of visualizing independent repeated trials that have equally likely outcomes. 

- Tossing a coin 10 times and keeping track of heads and tails is like throwing 10 balls independently at random into two boxes labeled H and T. The number of heads is the number of balls that fall in Box H.
- Rolling a die 15 times and keeping track of the faces that appear is like throwing 15 balls independently at random into six boxes labeled $1, 2, 3, 4, 5, 6$.
- Sampling $n$ times at random with replacement from a population of $N$ individuals is like throwing $n$ balls independently at random into $N$ boxes labeled $1, 2, \ldots, N$. 
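
The balls-and-boxes picture can be sketched in code. The following is a minimal illustration, not part of the assignment: the function name `throw_balls` and the seed are assumptions made up for this sketch. It throws balls uniformly at random into labeled boxes and counts how many land in each.

```python
import numpy as np

def throw_balls(n_balls, box_labels, rng):
    """Throw n_balls independently and uniformly at random into labeled boxes.

    Returns a dict mapping each box label to the number of balls in that box.
    (Hypothetical helper, written only for this illustration.)
    """
    # Each ball picks one box index uniformly at random, independently of the others.
    landings = rng.integers(0, len(box_labels), size=n_balls)
    counts = np.bincount(landings, minlength=len(box_labels))
    return {label: int(count) for label, count in zip(box_labels, counts)}

rng = np.random.default_rng(88)  # the seed is arbitrary

# Tossing a coin 10 times: 10 balls into two boxes labeled H and T.
coin = throw_balls(10, ["H", "T"], rng)
print(coin)

# Rolling a die 15 times: 15 balls into six boxes labeled 1 through 6.
die = throw_balls(15, [1, 2, 3, 4, 5, 6], rng)
print(die)
```

In the coin example, the count in box H plays the role of the number of heads. The counts always add up to the number of balls thrown.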


**a)** A random number generator that draws digits at random with replacement from $\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ is run three times. Find the chance that the digits drawn can form the number 510, by rearrangement if necessary. 

Start by filling in the blanks in the following sentence; the first two blanks should be filled with integers. "This experiment is like throwing $\underline{~~~~~~~~~~}$ balls independently at random into $\underline{~~~~~~~~~~}$ boxes labeled $\underline{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}$." Then imagine where the balls must land to make the event occur. 

**b)** The mayor of Bogotá, Colombia, implemented a traffic regulation scheme called [Pico y Placa](https://en.wikipedia.org/wiki/Pico_y_placa) in 1998 to help decrease traffic congestion during rush hour in the city. The original scheme was a bit complicated for our purposes, so let's take a simplified model of this idea. Suppose that all cars in Bogotá only have license plates that end in numbers 1 through 7, and cars with license plate numbers ending in:
- 1 can only drive on Mondays
- 2 can only drive on Tuesdays
- 3 can only drive on Wednesdays
- 4 can only drive on Thursdays
- 5 can only drive on Fridays
- 6 can only drive on Saturdays
- 7 can only drive on Sundays.

Also assume that the last digit of each car's license plate number is assigned by the government at random, independently from all other cars.
Find the chance that five cars picked at random in the city are all allowed to drive on different days.

Start by filling in the blanks in the following sentence; the first two blanks should be filled with integers. "This experiment is like throwing $\underline{~~~~~~~~~~}$ balls independently at random into $\underline{~~~~~~~~~~}$ boxes labeled $\underline{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}$." Then imagine where the balls must land to make the event occur. 

Then explicitly find the chance.

**c)** A roulette wheel is split into 38 equally-sized arcs. 18 are red, 18 are black, and 2 are green. 

The wheel is spun three times, and the results of the spins are independent of each other. Using the scheme developed in parts **a)** and **b)**, find the chance that the wheel never lands on black.

**d)** In part **c)**, are the events "the wheel never lands on black" and "the wheel never lands on red" independent? Explain your answer.


\newpage

## 2. Faces of a Tetrahedron ##

A tetrahedron is a regular four-sided figure, like a four-sided die. Assume that when I roll a tetrahedron, each of the four sides is equally likely to appear, independent of all other rolls. 

Suppose the sides of the tetrahedron are labeled 1, 2, 3, and 4. Define two random variables:

- $X$ is the label that I see on the first roll
- $D$ is the number of times I roll till I see a label I have seen before

For example, if the first few labels are 3, 4, 1, 3, 1, 2, 1 then $X = 3$ and $D = 4$.
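
To make the two definitions concrete, here is a small sketch that reads $X$ and $D$ off a given sequence of rolls. The function name `first_label_and_repeat_time` is an assumption made up for this illustration, and the sketch assumes the sequence is non-empty. It only evaluates the definitions on one sequence; it does not compute any probabilities.

```python
def first_label_and_repeat_time(rolls):
    """Return (X, D) for a non-empty sequence of rolls.

    X is the label on the first roll.
    D is the number of rolls made up to and including the first roll
    that shows a label already seen; it is None if no label repeats.
    (Hypothetical helper, written only for this illustration.)
    """
    seen = set()
    for roll_number, label in enumerate(rolls, start=1):
        if label in seen:
            return rolls[0], roll_number
        seen.add(label)
    return rolls[0], None

# The sequence from the example above.
x, d = first_label_and_repeat_time([3, 4, 1, 3, 1, 2, 1])
print(x, d)  # 3 4
```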

**a)** What is the distribution of $X$?

**b)** What are the possible values of $D$?

**c)** Find the distribution of $D$. Remember to explain your answer, and write arithmetic expressions for the probabilities involved. 

**d)** Find the decimal values of the probabilities in part **c)**. What should their sum be? Check this numerically using a calculator or the code cell below.

**e)** What is the most likely value of $D$, and what is its probability?

**f)** Find $P(X = 2, D = 3)$. Write it as an arithmetic expression and find its decimal value.

In [None]:
# Calculations in Exercise 2

...

\newpage

## 3. Counting Cards ##

A 5-card poker hand is dealt from a standard deck of cards (with 4 suits, and 13 kinds for each suit: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A). How many distinct ways are there to form a hand with two pairs? 

A hand consists of five cards. A hand with two pairs has 2 cards of one kind, 2 cards of another kind, and a fifth card of a third kind. An example of a hand with two pairs is a King of $\heartsuit$, a King of $\spadesuit$, a 5 of $\clubsuit$, a 5 of $\heartsuit$, and a Jack of $\heartsuit$.
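
The definition of a hand with two pairs can be written as a check on the counts of each kind. This is a minimal sketch under assumptions made up for the illustration: a card is represented as a `(kind, suit)` tuple and the function is named `is_two_pairs`. It tests a single hand and does not do the counting asked for in the exercise.

```python
from collections import Counter

def is_two_pairs(hand):
    """Check whether a hand is five distinct cards forming exactly two pairs.

    `hand` is a list of (kind, suit) tuples.
    (Hypothetical helper and card representation, for illustration only.)
    """
    if len(set(hand)) != 5:
        return False  # a hand must consist of five distinct cards
    kind_counts = Counter(kind for kind, suit in hand)
    # Two pairs: two kinds appear twice each and a third kind appears once.
    return sorted(kind_counts.values()) == [1, 2, 2]

# The example hand described above.
example = [("K", "hearts"), ("K", "spades"), ("5", "clubs"),
           ("5", "hearts"), ("J", "hearts")]
print(is_two_pairs(example))  # True

# Three of one kind and two of another is not a hand with two pairs.
full_house = [("K", "hearts"), ("K", "spades"), ("K", "clubs"),
              ("5", "hearts"), ("5", "clubs")]
print(is_two_pairs(full_house))  # False
```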

\newpage

## 4. Detecting Breast Cancer ##

A 2011 [article](https://ebm.bmj.com/content/16/6/163) in the British Medical Journal attempts to elucidate Bayes' Rule for the medical profession. It's well worth reading and has some illuminating graphics. In this exercise you will confirm a result stated in the article. Useful terminology:

- The *sensitivity* of a test for a medical condition is the proportion of correctly diagnosed patients among those who have the condition.
- The *specificity* of the test is the proportion of correctly diagnosed patients among those who do not have the condition.

Read those definitions a couple of times and note that a good test should have high values of both sensitivity and specificity.
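
As a quick illustration of the two definitions, the sketch below computes sensitivity and specificity from a table of test outcomes. The counts are hypothetical, invented only for this example, and are not taken from the article; the function names are also assumptions.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of correctly diagnosed patients among those with the condition."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of correctly diagnosed patients among those without the condition."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts, invented for illustration only.
# 50 patients have the condition: 45 test positive, 5 test negative.
# 200 patients do not have it: 180 test negative, 20 test positive.
sens = sensitivity(true_positives=45, false_negatives=5)
spec = specificity(true_negatives=180, false_positives=20)
print(sens, spec)
```

Note that each proportion has a different denominator: sensitivity conditions on having the condition, while specificity conditions on not having it.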

In the section Special Cases, the authors consider a 45-year-old woman who has a 1% chance of getting breast cancer in the subsequent five years. The article says, "The sensitivity of routine screening mammography ranges from 71% to 96% and the specificity ranges from 94% to 97%. Using values of 80% for sensitivity and 96% for specificity, a positive test increases the probability to 17%."

For this problem, show your work in finding each probability and box the final decimal answer.

**a)** Use the same values as in the authors' calculation and confirm their result that "a positive test increases the probability to 17%."

**b)** Find all of the following; percents are fine too:

(i) the chance that the woman won't get breast cancer in the subsequent five years

(ii) the chance that the woman won't get breast cancer in the subsequent five years, if her test result is negative

(iii) the chance that the woman won't get breast cancer in the subsequent five years, if her test result is positive

(iv) the chance that the woman's test result is negative


**c)** Suppose a woman went to get tested because she believed she had a 10% chance of getting breast cancer in the subsequent five years. What would be her probability of getting the disease, based on a positive test result? Use a sensitivity value of 80% and a specificity value of 96%.

## Submission Instructions ##

Please follow the directions below to submit your work properly.

### Written Work ###
* Scan all the pages into a PDF. You can use any scanner or a phone with a scanning application such as CamScanner. Please **DO NOT** simply take pictures using your phone.
* Please start a new page for each question. If you have already written multiple questions on the same page, you can crop the image in CamScanner or fold your page over (the old-fashioned way). This helps expedite grading.
* It is your responsibility to check that all the work on all the scanned pages is legible.
    
### Submitting ###
* Submit the assignment to Homework 3 on Gradescope. 
* **Make sure to assign each page of your pdf to the correct question.**
* **It is your responsibility to verify that all of your work shows up in your final PDF submission.**

If you have questions about scanning or uploading your work, please post a follow-up to the Ed thread on [Gradescope Assignment Submission](https://edstem.org/us/courses/53099/discussion/4133183).