In [None]:
library(ggplot2)
library(dplyr)

**(1)** Simulate $10^6$ samples of the binomial random variable $$X\sim\text{Bin}(n = 12, p = 0.2)$$ and plot a histogram of the simulated values (also known as realizations of the random variable $X$).

In [None]:
n = 12
p = 0.2
samplesize = 1e06
simulatedData = rbinom(samplesize, n, p)
dfBinom = as.data.frame(simulatedData)
colnames(dfBinom) = c('Output')
p1 = ggplot(data = dfBinom, aes(x = factor(Output))) +
  geom_bar(stat = 'count', width = 0.7, fill = 'steelblue') +
  theme(axis.text = element_text(size = 12),
        axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        axis.title = element_text(size = 14, face = "bold")) +
  labs(x = 'j',
       y = 'Count',
       title = sprintf('Mean = %f, Median = %f, Variance = %f', mean(simulatedData), median(simulatedData), var(simulatedData)))
p1 

**(2)** Plot the Probability Mass Function (PMF) of the binomial random variable $$X\sim\text{Bin}(n=12, p =0.2).$$

$$\underbrace{P_X}_\text{function name}\left(\underbrace{j}_{\text{input}}\right) = \underbrace{P(X=j)}_\text{output}.$$


In [None]:
n = 12
p = 0.2
j = c(0:n)
dfBinom = as.data.frame(cbind(j, dbinom(j, n, p)))
colnames(dfBinom) = c('j', 'Prob')
p2 = ggplot(data = dfBinom, aes(x = j, y = Prob)) +
  geom_point(size = 5) +
  scale_x_continuous(breaks = seq(0, n, by = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        axis.title = element_text(size = 14, face = "bold"))
p2

**(3)** Plot the Cumulative Distribution Function (CDF) of the binomial random variable $$X\sim\text{Bin}(n=12, p =0.2).$$

$$\underbrace{F_X}_\text{function name}\left(\underbrace{j}_{\text{input}}\right) = \underbrace{P(X\leq j)}_\text{output}.$$


In [None]:
# Plot the Cumulative Distribution Function (CDF) of a
# binomial random variable with parameters n = 12 and p = 0.2
# Assign the parameters and compute the cumulative probabilities
n = 12
p = 0.2
j = 0:n
df = as.data.frame(cbind(j, pbinom(j, n, p)))
colnames(df) = c('j', 'CProb')
p3 = ggplot(data = df, aes(x = j, y = CProb)) +
  geom_point(size = 5) +
  scale_x_continuous(breaks = seq(0, n, by = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        axis.title = element_text(size = 14, face = "bold"))
p3
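
As a quick consistency check between (2) and (3), the CDF at $j$ should equal the running sum of the PMF up to $j$. The snippet below is a minimal sketch of that check; the names `pmf`, `cdfFromPmf`, and `cdfDirect` are introduced here for illustration only.

```r
n = 12
p = 0.2
j = 0:n

pmf = dbinom(j, n, p)          # P(X = j) for j = 0, ..., n
cdfFromPmf = cumsum(pmf)       # running sum of the PMF
cdfDirect = pbinom(j, n, p)    # P(X <= j) computed directly

# TRUE when the two versions agree up to floating-point tolerance
isTRUE(all.equal(cdfFromPmf, cdfDirect))
```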

**(4)** Simulate $10^6$ samples of the negative binomial random variable $$X\sim\text{NegBin}(r = 3, p = 0.2)$$ and plot a histogram of the simulated values (also known as realizations of the random variable $X$).
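
One possible approach is sketched below. It assumes that $X$ counts the total number of trials up to and including the $r$-th success, which matches the `dnbinom(j-r, r, p)` call used in exercise (7). R's `rnbinom()` returns the number of failures before the $r$-th success, so `r` is added to each draw. The names `dfNegBinom` and `p4` are chosen here for illustration.

```r
library(ggplot2)

r = 3              # number of successes required
p = 0.2            # success probability on each trial
samplesize = 1e06

# rnbinom() counts failures before the r-th success;
# adding r converts this to the total number of trials (assumed convention)
simulatedData = rnbinom(samplesize, size = r, prob = p) + r
dfNegBinom = data.frame(Output = simulatedData)

p4 = ggplot(data = dfNegBinom, aes(x = Output)) +
  geom_bar(fill = 'steelblue') +
  labs(x = 'j',
       y = 'Count',
       title = sprintf('Mean = %f, Variance = %f',
                       mean(simulatedData), var(simulatedData)))
p4
```

Under this convention the sample mean and variance should be close to $r/p = 15$ and $r(1-p)/p^2 = 60$.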

**(5)** Plot the Probability Mass Function (PMF) of the negative binomial random variable $$X\sim\text{NegBin}(r=3, p =0.2).$$

$$\underbrace{P_X}_\text{function name}\left(\underbrace{j}_{\text{input}}\right) = \underbrace{P(X=j)}_\text{output}.$$
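
A minimal sketch of the PMF plot follows, again assuming $X$ is the total number of trials needed for the $r$-th success (so $X \geq r$). Because the support is unbounded, the upper plotting limit of 60 is an arbitrary cutoff chosen for this example; `dfNegBinom` and `pNegPMF` are illustrative names.

```r
library(ggplot2)

r = 3
p = 0.2
j = r:60   # support starts at r; 60 is an arbitrary plotting cutoff

# dnbinom() is parameterized by the number of failures, hence j - r
dfNegBinom = data.frame(j = j, Prob = dnbinom(j - r, size = r, prob = p))

pNegPMF = ggplot(data = dfNegBinom, aes(x = j, y = Prob)) +
  geom_point(size = 2) +
  labs(x = 'j', y = 'P(X = j)')
pNegPMF
```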


**(6)** Plot the Cumulative Distribution Function (CDF) of the negative binomial random variable $$X\sim\text{NegBin}(r=3, p =0.2).$$

$$\underbrace{F_X}_\text{function name}\left(\underbrace{j}_{\text{input}}\right) = \underbrace{P(X\leq j)}_\text{output}.$$
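
The CDF can be sketched the same way with `pnbinom()`, under the same assumption that $X$ counts total trials. The cutoff of 60 and the names `dfNegBinomCDF` and `pNegCDF` are choices made for this example.

```r
library(ggplot2)

r = 3
p = 0.2
j = r:60   # 60 is an arbitrary plotting cutoff

# pnbinom() takes the number of failures, so P(X <= j) = pnbinom(j - r, r, p)
dfNegBinomCDF = data.frame(j = j, CProb = pnbinom(j - r, size = r, prob = p))

pNegCDF = ggplot(data = dfNegBinomCDF, aes(x = j, y = CProb)) +
  geom_point(size = 2) +
  labs(x = 'j', y = 'P(X <= j)')
pNegCDF
```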

**(7)** An oil company has a $20\%$ chance of striking oil when drilling a well. What is the probability the company drills $7$ wells to strike oil $3$ times?

In [None]:
# Theoretical probability
r = ?
p = ?
j = ?
?(?, ?, ?)

# Monte-Carlo approximation
samplesize = 1e06
mean(?)

# PMF highlighting the value of interest
j = c(?:?)
dfnBinom = as.data.frame(cbind(j, dnbinom(j-r, r, p)))
colnames(dfnBinom) = c('j', 'Prob')
dfnBinom = dfnBinom %>% mutate(Failures = ifelse(j == ?, ?, "other"))
p5 = ggplot(data = dfnBinom, aes(x = factor(j), y = Prob, fill = Failures)) +
  geom_col() +
  geom_text(
    aes(label = round(Prob,2), y = Prob + 0.001),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  theme(axis.text = element_text(size = 12),
        axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        axis.title = element_text(size = 14, face = "bold"))
p5
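
One way to complete the first two parts of this exercise is sketched below. It reads the question as "the third strike occurs on the 7th well", so that $X\sim\text{NegBin}(r = 3, p = 0.2)$ counts total wells drilled and the target is $P(X = 7)$; that reading is an assumption. The names `theoretical`, `simulatedTrials`, and `approximate` are illustrative.

```r
r = 3      # number of oil strikes required
p = 0.2    # probability of striking oil at each well
j = 7      # total number of wells drilled

# Theoretical probability: j - r failures before the r-th success
theoretical = dnbinom(j - r, size = r, prob = p)

# Monte-Carlo approximation: fraction of simulated runs needing exactly j wells
samplesize = 1e06
simulatedTrials = rnbinom(samplesize, size = r, prob = p) + r
approximate = mean(simulatedTrials == j)

c(theoretical = theoretical, approximate = approximate)
```

The theoretical value equals $\binom{6}{2}(0.2)^3(0.8)^4 = 0.049152$, and the Monte-Carlo estimate should agree with it to about three decimal places.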

**(8)** A person conducting telephone surveys must get 4 more completed surveys before their job is finished. On each randomly dialed number, there is a 90% chance of the participant rejecting the call. What is the probability that the person will finish their job at the 10th call?
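
A possible setup, sketched under the assumption that a "success" is a completed survey (probability $1 - 0.9 = 0.1$ per call) and that calls are independent: the job finishes at the 10th call exactly when the 4th success occurs on trial 10. The variable names are illustrative.

```r
r = 4          # completed surveys still needed
p = 1 - 0.9    # probability that a call results in a completed survey
j = 10         # call on which the job should finish

# P(4th completed survey occurs on the 10th call): 6 rejections before the 4th success
probFinish = dnbinom(j - r, size = r, prob = p)
probFinish
```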

**(9)** A deck of cards contains 20 cards: 6 red cards and 14 black cards. 5 cards are drawn randomly *without replacement*. What is the probability that exactly 4 red cards are drawn?

$$X\sim\text{HypGeom}(n_s = 6, n_f = 14, n = 5)$$

$$P(X=4) = \frac{\binom{n_s}{4}\binom{n_f}{5-4}}{\binom{n_s+n_f}{5}}$$

In [None]:
n = 5
ns = 6
nf = 14
dhyper(4, ns, nf, n)

**(10)** A deck of cards contains 20 cards: 6 red cards and 14 black cards. 5 cards are drawn randomly *with replacement*. What is the probability that exactly 4 red cards are drawn?

$$X\sim\text{Bin}(n = 5, p = 6/20)$$

In [None]:
n = 5
ns = 6
nf = 14
p = ns/(ns+nf)
dbinom(4, n, p)

**(11)** A small voting district has 1010 female voters and 950 male voters. A random sample of 10 voters is drawn. What is the probability exactly 5 of the voters will be female?

$$X\sim\text{HypGeom}(n_s=1010, n_f = 950, n = 10)$$

In [None]:
dhyper(5, 1010, 950, 10) # P(X=5), exact (sampling without replacement)
dbinom(5, 10, 1010/(950+1010)) # P(X=5), binomial approximation

**(12)** There are 40,000 gates on an integrated circuit (IC) chip. If the probability of a gate being defective is 1/100,000, independently of all other gates, what is the probability that exactly 10 gates are defective?

In [None]:
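n = 40000
p = 1/100000
# Gates are defective independently, so the number of defective gates is Bin(n, p)
dbinom(10, n, p)
# p alone does not fix a finite population for a hypergeometric model:
# (ns = 1, nf = 100000-1) and (ns = 2, nf = 2*(100000-1)) both give p = 1/100000.
# With ns = 1 the call below evaluates to 0, since at most 1 defective can be drawn.
dhyper(10, 1, 100000-1, 40000)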

**(13)** Suppose that a batch of 100 items contains 6 that are defective and 94 that are not defective. If a random sample of 10 items is drawn from the batch, what is the probability of finding more than 2 defective items?
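
A minimal sketch, assuming the sample is drawn without replacement so that the number of defective items in the sample is $X\sim\text{HypGeom}(n_s = 6, n_f = 94, n = 10)$. The quantity of interest is $P(X > 2) = 1 - P(X \leq 2)$; the variable names are illustrative.

```r
ns = 6     # defective items in the batch
nf = 94    # non-defective items in the batch
n = 10     # sample size

# P(X > 2) via the upper tail of the hypergeometric CDF
probMoreThan2 = phyper(2, ns, nf, n, lower.tail = FALSE)

# Same probability by summing the PMF over 3, ..., 6
probMoreThan2Check = sum(dhyper(3:ns, ns, nf, n))

c(probMoreThan2, probMoreThan2Check)
```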

**(14)** A purchaser of transistors buys them in lots of 20. It is his policy to randomly inspect 4 components from a lot and to accept the lot only if all 4 are non-defective. If each component in a lot is, independently, defective with probability 0.1, what proportion of lots is rejected?
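
Because each component is defective independently with probability 0.1, the number of defective components among the 4 inspected can be modeled as $\text{Bin}(4, 0.1)$, and the lot is rejected when that count is at least 1. The sketch below follows this reading; the variable names are illustrative.

```r
nInspected = 4    # components inspected per lot
pDefective = 0.1  # probability that a component is defective

# A lot is accepted only if none of the inspected components is defective
probAccept = dbinom(0, nInspected, pDefective)
probReject = 1 - probAccept
probReject
```

This evaluates to $1 - 0.9^4 = 0.3439$, so roughly 34% of lots would be rejected under these assumptions.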

**(15)** It is known that diskettes produced by a certain company will be defective with probability .01, independently of each other. The company sells the diskettes in packages of size 10 and offers a money-back guarantee that at most 1 of the 10 diskettes in the package will be defective. The guarantee is that the customer can return the entire package of diskettes if he or she finds more than one defective diskette in it. If someone buys 3 packages, what is the probability that he or she will return exactly 1 of them?
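
This problem can be approached in two binomial stages, sketched below. First, the number of defective diskettes in one package is taken as $\text{Bin}(10, 0.01)$, and a package is returned when more than 1 is defective. Second, assuming the 3 packages behave independently, the number of returned packages is $\text{Bin}(3, q)$, where $q$ is the return probability from the first stage. The names `pReturn` and `probOneReturned` are illustrative.

```r
pDefective = 0.01  # probability that a single diskette is defective
packageSize = 10   # diskettes per package
nPackages = 3      # packages bought

# Stage 1: probability a package is returned, P(more than 1 defective)
pReturn = 1 - pbinom(1, packageSize, pDefective)

# Stage 2: probability that exactly 1 of the 3 packages is returned
probOneReturned = dbinom(1, nPackages, pReturn)

c(pReturn = pReturn, probOneReturned = probOneReturned)
```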

**(16)** In a forest that has 100 tigers, 20 are captured, tagged and released. A few weeks later, a sample of 10 tigers from the forest is captured. What is the probability that at least 5 of those captured tigers are tagged?
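
A minimal sketch, assuming the second sample is drawn without replacement and every tiger is equally likely to be captured, so that the number of tagged tigers in the sample is $X\sim\text{HypGeom}(n_s = 20, n_f = 80, n = 10)$. The variable names are illustrative.

```r
ns = 20    # tagged tigers in the forest
nf = 80    # untagged tigers in the forest
n = 10     # size of the second sample

# P(X >= 5) = 1 - P(X <= 4)
probAtLeast5 = phyper(4, ns, nf, n, lower.tail = FALSE)

# Same probability by summing the PMF over 5, ..., 10
probAtLeast5Check = sum(dhyper(5:n, ns, nf, n))

c(probAtLeast5, probAtLeast5Check)
```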

**(17)** At an airport, it is known that approximately 2 out of 10 passengers have a metallic object. If left undetected at the manual security check at the airport entrance, such a metallic object will raise an alarm when the passenger walks through an automated screening machine. It is considered a security breach when the alarm gets raised 20 times a day. What is the probability of a security breach on a particular day when the 100th passenger walks through the automated screening machine?
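
One possible reading, sketched below, is that the breach occurs exactly when the 100th passenger raises the 20th alarm of the day. Assuming passengers carry metallic objects independently with probability 0.2 and that every such object goes undetected at the manual check, the number of passengers up to the 20th alarm is $\text{NegBin}(r = 20, p = 0.2)$. Both the reading and the variable names are assumptions made for this example.

```r
r = 20     # alarms that constitute a security breach
p = 0.2    # probability that a passenger raises the alarm
j = 100    # passenger at which the 20th alarm should occur

# P(20th alarm occurs at the 100th passenger): 80 non-alarms before the 20th alarm
probBreach = dnbinom(j - r, size = r, prob = p)
probBreach
```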