# Assignment 08

## Due: See Date in Moodle

To receive **full credit** for this assignment, you must complete all exercises.

## This Week's Assignment

In this week's assignment, you will 

- set up and run simulations.

- write user-defined R functions.

### Notes

## Guidelines

- Follow good programming practices by using descriptive variable names, maintaining appropriate spacing for readability, and adding comments to clarify your code.

- Ensure written responses use correct spelling, complete sentences, and proper grammar.

**Name:**

**Section:**

**Date:**

Let's get started!

## Simulation

A simulation using programming is a way to create a virtual model of a real-world process, system, or experiment using code. The goal is to replicate how something would behave over time by mimicking its processes, often under varying conditions.

The key elements of writing code to perform a simulation include:

- replicating outcomes from a real-life scenario.

- running multiple trials to observe a range of potential outcomes.

- incorporating randomness to capture real-life unpredictability.

- generating data for analysis to gain insights or make predictions.
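As a minimal sketch of how these four elements look in R, the example below simulates rolling a fair six-sided die. The die scenario, the seed value, and the variable names are illustrative assumptions and are not used elsewhere in this assignment.

```r
## Minimal sketch of a simulation: rolling a fair six-sided die
set.seed(1)                # arbitrary seed so the results are reproducible
die <- 1:6                 # replicate the real-life outcomes: the six faces
num_rolls <- 600           # run multiple trials

## Incorporate randomness: sample() with replace = TRUE draws each roll independently
rolls <- sample(die, num_rolls, replace = TRUE)

## Generate data for analysis: count how often each face appeared
roll_counts <- table(rolls)
roll_counts
```

With a fair die, each face should appear roughly 100 times out of 600, but the exact counts vary from run to run unless the seed is fixed.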


### Why use simulations?

Simulations allow for experimentation without real-world consequences, making them especially valuable for testing scenarios that would be costly, dangerous, or impractical to conduct physically. They can forecast outcomes under various conditions; for example, businesses might simulate market conditions to predict profits, while epidemiologists use simulations to model disease spread under different interventions.

## Coin Flip

Simulating a coin flip involves using code to mimic the randomness of flipping a coin, where each flip has an equal chance of landing on heads or tails. By running multiple simulated flips, we can observe the distribution of outcomes and calculate the proportion of heads or tails over time. This type of simple simulation is useful for exploring probability concepts, randomness, and how outcomes converge to expected values with repeated trials.
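To see this convergence, a minimal sketch can track the running proportion of heads as the number of flips grows. It draws all flips at once with `sample(..., replace = TRUE)`; the seed 2024 and the 10,000-flip count are arbitrary choices for illustration.

```r
## Minimal sketch: track the running proportion of heads over many flips
set.seed(2024)    # arbitrary seed so the output is reproducible
n_flips <- 10000
flips <- sample(c("Heads", "Tails"), n_flips, replace = TRUE)

## cumsum() counts the heads seen so far; dividing by the flip number
## gives the proportion of heads after each flip
running_prop <- cumsum(flips == "Heads") / seq_len(n_flips)

## Compare the proportion after 10, 100, 1,000, and 10,000 flips
round(running_prop[c(10, 100, 1000, 10000)], 3)
```

Early values can be noticeably far from 0.5, while later values typically settle close to it.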

**Question 1.** Create a vector named `coin` representing the outcomes `'Heads'` and `'Tails'`.

In [None]:
...

**Question 2.** Write code to randomly select a single sample from the `coin` vector.

In [None]:
...

**Question 3.** Write a `for` loop that randomly selects a single sample from the `coin` vector and prints the result, repeating this process 10 times.

In [None]:
...

### Setting a seed for reproducibility

Setting a seed in programming means initializing the random number generator to start from a specific point. This ensures that any random operations (like sampling or generating random numbers) produce the same results each time the code is run. It's done to make analyses reproducible, so others (or you) can rerun the code and get identical results, which is especially important in research, testing, and debugging.
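A minimal sketch of this idea: calling `set.seed()` with the same value before two identical random draws makes the draws match. The seed value 123 and the range 1 to 100 are arbitrary examples.

```r
## Minimal sketch: the same seed reproduces the same random draws
set.seed(123)                   # 123 is an arbitrary example seed
first_run <- sample(1:100, 5)

set.seed(123)                   # reset the generator to the same starting point
second_run <- sample(1:100, 5)

identical(first_run, second_run)  # TRUE
```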

In [None]:
## Set seed value
s <- ...

Notice that by setting the seed, we get the same results every time we run the cell.

In [None]:
set.seed(s)

for (i in 1:10){
    print(sample(coin, 1))
}

What is the expected outcome from flipping a fair coin?

### User-defined Functions in R

In R, user-defined functions allow you to create reusable blocks of code for specific tasks. You define a function using the `function` keyword, specifying any inputs (parameters) the function needs to operate. 

Once created, you can call the function multiple times with different inputs, which is especially helpful for repetitive tasks.

For example,

```
## Define a function to simulate flipping a coin
flip_coin <- function() {
  
  ## Create a vector with "H" and "T"
  coin <- c("H", "T")
  
  ## Randomly sample one outcome from the coin vector
  result <- sample(coin, 1)
  
  ## Return the result
  return(result)
}
```

In [None]:
## Define a function to simulate flipping a coin
flip_coin <- function() {
  
  ## Create a vector with "H" and "T"
  coin <- c("H", "T")
  
  ## Randomly sample one outcome from the coin vector
  result <- sample(coin, 1)
  
  ## Return the result
  return(result)
}

In [None]:
set.seed(s)

trials <- 10 

## Test the function by flipping the coin 10 times
for (i in 1:trials) {
  cat("Trial", i, ":", flip_coin(), "\n")
}
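The `flip_coin()` function above takes no inputs. As a minimal sketch of a function with a parameter, the hypothetical `flip_weighted_coin()` below (not part of this assignment) takes the probability of heads as an input with a default value of 0.5 and passes it to the `prob` argument of `sample()`.

```r
## Hypothetical example: a coin flip function with a parameter and a default value
flip_weighted_coin <- function(p_heads = 0.5) {

  ## Stop with an error if the probability is outside 0 to 1
  stopifnot(p_heads >= 0, p_heads <= 1)

  ## The prob argument of sample() sets the chance of drawing each element
  sample(c("H", "T"), 1, prob = c(p_heads, 1 - p_heads))
}

flip_weighted_coin()       # fair coin, uses the default p_heads = 0.5
flip_weighted_coin(0.9)    # lands on "H" about 90% of the time
```

Calling the function without an argument uses the default, so the same function covers both the fair and the weighted case.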

### Visualizing the Results

Our experiment consisted of only 10 trials, so counting the number of "Heads" outcomes by hand is straightforward. If we increased this to, say, 100,000 trials, manual counting would become impractical. While there are many ways to analyze results, visualizing them is often an efficient way to gain insight. Currently, our results are only printed to the screen and not stored, which makes further analysis difficult. To address this, we will update our function to accept the number of flips as an input and return the outcomes as a vector, so we can easily change the number of trials and visualize the results.

In [None]:
## Define a function to simulate multiple coin flips
flip_coin <- function(num_flips = 1) {
  
  ## Set the seed using the value s defined earlier,
  ## so every call to the function is reproducible
  set.seed(s)

  ## Create a vector with "H" and "T"
  coin <- c("H", "T")
  
  ## Initialize a vector to store the results
  ## vector("character", num_flips) creates a character vector of length num_flips
  ## filled with empty strings; each element will hold one outcome ("H" or "T")
  results <- vector("character", num_flips)
  
  ## Loop to perform the specified number of flips
  ## seq_len(num_flips) gives 1, 2, ..., num_flips (and no iterations if num_flips is 0)
  for (i in seq_len(num_flips)) {
    
    ## Randomly sample one outcome from the coin vector and 
    ## store it in the results vector
    results[i] <- sample(coin, 1)
  }
  
  ## Return the vector of results
  return(results)
}

In [None]:
...

In [None]:
...

In [None]:
outcomes <- ...
outcomes

Now that we have a way to store the outcomes, we can visualize the results.

In [None]:
## Load the ggplot2 library
...

In [None]:
g <- ggplot(data=as.data.frame(outcomes), aes(x=outcomes))
g + geom_bar()

In [None]:
outcomes <- flip_coin(100)

g <- ggplot(data=as.data.frame(outcomes), aes(x=outcomes))
g + geom_bar()

# Calculate the proportion of "H" (Heads) in the outcomes vector
# sum(outcomes == "H") counts the number of "H" in the vector
# length(outcomes) gives the total number of flips
# Dividing gives the proportion of Heads, and round(..., 2) rounds it to 2 decimal places
prop_heads <- round(sum(outcomes == "H") / length(outcomes), 2)

# Calculate the proportion of "T" (Tails) by subtracting the proportion of Heads from 1
# Since the outcomes are either "H" or "T", prop_tails is simply 1 - prop_heads
prop_tails <- 1 - prop_heads

## Print the proportions
print(paste("The proportion of heads is", prop_heads))
print(paste("The proportion of tails is", prop_tails))
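Counting with `sum(outcomes == "H")` works well for two outcomes. As an alternative sketch, base R's `table()` counts every distinct value at once, and `prop.table()` converts those counts into proportions, which also works when there are more than two possible outcomes. The vector `example_outcomes` is made-up data for illustration, not output from `flip_coin()`.

```r
## Made-up example data, not output from flip_coin()
example_outcomes <- c("H", "T", "H", "H", "T")

## table() counts how many times each distinct value appears
counts <- table(example_outcomes)
counts                       # H appears 3 times, T appears 2 times

## prop.table() divides each count by the total number of values
proportions <- prop.table(counts)
round(proportions, 2)        # H: 0.6, T: 0.4
```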

**Question 4.** How did the simulated proportion of heads change as we increased the number of trials?

**_CLICK HERE TO ENTER YOUR ANSWER, REPLACING THIS TEXT._**

## Was the announcer correct?

During the 2023-2024 season, the Golden State Warriors faced the Oklahoma City Thunder in a regular season game. With 8.4 seconds left and trailing 115 to 118, the Thunder were inbounding the ball on their end of the court. In this situation, the common strategy for a team leading by 3 points is to foul an opposing player before they attempt a shot, ideally limiting the opponent to two free throws instead of a potential three-point attempt. Because two made free throws can only cut a 3-point lead to 1, this approach minimizes the risk of the opposing team tying the game.

However, Draymond Green of the Warriors fouled Chet Holmgren, an 86.5% free throw shooter for the Thunder, "in the act of shooting" from beyond the three-point line. This sent Holmgren to the line with 7.7 seconds remaining, giving him the chance to tie the game.

[At 1:45 in the video](https://youtu.be/fsgScuAH4NI?feature=shared), the announcer notes, **_"In case you're wondering, .86 times .86 times .86 is 63 percent."_**
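As a quick check of the announcer's arithmetic, the sketch below computes the probability of making three free throws in a row, assuming each shot is independent and has the same make probability (an assumption, not something stated in the broadcast). The announcer rounded Chet's rate to 0.86; the second calculation uses the listed 86.5%.

```r
## The announcer's rounded calculation: 0.86 x 0.86 x 0.86
## (assumes independent shots with the same make probability)
announcer_estimate <- 0.86^3
round(announcer_estimate, 3)     # 0.636

## The same calculation with Chet's listed 86.5% free throw rate
exact_rate_estimate <- 0.865^3
round(exact_rate_estimate, 3)    # 0.647
```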

**Question 5.** Why do you think the announcer pointed this out?

**_CLICK HERE TO ENTER YOUR ANSWER, REPLACING THIS TEXT._**

### Simulating a Probability Distribution 

A probability distribution describes how probabilities are assigned to each possible outcome in a random process, such as rolling a die or flipping a coin. It essentially shows the likelihood of each outcome occurring in an experiment or process.

Shooting a free throw is a random process, so we can simulate whether Chet would successfully make all three free throws.
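If we assume each of Chet's free throws is independent and made with probability 0.865, the number of makes in three attempts follows a binomial distribution, which R's `dbinom()` function evaluates exactly. The sketch below computes that distribution so it can be compared with the simulated proportions in this section; simulated values will differ somewhat, partly because the `ft` vector uses 86 or 87 makes out of 100 rather than 86.5. The independence assumption is ours.

```r
## Exact probability of 0, 1, 2, or 3 makes in three free throws,
## assuming independent shots with the same make probability
p_make <- 0.865
exact_probs <- dbinom(0:3, size = 3, prob = p_make)
names(exact_probs) <- 0:3

## approximately 0.002, 0.047, 0.303, 0.647 for 0, 1, 2, 3 makes
round(exact_probs, 3)
```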

**Question 6.** Chet has an 86.5% free throw success rate, but our simulated set of 100 free throws can only contain a whole number of makes, so we need to decide whether he makes 86 or 87 out of 100. Choose either 86 or 87 and assign it to the variable `make`.

In [None]:
make <- ...

Run the code below to create a vector with either 86 or 87 `1`s (representing successful free throws) and 14 or 13 `0`s (representing missed free throws).

In [None]:
## Calculate the number of missed free throws by 
## subtracting the made free throws from 100
miss <- 100 - make

## Create a vector named ft (free throws) with the 
## specified number of successes (1s) and misses (0s)

## rep(1, make) generates a sequence of 'make' 1s, representing successful free throws
## rep(0, miss) generates a sequence of 'miss' 0s, representing missed free throws
ft <- c(rep(1, make), rep(0, miss))
ft

We can use the `ft` vector to randomly draw three values, simulating Chet taking three free throws. For each trial, we tally the number of successful shots (makes). We then store the result from each trial in a vector, repeating this process across multiple trials. Finally, we can visualize the distribution of successful shots and calculate the proportion of times Chet made all three free throws.

In [None]:
## Function to simulate Chet shooting three free throws across multiple trials
sim_fts <- function(ft_vector=ft, trials=10) {
    
  ## Initialize a vector to store the number of makes for each trial
  results <- integer(trials)
  
  ## Loop through each trial
  for (i in 1:trials) {
    
    ## Draw three random values from the ft_vector to simulate three free throws
    shots <- sample(ft_vector, 3, replace=TRUE)
    
    ## Count the number of makes (1s) and store it in the results vector
    results[i] <- sum(shots)
  }
  
  ## Return the results vector with the number of makes per trial
  return(results)
}
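As an alternative sketch, the loop in `sim_fts()` can be written more compactly with base R's `replicate()`, which evaluates an expression a given number of times and collects the results in a vector. The function name `sim_fts_replicate()` and the vector `example_ft` are hypothetical, introduced only for illustration.

```r
## Hypothetical alternative to sim_fts() using replicate()
## ft_vector holds 1s (makes) and 0s (misses), like the ft vector above
sim_fts_replicate <- function(ft_vector, trials = 10) {

  ## Each evaluation draws three shots with replacement and counts the makes
  replicate(trials, sum(sample(ft_vector, 3, replace = TRUE)))
}

## Example with a made-up vector of 87 makes and 13 misses
example_ft <- c(rep(1, 87), rep(0, 13))
set.seed(1)    # arbitrary seed for reproducibility
sim_fts_replicate(example_ft, trials = 5)
```

Both versions simulate the same process. The explicit `for` loop makes each step visible, while `replicate()` trades that visibility for brevity. Another option is `rbinom(trials, size = 3, prob = 0.865)`, which draws the number of makes directly from a binomial distribution instead of sampling from a vector.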

**Question 7.** Ask ChatGPT to explain the code 

```
sample(ft_vector, 3, replace = TRUE)
```

specifically focusing on the `replace = TRUE` parameter. Then, create a Markdown cell below and enter your prompt and ChatGPT’s explanation.

Run the cell below.

In [None]:
outcomes <- sim_fts()
outcomes

**Question 8.** Interpret the output from the previous code cell within the context of the simulation.

**_CLICK HERE TO ENTER YOUR ANSWER, REPLACING THIS TEXT._**

Run the cell below.

In [None]:
outcomes <- sim_fts(trials=1000)

g <- ggplot(data=as.data.frame(outcomes), aes(x=outcomes))
g + geom_bar()

prop_0 <- round(sum(outcomes == 0) / length(outcomes), 2)
prop_1 <- round(sum(outcomes == 1) / length(outcomes), 2)
prop_2 <- round(sum(outcomes == 2) / length(outcomes), 2)
prop_3 <- round(sum(outcomes == 3) / length(outcomes), 2)

print(paste("The proportion of trials with 0 made free throws is", prop_0))
print(paste("The proportion of trials with 1 made free throw  is", prop_1))
print(paste("The proportion of trials with 2 made free throws is", prop_2))
print(paste("The proportion of trials with 3 made free throws is", prop_3))

**Question 9.** Interpret the output from the previous code cell within the context of the simulation and the number of trials.

**_CLICK HERE TO ENTER YOUR ANSWER, REPLACING THIS TEXT._**

**Question 10.** Not all NBA coaches support the strategy of fouling at the end of a game to prevent a three-point attempt. Imagine you are part of the data analytics team, presenting this simulation to the coach. Based on the results from this simulation and the outcome of the Golden State Warriors vs. Oklahoma City Thunder game, would you recommend fouling in this situation? Enter your recommendation in the Markdown cell below.

**Note:** Despite Stephen Curry’s extraordinary shooting, Golden State ultimately lost the game in overtime, 136 to 138.

**_CLICK HERE TO ENTER YOUR ANSWER, REPLACING THIS TEXT._**

## Submission

Make sure that all cells in your assignment have been executed to display all output, images, and graphs in the final document.

**Note:** Save the assignment before proceeding to download the file.

After downloading, locate the `.ipynb` file and upload **only** this file to Moodle. The assignment will be automatically submitted to Gradescope for grading.