# Exercise 2: Coding Habits & Functions

1. Summary statistics 4/4
2. T-test function 4/4
3. Setting default values 2/2

This assignment will give you some practice writing your own functions and using the coding best practices discussed in the tutorial.

---

## 1. Summary statistics (4 pts)

Write a function that takes a vector of numbers `x` and returns the length, mean, and standard deviation of `x` as a new vector.

In keeping with our best practices, give the function a short but descriptive name, and use snake case if it involves multiple words.

Hint: Vectors are defined in R using the `c()` command.

In [1]:
# Calculate the length, mean, and standard deviation of a numeric vector
# x: numeric vector
# Returns: numeric vector c(length, mean, sd)
summary_stats <- function(x) {
  # Input validation
  if (!is.numeric(x)) {
    stop("Input must be a numeric vector")
  }

  c(length(x), mean(x), sd(x))
}

numbers <- c(1, 2, 3, 4, 5)
result <- summary_stats(numbers)
names(result) <- c("length", "mean", "sd")
print(result)

  length     mean       sd 
5.000000 3.000000 1.581139 


Calculate the summary statistics of vector `v1`.

*Hint*: Note the "NA" in the vector. This is short for "not available" and is a placeholder for a missing value. You'll want to look up the _is.na_ function (and the not operator _!_). Also check the documentation of each function you use to summarize the vector to see what options it offers for removing NAs.

In [2]:
v1  <- c(5, 11, 6, NA, 9)

# Basic summary with NA handling
summary(v1)  # summary() shows NA count automatically

# Individual statistics with NA handling
length(v1[!is.na(v1)])   # count of non-NA values
mean(v1, na.rm = TRUE)   # mean ignoring NA
sd(v1, na.rm = TRUE)     # standard deviation ignoring NA
median(v1, na.rm = TRUE) # median ignoring NA
min(v1, na.rm = TRUE)    # minimum ignoring NA
max(v1, na.rm = TRUE)    # maximum ignoring NA

# is.na(v1) identifies which elements are NA
# !is.na(v1) identifies which elements are NOT NA
# na.rm = TRUE tells function to remove NA values before calculating

v1_stats <- function(x) {
    c(length = sum(!is.na(x)),
      mean = mean(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      min = min(x, na.rm = TRUE),
      max = max(x, na.rm = TRUE))
}

v1_stats(v1)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   5.00    5.75    7.50    7.75    9.50   11.00       1 

---
## 2. T-test function (4 pts)

The formula for the one-sample t-statistic is:

$$ t = \frac{m - \mu}{s / \sqrt{n}} $$

Where $m$ is the sample mean, $\mu$ (mu) is the population mean, $s$ is the sample standard deviation, and $n$ is the sample size.

Using your function above as a starting point, write a new function `ttest_fun` that compares a vector `x` to a given population mean `mu` and calculates the t-statistic. Keep the coding best practices in mind.

Hint: You will need to add another argument for mu.

In [3]:
# t-test function with the formula t = (m - μ)/(s/√n)
# new function ttest_fun compares a vector x to a given population mean mu and calculates the t-statistic

ttest_fun <- function(x, mu) {
  # Calculate t-statistic for one-sample t-test
  # x: numeric vector of observations
  # mu: population mean to test against

  # Input validation
  if (!is.numeric(x)) stop("x must be a numeric vector")
  if (!is.numeric(mu)) stop("mu must be a numeric value")

  # Remove any NA values
  x <- x[!is.na(x)]

  # Calculate components
  n <- length(x)
  x_bar <- mean(x)
  s <- sd(x)

  # Calculate t-statistic: (sample mean - population mean)/(standard error)
  t_stat <- (x_bar - mu)/(s/sqrt(n))

  return(t_stat)
}


Use your function to compare the mean of v1 to 10.

In [4]:
# compare the mean of v1 to 10
v1  <- c(5, 11, 6, NA, 9)
mu <- 10
result <- ttest_fun(v1, mu)
print(result)


[1] -1.634114
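
As an arithmetic check on the output above, the formula can be worked by hand using the four non-missing values of `v1` (5, 11, 6, 9). Intermediate values are rounded, so this is a sanity check rather than an exact reproduction:

$$ m = \frac{5 + 11 + 6 + 9}{4} = 7.75 $$

$$ s = \sqrt{\frac{(5 - 7.75)^2 + (11 - 7.75)^2 + (6 - 7.75)^2 + (9 - 7.75)^2}{4 - 1}} = \sqrt{\frac{22.75}{3}} \approx 2.754 $$

$$ t = \frac{7.75 - 10}{2.754 / \sqrt{4}} \approx -1.634 $$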


---
## 3. Setting default values (2 pts)

Set the default value of mu to 0. Test your modified function below by supplying only `v2` as an argument.

In [5]:
# Modified ttest_fun: mu now defaults to 0

ttest_fun <- function(x, mu = 0) {
  # Calculate t-statistic for one-sample t-test
  # x: numeric vector of observations
  # mu: population mean to test against (defaults to 0)

  # Input validation
  if (!is.numeric(x)) stop("x must be a numeric vector")
  if (!is.numeric(mu)) stop("mu must be a numeric value")

  # Remove any NA values
  x <- x[!is.na(x)]

  # Calculate components
  n <- length(x)
  x_bar <- mean(x)
  s <- sd(x)

  # Calculate t-statistic: (sample mean - population mean)/(standard error)
  t_stat <- (x_bar - mu)/(s/sqrt(n))

  return(t_stat)
}

# Test the default value of mu by supplying only v2 as an argument
v2 <- c(3, 7, 1, NA, 8, 12)
result <- ttest_fun(v2)
print(result)


[1] 3.205944


In [6]:
v2 <- c(3, 7, 1, NA, 8, 12)
ttest_fun(v2)

How does your result compare to R's built-in `t.test()` function?

In [7]:
# R's built-in t.test (its mu argument also defaults to 0)
t.test(v2)

# Compare the built-in t statistic with ttest_fun, which uses the default mu = 0
print(t.test(v2)$statistic)


	One Sample t-test

data:  v2
t = 3.2059, df = 4, p-value = 0.03272
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  0.8306107 11.5693893
sample estimates:
mean of x 
      6.2 


       t 
3.205944 
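
One way to make this comparison explicit is to check the two statistics programmatically instead of reading them off the printed output. The sketch below is illustrative only: `t_stat_manual` is a hypothetical stand-in for `ttest_fun`, redefined here so the block runs by itself, and using `all.equal()` for the check is an assumption about how one might compare the values, not something the assignment requires.

```r
# Hypothetical helper that mirrors the exercise's formula:
# t = (mean - mu) / (sd / sqrt(n)), computed on the non-missing values
t_stat_manual <- function(x, mu = 0) {
  x <- x[!is.na(x)]
  (mean(x) - mu) / (sd(x) / sqrt(length(x)))
}

v2 <- c(3, 7, 1, NA, 8, 12)

manual  <- t_stat_manual(v2)
# t.test() drops missing values itself; unname() strips the "t" label
builtin <- unname(t.test(v2)$statistic)

# Compare with a floating-point tolerance instead of ==
isTRUE(all.equal(manual, builtin))
```

Both values come from the same formula applied to the five non-missing observations, so they should agree up to floating-point tolerance. `t.test()` additionally reports the degrees of freedom, p-value, and confidence interval, which the hand-written function does not compute.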


When you are finished, save the notebook as Exercise2.ipynb, push it to your class GitHub repository (the one you made for Exercise 1) and send the instructors a link to your notebook via Canvas. You can send messages via Canvas by clicking "Inbox" on the left and then pressing the icon with a pencil inside a square.

**DUE:** 5pm EST, Feb 3, 2025

**IMPORTANT** Did you collaborate with anyone on this assignment? If so, list their names here.
> *claude.ai was my tutor for this exercise*