# Data Structures

Now that you're familiar with the basic data types in R, let's explore some of the main structures used for storing these data.

# Vectors

The simplest data structure in R is the vector. A vector holds elements such as numbers, characters, factors, or logical values, and all of its elements must be of the same type. If you combine values of different types with `c()`, R does not raise an error; it silently coerces them to a single common type. R has no separate scalar type: a single value is simply a vector of length 1. Vectors can also include `NA`, which represents a missing value and does not change the type of the vector.

In [2]:
# For example
numbers <- c(1, 2, 3, 4, 5)  # Numeric vector
words <- c("apple", "banana", "cherry")  # Character vector
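
The coercion and `NA` behaviour described above can be checked directly. The following is a minimal sketch; the variable names `mixed` and `temps` and their values are made up for illustration.

```r
# c() coerces mixed inputs to one common type (here: character)
mixed <- c(1, "two", TRUE)
class(mixed)                 # "character"

# NA marks a missing value without changing the vector's type
temps <- c(21.5, NA, 19.0)
class(temps)                 # "numeric"
is.na(temps)                 # FALSE  TRUE FALSE

# Most summary functions return NA unless missing values are dropped
mean(temps)                  # NA
mean(temps, na.rm = TRUE)    # 20.25

# A single value is just a vector of length 1
length(42)                   # 1
```

Checking `class()` after building a vector is a quick way to spot unintended coercion, for example a numeric column that became character because of one stray text entry.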

# Matrices and Arrays

Matrices are another common data structure in R, particularly useful in fields like statistics and ecology. A matrix is essentially a vector with added dimensions, forming a two-dimensional table. Arrays extend this concept to more than two dimensions. Like vectors, all elements within a matrix or array must be of the same data type.

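Matrices and arrays are created with the `matrix()` and `array()` functions, respectively. By default, `matrix()` fills its values column by column; pass `byrow = TRUE` to fill row by row instead. You can also assign row and column names to a matrix, which makes the data easier to organize and interpret.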

In [3]:
# For example
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(matrix_data)

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
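
To make the "vector with added dimensions" idea concrete, here is a short sketch that turns a vector into a matrix, names the rows and columns, and builds a three-dimensional array. The labels (`site_a`, `spring`, and so on) are hypothetical and chosen only for illustration.

```r
# A matrix is a vector with a dim attribute attached
v <- 1:6
dim(v) <- c(2, 3)            # 2 rows, 3 columns, filled column by column
is.matrix(v)                 # TRUE

# Row and column names supplied through dimnames (hypothetical labels)
counts <- matrix(
  c(1, 2, 3, 4, 5, 6),
  nrow = 2, ncol = 3,
  dimnames = list(c("site_a", "site_b"),
                  c("spring", "summer", "autumn"))
)
counts["site_b", "summer"]   # 4

# An array with three dimensions: 2 rows, 3 columns, 2 layers
arr <- array(1:12, dim = c(2, 3, 2))
dim(arr)                     # 2 3 2
arr[1, 2, 2]                 # 9
```

Indexing by name, as in `counts["site_b", "summer"]`, tends to be more robust than indexing by position when rows or columns may be reordered later.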


# Lists

Lists are a flexible data structure whose elements do not have to share a type. Unlike vectors and matrices, a single list can hold numbers, character strings, logical values, whole vectors, and even other lists or data structures. This makes lists well suited to storing irregular or nested data.

You can create a list using the `list()` function and name the elements within the list for easier reference.

In [4]:
# For example

my_list <- list(name = "John", age = 30, married = TRUE)
print(my_list)

$name
[1] "John"

$age
[1] 30

$married
[1] TRUE
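
Once a list has named elements, they can be retrieved with `$` or `[[ ]]`. The sketch below uses a made-up `person` list (all values are hypothetical) to show element access and nesting.

```r
# A list mixing a string, a number, a vector, and a nested list
person <- list(
  name    = "John",
  age     = 30,
  scores  = c(88, 92, 79),
  address = list(city = "Leeds", country = "UK")
)

person$name                # "John"
person[["scores"]][2]      # 92
person$address$city        # "Leeds"

# Single brackets return a sub-list; double brackets return the element itself
class(person["age"])       # "list"
class(person[["age"]])     # "numeric"
```

The distinction between `[ ]` and `[[ ]]` matters in practice: passing `person["age"]` to a function that expects a number will usually fail, because it is still a list.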



# Data Frames

Data frames are perhaps the most commonly used data structure in R. They are two-dimensional tables that resemble matrices but can contain different types of data in each column. Typically, each row in a data frame represents an observation, and each column represents a variable.

Data frames are especially useful for organizing and analyzing data, and their layout is similar to a spreadsheet in applications like Excel. You can create a data frame using the `data.frame()` function. Every column must have the same number of rows; if the column lengths are incompatible, `data.frame()` stops with an error. Where a value was not observed, record it as `NA` rather than leaving the column short.

In [6]:
# For example
df <- data.frame(
  id = c(1, 2, 3),
  name = c("John", "Jane", "Doe"),
  age = c(28, 24, 35)
)
print(df)

  id name age
1  1 John  28
2  2 Jane  24
3  3  Doe  35
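
The points about mixed column types, `NA`, and equal column lengths can be sketched as follows. The `survey` data frame and its values are hypothetical, extending the example above with a missing age and a logical column.

```r
# Columns of different types, with one missing age recorded as NA
survey <- data.frame(
  id     = c(1, 2, 3),
  name   = c("John", "Jane", "Doe"),
  age    = c(28, NA, 35),
  member = c(TRUE, FALSE, TRUE)
)

str(survey)                       # compact summary of each column's type
nrow(survey)                      # 3
ncol(survey)                      # 4
mean(survey$age, na.rm = TRUE)    # 31.5

# Columns of incompatible lengths are rejected; capture the error message
msg <- tryCatch(
  {
    data.frame(id = c(1, 2, 3), age = c(28, 24))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running `str()` right after creating or importing a data frame is a cheap way to confirm that each column has the type you expect before any analysis begins.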
