# Variables in R

This notebook is a quick reference + practice space for **variables** in R: assigning values, inspecting them, and working with common container types.

## What you'll practice
- Assigning variables (`<-`, `=`) and choosing good names
- Understanding common data types in R
- Working with vectors, lists, and data frames / tibbles
- Accessing values with `[ ]`, `[[ ]]`, and `$`


## 1. Assigning variables

In R, you most often assign with `<-`.

- `x <- 3` assigns 3 to `x`
- `=` also works in most cases, but `<-` is the convention in most R code (it also sets R assignments apart visually from languages like Python)


In [None]:
# Basic assignment
x <- 3
x

### Printing

Printing a variable sends its value to the console. This is easier to see when running a script in RStudio.

In Jupyter notebooks you can print by just typing the variable name. 

In [None]:
print(x)
x

### Naming variables

Variable names should indicate what they store. Keeping them concise reduces typos.

- **Variable names**:
  - Can contain: letters, numbers, `_`, `.`  
  - Cannot start with a number or `_`  
  - Case-sensitive

General conventions:
- Pick one convention for separating words, such as `camelCase` or `snake_case`, and use it consistently
- Avoid using built-in names like `c`, `T`, `F`, `mean`, `data`, etc.
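
As a hedged illustration of the rules above (every name here is made up for the example), the sketch below shows two readable names and then what happens when a built-in name such as `T` is reused:

```r
# Readable names in two common styles (illustrative names only)
building_label <- "MPA Building"   # snake_case
buildingLat <- 38.89987            # camelCase

# Legal, but best avoided: this masks the built-in shortcut T (TRUE)
T <- 0
isTRUE(T)   # FALSE, because T now holds 0

# Removing our variable makes T refer to the built-in TRUE again
rm(T)
isTRUE(T)   # TRUE
```

R does not warn when a built-in name is masked, which is why the convention is simply to avoid those names.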


In [None]:
# Examples of good names


In [None]:
# Possible names, but avoid them:


In [None]:
# BAD IDEA: 


In some programming languages (like C++), reassigning a name to a different kind of value wouldn't be allowed. R permits it because it is a _dynamically typed language_.

A variable can start out as a number (or function) and end up as something else by the end of the code.

For example, in the code chunk below, `a` starts out as text and is later assigned a number. The same thing can happen to functions: while you're still learning which names R already provides, it's easy to overwrite one by accident.

In [None]:
a <- "Hello, world"
print(a)
a <- 1
print(a)

## 2. Inspecting what's inside a variable

Let's take a closer look at what a variable is.

Useful helpers:
- `class(x)` : high-level class
- `typeof(x)` : underlying storage type
- `length(x)` : number of elements
- `str(x)` : compact structure display
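
A minimal sketch of these helpers on a small numeric vector (the values are arbitrary and chosen only for illustration):

```r
x <- c(1.5, 2, 3.25)

class(x)    # "numeric"
typeof(x)   # "double"
length(x)   # 3
str(x)      # prints a one-line summary of the type, size, and first values
```

`class()` and `typeof()` often differ: `class()` describes how R treats the object, while `typeof()` describes how it is stored.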


## 3. Atomic types and vectors

The most common *atomic* types:
- logical: `TRUE`, `FALSE`
- integer: `1L`
- double (numeric): `1`, `3.14`
- character: `"text"`, `'text'`

A **vector** is one-dimensional and must be *all one type* (R will coerce if needed).
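
The coercion rule can be sketched as follows (variable names are illustrative). When types are mixed inside `c()`, R converts every element to the most flexible type present:

```r
nums <- c(1, 2, 3)
typeof(nums)                 # "double"

# Mixing a number, text, and a logical: everything becomes character
mixed <- c(1, "two", TRUE)
mixed                        # "1" "two" "TRUE"
typeof(mixed)                # "character"

# Mixing logicals with an integer: TRUE/FALSE become 1/0
flags <- c(TRUE, FALSE, 2L)
flags                        # 1 0 2
typeof(flags)                # "integer"
```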

In [None]:
# Atomic values
b <- TRUE
i <- 1L
n <- 3.14
ch <- "text"

class(b); typeof(b)
class(i); typeof(i)
class(n); typeof(n)
class(ch); typeof(ch)

# Vectors with c()


### Converting between types

Common conversion helpers:
- `as.integer()`, `as.double()` / `as.numeric()`, `as.character()`, `as.logical()`


In [None]:
as.integer(3.9)        # drops the decimal part (gives 3, no rounding)
as.numeric("3.14")      # character -> numeric
as.character(100)       # numeric -> character
as.logical(0)           # numeric -> logical (0 becomes FALSE)

## 4. Missing / special values

- `NA` : missing value
- `NaN` : not-a-number (result of undefined numeric operation)
- `Inf`, `-Inf` : infinity
- `NULL` : "nothing here" (often means absence of an object/value)

Use `is.na()` for `NA` and `is.null()` for `NULL`.


In [None]:
x <- c(10, NA, 30)
x
is.na(x)

0/0        # NaN
1/0        # Inf

is.nan(0/0)

# NULL is different from NA
nothing <- NULL
is.null(nothing)

## 5. Common container types

- Vectors
    - One type only.
- Lists
    - Can hold *different* types (and even other lists).
- Data frames / tibbles
    - Tabular data (columns can have different types).
    - Can be filled with vectors of the same length.

In [None]:
# Vector
v <- c(38.89987, -77.04616)
v

In [None]:
# Named vector (names are great for readability)
coords <- c(lat = 38.89987, lng = -77.04616)
coords

In [None]:
# List (can mix types)
L <- list(
  lat = 38.89987,
  lng = -77.04616,
  label = "MPA Building",
  coords = coords
)
L

In [None]:
# Data frame
df <- data.frame(
  lat = 38.89987,
  lng = -77.04616,
  label = "MPA Building"
)
df

In [None]:
# We will use the tidyverse version of a data frame (a tibble).
# Outside of JupyterHub you might need to install the package first.
# install.packages("tibble")
library(tibble)
tab <- tibble(
  lat   = c(38.89859285082495,38.8998733051704),
  lng   = c(-77.0461995467069,-77.0461654305575),
  label = c("Bell Hall", "MPA's Building")
)

## 6. Accessing values: `[ ]`, `[[ ]]`, and `$`

- For vectors:
  - `v[1]` gives the first element (R is 1-indexed)
- For lists:
  - `L[["lat"]]` extracts the *value*
  - `L["lat"]` keeps a list
  - `L$lat` is shorthand for named elements
- For data frames:
  - `df[1, 2]` row/column indexing
  - `df$lat` extracts a column
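
The difference between these forms is easiest to see by checking what each one returns. The sketch below rebuilds a small list and data frame with the same values used earlier in this notebook:

```r
L <- list(lat = 38.89987, lng = -77.04616, label = "MPA Building")

L[["lat"]]          # the number itself: 38.89987
class(L[["lat"]])   # "numeric"
L["lat"]            # still a list, with one element named lat
class(L["lat"])     # "list"
L$label             # "MPA Building"

df <- data.frame(lat = 38.89987, lng = -77.04616, label = "MPA Building")

df[1, 2]            # row 1, column 2: -77.04616
df$lat              # the lat column: 38.89987
df[["label"]]       # the label column, extracted like a list element
```

A data frame is stored as a list of equal-length columns, which is why `$` and `[[ ]]` work on it the same way they do on a list.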


In [None]:
# Vector indexing
v <- c(10, 20, 30)
v[1]   # first index (counting starts at 1)
v[-1]  # everything but the first one
v[2:3] # same as above but by 'slicing'

In [None]:
# Named vector indexing
coords <- c(lat = 38.89987, lng = -77.04616)
coords["lat"]
coords[c("lng", "lat")]

In [None]:
# List indexing
L <- list(lat = 38.89987, lng = -77.04616, label = "MPA Building")


In [None]:
# Table / data frame indexing is a lot like list indexing


# Operators: arithmetic, comparisons, and logic

In R, you’ll constantly build **expressions** using operators. These are the building blocks for both computation *and* later topics like control flow.

## Arithmetic operators

Arithmetic operators combine numeric values to form expressions.

- `+`, `-`, `*`, `/`, `^` — standard addition, subtraction, multiplication, division, exponentiation  
- `%%` — modulo (the remainder after division)  
- `%/%` — integer division (division rounded down to an integer)

In [None]:
2 + 3 * 4 

Precedence matters. When in doubt, use parentheses. 

In [None]:
(2 + 3) * 4 

In [None]:
2^3

In [None]:
10 %% 3 

In [None]:
10 %/% 3  

> Aside: In none of these code chunks do I assign things to a variable. Did I store anything in memory? 

## Comparisons (return `TRUE` / `FALSE`)

Comparison operators compare two values and return a logical result (`TRUE` or `FALSE`).

- `==` equal to  
- `!=` not equal to  
- `<` less than, `<=` less than or equal to  
- `>` greater than, `>=` greater than or equal to
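
A short sketch of these operators on single values and on a vector (`temps` is a made-up example vector). Comparisons are applied element by element, so a vector input gives a logical vector back:

```r
3 == 3        # TRUE
3 != 3        # FALSE
2 <= 5        # TRUE

temps <- c(58, 72, 91)   # illustrative values
temps > 70    # FALSE TRUE TRUE
temps == 72   # FALSE TRUE FALSE

# The logical results can be stored like any other value
is_warm <- temps > 70
```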

## Logical operators

Logical operators combine or modify logical values (`TRUE` / `FALSE`).

- `&` and `|` are **vectorized** (element-wise), so they operate on every element of a logical vector  
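- `&&` and `||` are **short-circuit** operators: they work on single `TRUE`/`FALSE` values and skip the right-hand side when the left-hand side already decides the result (best inside `if (...)`; since R 4.3.0, giving them a vector longer than one is an error; we’ll revisit this in `Control_Flow.ipynb`)  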
- `!` negates a logical value (`!TRUE` is `FALSE`)
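
To make "short-circuit" concrete, here is a minimal sketch. The calls to `stop()` are there only to show that the right-hand side is never run; they are not a pattern to copy:

```r
!TRUE                                          # FALSE

# The left side already decides the answer, so stop() is never called
or_result  <- TRUE  || stop("never evaluated")  # TRUE
and_result <- FALSE && stop("never evaluated")  # FALSE

or_result
and_result
```

With the vectorized `|` and `&`, both sides are always evaluated, so the same lines would raise an error.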

In [None]:
c(TRUE, FALSE, TRUE) | c(TRUE, TRUE, FALSE)

In [None]:
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)

# Control Flow

Control flow provides the instructions for how your code makes decisions and repeats work.

**You'll use this for:**
- branching: doing different things depending on conditions
- looping: repeating steps
- handling special (edge) cases cleanly

## `if`, `else if`, `else`

Use `if` when you want to branch on a **single** boolean expression (`TRUE`/`FALSE`).

```r
if (boolean expression) {
  ...
} else if (boolean expression) {
  ...
} else {
  ...
}
```

The boolean expression is usually based on a variable that can change in your code. It asks a question that can be answered with a `TRUE` or `FALSE`. 
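
As a sketch of the template above, the example below branches on a hypothetical temperature variable (`temp_f`, `comfort`, and the thresholds are invented for illustration):

```r
temp_f <- 72   # hypothetical temperature in degrees Fahrenheit

if (temp_f > 85) {
  comfort <- "hot"
} else if (temp_f >= 60) {
  comfort <- "mild"
} else {
  comfort <- "cold"
}

comfort   # "mild"
```

The conditions are checked from top to bottom, and only the first branch whose condition is `TRUE` runs.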

### `if` expects a single TRUE/FALSE

If your condition comes from a vector, use `any(...)` / `all(...)`.
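
A minimal sketch, assuming a made-up vector of scores: `any()` and `all()` collapse a logical vector into the single `TRUE`/`FALSE` that `if` needs.

```r
scores <- c(88, 92, 47)   # illustrative values

scores < 50               # FALSE FALSE TRUE (a vector, not usable in if)

has_failing <- any(scores < 50)   # TRUE: at least one element is below 50
all_valid   <- all(scores >= 0)   # TRUE: every element is non-negative

if (has_failing) {
  message("At least one score is below 50")
}
```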


## Vectorized conditional: `ifelse()` (and friends)

If you want to create a new vector based on a condition, `ifelse()` is often the simplest tool.

- `ifelse(test, yes, no)` returns a vector the same length as `test`.
- In tidyverse pipelines, prefer `dplyr::if_else()` (stricter about types).
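
A short sketch of base R's `ifelse()` that labels numbers as even or odd (the names `n_values` and `parity` are illustrative):

```r
n_values <- 1:6

# test is a logical vector; each TRUE picks "even", each FALSE picks "odd"
parity <- ifelse(n_values %% 2 == 0, "even", "odd")

parity   # "odd" "even" "odd" "even" "odd" "even"
```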


## `for` loops

`for` loops iterate over a sequence.

Two common patterns:
1) build up a result (**preallocate** if possible)
2) do a side effect (printing, plotting, saving)
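
Both patterns can be sketched on one small input; `values`, `roots`, and `value` are illustrative names, not part of the notebook:

```r
values <- c(4, 9, 16)   # illustrative input

# Pattern 1: build up a result in a preallocated vector
roots <- numeric(length(values))
for (i in seq_along(values)) {
  roots[i] <- sqrt(values[i])
}
roots   # 2 3 4

# Pattern 2: loop only for the side effect (here, printing)
for (value in values) {
  print(value)
}
```

`seq_along(values)` is used instead of `1:length(values)` because it also behaves sensibly when the vector is empty.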


In [None]:
squares <- c(1,2,3)

for (i in seq_along(squares)) {
  squares[i] <- squares[i]^2
}

squares

The same computation, **vectorized**:

In [None]:
squares <- c(1,2,3)
squares^2

Prefer vectorization when possible: it's easier to read and usually runs faster. Sometimes a `for` loop that goes element by element is still necessary. Use indices when the position matters, or when each step depends on previous steps.

In [None]:
n <- 10
fib <- numeric(n)  # preallocate: a length-n container to fill in

fib[1] <- 1
fib[2] <- 1

for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}

fib

## `while` loops

Use these when you don’t know ahead of time how many iterations you’ll need.

- `while (condition) { ... }` keeps going *while* the condition is TRUE.
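
A minimal sketch, using made-up names `value` and `steps`: count how many times 100 can be halved before it drops below 1. The number of iterations is not written anywhere in the loop; it falls out of the condition.

```r
value <- 100
steps <- 0

while (value >= 1) {
  value <- value / 2    # update the thing the condition depends on
  steps <- steps + 1
}

steps   # 7
value   # 0.78125
```

If nothing inside the loop ever changes the condition, the loop never ends, so make sure each iteration moves toward the condition becoming `FALSE`.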


In [None]:
# while example


## Mini practice

1) Create a vector `x <- c(-3, -1, 0, 2, 5)`.  
   Use `ifelse()` to label each value `"neg"`, `"zero"`, or `"pos"` (hint: nested `ifelse`).

2) Write a `for` loop that computes the cumulative sum of `1:n` without using `cumsum()`.

3) Write a `while` loop that keeps doubling a number starting at `1` until it exceeds `100`.
