# Variables in R

This notebook is a quick reference and practice space for **variables** in R: assigning values, inspecting them, and working with common container types.

## What you'll practice
- Assigning variables (`<-`, `=`) and choosing good names
- Understanding common data types in R
- Working with vectors, lists, and data frames / tibbles
- Accessing values with `[ ]`, `[[ ]]`, and `$`


## 1. Assigning variables

In R, you most often assign with `<-`.

- `x <- 3` assigns 3 to `x`
- `=` also works in most cases, but `<-` is the convention in most R code, and it visually sets R apart from languages like Python that assign with `=`


In [None]:
# Basic assignment
x <- 3
x

# `=` also works for assignment
y = 10
y

# You will also see right-assignment sometimes (less common)
5 -> z
z

# You can assign a variable another variable
zee <- z
zee
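
The "in most cases" above matters inside function calls, where `=` names an argument instead of creating a variable. The following is a minimal sketch of that difference; the names `vals`, `total`, and `rounded` are made up for illustration.

```r
# `<-` inside a call assigns `vals` AND passes the value on to sum()
total <- sum(vals <- c(2, 4, 6))
vals     # 2 4 6
total    # 12

# `=` inside a call means "the argument named ...", not assignment:
# `x` and `digits` are round()'s argument names, no variables are created
rounded <- round(x = 3.14159, digits = 2)
rounded  # 3.14
```

Assigning inside a call is shown here only to explain the rule; in everyday code it is usually clearer to assign on its own line first.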

### Printing

Printing a variable sends its value to the console. This is easiest to see when running a script in RStudio.

In Jupyter notebooks you can print by just typing the variable name.

In [None]:
print(z)
z
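
Typing a bare name only prints at the top level of a cell or script. The sketch below, which uses a hypothetical variable `score` and a `for` loop (loops are not covered in this notebook), shows a case where `print()` is required.

```r
score <- 42          # hypothetical variable for this sketch

score                # auto-printed at the top level

for (k in 1:2) {
  score              # inside a loop a bare name prints nothing
  print(score)       # print() shows the value on every pass
}
```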

### Naming variables

Variable names should indicate what they store. Keeping names concise reduces typos.

- **Variable names**:
  - Can contain: letters, numbers, `_`, `.`
  - Cannot start with a number or `_` (or with a `.` followed by a number)
  - Are case-sensitive

General conventions:
- Pick one convention for separating words, such as `snake_case` or `camelCase`, and use it consistently
- Avoid using built-in names like `c`, `T`, `F`, `mean`, `data`, etc.
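
One way to check a name against the rules above is base R's `make.names()`, which rewrites a string into a syntactically valid name. A name that comes back unchanged was already valid. This is a small sketch; the candidate names are invented for illustration.

```r
candidates <- c("student_count", "student count", "1st_try", "_tmp", "if")

# make.names() repairs invalid names (and reserved words such as `if`)
fixed <- make.names(candidates)
fixed     # "student_count" "student.count" "X1st_try" "X_tmp" "if."

# A candidate is valid as written only if nothing had to be repaired
is_valid <- candidates == fixed
is_valid  # TRUE FALSE FALSE FALSE FALSE
```

Note that this checks syntax only. A name like `mean` passes the check even though overwriting it is a bad idea.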


In [None]:
# Examples of good names
student_count <- 28
course_name <- "DATS 1001"

student_count
course_name

In [None]:
# Possible names, but avoid them:

# Names with spaces (or almost any other character) work if wrapped in backticks
`student count` <- 58 # allowed in R, but rarely done in regular code
`student count` # mainly useful for spreadsheet data, where column
                # names can be anything

# Names that start with a dot
.hidden_variable <- "try this in RStudio and check the Environment pane"
.hidden_variable

In [None]:
# BAD IDEA: 

# overwriting common names (don't do this)
# mean <- 5
# c <- 123

# 'mean' and 'c' are variables that hold functions in R. We'll cover what 'functions' are later.
# For now, think of a function as one more kind of R object (like a number, text, or vector) that a variable can refer to.
numz <- c(1,2,3)
mean(numz)
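
If a built-in name does get overwritten by accident, the built-in is hidden, not destroyed. The sketch below shows what happens and how to undo it with `rm()`; it is meant as an illustration of the recovery, not a recommended pattern.

```r
mean <- 5                  # our variable now sits in front of the built-in
mean                       # 5

mean(c(1, 2, 3))           # still 2: when a name is *called*, R looks for a function
base::mean(c(1, 2, 3))     # 2: explicit reference to the built-in in package base

rm(mean)                   # delete our variable; the built-in is visible again
is.function(mean)          # TRUE
```

The call still works here only because the replacement is a number. Overwriting a built-in with another function would silently change what the call does.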

In some programming languages (like C++) this wouldn't be allowed. This is a consequence of R being a _dynamically typed language_. 

A variable can start out as a number (or function) and end up something else by the end of the code.

For example, in the code chunk below, `a` starts out as text and is later assigned a number. The same can happen with functions, often by accident while you're still learning which names R already provides.

In [None]:
a <- "Hello, world"
print(a)
a <- 1
print(a)

## 2. Inspecting what's inside a variable

Let's take a closer look at what a variable is.

Useful helpers:
- `class(x)` : high-level class
- `typeof(x)` : underlying storage type
- `length(x)` : number of elements
- `str(x)` : compact structure display


In [None]:
x <- 3
class(x)
typeof(x)
length(x)
str(x)

In [None]:
s <- "hello"
class(s)
typeof(s)
str(s)
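
`class()` and `typeof()` often give different answers, which is why both helpers exist. A short sketch with made-up variables (`num`, `int`, `tbl`) follows; data frames are introduced in section 5.

```r
num <- 3
int <- 3L
tbl <- data.frame(lat = 38.89987, lng = -77.04616)

class(num); typeof(num)   # "numeric"     "double"
class(int); typeof(int)   # "integer"     "integer"
class(tbl); typeof(tbl)   # "data.frame"  "list"

length(tbl)               # 2: for a data frame, length counts columns
```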

## 3. Atomic types and vectors

The most common *atomic* types:
- logical: `TRUE`, `FALSE`
- integer: `1L`
- double (numeric): `1`, `3.14`
- character: `"text"`, `'text'`

A **vector** is one-dimensional and must be *all one type* (R will coerce if needed).

In [None]:
# Atomic values
b <- TRUE
i <- 1L
n <- 3.14
ch <- "R"

class(b); typeof(b)
class(i); typeof(i)
class(n); typeof(n)
class(ch); typeof(ch)

# Vectors with c()
v_num <- c(1, 2, 3)
v_chr <- c("a", "b", "c")

v_num
v_chr

# Coercion: mixing types forces a common type
v_mixed <- c(1, "2", 3)
v_mixed
class(v_mixed)
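
Coercion follows a fixed order: logical, then integer, then double, then character. The most flexible type present wins. The sketch below walks up that ladder one step at a time; the variable names are illustrative.

```r
lgl_int <- c(TRUE, 2L)     # logical + integer   -> integer
int_dbl <- c(2L, 3.5)      # integer + double    -> double
dbl_chr <- c(3.5, "a")     # double  + character -> character

typeof(lgl_int)   # "integer"
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"

lgl_int           # 1 2       (TRUE became 1)
dbl_chr           # "3.5" "a" (the number became text)
```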

### Converting between types

Common conversion helpers:
- `as.integer()`, `as.double()` / `as.numeric()`, `as.character()`, `as.logical()`


In [None]:
as.integer(3.9)        # drops the decimal
as.numeric("3.14")      # character -> numeric
as.character(100)       # numeric -> character
as.logical(0)           # numeric -> logical (0 becomes FALSE)
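
Conversions do not always succeed, and some behave differently than expected. The sketch below shows a few edge cases worth knowing; `suppressWarnings()` is used only to keep the output quiet.

```r
neg <- as.integer(-3.9)
neg                        # -3: truncates toward zero, it does not round
round(-3.9)                # -4: use round() when rounding is what you want

# Text that is not a number becomes NA (R also emits a warning)
bad <- suppressWarnings(as.numeric("three"))
bad                        # NA

# Only a few spellings are recognised as logical values
flags <- as.logical(c("TRUE", "yes", "T"))
flags                      # TRUE NA TRUE

as.logical(2)              # TRUE: any non-zero number is TRUE
```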

## 4. Missing / special values

- `NA` : missing value
- `NaN` : not-a-number (result of undefined numeric operation)
- `Inf`, `-Inf` : infinity
- `NULL` : "nothing here" (often means absence of an object/value)

Use `is.na()` for `NA` and `is.null()` for `NULL`.


In [None]:
x <- c(10, NA, 30)
x
is.na(x)

0/0        # NaN
1/0        # Inf

is.nan(0/0)

# NULL is different from NA
nothing <- NULL
is.null(nothing)
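
In practice, the difference between these values shows up in calculations and in lengths. A brief sketch, using a made-up vector `x_na`:

```r
x_na <- c(10, NA, 30)

mean(x_na)                  # NA: one missing value makes the result missing
mean(x_na, na.rm = TRUE)    # 20: drop NAs before computing

NA == NA                    # NA: comparing unknowns is itself unknown, so use is.na()
is.na(NaN)                  # TRUE:  NaN counts as missing
is.nan(NA)                  # FALSE: but NA is not NaN

length(NA)                  # 1: NA is a placeholder that takes up a slot
length(NULL)                # 0: NULL is no value at all
c(1, NULL, 2)               # 1 2: NULL disappears when combined
```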

## 5. Common container types

- Vectors
  - One type only.
- Lists
  - Can hold *different* types (and even other lists).
- Data frames / tibbles
  - Tabular data (columns can have different types).
  - Each column is a vector, and all columns have the same length.

In [None]:
# Vector
v <- c(38.89987, -77.04616)
v

In [None]:
# Named vector (names are great for readability)
coords <- c(lat = 38.89987, lng = -77.04616)
coords

In [None]:
# List (can mix types)
L <- list(
  lat = 38.89987,
  lng = -77.04616,
  label = "MPA Building",
  coords = coords
)
L

In [None]:
# Data frame
df <- data.frame(
  lat = 38.89987,
  lng = -77.04616,
  label = "MPA Building"
)
df

In [None]:
# We will use the tidyverse version of the data frame: the tibble.
# Outside of JupyterHub you might need to install the package first.
# install.packages("tibble")
library(tibble)
tab <- tibble(
  lat   = c(38.89859285082495, 38.8998733051704),
  lng   = c(-77.0461995467069, -77.0461654305575),
  label = c("Bell Hall", "MPA Building")
)
tab
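
Since a table is built from columns, it can also be assembled from vectors that already exist, as long as they line up. The sketch below uses base `data.frame()` so it runs without extra packages; the names `lat`, `lng`, `label`, and `places` are illustrative, and `tryCatch()` appears only to capture the error message.

```r
lat   <- c(38.89859, 38.89987)
lng   <- c(-77.04620, -77.04617)
label <- c("Bell Hall", "MPA Building")

# Each vector becomes one column; stringsAsFactors = FALSE keeps text as text
places <- data.frame(lat, lng, label, stringsAsFactors = FALSE)
dim(places)             # 2 3  (rows, columns)
sapply(places, class)   # "numeric" "numeric" "character"

# Columns whose lengths do not line up are rejected with an error
msg <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
msg
```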

## 6. Accessing values: `[ ]`, `[[ ]]`, and `$`

- For vectors:
  - `v[1]` gives the first element (R is 1-indexed)
- For lists:
  - `L[["lat"]]` extracts the *value*
  - `L["lat"]` keeps a list
  - `L$lat` is shorthand for named elements
- For data frames:
  - `df[1, 2]` row/column indexing
  - `df$lat` extracts a column


In [None]:
# Vector indexing
v <- c(10, 20, 30)
v[1]   # first element (counting starts at 1)
v[-1]  # everything but the first one
v[2:3] # same result as above, selected with a range ('slicing')
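
A few more indexing patterns follow the same `[ ]` syntax. This is a short sketch with a separate vector, `v_demo`, so the values above stay untouched.

```r
v_demo <- c(10, 20, 30)

v_demo[c(1, 3)]        # 10 30: pick several positions at once
v_demo[v_demo > 15]    # 20 30: keep elements where the condition is TRUE
v_demo[5]              # NA:    asking past the end gives NA, not an error

v_demo[2] <- 99        # indexing on the left side replaces a value
v_demo                 # 10 99 30
```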

In [None]:
# Named vector indexing
coords <- c(lat = 38.89987, lng = -77.04616)
coords["lat"]
coords[c("lng", "lat")]

In [None]:
# List indexing
L <- list(lat = 38.89987, lng = -77.04616, label = "MPA Building")

L[1]              # returns a list holding the first element (not the bare value)

L["lat"]          # returns a one-element list holding 'lat'
L[["lat"]]        # returns the value
L$lat             # returns the value (shorthand)
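
The difference between `[ ]` and `[[ ]]` matters as soon as the result is used in a calculation. A minimal sketch, using a separate list named `place` for illustration:

```r
place <- list(lat = 38.89987, lng = -77.04616, label = "MPA Building")

class(place["lat"])      # "list":    single brackets keep the container
class(place[["lat"]])    # "numeric": double brackets pull the value out

shifted <- place[["lat"]] + 1    # works on the extracted value
shifted                          # 39.89987

# place["lat"] + 1 would fail, because arithmetic is not defined for a list
```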

In [None]:
# Table / data frame indexing is a lot like list indexing

tab$lat          # returns a vector of doubles
tab$lat[1]       # returns a value
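
Row/column indexing with `[row, column]` works alongside `$`. The sketch below uses a base data frame named `places_df` (an illustrative name) so it runs without extra packages.

```r
places_df <- data.frame(
  lat   = c(38.89859, 38.89987),
  lng   = c(-77.04620, -77.04617),
  label = c("Bell Hall", "MPA Building"),
  stringsAsFactors = FALSE
)

places_df[1, 2]              # row 1, column 2: the first lng value
places_df[2, "label"]        # "MPA Building": columns can be picked by name
places_df[["lat"]]           # same vector as places_df$lat
class(places_df["lat"])      # "data.frame": single brackets keep the table

# Rows can be filtered with a condition on a column
north <- places_df[places_df$lat > 38.899, ]
nrow(north)                  # 1
```

With a tibble such as `tab`, `tab[1, 2]` returns a one-cell tibble instead of a bare value; use `tab[[1, 2]]` or `tab$lng[1]` to get the value itself.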