# R Basics
By Shuhei Kitamura

### Outline
1. Hello World
2. Arithmetic Operation
3. Variables and Objects
4. Data Types
5. Data Structures
    - (i) Vectors
    - (ii) Matrices and Arrays
    - (iii) Data Frames
    - (iv) Factors
6. Attributes

## 1. Hello World
- Type `"Hello World"` and execute it (press Shift + Enter or click the "Run Cells" button above).
- Next, do the same for `print("Hello World")`.

- If you want to write a comment, rather than code to execute, use `#` (like Python).
- Write `1 + 2` with and without `#`.

## 2. Arithmetic Operation
- R supports the standard arithmetic operations.
- The main operators are `+`, `-`, `*`, and `/`.
    - Use `^` or `**` for (mathematical) power. Power is right associative (like Python) and takes precedence over unary minus.
- Write `-2 ** 4` and execute.

- `%%` for modulus (the remainder from the division) and `%/%` for floor division (integer divide).
    - Recall: `%` (modulus) and `//` (floor division) in Python.
- Calculate `7 %% 2`.
- Calculate `7 %/% 2`.
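The operator rules above can be checked with a few small expressions. This is only an illustrative sketch; the variable names are made up for the example.

```r
# Power is right associative: 2 ^ 3 ^ 2 means 2 ^ (3 ^ 2).
right_assoc <- 2 ^ 3 ^ 2       # 512
left_grouped <- (2 ^ 3) ^ 2    # 64

# Power is evaluated before unary minus.
neg_power <- -3 ^ 2            # -9
paren_power <- (-3) ^ 2        # 9

# Modulus and floor division.
remainder <- 17 %% 5           # 2
quotient <- 17 %/% 5           # 3

# With a negative dividend, the remainder takes the sign of the divisor.
neg_remainder <- -7 %% 2       # 1
neg_quotient <- -7 %/% 2       # -4

print(c(right_assoc, left_grouped, neg_power, paren_power))
print(c(remainder, quotient, neg_remainder, neg_quotient))
```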

## 3. Variables and Objects
- You make a variable and assign data to it. The data are also called **objects** in R.
- Use `<-` (or `->`, which assigns to the right) to define a variable in R.
    - You can also use `=` (like Python), but this notation is less common in R code.
- For example, `x <- 1` creates a variable named `x` and assigns the object `1` to it.
    - (Unlike Python, R does not allow you to directly access the computer's memory.)

- In Jupyter Notebook, you can press `Alt` + `-` to insert `<-` with surrounding spaces. (See Keyboard Shortcuts in the Help tab above.)
- You can also wrap an assignment in parentheses, as in `(x <- 1)`, to assign and print at once.
- Assign `1 + 2` to variable `x` and print it at the same time.
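As a quick sketch of the assignment forms described above (the names `y`, `z`, `w`, and `total` are arbitrary):

```r
y <- 5             # leftward assignment, the usual form
10 -> z            # rightward assignment
w = y + z          # `=` also works at the top level
(total <- w * 2)   # parentheses assign and print in one step
```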

## 4. Data Types
- Basic data types in R are: **real (double), complex, integer, character,** and **logical**.
    - Real is like float in Python.
    - To write an integer, put `L` after the value (e.g., `1L`).
    - Logical is like boolean in Python, but you should write `TRUE` or `FALSE`, instead of `True` or `False`.

- To check the type of an object, use `typeof()`.
- Alternatively, you could use...
    - `mode()`: Mode of an object. **numeric** means **integer** or **double**.
    - `storage.mode()`: Storage mode of an object.
    - See e.g. [this website](https://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html#Objects) for further explanation.
- To check the class of an object, which determines how generic functions operate on it, use `class()`.
- Check the types of the following objects.

In [None]:
1 # this is not an integer unlike Python. double means real.
1L # this is an integer
1.0 # .0 is not shown unlike Python
1.5
"TRUE"
TRUE
matrix(1:4,2)
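To see how `typeof()`, `mode()`, `storage.mode()`, and `class()` can disagree, it may help to try them on objects other than plain numbers. The sketch below uses a date and the built-in function `sum` purely as examples.

```r
d <- as.Date("2020-01-01")
print(typeof(d))        # "double": dates are stored as numbers
print(mode(d))          # "numeric"
print(storage.mode(d))  # "double"
print(class(d))         # "Date": this is what generic functions look at

print(typeof(sum))      # "builtin"
print(mode(sum))        # "function"
print(class(sum))       # "function"
```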

- To convert between types, use, e.g., `as.double()`, `as.integer()`, `as.character()`, and `as.logical()`.
    - You can also use `as.numeric()`, which is equivalent to `as.double()`.
- To test for a type, use, e.g., `is.double()`, `is.integer()`, `is.character()`, and `is.logical()`.

In [None]:
print(as.character(1.5))
print(is.integer(1.5))
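A few more conversions, as a hedged illustration of what the `as.*()` and `is.*()` functions do in edge cases (the variable names are invented for this example):

```r
int_from_chr <- as.integer("42")     # 42L
int_from_dbl <- as.integer(3.9)      # 3L: truncated toward zero, not rounded
chr_from_lgl <- as.character(TRUE)   # "TRUE"

# A string that is not a number becomes NA (with a warning, silenced here).
not_a_number <- suppressWarnings(as.numeric("abc"))

print(is.integer(int_from_chr))      # TRUE
print(int_from_dbl)
print(is.character(chr_from_lgl))    # TRUE
print(is.na(not_a_number))           # TRUE
```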

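- When converted with `as.logical()`, `0` becomes `FALSE`, and `FALSE` becomes `0` when converted to a number.
    - Likewise, `1` becomes `TRUE` and `TRUE` becomes `1`.
    - Any other nonzero number also becomes `TRUE`.
- However, an arbitrary character string does not become `TRUE`. (Recall that non-empty strings are `True` in Python.)
- Check the logical value of `"Hello"` using `as.logical()`.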

- `NULL` means non-existence and its type is `NULL` (like `None` in Python).
- `NaN` (Not a Number) is double (like `NaN` in Python).
- `NA` (Not Available) is logical.
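The coercion rules and special values above can be inspected directly. The following is a small sketch rather than a complete list of cases.

```r
print(as.logical(0))        # FALSE
print(as.logical(-2.5))     # TRUE
print(as.numeric(TRUE))     # 1
print(as.logical("TRUE"))   # TRUE: a few spellings such as "TRUE" and "false" are recognized
print(as.logical("yes"))    # NA

print(typeof(NULL))         # "NULL"
print(typeof(NaN))          # "double"
print(typeof(NA))           # "logical"
print(length(NULL))         # 0
print(is.na(NaN))           # TRUE: NaN counts as missing
print(is.nan(NA))           # FALSE: NA is not NaN
```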

- You ***cannot*** use `+` and `*` with characters (unlike Python). Logical values, by contrast, are converted to integers in arithmetic, so `TRUE + TRUE` is `2`.
- Try `"Ha" * 4`.

- This means that you cannot write something like `print("Hello" + str(10) + "World")`.
- Instead, use `paste()` or `paste0()`.
    - What is the difference? Try them.

In [None]:
print(paste("I have", 10 ,"bucks."))
print(paste0("I have", 10 ,"bucks."))
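Beyond the two calls above, `paste()` also takes `sep` and `collapse` arguments, and both functions work element by element on vectors. A short sketch with made-up strings:

```r
with_sep <- paste("2024", "01", "31", sep = "-")        # "2024-01-31"
collapsed <- paste(c("a", "b", "c"), collapse = "+")    # "a+b+c"
vectorized <- paste0("id", 1:3)                         # "id1" "id2" "id3"

print(with_sep)
print(collapsed)
print(vectorized)
```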

## 5. Data Structures
- R has several data structures. Major ones are **vectors**, **matrices**, **arrays**, **data frames**, and **factors**.
    - Recall that, in Python, matrices are not built in (they come from NumPy) and data frames cannot be used without pandas.

### (i) Vectors
### (Atomic) Vectors
- A vector, created with `c()`, is a one-dimensional array (like `numpy.array` in Python).
- A vector can hold elements of any basic type, but all elements of one vector share the same type (like `numpy.array` rather than `list` in Python).
- But a vector itself is not a data type in R.

**Making vectors**
- Make a vector by typing `c(item1, item2, ...)`.
    - Vectors passed to `c()` are flattened rather than nested: `c(1, c(2, 3))` is the same as `c(1, 2, 3)`.
- Typing `c()` with no items gives `NULL`, which behaves like an empty vector.

In [None]:
print(c("a", 1, TRUE, 2.5))
print(c())
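One detail worth checking is what `c()` does with vectors inside vectors and with no arguments at all. The sketch below also shows `character(0)` as one way to get an empty vector that still has a type.

```r
nested <- c(1, c(2, c(3, 4)))
flat <- c(1, 2, 3, 4)
print(identical(nested, flat))   # TRUE: the inner vectors are flattened

print(is.null(c()))              # TRUE: c() with no items is NULL

empty_chr <- character(0)        # an empty vector with a definite type
print(length(empty_chr))         # 0
print(typeof(empty_chr))         # "character"
```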

- There are a few clever ways to make a vector. For example:
    - `:`: a vector of integers.
    - `double()` and `integer()`: a vector of zeros as real and integer, respectively.
        - `numeric()`: a vector of zeros as real.
    - `seq()`: a vector of numerics.
    - `rep()`: a vector of repeated items.

In [None]:
print(-1:4)
print(double(5))
print(seq(-1, 4, by=2))
print(rep("Ha", 4)) 
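`seq()` and `rep()` accept a few other arguments that are often handy. A minimal sketch, using arbitrary values:

```r
evenly_spaced <- seq(0, 1, length.out = 5)   # five points from 0 to 1
one_to_three <- seq_len(3)                   # 1 2 3
whole_twice <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
each_twice <- rep(c("a", "b"), each = 2)     # "a" "a" "b" "b"

print(evenly_spaced)
print(one_to_three)
print(whole_twice)
print(each_twice)
```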

- `c()` can be used inside `print()` to show multiple objects at once (like `print(a, b)` in Python). Note that the objects are combined into a single vector before printing.

In [None]:
print(c(c(1, 2), c(3, 4)))

**Types and length**
- If a vector contains multiple types, the highest one is chosen (like `numpy.array`).
   - Types are ordered: character > complex > real > integer > logical > NULL.
- Check the type of the following vector.

In [107]:
vec1 <- c("a", as.complex(1.0), 1L, TRUE, 2.5, NULL)

- To get the length of a vector, use `length()` (like `len()` in Python).
    - Why is the result not `6`?

In [None]:
length(c("a", as.complex(1.0), 1L, TRUE, 2.5, NULL))
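The type ordering can be seen step by step by combining two types at a time. This is an illustrative sketch with values chosen only for the example.

```r
print(typeof(c(TRUE, 2L)))    # "integer": logical is promoted to integer
print(typeof(c(2L, 1.5)))     # "double": integer is promoted to real
print(typeof(c(1.5, 1i)))     # "complex"
print(typeof(c(1i, "a")))     # "character"

mixed <- c(TRUE, 2L, "a")
print(mixed)                  # "TRUE" "2" "a": every item became a string
```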

**Getting items**
- To get a subset of a vector, use `[]` (like Python). 
    - R does ***not*** use zero-based indexing. The index starts from one.
- You can also use vectors, names, slicing, and relational operators to take a subset of a vector.
    - `myvec[inclusive:inclusive]` for slicing.
    - You cannot omit an endpoint, as in `myvec[-1:]` or `myvec[:4]` in Python.
- A minus sign means exclusion, not the index from the end of a vector (unlike Python).

In [None]:
vec1 <- c(1, 10, 100, 1000)
names(vec1) <- c("v1", "v2", "v3", "v4") # add names if you like. you can also write like c(v1=1, v2=10, v3=100, v4=1000).
print(vec1[-1])
print(vec1[c(2, 3)])
print(vec1[1:3])
print(vec1[c("v1", "v4")])
print(vec1[vec1 > 100])
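Because open-ended slices and negative "from the end" indices are not available, `head()`, `tail()`, and `length()` are common substitutes. A sketch, with the Python equivalents given only as rough analogies:

```r
v <- c(1, 10, 100, 1000, 10000)

first_three <- head(v, 3)      # roughly v[:3] in Python
last_two <- tail(v, 2)         # roughly v[-2:] in Python
last_item <- v[length(v)]      # roughly v[-1] in Python
all_but_last <- head(v, -1)    # roughly v[:-1] in Python

print(first_three)
print(last_two)
print(last_item)
print(all_but_last)
```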

**Checking items**
- You can check whether an item is in a vector using `%in%`, e.g., `"x" %in% myvec`, which returns `TRUE` or `FALSE` (like `in` in Python).
- Check if `"spike"` is in `vec1`.

In [108]:
vec1 <- c("jerry", "tom", "spike")

**Adding items**
- To add an item to a vector, use `c(myvec, item)` or `append(myvec, item)`.
    - The item can itself be a vector or another data structure.
- Add `"tyke"` to `vec1`.

In [109]:
vec1 <- c("jerry", "tom", "spike")
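As a separate illustration with numbers, `append()` also has an `after` argument for inserting in the middle. Note that neither function changes the original vector; the result has to be assigned.

```r
nums <- c(1, 2, 4)

at_end <- c(nums, 5)                       # 1 2 4 5
in_middle <- append(nums, 3, after = 2)    # 1 2 3 4
several <- append(nums, c(8, 9))           # 1 2 4 8 9

print(at_end)
print(in_middle)
print(several)
print(nums)                                # still 1 2 4
```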

**Changing items**
- To change an item of a vector use `myvec[index] <- newvalue`.
- Change `5` to `10` in `vec1`.

In [110]:
vec1 <- c(1, 5, 100, 1000)

**Deleting items**
- Deleting items of a vector is like taking its subset.
- Make a vector `vec2` by removing `1` and `10` from `vec1` using slicing.

In [111]:
vec1 <- c(1, 10, 100, 1000)

- You can also use operators like `%in%` and `!` to get a subset.

In [112]:
vec1 <- c(1, 10, 100, 1000)
remove <- c(1, 100)
vec1 <- vec1[!vec1 %in% remove]

**Exercise (vectors)**
- 1. Make a vector of integers from 1 to 5.
- 2. Get only odd numbers from the vector using an `%in%` operator.

### Lists
- A list is also a vector (in a broad sense).
    - It can contain any type (like `list` in Python).
    - It preserves the types of its entries unlike an (atomic) vector.
- The type of a list is list.
- You can make a list using `list()`.
- Check the type and length of `list1`. What is the type of `"vec"` in `list1`?

In [6]:
vec1 <- c(1, 2)
mat1 <- matrix(1:4, nrow=2, ncol=2)
df1 <- mtcars
list1 <- list(vec=vec1, mat=mat1, df=df1) 

**Getting items**
- To get an item of a list, use `[[]]` or `$`.
    - Check the difference between `list1[1]` and `list1[[1]]` in the example below. Hint: Check their types.
        - Alternatively, you can write like `list1["name"]` and `list1[["name"]]` if names are given.
    - Next, get the fourth element of `mat1`, i.e., `4`. Hint: `mylist[[index1]][index2]`.

In [10]:
vec1 <- c(1, 2)
mat1 <- matrix(1:4, nrow=2, ncol=2)
df1 <- mtcars
list1 <- list(vec=vec1, mat=mat1, df=df1)
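The same access patterns can be tried on a smaller list. In this sketch, `pet` is a made-up list used only for illustration.

```r
pet <- list(name = "spike", ages = c(3, 5, 8))

sub_list <- pet["ages"]       # single brackets return a list of length 1
element <- pet[["ages"]]      # double brackets return the item itself

print(class(sub_list))        # "list"
print(class(element))         # "numeric"
print(identical(pet$ages, pet[["ages"]]))   # TRUE
print(pet[["ages"]][2])       # 5: second element of the inner vector
```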

**Checking, adding, changing, and deleting items**
- The methods for checking, adding, changing, and deleting items for lists are similar to those for vectors, except that you need to use `[[]]` to select an item.
- Assignment 3 asks you to do such exercises.
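A minimal sketch of these operations on a throwaway list (the name `bag` is hypothetical). One point that differs from atomic vectors is that assigning `NULL` removes an item.

```r
bag <- list(a = 1, b = "two")

print("b" %in% names(bag))   # check whether a name exists: TRUE

bag$c <- c(3, 3, 3)          # add an item
bag[["a"]] <- 100            # change an item
bag$b <- NULL                # delete an item

print(names(bag))            # "a" "c"
print(bag[["a"]])            # 100
```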

**Exercise (lists)**
- 1. Make a list of `c("a", "b")` and `c(3, 4, 5)` and name it `list1`.
- 2. Add names `"chr"` and `"real"` to `list1` (or you can already add names in 1).

### (ii) Matrices and Arrays
- Matrices and arrays are built-in data structures in R (unlike Python).
    - A matrix is a two-dimensional array (like `numpy.ndarray` in Python).
    - An array is a multi-dimensional array (like `numpy.ndarray` in Python).
- The type of a matrix or an array is determined by its entries.
- You can make a matrix and an array using `matrix()` and `array()`, respectively.
- Check the types of the following objects. Try `typeof()` and `class()`.

In [115]:
mat1 <- matrix(c(1, 2, "3", 4L), nrow=2, ncol=2) # 2 by 2 matrix
colnames(mat1) <- c("v1", "v2") # add column names if you like
rownames(mat1) <- c("a", "b") # add row names if you like
array1 <- array(1:12, dim=c(2, 3, 2)) # 2 by 3 by 2 array
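For comparison, here is a sketch with a purely numeric matrix and a three-dimensional array. The exact output of `class()` for a matrix depends on the R version (from R 4.0.0 it also includes `"array"`), so the example relies on `is.matrix()` and `is.array()` instead.

```r
m <- matrix(c(1.5, 2, 3, 4), nrow = 2)
arr <- array(1:24, dim = c(2, 3, 4))

print(typeof(m))       # "double": determined by the entries
print(typeof(arr))     # "integer"
print(dim(m))          # 2 2
print(dim(arr))        # 2 3 4
print(is.matrix(m))    # TRUE
print(is.matrix(arr))  # FALSE: more than two dimensions
print(is.array(arr))   # TRUE
print(class(m))        # version dependent
```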

#### Getting items
- To get an item, use `[]`. You can use either names or indices.
- To slice data, use `mymat[inclusive:inclusive]`. With a single index, the matrix is treated as one long vector read column by column.
    - To get columns, use `mymat[, inclusive:inclusive]`.
    - To get rows, use `mymat[inclusive:inclusive, ]`

In [None]:
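mat1 <- matrix(c(1, 2, "3", 4L, 5, 6, 7, 8, 9), nrow=3, ncol=3) # nine values fill the 3 by 3 matrix without recycling
colnames(mat1) <- c("v1", "v2", "v3") 
rownames(mat1) <- c("a", "b", "c") 
print(mat1[1:2]) # first two elements in column-major order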
# columns 
print(mat1[, 1:2])  # In Python, mat1.iloc[:, 0:2]
print(mat1[, 'v1'])  # In Python, mat1.loc[:, 'v1'] 
# rows
print(mat1[1:2, ])  # In Python, mat1.iloc[0:2, :]
print(mat1["a", ])  # In Python, mat1.loc['a', :]
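When a single row or column is selected, R drops the result to a plain vector by default. The sketch below shows the `drop = FALSE` argument, which keeps the matrix shape; the matrix `m` is made up for the example.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("v1", "v2", "v3")))

col_vec <- m[, "v2"]                 # a named vector
col_mat <- m[, "v2", drop = FALSE]   # still a 2 by 1 matrix

print(is.matrix(col_vec))   # FALSE
print(dim(col_mat))         # 2 1
print(m["b", "v3"])         # 6: a single cell by row and column name
```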

**Checking and changing items**
- The methods for such operations for matrices and arrays are very similar to those for vectors. A difference is that the indices can be multi-dimensional (e.g., `mymat[1, 2]`, or `mymat[rowname, colname]` if names are given).
- Change `"3"` to `3L` in `mat1`.

In [47]:
mat1 <- matrix(c(1, 2, "3", 4L), nrow=2, ncol=2) 
colnames(mat1) <- c("v1", "v2") 
rownames(mat1) <- c("a", "b") 

**Adding items**
- Similar to vectors, you can use e.g. `matrix(c(mymat, new_item), nrow=.., ncol=..)` to add items.
- Alternative options are `rbind()` and `cbind()`.
    - `rbind()` appends new items as rows at the bottom.
    - `cbind()` appends new items as columns on the right.
- In both cases, you have to be careful about the length of new items.
- Add `c(1, 2)` to `mat1`. How about `c(1, 2, 3)`?

In [12]:
mat1 <- matrix(c(1, 2, "3", 4L), nrow=2, ncol=2) 
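As a separate sketch on a 2 by 3 numeric matrix, where the new items have exactly the matching length:

```r
m <- matrix(1:6, nrow = 2)        # 2 by 3

taller <- rbind(m, c(7, 8, 9))    # new row needs 3 items
wider <- cbind(m, c(0, 0))        # new column needs 2 items

print(dim(taller))    # 3 3
print(dim(wider))     # 2 4
print(taller[3, ])    # 7 8 9
```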

**Deleting items**
- Deleting items is like taking a subset.
- Delete the first row of `mat1`.

In [73]:
mat1 <- matrix(1:4, nrow=2, ncol=2) 

**Item-by-item calculation**
- For item-by-item calculation of matrices, use `+`, `-`, `*`, and `/`.
- Take the item-by-item summation of `mat1` and `mat2`.

In [14]:
mat1 <- matrix(c(1, 2, 3, 4), nrow=2, ncol=2) 
mat2 <- matrix(c(5, 6, 7, 8), nrow=2, ncol=2)
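Note that `*` multiplies item by item, which is different from the matrix product in linear algebra; for the latter, R provides the `%*%` operator. A small sketch contrasting the two on one matrix:

```r
a <- matrix(1:4, nrow = 2)    # rows are (1, 3) and (2, 4)

elementwise <- a * a          # each entry squared
product <- a %*% a            # matrix product

print(elementwise)
print(product)
```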

**Transposing a matrix**
- To transpose a matrix, use `t()`.
- Transpose `mat1`.

In [10]:
mat1 <- matrix(1:6, nrow=3, ncol=2) 

#### Sorting items
- To sort items, use `order()`, which returns the indices that put its argument in sorted order (similar in effect to `sort_values` in pandas).

In [None]:
mat1 <- matrix(c(1, 3, 5, 6, 6, 9, 2, 7, 4), nrow=3, ncol=3) 
colnames(mat1) <- c("v1", "v2", "v3") 
print(mat1[order(mat1[,"v2"], mat1[,"v3"]), ]) # ascending
print(mat1[order(-mat1[,"v2"], -mat1[,"v3"]), ]) # descending 
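It can help to look at what `order()` returns on its own: a vector of positions, not the sorted values. A minimal sketch with an arbitrary vector `x`:

```r
x <- c(30, 10, 20)

idx <- order(x)                          # positions of the smallest, next, ...
print(idx)                               # 2 3 1
print(x[idx])                            # 10 20 30

idx_desc <- order(x, decreasing = TRUE)
print(idx_desc)                          # 1 3 2
print(x[idx_desc])                       # 30 20 10
```

Indexing a matrix's rows with such a position vector is what reorders the rows in the cell above.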

**Exercise (matrices and arrays)**
- 1. Make a 3-by-4 matrix of real numbers from -9 to 2 and name it `mat1`.
- 2. Sort the rows of `mat1` in a descending order using the second column and define a new matrix `mat2`.
- 3. Multiply `mat1` by `mat2`.

### (iii) Data Frames
- Data frames are similar to `pandas.DataFrame` in Python.
- Data frames are also like R's matrices, but preserve the types of their entries.
- Data frames are also like R's lists, but all of their columns must have the same length.
- You will often work with data frames when you handle data.
- To make a data frame, use `data.frame()`.
- Check the types of `df1` and `df1[3]`, and `df1[[3]]`. Try `typeof()` and `class()`.

In [None]:
col1 <- c("tom", "jerry", "tom", "jerry")
col2 <- c(1999L, 2000L, 1999L ,2000L)
col3 <- c(NaN, 0.2, 0.4, 0.1)
col4 <- c(TRUE, FALSE, FALSE, TRUE)
df1 <- data.frame(name=col1, year=col2, varA=col3, varB=col4, stringsAsFactors=FALSE) # name, year, etc. are column names
print(rownames(df1))
print(rownames(df1) <- c("a", "b", "c", "d")) # add row names if you like

#### Checking structure, shape, and lengths
- To get the structure of a data frame, including the types of its columns, use `str()` (like `.info()` in pandas).
- To get dimensions, use `dim()` (like `.shape` in pandas).
- To get the number of rows and columns, use `nrow()` and `ncol()`, respectively.
    - You can also use `length()`, which gives the number of columns, and `lengths()`, which gives the length of each column.

In [17]:
#print(mtcars) # mtcars is built-in data
str(mtcars) # str() prints by itself; wrapping it in print() would add a stray NULL
print(dim(mtcars))
print(nrow(mtcars)) 
print(ncol(mtcars))
print(length(mtcars))
print(lengths(mtcars))

#### Summarizing data
- To get summary statistics of data, use `summary()` (like `describe()` in pandas).

In [None]:
print(summary(mtcars))

**Getting, checking, adding, changing, and deleting items**
- The methods for such operations for data frames are very similar to those for matrices.
- To access items, use `mydf[]` or `mydf[[]]`.
    - It may be more common to use `mydf[]` because it preserves the `data.frame` structure.
    - You can also write `mydf$column_name`, which is equivalent to `mydf[["column_name"]]`, to get a column.
- Get `"name"` column using `df1["name"]`, `df1[["name"]]`, and `df1$name`.

In [23]:
col1 <- c("tom", "jerry", "tom", "jerry")
col2 <- c(1999L, 2000L, 1999L ,2000L)
col3 <- c(NaN, 0.2, 0.4, 0.1)
col4 <- c(TRUE, FALSE, FALSE, TRUE)
df1 <- data.frame(name=col1, time=col2, varA=col3, varB=col4, stringsAsFactors=FALSE) 
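The same operations can be sketched on a small, separate data frame. Here `scores` and its columns are invented for illustration.

```r
scores <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

print(class(scores["score"]))      # "data.frame": structure is preserved
print(class(scores[["score"]]))    # "numeric": the column itself
print(identical(scores$score, scores[["score"]]))   # TRUE

scores$passed <- scores$score >= 3       # add a column
high <- scores[scores$score > 2.7, ]     # keep rows that meet a condition
scores$id <- NULL                        # delete a column

print(names(scores))    # "score" "passed"
print(nrow(high))       # 2
```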

**Item-by-item calculation, transposing a data frame, and sorting items**
- The methods for such operations are very similar to those for matrices.
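A short sketch of these operations, assuming a small made-up data frame `pets`. One thing to keep in mind is that `t()` returns a matrix, so a data frame with mixed column types becomes a character matrix.

```r
pets <- data.frame(name = c("tom", "jerry", "spike"),
                   age = c(5, 2, 7),
                   stringsAsFactors = FALSE)

sorted <- pets[order(pets$age), ]     # sort rows by age, ascending
doubled <- pets$age * 2               # item-by-item calculation on a column
flipped <- t(pets)                    # transpose

print(sorted)
print(doubled)
print(dim(flipped))                   # 2 3
print(is.character(flipped))          # TRUE
```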

**Exercise (data frames)**
- 1. Make a data frame `df1` using `c("mickey", "minnie", "pooh")`, `c(175, 181, 162)`, and `c(78, 82, 64)`.
- 2. Add names `"name"`, `"height"`, and `"weight"` to `df1` (or you can already add names in 1).
- 3. See what happens to the type of the `"name"` column if you switch between `stringsAsFactors=FALSE` and `stringsAsFactors=TRUE` in 1. (Since R 4.0.0, the default is `FALSE`.)

### (iv) Factors
- A factor is like an integer vector with labels.
- Common estimation functions such as `lm()` and `glm()` often use factors.
- A factor has a levels attribute.
- Check the type and summary of `fac1`. Try both `typeof()` and `class()`.

In [1]:
fac1 <- factor(c("tom", "jerry", "spike", "tom", "jerry", "tom"))
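To see the "integer vector with labels" idea concretely, the sketch below builds a separate factor with an explicit level order. The name `sizes` and its values are made up for the example.

```r
sizes <- factor(c("low", "high", "low", "mid"),
                levels = c("low", "mid", "high"))

print(levels(sizes))        # "low" "mid" "high"
print(nlevels(sizes))       # 3
print(as.integer(sizes))    # 1 3 1 2: the underlying integer codes
print(as.character(sizes))  # the labels
print(table(sizes))         # counts per level
```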

## 6. Attributes
- All objects except `NULL` can have some attributes.
    - For example, names are attributes.
- To check the attributes of an object, use `attributes()`.
- Check the attributes of `vec1` and `fac1`.

In [None]:
vec1 <- c(1, 10, 100, 1000)
names(vec1) <- c("v1", "v2", "v3", "v4")
fac1 <- factor(c("tom", "jerry", "spike"))
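Attributes can also be read and set one at a time with `attr()`. In this sketch, `"unit"` is an arbitrary attribute name chosen for illustration, not something R defines.

```r
m <- matrix(1:4, nrow = 2)
print(names(attributes(m)))          # "dim": a matrix is a vector with a dim attribute

x <- c(1.5, 2.5)
print(is.null(attributes(x)))        # TRUE: a plain vector has no attributes

attr(x, "unit") <- "kg"              # set a custom attribute
print(attr(x, "unit"))               # "kg"
print(attributes(x))
```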