### <center> Built-in R Features </center>

### Built-in Data Structure Functions

* R contains quite a few useful built-in functions for working with data structures. 
* Here are some of the key functions to know: 

| Function | Description| 
| ---- | ---- |
|**`seq(start, end, step size)`** | Create number sequences |
|**`sort(vector.name)`** | Sort a vector |
|**`rev(object.name)`** | Reverse elements of an object |
|**`str(object.name)`** | Show the structure of an object | 
|**`append()`** | Merge objects together (works on vectors and lists)|
|**`is.objectType(object)`** | Check whether an R object is of the given type (returns `TRUE` or `FALSE`) |
|**`as.objectType(object)`** | Convert (cast) an R object to the given type |

* Examples:

In [28]:
seq(from=0, to=10, by=2) # naming the arguments is optional if you pass them in this order
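
As a small sketch of the point in the comment above, the named and positional calls give the same sequence. The `length.out` argument shown last is not used elsewhere in this lecture; it is included only as an illustration of asking for a number of points instead of a step size.

```r
# Named and positional arguments produce the same result
named <- seq(from = 0, to = 10, by = 2)
positional <- seq(0, 10, 2)
identical(named, positional)  # TRUE

# Ask for 5 evenly spaced values instead of giving a step size
five.points <- seq(from = 0, to = 1, length.out = 5)
five.points  # 0.00 0.25 0.50 0.75 1.00
```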

In [16]:
v1 <- c(1, 4, 6, 7, 2, 13, 3)

In [29]:
sort(x=v1)

In [30]:
sort(x=v1, decreasing = TRUE)

In [31]:
v2 <- c(1,2,3,4,5)
v2

In [32]:
rev(x = v2)

In [33]:
str(v1)
str(v2)

 num [1:7] 1 4 6 7 2 13 3
 num [1:5] 1 2 3 4 5


In [24]:
append(v1, v2)

In [34]:
sort(append(v1,v2)) # You can combine functions just like in Excel

In [40]:
v3 <- c(1,2,3)

In [41]:
is.vector(v3)

In [42]:
is.list(v3)

In [43]:
is.matrix(v3)

In [48]:
v4 = as.matrix(v3)
v4

     [,1]
[1,]    1
[2,]    2
[3,]    3


In [49]:
is.matrix(v4)
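
The same `is.*` / `as.*` pattern applies to other types as well. The sketch below uses made-up example values (`x`, `nums`, `lst` are illustrative names) and also shows the `after` argument of `append()`, which controls where the second object is inserted.

```r
x <- c("1", "2", "3")        # a character vector
is.character(x)              # TRUE
is.numeric(x)                # FALSE

nums <- as.numeric(x)        # cast the strings to numbers
is.numeric(nums)             # TRUE
sum(nums)                    # 6

lst <- as.list(nums)         # cast the vector to a list
is.list(lst)                 # TRUE

# append() inserts the second vector after the given position
merged <- append(c(1, 2, 5), c(3, 4), after = 2)
merged                       # 1 2 3 4 5
```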

### **`apply()`** functions

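* In this lecture we will learn about two functions from the **`apply()`** family, **`lapply()`** and **`sapply()`**, which apply a function to each element of an object like a list or a vector. We will also use the helper function **`sample()`** in the examples.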
* Note that we are passing a function as an argument, not calling it, so we don't use parentheses **`()`**.

| Function | Description| 
| ---- | ---- |
|**`lapply(list_or_vector, function)`** | Returns a list of the same length as the input, where each element is the result of applying the function to the corresponding input element. "l" stands for list. |
|**`sapply(list_or_vector, function, simplify = TRUE)`** | User-friendly version of `lapply()`. Returns a `vector` or a `matrix` where possible if `simplify` is `TRUE`. "s" stands for simplify. |
|**`sample(vector, sample_size)`** | Takes a sample of specified sample size from elements of object x, typically a vector, with or without replacement. |

* First let's show a quick useful function called **`sample()`**. 

In [54]:
sample(x = 1:10, size=2)
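
Because `sample()` is random, each run gives different numbers. A minimal sketch of two related options follows: `set.seed()` makes the draw repeatable (the seed value 42 is an arbitrary choice), and `replace = TRUE` allows the same element to be drawn more than once.

```r
# The same seed gives the same sample
set.seed(42)
first <- sample(x = 1:10, size = 2)
set.seed(42)
second <- sample(x = 1:10, size = 2)
identical(first, second)  # TRUE

# With replacement, the sample can be larger than the input
with.repl <- sample(x = 1:3, size = 5, replace = TRUE)
length(with.repl)         # 5
```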

* Now let's look at an example where we use **`lapply()`** to apply a function over a vector. 

In [68]:
v <- 1:5 # Create a vector from 1 to 5

add.rand <- function(n){ 
    ran <- sample(x=1:10, size=1) # Draw one random number between 1 and 10
    return(n + ran) # Return n plus the random number
    } 

lapply(v, add.rand) # Returns a list where some random number is added to each vector element

In [67]:
add.rand(v) # Called directly on the vector, add.rand draws one random number and adds it to every element
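
The difference between the two calls can be checked by subtracting the original vector from each result. This is only a sketch: the seed value is arbitrary, and `direct` and `per.element` are illustrative names.

```r
set.seed(1)  # arbitrary seed, only to make the run repeatable
v <- 1:5

add.rand <- function(n) {
    ran <- sample(x = 1:10, size = 1)  # one random draw per call
    return(n + ran)
}

# One call on the whole vector: one draw, so every offset is the same
direct <- add.rand(v)
direct - v

# One call per element: each element gets its own draw
per.element <- sapply(v, add.rand)
per.element - v
```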

* Often you don't want a **`list`** back, but something simpler, like a vector or a matrix.
* This is where **`sapply()`** comes in, which simplifies the result and returns a **`vector`** when it can.

In [69]:
sapply(v, add.rand)

* Let's look at one more example

In [73]:
v <- 1:5

times2 <-function(num){ 
    return(num*2)
    } 

result <- sapply(v, times2) 
result

**`sapply()` limitation:** `sapply()` can only simplify to a `vector` when the applied function returns a result of the same length for every element. If some calls return nothing (a zero-length result), it returns a `list` instead, like `lapply()`.

In [81]:
even.check <- function(x){ 
    return(x[(x%%2==0)])
    } 
nums <- c(1, 2, 3, 4, 5)

In [83]:
sapply(nums, even.check)
class(sapply(nums, even.check))

In [84]:
lapply(nums, even.check)
class(lapply(nums, even.check))
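
To see why no simplification happens here, one can look at the length of each individual result. The sketch below assumes the same `even.check()` and `nums` as above; calling the function once on the whole vector is shown as one possible way to get a plain vector back.

```r
even.check <- function(x) {
    return(x[x %% 2 == 0])
}
nums <- c(1, 2, 3, 4, 5)

# Odd numbers give a zero-length result, even numbers a result of length 1
result.lengths <- lengths(lapply(nums, even.check))
result.lengths  # 0 1 0 1 0

# The function is already vectorized, so a single call returns a vector
evens <- even.check(nums)
evens           # 2 4
```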

* There are actually several **`apply()`** functions, and you can find them all by calling **`help()`** on any one of them.
* The core function among the ones covered here is **`lapply()`**; **`sapply()`** is a wrapper around it.

Excerpt from the **`help()`** documentation: 

* **`lapply`** returns a `list` of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.

* **`sapply`** is a user-friendly version and wrapper of **`lapply`** by default returning a vector, matrix or, if simplify = "array", an array if appropriate, by applying **`simplify2array()`**. 
* **`sapply(x, f, simplify = FALSE, USE.NAMES = FALSE)`** is the same as **`lapply(x, f)`**.

* **`vapply`** is similar to **`sapply`**, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.

* **`replicate`** is a wrapper for the common use of **`sapply`** for repeated evaluation of an expression (which will usually involve random number generation).

* **`simplify2array()`** is the utility called from **`sapply()`** when simplify is not false and is similarly called from **`mapply()`**.
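
A short sketch of three statements from the excerpt, using a `times2()` function like the one defined earlier. The variable names are illustrative only.

```r
v <- 1:5
times2 <- function(num) num * 2

# sapply() with simplification switched off behaves like lapply()
same <- identical(
    sapply(v, times2, simplify = FALSE, USE.NAMES = FALSE),
    lapply(v, times2)
)
same     # TRUE

# vapply() states up front that every result is a single number
doubled <- vapply(v, times2, FUN.VALUE = numeric(1))
doubled  # 2 4 6 8 10

# replicate() evaluates an expression repeatedly, here three random draws
draws <- replicate(3, sample(x = 1:10, size = 1))
length(draws)  # 3
```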

#### Anonymous Functions

* When functions are simple and we only use them once, it is better to express them as anonymous functions rather than formally define them. 
* They are similar to `lambda functions` in Python.
* General format:  **`function(parameter){code to return something here}`**

In [75]:
v <- 1:5

sapply(v, function(num){num*2})

* If you compare this with the formal definition of the function, you will see that the anonymous function removes unnecessary elements from it: 
    * Removes the function name and assignment operator
    * Removes the `return()` call, because a function returns the value of its last evaluated expression

In [None]:
times2 <-function(num){ 
    return(num*2)
    } 
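
As a further sketch, the curly braces can also be dropped when the body of the anonymous function is a single expression. The names below are illustrative.

```r
v <- 1:5

# With braces, as in the example above
with.braces <- sapply(v, function(num){num*2})

# Without braces: fine for a single-expression body
no.braces <- sapply(v, function(num) num * 2)

identical(with.braces, no.braces)  # TRUE

# Another one-off function: square each element
squares <- sapply(v, function(num) num^2)
squares  # 1 4 9 16 25
```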

#### **`lapply()`** and **`sapply()`** with more than one argument

In [77]:
v <- 1:5

add_choice <- function(num, choice){ 
    return(num + choice)
    } 

In [78]:
add_choice(10, 2)

* Here the apply function knows what to use for `num`, because it passes in each element of the vector `v` in turn. 
* But it doesn't know what to use for `choice`.
* So we have to pass `choice` as an additional argument to the apply function.

In [79]:
sapply(v, add_choice)

ERROR: Error in FUN(X[[i]], ...): argument "choice" is missing, with no default


In [80]:
sapply(v, add_choice, choice=100)
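
The extra argument is forwarded to the function on every call. A sketch of an equivalent way to write this wraps the call in an anonymous function; the same forwarding also works with `lapply()`. The result names are illustrative.

```r
v <- 1:5

add_choice <- function(num, choice) {
    return(num + choice)
}

# Extra named arguments are passed on to add_choice for each element
by.name <- sapply(v, add_choice, choice = 100)

# Equivalent: fix the second argument inside an anonymous function
by.wrapper <- sapply(v, function(n) add_choice(n, choice = 100))

identical(by.name, by.wrapper)  # TRUE
by.name                         # 101 102 103 104 105

# lapply() forwards extra arguments in the same way
as.list.result <- lapply(v, add_choice, choice = 100)
```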

### Math Functions with R 

| Function | Description | 
| ---- | ---- |
|**`abs(x)`** | Returns the absolute value of x |
|**`sum(x)`** | Returns the sum of all values present in x | 
|**`mean(x)`** | Returns the average of elements in x |
|**`sqrt(x)`** | Returns the square root of x |
|**`ceiling(x)`** | Rounds x up to the nearest integer | 
|**`floor(x)`** | Rounds x down to the nearest integer |
|**`trunc(x)`** | Truncates x toward zero (drops the fractional part) | 
|**`round(x, digits=n)`** | Rounds x to n decimal places |
|**`signif(x, digits=n)`** | Rounds x to n significant digits |
|**`cos(x), sin(x), tan(x)`** | Performs trigonometric operations on x (in radians) |
|**`log(x)`** | Returns the natural logarithm of x | 
|**`log10(x)`** | Returns the common (base 10) logarithm of x | 
|**`exp(x)`** | Returns e raised to the power x (e^x) |

Some examples: 

In [85]:
abs(-2)

In [88]:
v <- c(-1, 2, -3, 4, -5, 6, -7, 8, -9, 10)

In [89]:
sum(v)

In [90]:
abs(v)

In [91]:
mean(v)

In [92]:
round(12.3456, 2)

In [93]:
signif(12.3456, 2)
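
The rounding functions are easiest to tell apart on a negative number. The following sketch uses arbitrary example values and also checks a few of the other functions from the table.

```r
x <- -2.7

ceiling(x)  # -2 (rounds up)
floor(x)    # -3 (rounds down)
trunc(x)    # -2 (drops the fractional part)
round(x)    # -3 (nearest integer)

round(12.3456, digits = 2)   # 12.35 (2 decimal places)
signif(12.3456, digits = 2)  # 12    (2 significant digits)
signif(12.3456, digits = 4)  # 12.35 (4 significant digits)

log(exp(1))   # 1, since log() is the inverse of exp()
log10(1000)   # 3
sqrt(16)      # 4
```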

### Regular Expressions in R 

* Regular expressions are a general way of describing patterns to search for, typically in a string (or a vector of strings). 

* We will learn two useful functions for pattern searching with regular expressions, and we will go deeper into the topic later on. 

* For both of these functions you pass in a pattern and then the object you want to search in. 

| Function | Description | 
| ---- | ---- | 
|**`grepl()`** | Returns a logical (TRUE or FALSE) indicating if pattern was found. | 
|**`grep()`** | Returns a vector of index locations of matching pattern instances. | 

* The name **`grep`** (= global regular expression print) comes from the command-line utility for searching plain-text data sets for lines that match a regular expression.
* **`grepl`** stands for grep logical.

In [2]:
text <- "Hello there, can you tell in which direction is the port?" 

In [3]:
grepl("port", text)

In [97]:
grep("port", text) # Returns 1 because the match is in the first (and only) element, the string itself.

In [98]:
v <- c("a", "b", "c", "d", "e")

In [99]:
grepl("a", v)

In [101]:
grep("a", v)

In [103]:
grep("f", v) # Returns an empty integer vector, integer(0), if the pattern is not found anywhere
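
So far the patterns were plain text. As a sketch of an actual regular expression, the pattern `^a` below matches strings that start with "a". The `fruits` vector is made-up example data, and `value = TRUE` asks `grep()` for the matching strings instead of their positions.

```r
fruits <- c("apple", "banana", "avocado")  # hypothetical example data

starts.with.a <- grepl("^a", fruits)
starts.with.a                       # TRUE FALSE TRUE

grep("^a", fruits)                  # 1 3
grep("^a", fruits, value = TRUE)    # "apple" "avocado"

# A logical vector from grepl() can be used to subset
fruits[grepl("an", fruits)]         # "banana"

# No match: an empty result, which has length 0
no.match <- grep("kiwi", fruits)
length(no.match) == 0               # TRUE
```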

* We'll learn more regular expression functions as we need them when doing exercises or projects. Want more info on regular expressions with R in the meantime? Check out this [link](https://www.regular-expressions.info/rlanguage.html). 

### Dates and Timestamps 

* R gives us a variety of tools for working with date and timestamp information. 
* Let's start exploring the **`Date`** object.

| Function | Description | 
| ---- | ---- | 
|**`Sys.Date()`** | Gets today's date from the system in the standard ISO 8601 format (YYYY-MM-DD). | 
|**`as.Date()`** | Converts a character string to a Date object. | 

* Format codes for the **`as.Date()`** function: 

| Code | Description | 
| ---- | ---- | 
|**`%d`** | Day of the month (decimal number) |
|**`%m`**|Month (decimal number)|
|**`%b`**|Month (abbreviated name, depends on the locale) |
|**`%B`**|Month (full name, depends on the locale) |
|**`%y`**|Year (2 digit) |
|**`%Y`**|Year (4 digit) |

* Examples: 

In [104]:
today <- Sys.Date()
today

In [118]:
class(today)  # A date object

* Often we receive dates as character strings, so we need to convert them to the Date format.

In [119]:
as.Date("1990-11-03") # No format argument needed because the string is already in the standard YYYY-MM-DD format.

* Otherwise, use the **`format=`** argument to describe how the date is laid out in the string:

In [124]:
as.Date("Nov-03-90", format="%b-%d-%y") 

In [125]:
as.Date("November-03-1990", format="%B-%d-%Y")

In [123]:
as.Date("june, 01, 2002", format="%B, %d, %Y")
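
The same format codes also work in the other direction, via `format()`, which turns a Date back into a string. The sketch below sticks to numeric codes so that it does not depend on the locale; the variable names are illustrative. It also shows that a `format` that does not match the string gives `NA` rather than an error.

```r
d <- as.Date("1990-11-03")

# Date -> string, using the codes from the table
format(d, "%d/%m/%Y")   # "03/11/1990"
format(d, "%d/%m/%y")   # "03/11/90"

# String -> Date with the matching format gives the same date back
parsed <- as.Date("03/11/1990", format = "%d/%m/%Y")
parsed == d             # TRUE

# A format that does not describe the string results in NA
mismatch <- as.Date("Nov-03-90", format = "%Y-%m-%d")
is.na(mismatch)         # TRUE
```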

**Time**

* We can also convert strings that contain time information and work with them. 
* R uses the **POSIXct** and **POSIXlt** object types to store time information.
* **`as.POSIXct()`** converts a string to a POSIXct object, which is useful for time series analysis, etc.
* The format codes are best seen through the help documentation for the **`strptime()`** function.

* Notice how today's date was added automatically because we didn't specify it in the format argument

In [116]:
as.POSIXct("11:02:03", format="%H:%M:%S")

[1] "2021-09-13 11:02:03 BST"

In [117]:
as.POSIXct("November-03-1990 11:02:03", format="%B-%d-%Y %H:%M:%S")

[1] "1990-11-03 11:02:03 GMT"

* Usually, however, we will be using the **`strptime()`** function.
* In the function documentation, you will find all the format codes. 
* We will use this function exactly as we used the **`as.Date()`** and **`as.POSIXct()`** functions. 

In [5]:
strptime("11:02:03", format="%H:%M:%S")

[1] "2021-09-24 11:02:03 BST"

In [127]:
strptime("November-03-1990 11:02:03", format="%B-%d-%Y %H:%M:%S")

[1] "1990-11-03 11:02:03 GMT"

In [128]:
strptime("Nov-03-90", format="%b-%d-%y")

[1] "1990-11-03 GMT"
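
One difference between the two functions is the type of object they return. The sketch below passes `tz = "UTC"` explicitly (an assumption made here so that the result does not depend on the system time zone) and uses illustrative variable names.

```r
stamp <- "1990-11-03 11:02:03"
fmt <- "%Y-%m-%d %H:%M:%S"

ct <- as.POSIXct(stamp, format = fmt, tz = "UTC")
lt <- strptime(stamp, format = fmt, tz = "UTC")

class(ct)  # "POSIXct" "POSIXt"
class(lt)  # "POSIXlt" "POSIXt"

# POSIXlt stores the components separately, so they can be read directly
lt$hour    # 11
lt$min     # 2

# POSIXct stores a single number: seconds since 1970-01-01 00:00:00 UTC
is.numeric(as.numeric(ct))  # TRUE

# Both describe the same moment in time
as.POSIXct(lt) == ct        # TRUE
```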