# Operations

Using the recipe analogy, the data in the variable [`n`] can be used and manipulated with different operations:
- comparing whether [`n`] exceeds a certain value (the capacity of the bowl)
- reducing the value of [`n`]

![programming_3.png](images/programming_3.png)

In a program, we can perform operations on data in a few ways:
- `Mathematical` calculations (e.g. +, -, /, *)
- `Comparison` operations (e.g. <, ==, >)
- `Boolean` operations (e.g. AND, OR)

Furthermore, these operations can be performed on `single` values or on `multiple` values in a vector

---
## Operations on single values

### 1. Mathematical operators

Mathematical operations can be performed on `numeric/integer` data

In [None]:
a <- 1
b <- 2

a + b  # addition
a - b  # subtraction
a * b  # multiplication
a / b  # division
a ^ b  # power
a %% b # modulo (remainder)

### 2. Comparison operators

Comparisons can be done between `numeric/integer` data

In [None]:
a <- 1
b <- 2

a == b  # equal to
a != b  # not equal to
a > b   # greater than
a >= b  # greater than or equal to
a < b   # less than
a <= b  # less than or equal to

We can also use these operators to compare `strings/characters`

In [None]:
a <- "patient_1"
b <- "patient_2"

a == b # strings are the same
a != b # strings are not the same

### 3. Boolean operators

We can apply Boolean operations on `logical` data types

In [None]:
a <- TRUE
b <- FALSE

a & b # AND
a | b # OR
!a    # NOT
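
Boolean operators are most useful when they combine the results of comparisons. The sketch below uses made-up values (`age` and `risk` are hypothetical variables, not part of the examples above) to show how `&`, `|` and `!` join several checks into a single `TRUE`/`FALSE` answer.

```r
age  <- 52       # hypothetical patient age
risk <- "high"   # hypothetical risk category

in_range        <- age >= 45 & age < 65       # TRUE only if both comparisons are TRUE
elderly_or_high <- age >= 65 | risk == "high" # TRUE if at least one comparison is TRUE
not_low         <- !(risk == "low")           # NOT reverses the result of the comparison

in_range
elderly_or_high
not_low
```

With these values all three results are `TRUE`; changing `age` to `70` makes `in_range` `FALSE` while `elderly_or_high` stays `TRUE`.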

---
## Operations on multiple values in vectors

### 1. Mathematical operations with a scalar value

![op_math_scalar.png](images/op_math_scalar.png)

We can apply a mathematical calculation to each element of a vector using a `single` (scalar) value

In [None]:
vector_a <- c(10,20,30,40)

vector_a + 2
vector_a - 2
vector_a * 2
vector_a / 2
vector_a ^ 2
vector_a %% 2

### 2. Comparison operations with a scalar value

![op_comp_scalar.png](images/op_comp_scalar.png)

We can compare each element in a vector with a `single` (scalar) value

In [None]:
vector_a <- c(10,20,30,40)

vector_a == 20
vector_a != 20
vector_a >= 20
vector_a > 20
vector_a < 20
vector_a <= 20

### 3. Mathematical operations with another vector

![op_math_vector.png](images/op_math_vector.png)

We can also perform mathematical calculations between each element in one vector and the corresponding element in another vector

In [None]:
vector_a <- c(10,20,30,40)
vector_b <- c(1,2,3,4)

vector_a + vector_b
vector_a - vector_b
vector_a * vector_b
vector_a / vector_b
vector_a ^ vector_b
vector_a %% vector_b
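
As an illustration of why element-wise calculations are handy, the sketch below computes a body mass index for several patients in one line. The vectors `weight_kg` and `height_m` and their values are made up for this example.

```r
weight_kg <- c(60, 72, 85)         # hypothetical weights in kilograms
height_m  <- c(1.60, 1.80, 1.70)   # hypothetical heights in metres

# ^ is evaluated before /, and both work element by element:
# the first weight is paired with the first height, and so on
bmi <- weight_kg / height_m ^ 2

round(bmi, 1)   # 23.4 22.2 29.4
```

The result is a vector of the same length as the inputs, with one value per patient.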

### 4. Comparison operations with another vector

![op_comp_vector.png](images/op_comp_vector.png)

We can also perform comparisons of each element in one vector with the corresponding element in another vector

In [None]:
vector_a <- c(10,20,30,40)
vector_b <- c(1,2,3,4)

vector_a == vector_b
vector_a != vector_b
vector_a >= vector_b
vector_a > vector_b
vector_a < vector_b
vector_a <= vector_b

---
## Using comparison operations for selecting items
Now that we can do comparisons, the Boolean output (`TRUE` or `FALSE`) can be used to select elements from a data structure
- Instead of using an index or name within the `[]` selector, we can pass in the result of a comparison

### 1. Vector

In [None]:
vector_a <- c(10,20,30,40)

vector_a %% 4 == 0            # test if divisible by 4

If we want to get the values that test TRUE for the comparison, we pass the vector of results from the comparison to the `[]` selector

In [None]:
vector_a[c(FALSE, TRUE, FALSE, TRUE)]  # keep only the elements marked TRUE

Thus, to obtain the values that satisfy a comparison, we combine the `comparison` operation with the `[]` selector

In [None]:
vector_a[vector_a %% 4 == 0]  # use the result to select elements that meet this criterion
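
Comparisons can also be combined with Boolean operators before they are passed to the `[]` selector. The following is a minimal sketch; the name `keep` is introduced here only for illustration.

```r
vector_a <- c(10, 20, 30, 40)

keep <- vector_a > 15 & vector_a < 35   # FALSE TRUE TRUE FALSE

vector_a[keep]    # elements that meet both conditions: 20 30
vector_a[!keep]   # elements that do not: 10 40
which(keep)       # positions of the TRUE values: 2 3
```

Storing the comparison in a variable first can make longer selections easier to read, although writing the comparison directly inside `[]` gives the same result.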

### 2. Data Frame

In [None]:
this_is_a_df <- data.frame(id=c(20201,20205,20212,20213,20216),
                           age=c(19,45, 23, 55, 65), 
                           name=c("Alice","Bob","Charlie","David", "Eliza"),
                           risk=c("low", "med", "high", "high","med"))

this_is_a_df

We can select the column containing the values of interest using the `$` selector. This will return a vector that we can compare with a value

- For example, we wish to compare the values in the `age` column to `45` as a cut-off

In [None]:
this_is_a_df$age > 45                  # check if age > 45

To obtain the rows in the data frame that return TRUE for the comparison, we combine the comparison operation with a row selector

- `data_frame[comparison_operation, ]`

In [None]:
this_is_a_df[this_is_a_df$age > 45,]   # use the result to select rows that meet this criterion
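
The row selector also accepts comparisons on more than one column, and the position after the comma can be used to keep only some columns. The sketch below uses a smaller copy of the data frame under the hypothetical name `patients` (the `name` column is left out for brevity).

```r
patients <- data.frame(id   = c(20201, 20205, 20212, 20213, 20216),
                       age  = c(19, 45, 23, 55, 65),
                       risk = c("low", "med", "high", "high", "med"))

# rows where both conditions are TRUE
older_high <- patients[patients$age > 45 & patients$risk == "high", ]
older_high

# the same rows, keeping only the id and age columns
patients[patients$age > 45 & patients$risk == "high", c("id", "age")]
```

Only the patient with id `20213` is both older than 45 and at high risk, so a single row is returned.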

---
## Operations on factors for ordered categories

Factors can be used for comparison of different categories that have an order (e.g. low, med, high risks)

To do this, 2 parameters need to be specified in the `factor` function
- `levels` option with a vector of the categories in the desired order (left to right)
- `ordered` option set to `TRUE`

### 1. Vector

In [None]:
vector_c <- c("low", "med", "high", "high") # vector of strings
vector_c

We can try applying a comparison operator to this vector of characters representing the different categories 

- For example, we will try to find elements that have a risk greater than medium

In [None]:
vector_c > "med" # strings are compared alphabetically, so the result is not meaningful
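
To see why the result is not meaningful, it may help to look at how R orders plain strings. The sketch below uses a separate vector, `risk_strings` (a name introduced only for this example). String ordering can depend on the locale, but for lowercase words like these it is alphabetical.

```r
risk_strings <- c("low", "med", "high", "high")   # plain character vector

risk_strings > "med"         # FALSE FALSE FALSE FALSE
"high" > "med"               # FALSE, because "h" sorts before "m"
sort(unique(risk_strings))   # "high" "low" "med"
```

Alphabetically, `"high"` comes first and `"med"` comes last, which is not the order of the risk categories.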

The comparison above is not meaningful because the vector has not been converted to `ordered factors`, which represent the categories in the proper order (low, med, high)

We can do this by applying the `factor` function to the vector and including 2 parameters
- `levels` option to specify the order of the categories
- `ordered` option set to `TRUE`

In [None]:
# convert to ordered factors for categories
vector_c <- factor(vector_c, 
                   levels = c("low", "med", "high"), 
                   ordered=TRUE)
vector_c

Now that we have `ordered factors`, we can apply the same comparison operation to identify elements in the vector that are greater than medium risk

In [None]:
vector_c > "med"

Combining the comparison operation with the `[]` selector will give us the values

In [None]:
vector_c[vector_c > "med"]
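
One way to check that the order has been set as intended is to look at the levels and their underlying integer codes. This is a small sketch on a separate vector, `risk_levels` (a name introduced only for this example).

```r
risk_levels <- factor(c("low", "med", "high", "high"),
                      levels = c("low", "med", "high"),
                      ordered = TRUE)

levels(risk_levels)       # "low" "med" "high", in the order given to levels
as.integer(risk_levels)   # position of each element within the levels: 1 2 3 3
risk_levels >= "med"      # FALSE TRUE TRUE TRUE
max(risk_levels)          # highest category present in the vector
```

Comparisons on an ordered factor follow the positions in `levels`, not the alphabetical order of the labels.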

### 2. Data Frame

We can use the same concept to select rows in a data frame based on comparisons of categorical data in a column

For example, we would like to select the patients who have `risk` greater than `med`

In [None]:
this_is_a_df <- data.frame(id=c(20201,20205,20212,20213,20216),
                           age=c(19,45, 23, 55, 65), 
                           name=c("Alice","Bob","Charlie","David", "Eliza"),
                           risk=c("low", "med", "high", "high","med"))

this_is_a_df

The `risk` column is of `character` data type and needs to be converted to an `ordered factor` type so that comparisons can be made

- To do this, we use the `factor` function on the `risk` column with `levels` (low, med, high) and `ordered=TRUE` parameters. 
- By assigning the results to the `risk` column, we convert this column from a `character` data type into `ordered factors`

In [None]:
this_is_a_df$risk <- factor(this_is_a_df$risk, 
                            levels = c("low", "med", "high"), 
                            ordered = TRUE)

When we print the data frame, we can see this conversion reflected in the column as `<ord>`

In [None]:
this_is_a_df

We can perform the comparison on the `risk` column to identify `rows` that have risk greater than medium

In [None]:
this_is_a_df$risk > "med"

To obtain the rows in the data frame that return TRUE for the comparison, we combine the comparison operation with a row selector

- `data_frame[comparison_operation, ]`

In [None]:
this_is_a_df[this_is_a_df$risk > "med",] # select rows that meet the risk criterion > "med"

---
## Summary
- Operations can be performed on data in several ways
  - Mathematical (`+,-,/,*,^,%%`)
  - Comparison (`<,<=,==,!=,>=,>`)
  - Boolean (`&,|,!`)
- Operations can be done on single values or multiple values in a vector
- Comparison operations can be combined with selectors `[]` to select elements from a data structure depending on the condition
  - `vector[conditional statement]`
  - `dataframe[conditional statement,]` for selecting rows
- Comparison operations can be used to select a range of categories
  - The categories need to be converted to an ordered factor before comparison operations can be done
  - For example: `data <- factor(data, levels=vector of categories, ordered=TRUE)`
 

---
## Exercise - Operations

In [None]:
this_is_a_df <- data.frame(id=c(20201,20205,20212,20213,20216), 
                           age=c(19,45, 23, 55, 65), 
                           name=c("Alice","Bob","Charlie","David", "Eliza"),
                           risk=c("low", "med", "high", "high","med"))
this_is_a_df

### Part 1

Select patients with high risk, then return a vector of their ages

In [None]:
# start here

In [None]:
# solution

this_is_a_df[this_is_a_df$risk=="high",]$age

### Part 2

Select patients with ages greater than or equal to 55, return the risk as a vector and tabulate their counts

In [None]:
# start here

In [None]:
# solution

table(this_is_a_df[this_is_a_df$age >= 55,]$risk)