In [None]:
options(jupyter.rich_display = FALSE)

Data vectors in R
========== 

* The fundamental data type in R is the _vector_.
* The single-number objects we have seen so far are one-element vectors.

Variables with single values are inconvenient if we want to process data in bulk, e.g.:

In [None]:
height1 <- 1.70
weight1 <- 65
bmi1 <- weight1 / height1^2

height2 <- 1.75
weight2 <- 66
bmi2 <- weight2 / height2^2

Better way: Hold related values in a **vector**.

Creating vectors
=========
The most general way to create data vectors is to use the `c()` function (_concatenate_).

In [None]:
heights <- c(1.70, 1.75, 1.62)
weights <- c(65, 66, 61)

In [None]:
heights

In [None]:
weights

Vectors of consecutive integers can also be created with the _colon operator_ (`:`).

In [None]:
x <- 2:10
x

Extending vectors
==========
The function `c()` can also be used to add new elements to vectors.

Suppose initially we have only two pieces of data:

In [None]:
heights <- c(1.70, 1.75)

Then we get another data point, and we extend the vector.

In [None]:
heights <- c(heights, 1.62)

In [None]:
heights

Modes
=====

* R variable types are called _modes_.
* Modes include: "numeric", "character", "logical", "complex", and so on.
* All elements in a vector must be of the same mode.

In [None]:
mode(2)
mode("abc")
mode(TRUE)
mode(2+4i)

In [None]:
typeof("abc")
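
The rule that all elements share one mode has a practical consequence: when values of different modes are combined with `c()`, R converts them to a common mode without warning. The sketch below illustrates this, and also shows that `typeof()` reports the storage type, which is finer-grained than `mode()`. The variable names are made up for illustration.

```r
# Mixing modes: everything is coerced to the most general mode present.
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
mode(mixed)    # "character"

# Logical values become numbers when combined with numbers.
num_logical <- c(10, TRUE, FALSE)
num_logical    # 10 1 0

# mode() groups integers and doubles together; typeof() tells them apart.
mode(2L)       # "numeric"
typeof(2L)     # "integer"
typeof(2)      # "double"
```

If a numeric vector unexpectedly prints with quotes, a stray character value in the input is a likely cause.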

# Vector arithmetic

If you add two vectors with the same number of elements, they are added elementwise.

In [None]:
c(1,4,9) + c(2,16,5)

Same applies to all basic operations:

In [None]:
c(1,4,9) * c(2,16,5)

In [None]:
c(1,4,9) / c(2,16,5)

In [None]:
c(1,4,9) > c(2,16,5)

If an arithmetic or logical operation involves a vector and a single number, the single number is _recycled_: it is paired with every element of the vector.

In [None]:
c(1,4,9) + 5  # converted to: c(1,4,9) + c(5,5,5)

In [None]:
c(1,4,9) < 5  # converted to: c(1,4,9) < c(5,5,5)

In [None]:
c(1,4,9)^2   # converted to: c(1,4,9) ^ c(2,2,2)
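
Elementwise arithmetic and recycling together solve the bulk-processing problem from the beginning of this section. A minimal sketch, using the same heights and weights as before under new, illustrative variable names:

```r
heights_m  <- c(1.70, 1.75, 1.62)   # metres
weights_kg <- c(65, 66, 61)         # kilograms

# The exponent 2 is recycled over heights_m, then the division is elementwise.
bmi <- weights_kg / heights_m^2
round(bmi, 1)   # 22.5 21.6 23.2
```

One expression now replaces a separate `bmi1`, `bmi2`, ... line for each person.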

## Pause to think

What is the output of the operation `2 * c(1,2,3) + 3`?

* `5 7 9`
* `2 4 6 3`
* `4 5 6 4 5 6`
* `1 2 3 1 2 3 3`

# Vectorized functions 

## sum(), cumsum()
`sum()` adds up all elements in a vector. `cumsum()` returns the running (cumulative) totals.

In [None]:
sum(c(1,4,9))

In [None]:
sum(1:10)

In [None]:
cumsum(1:10)

## prod(), cumprod()
`prod()` multiplies all elements in a vector. `cumprod()` returns the running (cumulative) products.

In [None]:
prod(c(1,4,9))

In [None]:
prod(1:5)  # 5!

In [None]:
cumprod(1:5)
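
The cumulative versions return a vector of the same length as the input, and their last element equals the overall total or product. A small check, with an illustrative vector `x`:

```r
x <- c(1, 4, 9)

cumsum(x)    # 1 5 14
cumprod(x)   # 1 4 36

# The last cumulative value matches the overall result.
tail(cumsum(x), 1) == sum(x)     # TRUE
tail(cumprod(x), 1) == prod(x)   # TRUE
```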

## Pause to think

Which of the following commands can be used to calculate $\sum_{i=1}^{10} i^2$?

* `sum(1:10^2)`
* `sum(1:10)^2`
* `sum((1:10)^2)`
* `sum(1^2:10^2)`

# Mathematical functions
Familiar mathematical functions are applied to vectors elementwise.

In [None]:
sqrt(c(4,9,16))

In [None]:
sin(c(0,pi/4,pi/2,3*pi/4,pi))  # or: sin( 0:4*pi/4 )

In [None]:
exp(1:5)

Missing data
========
* Many data sets have missing data, i.e., observations for which some values are not available.
* In R, missing values are denoted with `NA`.
* Any vector can contain missing values.

In [None]:
weights <- c(65, NA, 61)
people <- c("Can","Cem",NA)  # named `people` so that the built-in names() function is not masked
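
Missing values propagate through most computations, so it helps to know how to detect and skip them. The following is a minimal sketch using the base functions `is.na()` and the `na.rm` argument of `sum()` and `mean()`; the vector `w` is illustrative.

```r
w <- c(65, NA, 61)

is.na(w)                 # FALSE TRUE FALSE
w + 1                    # 66 NA 62
sum(w)                   # NA, because one value is unknown
sum(w, na.rm = TRUE)     # 126
mean(w, na.rm = TRUE)    # 63
```

Whether dropping missing values is appropriate depends on the analysis; `na.rm = TRUE` only ignores them for that one call.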

Vector element names
===========
For readability, we can assign name labels to the elements of a data vector.

In [None]:
heights <- c(Can=1.70, Cem=1.75, Hande=1.62)
heights

In [None]:
weights <- c(Can=65, Cem=66, Hande=61)
weights

We can retrieve these names with the `names()` function.

In [None]:
names(heights)

We can assign names to the elements of a vector that already exists.

In [None]:
heights <- c(1.70, 1.75, 1.62)
names(heights) <- c("Can","Cem","Hande")
heights

If for some reason we want to remove the names, we use the `unname()` function.

In [None]:
unname(heights)

The original vector is not changed with this function call, because we did not assign the result to `heights`.

In [None]:
heights

Vector indexing
=========
We can access a single element of a vector by providing the index of the element in square brackets.

In [None]:
heights

In [None]:
heights[1]  # first element

In [None]:
heights[3] # third element

We can select a slice of the vector by providing a range inside brackets.

In [None]:
heights[1:2]  # select from element 1 to element 2, inclusive.

We can also give a vector consisting of element indices.

In [None]:
heights[c(1,3)]  # select elements 1 and 3.

The indices do not have to be in order:

In [None]:
heights[c(2,1,3)]

We can select the same element more than once.

In [None]:
heights[c(1,1,3,2,3)]

We can provide a Boolean (true/false) vector for indexing. This will select only elements with corresponding `TRUE` values.

In [None]:
heights

In [None]:
heights[c(T,F,F)]  # T is a shorthand for TRUE, F is for FALSE.

We can **exclude** elements using negative indices.

In [None]:
heights[-1]  # exclude first element.

In [None]:
heights[c(-1,-3)]  # exclude 1st and 3rd elements

## Pause to think

Suppose we define a four-element vector

`v <- c(3,6,2,-1)`.

Which of the following CANNOT be used to select the second and third elements of this vector?

* `v[2:3]`
* `v[c(2,3)]`
* `v[c(6,2)]`
* `v[c(F,T,T,F)]`
* `v[c(-1,-4)]`

Using names to select elements
=======================
If the elements are given names consisting of strings, we can use these names in brackets instead of indices.

In [None]:
heights["Can"]

In [None]:
heights[c("Can","Can","Hande","Cem","Hande")]

Modify element values in a vector
=================

In [None]:
heights

In [None]:
heights[1] <- 1.72

In [None]:
heights

In [None]:
heights[1] <- 1.70

Insert values into an existing vector
============
A vector's size is fixed when it is created, and its elements are stored contiguously (side by side) in memory. Therefore it is not really possible to insert an element into an existing vector or remove one from it. However, we can build a new vector that contains the extra element and assign it to the same name.

In [None]:
heights

In [None]:
heights <- c(heights[1:2], Lale=1.76, heights[3])

In [None]:
heights
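
Base R also provides `append()`, which builds the same kind of new vector in one call. Its `after` argument gives the position after which the new values are placed. A short sketch on an illustrative copy `h`, so the vectors used in this section are left alone:

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)

# Place the new element after position 2; h itself is not changed.
h_new <- append(h, c(Lale = 1.76), after = 2)
h_new
names(h_new)   # "Can" "Cem" "Lale" "Hande"
length(h)      # 3
```

Like `c()`, `append()` returns a new vector, so the result has to be assigned to keep it.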

Delete elements from vector
==========
Again, we cannot directly remove an element from an existing vector, but we can create a new vector without the element we want to delete and assign it to the same name.

In [None]:
heights

In [None]:
heights <- heights[-3]  # exclude element 3

In [None]:
heights
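
Negative indices work with positions only, not with names. To drop an element by its name, one option is to index with a Boolean vector built from `names()`. A minimal sketch on an illustrative vector `h`:

```r
h <- c(Can = 1.70, Cem = 1.75, Lale = 1.76, Hande = 1.62)

# Keep every element whose name is not "Lale".
h_without <- h[names(h) != "Lale"]
h_without

# Several names can be dropped at once with %in%.
h_rest <- h[!(names(h) %in% c("Can", "Hande"))]
names(h_rest)   # "Cem" "Lale"
```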

## Pause to think

Suppose we define a vector with

`v <- c(3,4,5)`

What is the output of the following commands?

    v <- c(5, v, 1:2)
    v <- v[-2]
    v[2:4]
    
* `2 3 4`
* `5 3 4 5 3 4`
* `4 5 3`
* `4 5 1`

Getting the length of a vector
==========
We can get the number of elements in a vector using the `length()` function.

In [None]:
length(heights)

In [None]:
length(10:17)

Vector filtering
===========

* Apply a comparison (e.g., greater than, less than, ...) to a vector, and it is evaluated for each element.
* The result is a Boolean vector holding the outcome for each element.

In [None]:
heights

In [None]:
heights > 1.65

Using this Boolean vector, we can select data points satisfying the condition.

In [None]:
tall_people <- heights>1.65
heights[tall_people]

Obviously, this can be done in a single line, too.

In [None]:
heights[heights>1.65]

One can also filter a vector according to another vector's values.

In [None]:
heights

In [None]:
weights

In [None]:
weights[ heights > 1.65 ]  # weights of people who are taller than 1.65
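
Conditions can be combined elementwise with `&` (and), `|` (or), and `!` (not) before they are used as a filter. A small sketch with illustrative copies `h` and `w` of the heights and weights:

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)
w <- c(Can = 65, Cem = 66, Hande = 61)

# Taller than 1.65 AND lighter than 66 kg.
both <- w[h > 1.65 & w < 66]
both                 # Can: 65

# Shorter than 1.65 OR heavier than 65 kg.
either <- w[h < 1.65 | w > 65]
names(either)        # "Cem" "Hande"

# Negation: everyone who is NOT taller than 1.65.
w[!(h > 1.65)]       # Hande: 61
```

Note the single `&` and `|`: these operate on whole vectors, while the doubled forms `&&` and `||` are meant for single values.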

## Pause to think

Given the vectors with named values:

    ages <- c(Ali=18, Hasan=21, Fatma=18, Hande=22, Cem=21)
    weights <- c(Ali=75, Hasan=72, Fatma=60, Hande=56, Cem=67)

which of the following commands prints the weights of people who are 18 years old?

* `weights[ages==18]`
* `ages[weights]==18`
* `weights[names(ages==18)]`
* `names(weights[ages==18])`

Modify a vector by filtering
=========
* We can use filtering to selectively change only the elements that satisfy a condition.
* **Example**: For people who weigh more than 65 kg, decrease the weight by 1 kg.

In [None]:
weights

In [None]:
weights > 65

In [None]:
weights[weights > 65]

In [None]:
weights[weights > 65] <- weights[weights > 65] - 1
weights

In [None]:
weights["Cem"] <- 66  # restore the original value; the name must be quoted

Get indices of elements that satisfy a condition
===========
The `which()` function takes a Boolean vector and returns the indices (and names, if available) of its `TRUE` elements.

In [None]:
heights

In [None]:
heights > 1.65

In [None]:
which(heights > 1.65)
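
The indices returned by `which()` can be used to index any parallel vector. The related base functions `which.max()` and `which.min()` return the position of the largest and smallest value. A minimal sketch with illustrative vectors `h` and `w`:

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)
w <- c(Can = 65, Cem = 66, Hande = 61)

idx <- which(h > 1.65)
idx            # positions 1 and 2, labelled Can and Cem
w[idx]         # same result as w[h > 1.65]

which.max(h)   # position 2, labelled Cem
which.min(h)   # position 3, labelled Hande
```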

Using all() and any()
==========
* We use the `all()` function to check if **all** elements in a vector are `TRUE`.
* We use the `any()` function to check if **at least one** of the elements in a vector is `TRUE`.

In [None]:
heights

In [None]:
heights > 1.70

In [None]:
all(heights > 1.60)

all(heights > 1.70)

any(heights > 1.70)

## Pause to think

Suppose a vector named `ages` holds the ages of a group who want to enter a museum. You want to make sure that there is at least one grownup among them. Which command do you use?

* `any(ages > 18)`
* `all(ages > 18)`
* `any(ages < 18)`
* `all(ages < 18)`

## Pause to think

Suppose a vector named `ages` holds the ages of a group who want to enter a bar. You want to make sure that everybody is of proper age to drink. Which command do you use?

* `any(ages > 18)`
* `all(ages > 18)`
* `any(ages < 18)`
* `all(ages < 18)`

# Generating vectors with repeated elements
The `rep()` function can be used to replicate values or vectors a specified number of times.

In [None]:
rep(3,10)

In [None]:
rep("abc",5)

In [None]:
rep(c(1,2,3),5)

In [None]:
rep(c(1,2,3),length.out=10)
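
Besides repeating a whole vector, `rep()` can repeat each element in place through its `each` argument, and `times` accepts a vector of per-element counts. A short sketch with illustrative result names:

```r
r_each  <- rep(c(1, 2, 3), each = 2)
r_each    # 1 1 2 2 3 3

r_times <- rep(c("a", "b"), times = c(3, 1))
r_times   # "a" "a" "a" "b"

# each is applied first, then the result is repeated times times.
r_both  <- rep(c(1, 2), each = 2, times = 2)
r_both    # 1 1 2 2 1 1 2 2
```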

Generating sequences with seq()
==========
The `seq()` function generates a vector of numbers in arithmetic progression. It is a generalization of the colon (`:`) operator.

In [None]:
seq(4,9)  # same as 4:9

In [None]:
seq(from=12, to=30, by=3)

In [None]:
seq(from=1.1, to=2, length.out=10)
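
Two details of `seq()` are easy to miss: with `by`, the sequence stops at the last value that does not go past `to`, and the helper `seq_len()` is a safer way to write `1:n` when `n` might be zero. A minimal sketch with illustrative names:

```r
s1 <- seq(from = 1, to = 10, by = 4)
s1                   # 1 5 9 (the next value, 13, would pass 10)

s2 <- seq(from = 0, to = 1, by = 0.25)
s2                   # 0.00 0.25 0.50 0.75 1.00

seq_len(5)           # 1 2 3 4 5
length(seq_len(0))   # 0
length(1:0)          # 2, because 1:0 counts down: 1 0
```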

Sorting a vector
=========

In [None]:
sort(heights)

In [None]:
sort(weights, decreasing = TRUE)

* Often we need to sort a vector according to the values of another vector.
* First we compute an _ordering_.

In [None]:
heights

In [None]:
order(heights)

Then we use this ordering with the other vector:

In [None]:
weights[order(heights)]  # return the weights of people ordered by their heights.
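
It may help to see that `order()` returns positions rather than values: indexing a vector by its own ordering reproduces what `sort()` gives. The sketch below uses illustrative copies `h` and `w`:

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)
w <- c(Can = 65, Cem = 66, Hande = 61)

ord <- order(h)
ord                      # 3 1 2: Hande is shortest, then Can, then Cem

# Indexing h by its own ordering gives the sorted values.
all(h[ord] == sort(h))   # TRUE

# The same ordering rearranges any parallel vector, including the names.
names(w)[ord]            # "Hande" "Can" "Cem"
```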

# Exercises

(1) Create and store a sequence of values from 5 to −11 that progresses in steps of 0.3.

-----
(2) Create and store a 20-element vector that contains, in any configuration, the following:

(a) A sequence of integers from 6 to 12 (inclusive)
    
(b) A threefold repetition of the value 5.3
    
(c) The number −3
    
(d) A sequence of nine values starting at 102 and ending at the number that is the total length of the vector created in (c).

-----
(3) A set of temperature measurements is given on the Fahrenheit scale as follows:

    temperatures <- c(87, 89, 101, 91, 86, 71, 76)

Write an R expression that returns a vector of corresponding Celsius values.

--------
(4) Consider the following data

|Country|Area (km²)|Population|
|------|------|------|
|Russia|17,098,242|142,257,519|
|United States|9,833,517|326,625,791|
|China|9,596,960|1,379,302,771|
|Brazil|8,515,770|207,353,391|
|Australia|7,741,220|23,232,413|
|India|3,287,263|1,281,935,911|
|Turkey|783,562|80,845,215|
|France|643,801|67,106,161|
|Japan|377,915|126,451,398|
|United Kingdom|243,610|65,648,100|

(a) Create two vectors `area` and `population` that hold the data in the respective columns. Label the elements in each vector with the country name.

(b) Create a new vector called `density` that holds the population density of the countries.

(c) Print the names of countries sorted by population density, in descending order (from highest to lowest).