In [1]:
options(jupyter.rich_display = FALSE)

Data vectors in R
==========

* The fundamental data type in R is the _vector_.
* The single-number objects we have seen so far are one-element vectors.
* Vectors allow us to deal with data sets directly, instead of using separate variables for each instance.
    
        height1 <- 1.70
        weight1 <- 65
        bmi1 <- weight1 / height1^2
        height2 <- 1.75
        weight2 <- 66
        bmi2 <- weight2 / height2^2
* We'd rather create data vectors to hold the height and weight values.

Creating vectors
=========
The most general way to create data vectors is to use the `c()` function (_concatenate_).

In [2]:
heights <- c(1.70, 1.75, 1.62)
weights <- c(65, 66, 61)

In [3]:
heights

[1] 1.70 1.75 1.62

In [4]:
weights

[1] 65 66 61

Vectors can also be created with the _colon operator_ (`:`), which generates a sequence in steps of 1.

In [5]:
x <- 2:10
x

[1]  2  3  4  5  6  7  8  9 10
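With the data held in vectors, the per-person BMI computation from the opening example can be written once for the whole data set. The sketch below is only an illustration: the names `h`, `w`, and `bmi` are hypothetical, and the heights are assumed to be in metres and the weights in kilograms.

```r
# Hypothetical names; the values are the ones used in this section.
h <- c(1.70, 1.75, 1.62)   # heights (assumed metres)
w <- c(65, 66, 61)         # weights (assumed kilograms)

# Arithmetic operators work element by element, so a single
# expression computes the BMI of every person at once.
bmi <- w / h^2
round(bmi, 2)
```

Adding a fourth person then only requires extending `h` and `w`; the BMI expression itself stays unchanged.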

Extending vectors
==========
The function `c()` can also be used to add new elements to vectors.

First, suppose we have height data from only two people:

In [6]:
heights <- c(1.70, 1.75)

Now extend the vector with an additional data point.

In [7]:
heights <- c(heights, 1.62)

In [8]:
heights

[1] 1.70 1.75 1.62

Modes
=====

* R variable types are called _modes_.
* Modes include: "numeric", "character", "logical", "complex", and so on.
* All elements in a vector must be of the same mode.

In [9]:
mode(2)
mode("abc")
mode(TRUE)
mode(2+4i)

[1] "numeric"

[1] "character"

[1] "logical"

[1] "complex"
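Because all elements must share one mode, `c()` does not reject mixed input; it silently converts (coerces) every element to the most general mode present. A small sketch, with hypothetical variable names:

```r
# Logical values become numbers when combined with numbers.
mixed_num <- c(1, TRUE, FALSE)
mode(mixed_num)   # "numeric"
mixed_num         # 1 1 0

# A single string turns every element into a string.
mixed_chr <- c(1, "a", TRUE)
mode(mixed_chr)   # "character"
mixed_chr         # "1" "a" "TRUE"
```

This is worth keeping in mind when a numeric column unexpectedly turns out to be of mode "character": one stray string is enough.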

Missing data
========
* Data sets often contain missing data, i.e., observations for which the values are missing.
* In R, missing values are denoted with `NA`.
* Any vector can contain missing values.

In [10]:
weights <- c(65, NA, 61)
names <- c("Can","Cem",NA)
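Missing values propagate through most computations, so it helps to know how to detect and skip them. The following sketch uses a hypothetical variable `w` with the same values as the `weights` vector above.

```r
w <- c(65, NA, 61)

# is.na() marks which elements are missing.
is.na(w)                # FALSE TRUE FALSE

# Arithmetic involving NA yields NA ...
mean(w)                 # NA

# ... unless the function is told to drop missing values first.
mean(w, na.rm = TRUE)   # 63
```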

Vector element names
===========
For readability, we can assign name labels to the elements of a data vector.

In [11]:
heights <- c(Can=1.70, Cem=1.75, Hande=1.62)
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [12]:
weights <- c(Can=65, Cem=66, Hande=61)
weights

  Can   Cem Hande 
   65    66    61 

We can retrieve these names with the `names()` function.

In [13]:
names(heights)

[1] "Can"   "Cem"   "Hande"

We can assign names to the elements of a vector that already exists.

In [14]:
heights <- c(1.70, 1.75, 1.62)
heights

[1] 1.70 1.75 1.62

In [15]:
names(heights) <- c("Can","Cem","Hande")
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

If for some reason we want to remove the names, we use the `unname()` function.

In [16]:
unname(heights)

[1] 1.70 1.75 1.62

The original vector is not changed with this function call, because we did not assign the result to `heights`.

In [17]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

Vector indexing
=========
We can access a single element of a vector by providing the index of the element in square brackets.

In [18]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [19]:
heights[1]  # first element

Can 
1.7 

In [20]:
heights[3] # third element

Hande 
 1.62 

We can select a slice of the vector by providing a range inside brackets.

In [21]:
heights[1:2]  # select from element 1 to element 2, inclusive.

 Can  Cem 
1.70 1.75 

We can also give a vector consisting of element indices.

In [22]:
heights[c(1,3)]  # select elements 1 and 3.

  Can Hande 
 1.70  1.62 

The indices do not have to be in order:

In [23]:
heights[c(2,1,3)]

  Cem   Can Hande 
 1.75  1.70  1.62 

We can provide a Boolean (true/false) vector for indexing. This selects only the elements whose corresponding value is `TRUE`. A Boolean vector shorter than the data vector is recycled (repeated) to the full length.

In [24]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [25]:
heights[c(T,F)]  # T is shorthand for TRUE, F for FALSE; c(T,F) is recycled to c(T,F,T).

  Can Hande 
 1.70  1.62 

We can select the same element more than once.

In [26]:
heights[c(1,1,3,2,3)]

  Can   Can Hande   Cem Hande 
 1.70  1.70  1.62  1.75  1.62 

We can **exclude** elements using negative indices.

In [27]:
heights[-1]  # exclude first element.

  Cem Hande 
 1.75  1.62 

In [28]:
heights[c(-1,-3)]  # exclude 1st and 3rd elements

 Cem 
1.75 
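Indexing in R is forgiving at the edges, which can hide mistakes. The sketch below, using a hypothetical copy `h` of the heights vector, shows what happens with indices that do not correspond to any element.

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)

past_end <- h[5]    # index beyond the last element: NA, not an error
none     <- h[0]    # index 0 selects nothing: a vector of length 0
all_kept <- h[-5]   # excluding a position that does not exist drops nothing

is.na(past_end)     # TRUE
length(none)        # 0
length(all_kept)    # 3
```

Positive and negative indices cannot be mixed in one selection; an expression such as `h[c(-1, 2)]` raises an error.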

Using names to select elements
=======================
If the elements have names, we can use these names (as strings) in brackets instead of indices.

In [29]:
heights["Can"]

Can 
1.7 

In [30]:
heights[c("Can","Can","Hande","Cem","Hande")]

  Can   Can Hande   Cem Hande 
 1.70  1.70  1.62  1.75  1.62 

Modify element values in a vector
=================

In [31]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [32]:
heights[1] <- 1.72

In [33]:
heights

  Can   Cem Hande 
 1.72  1.75  1.62 

In [34]:
heights[1] <- 1.70

Insert values into an existing vector
============
A vector's size is fixed when it is created, and its elements are stored contiguously (side by side) in memory. Therefore we cannot insert an element into an existing vector in place. Instead, we build a new vector that contains the extra element and reassign it to the same name.

In [35]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [36]:
heights <- c(heights[1:2], Lale=1.76, heights[3])

In [37]:
heights

  Can   Cem  Lale Hande 
 1.70  1.75  1.76  1.62 
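Base R also provides `append()`, whose `after` argument gives the position after which the new values are placed. The sketch below is an alternative way to get the same result as the slicing approach above; the names `h` and `h2` are hypothetical.

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)

# Place the new element after position 2. Like c(), append() returns
# a new vector and leaves its input unchanged.
h2 <- append(h, c(Lale = 1.76), after = 2)

names(h2)    # "Can" "Cem" "Lale" "Hande"
length(h)    # 3
```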

Delete elements from vector
==========
Again, we cannot remove an element from an existing vector in place, but we can create a new vector without the element we want to delete and reassign it to the same name.

In [38]:
heights

  Can   Cem  Lale Hande 
 1.70  1.75  1.76  1.62 

In [39]:
heights <- heights[-3]  # exclude element 3

In [40]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

Getting the length of a vector
==========
We can get the number of elements in a vector using the `length()` function.

In [41]:
10:17

[1] 10 11 12 13 14 15 16 17

In [42]:
length(10:17)

[1] 8

Data vector filtering
===========

* The idea is to apply a Boolean operation (e.g., greater than, less than, ...) to each element of the vector.
* The result is a Boolean vector holding the outcome for each element.

In [43]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [44]:
heights > 1.65

  Can   Cem Hande 
 TRUE  TRUE FALSE 

Using this Boolean vector, we can select data points satisfying the condition.

In [45]:
tall_people <- heights>1.65
heights[tall_people]

 Can  Cem 
1.70 1.75 

This can also be done in a single line.

In [46]:
heights[heights>1.65]

 Can  Cem 
1.70 1.75 

One can also filter a vector according to another vector's values.

In [47]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [48]:
weights

  Can   Cem Hande 
   65    66    61 

In [49]:
weights[ heights > 1.65 ]  # weights of people who are taller than 1.65

Can Cem 
 65  66 
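Conditions can be combined with the element-wise operators `&` (and), `|` (or), and `!` (not) before they are used as a filter. A minimal sketch, using hypothetical copies `h` and `w` of the two vectors:

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)
w <- c(Can = 65, Cem = 66, Hande = 61)

# Taller than 1.65 AND lighter than 66 kg.
tall_and_light <- h > 1.65 & w < 66
names(h)[tall_and_light]    # "Can"

# Shorter than 1.65 OR heavier than 65 kg.
short_or_heavy <- h < 1.65 | w > 65
names(h)[short_or_heavy]    # "Cem" "Hande"

# Negation: everyone who is NOT taller than 1.65.
names(h)[!(h > 1.65)]       # "Hande"
```

Note the single `&` and `|`: the doubled forms `&&` and `||` are meant for single values in `if` conditions, not for filtering whole vectors.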

Modify a vector by filtering
=========
* We can use filtering to selectively change only the elements that satisfy a condition.
* **Example**: For people who weigh more than 65 kg, decrease the weight by 1 kg.

In [50]:
weights

  Can   Cem Hande 
   65    66    61 

In [51]:
weights[weights > 65] <- weights[weights > 65] - 1
weights

  Can   Cem Hande 
   65    65    61 

In [52]:
weights["Cem"] <- 66

Get indices of elements that satisfy a condition
===========
The `which()` function takes a Boolean vector and returns the indices (and names, if available) of its `TRUE` elements.

In [53]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [54]:
which(heights > 1.65)

Can Cem 
  1   2 
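Two related base R functions, `which.max()` and `which.min()`, return the position of the largest and smallest element directly. The sketch below uses a hypothetical copy `h` of the heights vector.

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)

tall_idx <- which(h > 1.65)
unname(tall_idx)   # 1 2   (positions only)
names(tall_idx)    # "Can" "Cem"   (labels only)

which.max(h)       # position of the tallest person: Cem, index 2
which.min(h)       # position of the shortest person: Hande, index 3
```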

Using all() and any()
==========
* We use the `all()` function to check if **all** elements in a vector are `TRUE`.
* We use the `any()` function to check if **at least one** of the elements in a vector is `TRUE`.

In [55]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [56]:
all(heights > 1.60)

all(heights > 1.70)

any(heights > 1.70)

[1] TRUE

[1] FALSE

[1] TRUE
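When the vector contains missing values, `all()` and `any()` may themselves return `NA`, because the answer can depend on the unknown element. The following sketch uses a hypothetical vector `w` with one missing weight.

```r
w <- c(65, NA, 61)

all(w > 60)                 # NA: the missing value might be below 60
any(w > 64)                 # TRUE: one known element already satisfies it
any(w > 70)                 # NA: no known element does, the missing one might

# na.rm = TRUE ignores the missing values.
all(w > 60, na.rm = TRUE)   # TRUE
any(w > 70, na.rm = TRUE)   # FALSE
```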

Generating sequences with seq()
==========
The `seq()` function generates a vector of numbers in arithmetic progression. It is a generalization of the colon (`:`) operator.

In [57]:
seq(4,9)  # same as 4:9

[1] 4 5 6 7 8 9

In [58]:
seq(from=12, to=30, by=3)

[1] 12 15 18 21 24 27 30

In [59]:
seq(from=1.1, to=2, length.out=10)

 [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
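A few further variations may be useful: a negative `by` counts downwards, and `seq_len()` is a safer replacement for `1:n` when `n` can be zero. A short sketch (the variable names are hypothetical):

```r
# Descending sequence with a negative step.
desc <- seq(10, 1, by = -3)
desc                 # 10 7 4 1

# seq_len(n) gives 1, 2, ..., n.
seq_len(5)           # 1 2 3 4 5

# With n = 0 the colon operator counts DOWN from 1 to 0,
# while seq_len() returns an empty vector.
n <- 0
colon_result  <- 1:n          # 1 0
seqlen_result <- seq_len(n)   # integer(0)
length(colon_result)          # 2
length(seqlen_result)         # 0
```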

Sorting a vector
=========

In [60]:
sort(heights)

Hande   Can   Cem 
 1.62  1.70  1.75 

In [61]:
sort(weights, decreasing = TRUE)

  Cem   Can Hande 
   66    65    61 

* Often we need to sort a vector according to the values of another vector.
* First we compute an _ordering_.

In [62]:
heights

  Can   Cem Hande 
 1.70  1.75  1.62 

In [63]:
order(heights)

[1] 3 1 2

Then we use this ordering with the other vector:

In [64]:
weights[order(heights)]  # return the weights of people ordered by their heights.

Hande   Can   Cem 
   61    65    66 
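Like `sort()`, `order()` accepts `decreasing = TRUE`. The sketch below lists the weights from the tallest person to the shortest, using hypothetical copies `h` and `w` of the two vectors.

```r
h <- c(Can = 1.70, Cem = 1.75, Hande = 1.62)
w <- c(Can = 65, Cem = 66, Hande = 61)

# Positions of the heights from largest to smallest.
ord <- order(h, decreasing = TRUE)
ord          # 2 1 3

# Reuse the same ordering on the other vector.
w[ord]       # Cem 66, Can 65, Hande 61
```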