# Manipulating Vectors

Now we come to one of the most important manipulations you'll need to know how to do with vectors: subsetting! 

Extracting a subset of elements from a vector is an extremely important task, not least because it generalizes nicely to datasets (which are at the heart of data science). This process --- whether applied to a vector or a dataset --- is often referred to as "taking a subset", "subsetting", or "filtering". If there is one skill you need to master as quickly as possible, it's this.

Subsetting can be accomplished in several ways, but we'll focus on the three most powerful:

- By index
- With logical vectors (remember I said logicals would be important? :))
- By name


## Subsetting By Index

As you've probably already realized, vectors don't just contain a jumble of data -- they also have a concept of "order". When I create a vector with `c(42, 47, -1)`, I have in mind that 42 is the first entry, 47 is the second, and -1 is the third. And we can use that concept of order to subset vectors by passing the index (order number) of an entry we want to our vector in square brackets:

In [1]:
a <- c(42, 47, -1)
a[2]

Note the use of brackets, `[]` --- this is common when filtering, and we'll use it a lot!

But of course, because everything in R is a vector, if I can pass a single index, then I can pass any other numeric vector of indices, either directly:

In [2]:
a[c(1, 3)]

Or as a variable:

In [3]:
subset <- c(1, 3)
a[subset]

Also, you don't have to subset entries in order! If you pass indices out of order, you'll get a vector with a new order!

In [4]:
a[c(3, 1, 2)]
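
For reference, here is a small recap of what the index-based calls above return. The comments show the values R gives for this particular vector.

```r
a <- c(42, 47, -1)

a[2]           # 47: the second entry
a[c(1, 3)]     # 42 -1: the first and third entries
a[c(3, 1, 2)]  # -1 42 47: the same entries in a new order
```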

## Subsetting with Logicals

Subsetting with logicals is a little hard to explain, so instead let's jump right into an example. 

Suppose we have a character vector with only two elements ("apple" and "banana"). Subsetting it to "apple" could be done by passing a logical vector as follows:

In [5]:
fruits <- c("apple", "banana")
fruits[c(TRUE, FALSE)]

Within these brackets is a vector with the same number of logical elements as there are elements in the vector you want to subset. Elements across the two vectors are matched by order: elements that match with `TRUE` are kept while elements that match with `FALSE` are dropped.
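
To make the matching concrete, here is a minimal sketch. The variable names `keep` and `is_apple` are my own, chosen just for illustration. It shows that a comparison such as `fruits == "apple"` produces exactly the kind of logical vector we typed by hand above.

```r
fruits <- c("apple", "banana")

# One logical value per element, matched by position
keep <- c(TRUE, FALSE)
fruits[keep]        # "apple"

# A comparison builds the same logical vector for us
is_apple <- fruits == "apple"
is_apple            # TRUE FALSE
fruits[is_apple]    # "apple"
```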

This process becomes extremely useful when the logical vector is built from one or more conditions joined by a *logical operation*. For example, you can use:

- the logical "and" (written `&` in R) to say "only be true if both conditions are true", and
- the logical "or" (written `|`) to say "be true if at least one of these conditions is true".
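
As a quick sketch of how these two operators behave, both work element by element on a pair of logical vectors. The vectors `x` and `y` below are made up purely for illustration.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

x & y   # TRUE FALSE FALSE FALSE: both must be TRUE
x | y   # TRUE TRUE TRUE FALSE: at least one must be TRUE
```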

For example, using a logical operation we can filter a large vector of oranges, apples and bananas:

In [6]:
# Create a vector with 30 fruits 
fruits <- rep(c("orange", "apple", "banana"), 10)
fruits 


In [7]:
# Create three equivalent logical vectors for dropping bananas

orange_or_apple <- fruits == "orange" | fruits == "apple"  # TRUE if orange or apple
not_banana <- fruits != "banana"                           # != means TRUE if not equal
orange_or_apple2 <- fruits %in% c("orange", "apple")       # TRUE if in the given set

# Carry out the subset
fruits[orange_or_apple]

In [8]:
fruits[not_banana]

In [9]:
fruits[orange_or_apple2]

We applied the same logic as above: we have a vector (`fruits`) that we want to subset. We do so using one of three logical vectors (`orange_or_apple`, `not_banana`, or `orange_or_apple2`), where elements that match with `TRUE` are kept. All three select the same elements here. The only difference from before is that we create each logical vector with a logical operation. The logical operators (e.g., `!=`, `|`) used here are discussed in the link above, with the exception of `%in%`.

<div class="general-note">

<strong> General note about `%in%`: </strong> This operator is
extremely useful as an alternative for repeated "or" (`|`) statements. For example, say you have a vector with 10 types of fruits and you want to keep elements that are equal to "orange", "apple", "mango",
"mandarin", or "kiwi". You could accomplish this by creating a logical vector like so: `lv <- fruits == "orange" | fruits == "apple" | fruits == "mango" | fruits == "mandarin" | fruits == "kiwi"`.  

<br> What a nightmarishly long statement compared to the `%in%` option, which accomplishes exactly the same thing: `lv <- fruits %in% c("orange", "apple", "mango", "mandarin", "kiwi")`.

</div>
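
If you'd like to convince yourself that the two spellings really are equivalent, a minimal check along these lines should do it. The names `with_or` and `with_in` are just illustrative.

```r
fruits <- rep(c("orange", "apple", "banana"), 10)

with_or <- fruits == "orange" | fruits == "apple"
with_in <- fruits %in% c("orange", "apple")

identical(with_or, with_in)   # TRUE
length(fruits[with_in])       # 20: ten oranges and ten apples
```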

Of course, subsetting using logicals can also be done on numeric vectors.

Here are a few examples:

In [10]:
# Create a numeric vector
numbers <- seq(0, 100, by = 10)
numbers


In [11]:
# Illustrate three different filters
numbers[numbers <= 50 & numbers != 30]

In [12]:
numbers[numbers == 0 | numbers == 100]

In [13]:
numbers[numbers > 100] #returns an empty vector
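
It can help to look at the condition on its own before using it inside the brackets. The sketch below prints the intermediate logical vector for the first filter, and checks that the last filter really keeps nothing.

```r
numbers <- seq(0, 100, by = 10)

# The condition alone is a logical vector, one value per element
numbers <= 50 & numbers != 30
# TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE

numbers[numbers <= 50 & numbers != 30]   # 0 10 20 40 50

# No element is greater than 100, so nothing is kept
length(numbers[numbers > 100])           # 0
```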

## Subsetting by Name

In R, it's possible to assign names to the entries of a vector, then use those names to subset entries. To illustrate, let me tell you about the zoo I wish I owned. It's a small zoo, but a happy one, and here is our imagined attendance last week:

| Day of Week | Attendees  |
|-------------|------------|
| Monday      | 132 people |
| Tuesday     | 94 people  |
| Wednesday   | 112 people |
| Thursday    | 84 people  |
| Friday      | 254 people |
| Saturday    | 322 people |
| Sunday      | 472 people |

I can now represent this as a vector by putting attendance in the vector, and then labeling entries by the day of the week:

In [14]:
attendance <- c(132, 94, 112, 84, 254, 322, 472)
attendance

In [15]:
names(attendance) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
attendance

Now if I wanted to just get attendance from the weekend I could subset with a vector of names:

In [16]:
attendance[c("Saturday", "Sunday")]
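
Names also stick around when you subset by index or with a logical vector, so the three approaches mix nicely. Here is a small sketch. The 200-visitor cutoff and the name `busy` are arbitrary choices of mine for the example.

```r
attendance <- c(132, 94, 112, 84, 254, 322, 472)
names(attendance) <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                       "Friday", "Saturday", "Sunday")

# Subsetting by index keeps the names attached
attendance[c(6, 7)]

# So does subsetting with a logical vector
busy <- attendance > 200
attendance[busy]
names(attendance[busy])   # "Friday" "Saturday" "Sunday"
```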

## Using Subsetting to Modify Vectors

The subsetting logic from above can be used to modify vectors. The
idea here is that instead of keeping elements that meet a logical
condition or occur at a specific index, we can change them. For example,
what if we had mis-entered grandpa's age above? We can fix it using indexing,
a logical statement, or naming. 

In [17]:
# Recreate vector with age values
age <- c(50, 55, 80)

# Two ways of changing grandpa's age (the third entry)
# Note: you'd only need to use one of these;
# changing by name would also work if the vector had names
age[age == 80] <- 82 # using a logical statement
age[3] <- 82         # using indexing
age
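
Changing an entry by name works the same way, provided the vector has names. In this sketch the names `mom`, `dad`, and `grandpa` are hypothetical labels I've attached to the ages just for illustration.

```r
# Hypothetical names, added only for this illustration
age <- c(mom = 50, dad = 55, grandpa = 80)

age["grandpa"] <- 82   # using a name
age
```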

A logical statement is most convenient when we need to change many elements at once, because a single assignment updates every element that matches.

In [18]:
fruits <- rep(c("orange", "apple", "bamama"), 5) 
fruits #bamamas anyone? 

In [19]:
# Let's fix the misspelled element
fruits[fruits == "bamama"] <- "banana"
fruits


## Exercises

Create a vector that represents the age of at least four different family
members or friends. You can name it whatever you want.

1. What is the mean age of the people in your vector? Find out in two ways,
with and without using the `mean()` command.

2. How old is the youngest person in your vector? (Use an R command to find out.)

3. What is the age gap between the youngest person and the oldest person in your vector?
(Again use R to find out, and try to be as general as possible in the sense that
your code should work even if the elements in your vector, or their order, change.)

4. How many people in your vector are above age 25? (Again, try to make your code
work even in the case that your vector changes.)

5. Replace the age of the oldest person in your vector with the age of someone
else you know.

6. Create a new vector that indicates how old each person in your vector
will be in 10 years.

7. Create a new vector that indicates what year each person in your vector
will turn 100 years old.

8. Create a new vector with a random sample of 3 individuals from your
original vector. What is the mean age of the people in this new
vector?
