# Modifying values

## Retrieving and modifying values by position (index)

### in vectors

First, a little review.

In [None]:
# let's put values into a vector
my_vec <- c(1, 2, 3, 4, 5)

In [None]:
# retrieve a value by index
my_vec[1]

In [None]:
# retrieve multiple values by index
my_vec[2:5]
my_vec[-1]

It is, maybe, unsurprising that we can overwrite values in a vector, using their indices:

In [None]:
my_vec # printing for reference

my_vec[2] <- 3
my_vec

In [None]:
my_vec # printing for reference

my_vec[c(3, 4, 5)] <- c(5, 7, 9)
my_vec

In [None]:
my_vec # printing for reference

my_vec[1:5] <- c(5, 4, 3, 2, 1)
my_vec

This is all very reasonable. We're happy with this. It works like you'd expect, probably. 

Thing is, you can also just ... add stuff to a vector, without using any special syntax or anything. (This is _wildly_ uncomfortable for people coming from most other programming languages. I'm still mad.)

In [None]:
#print it for reference
my_vec

#print its length
length(my_vec)

# this works in R, but not most other places
my_vec[6] <- 42
my_vec

Real talk? If you can bring yourself to do that, without ruining how you approach any other language, I say go for it! I am a more cautious programmer than that, spending more of my time in other languages that don't allow this type of chicanery, so here's how I accomplish the same thing:

In [None]:
my_vec
my_vec <- c(my_vec, 43)
my_vec

In [None]:
# don't miscount
my_vec

my_vec[13] <- 13
my_vec
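
If you're wondering what ends up in the gap when you miscount, here's a minimal sketch. The vector `gap_vec` is made up for this illustration and isn't part of the notebook's data.

```R
# a made-up vector, just for this illustration
gap_vec <- c(10, 20, 30)

# assign well past the end of the vector
gap_vec[6] <- 60

gap_vec                # 10 20 30 NA NA 60
length(gap_vec)        # 6
which(is.na(gap_vec))  # 4 5 - the positions R quietly filled with NA
```

R doesn't complain about any of this, which is why counting carefully (or using `c()`) matters.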

#### (and erasing)

In [None]:
# REMOVING values from a vector
my_vec  <- my_vec[-(7:12)]
# same result, keeping positions instead of dropping them:
#my_vec <- my_vec[c(1:6,13)]
my_vec

It's also worth pointing out that R repeats values where needed, even when we're just dealing with a subset of a vector.

In [None]:
# putting two values into 4 spots, with repetition!
my_vec[1:4] <- c(1, 42)
my_vec

# adding 41 to each of 3 different values
my_vec[c(1, 3, 5)] <- my_vec[c(1, 3, 5)] + 41
my_vec
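
Recycling is only silent when the lengths divide evenly. Here's a small sketch of both cases, using a made-up vector (`rec_vec` is just for illustration). The `tryCatch()` is only there so we can look at the warning text; it also stops the second assignment from going through.

```R
# made-up vector for illustration
rec_vec <- rep(0, 6)

# 2 values into 4 slots: recycled without comment
rec_vec[1:4] <- c(1, 42)
rec_vec   # 1 42 1 42 0 0

# 2 values into 3 slots: R warns, and we catch the warning to read it
warn_text <- tryCatch(
    {
        rec_vec[1:3] <- c(7, 8)
        "no warning"
    },
    warning = function(w) conditionMessage(w)
)
warn_text
```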

### Retrieving and modifying values by index/position in a dataframe

This is, admittedly, not much of a thing. You're not usually going to work on columns by position, at least, because columns usually have names, which is very convenient. Rows are named somewhat less often, so you'll deal with those positionally maybe slightly(?) more often. Still, let's look at it, yeah?

In [None]:
# first, let's get our dataframe
clp_wifi <- read.csv("clp_wifi.csv", stringsAsFactors=FALSE)
head(clp_wifi)

In [None]:
# get the first row
clp_wifi[1, ]

In [None]:
# get the first column
head(clp_wifi[ , 1])

In [None]:
clp_wifi_copy <- clp_wifi

# we can replace the IDs with the row numbers if we want
clp_wifi_copy[ , 1] <- 1:length(clp_wifi[ ,1])

head(clp_wifi_copy)
tail(clp_wifi_copy)

In [None]:
# say we found an error in the wifi minutes for Allegheny Library, in March 2016
# March 2016 is row 3; WifiMinutes is column 6
clp_wifi[3, 6] # shows initial value

clp_wifi[3, 6] <- 424242
head(clp_wifi)
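
A hard-coded column number like `6` stops pointing at the right column if the file's column order ever changes. Here's a minimal sketch of mixing a row position with a column name instead. The data frame `toy_wifi` and its values are invented for illustration; they are not the real CLP data.

```R
# a tiny made-up data frame, NOT the real clp_wifi data
toy_wifi <- data.frame(
    Name = c("ALLEGHENY", "ALLEGHENY", "ALLEGHENY"),
    Month = c(1, 2, 3),
    WifiMinutes = c(100, 200, 300),
    stringsAsFactors = FALSE
)

# row by position, column by name
toy_wifi[3, "WifiMinutes"]            # 300
toy_wifi[3, "WifiMinutes"] <- 424242

# if you do need the column's position, look it up from the name
which(names(toy_wifi) == "WifiMinutes")   # 3

toy_wifi
```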

### Using column names for retrieval and value modification

Named columns make it a lot easier to do things with our data, honestly.

In [None]:
# get a column by name, two different ways
vector_of_names <- clp_wifi$Name
vector_of_names2 <- clp_wifi[ , "Name"]

head(vector_of_names)
head(vector_of_names2)

In [None]:
# just making a copy to mess with
clp_wifi_copy <- clp_wifi

# getting a vector that's as long as our column
column_length_vector <- 1:length(clp_wifi$CLPID)

# and then we can just place it in there
clp_wifi_copy$CLPID <- column_length_vector

# could have done this with a for loop
# for (i in 1:length(clp_wifi$CLPID)) {
#     clp_wifi_copy$CLPID[i] = i
# }

head(clp_wifi_copy)
tail(clp_wifi_copy)

Sometimes, you'll mix name and number to get to things you want to change, and that's fine:

In [None]:
# modify some rows in a named column
clp_wifi_copy$Name[c(1, 3, 5)] <- "VERY GOOD LIBRARY"
head(clp_wifi_copy)

#### (and erasing a column)

In [None]:
# want to erase a column? EASY!
clp_wifi_copy$CLPID <- NULL
head(clp_wifi_copy)

#### (and adding a column)

(Yes, I'm mad about this, too, but it seems a little less likely to ruin a programmer's day than the thing where you can just keep writing past the end of a vector.)

In [None]:
# let's make a copy of the original clp_wifi CLPID column
our_col <- clp_wifi$CLPID

# now we can just add it to the frame we removed it from
# (I changed its name just to keep this all straight)
clp_wifi_copy$CLPID2 <- our_col
head(clp_wifi_copy)
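
A new column doesn't have to be a copy of an old one. Here's a small sketch, on a made-up data frame (`toy_df` and everything in it is invented), of a computed column, a single recycled value, and what happens when the lengths don't fit:

```R
# a tiny made-up data frame for illustration
toy_df <- data.frame(
    id = c("a", "b", "c"),
    minutes = c(60, 120, 90),
    stringsAsFactors = FALSE
)

# a new column computed from an existing one
toy_df$hours <- toy_df$minutes / 60

# a single value gets recycled into every row
toy_df$source <- "made up"

# 2 values can't fill 3 rows evenly, so this is an error (caught here)
result <- tryCatch(
    {
        toy_df$oops <- 1:2
        "worked"
    },
    error = function(e) "error"
)
result   # "error"

toy_df
```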

## Logic!

Our comparison operators:
* `<`
* `<=`
* `>`
* `>=`
* `==`
* `!=`
* `%in%`

All but the last one do exactly what you'd expect. Mostly. One note: **types get coerced for comparison,** so if you check to see if `1 == "1"` you might get a bit of a surprise. (Gross. I know.)

In [None]:
one <- 1
also_one <- "1"

one == also_one
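
Here's a quick sketch of what that coercion means in practice. When a number meets a string, the number is turned into a string first. If you want a comparison that doesn't coerce, `identical()` is one option.

```R
1 == "1"            # TRUE: 1 is coerced to "1" before comparing
identical(1, "1")   # FALSE: identical() does no coercion
10 < "9"            # TRUE: compared as the strings "10" and "9"
```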

Anyway, as we were saying. Not a ton of surprises in the first set of logical comparison operators:

In [None]:
sma <- 1 #create a small number
med <- 3 #create a medium number
big <- 5 #create a big number

sma < med #is sma smaller than med?

sma < big #is sma smaller than big?

In [None]:
# just copying these down so we don't have to remember
sma <- 1 
med <- 3 
big <- 5 

big < med #is big smaller than med? - of course not

big < sma #is big smaller than sma? 

In [None]:
# just copying these down so we don't have to remember
sma <- 1 
med <- 3 
big <- 5 

big >= med #is big greater than or equal to med?

big >= 5 #is big greater than or equal to 5?

In [None]:
# just copying these down so we don't have to remember
sma <- 1 
med <- 3 
big <- 5 

med == 3 #is med equal to 3?

med != 3 #is med not equal to 3?

_Possibly_ unsurprising (you may have already guessed this) fact about the logical operators above (everything except `%in%`): they work element-wise, so if you do `a_vector == "value"` you'll get a vector of TRUE/FALSE values of the same length as `a_vector`:

In [None]:
my_vec <- 1:6
my_vec
my_vec == 3
my_vec < 5
my_vec >= 4

This is super useful, and we will come back to it in a moment.

In [None]:
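# ok and also: vector vs. vector of the same length, compared pair by pair
my_vec == 1:6

# plus: a shorter vector gets repeated to fit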
my_vec == 1:3
my_vec == 4:6
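
Because `==` works element-wise and reuses the shorter vector, it never answers "are these two vectors the same?" with a single value. A minimal sketch, with a made-up `cmp_vec`, using `identical()` for the whole-vector question:

```R
# made-up vector for illustration
cmp_vec <- 1:6

# 1:3 is reused to line up with positions 4 through 6
cmp_vec == 1:3            # TRUE TRUE TRUE FALSE FALSE FALSE

# one TRUE or FALSE for the whole vector
identical(cmp_vec, 1:6)   # TRUE
identical(cmp_vec, 1:3)   # FALSE
```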

Now let's talk about `%in%`.

It works a bit like Python's `in`, if that helps you. Both of these patterns are valid:

```R
one_thing %in% vector_of_many_things   # TRUE if the thing is anywhere in the vector
multiple_things %in% vector_of_many_things  # returns length(multiple_things) vector of TRUEs & FALSEs
```

In [None]:
"CLP01" %in% clp_wifi$CLPID
"SQUIRREL HILL LIBRARY" %in% clp_wifi$Name
"DATA, WOO" %in% clp_wifi$Name

#OK but...
"SQUIRREL" %in% clp_wifi$Name

In [None]:
c(1, 11, 15) %in% clp_wifi$Month

### Boolean operators

We can combine logical statements. It's fun!

* `&` - and
* `|` - or
* `xor` - exclusive or - one or the other, not both
* `!` - not 
* `any` - one or more of
* `all` - every single one of

If you've never seen `xor` before, it's the "or" that parents of small children mean when they ask, "would you like a piece of chocolate or a cookie?" 

In [None]:
# unsurprising?

# and
TRUE & TRUE  # true
TRUE & FALSE # false
FALSE & FALSE # false

# or
TRUE | TRUE # true
TRUE | FALSE # true
FALSE | FALSE # false

# not
!TRUE  #false
!FALSE #true

In [None]:
# xor
xor(TRUE, FALSE) # true
xor(FALSE, FALSE) #false

xor(TRUE, TRUE) #FALSE! OMG!

In [None]:
# any vs. all
any(FALSE, FALSE, FALSE, TRUE) # true
any(FALSE, FALSE, FALSE, FALSE) #false

all(TRUE, TRUE, TRUE, FALSE) #false
all(TRUE, TRUE, TRUE, TRUE) #true

# "TRUE" and "FALSE" no longer look like real words, sorry, I know

In [None]:
# let's use some numerical examples, yes?
# just copying these down so we don't have to remember
sma <- 1 
med <- 3 
big <- 5 

!(med == 3)
# ! is applied after ==, so this reads as !(!(med == 3))
!!med == 3

big > sma & med > sma
big == 5 & med == 4
big == 5 | med == 4

xor(big == 5, med == 4)

In [None]:
# how we actually use any/all:
days_of_the_week <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

some_days <- c("Monday", "Wednesday", "Thursday")

some_days %in% days_of_the_week

all(some_days %in% days_of_the_week)
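
When `all()` comes back `FALSE`, the next question is usually "OK, which ones?" Here's a sketch that flips the `%in%` result with `!` and uses it to pull out the offenders. The vectors `known_days` and `mixed_days` are made up for this example.

```R
# made-up vectors for illustration
known_days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
mixed_days <- c("Monday", "Funday", "Thursday")

all(mixed_days %in% known_days)             # FALSE

# keep only the values that were NOT found
mixed_days[!(mixed_days %in% known_days)]   # "Funday"
```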

In [None]:
# we also have isTRUE()

isTRUE(med == 3)
isTRUE(TRUE)
isTRUE(FALSE)
isTRUE(3)

## Logical subsetting

Perhaps you remember, from last week, when I said "we're going to skip over this until it gets useful," about that whole "you can index a vector with logical values" issue?

It's useful now.

You can pull items out of a vector (or, therefore, the column of a dataframe) with a vector of the same length, full of `TRUE`s and `FALSE`s:

In [None]:
tf_vec <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
num_vec <- 1:6

num_vec[tf_vec]

You have, in fact, already done this, even if you weren't really thinking of it that way.

In [None]:
# let's get all of the January 2017 data for the library system

# a t/f vector - only true if month == 1 and year == 2017
jan_2017 <- clp_wifi$Year == 2017 & clp_wifi$Month == 1

# all the data
clp_wifi[jan_2017, ]

# or just the minutes of wifi
clp_wifi$WifiMinutes[jan_2017]
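
The same TRUE/FALSE vector works on the left side of `<-`, so you can modify just the values that match. A minimal sketch on a made-up data frame (`toy_usage` and its numbers are invented, not the real CLP data):

```R
# a tiny made-up data frame for illustration
toy_usage <- data.frame(
    Year = c(2016, 2017, 2017),
    Month = c(1, 1, 2),
    WifiMinutes = c(100, 200, 300)
)

# TRUE only where both conditions hold
is_jan_2017 <- toy_usage$Year == 2017 & toy_usage$Month == 1

# overwrite just the matching rows of one column
toy_usage$WifiMinutes[is_jan_2017] <- 0

toy_usage$WifiMinutes   # 100 0 300
```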

In [None]:
# let's sum up the minutes of wifi used each year at CLP
years <- unique(clp_wifi$Year) # 2016, 2017, 2018

# empty vector for now
wifi_by_year <- vector()

# length(years) is just 3, in this case
for (i in 1:length(years)) {
    # make a vector of trues and falses corresponding to each year
    tf_year_i <- clp_wifi$Year == years[i]
    # pass that true/false vector in to pull out the rows we want from the minutes column
    wifi_i <- sum(clp_wifi$WifiMinutes[tf_year_i])
    wifi_by_year <- c(wifi_by_year, wifi_i)
}
# put the year name with the corresponding sum
names(wifi_by_year) <- years
wifi_by_year
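
For comparison, base R's `tapply()` can do this kind of grouped sum without a loop. This is a different technique from the loop above, sketched on a made-up data frame (`toy_years` is invented), so treat it as one possible alternative rather than the way to do it.

```R
# a tiny made-up data frame for illustration
toy_years <- data.frame(
    Year = c(2016, 2016, 2017, 2018, 2018),
    WifiMinutes = c(10, 20, 30, 40, 50)
)

# sum WifiMinutes within each distinct Year
toy_by_year <- tapply(toy_years$WifiMinutes, toy_years$Year, sum)

toy_by_year          # 2016: 30, 2017: 30, 2018: 90
names(toy_by_year)   # "2016" "2017" "2018"
```

The loop version is still worth understanding, since it shows the logical subsetting that `tapply()` hides.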

OK, honestly, I'm feeling uninspired at this late hour, and I'd really rather answer the questions you have at this point (even if they're "ok, now show us some of this with the avocado data set") than contrive more examples ahead of time. Let's chat.