# Comparing vectors or factors with NA
Credits: http://www.cookbook-r.com/ (Creative Commons Attribution-Share Alike 3.0 Unported License)

## Problem
You want to compare two vectors or factors, but you want comparisons involving NA to be reported as TRUE or FALSE (instead of NA).

## Solution
Suppose you have this data frame with two columns, each a logical (boolean) vector:

In [1]:
df <- data.frame( a=c(TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,NA,NA,NA),
                  b=c(TRUE,FALSE,NA,TRUE,FALSE,NA,TRUE,FALSE,NA))
df

|   | a | b |
|---|---|---|
| 1 | TRUE | TRUE |
| 2 | TRUE | FALSE |
| 3 | TRUE | NA |
| 4 | FALSE | TRUE |
| 5 | FALSE | FALSE |
| 6 | FALSE | NA |
| 7 | NA | TRUE |
| 8 | NA | FALSE |
| 9 | NA | NA |


Normally, when you compare two vectors or factors containing NA values, the vector of results will have NAs wherever either of the original items was NA. Depending on your purposes, this may or may not be desirable.

In [3]:
df$a == df$b

In [4]:
# The same comparison, but presented as another column in the data frame:
data.frame(df, isSame = (df$a==df$b))

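|   | a | b | isSame |
|---|---|---|---|
| 1 | TRUE | TRUE | TRUE |
| 2 | TRUE | FALSE | FALSE |
| 3 | TRUE | NA | NA |
| 4 | FALSE | TRUE | FALSE |
| 5 | FALSE | FALSE | TRUE |
| 6 | FALSE | NA | NA |
| 7 | NA | TRUE | NA |
| 8 | NA | FALSE | NA |
| 9 | NA | NA | NA |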
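
The NA results in the `==` comparison follow from how R treats missing values: a comparison with an unknown value has an unknown result. The short sketch below illustrates this on single values; the variable names are only for illustration.

```r
# Any == comparison that involves NA yields NA, even NA == NA.
na_vs_na    <- NA == NA      # NA
true_vs_na  <- TRUE == NA    # NA
na_vs_false <- NA == FALSE   # NA

# is.na() is the way to test for missingness; it never returns NA.
is.na(NA)            # TRUE
is.na(c(TRUE, NA))   # FALSE TRUE
```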


### A function for comparing with NA’s
This comparison function essentially treats NA as just another value. If an item is NA in both vectors, it reports TRUE for that item; if the item is NA in just one vector, it reports FALSE; all other comparisons (between non-NA items) behave as usual.

In [5]:
# This function returns TRUE wherever elements are the same, including NA's,
# and FALSE everywhere else.
compareNA <- function(v1,v2) {
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}
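
To see why the function needs both steps, it can help to trace the intermediate values on a small input. This is only an illustrative walk-through; the names `eq` and `both_na` are not part of the recipe.

```r
v1 <- c(TRUE, TRUE, NA,   NA)
v2 <- c(TRUE, NA,   TRUE, NA)

eq      <- v1 == v2                 # TRUE  NA    NA    NA
both_na <- is.na(v1) & is.na(v2)    # FALSE FALSE FALSE TRUE

# NA | TRUE is TRUE, but NA | FALSE is still NA,
# so the one-sided NA cases remain NA after this step.
same <- eq | both_na                # TRUE  NA    NA    TRUE

# The remaining NAs are the one-sided cases; set them to FALSE.
same[is.na(same)] <- FALSE          # TRUE  FALSE FALSE TRUE
same
```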

### Examples of the function in use
Comparing boolean vectors:

In [6]:
compareNA(df$a, df$b)

In [7]:
# Same comparison, presented as another column
data.frame(df, isSame = compareNA(df$a,df$b))

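|   | a | b | isSame |
|---|---|---|---|
| 1 | TRUE | TRUE | TRUE |
| 2 | TRUE | FALSE | FALSE |
| 3 | TRUE | NA | FALSE |
| 4 | FALSE | TRUE | FALSE |
| 5 | FALSE | FALSE | TRUE |
| 6 | FALSE | NA | FALSE |
| 7 | NA | TRUE | FALSE |
| 8 | NA | FALSE | FALSE |
| 9 | NA | NA | TRUE |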


It also works with factors, even if the levels of the factors are in different orders:

In [8]:
# Create sample data frame with factors.
df1 <- data.frame(a = factor(c('x','x','x','y','y','y', NA, NA, NA)),
                  b = factor(c('x','y', NA,'x','y', NA,'x','y', NA)))


In [9]:
# Do the comparison
data.frame(df1, isSame = compareNA(df1$a, df1$b))

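|   | a | b | isSame |
|---|---|---|---|
| 1 | x | x | TRUE |
| 2 | x | y | FALSE |
| 3 | x | NA | FALSE |
| 4 | y | x | FALSE |
| 5 | y | y | TRUE |
| 6 | y | NA | FALSE |
| 7 | NA | x | FALSE |
| 8 | NA | y | FALSE |
| 9 | NA | NA | TRUE |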


In [10]:
# It still works if the factor levels are arranged in a different order
df1$b <- factor(df1$b, levels=c('y','x'))
data.frame(df1, isSame = compareNA(df1$a, df1$b))

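|   | a | b | isSame |
|---|---|---|---|
| 1 | x | x | TRUE |
| 2 | x | y | FALSE |
| 3 | x | NA | FALSE |
| 4 | y | x | FALSE |
| 5 | y | y | TRUE |
| 6 | y | NA | FALSE |
| 7 | NA | x | FALSE |
| 8 | NA | y | FALSE |
| 9 | NA | NA | TRUE |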
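
One caveat: reordering levels is fine, but `==` on two factors whose level *sets* differ raises an error ("level sets of factors are different"). A possible workaround, sketched here under the assumption that comparing the labels as strings is what you want, is to convert both inputs to character first. The name `compareNAChar` is hypothetical and not part of the original recipe.

```r
f1 <- factor(c("x", "y", NA))
f2 <- factor(c("x", "z", NA))   # levels are x,z rather than x,y

# Hypothetical variant: compare the labels as character strings,
# which avoids the level-set check that == performs on factors.
compareNAChar <- function(v1, v2) {
    v1 <- as.character(v1)
    v2 <- as.character(v2)
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}

compareNAChar(f1, f2)   # TRUE FALSE TRUE
```

Converting to character discards the factor structure, so this is only appropriate when the labels themselves are what should match.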
