# Creating Vectors

In [1]:
x <- c(0.5, 0.6)       # numeric
y <- c(TRUE, FALSE)    # logical
z <- c("a", "b", "c")  # character


## Explicit Coercion


In [5]:
x <- 0:6
class(x)

In [6]:
as.numeric(x)

In [7]:
as.logical(x)

In [8]:
as.character(x)
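As a rough illustration of what the coercion calls above return, and of what happens when a coercion cannot succeed, here is a minimal sketch in base R. The names `bad` and `mixed` are illustrative and do not appear in the original notebook.

```r
x <- 0:6

# Integers convert cleanly to the other atomic types
as.logical(x)    # 0 becomes FALSE, every non-zero value becomes TRUE
as.character(x)  # "0" "1" "2" "3" "4" "5" "6"

# A coercion that cannot succeed yields NA (R also emits a warning,
# suppressed here to keep the output clean)
bad <- suppressWarnings(as.numeric(c("a", "b", "c")))
bad              # NA NA NA

# Mixing types inside c() triggers implicit coercion to the most general type
mixed <- c(1.7, "a")
class(mixed)     # "character"
```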

## Matrices

In [9]:
m <- matrix(nrow=2, ncol=3)
m

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA


In [10]:
dim(m)

In [11]:
attributes(m)

Matrices are constructed column-wise: entries fill the first column from top to bottom, then move on to the next column.

m <- matrix(1:6, nrow=2, ncol=3)
m
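To make the column-wise fill order concrete, the sketch below compares the default with the `byrow = TRUE` argument of `matrix()`. The names `by_col` and `by_row` are illustrative only.

```r
# Default: values fill column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_col[1, ]   # 1 3 5

# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_row[1, ]   # 1 2 3
```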

A matrix can also be created from a vector by assigning a dimension attribute with dim().

In [13]:
m <- 1:10
m

In [15]:
dim(m) <- c(2,5)
m

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10


Matrices can also be created by column-binding with cbind() or row-binding with rbind().

In [16]:
x <- 1:3
y <- 10:12
cbind(x,y)

     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12


In [17]:
rbind(x,y)

  [,1] [,2] [,3]
x    1    2    3
y   10   11   12


## Factors

Factors are used to represent categorical data and can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label. Factors are important in statistical modeling and are treated specially by modeling functions like lm() and glm().

Using factors with labels is better than using integers because factors are self-describing. Having a variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.

Factor objects can be created with the factor() function.

In [18]:
x <- factor(c("Yes","Yes","No","Yes","No"))
x

In [19]:
levels(x)

In [21]:
table(x)

x
 No Yes 
  2   3 
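The description of a factor as an integer vector with labels can be checked directly. The sketch below uses `unclass()` and `as.integer()` from base R; the name `codes` is illustrative.

```r
x <- factor(c("Yes", "Yes", "No", "Yes", "No"))

# Strip the factor class to expose the integer codes and the levels attribute
codes <- unclass(x)
codes

# Levels are sorted alphabetically by default, so "No" is 1 and "Yes" is 2
as.integer(x)   # 2 2 1 2 1
levels(x)       # "No" "Yes"
```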

Factors may be created automatically when you read a dataset with a function like read.table(). In R versions before 4.0.0, these functions converted character columns to factors by default; since R 4.0.0 the stringsAsFactors argument defaults to FALSE, so you must set stringsAsFactors = TRUE or convert the columns yourself.

The order of the levels of a factor can be set using the levels argument to factor(). This can be important in linear modeling because the first level is used as the baseline level.

In [23]:
x <- factor(c("Yes","Yes","No","Yes","No"), levels=c("Yes","No"))
levels(x)
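One way to see the baseline effect is to fit the same model with two level orderings and look at the coefficient names. This is only a sketch: the response values in `y` are made up for illustration, and `answer`, `f_default`, and `f_custom` are hypothetical names.

```r
# Hypothetical response values, invented purely for illustration
y <- c(5.1, 4.8, 3.2, 5.4, 2.9)
answer <- c("Yes", "Yes", "No", "Yes", "No")

# Default (alphabetical) levels: "No" is the baseline, so the model
# estimates a coefficient for "Yes"
f_default <- factor(answer)
names(coef(lm(y ~ f_default)))   # "(Intercept)" "f_defaultYes"

# Explicit levels: "Yes" is the baseline, so the coefficient is for "No"
f_custom <- factor(answer, levels = c("Yes", "No"))
names(coef(lm(y ~ f_custom)))    # "(Intercept)" "f_customNo"
```

The fitted values are the same either way; only the reference group and the sign of the reported contrast change.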

## Missing Values

Missing values are denoted by NA; NaN ("not a number") denotes the result of an undefined mathematical operation.
• is.na() is used to test whether objects are NA
• is.nan() is used to test for NaN
• NA values have a class also, so there are integer NA, character NA, etc.
• A NaN value is also NA, but the converse is not true

In [24]:
# create a vector with NAs in it
x <- c(1,2,NA,10,3)
# return logical vector indicating which elements are NA
is.na(x)
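The difference between NA and NaN described in the list above can be seen by testing one vector with both functions. A minimal sketch in base R, where the name `v` is illustrative:

```r
v <- c(1, NA, 0/0, 4)   # 0/0 is an undefined operation and yields NaN

is.na(v)    # FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
is.nan(v)   # FALSE FALSE  TRUE FALSE  (NA is not NaN)

# NA has typed variants
class(NA)             # "logical"
class(NA_integer_)    # "integer"
class(NA_character_)  # "character"
```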


## Data Frames

In [25]:
x <- data.frame(f00=1:4, bar=c(T,T,F,F))
x

  f00   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE


In [26]:
nrow(x)
ncol(x)

## Names

In [27]:
x <- 1:3
names(x)

NULL

In [28]:
names(x) <- c("New York", "Seattle", "Los Angeles")
x

## Reading Data Files with read.table()

## Subsetting a Vector

In [30]:
x <- c("a","b","c","c","d","a")
x[1] # first element
x[2]  # second element

In [31]:
x[1:4]  # first to 4th element

In [33]:
u <- x > "a"
u
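The logical vector `u` becomes useful once it is passed back as an index. The sketch below assumes a locale in which lowercase letters compare in alphabetical order, which holds for standard collations.

```r
x <- c("a", "b", "c", "c", "d", "a")
u <- x > "a"   # TRUE where the element sorts after "a"

x[u]           # "b" "c" "c" "d"
x[x > "a"]     # same result without the intermediate vector
```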

## Removing NA values

In [34]:
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
print(bad)

[1] FALSE FALSE  TRUE FALSE  TRUE FALSE


In [35]:
x[!bad]

In [36]:
head(airquality)

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6


In [38]:
# complete.cases() returns a logical vector that is TRUE for rows with no
# missing values; indexing with it drops the incomplete rows
good <- complete.cases(airquality)
head(airquality[good,])

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
7    23     299  8.6   65     5   7
8    19      99 13.8   59     5   8
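complete.cases() also accepts several vectors at once, which helps when the missing values must be removed from paired observations. The sketch below is illustrative: the vector `y` and the name `clean` are assumptions, and `na.omit()` is shown as a base R alternative for data frames.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")

# TRUE only at positions where both x and y are present
good <- complete.cases(x, y)
x[good]   # 1 2 4 5
y[good]   # "a" "b" "d" "f"

# For a data frame, na.omit() keeps exactly the rows that
# complete.cases() marks as TRUE
clean <- na.omit(airquality)
nrow(clean) == sum(complete.cases(airquality))   # TRUE
```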


## Vectorized operations

Many operations in R are vectorized, meaning that an operation is applied element by element across whole vectors without an explicit loop. This allows you to write code that is efficient, concise, and easier to read than the equivalent loop in a non-vectorized language.

In [40]:
x <- 1:4
y <- 6:9
z <- x+y
z

In [41]:
x > 2
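To show what vectorization saves, the sketch below writes the same addition as an explicit loop and as a single vectorized expression. The names `z_loop` and `z_vec` are illustrative.

```r
x <- 1:4
y <- 6:9

# Explicit loop: what a non-vectorized language would require
z_loop <- integer(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

# Vectorized form: one expression, same result
z_vec <- x + y
identical(z_loop, z_vec)   # TRUE

# Other arithmetic and comparison operators work element-wise too
x * y    # 6 14 24 36
x >= 2   # FALSE TRUE TRUE TRUE
```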

## Managing Data Frames with the dplyr Package