# PART 03 - Data structures and basic functions

In this part, you will learn about:
* Commonly used R data structures
* Basic, frequently used R functions
* Categorical variables, i.e., factors


## Vectors

- One-dimensional
- All elements have the same mode (numeric, character, logical)
- Scalars = single-element vectors
- as.vector(x) converts x to a vector

In [1]:
c(1, 2, "3")

In [2]:
as.numeric(c(1, 2, "3"))

In [3]:
c(c(2, 3, 4), 3)
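
The cells above rely on implicit coercion: when modes are mixed, `c()` converts every element to the most general mode. The following is a minimal sketch of that behaviour; the variable names (`v_num`, `v_chr`, `v_back`) are illustrative and not part of the original notebook.

```r
# Mixing modes in c() coerces everything to the most general mode:
# logical < numeric < character.
v_num <- c(TRUE, 2, 3.5)   # the logical TRUE becomes the number 1
v_chr <- c(1, 2, "3")      # the numbers become character strings

class(v_num)               # "numeric"
class(v_chr)               # "character"

# Converting back works only for strings that look like numbers;
# anything else becomes NA (R also emits a warning, silenced here).
v_back <- suppressWarnings(as.numeric(c("1", "2", "three")))
v_back                     # 1 2 NA
```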

In [9]:
# create a vector using the constructor and list of values
a <- c(1,2,3,4)
# create a vector using the constructor and range
a <- c(1:4)
print("The whole vector")
a
print("Take the second element")
a[2]
print("Take elements from the second to the fourth one")
a[2:4]
print("Take the first and the third element")
a[c(1,3)]
print("Take all the elements but the first one")
a[-1]
print("Take all the elements but the first and the second ones")
a[-c(1,2)]

[1] "The whole vector"


[1] "Take the second element"


[1] "Take elements from the second to the fourth one"


[1] "Take the first and the third element"


[1] "Take all the elements but the first one"


[1] "Take all the elements but the first and the second ones"


In [10]:
a[length(a)]
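
Besides positions, a vector can also be indexed by name or by a logical vector. The sketch below illustrates this with a hypothetical named vector `v`, which is an assumption for the example and does not appear in the cells above.

```r
# A named numeric vector (names are illustrative).
v <- c(first = 10, second = 20, third = 30)

v["second"]               # index by name: 20
v[length(v)]              # last element: 30
v[c(TRUE, FALSE, TRUE)]   # index by logical vector: 10 30
v[v > 15]                 # index by condition: 20 30
v[5]                      # out-of-range index gives NA, not an error
```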

## Matrix

- Two-dimensional arrays
- Same mode (numeric, character, logical) 


In [11]:
a <- matrix(1:20, nrow=5, ncol=4)
a

0,1,2,3
1,6,11,16
2,7,12,17
3,8,13,18
4,9,14,19
5,10,15,20


You can add names to rows and columns:

In [12]:
colnames(a) <- c("a", "b", "c", "d")
rownames(a) <- c("x1", "x2", "x3", "x4", "x5")
a

,a,b,c,d
x1,1,6,11,16
x2,2,7,12,17
x3,3,8,13,18
x4,4,9,14,19
x5,5,10,15,20


In [14]:
print("Take the value in the first row and the second column")
a[1,2]
print("Take all the rows and the second column")
a[,2]
print("Take all the rows and all the columns but the first one")
a[,-1]
print("Take all the rows and the column with the name 'a'")
a[, "a"]

[1] "Take the value in the first row and the second column"


[1] "Take all the rows and the second column"


[1] "Take all the rows and all the columns but the first one"


,b,c,d
x1,6,11,16
x2,7,12,17
x3,8,13,18
x4,9,14,19
x5,10,15,20


[1] "Take all the rows and the column with the name 'a'"


In [15]:
c(1,2,3)
as.matrix(c(1,2,3))

0
1
2
3
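
Two details of matrices are easy to miss: values are filled column by column unless `byrow = TRUE` is given, and selecting a single column returns a plain vector unless `drop = FALSE` is used. A minimal sketch, with illustrative variable names:

```r
m <- matrix(1:20, nrow = 5, ncol = 4)                        # filled column by column (default)
m_by_row <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)   # filled row by row

m[1, 2]          # 6
m_by_row[1, 2]   # 2

col_vec <- m[, 2]                 # a plain vector: the matrix shape is dropped
col_mat <- m[, 2, drop = FALSE]   # still a matrix with 5 rows and 1 column

dim(col_vec)     # NULL
dim(col_mat)     # 5 1
```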


## Arrays

- n-dimensional arrays
- Same type (numeric, character, logical) 

In [16]:
a <- array(1:24, dim=c(2,3,4))
a[1,1,1]
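
Array indexing generalizes matrix indexing: one index per dimension, and an empty index selects everything along that dimension. A small sketch using the same dimensions as above (the names `arr` and `slice` are illustrative):

```r
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)              # 2 3 4
arr[1, 2, 1]          # 3
arr[2, 3, 4]          # 24, the last element

# Leaving the first two indices empty selects a whole 2 x 3 "layer".
slice <- arr[, , 1]
dim(slice)            # 2 3
```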

## Data frame

- Two-dimensional, table-like structure (rows and columns)
- ***Different columns can have different modes***

In [19]:
a <- data.frame(a=c(1,2,3), b=c("john", "marry", "anna"), c=c(TRUE,TRUE,FALSE))
a

a,b,c
1,john,True
2,marry,True
3,anna,False


In [20]:
# Indexing works as for a matrix; note that you can also take a column using $
a[1,2] 
a[2, "b"]
a[,c("b", "c")]
a$c
a[, "c"]

b,c
john,True
marry,True
anna,False


In [21]:
colnames(a)

In [22]:
colnames(a) <- c("c1", "c2", "c3")
a

c1,c2,c3
1,john,True
2,marry,True
3,anna,False


In [23]:
row.names(a)
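
To see that each column of a data frame keeps its own mode, and how the different ways of selecting a column differ, consider the sketch below. The data frame `df` and its column names are illustrative; `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
df <- data.frame(id = c(1, 2, 3),
                 name = c("john", "marry", "anna"),
                 flag = c(TRUE, TRUE, FALSE),
                 stringsAsFactors = FALSE)

# One class per column.
col_classes <- sapply(df, class)
col_classes      # id: "numeric", name: "character", flag: "logical"

df$name          # a character vector
df[["name"]]     # the same vector
df["name"]       # a data frame with a single column

nrow(df)         # 3
ncol(df)         # 3
```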

## List

An ordered collection of objects, which may be of different types.

In [24]:
a <- list(22, c(2, 3, 4), data.frame(a=c(1,2,3), b=c(1,2,3)))
a

a,b
1,1
2,2
3,3


In [25]:
a[3]

a,b
1,1
2,2
3,3


In [26]:
# a[3] returns a one-element list, not the data frame itself, so this one will FAIL.
# To access an element of the list, use double brackets [[ ]].
a[3][1,1]

ERROR: Error in a[3][1, 1]: incorrect number of dimensions


In [27]:
# This is OK
a[[3]][1,1]

In [28]:
a['my_key'] <- 1223
a

a,b
1,1
2,2
3,3
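
The difference between single and double brackets can be checked directly with `class()`. The sketch below rebuilds the same list under the illustrative name `lst` and also shows access to a named element with `$`.

```r
lst <- list(22, c(2, 3, 4), data.frame(a = c(1, 2, 3), b = c(1, 2, 3)))

class(lst[3])      # "list": single brackets return a sub-list
class(lst[[3]])    # "data.frame": double brackets return the element itself
lst[[3]][1, 1]     # 1

# Add a named element and read it back.
lst[["my_key"]] <- 1223
lst$my_key         # 1223
names(lst)         # "" "" "" "my_key"
length(lst)        # 4
```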


## Factors - nominal and ordinal variables

In [29]:
a <- c("a", "b", "c", "a")
a

In [30]:
b <- factor(a)
b


In [31]:
levels(b)

In [32]:
as.numeric(b)

In [33]:
as.character(b)

In [34]:
scale <- factor(c('low', 'medium', 'low', 'high'), 
                levels=c('low', 'medium', 'high'), ordered=T)
scale

In [35]:
"high" > "medium"

In [36]:
scale[4] > scale[2]

In [37]:
a <- factor(c("apple", "apple", "plum"), 
            levels=c("apple", "plum", "cherry"), ordered=T)

In [38]:
levels(a)

In [39]:
a
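
The cells above touch on three points worth spelling out: ordered factors compare by level order rather than alphabetically, unused levels are kept, and `as.numeric()` on a factor returns the internal codes. A minimal sketch, with illustrative names (`size` is used instead of `scale` only to avoid masking the base function of that name):

```r
size <- factor(c("low", "medium", "low", "high"),
               levels = c("low", "medium", "high"), ordered = TRUE)

size[4] > size[2]    # TRUE: "high" follows "medium" in the level order
"high" > "medium"    # FALSE: plain strings compare alphabetically
as.integer(size)     # 1 2 1 3

# Unused levels are kept until they are dropped explicitly.
fruit <- factor(c("apple", "apple", "plum"),
                levels = c("apple", "plum", "cherry"))
table(fruit)                 # cherry is listed with a count of 0
levels(droplevels(fruit))    # "apple" "plum"

# as.numeric() returns the level codes, not the values in the labels.
num_f <- factor(c("10", "20", "10"))
as.numeric(num_f)                  # 1 2 1
as.numeric(as.character(num_f))    # 10 20 10
```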

## Basic functions

- summary (Summarizes an object)
- length (Number of elements or components)
- dim (Dimensions of an object)
- str (Structure of an object)
- class (Class or type of an object)
- names (Names of components in an object)
- cbind (Combines objects as columns)
- rbind (Combines objects as rows)
- c (Combines values into a vector)
- head (First rows or elements of an object)
- tail (Last rows or elements of an object)
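
Several of the functions listed above can be tried on two short vectors. This is a small sketch with made-up values; the names `x`, `y`, `by_col`, `by_row`, `df`, and `df2` are illustrative.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

by_col <- cbind(x, y)   # 3 x 2 matrix with columns "x" and "y"
by_row <- rbind(x, y)   # 2 x 3 matrix with rows "x" and "y"

dim(by_col)             # 3 2
dim(by_row)             # 2 3
length(by_col)          # 6: number of elements in the matrix
class(x)                # "numeric"
colnames(by_col)        # "x" "y"

df <- data.frame(x = x, y = y)
names(df)               # "x" "y"
length(df)              # 2: for a data frame, the number of columns

# rbind also appends rows to a data frame with matching column names.
df2 <- rbind(df, data.frame(x = 4, y = 40))
nrow(df2)               # 4
```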

In [40]:
a <- data.frame(a=c(1,2,3), b=c("john", "marry", "anna"), c=c(TRUE,TRUE,FALSE))
summary(a)

       a           b         c          
 Min.   :1.0   anna :1   Mode :logical  
 1st Qu.:1.5   john :1   FALSE:1        
 Median :2.0   marry:1   TRUE :2        
 Mean   :2.0                            
 3rd Qu.:2.5                            
 Max.   :3.0                            

In [41]:
str(a)

'data.frame':	3 obs. of  3 variables:
 $ a: num  1 2 3
 $ b: Factor w/ 3 levels "anna","john",..: 2 3 1
 $ c: logi  TRUE TRUE FALSE


In [42]:
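# Convert column b from a factor to character.
# (Since R 4.0.0, data.frame() keeps strings as character by default,
# so this step is only needed in older versions.)
a$b <- as.character(a$b)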

In [50]:
head(a, 2)

a,b,c
1,john,True
2,marry,True


In [49]:
tail(a, 2)

,a,b,c
2,2,marry,True
3,3,anna,False


This is how you can filter values. Study the example and check that you understand how it works.

In [51]:
x <- c(1,2,3,4,NA,4, NA, 5)
x[!is.na(x)]

In [52]:
!is.na(x)
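
The filter works in two steps: `!is.na(x)` builds a logical vector with one TRUE or FALSE per element, and indexing with that vector keeps only the TRUE positions. The sketch below makes the steps explicit and adds a few related calls; the name `keep` is illustrative.

```r
x <- c(1, 2, 3, 4, NA, 4, NA, 5)

keep <- !is.na(x)       # logical mask: TRUE where a value is present
x[keep]                 # 1 2 3 4 4 5

sum(is.na(x))           # 2: number of missing values
which(is.na(x))         # 5 7: their positions

# Conditions can be combined with &.
x[!is.na(x) & x > 2]    # 3 4 4 5

# Many functions return NA if any input is NA, unless told to skip them.
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 19/6, about 3.17
```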

Yes, R can perform vectorized operations: arithmetic is applied element by element.

In [53]:
a <- c(1,2,3)
b <- c(1,2,3)
a + b
a * b
a * 2
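
In `a * 2` the shorter operand is recycled to match the longer one. The sketch below shows the expected results for the cell above and one more recycling case; the last two lines are additions for illustration.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3)

a + b      # 2 4 6
a * b      # 1 4 9
a * 2      # 2 4 6: the single value 2 is recycled for every element
a > 1      # FALSE TRUE TRUE: comparisons are element-wise too

# Recycling also applies when the shorter vector has more than one element.
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# Element-wise product followed by a sum gives the dot product.
sum(a * b)                  # 14
```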