In [1]:
options(jupyter.rich_display = FALSE)

We've seen that every element of a vector in R must have the same mode (number, character, etc.). This uniformity makes vectors efficient and fast to compute with. However, a vector cannot hold a record that mixes types, such as

    Name: Fatma
    Salary: 5624.25
    Full time: yes

_Lists_ provide a generalization of the basic vector structure. A list's elements can be of different types, allowing for more complicated data representations. They form the bridge between vectors and _data frames_, which we'll see later.
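The difference is easy to see by storing the same three values both ways. The following is a minimal sketch; the variable names `as_vector` and `as_list` are chosen here for illustration only.

```r
# A vector coerces all values to one common mode; a list keeps each type.
as_vector <- c("Fatma", 5624.25, TRUE)
as_list   <- list("Fatma", 5624.25, TRUE)

mode(as_vector)          # a single mode for the whole vector
sapply(as_list, mode)    # one mode per list element
```

In the vector, the number and the logical value are silently converted to strings, while the list preserves all three types.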

Creating lists
===
In the simplest form, a list can be created with the `list()` function call.

In [2]:
ftm <- list("Fatma", 5624.25, TRUE)

We can display the contents of the list we've created.

In [3]:
ftm

[[1]]
[1] "Fatma"

[[2]]
[1] 5624.25

[[3]]
[1] TRUE


In this form we see the elements together with their numeric indices in the list. We can access them separately, if we want.

In [4]:
ftm[[1]]

[1] "Fatma"

In [5]:
ftm[[2]]

[1] 5624.25

Note that we use double brackets `[[]]` to refer to a list element, not single brackets `[]` as in vectors.

However, the better practice is to use component names (called _tags_) in lists.

In [6]:
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)

In [7]:
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE


A list can also contain multiple-element vectors and lists as elements.

In [8]:
list(1, c(2,3), list("abc",4))

[[1]]
[1] 1

[[2]]
[1] 2 3

[[3]]
[[3]][[1]]
[1] "abc"

[[3]][[2]]
[1] 4



List indexing
===
When a list is created with tags, tags can be used to refer to list elements, using the `listname$tag` syntax.

In [9]:
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
ftm$name

[1] "Fatma"

In [10]:
ftm[[1]]

[1] "Fatma"

In [11]:
ftm$fulltime

[1] TRUE

Alternatively, the tag can be used with the double bracket syntax.

In [12]:
ftm[["salary"]]

[1] 5624.25

Numeric indices are always available.

In [13]:
ftm[[2]]

[1] 5624.25

Selecting a range of indices
----
A range of indices can be selected using the familiar vector syntax _with a single bracket_.

In [14]:
ftm[1:2]

$name
[1] "Fatma"

$salary
[1] 5624.25


In [15]:
ftm[c(1,3)]

$name
[1] "Fatma"

$fulltime
[1] TRUE


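However, double brackets select exactly one element. A vector of indices inside `[[]]` is interpreted as recursive indexing, so `ftm[[1:2]]` means `ftm[[1]][[2]]`, which fails because `ftm[[1]]` has only one element.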

In [16]:
ftm[[1:2]]

ERROR: Error in ftm[[1:2]]: subscript out of bounds
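The error arises because a vector inside double brackets is treated as a path into nested elements rather than as a set of positions. A minimal sketch of this behavior, using a hypothetical list called `nested`:

```r
# A vector index inside [[ ]] walks into nested elements step by step.
nested <- list(1, c(2, 3), list("abc", 4))

first_of_third  <- nested[[c(3, 1)]]   # same as nested[[3]][[1]]
second_of_second <- nested[[c(2, 2)]]  # same as nested[[2]][[2]]

first_of_third
second_of_second
```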


Difference between indexing with single and double brackets
----
The availability of two types of brackets for list indexing can be confusing. They can be distinguished by their return types:

* `[]` always returns a list containing the selected components, even when only one is selected.
* `[[]]` returns the selected component itself.

In [17]:
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE


In [18]:
ftm[1]  # returns a list with a single component.

$name
[1] "Fatma"


In [19]:
ftm[[1]]  # returns a one-element vector

[1] "Fatma"

In [20]:
mode(ftm[1])
mode(ftm[[1]])

[1] "list"

[1] "character"
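The distinction matters as soon as we compute with the result. In the sketch below, arithmetic works on the component extracted with double brackets but fails on the one-element list returned by single brackets. The `tryCatch()` wrapper and the message string are illustrative choices that keep the example running past the error.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)

# Double brackets give the number itself, so arithmetic works.
doubled <- ftm[["salary"]] * 2

# Single brackets give a list; multiplying a list raises an error,
# which we catch and replace with a message.
outcome <- tryCatch(
    ftm["salary"] * 2,
    error = function(e) "error: cannot multiply a list"
)

doubled
outcome
```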

Adding and deleting list elements
====

Adding elements
---
You don't have to provide all the elements when creating the list. It is possible to add them later.

In [21]:
ftm <- list(name="Fatma", salary=5624.25)
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25


In [22]:
ftm$fulltime <- TRUE
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE


New list elements can also be added via vector indices.

In [23]:
ftm[[4]] <- 28
ftm[5:7] <- c(FALSE,TRUE,TRUE)
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE

[[4]]
[1] 28

[[5]]
[1] FALSE

[[6]]
[1] TRUE

[[7]]
[1] TRUE


This last example also shows that a list can have both tagged and untagged elements.

Deleting elements
----
You can delete an element by setting it to `NULL`.

In [24]:
ftm$fulltime <- NULL
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

[[3]]
[1] 28

[[4]]
[1] FALSE

[[5]]
[1] TRUE

[[6]]
[1] TRUE


Note that after deletion, all elements below the deleted one are moved up and indices are decreased by one.

In [25]:
ftm[[3]] <- NULL
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

[[3]]
[1] FALSE

[[4]]
[1] TRUE

[[5]]
[1] TRUE


In [26]:
ftm[3:5] <- NULL
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25
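Because assigning `NULL` removes an element, storing `NULL` as an actual value needs a different form. One way that works in base R is to assign `list(NULL)` with single brackets. This is a small sketch; the tag `hobby` is made up for the example.

```r
ftm <- list(name = "Fatma", salary = 5624.25)

ftm$salary <- NULL              # removes the element
n_after_delete <- length(ftm)

ftm["hobby"] <- list(NULL)      # keeps the tag, with NULL as its value
n_after_store <- length(ftm)

n_after_delete
n_after_store
names(ftm)
```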


Concatenating lists
----

In [27]:
c(list("abc", 32, TRUE), list(5.1))

[[1]]
[1] "abc"

[[2]]
[1] 32

[[3]]
[1] TRUE

[[4]]
[1] 5.1


In [28]:
c(list(name="Fatma", salary=5624.25, fulltime=TRUE), list(hobby="painting"))

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE

$hobby
[1] "painting"
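Note that `c()` joins the elements into one flat list, whereas wrapping the same two lists in `list()` nests them. The sketch below contrasts the two and also shows `append()`, which can insert at a chosen position. The variable names are illustrative.

```r
a <- list("abc", 32, TRUE)
b <- list(5.1)

flat   <- c(a, b)        # one list with four elements
nested <- list(a, b)     # a list holding two lists

# append() inserts the elements of b after the first element of a.
inserted <- append(a, b, after = 1)

length(flat)
length(nested)
inserted[[2]]
```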


Getting information on lists
===
To get the number of elements in a list, we can use the `length()` function.

In [29]:
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
length(ftm)

[1] 3

To get the tags in a list, we use the `names()` function.

In [30]:
names(ftm)

[1] "name"     "salary"   "fulltime"

To obtain the values as a vector, we can use the `unlist()` function.

In [31]:
unlist(ftm)

     name    salary  fulltime 
  "Fatma" "5624.25"    "TRUE" 

Note that this function returns a vector, and the numeric and logical values are converted to strings. The reason is that every element of a vector must be of the same type, and strings are the only common denominator here.

Applying functions to lists
===

The `lapply()` function applies a function to each element of a list, and returns the results as a list.

In [32]:
lapply(list(2,3.5,4), sqrt)

[[1]]
[1] 1.414214

[[2]]
[1] 1.870829

[[3]]
[1] 2


List elements can be tagged as well.

In [34]:
grades_1 <- c(10,12,11,14,8,12)
grades_2 <- c(13,11,10,11,9)
allgrades <- list(section1=grades_1, section2=grades_2)
allgrades

$section1
[1] 10 12 11 14  8 12

$section2
[1] 13 11 10 11  9


In [35]:
mean(grades_1)

[1] 11.16667

In [37]:
mean(allgrades$section1)

[1] 11.16667

In [40]:
mean(allgrades[1])  # single brackets return a list, which mean() cannot average

“argument is not numeric or logical: returning NA”

[1] NA

In [41]:
lapply(allgrades, mean)

$section1
[1] 11.16667

$section2
[1] 10.8


The `sapply()` function, a user-friendly wrapper around `lapply()`, applies the function in the same way but simplifies the result to a vector or a matrix whenever possible.

In [42]:
sapply(allgrades, mean)

section1 section2 
11.16667 10.80000 

In [45]:
mode(sapply(allgrades, mean))

[1] "numeric"
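Two related points are worth a quick sketch. When the applied function returns more than one value per element, `sapply()` builds a matrix. When the shape of the result should be guaranteed, base R's `vapply()` lets us declare the expected type and length. The grades are the ones used above; the other names are illustrative.

```r
allgrades <- list(section1 = c(10, 12, 11, 14, 8, 12),
                  section2 = c(13, 11, 10, 11, 9))

# range() returns two values per section, so sapply() builds a matrix
# with one column per section.
ranges <- sapply(allgrades, range)

# vapply() checks that every result is a single number.
means <- vapply(allgrades, mean, numeric(1))

ranges
means
```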

We can define our own functions to specify what to do with each element.

In [46]:
mult_by2 <- function(x) {2*x}

In [47]:
lapply( list(1,2,3,4,5), mult_by2)

[[1]]
[1] 2

[[2]]
[1] 4

[[3]]
[1] 6

[[4]]
[1] 8

[[5]]
[1] 10
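For a one-off operation, the function does not need a name. The sketch below passes an anonymous function directly and also shows how extra arguments are forwarded to the applied function. The argument name `k` is made up for the example.

```r
# Anonymous function defined in place.
doubled <- lapply(list(1, 2, 3), function(x) 2 * x)

# Extra arguments after the function are passed on to it.
scaled <- sapply(list(1, 2, 3), function(x, k) k * x, k = 10)

doubled
scaled
```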


Examples
===
Calculate weekly payrolls
---
Set up a list of staff members, where each element is itself a list holding a name, an hourly wage, and the number of hours worked.

In [48]:
staff <- list(
    id000=list(name="Fatma", wage=12.5, hours=20),
    id001=list(name="Ekrem", wage=11.7, hours=30),
    id002=list(name="Deniz", wage=13.3, hours=25)
)

Define a function that takes one person as defined above, and returns the weekly pay.

In [61]:
payroll <- function(person){person$wage * person$hours}

In [59]:
payroll(list(name="Deniz", wage=13.3, hours=25))

[1] 332.5


Now apply this function to every staff member on the list `staff`.

In [62]:
lapply(staff, payroll)

$id000
[1] 250

$id001
[1] 351

$id002
[1] 332.5


In [63]:
sapply(staff, payroll)

id000 id001 id002 
250.0 351.0 332.5 
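Because `sapply()` returns a plain numeric vector, the usual vector functions apply to the result. As a sketch, here is one way to get the total payroll and the name of the highest-paid person. The variables `pay`, `total`, `staff_names` and `top_earner` are introduced only for this example.

```r
staff <- list(
    id000 = list(name = "Fatma", wage = 12.5, hours = 20),
    id001 = list(name = "Ekrem", wage = 11.7, hours = 30),
    id002 = list(name = "Deniz", wage = 13.3, hours = 25)
)
payroll <- function(person) person$wage * person$hours

pay   <- sapply(staff, payroll)   # named numeric vector
total <- sum(pay)                 # total weekly payroll

# Extract the names the same way, then pick the one with the highest pay.
staff_names <- sapply(staff, function(person) person$name)
top_earner  <- staff_names[which.max(pay)]

total
top_earner
```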

Count the occurrences of numbers in a vector
-----
We have a vector of numbers in which some values are repeated.

In [7]:
data <- c(1,2,3,15,1,2,3,4,1,2)

We want to keep the count of each number in a list, such that `counts[[i]]` stores how many times the number `i` occurs in `data`.

Initialize the `counts` list with zeros.

In [8]:
counts <- as.list(rep(0, max(data)))  # one zero for each value from 1 to max(data)

Now loop over the data vector, and increase the count of the appropriate number.

In [9]:
for (x in data) {
    counts[[x]] <- counts[[x]] + 1
}
counts
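Base R can also do this counting without an explicit loop. The following sketch uses `tabulate()`, which counts how often each positive integer occurs, and then converts the result to a list. It is an alternative to the loop above, not a replacement for understanding it. The name `counts_vec` is illustrative.

```r
data <- c(1, 2, 3, 15, 1, 2, 3, 4, 1, 2)

# counts_vec[i] is the number of times i occurs in data.
counts_vec <- tabulate(data, nbins = max(data))

# Convert to a list so that counts[[i]] works as in the loop version.
counts <- as.list(counts_vec)

counts_vec
counts[[15]]
```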

Count the occurrences of words in a text
-----
Here is a simple application of textual analysis. Consider the following (short) text. We have preprocessed it to remove punctuation marks and uppercase letters.

In [67]:
sometext <- "my dear fellow said sherlock holmes as we sat on either side of the fire in his lodgings at baker street life is infinitely stranger than anything which the mind of man could invent we would not dare to conceive the things which are really mere commonplaces of existence if we could fly out of that window hand in hand hover over this great city gently remove the roofs and peep in at the queer things which are going on the strange coincidences the plannings the cross purposes the wonderful chains of events working through generations and leading to the most outré results it would make all fiction with its conventionalities and foreseen conclusions most stale and unprofitable"

In [68]:
print(sometext)

[1] "my dear fellow said sherlock holmes as we sat on either side of the fire in his lodgings at baker street life is infinitely stranger than anything which the mind of man could invent we would not dare to conceive the things which are really mere commonplaces of existence if we could fly out of that window hand in hand hover over this great city gently remove the roofs and peep in at the queer things which are going on the strange coincidences the plannings the cross purposes the wonderful chains of events working through generations and leading to the most outré results it would make all fiction with its conventionalities and foreseen conclusions most stale and unprofitable"


Our objective is to create a list `wordcounts` such that `wordcounts$word` gives the number of occurrences of `word` in the given text.

This problem is similar to the example above where we counted the occurrences of numbers. However, we don't know in advance which words, or how many distinct words, we are going to encounter. So we cannot initialize the counts to zero, and we start with an empty list instead.

In [69]:
wordcounts <- list()

We will approach the problem as follows:

    for every word in the word list
        if the word is already in the list, increase the count by 1.
        otherwise, add this word with a count of 1.

Looking up a tag that does not exist in a list, with `[[]]` or `$`, returns `NULL`. This can be used to check whether an element exists in a list.

In [70]:
wordcounts[["sherlock"]]

NULL

In [71]:
is.null(wordcounts[["sherlock"]])

[1] TRUE
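Another way to test for existence is to look the tag up in `names()`. The sketch below also shows one reason to prefer `[[]]` over `$` for such checks: `$` accepts an abbreviated tag, while `[[]]` requires an exact match by default. The one-element list is a made-up example.

```r
wordcounts <- list(my = 1)

has_my       <- "my" %in% names(wordcounts)
has_sherlock <- "sherlock" %in% names(wordcounts)

# $ matches tags partially; [[ ]] requires the exact tag.
partial <- wordcounts$m
exact   <- wordcounts[["m"]]

has_my
has_sherlock
partial
exact
```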

So, beginning with the first word, we add it to our list:

In [72]:
word <- "my"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}

In [73]:
wordcounts

$my
[1] 1


Similarly, the second word:

In [74]:
word <- "dear"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}

And now the list contains the elements we gave, and nothing more.

In [75]:
wordcounts

$my
[1] 1

$dear
[1] 1


We can't go word by word manually. The better solution is to loop over every word in the text. We need to find a way to convert the large string of text to a vector, so that we can take the words one by one.

The `strsplit()` function does that for us:

In [76]:
strsplit(sometext, split=" ")

[[1]]
  [1] "my"                "dear"              "fellow"           
  [4] "said"              "sherlock"          "holmes"           
  [7] "as"                "we"                "sat"              
 [10] "on"                "either"            "side"             
 [13] "of"                "the"               "fire"             
 [16] "in"                "his"               "lodgings"         
 [19] "at"                "baker"             "street"           
 [22] "life"              "is"                "infinitely"       
 [25] "stranger"          "than"              "anything"         
 [28] "which"             "the"               "mind"             
 [31] "of"                "man"               "could"            
 [34] "invent"            "we"                "would"            
 [37] "not"               "dare"              "to"               
 [40] "conceive"          "the"               "things"           
 [43] "which"             "are"               "really"           
 [46] "mere"              "commonplaces"      "of"               
 ...

Note that `strsplit()` returns a list. The first element of this list is the vector of strings we look for.
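The reason for the list is that `strsplit()` accepts several strings at once and returns one vector of pieces per input string. A short sketch with two made-up strings:

```r
lines <- c("my dear fellow", "said sherlock holmes as we sat")

pieces <- strsplit(lines, split = " ")

length(pieces)           # one list element per input string
pieces[[2]]              # the words of the second string
sapply(pieces, length)   # number of words in each string
```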

In [77]:
wordsintext <- strsplit(sometext, split=" ")[[1]]

In [78]:
wordsintext

  [1] "my"                "dear"              "fellow"           
  [4] "said"              "sherlock"          "holmes"           
  [7] "as"                "we"                "sat"              
 [10] "on"                "either"            "side"             
 [13] "of"                "the"               "fire"             
 [16] "in"                "his"               "lodgings"         
 [19] "at"                "baker"             "street"           
 [22] "life"              "is"                "infinitely"       
 [25] "stranger"          "than"              "anything"         
 [28] "which"             "the"               "mind"             
 [31] "of"                "man"               "could"            
 [34] "invent"            "we"                "would"            
 [37] "not"               "dare"              "to"               
 [40] "conceive"          "the"               "things"           
 [43] "which"             "are"               "really"           
 [46] "mere"              "commonplaces"      "of"               
 ...

In [79]:
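# Caution: wordcounts still holds "my" and "dear" from the manual steps above,
# so this loop counts them a second time (both show a count of 2 in the output).
# Run wordcounts <- list() first to obtain the true counts.
for (word in wordsintext){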
    if (is.null(wordcounts[[word]])){
        wordcounts[[word]] <- 1
    } else {
        wordcounts[[word]] <- wordcounts[[word]] + 1
    }
}
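For comparison, base R's `table()` produces the same kind of tally in one call. The sketch below uses a short made-up word vector rather than the full text, and the conversion to a list at the end is just one possible way to get back to the `wordcounts`-style structure.

```r
words <- c("the", "mind", "of", "man", "the", "things", "of", "the")

# table() counts each distinct value; names are sorted alphabetically.
wc <- table(words)

# Rebuild a tagged list with one count per word.
wc_list <- setNames(as.list(as.vector(wc)), names(wc))

wc[["the"]]
names(wc_list)
```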


In [80]:
wordcounts

$my
[1] 2

$dear
[1] 2

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$we
[1] 3

$sat
[1] 1

$on
[1] 2

$either
[1] 1

$side
[1] 1

$of
[1] 5

$the
[1] 10

$fire
[1] 1

$`in`
[1] 3

$his
[1] 1

$lodgings
[1] 1

$at
[1] 2

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$which
[1] 3

$mind
[1] 1

$man
[1] 1

$could
[1] 2

$invent
[1] 1

$would
[1] 2

$not
[1] 1

$dare
[1] 1

$to
[1] 2

$conceive
[1] 1

$things
[1] 2

$are
[1] 2

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hand
[1] 2

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$and
[1] 4

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1

$through
[1] 1

...

Get the words that occur more than twice
---

In [81]:
wordcounts[unlist(wordcounts) > 2]

$we
[1] 3

$of
[1] 5

$the
[1] 10

$`in`
[1] 3

$which
[1] 3

$and
[1] 4
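The same selection can be written with base R's `Filter()`, which keeps the elements for which a predicate function returns `TRUE`. A minimal sketch on a small made-up list:

```r
wordcounts <- list(we = 3, sat = 1, of = 5, the = 10, fire = 1)

# Keep only the words whose count exceeds 2.
frequent <- Filter(function(n) n > 2, wordcounts)

names(frequent)
```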


Sort the list, most frequent word first
-----

In [82]:
unlist(wordcounts)

               my              dear            fellow              said 
                2                 2                 1                 1 
         sherlock            holmes                as                we 
                1                 1                 1                 3 
              sat                on            either              side 
                1                 2                 1                 1 
               of               the              fire                in 
                5                10                 1                 3 
              his          lodgings                at             baker 
                1                 1                 2                 1 
           street              life                is        infinitely 
                1                 1                 1                 1 
         stranger              than          anything             which 
                1                 1                 1                 3 
...

In [83]:
order(unlist(wordcounts), decreasing = TRUE)

 [1] 14 13 58  8 16 28  1  2 10 19 31 33 36 38 39 49 74  3  4  5  6  7  9 11 12
[26] 15 17 18 20 21 22 23 24 25 26 27 29 30 32 34 35 37 40 41 42 43 44 45 46 47
[51] 48 50 51 52 53 54 55 56 57 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 75
[76] 76 77 78 79 80 81 82 83 84 85 86 87

In [84]:
wordcounts[order(unlist(wordcounts), decreasing = TRUE)]

$the
[1] 10

$of
[1] 5

$and
[1] 4

$we
[1] 3

$`in`
[1] 3

$which
[1] 3

$my
[1] 2

$dear
[1] 2

$on
[1] 2

$at
[1] 2

$could
[1] 2

$would
[1] 2

$to
[1] 2

$things
[1] 2

$are
[1] 2

$hand
[1] 2

$most
[1] 2

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$sat
[1] 1

$either
[1] 1

$side
[1] 1

$fire
[1] 1

$his
[1] 1

$lodgings
[1] 1

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$mind
[1] 1

$man
[1] 1

$invent
[1] 1

$not
[1] 1

$dare
[1] 1

$conceive
[1] 1

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1
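When only the most frequent words are of interest, it can be more convenient to sort the named vector directly and take the first few entries. This is a sketch with a small made-up list; `sorted` and `top3` are illustrative names.

```r
wordcounts <- list(my = 1, we = 3, of = 5, the = 10, and = 4)

# sort() keeps the names attached to the counts.
sorted <- sort(unlist(wordcounts), decreasing = TRUE)
top3   <- head(sorted, 3)

top3
```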



Location of words
-----

In [85]:
sometext

[1] "my dear fellow said sherlock holmes as we sat on either side of the fire in his lodgings at baker street life is infinitely stranger than anything which the mind of man could invent we would not dare to conceive the things which are really mere commonplaces of existence if we could fly out of that window hand in hand hover over this great city gently remove the roofs and peep in at the queer things which are going on the strange coincidences the plannings the cross purposes the wonderful chains of events working through generations and leading to the most outré results it would make all fiction with its conventionalities and foreseen conclusions most stale and unprofitable"

In [87]:
wordlocations <- list()

In [88]:
for (i in seq_along(wordsintext)){
    word <- wordsintext[i]
    wordlocations[[word]] <- c(wordlocations[[word]],i)
}
wordlocations

$my
[1] 1

$dear
[1] 2

$fellow
[1] 3

$said
[1] 4

$sherlock
[1] 5

$holmes
[1] 6

$as
[1] 7

$we
[1]  8 35 51

$sat
[1] 9

$on
[1] 10 80

$either
[1] 11

$side
[1] 12

$of
[1] 13 31 48 55 92

$the
 [1]  14  29  41  68  74  81  84  86  89 100

$fire
[1] 15

$`in`
[1] 16 59 72

$his
[1] 17

$lodgings
[1] 18

$at
[1] 19 73

$baker
[1] 20

$street
[1] 21

$life
[1] 22

$is
[1] 23

$infinitely
[1] 24

$stranger
[1] 25

$than
[1] 26

$anything
[1] 27

$which
[1] 28 43 77

$mind
[1] 30

$man
[1] 32

$could
[1] 33 52

$invent
[1] 34

$would
[1]  36 105

$not
[1] 37

$dare
[1] 38

$to
[1] 39 99

$conceive
[1] 40

$things
[1] 42 76

$are
[1] 44 78

$really
[1] 45

$mere
[1] 46

$commonplaces
[1] 47

$existence
[1] 49

$`if`
[1] 50

$fly
[1] 53

$out
[1] 54

$that
[1] 56

$window
[1] 57

$hand
[1] 58 60

$hover
[1] 61

$over
[1] 62

$this
[1] 63

$great
[1] 64

$city
[1] 65

$gently
[1] 66

$remove
[1] 67

$roofs
[1] 69

$and
[1]  70  97 112 117

$peep
[1] 71

$queer
[1] 75

$going
[1] 79

$strange
[1] 82

...
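The loop needs no `is.null()` check because `c(NULL, i)` is simply `i`: the first time a word is seen, its missing entry contributes nothing. As a sketch, the same structure can also be built in one step with base R's `split()`, which groups the positions by word. The short word vector is made up for the example.

```r
words <- c("the", "mind", "of", "man", "the", "of", "the")

# Appending to a missing (NULL) entry just yields the new value.
first_entry <- c(NULL, 5)

# Group the positions 1, 2, ... by the word found at each position.
locations <- split(seq_along(words), words)

first_entry
locations[["the"]]
locations[["of"]]
```

Unlike the loop, `split()` orders the resulting list alphabetically by word rather than by first appearance.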

Counts of words, revisited
----
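Using the `wordlocations` list, we can get the number of occurrences of each word without passing over the data again. We only need to apply the `length()` function to every element of the list. Since `wordlocations` was built from an empty list, `my` and `dear` get their true count of 1 here.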

In [89]:
lapply(wordlocations, length)

$my
[1] 1

$dear
[1] 1

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$we
[1] 3

$sat
[1] 1

$on
[1] 2

$either
[1] 1

$side
[1] 1

$of
[1] 5

$the
[1] 10

$fire
[1] 1

$`in`
[1] 3

$his
[1] 1

$lodgings
[1] 1

$at
[1] 2

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$which
[1] 3

$mind
[1] 1

$man
[1] 1

$could
[1] 2

$invent
[1] 1

$would
[1] 2

$not
[1] 1

$dare
[1] 1

$to
[1] 2

$conceive
[1] 1

$things
[1] 2

$are
[1] 2

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hand
[1] 2

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$and
[1] 4

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1

$through
[1] 1

...
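Base R also provides `lengths()`, which returns the length of every list element as a named integer vector and so avoids the long list printout. A minimal sketch on a three-word excerpt of the locations shown above:

```r
wordlocations <- list(my = 1, we = c(8, 35, 51), on = c(10, 80))

# One length per element, with the tags kept as names.
occurrences <- lengths(wordlocations)

occurrences
```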