In [1]:
options(jupyter.rich_display = FALSE)

# Lecture 8: Lists

# Where vectors are not adequate

* We've seen that every element of a vector in R must have the same mode (number, character, etc.).
* This has the advantage of efficiency, and provides fast computations.
* However, we cannot store more complicated data structures, such as

        Name: Fatma
        Salary: 5624.25
        Full time: yes
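
To see the limitation concretely, the following sketch tries to store the record above in an ordinary vector with `c()`. R coerces every value to a common type (here, character), so the salary is no longer a number. The variable name `rec` is only illustrative.

```r
# Attempt to store mixed-type data in a plain (atomic) vector
rec <- c("Fatma", 5624.25, TRUE)
rec          # every element has become a character string
mode(rec)    # "character"

# Arithmetic on the salary now requires converting it back first
as.numeric(rec[2]) * 2
```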

# Lists

* _Lists_ are more general than vectors.
* A list's elements can be of different types, allowing for more complicated data representations.
* They form the bridge between vectors and *data frames*, which we'll see later.

# Creating lists

In the simplest form, a list can be created with the `list()` function call.

In [2]:
ftm <- list("Fatma", 5624.25, TRUE)
ftm

[[1]]
[1] "Fatma"

[[2]]
[1] 5624.25

[[3]]
[1] TRUE


We can access list elements using the _double bracket_ `[[...]]` notation.

In [3]:
ftm[[1]]

[1] "Fatma"

In [4]:
ftm[[2]]

[1] 5624.25

# Tags of list elements

Instead of using integer indices, we can assign names (*tags*) to list components and refer to them using these tags.

In [5]:
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE


In [6]:
ftm$name

[1] "Fatma"

# Mixing different objects
A list can comprise any type of object, such as vectors, matrices, sublists, etc.

In [7]:
list(1, c(2,3), list("abc",4))

[[1]]
[1] 1

[[2]]
[1] 2 3

[[3]]
[[3]][[1]]
[1] "abc"

[[3]][[2]]
[1] 4



List indexing
===
List elements can be accessed in three ways:
1. using integer indices: `mylist[[1]]`
1. using the `mylist$tag` notation, if tags are given
1. using the `mylist[["tag"]]` notation, if tags are given

In [8]:
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
ftm$name  # ftm[[1]], ftm[["name"]]
ftm[[2]]  # ftm$salary, ftm[["salary"]]
ftm[["fulltime"]] # ftm$fulltime , ftm[[3]]

[1] "Fatma"

[1] 5624.25

[1] TRUE

If an element is a vector, the `[...]` operator can be applied afterwards to select elements of that vector.

In [9]:
ftm <- list(name="Fatma", grades=c(10,12,9))
ftm

$name
[1] "Fatma"

$grades
[1] 10 12  9


In [10]:
ftm$grades

[1] 10 12  9

In [11]:
ftm[[2]][3]
ftm$grades[3]

[1] 9

[1] 9

The syntax `listname[["tagname"]]` is useful when the tag name is stored in a variable.

In [12]:
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
x <- "salary"
ftm[[x]]

[1] 5624.25
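
As a small sketch of why this matters: the `$` notation takes the tag literally, while `[[...]]` evaluates the variable first. The loop at the end is an illustrative use (not part of the lecture's examples) that visits every component by its tag.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)
x <- "salary"

ftm[[x]]   # x is evaluated, so this looks up the "salary" component
ftm$x      # looks for a component literally named "x"; there is none, so NULL

# Illustrative use: visit every component by its tag
for (tag in names(ftm)) {
    cat(tag, ":", format(ftm[[tag]]), "\n")
}
```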

Selecting a range of indices
----
A range of indices can be selected using the familiar vector syntax _with a single bracket_.

This returns a _sublist_.

In [13]:
ftm[1:2]

$name
[1] "Fatma"

$salary
[1] 5624.25


In [14]:
ftm[c(1,3)]

$name
[1] "Fatma"

$fulltime
[1] TRUE


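However, this does not work with the double bracket notation. Given a vector of indices, `[[...]]` indexes recursively, so `ftm[[1:2]]` means `ftm[[1]][[2]]`, and `ftm[[1]]` has no second element.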

In [15]:
ftm[[1:2]]

ERROR: Error in ftm[[1:2]]: subscript out of bounds
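
The error above comes from recursive indexing. The sketch below reproduces it safely with `tryCatch()` and then shows the same notation working on a list that really contains a sublist. The name `nested` is an assumption made for illustration.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)

# ftm[[1:2]] is read as ftm[[1]][[2]]: take component 1 ("Fatma"),
# then its second element, which does not exist.
res <- tryCatch(ftm[[1:2]], error = function(e) e)
inherits(res, "error")     # TRUE: the call failed

# With a nested list, the same notation reaches into the sublist.
nested <- list(1, c(2, 3), list("abc", 4))
nested[[c(3, 1)]]          # same as nested[[3]][[1]]
nested[[c(3, 2)]]          # same as nested[[3]][[2]]
```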


Difference between indexing with single and double brackets
----
The availability of two types of brackets for list indexing can be confusing. They can be distinguished by their return types:

* `[i]` returns a list with a single component
* `[[i]]` returns a single component.

In [16]:
ftm[1]  # returns a list with a single component.

$name
[1] "Fatma"


In [17]:
ftm[[1]]  # returns a one-element vector

[1] "Fatma"

In [18]:
mode(ftm[1])
mode(ftm[[1]])

[1] "list"

[1] "character"
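
The distinction matters as soon as the extracted value is used in a computation. In this sketch (the 10% raise is an invented example), arithmetic works on the value returned by `[[...]]` but fails on the one-component list returned by `[...]`.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)

raise <- ftm[["salary"]] * 1.1     # works: [[ ]] extracts the number itself
raise

# ftm["salary"] is still a list, and arithmetic on a list is an error
attempt <- tryCatch(ftm["salary"] * 1.1, error = function(e) e)
inherits(attempt, "error")         # TRUE
```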

# Adding new elements to a list

You can start with an incomplete list and add new elements as you go along.

In [19]:
ftm <- list(name="Fatma", salary=5624.25)
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25


In [20]:
ftm$fulltime <- TRUE
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE


New list elements can also be added via vector indices.

In [21]:
ftm[[4]] <- 28
ftm[5:7] <- c(a=FALSE,b=TRUE,c=TRUE)
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE

[[4]]
[1] 28

[[5]]
[1] FALSE

[[6]]
[1] TRUE

[[7]]
[1] TRUE


This last example also shows that a list can have both tagged and untagged elements. Note that the names `a`, `b`, `c` of the assigned vector were not carried over as tags.
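
New elements do not have to go at the end. As a sketch, the base function `append()` inserts components at a chosen position, and assigning through `[["tag"]]` adds a tagged component. The `age` component is an invented example.

```r
ftm <- list(name = "Fatma", salary = 5624.25)

# Insert a tagged component after position 1 instead of at the end
ftm <- append(ftm, list(age = 28), after = 1)
names(ftm)     # "name" "age" "salary"

# Assigning with a tag in double brackets adds a tagged component at the end
ftm[["fulltime"]] <- TRUE
names(ftm)     # "name" "age" "salary" "fulltime"
```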

# Delete elements from a list

You can delete an element by setting it to `NULL`.

In [22]:
ftm$fulltime <- NULL
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

[[3]]
[1] 28

[[4]]
[1] FALSE

[[5]]
[1] TRUE

[[6]]
[1] TRUE


Note that after deletion, all elements after the deleted one move up, and their indices decrease by one.

In [23]:
ftm[[3]] <- NULL
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25

[[3]]
[1] FALSE

[[4]]
[1] TRUE

[[5]]
[1] TRUE


In [24]:
ftm[3:5] <- NULL
ftm

$name
[1] "Fatma"

$salary
[1] 5624.25
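
Because assigning `NULL` deletes a component, storing `NULL` as a value needs a different form. A minimal sketch: assigning `list(NULL)` through single brackets keeps the component in place with the value `NULL`. The helper variable `len_with_null` is only there to record the intermediate length.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)

# Assigning list(NULL) with single brackets keeps the component, with value NULL
ftm["salary"] <- list(NULL)
len_with_null <- length(ftm)   # still 3
names(ftm)                     # "name" "salary" "fulltime"

# Assigning NULL removes the component
ftm[["salary"]] <- NULL
length(ftm)                    # now 2
```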


# Concatenating lists

The familiar `c()` function can be used on lists, too.

In [27]:
c(list("abc", 32, TRUE), list(5.1))

[[1]]
[1] "abc"

[[2]]
[1] 32

[[3]]
[1] TRUE

[[4]]
[1] 5.1


In [28]:
c(list(name="Fatma", salary=5624.25, fulltime=TRUE), list(hobby="painting"))

$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE

$hobby
[1] "painting"
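
One detail worth checking when concatenating: if an argument of `c()` is a plain vector rather than a list, each of its elements becomes a separate list component. The sketch below contrasts the two cases; the names `a` and `b` are illustrative.

```r
a <- c(list("abc", 32), 1:3)         # the vector 1:3 contributes three components
length(a)                            # 5

b <- c(list("abc", 32), list(1:3))   # wrapped in list(): one component holding 1:3
length(b)                            # 3
b[[3]]
```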


# Getting information on lists
To get the number of elements in a list, we can use the `length()` function.

In [29]:
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
length(ftm)

[1] 3

To get the tags in a list, we use the `names()` function.

In [30]:
names(ftm)

[1] "name"     "salary"   "fulltime"

To obtain the values as a vector, we can use the `unlist()` function.

In [31]:
unlist(ftm)

     name    salary  fulltime 
  "Fatma" "5624.25"    "TRUE" 

In [32]:
unname(unlist(ftm))

[1] "Fatma"   "5624.25" "TRUE"   

Note that `unlist()` returns a vector, and the numeric and logical values are converted to strings. The reason is that every element of a vector must be of the same type, and character is the only type that can represent all three values here.
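
When all components already share a type, no conversion to strings takes place. In this sketch (the `grades` list is an invented example), `unlist()` returns a numeric vector, and a component holding several values gets numbered tags.

```r
grades <- list(midterm = 10, quizzes = c(12, 9))
flat <- unlist(grades)
flat           # numeric vector with tags midterm, quizzes1, quizzes2
mode(flat)     # "numeric"
```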

# Applying functions to lists

The `lapply()` function applies a function to each element of a list, and returns the results as a list.

In [33]:
lapply(list(2,3.5,4), sqrt)

[[1]]
[1] 1.414214

[[2]]
[1] 1.870829

[[3]]
[1] 2


Working with tagged elements:

In [34]:
grades_1 <- c(10,12,11,14,8,12)
grades_2 <- c(13,11,10,11,9)
allgrades <- list(section1=grades_1, section2=grades_2)
allgrades

$section1
[1] 10 12 11 14  8 12

$section2
[1] 13 11 10 11  9


In [36]:
mean(allgrades$section1)

[1] 11.16667

In [37]:
lapply(allgrades, mean)

$section1
[1] 11.16667

$section2
[1] 10.8


The `sapply()` function (a simplifying version of `lapply()`) applies the function in the same way, but returns the results as a vector or a matrix whenever they can be simplified to one.

In [38]:
sapply(allgrades, mean)

section1 section2 
11.16667 10.80000 

In [39]:
mode(sapply(allgrades, mean))

[1] "numeric"
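
The matrix case arises when the applied function returns more than one value per element. The sketch below uses `range()` for that, and also shows `vapply()`, a base R relative of `sapply()` in which the expected type and length of each result are stated explicitly.

```r
allgrades <- list(section1 = c(10, 12, 11, 14, 8, 12),
                  section2 = c(13, 11, 10, 11, 9))

# range() returns two numbers per section, so sapply() builds a 2-row matrix
m <- sapply(allgrades, range)
m

# vapply() works like sapply(), but each result must be a single number here
avg <- vapply(allgrades, mean, numeric(1))
avg
```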

We can define our own functions to specify what to do with each element.

In [40]:
mult_by2 <- function(x) {2*x}
mult_by2(c(1,2,3,4))

[1] 2 4 6 8

In [41]:
lapply( list(1, 2, 3:7), mult_by2)

[[1]]
[1] 2

[[2]]
[1] 4

[[3]]
[1]  6  8 10 12 14
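
For short operations, the function does not need a name. This sketch passes an anonymous function directly, and also shows that extra arguments given after the function are forwarded to it. The `scores` list is an invented example.

```r
# An anonymous function can be passed directly instead of a named one
doubled <- lapply(list(1, 2, 3:7), function(x) 2 * x)
doubled[[3]]

# Extra arguments after the function are passed on to it at every call
scores <- list(a = c(10, NA, 12), b = c(9, 11))
avg <- sapply(scores, mean, na.rm = TRUE)
avg
```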


# Examples


# Example: Calculate weekly payrolls

Set up a list of staff members, where each element is itself a list holding a name, an hourly wage, and the number of hours worked.

In [42]:
staff <- list(
    id000=list(name="Fatma", wage=12.5, hours=20),
    id001=list(name="Ekrem", wage=11.7, hours=30),
    id002=list(name="Deniz", wage=13.3, hours=25)
)
staff

$id000
$id000$name
[1] "Fatma"

$id000$wage
[1] 12.5

$id000$hours
[1] 20


$id001
$id001$name
[1] "Ekrem"

$id001$wage
[1] 11.7

$id001$hours
[1] 30


$id002
$id002$name
[1] "Deniz"

$id002$wage
[1] 13.3

$id002$hours
[1] 25



Define a function that takes one person as defined above, and returns the weekly pay.

In [43]:
payroll <- function(person){person$wage * person$hours}

In [44]:
payroll(list(name="Deniz", wage=13.3, hours=25))

[1] 332.5

Now apply this function to every staff member on the list `staff`.

In [46]:
lapply(staff, payroll)

$id000
[1] 250

$id001
[1] 351

$id002
[1] 332.5


In [47]:
sapply(staff, payroll)

id000 id001 id002 
250.0 351.0 332.5 
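
Since `sapply()` returns a named numeric vector, the usual vector functions apply to the result. As a sketch, the following computes the total payroll and finds the staff member with the highest pay, using the same `staff` list and `payroll()` function as above.

```r
staff <- list(
    id000 = list(name = "Fatma", wage = 12.5, hours = 20),
    id001 = list(name = "Ekrem", wage = 11.7, hours = 30),
    id002 = list(name = "Deniz", wage = 13.3, hours = 25)
)
payroll <- function(person) person$wage * person$hours

pay <- sapply(staff, payroll)
total <- sum(pay)                      # total weekly payroll
top_id <- names(pay)[which.max(pay)]   # id of the highest-paid staff member
staff[[top_id]]$name

# Names of all staff members as a character vector
sapply(staff, function(p) p$name)
```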

Example: Count the occurrences of numbers in a vector
-----
We have a vector of numbers where numbers are repeated.

In [49]:
mydata <- c(1,2,3,15,1,2,3,4,1)

We want to keep the count of each number in a list, such that `counts[[i]]` stores how many times the number `i` occurs in `mydata`.

Initialize the `counts` list with zeros.

In [50]:
counts = list(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

Now loop over the data vector, and increase the count of the appropriate number.

In [51]:
mydata

[1]  1  2  3 15  1  2  3  4  1

In [52]:
for (x in mydata) {
    counts[[x]] <- counts[[x]] + 1
}
counts

[[1]]
[1] 3

[[2]]
[1] 2

[[3]]
[1] 2

[[4]]
[1] 1

[[5]]
[1] 0

[[6]]
[1] 0

[[7]]
[1] 0

[[8]]
[1] 0

[[9]]
[1] 0

[[10]]
[1] 0

[[11]]
[1] 0

[[12]]
[1] 0

[[13]]
[1] 0

[[14]]
[1] 0

[[15]]
[1] 1

[[16]]
[1] 0

[[17]]
[1] 0
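
Two shortcuts are worth knowing here. The sketch below builds the 17 zeros with `rep()` instead of typing them, and then obtains the same counts as a plain integer vector with the base function `tabulate()`. The name `tab` is illustrative.

```r
mydata <- c(1, 2, 3, 15, 1, 2, 3, 4, 1)

# 17 zeros without typing them out
counts <- rep(list(0), 17)
for (x in mydata) {
    counts[[x]] <- counts[[x]] + 1
}

# The same counts as an integer vector, computed by tabulate()
tab <- tabulate(mydata, nbins = 17)
tab
```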


Example: Count the occurrences of words in a text
-----
Here is a simple application of textual analysis. Consider the following (short) text. We have preprocessed it to remove punctuation marks and uppercase letters.

In [53]:
sometext <- "my dear fellow said sherlock holmes as we sat on either side of the fire in his lodgings at baker street life is infinitely stranger than anything which the mind of man could invent we would not dare to conceive the things which are really mere commonplaces of existence if we could fly out of that window hand in hand hover over this great city gently remove the roofs and peep in at the queer things which are going on the strange coincidences the plannings the cross purposes the wonderful chains of events working through generations and leading to the most outré results it would make all fiction with its conventionalities and foreseen conclusions most stale and unprofitable"

In [54]:
print(sometext)

[1] "my dear fellow said sherlock holmes as we sat on either side of the fire in his lodgings at baker street life is infinitely stranger than anything which the mind of man could invent we would not dare to conceive the things which are really mere commonplaces of existence if we could fly out of that window hand in hand hover over this great city gently remove the roofs and peep in at the queer things which are going on the strange coincidences the plannings the cross purposes the wonderful chains of events working through generations and leading to the most outré results it would make all fiction with its conventionalities and foreseen conclusions most stale and unprofitable"


* Objective: Create a list `wordcounts` such that `wordcounts$word` gives the number of occurrences of `word` in the given text.
* This problem is similar to the example above where we counted the occurrences of numbers. 
* However, we don't know in advance what words and how many words we are going to encounter. So we cannot initialize the counts to zero.

We will approach the problem as follows:

    for every word in the word list
        if the word is already in the list, increase the count.
        otherwise, add this word with a count of 1.

Indexing a list with a tag it does not contain, as in `mylist[["tag"]]`, returns `NULL`. This can be used to check whether an element exists in a list.

In [55]:
wordcounts <- list()
wordcounts

list()

In [56]:
wordcounts[["sherlock"]]

NULL

In [57]:
is.null(wordcounts[["sherlock"]])

[1] TRUE
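
An alternative existence check, shown here as a sketch, is to look for the tag among `names()` with the `%in%` operator. Both checks agree as long as no component is deliberately set to `NULL`.

```r
wordcounts <- list(my = 1)

has_my <- "my" %in% names(wordcounts)              # TRUE
has_sherlock <- "sherlock" %in% names(wordcounts)  # FALSE
is.null(wordcounts[["sherlock"]])                  # TRUE, the check used in this lecture

# The check also behaves sensibly on an empty list
"my" %in% names(list())                            # FALSE
```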

So, beginning with the first word, we add it to our list:

In [60]:
word <- "my"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}

In [61]:
wordcounts

$my
[1] 1


Similarly, the second word:

In [62]:
word <- "dear"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}
wordcounts

$my
[1] 1

$dear
[1] 1


And now the list contains the elements we gave, and nothing more.

In [63]:
wordcounts

$my
[1] 1

$dear
[1] 1


We can't go word by word manually. The better solution is to loop over every word in the text. We need to find a way to convert the large string of text to a vector, so that we can take the words one by one.

The `strsplit()` function does that for us:

In [64]:
strsplit(sometext, split=" ")

[[1]]
  [1] "my"                "dear"              "fellow"           
  [4] "said"              "sherlock"          "holmes"           
  [7] "as"                "we"                "sat"              
 [10] "on"                "either"            "side"             
 [13] "of"                "the"               "fire"             
 [16] "in"                "his"               "lodgings"         
 [19] "at"                "baker"             "street"           
 [22] "life"              "is"                "infinitely"       
 [25] "stranger"          "than"              "anything"         
 [28] "which"             "the"               "mind"             
 [31] "of"                "man"               "could"            
 [34] "invent"            "we"                "would"            
 [37] "not"               "dare"              "to"               
 [40] "conceive"          "the"               "things"           
 [43] "which"             "are"               "really"           
 [46

Note that `strsplit()` returns a list. The first element of this list is the vector of strings we are looking for.

In [65]:
wordsintext <- strsplit(sometext, split=" ")[[1]]
wordsintext

  [1] "my"                "dear"              "fellow"           
  [4] "said"              "sherlock"          "holmes"           
  [7] "as"                "we"                "sat"              
 [10] "on"                "either"            "side"             
 [13] "of"                "the"               "fire"             
 [16] "in"                "his"               "lodgings"         
 [19] "at"                "baker"             "street"           
 [22] "life"              "is"                "infinitely"       
 [25] "stranger"          "than"              "anything"         
 [28] "which"             "the"               "mind"             
 [31] "of"                "man"               "could"            
 [34] "invent"            "we"                "would"            
 [37] "not"               "dare"              "to"               
 [40] "conceive"          "the"               "things"           
 [43] "which"             "are"               "really"           
 [46] "mer
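
The reason `strsplit()` returns a list is that it accepts a whole vector of strings and produces one component per input string. The sketch below illustrates this with two invented strings, and also shows a regular-expression separator that tolerates repeated spaces.

```r
texts <- c("my dear fellow", "said  sherlock holmes")   # note the double space

parts <- strsplit(texts, split = " +")   # " +" matches one or more spaces
length(parts)      # 2: one list component per input string
parts[[2]]

# Splitting on a single space leaves an empty string at the double space
single <- strsplit(texts[2], split = " ")[[1]]
single
```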

For each word in the text, increase the count if the word exists in the counter list, otherwise set it to one.

In [66]:
wordcounts <- list()
for (word in wordsintext){
    if (is.null(wordcounts[[word]])){
        wordcounts[[word]] <- 1
    } else {
        wordcounts[[word]] <- wordcounts[[word]] + 1
    }
}

In [67]:
wordcounts

$my
[1] 1

$dear
[1] 1

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$we
[1] 3

$sat
[1] 1

$on
[1] 2

$either
[1] 1

$side
[1] 1

$of
[1] 5

$the
[1] 10

$fire
[1] 1

$`in`
[1] 3

$his
[1] 1

$lodgings
[1] 1

$at
[1] 2

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$which
[1] 3

$mind
[1] 1

$man
[1] 1

$could
[1] 2

$invent
[1] 1

$would
[1] 2

$not
[1] 1

$dare
[1] 1

$to
[1] 2

$conceive
[1] 1

$things
[1] 2

$are
[1] 2

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hand
[1] 2

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$and
[1] 4

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1

$through
[1] 

Which words occur more than twice?

In [68]:
wordcounts[wordcounts > 2]

$we
[1] 3

$of
[1] 5

$the
[1] 10

$`in`
[1] 3

$which
[1] 3

$and
[1] 4
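
If only the words themselves are needed, the counts can first be flattened to a named vector. The sketch below uses a small stand-in for the full `wordcounts` list, and also shows the base function `Filter()` as another way to keep the result as a list.

```r
wordcounts <- list(we = 3, sat = 1, of = 5, on = 2)   # small stand-in for the full list

counts_vec <- unlist(wordcounts)                      # named numeric vector
frequent <- names(counts_vec)[counts_vec > 2]         # just the words
frequent

kept <- Filter(function(n) n > 2, wordcounts)         # same selection, kept as a list
kept
```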


Sort the list, most frequent word first.

In [69]:
unlist(wordcounts)

               my              dear            fellow              said 
                1                 1                 1                 1 
         sherlock            holmes                as                we 
                1                 1                 1                 3 
              sat                on            either              side 
                1                 2                 1                 1 
               of               the              fire                in 
                5                10                 1                 3 
              his          lodgings                at             baker 
                1                 1                 2                 1 
           street              life                is        infinitely 
                1                 1                 1                 1 
         stranger              than          anything             which 
                1                 1                

In [70]:
order(unlist(wordcounts), decreasing = TRUE)

 [1] 14 13 58  8 16 28 10 19 31 33 36 38 39 49 74  1  2  3  4  5  6  7  9 11 12
[26] 15 17 18 20 21 22 23 24 25 26 27 29 30 32 34 35 37 40 41 42 43 44 45 46 47
[51] 48 50 51 52 53 54 55 56 57 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 75
[76] 76 77 78 79 80 81 82 83 84 85 86 87

In [74]:
sort(unlist(wordcounts), decreasing = TRUE)

              the                of               and                we 
               10                 5                 4                 3 
               in             which                on                at 
                3                 3                 2                 2 
            could             would                to            things 
                2                 2                 2                 2 
              are              hand              most                my 
                2                 2                 2                 1 
             dear            fellow              said          sherlock 
                1                 1                 1                 1 
           holmes                as               sat            either 
                1                 1                 1                 1 
             side              fire               his          lodgings 
                1                 1                

In [75]:
wordcounts[order(unlist(wordcounts), decreasing = TRUE)]

$the
[1] 10

$of
[1] 5

$and
[1] 4

$we
[1] 3

$`in`
[1] 3

$which
[1] 3

$on
[1] 2

$at
[1] 2

$could
[1] 2

$would
[1] 2

$to
[1] 2

$things
[1] 2

$are
[1] 2

$hand
[1] 2

$most
[1] 2

$my
[1] 1

$dear
[1] 1

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$sat
[1] 1

$either
[1] 1

$side
[1] 1

$fire
[1] 1

$his
[1] 1

$lodgings
[1] 1

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$mind
[1] 1

$man
[1] 1

$invent
[1] 1

$not
[1] 1

$dare
[1] 1

$conceive
[1] 1

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1
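
For comparison, base R can do the counting and sorting in one step with `table()`. This is a sketch of an alternative, not the list-based method developed above, and it uses a short invented sentence in place of the full text.

```r
words <- strsplit("the cat saw the dog and the cat ran", split = " ")[[1]]

# table() counts each distinct word; sort() puts the most frequent first
freq <- sort(table(words), decreasing = TRUE)
freq

names(freq)[1]            # the most frequent word
as.integer(freq["the"])   # its count
```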



Where in the text does a word occur? Generate a list such that words are tags and the corresponding value is a vector of positions.

In [77]:
wordsintext

  [1] "my"                "dear"              "fellow"           
  [4] "said"              "sherlock"          "holmes"           
  [7] "as"                "we"                "sat"              
 [10] "on"                "either"            "side"             
 [13] "of"                "the"               "fire"             
 [16] "in"                "his"               "lodgings"         
 [19] "at"                "baker"             "street"           
 [22] "life"              "is"                "infinitely"       
 [25] "stranger"          "than"              "anything"         
 [28] "which"             "the"               "mind"             
 [31] "of"                "man"               "could"            
 [34] "invent"            "we"                "would"            
 [37] "not"               "dare"              "to"               
 [40] "conceive"          "the"               "things"           
 [43] "which"             "are"               "really"           
 [46] "mer

In [78]:
wordlocations <- list()

In [84]:
for (i in seq_along(wordsintext)){
    word <- wordsintext[i]
    wordlocations[[word]] <- c(wordlocations[[word]],i)
}
wordlocations

$my
[1] 1

$dear
[1] 2

$fellow
[1] 3

$said
[1] 4

$sherlock
[1] 5

$holmes
[1] 6

$as
[1] 7

$we
[1]  8 35 51

$sat
[1] 9

$on
[1] 10 80

$either
[1] 11

$side
[1] 12

$of
[1] 13 31 48 55 92

$the
 [1]  14  29  41  68  74  81  84  86  89 100

$fire
[1] 15

$`in`
[1] 16 59 72

$his
[1] 17

$lodgings
[1] 18

$at
[1] 19 73

$baker
[1] 20

$street
[1] 21

$life
[1] 22

$is
[1] 23

$infinitely
[1] 24

$stranger
[1] 25

$than
[1] 26

$anything
[1] 27

$which
[1] 28 43 77

$mind
[1] 30

$man
[1] 32

$could
[1] 33 52

$invent
[1] 34

$would
[1]  36 105

$not
[1] 37

$dare
[1] 38

$to
[1] 39 99

$conceive
[1] 40

$thing

Once we have the `wordlocations` list, we can get the number of occurrences of words without passing over the data again. We only need to apply the `length()` function to the list.

In [82]:
lapply(wordlocations, length)

$my
[1] 1

$dear
[1] 1

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$we
[1] 3

$sat
[1] 1

$on
[1] 2

$either
[1] 1

$side
[1] 1

$of
[1] 5

$the
[1] 10

$fire
[1] 1

$`in`
[1] 3

$his
[1] 1

$lodgings
[1] 1

$at
[1] 2

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$which
[1] 3

$mind
[1] 1

$man
[1] 1

$could
[1] 2

$invent
[1] 1

$would
[1] 2

$not
[1] 1

$dare
[1] 1

$to
[1] 2

$conceive
[1] 1

$things
[1] 2

$are
[1] 2

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hand
[1] 2

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$and
[1] 4

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1

$through
[1] 
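
Both steps above also have compact base R equivalents, sketched here on a short invented vector standing in for `wordsintext`. `split()` groups the positions by word without an explicit loop (its components come out in alphabetical order of the words), and `lengths()` returns the length of every component as a named vector.

```r
words <- c("we", "sat", "on", "we", "on", "we")   # stand-in for wordsintext

# split() groups the positions 1, 2, ... by the word found at each position
locs <- split(seq_along(words), words)
locs$we

# lengths() gives the number of occurrences of each word
n <- lengths(locs)
n
```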