# Strings, lists and tuples

## $ \S 1 $ Strings<a name="strings"></a>

### $ 1.1 $ Strings as sequences of characters
A __string__ is a sequence of characters enclosed in either single `'` or double `"` quotes. The type corresponding to strings is called `str`.

__Example:__

In [16]:
g = "Gandalf"
s = 'Sauron'
print(g, type(g))   # Printing the string g and its type.
print(s, type(s))   # Doing the same for the string s.

Gandalf <class 'str'>
Sauron <class 'str'>


📝 Unlike several other languages, Python does not have a special type for
characters: a character is represented simply as a string of length 1.

In [15]:
letter = 'a'
print(letter, type(letter))    # Note that letter is of type str.

a <class 'str'>


To get the $ i $-th character of a string called, say, $ s $, use `s[i]`; the output is also a string (albeit one having only $ 1 $ character).

<div class="alert alert-warning">In Python, indices are <i>always</i> counted starting from <b> $ 0 $  (zero)</b>, not $ 1 $. To avoid confusion, we adapt our terminology accordingly to speak of, e.g., 'm' as the <i>0-th</i> character of the string 'magic', 'a' as its first character, and so on... In particular, if a string has length $ n $ (i.e., if it consists of $ n $ characters), then its last index is $ n - 1 $, not $ n $. </div>

__Example:__

In [17]:
g = "Gandalf"
s = "Sauron"

print(g[0], type(g[0]))
print(g[3], type(g[3]))

# Since s contains 6 letters, the last one is indexed by 5:
print(s[5])

G <class 'str'>
d <class 'str'>
n


By prefixing an index with a minus sign $ - $, we count from the _end_ of the
string instead of from its beginning. For example, `s[-1]` is the _last_
character of $ s $, `s[-2]` its _next-to-last_ character, and so on.

📝 If we want to create a string that has a single quote as one of its
characters, we should enclose it in double quotes, and vice-versa.

In [1]:
explosion = "'BOOM!'"
last = explosion[-1]
print(last)

next_to_last = explosion[-2]
print(next_to_last)

'
!
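
The relation between negative and non-negative indices can be sketched as follows: for a string with $ n $ characters, `s[-k]` refers to the same character as `s[n - k]`. The variable name `n` is chosen here only for illustration.

```python
s = "Sauron"
n = 6                      # s consists of 6 characters

print(s[-1], s[n - 1])     # n n
print(s[-2], s[n - 2])     # o o
print(s[-n], s[0])         # S S
```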


### $ 1.2 $ Operations on strings

Strings can be __concatenated__ using the binary operator __+__.

__Example:__

In [19]:
string_1 = "ancient"
string_2 = "magic"
string_3 = "spells"

print(string_1 + string_2)
print(string_1 + " " + string_2 + " " + string_3)

ancientmagic
ancient magic spells


__Exercise:__ Suppose that the statements `a = "hello"` and `b = 'world'` have
just been run through the interpreter. Determine the output of each of the
following statements:

(a) `a + a`

(b) `b + " " + a`

(c) `a * 3`

(d) `2 * b`

(e) `(-1) * a`

(f) `1 + "1"`

(g) `a * b`

(h) `a - a`

(i) `0 == '0'`

(j) `True == "True"`

📝 In analogy with the interpretation of `+` as concatenation of strings, if one
uses `*` to "multiply" a string by a positive integer $ n $, then the result is
a new string which consists of $ n $ copies of the original string, concatenated
one after another. The operators `-`, `/` and `//` cannot be applied to strings,
and `%` does not act as the remainder operator on them (for strings it is
reserved for an older style of string formatting).
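
As a small illustration of the note above, the sketch below repeats a string with `*` and then attempts a subtraction, catching the resulting error so that the cell still runs to completion. The string `chant` is a made-up example, and the `try`/`except` construct is used here only as a convenience.

```python
chant = "abra"                   # hypothetical example string

tripled = chant * 3              # three copies, concatenated
print(tripled)                   # abraabraabra
print(3 * chant == chant * 3)    # True: the order of the operands does not matter

# Subtraction is not defined for strings, so Python raises a TypeError:
try:
    chant - "a"
except TypeError as error:
    print("TypeError:", error)
```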

The function `len` applied to a string returns its **length**, i.e., the number of characters it contains, which is always a non-negative integer.

__Example:__

In [34]:
f = "four"
p = "Polysyllabic"
print(len(f), type(len(f)))
print(len(p), type(len(p)))

print(2 * f, len(2 * f))
print(f + p, len(f + p))

4 <class 'int'>
12 <class 'int'>
fourfour 8
fourPolysyllabic 16


In [6]:
# The empty string is the only string having length 0:
len("")

# Note that whitespace characters also count as valid characters:
len("   ")    # <--- Contains three spaces.


3

🚫 A string is an __immutable__ object, meaning that its individual characters _cannot_ be modified once the string has been created. Trying to do so will make the interpreter raise a `TypeError`.

In [35]:
g = "Gandalf"
# Let's try to modify the 0-th character of g to see what happens:
g[0] = 'R'

TypeError: 'str' object does not support item assignment

The __colon operator `:`__, as in `[i:j]`, is used to __slice__ a string from
its $ i $-th character (inclusive) to its $ j $-th character (exclusive).
This operation does not modify the string (after all, strings are immutable);
rather, it creates a new string which consists of the characters having index
ranging from $ i $ up to and including $ j - 1 $.

__Example:__

In [2]:
string = 'magic'

# Slice from the 0th character to the 2nd (not including the 2nd):
init = string[0:2]
print(init, type(init))
# Slice from the 2nd character to the 5th (not including the 5th):
final = string[2:5]   
print(final, type(final))

print(init + final)

ma <class 'str'>
gic <class 'str'>
magic


Omitting the first index in a slice has the same effect as slicing from the
beginning. Similarly, if we omit the second index, then the string will be
sliced until the end.

In [37]:
word = 'automaton'
print(word[:4])
print(word[4:])

auto
maton
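
Since strings are immutable, a "modified" string has to be assembled as a new string. One possible way to do this, sketched below, combines slicing with concatenation; the name `r` is only illustrative.

```python
g = "Gandalf"

# g[0] = 'R' would raise a TypeError, so we build a new string instead:
r = "R" + g[1:]

print(r)    # Randalf
print(g)    # Gandalf (the original string is unchanged)
```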


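One sometimes needs to make an independent copy of a sequence. To achieve
this, a _full slice_ `[:]` can be used. For strings, which are immutable, a
plain assignment would serve equally well; the idiom becomes essential for
mutable objects such as lists, which are discussed in $ \S 2 $.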

__Example:__

In [4]:
string_1 = 'potion'

# Omit both indices in the slice to create a copy of the original string:
string_2 = string_1[:]    

string_1 = 'magic'
print(string_1, string_2)

magic potion


__Exercise:__ Let $ s $ be a variable whose value is a string. True or False? Explain.

(a) `s[:] == s[0:]`

(b) `s[:] == s[0:len(s)]`

(c) `s[:] == s[0:-1]`


The slice operation also admits a third argument, which specifies the
__step size__ in the slicing operation. The syntax of a slice which makes use of
all three arguments is thus:
`[<start index (inclusive)>:<stop index (exclusive)>:<step size>]`. If omitted,
the step size is set to $ 1 $ by default. Step sizes can also be negative, which
causes the string to be sliced in the right-to-left direction.

__Example:__

In [8]:
s = "magic"
print(s[::])    # Full slice, step size set to 1 by default.
print(s[::1])   # Full slice, explicit step size of 1.
print(s[::2])   # Slice consisting of even-indexed characters.
print(s[1::2])  # Slice consisting of odd-indexed characters.
print(s[::-1])  # Slice which amounts to the reverse of the string.

magic
magic
mgc
ai
cigam


📝 As above, to create a copy of string $ s $ with its characters reversed, we use `s[::-1]`.
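
One common use of the reversed copy, sketched here under the assumption that we only care about exact character-by-character equality, is testing whether a word reads the same in both directions. The variable names are illustrative.

```python
word = "level"
reversed_word = word[::-1]

print(reversed_word)            # level
print(word == reversed_word)    # True: 'level' is a palindrome

other = "magic"
print(other == other[::-1])     # False: 'magic' reversed is 'cigam'
```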

__Exercise:__ Suppose that $ p $ is the name of a variable whose value is the string "racecar". What is the output of the following statements?

(a) `p[::-1]`

(b) `p[0::2]`

(c) `p[::2]`

(d) `p[1::4]`

(e) `p[2:4:2]`

(f) `p[2:5:2]`

(g) `p[2:6:2]`

(h) `p[2:7:2]`

(i) `p[4:0:-1]`

(j) `p[4::-2]`

(k) `p[1::-1]`

### $ 1.3 $ Comparing strings

All of the comparison operators introduced in the previous notebook work for
strings as well. Strings are ordered according to the __lexicographic__ (or __dictionary__) __order__.
Therefore:

* The operators `==` and `!=` tell whether two given strings have the
  same value or not, i.e., if they consist of exactly the same characters in the
  same order (this includes distinguishing uppercase from lowercase letters).
* When applied to two strings $ a $ and $ b $, `a < b` returns `True` if and
  only if $ a $ comes before $ b $ in the dictionary order. Similarly for
  `<=`, `>` and `>=`.
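
The following sketch illustrates these rules on a few made-up strings. It also shows one point where Python departs from an ordinary dictionary: characters are compared by their Unicode code points, so every uppercase letter of the English alphabet comes before every lowercase one. The built-in function `ord`, which returns the code point of a character, is not otherwise used in this notebook.

```python
print("apple" < "apricot")     # True: 'a' == 'a', 'p' == 'p', then 'p' < 'r'
print("spell" < "spells")      # True: a proper prefix comes first
print("magic" == "Magic")      # False: case matters

# Uppercase letters have smaller code points than lowercase ones:
print(ord("Z"), ord("a"))      # 90 97
print("Zebra" < "apple")       # True
```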

__Exercise:__ Let $ a,\,b,\,q,\,r $ be as defined in the code cell below.
Determine the value of:

(a) `a < b`

(b) `b < q`

(c) `a == a`

(d) `a != b`

(e) `q >= r`

(f) `b < a < q < r`

In [7]:
a = "potion"
b = "portion"
q = "quarterstaff"  
r = "robe"

## $ \S 2 $ Lists

### $ 2.1 $ The `list` type

A __list__ (that is, an object of type `list`) consists of zero, one or several
objects ordered in sequence. The types of different elements of a list do not
have to coincide. For example, one can create lists which contain integers,
floats and strings; or lists whose elements can be either complex numbers or
functions.

In short, the items of a list are allowed to be of _any_ type. In particular,
lists in Python have the important property of __closure__: one is allowed to
make lists of lists, or lists of lists of lists, etc.

A list is represented using _brackets_ `[]`, with its elements separated by commas. The function `len` can be used to count the number of items contained in a list.

__Example:__

In [None]:
fruits = ["acai", 'apple', "apricot", 'avocado']
# The elements of a list can be of different types:
numbers = [0, 'eight', -53, 12.34, (3 + 4j)]

empty = []                  # This is an empty list.
mages = ['Delfador']        # This list has a single element.

print(len(fruits))          # Use 'len' to get the length of a list.
print(len(empty))

new_list = fruits + mages   # We can concatenate two lists using '+'.
print(new_list)

4
0
['acai', 'apple', 'apricot', 'avocado', 'Delfador']
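
The closure property mentioned earlier can be illustrated with a short sketch. The names `pair`, `nested` and `deeper` are hypothetical; note that `len` counts only the top-level items of a list.

```python
pair = [1, 2]
nested = [pair, ["a", "b"], []]    # A list whose items are themselves lists.

print(len(nested))                 # 3
print(nested[1])                   # ['a', 'b']
print(nested[1][0])                # a

deeper = [nested, [[0]]]           # A list of lists of lists.
print(len(deeper))                 # 2
```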


Just like strings, lists can be __concatenated__ with the `+` operator,
__repeated__ by "multiplication" with a positive integer using `*` and
__sliced__ with the `:` operator. Note that none of these operations
_modifies_ the original list; instead, they create a new list.
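
A brief sketch of this behavior, using a made-up list `potions`: each operation produces a new list, and the original is the same at the end as it was at the start.

```python
potions = ["healing", "mana"]

doubled = potions * 2              # Repetition.
longer = potions + ["stamina"]     # Concatenation.
first = potions[:1]                # Slicing.

print(doubled)    # ['healing', 'mana', 'healing', 'mana']
print(longer)     # ['healing', 'mana', 'stamina']
print(first)      # ['healing']
print(potions)    # ['healing', 'mana'] (unchanged)
```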


__Exercises:__ Let _movies_ be the list in the code cell below. Determine the output of the following statements:

(a) `movies * 2`

(b) `movies + ["Paths of Glory", "Modern Times"]`

(c) `["Star Wars", "The Third Man"] + movies`

(d) `movies[:2]`

(e) `movies[::-1]`

(f) `movies + []`

(g) `movies + "error"`

In [None]:
movies = ["Gone with the Wind",
         "Interstellar",
         "E.T.",
         "It's a Wonderful Life",
         "Rain Man",
         "Rambo"]

### $ 2.2 $ Modifying lists

In contrast to strings, lists are __mutable__ objects, meaning that their individual elements can be modified by assignments.
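
As a minimal sketch of what mutability means in practice, the cell below changes a made-up list `spells` in place, first through a single index and then through a slice.

```python
spells = ["fireball", "shield", "heal"]

spells[0] = "lightning"              # Replace the 0-th element.
print(spells)                        # ['lightning', 'shield', 'heal']

spells[1:3] = ["barrier", "cure"]    # Replace a whole slice at once.
print(spells)                        # ['lightning', 'barrier', 'cure']
```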

__Exercise:__ Again, let _movies_ be as in the next code cell. What is the value of _movies_ after each of the statements below, when they are run through the interpreter in sequence?

(a) `movies[1] = "Forrest Gump"`

(b) `movies[2:4] = ["Modern Times", "Paths of Glory"]`

(c) `movies[-1] = "Bicycle Thieves"`

(d) `movies += "Das Leben der Anderen"`


In [9]:
movies = ["Gone with the Wind",
         "Interstellar",
         "E.T.",
         "It's a Wonderful Life",
         "Rain Man",
         "Rambo"]

⚠️ In order to _assign_ an element at the $ k $-th index of a list, the list
must already have an item at that index, i.e., its length must be at least
$ k + 1 $. Trying to modify or access in any way the element having index $ k $
in a list which currently does not have such an element generates an `IndexError`.

__Example:__

In [10]:
drinks = ["coffee", "tea", "water"]
drinks[3] = "orange juice"

IndexError: list assignment index out of range
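
One way around this error, sketched below, is to extend the list by concatenation first; once index $ 3 $ exists, assigning to it works as usual. The `try`/`except` construct is used only so that the cell runs to completion.

```python
drinks = ["coffee", "tea", "water"]

try:
    drinks[3] = "orange juice"        # Index 3 does not exist yet.
except IndexError as error:
    print("IndexError:", error)

drinks = drinks + ["orange juice"]    # Now the list has 4 items.
print(drinks[3])                      # orange juice

drinks[3] = "lemonade"                # Assignment at index 3 is now allowed.
print(drinks)                         # ['coffee', 'tea', 'water', 'lemonade']
```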

### $ 2.3 $ Some methods defined on lists

Lists also support several useful methods (a __method__ is a function associated
with a specific class or type). Here are examples of how some of them are used.

__Example:__

In [23]:
fruits = ["avocado", 'apricot', "acai", 'apple']

fruits.append('apple')             # Append an element to the end of a list.
print(fruits)

['avocado', 'apricot', 'acai', 'apple', 'apple']


In [24]:
fruits.insert(0, "strawberry")     # Insert an element at a specified index.
print(fruits)

['strawberry', 'avocado', 'apricot', 'acai', 'apple', 'apple']


In [25]:
fruits.remove('apple')   # Remove the _first occurrence_ of an element.
print(fruits)


['strawberry', 'avocado', 'apricot', 'acai', 'apple']


In [26]:
fruits.sort()            # Sort the list.
print(fruits)

['acai', 'apple', 'apricot', 'avocado', 'strawberry']


In [27]:
a = fruits.pop(2)                # Remove the element having the specified index
print(fruits)                    # and return it as output. 

b = fruits.pop()                 # Use 'pop' without any arguments to remove the
print(fruits)                    # last item of a list and return it as output.

a, b

['acai', 'apple', 'avocado', 'strawberry']
['acai', 'apple', 'avocado']


('apricot', 'strawberry')

Note that, in each case, the name of the list appears before the method, and is
separated from it by a period `.`; more formally:

* `append(x)` can be used to append an element $ x $ to the _end_ of a list.
* `insert(i, x)` is used to insert an element $ x $ at the $ i $-th index of a
  list. Elements having indices $ < i $ remain in their original position, while
  those having indices $ \ge i $ are shifted one position to the right.
* `remove(x)` is used to remove the *first occurrence* of an item $ x $ of a
  list. If the list does not contain any instances of this element, then the
  interpreter raises a `ValueError`.
* `sort()` is used to __sort__ the elements of a list in ascending
  order, provided this makes sense (i.e., provided its elements can be compared
  with one another).
* `pop(i)` removes the item of the list at index $ i $ and returns it. If $ i $
  is omitted, the last item is removed.
* `index(x)` returns the index of the first element in the list whose value is
  the same as that of $ x $. If there is no such element, a `ValueError` is raised.

📝 The first four of these methods _modify_ the list in-place as described, and
they return `None` as output. However, `pop` removes the specified element and
returns the popped element as output. Similarly, `index` does not modify the list,
but returns the index as output.
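
The different return values can be observed in the following sketch, which uses a small made-up list and illustrative variable names (`position`, `result`, `popped`). The last part shows what happens when `index` is asked for an element that is not present.

```python
fruits = ["acai", "apple", "apricot", "apple"]

position = fruits.index("apple")     # Index of the first 'apple'.
print(position)                      # 1

result = fruits.append("avocado")    # Modifies the list, returns None.
print(result)                        # None

popped = fruits.pop(0)               # Removes and returns the 0-th item.
print(popped)                        # acai
print(fruits)                        # ['apple', 'apricot', 'apple', 'avocado']

# Asking for the index of a missing element raises a ValueError:
try:
    fruits.index("banana")
except ValueError as error:
    print("ValueError:", error)
```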

__Exercise:__ Let _countries_ be the list in the code cell below. Describe
_countries_ after each of the following statements is run in sequence through
the interpreter.

(a) `countries.insert(0, "Germany")`

(b) `countries + ["Germany"]`

(c) `countries.remove("Germany")`

(d) `countries.sort()`

(e) `countries.index("Bolivia")`

In [None]:
countries = ["Austria",
             "Egypt",
             "Bolivia",
             "Denmark",
             "China",
             "Finland"]

In [None]:
# sort() modifies the list in place and returns None, so b is None:
b = fruits.sort()
print(b)
print(fruits)

None
['acai', 'apple', 'avocado']


<div class="alert alert-warning">If $ x $ stores a <i>mutable</i> object, then the assignment <code>y = x</code> does not result in a new <i>object</i> named $ y $; instead, this just makes $ y $ a new pointer to the object stored by $ x $. Because of this, any modification of the value of $ x $ will affect $ y $, and vice-versa.</div>

__Example:__

In [None]:
x = [0, 1, 2]
y = x

x.pop()         # Popping an element from x also affects y,
y               # since they refer to the same object!

[0, 1]

In [None]:
x = [0, 1, 2]    # To create an independent copy of x, use a complete slice:
y = x[:]

x.pop()
y                # y has not been affected by the modification of x.

[0, 1, 2]
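
Whether two names refer to the same object can be checked with the `is` operator, which is not otherwise used in this notebook. The sketch below contrasts a plain assignment with a full slice; the names `alias` and `clone` are illustrative.

```python
x = [0, 1, 2]
alias = x        # A second name for the same list object.
clone = x[:]     # A new list with the same items.

print(alias is x)    # True
print(clone is x)    # False
print(clone == x)    # True: equal values, but distinct objects

x[0] = 99
print(alias)         # [99, 1, 2]
print(clone)         # [0, 1, 2]
```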

## $ \S 3 $ Tuples

### $ 3.1 $ The `tuple` type

Another sequential data type is `tuple`, the type of __tuples__. Like a list, a tuple is a sequence of zero or more objects of arbitrary types, separated by commas. However, tuples are enclosed by _parentheses_ `()` instead of brackets. Also, tuples are __immutable__ (like strings), so that their individual elements _cannot_ be modified.

### $ 3.2 $ Operations on tuples

As for the other sequential types that we have considered (strings and lists), tuples can be concatenated with `+`, their length can be retrieved using `len`, and their elements and slices can be accessed using `[]` and the `:` operator.

__Example:__

In [None]:
# Each of the following tuples records some data about famous scientists:
record_1 = ('Albert', 'Einstein', 'physicist', 26, 'Germany')
record_2 = ('Marie', 'Curie', 'chemist', 32, 'Poland')
record_3 = ('Charles', 'Darwin', 'biologist', 50, 'England')

# Each of them is indeed of type 'tuple':
print(type(record_1))

# Accessing individual elements:
print(record_1[0])
print(record_2[2])
print(record_3[-1])

<class 'tuple'>
Albert
chemist
England


In [None]:
# Slicing:
full_name = record_1[:2]
print(full_name)

('Albert', 'Einstein')


In [None]:
# To convert a tuple to a list, use 'list' as a function:
data = list(record_1)
print(data, type(data))

# Similarly, to convert a list to a tuple, use 'tuple' as a function:
philosophers = ["Plato", "Aristotle", "Seneca", "Socrates"]
tuple_of_philosophers = tuple(philosophers)
print(tuple_of_philosophers, type(tuple_of_philosophers))

['Albert', 'Einstein', 'physicist', 26, 'Germany'] <class 'list'>
('Plato', 'Aristotle', 'Seneca', 'Socrates') <class 'tuple'>
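
Concatenation and `len` were mentioned above but not yet shown for tuples. A minimal sketch, with the record split into two made-up pieces `name` and `details`:

```python
name = ('Marie', 'Curie')
details = ('chemist', 32, 'Poland')

record = name + details    # Concatenation creates a new tuple.
print(record)              # ('Marie', 'Curie', 'chemist', 32, 'Poland')
print(len(record))         # 5
print(record[1:3])         # ('Curie', 'chemist')
```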


### $ 3.3 $ Some warnings

⚠️ To define a tuple consisting of a single item, a comma must still be used, so that the tuple can be disambiguated from an expression surrounded by parentheses:

In [None]:
language = ('Sindarin', )         # To define a tuple, we must include a comma!
print(language, type(language))

lang = ('Sindarin')               # This is not a tuple, but rather a string;
print(lang, type(lang))           # the parentheses play no role in this case.


('Sindarin',) <class 'tuple'>
Sindarin <class 'str'>


🚫 Since a tuple is _immutable_, an attempt to modify one or more of its elements results in a `TypeError`:

In [None]:
coordinates = (1.234, 5.678)
coordinates[0] = 0.123

TypeError: 'tuple' object does not support item assignment

⚠️ We emphasize that even if $ x $ and $ y $ are two tuples or lists of the same length and whose items are of the same numerical type, `x + y` is *not* obtained by summing their respective elements; it is instead the *concatenation* of $ x $ and $ y $. Similarly, if $ a $ is a scalar, then `a * x` is *not* obtained by multiplying each item of $ x $ by $ a $, even if $ a $ is an integer.
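
The sketch below makes the contrast concrete. The element-wise versions rely on list comprehensions and the built-in `zip`, neither of which has been introduced in this notebook; they are shown only as one possible way to obtain the "vector" results in plain Python.

```python
x = [1, 2, 3]
y = [10, 20, 30]

print(x + y)    # [1, 2, 3, 10, 20, 30] (concatenation)
print(2 * x)    # [1, 2, 3, 1, 2, 3]    (repetition)

# Element-wise operations have to be spelled out explicitly:
elementwise_sum = [a + b for a, b in zip(x, y)]
scaled = [2 * a for a in x]

print(elementwise_sum)    # [11, 22, 33]
print(scaled)             # [2, 4, 6]
```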

<div class="alert alert-warning">Neither lists nor tuples are adequate data structures to represent <b>vectors</b> (in the sense of linear algebra). The most adequate type for this task is an <b>array</b> (type: <code>ndarray</code>, provided by the <a href="https://scipy.github.io/old-wiki/pages/Numpy_Example_List.html"><b>NumPy</b></a> module), which we will consider later.</div>

📝 **Question:** Do we really need both lists and tuples? 

*Answer:* No, strictly speaking we could always get by using only one of them. However, having both is convenient: lists suit collections that change over time, while tuples suit fixed records that should not change.

📝 **Question:** What is the difference between lists and tuples?

*Answer:* The main difference is that lists are _mutable_ while tuples are _immutable_. In particular:
* We cannot modify the value of individual elements of a tuple as we can with lists.
* We cannot remove or add elements to a tuple. Tuples have no methods equivalent to `append`, `pop`, `insert`, `remove`, etc. In particular, tuples have a fixed length. Even though it is possible to assign a tuple to a variable and then assign another tuple of different length to the same variable, this is not the same operation as modifying the original tuple.
* Tuples are generally a bit 'faster' than lists: because their size is fixed, they are typically slightly cheaper to create and to store.
* Because tuples are immutable, they offer a better choice when storing information which should be protected from modification to avoid unforeseen behavior.
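
The distinction between reassigning a variable and modifying a tuple can be sketched as follows. The names `point` and `original` are illustrative, and the `is` operator (which tests whether two names refer to the same object) is not otherwise used in this notebook.

```python
point = (1.234, 5.678)
original = point            # Keep a second name for the first tuple.

point = point + (9.0,)      # Builds a NEW tuple and rebinds the name 'point'.

print(point)                # (1.234, 5.678, 9.0)
print(original)             # (1.234, 5.678) (the first tuple is unchanged)
print(point is original)    # False: two distinct objects
```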

