# Sequences

[Strings](#strings)
  
- [Basics](#string-basics)

- [Indexing and slicing](#indexing-slicing)

- [String methods](#string-methods)

- [Conversion and formatting](#conversion-formatting)

[Lists](#lists)

- [Basics](#list-basics)

- [List methods](#list-methods)

- [Sorting](#sorting)

[Dictionaries](#dictionaries)

- [Basics](#dict-basics)

- [Setting and retrieving values](#dict-use)

## Strings
<a id='strings'></a>

### Basics
<a id='string-basics'></a>

The string type in Python represents text as a sequence of Unicode characters. String literals are enclosed in single or double quotes. The two forms are freely interchangeable, although some people prefer single quotes.

In [4]:
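b1 = "penguin" == 'penguin'  # equality compares the characters
# Whether two equal literals are the same object is an implementation detail
# (CPython reuses the object here); recent Python versions emit a
# SyntaxWarning for "is" with a literal.
b2 = "penguin" is "penguin"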

print(f'strings match by value: {b1}')
print(f'both literals reference the same object: {b2}')

s1 = ""

strings match by value: True
both literals reference the same object: True


In [5]:
'ű'

'ű'

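Strings are immutable. An operation such as += does not modify the existing string object. It creates a new one and rebinds the name, so _s2_ below still refers to the original string: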

In [6]:
s1 = "a penguin and a giraffe"
s2 = s1

In [7]:
id(s1)

4446901496

In [8]:
id(s2)

4446901496

In [9]:
s1 += ' '

In [10]:
s1

'a penguin and a giraffe '

In [11]:
s2

'a penguin and a giraffe'
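
Another way to observe immutability is to try changing a single character. The following is a minimal sketch (the variable names are only illustrative) showing that item assignment fails and that a modified string has to be built as a new object:

```python
s = "penguin"

# Item assignment is not supported on strings and raises a TypeError.
try:
    s[0] = "P"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# To "change" a string, build a new one from pieces of the old one.
capital = "P" + s[1:]

print(assignment_failed)  # True
print(capital)            # Penguin
print(s)                  # penguin (the original is untouched)
```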

Some control and special characters are written in literals as escape sequences, which are prefixed with a backslash ("\"):

In [12]:
"\n" # newline
"\t" # tab
"\r" # CR
'\'' # single quote
"\"" # double quote
"\\" # backslash
# "\ooo" # character with octal value ooo
# "\xhh" # character with hex value hh

print("This line \t has tabs \t and a newline\n")

This line 	 has tabs 	 and a newline



### Indexing and slicing
<a id='indexing-slicing'></a>

Individual characters in a string can be accessed via __indexing__. Indexing in Python is zero-based, i.e. numbering starts at zero:

In [13]:
s = "penguin"

In [14]:
s[0]

'p'

In [15]:
s[3]

'g'

Negative indices count characters from the back of the string:

In [16]:
s[-1]

'n'

In [17]:
s[-3]

'u'

It is also possible to refer to a contiguous range of characters. This is known as __slicing__. Note that the second number indicates the first element that is _not_ included. This takes getting used to, but has the advantage that the length of a slice i:j is j-i whenever i ≤ j and both indices lie within the string.

In [18]:
s[1:3]

'en'

In [19]:
s[2:-1]

'ngui'

If either number is omitted from the slicing syntax, it defaults to the beginning or the end of the string, respectively:

In [20]:
s[:3]

'pen'

In [21]:
s[3:]

'guin'

In [22]:
s[:-3]

'peng'

In [23]:
s[-3:]

'uin'

If either index is larger than the length of the string, it is treated as equal to the length:

In [24]:
s[:100]

'penguin'

In other words, $[i:j]$ really means: "from the i-th element to the last element before the j-th, or to the end of the string".
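
The two properties described above (a slice i:j has length j-i, and out-of-range bounds are clamped) can be checked directly. This is only a small sketch with illustrative variable names:

```python
s = "penguin"

# For in-range indices with i <= j, the slice s[i:j] has exactly j - i characters.
i, j = 2, 5
piece = s[i:j]
print(piece, len(piece) == j - i)  # ngu True

# Bounds past the end are clamped to len(s), so slicing does not raise an error...
clamped = s[3:100]
print(clamped)  # guin

# ...whereas indexing with an out-of-range position raises an IndexError.
try:
    s[100]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)  # True
```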

It is also possible to include only every n-th character in a slice. This is achieved via a second colon and a third number, the step:

In [25]:
s[1:6:2]

'egi'

In [26]:
s[::2]

'pnun'
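
The third number may also be negative, in which case the string is traversed backwards. This goes slightly beyond the examples above and is shown here only as a brief sketch:

```python
s = "penguin"

# With both bounds omitted, a step of -1 yields the reversed string.
reversed_s = s[::-1]
print(reversed_s)  # niugnep

# Every second character, starting from the last one.
every_other_backwards = s[::-2]
print(every_other_backwards)  # nunp
```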

### String methods
<a id='string-methods'></a>

String objects offer a wide variety of built-in methods for string manipulation. We will now list a few of them.

The __replace__ method of strings will replace all occurrences of some substring in a string with another:

In [40]:
s = "This\ttext\tcontains\ttabs\tinstead\tof\tspaces"

In [41]:
print(s)

This	text	contains	tabs	instead	of	spaces


In [42]:
s.replace('\t', ' ')

'This text contains tabs instead of spaces'

The _replace_ method can also be used to delete all occurrences of some character:

In [43]:
s.replace('\t', '')

'Thistextcontainstabsinsteadofspaces'

<hr />

The __strip__ method will remove all occurrences of some character from the edges of a string, or all whitespace if no arguments are given:

In [44]:
s = " Here's a text with whitespace at each end\n"

In [45]:
s.strip()

"Here's a text with whitespace at each end"

In [46]:
s = "*** This text has stars at each end ***"

In [47]:
s.strip("*")

' This text has stars at each end '

We can pass multiple characters to _strip_ in the form of a string, and it will remove all occurrences of each of these characters from both edges:

In [48]:
s.strip("* ")

'This text has stars at each end'
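
The related methods _lstrip_ and _rstrip_ behave the same way but work on one side only. The sketch below also illustrates that matching characters in the middle of the string are left alone. The sample strings are made up for illustration:

```python
s = "*** This text has stars at each end ***"

# lstrip and rstrip work like strip, but only on the left or right edge.
left_only = s.lstrip("* ")
right_only = s.rstrip("* ")
print(left_only)   # This text has stars at each end ***
print(right_only)  # *** This text has stars at each end

# Characters inside the string are never removed, even if they match.
inner = "**a*b**".strip("*")
print(inner)  # a*b
```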

<hr />

The __split__ method splits the string on all whitespace if no arguments are passed, or on a specific string, and returns a _list_ of the resulting substrings:

In [49]:
s = "This line\tcontains both\t\tspaces and tabs\n"

In [50]:
print(s)

This line	contains both		spaces and tabs



In [51]:
s.split()

['This', 'line', 'contains', 'both', 'spaces', 'and', 'tabs']

In [52]:
s.split('\t')

['This line', 'contains both', '', 'spaces and tabs\n']

In [53]:
s.split('\t\t')

['This line\tcontains both', 'spaces and tabs\n']

As we have seen earlier, the _split()_ method is limited and sometimes using _re.split()_ is a better option.
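
As a reminder of the difference, here is a minimal sketch contrasting the two. The sample string and the regular expression are assumptions chosen for illustration:

```python
import re

s = "one, two;three  four"

# str.split accepts a single fixed separator (or any whitespace by default).
by_comma = s.split(",")
print(by_comma)  # ['one', ' two;three  four']

# re.split takes a regular expression, so several separators can be mixed:
# here a comma, a semicolon, or whitespace, each possibly repeated.
tokens = re.split(r"[,;\s]+", s)
print(tokens)  # ['one', 'two', 'three', 'four']
```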

<hr />

The inverse operation of split, __join__, is also a string method:

In [54]:
l = ["Here's", "a", "list", "of", "words", "waiting", "to", "be", "joined"]

In [55]:
" ".join(l)

"Here's a list of words waiting to be joined"

In [56]:
"***".join(l)

"Here's***a***list***of***words***waiting***to***be***joined"

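One thing to keep in mind is that _join_ only accepts strings, so other types must be converted first. A short sketch with made-up data:

```python
numbers = [1, 2, 3]

# join only accepts strings; passing integers raises a TypeError.
try:
    ", ".join(numbers)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True

# Converting each element with str first makes it work.
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3

# split and join undo each other when the same separator is used.
round_trip = ", ".join("a, b, c".split(", "))
print(round_trip)  # a, b, c
```
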
<hr />

The methods __lower__ and __upper__ will return lowercased and uppercased versions of a string:

In [57]:
"Penguin".lower()

'penguin'

In [58]:
"Penguin".upper()

'PENGUIN'

The __title__ method capitalizes the first letter of each word and lowercases the remaining letters:

In [59]:
"penguin".title()

'Penguin'
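
With a single word the behaviour of _title_ is hard to tell apart from that of the _capitalize_ method. A sketch with a multi-word string (chosen only for illustration) makes the difference visible:

```python
s = "the emperor PENGUIN"

# title: every word starts with an uppercase letter, the rest is lowercased.
titled = s.title()
print(titled)  # The Emperor Penguin

# capitalize: only the first character of the whole string is uppercased.
capitalized = s.capitalize()
print(capitalized)  # The emperor penguin
```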

These functions will leave strings unchanged if they are already uppercased/lowercased/titlecased:

In [60]:
"PENGUIN".upper()

'PENGUIN'

There are also functions to check for these properties of strings:

In [61]:
"PENGUIN".isupper()

True

In [62]:
"Penguin".isupper()

False

In [63]:
"Penguin".istitle()

True

In [64]:
"Penguin".islower()

False

<hr />

Some more boolean functions on strings include __isalpha__ and __isalnum__, which check whether a string contains alphabetical characters only, or alphanumeric characters only:

In [65]:
"penguin".isalpha()

True

In [66]:
"penguin221".isalpha()

False

In [67]:
"penguin".isalnum()

True

In [68]:
"penguin221".isalnum()

True

To find out if some string is contained in another, the keyword __in__ may be used:

In [69]:
"gui" in "penguin"

True

In [70]:
"gi" in "penguin"

False

The functions __startswith__ and __endswith__ will only check the beginning and end of a string, respectively:

In [71]:
"penguin".startswith("pen")

True

In [72]:
"penguin".endswith("in")

True

These are more convenient than indexing, since strings that are too short will not raise errors:

In [73]:
s = "penguin"
t = ""

In [74]:
s[-1] == "n"

True

In [75]:
t[-1] == "n"

IndexError: string index out of range

In [76]:
t.endswith("n")

False
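
Both methods also accept a tuple of candidates, which is handy when several prefixes or suffixes are acceptable. A minimal sketch, with a hypothetical file name:

```python
filename = "penguins.csv"

# A tuple of candidates matches if any one of them matches.
is_data_file = filename.endswith((".csv", ".tsv"))
is_image = filename.endswith((".png", ".jpg"))
print(is_data_file, is_image)  # True False

# The empty string is handled without raising an IndexError.
empty_matches = "".startswith(("a", "b"))
print(empty_matches)  # False
```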

### Conversion and formatting
<a id='conversion-formatting'></a>

Most Python types can be converted to strings without any problems:

In [77]:
str(3)

'3'

In [78]:
str([1, 2])

'[1, 2]'

In [79]:
str(True)

'True'

This and string concatenation already provide a straightforward way of printing any value:

In [80]:
i = 3
print("The value of i is " + str(i) + ".")

The value of i is 3.


A much more convenient way to achieve this is __string formatting__. The format method of a string will substitute arguments of arbitrary type into the string, provided that the string conforms to the syntax of __format strings__.

In [81]:
name = "John"
age = 8

In [82]:
print("{0} is {1} years old".format(name, age))

John is 8 years old


Format instructions in a string are enclosed in curly braces. Numbers indicate the position of the argument that is to be substituted at the given position in the string. Compare:

In [83]:
print("{1} is {0} years old".format(name, age))

8 is John years old


If numbers are omitted, variables will be substituted in left-to-right order:

In [84]:
print("{} is {} years old".format(name, age))

John is 8 years old


Format strings may also refer to named (keyword) arguments by their names:

In [85]:
print("{name} is {age} years old".format(name="John", age=8))

John is 8 years old


The most useful thing about format strings is that they allow us to specify the way in which non-string types are formatted. This is achieved by writing a colon (:) after the number or name of the variable in the format string, and then providing a __format specification__.

Format specifications can control many different properties of how a string is displayed, including alignment, number of decimal places, and a choice of number formats. All options are documented in the official Python manual under the section "[Format Specification Mini-language](https://docs.python.org/3.7/library/string.html#formatspec)". Below we show only a few examples.

This example instructs the _format_ method to print a float with 4 digits after the decimal point:

In [86]:
import math

In [87]:
print("The value of Pi is {0:.4f}".format(math.pi))

The value of Pi is 3.1416


The $.4$ part of the above example configures precision, the $f$ following it sets the number type to __fixed point__. The number is correctly rounded to the specified number of decimal digits.

Another example will treat the variable as a percentage (multiplying it by 100 and printing a % sign).

In [88]:
print("The current annual interest rate is {0:.2%}".format(0.0525))

The current annual interest rate is 5.25%


The last two examples show format strings that will cause a string to be centered or right-aligned (a display width must be specified):

In [89]:
print('The text below is\n{0:^30}'.format('centered'))

The text below is
           centered           


In [90]:
print('The text below is\n{0:>30}'.format('right-aligned'))

The text below is
                 right-aligned
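
The same format specifications can be used in f-strings (formatted string literals), which appeared in the very first example of this section. The sketch below repeats the examples above in that form, reusing the same sample values:

```python
import math

name = "John"
age = 8

# f-strings evaluate the expressions inside the braces directly.
sentence = f"{name} is {age} years old"
print(sentence)  # John is 8 years old

# The format specification after the colon works as it does with str.format.
pi_text = f"{math.pi:.4f}"
rate_text = f"{0.0525:.2%}"
centered = f"{'centered':^30}"
print(pi_text)    # 3.1416
print(rate_text)  # 5.25%
print(repr(centered))
```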


## Lists
<a id='lists'></a>

### Basics
<a id='list-basics'></a>

We've seen earlier that lists can hold elements of arbitrary types, and that they are mutable, i.e. their elements may be changed without creating a new list object.

Indexing and slicing work for lists the same way they do for strings:

In [91]:
l = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']

In [92]:
l[3]

'd'

In [93]:
l[2:5]

['c', 'd', 'e']

In [94]:
l[-2:]

['k', 'l']

In [95]:
l[:4]

['a', 'b', 'c', 'd']

Since lists are mutable, indexing and slicing can also be used to change one or several elements in the list:

In [96]:
l[1] = 'x'

In [97]:
l

['a', 'x', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']

In [98]:
l[2:4] = ['y', 'z']

In [99]:
l

['a', 'x', 'y', 'z', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
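
The replacement does not have to be the same length as the slice it replaces. The following sketch, using a shorter list for readability, shows a list growing and shrinking through slice assignment:

```python
l = ['a', 'b', 'c', 'd', 'e']

# Replace two elements with three: the list grows by one.
l[1:3] = ['x', 'y', 'z']
grown = list(l)
print(grown)  # ['a', 'x', 'y', 'z', 'd', 'e']

# Assigning an empty list to a slice deletes those elements.
l[1:4] = []
shrunk = list(l)
print(shrunk)  # ['a', 'd', 'e']
```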

### List methods
<a id='list-methods'></a>

The __append__ method can be used to add a new element to the end of a list:

In [100]:
l.append("m")

In [101]:
l

['a', 'x', 'y', 'z', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm']

The __extend__ method adds all elements from another list (or any other iterable) to the end of a list:

In [102]:
m = ['n', 'o', 'p', 'q']

In [103]:
l.extend(m)

In [104]:
l

['a',
 'x',
 'y',
 'z',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q']

The addition operation (+) creates a new list using all elements of the first list, then all elements of the second list:

In [105]:
l + m

['a',
 'x',
 'y',
 'z',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'n',
 'o',
 'p',
 'q']
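
A common source of confusion is the difference between appending a list, extending with a list, and adding two lists. A compact side-by-side sketch, using short lists made up for illustration:

```python
l = ['a', 'b']
m = ['c', 'd']

# + builds a new list; l and m are left unchanged.
combined = l + m

# append adds m as ONE element, producing a nested list.
appended = list(l)
appended.append(m)

# extend adds each element of m separately, modifying the list in place.
extended = list(l)
extended.extend(m)

print(combined)  # ['a', 'b', 'c', 'd']
print(appended)  # ['a', 'b', ['c', 'd']]
print(extended)  # ['a', 'b', 'c', 'd']
```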

The __insert__ method of lists can be used to insert an element at a given position:

In [106]:
l.insert(1, 'b')

In [107]:
l

['a',
 'b',
 'x',
 'y',
 'z',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q']

The __remove__ method removes the first occurrence of a given element from a list (an error is raised if the element is not in the list at all):

In [108]:
l.remove('x')

In [109]:
l

['a',
 'b',
 'y',
 'z',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q']

The __pop__ method will return a list element at a given position and also remove it from the list. If no index is specified, the last element is removed:

In [110]:
l.pop()

'q'

In [111]:
l

['a',
 'b',
 'y',
 'z',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p']

In [112]:
l.pop(3)

'z'

In [113]:
l

['a', 'b', 'y', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']

Alternatively, the keyword __del__ will remove an element based on its index, but not return it:

In [114]:
del l[2]

In [115]:
l

['a', 'b', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']

This also works on entire slices of a list:

In [116]:
del l[2:5]

In [117]:
l

['a', 'b', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']

The __index__ method will return the position in the list where a given element occurs for the first time. An error is raised if the element is not in the list.

In [118]:
l.index('b')

1

In [119]:
l.index(9)

ValueError: 9 is not in list

The __count__ method of lists returns the number of times a given element occurs in a list:

In [120]:
l.count('m')

1

### Sorting
<a id='sorting'></a>

The __sort__ method will sort the list __in place__:

In [121]:
l = [ "f", "a", "z", "l", "b", "e"]
l.sort()

In [122]:
l

['a', 'b', 'e', 'f', 'l', 'z']

Its counterpart, the built-in function __sorted__, will return a new list object. The original unsorted list remains unaffected:

In [123]:
l = [3,8,2,7,3,1,7,13,1]

In [124]:
sorted(l)

[1, 1, 2, 3, 3, 7, 7, 8, 13]

In [125]:
l

[3, 8, 2, 7, 3, 1, 7, 13, 1]

NOTE: both _sort_ and _sorted_ accept a _key_ parameter. It takes a function that is applied to each element, and the elements are ordered by the values it returns:

In [126]:
sorted(l, key=lambda x: -x)

[13, 8, 7, 7, 3, 3, 2, 1, 1]

In [127]:
sorted(l, key=lambda x: x%7)

[7, 7, 8, 1, 1, 2, 3, 3, 13]
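
A few more typical uses of _key_ are sketched below: sorting by length, reversing the order with the _reverse_ parameter, and sorting pairs by their second component. The data is made up for illustration:

```python
words = ["giraffe", "penguin", "ox", "zebra", "cat"]

# Sort by length; words of equal length keep their original relative order
# because Python's sort is stable.
by_length = sorted(words, key=len)
print(by_length)  # ['ox', 'cat', 'zebra', 'giraffe', 'penguin']

# reverse=True flips the ordering without having to negate the key.
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)  # ['giraffe', 'penguin', 'zebra', 'cat', 'ox']

# A key can pick one field out of a tuple, e.g. sorting (name, age) pairs by age.
people = [("John", 8), ("Susan", 15), ("Mary", 11)]
by_age = sorted(people, key=lambda pair: pair[1])
print(by_age)  # [('John', 8), ('Mary', 11), ('Susan', 15)]
```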

## Dictionaries
<a id='dictionaries'></a>

### Basics
<a id='dict-basics'></a>

__Dictionaries__ in Python are one of the most commonly used types for storing structured data. A dictionary is a map from keys to values. In some programming languages it is known as an associative array.

Here's a sample dictionary mapping people to their ages:

In [128]:
d = {"John": 8, "Mary": 11, "Susan": 15}

Any value can be retrieved by indexing the dictionary with a key:

In [129]:
d['John']

8

New key-value pairs are added in a similar fashion:

In [130]:
d['Jack'] = 12

In [131]:
d

{'John': 8, 'Mary': 11, 'Susan': 15, 'Jack': 12}

Dictionaries can be constructed from any sequence of pairs using the constructor __dict__, e.g.:

In [132]:
dict([('Jack', 12), ('John', 8), ('Mary', 11), ('Susan', 15)])

{'Jack': 12, 'John': 8, 'Mary': 11, 'Susan': 15}

The keyword __in__, which we have so far used to test membership of an element in lists, tuples or sets, also works with dictionaries and determines whether a key is present (values are not considered):

In [133]:
"Jack" in d

True

In [134]:
12 in d

False

Values of a dictionary may be of arbitrary type, but keys must be hashable, e.g. we can't use lists:

In [135]:
d[[1,2]] = 3

TypeError: unhashable type: 'list'

Tuples, however, are hashable (provided all of their elements are), so they can be used if keys need to have structure:

In [136]:
d[(1,2)] = 3

Dictionary elements can be removed using the keyword __del__:

In [137]:
del d[(1,2)]

There are methods to retrieve all keys, all values, or all key-value pairs of a dictionary. In Python 3 they return view objects rather than lists:

In [138]:
d.keys()

dict_keys(['John', 'Mary', 'Susan', 'Jack'])

In [139]:
d.values()

dict_values([8, 11, 15, 12])

In [140]:
d.items()

dict_items([('John', 8), ('Mary', 11), ('Susan', 15), ('Jack', 12)])

There are multiple ways to iterate over elements of a dictionary. If used as the iterable in a for loop, it will yield its keys:

In [141]:
for k in d:
    print(k)

John
Mary
Susan
Jack


Additionally, the views returned by _values_ and _items_ can be iterated over to visit values or key-value pairs:

In [142]:
for v in d.values():
    print(v)

8
11
15
12


In [143]:
for k, v in d.items():
    print(k, v)

John 8
Mary 11
Susan 15
Jack 12


### Setting and retrieving values
<a id='dict-use'></a>

It is very common to have to access a dictionary without knowing whether some key is present in it or not. There are several different ways to do this:

In [144]:
def lookup(d, key):
    if key not in d:
        print("Key not in dict!")
    else:
        return d[key]

In [145]:
lookup(d, "John")

8

In [146]:
lookup(d, "Jill")

Key not in dict!


In [147]:
def lookup2(d, key):
    try:
        return d[key]
    except KeyError:
        print("Key not in dict!")

In [148]:
lookup2(d, "John")

8

In [149]:
lookup2(d, "Jill")

Key not in dict!


Dictionaries also offer the __get__ method, which returns the value for some key if the key is present in the dictionary and a default element otherwise:

In [150]:
def lookup3(d, key):
    return d.get(key, "default")

In [151]:
lookup3(d, "John")

8

In [152]:
lookup3(d, "Jill")

'default'

Without a second argument, _get_ returns None for keys that are not in the dictionary:

In [153]:
print(d.get("Jill"))

None
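
A typical use of _get_ with a default value is counting. The following sketch counts words in a made-up sentence without first testing whether each word is already a key:

```python
text = "a penguin and a giraffe and a zebra"

counts = {}
for word in text.split():
    # get returns 0 for words not seen yet, so no membership test is needed.
    counts[word] = counts.get(word, 0) + 1

print(counts)
# {'a': 3, 'penguin': 1, 'and': 2, 'giraffe': 1, 'zebra': 1}
```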


Often we'd like to ensure that once a new key is encountered, it is added to the dictionary with some default value. When combining this with lookup, a straightforward implementation would be the following:

In [154]:
def lookup4(d, key):
    if key not in d:
        d[key] = "default"
    return d[key]

In [155]:
d

{'John': 8, 'Mary': 11, 'Susan': 15, 'Jack': 12}

In [156]:
lookup4(d, "Jill")

'default'

In [157]:
d

{'John': 8, 'Mary': 11, 'Susan': 15, 'Jack': 12, 'Jill': 'default'}

This behaviour is implemented by the dictionary method __setdefault__, which takes an optional second argument specifying the default value to be set (and returned):

In [158]:
d.setdefault("John", "default")

8

In [159]:
d.setdefault("Michelle", "default")

'default'

In [160]:
d

{'John': 8,
 'Mary': 11,
 'Susan': 15,
 'Jack': 12,
 'Jill': 'default',
 'Michelle': 'default'}

Finally, dictionaries that exhibit this behaviour upon plain lookup are available via the __defaultdict__ type of the collections module.
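
A minimal sketch of _defaultdict_ in use follows. It groups a few sample names by their first letter. The names and the grouping task are assumptions chosen for illustration:

```python
from collections import defaultdict

# The argument is a factory that is called with no arguments to create the
# default value for a missing key; list produces a fresh empty list.
by_initial = defaultdict(list)

for name in ["John", "Jack", "Mary", "Jill", "Susan"]:
    # Plain indexing inserts the default on first access, so append just works.
    by_initial[name[0]].append(name)

as_plain_dict = dict(by_initial)
print(as_plain_dict)
# {'J': ['John', 'Jack', 'Jill'], 'M': ['Mary'], 'S': ['Susan']}
```

Note that merely looking up a missing key in a _defaultdict_ adds it to the dictionary, just like _setdefault_ does.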

A dictionary's items can also be changed by calling its __update__ method with another dictionary or any iterable of key-value pairs. Values are overwritten for keys that were already present in the dictionary, and new keys are added.

In [161]:
d

{'John': 8,
 'Mary': 11,
 'Susan': 15,
 'Jack': 12,
 'Jill': 'default',
 'Michelle': 'default'}

In [162]:
e = {"John": 16, "Jill": 17, "Roger": 9}

In [163]:
d.update(e)

In [164]:
d

{'John': 16,
 'Mary': 11,
 'Susan': 15,
 'Jack': 12,
 'Jill': 17,
 'Michelle': 'default',
 'Roger': 9}
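
The example above passes a dictionary. The sketch below shows the other accepted forms, an iterable of pairs and keyword arguments, on a small made-up dictionary:

```python
d = {"John": 8, "Mary": 11}

# update accepts an iterable of (key, value) pairs...
d.update([("Mary", 12), ("Susan", 15)])

# ...as well as keyword arguments, when the keys are valid identifiers.
d.update(Jack=12)

print(d)  # {'John': 8, 'Mary': 12, 'Susan': 15, 'Jack': 12}
```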