# Chapter 3. Strings and Collections

Python includes a rich selection of collection types.

* str
* bytes
* list
* dict

## 3.1 str

### 3.1.1 Immutable sequence of Unicode codepoints

### 3.1.2 Either single quotes or double quotes.

"Practicality beats purity"

"Beautiful text strings  
Rendered in literal form  
Simple elegance"

In [1]:
'This is a string.'

'This is a string.'

In [2]:
"This is also a string."

'This is also a string.'

In [3]:
"It's a good thing."

"It's a good thing."

In [4]:
'"Yes!", he said, "I agree!"'

'"Yes!", he said, "I agree!"'

### 3.1.3 Concatenation

In [6]:
"first" "second"

'firstsecond'
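Adjacent literals are joined only when both are written literally in the source. For values held in variables, a minimal sketch (the variable names `a`, `b`, `implicit`, `explicit`, and `joined` are illustrative, not from the original cells) could use `+` or `str.join`:

```python
# Adjacent string literals are concatenated by the parser.
implicit = "first" "second"

# Values stored in variables need an explicit operator or str.join.
a = "first"
b = "second"
explicit = a + b
joined = "".join([a, b])

print(implicit, explicit, joined)
```

`str.join` tends to be the better choice when many pieces are combined, since it builds the result in one pass.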

### 3.1.4 Strings with newlines

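Python has universal newline support (PEP 278): in text mode, '\n' in your code is translated to and from the platform's native line ending, such as '\r\n' on Windows.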

(1) Multiline strings


In [8]:
"""This is
a multiline
string"""

'This is\na multiline\nstring'

In [9]:
'''so
is
this.'''

'so\nis\nthis.'

(2) Escape sequences

In [11]:
m = 'This string\nspans multiple\nlines.'
m

'This string\nspans multiple\nlines.'

In [13]:
print(m)

This string
spans multiple
lines.


In [15]:
"This is a \" in a string."

'This is a " in a string.'

In [17]:
'This is a \' in a string.'

"This is a ' in a string."

In [19]:
'This is a \" and a \' in a string.'

'This is a " and a \' in a string.'

In [21]:
k = 'A \\ in a string.'
k

'A \\ in a string.'

In [23]:
print(k)

A \ in a string.


### 3.1.5 Raw strings

Prefix the opening single or double quote with 'r' so that backslashes are kept as literal characters.

In [25]:
path = r'C:\Users\Merlin\Documents\Spells'
path

'C:\\Users\\Merlin\\Documents\\Spells'

In [27]:
print(path)

C:\Users\Merlin\Documents\Spells
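To see what the prefix changes, the following sketch compares an ordinary string with a raw string of the same spelling (the names `escaped` and `raw` are illustrative):

```python
escaped = 'a\nb'   # \n is interpreted as a single newline character
raw = r'a\nb'      # the backslash and the n are kept as two characters

print(len(escaped))    # 3
print(len(raw))        # 4
print(raw == 'a\\nb')  # True: same value as the escaped-backslash form
```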


### 3.1.6 String constructor

In [28]:
str(496)

'496'

In [29]:
str(6e23)

'6e+23'

### 3.1.7 Sequence type

There is no separate character type: indexing a string returns a one-character string.

In [31]:
s = 'parrot'
s[4]

'o'

In [32]:
type(s[4])

str
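Because `str` is a sequence, the usual sequence operations apply. A short sketch, reusing the `s` from the cells above (the name `upper_chars` is illustrative):

```python
s = 'parrot'

print(len(s))    # 6
print(s[-1])     # t  (negative indices count from the end)
print(s[1:4])    # arr  (slicing returns a new string)

# Iterating over a string yields one-character strings.
upper_chars = [ch.upper() for ch in s]
print(upper_chars)
```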

### 3.1.8 String methods

In [33]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt

In [34]:
c = 'oslo'
c.capitalize()

'Oslo'

In [35]:
c

'oslo'
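The unchanged value of `c` above follows from strings being immutable: methods return new strings rather than modifying the original. A small sketch (the names `capitalized` and `error` are illustrative) makes this explicit:

```python
c = 'oslo'
capitalized = c.capitalize()  # returns a new string; c is untouched

try:
    c[0] = 'O'  # str does not support item assignment
except TypeError as exc:
    error = str(exc)

print(c, capitalized)
print(error)
```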

### 3.1.9 Python strings are Unicode.

Default source encoding: UTF-8

In [36]:
'我爱Python'

'我爱Python'

In [38]:
'我爱\u00e5Python\u00e5'

'我爱åPythonå'

In [39]:
'\xe5'

'å'

In [40]:
'\345'

'å'
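The three escapes above (`\u00e5`, `\xe5`, and `\345`) name the same code point in hexadecimal and octal. A quick check with the built-ins `ord` and `chr` (the name `ch` is illustrative):

```python
ch = '\u00e5'

print(ord(ch))       # 229
print(hex(ord(ch)))  # 0xe5
print(oct(ord(ch)))  # 0o345

# All three escape forms produce the same one-character string.
print(chr(0xe5) == '\xe5' == '\345' == ch)  # True
```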

## 3.2 Bytes

### 3.2.1 Immutable sequence of bytes

In [41]:
b'data'

b'data'

In [42]:
b"data"

b'data'

In [43]:
d = b'some bytes'
d.split()

[b'some', b'bytes']

### 3.2.2 Converting between strings and bytes

To convert between bytes and strings, we must know the encoding used to represent the string's Unicode codepoints as bytes.

In [45]:
chi = "我爱Python"
data = chi.encode("utf-8")
data

b'\xe6\x88\x91\xe7\x88\xb1Python'

In [46]:
chi_out = data.decode("utf-8")
chi_out

'我爱Python'

In [48]:
chi_out == chi

True
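The encoding matters because one code point may occupy several bytes, and decoding with the wrong codec can fail. A sketch under the same UTF-8 assumption as the cells above (the name `failure` is illustrative):

```python
chi = "我爱Python"
data = chi.encode("utf-8")

print(len(chi))   # 8 code points
print(len(data))  # 12 bytes: each Chinese character takes 3 bytes in UTF-8

# Decoding with a codec that cannot represent these bytes raises an error.
failure = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc.reason

print("decode failed:", failure)
```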

## 3.3 Lists

### 3.3.1 Mutable sequences of objects

In [49]:
[1, 9, 8]

[1, 9, 8]

In [51]:
a = ["apple", "orange", "pear"]
a[1]

'orange'

### 3.3.2 Lists can be heterogeneous with respect to the type of the objects.

In [52]:
a[1] = 7
a

['apple', 7, 'pear']

In [53]:
b = []
b.append(1.618)
b

[1.618]

In [54]:
b.append(1.414)
b

[1.618, 1.414]

### 3.3.3 List constructor

In [55]:
list("character")

['c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r']

In [56]:
c = ['bear',
     'giraffe',
     'elephant',
     'caterpillar',]
c

['bear', 'giraffe', 'elephant', 'caterpillar']

## 3.4 Dictionaries

### 3.4.1 Mutable mappings of keys to values

A.k.a. associative arrays in some other programming languages.

### 3.4.2 Dict literals

In [57]:
d = {'alice': '878-8728-922', 'bob': '256-5262-124', 'eve': '192-2321-787'}
d['alice']

'878-8728-922'

In [58]:
d['alice'] = '966-4532-6272'
d

{'alice': '966-4532-6272', 'bob': '256-5262-124', 'eve': '192-2321-787'}

In [59]:
d['charles'] = '334-5551-913'
d

{'alice': '966-4532-6272',
 'bob': '256-5262-124',
 'charles': '334-5551-913',
 'eve': '192-2321-787'}

In [60]:
e = {}
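Beyond literals and item assignment, a few other common dict operations are sketched below. The key `'mallory'`, the fallback `'n/a'`, and the name `missing` are illustrative and not part of the original cells:

```python
d = {'alice': '878-8728-922', 'bob': '256-5262-124'}

print('alice' in d)    # True: membership tests look at keys

# d['mallory'] would raise KeyError; get() returns a fallback instead.
missing = d.get('mallory', 'n/a')
print(missing)         # n/a

# The dict constructor accepts keyword arguments or key/value pairs.
e = dict(alice='878-8728-922')
f = dict([('bob', '256-5262-124')])
print(e, f)
```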

## 3.5 For-Loops

More like "for-each" in some other programming languages.

```python
for item in iterable:
    ...body...
```

### 3.5.1 Iterate over a list.

In [61]:
cities = ['London', 'New York', 'Paris', 'Oslo', 'Helsinki']
for city in cities:
    print(city)

London
New York
Paris
Oslo
Helsinki


### 3.5.2 Iterate over a dict.

In [62]:
colors = {'crimson': 0xdc143c, 'coral': 0xff7f50, 'teal': 0x008080}
for color in colors:
    print(color, colors[color])

crimson 14423100
coral 16744272
teal 32896
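Iterating over a dict yields its keys, which is why the loop above looks up `colors[color]`. One alternative, sketched here, is to iterate over `colors.items()` to get key and value together; the hex formatting and the name `lines` are illustrative additions:

```python
colors = {'crimson': 0xdc143c, 'coral': 0xff7f50, 'teal': 0x008080}

lines = []
for name, value in colors.items():
    # Format the integer as a six-digit, zero-padded hex color code.
    lines.append(f"{name} #{value:06x}")

print("\n".join(lines))
```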


## 3.6 Putting it all together

In [63]:
from urllib.request import urlopen
# Using a with statement is good practice for avoiding resource leaks.
with urlopen('http://sixty-north.com/c/t.txt') as story:
    story_words = []
    for line in story:
        line_words = line.split()
        for word in line_words:
            story_words.append(word)
            
story_words

[b'It',
 b'was',
 b'the',
 b'best',
 b'of',
 b'times',
 b'it',
 b'was',
 b'the',
 b'worst',
 b'of',
 b'times',
 b'it',
 b'was',
 b'the',
 b'age',
 b'of',
 b'wisdom',
 b'it',
 b'was',
 b'the',
 b'age',
 b'of',
 b'foolishness',
 b'it',
 b'was',
 b'the',
 b'epoch',
 b'of',
 b'belief',
 b'it',
 b'was',
 b'the',
 b'epoch',
 b'of',
 b'incredulity',
 b'it',
 b'was',
 b'the',
 b'season',
 b'of',
 b'Light',
 b'it',
 b'was',
 b'the',
 b'season',
 b'of',
 b'Darkness',
 b'it',
 b'was',
 b'the',
 b'spring',
 b'of',
 b'hope',
 b'it',
 b'was',
 b'the',
 b'winter',
 b'of',
 b'despair',
 b'we',
 b'had',
 b'everything',
 b'before',
 b'us',
 b'we',
 b'had',
 b'nothing',
 b'before',
 b'us',
 b'we',
 b'were',
 b'all',
 b'going',
 b'direct',
 b'to',
 b'Heaven',
 b'we',
 b'were',
 b'all',
 b'going',
 b'direct',
 b'the',
 b'other',
 b'way',
 b'in',
 b'short',
 b'the',
 b'period',
 b'was',
 b'so',
 b'far',
 b'like',
 b'the',
 b'present',
 b'period',
 b'that',
 b'some',
 b'of',
 b'its',
 b'noisiest',
 b'authori

In [64]:
from urllib.request import urlopen
# Using a with statement is good practice for avoiding resource leaks.
with urlopen('http://sixty-north.com/c/t.txt') as story:
    story_words = []
    for line in story:
        line_words = line.decode('utf-8').split()
        story_words += line_words
        
story_words

['It',
 'was',
 'the',
 'best',
 'of',
 'times',
 'it',
 'was',
 'the',
 'worst',
 'of',
 'times',
 'it',
 'was',
 'the',
 'age',
 'of',
 'wisdom',
 'it',
 'was',
 'the',
 'age',
 'of',
 'foolishness',
 'it',
 'was',
 'the',
 'epoch',
 'of',
 'belief',
 'it',
 'was',
 'the',
 'epoch',
 'of',
 'incredulity',
 'it',
 'was',
 'the',
 'season',
 'of',
 'Light',
 'it',
 'was',
 'the',
 'season',
 'of',
 'Darkness',
 'it',
 'was',
 'the',
 'spring',
 'of',
 'hope',
 'it',
 'was',
 'the',
 'winter',
 'of',
 'despair',
 'we',
 'had',
 'everything',
 'before',
 'us',
 'we',
 'had',
 'nothing',
 'before',
 'us',
 'we',
 'were',
 'all',
 'going',
 'direct',
 'to',
 'Heaven',
 'we',
 'were',
 'all',
 'going',
 'direct',
 'the',
 'other',
 'way',
 'in',
 'short',
 'the',
 'period',
 'was',
 'so',
 'far',
 'like',
 'the',
 'present',
 'period',
 'that',
 'some',
 'of',
 'its',
 'noisiest',
 'authorities',
 'insisted',
 'on',
 'its',
 'being',
 'received',
 'for',
 'good',
 'or',
 'for',
 'evil',
 'i