# Strings and Bits
The first part of the chapter discusses *Unicode* and how Python handles it. According to the author, `UTF-8` is almost always the right encoding to use.
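As a quick illustration of the encoding point, here is a minimal sketch that round-trips a string through UTF-8. The sample string `'café'` is my own choice, not from the book.

```python
# Encode a string containing one non-ASCII character to UTF-8 and back.
text = 'caf\u00e9'                 # 'café': 4 characters
encoded = text.encode('utf-8')     # the é needs two bytes in UTF-8
decoded = encoded.decode('utf-8')  # back to the original str

print(len(text), len(encoded))     # 4 5
print(encoded)                     # b'caf\xc3\xa9'
print(decoded == text)             # True
```

The character count and the byte count differ, so `len()` on a `str` and on its encoded `bytes` can disagree.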

In [2]:
# old style
a = "help"
b = "me"
c = 30

"%s %s I'm %d" % (a, b, c)

"help me I'm 30"

In [3]:
# new style
"{} {} I'm {}".format(a, b, c)

"help me I'm 30"

In [5]:
# combine three values into a dictionary
d = {'a': 'help', 'b': 'me', 'c': 30}

"{0[a]} {0[b]} I'm {0[c]} {1}".format(d, 'years old')

"help me I'm 30 years old"

In [6]:
# one more example
"{0:!^10}".format('Help')

'!!!Help!!!'
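For comparison, the same results can be written with f-strings (Python 3.6+). This is a sketch added to these notes rather than something taken from the chapter; it reuses the values of `a`, `b`, and `c` from above.

```python
# f-string equivalents of the format() calls above
a, b, c = 'help', 'me', 30
word = 'Help'

sentence = f"{a} {b} I'm {c}"
centered = f"{word:!^10}"   # center in a width of 10, padding with '!'

print(sentence)   # help me I'm 30
print(centered)   # !!!Help!!!
```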

## Regular Expressions
`re` is the standard library module for regular expressions.

In [7]:
import re

In [8]:
# exact match with match()
source = 'Young Frankenstein'
m = re.match('You', source)

if m:
    print(m.group())

You


`match()` starts at the beginning of the source. To find 'Frank', we'll have to use `search()`.

In [10]:
m = re.search('Frank', source)

if m:
    print(m.group())

Frank


In [11]:
# Now, let's use match() again but change the pattern
m = re.match('.*Frank', source)

if m:
    print(m.group())

Young Frank


In [12]:
# findall()
m = re.findall('n', source)
m

['n', 'n', 'n', 'n']

In [14]:
print('Found', len(m), 'matches.')

Found 4 matches.


In [15]:
# n followed by any character
m = re.findall('n.', source)
m

['ng', 'nk', 'ns']

In [16]:
# didn't find the last n, so make the following character '.' optional using ?
m = re.findall('n.?', source)
m

['ng', 'nk', 'ns', 'n']

In [17]:
# use split()
m = re.split('n', source)
m

['You', 'g Fra', 'ke', 'stei', '']

In [18]:
# replace matches with sub
m = re.sub('n', '?', source)
m

'You?g Fra?ke?stei?'
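When the same pattern is used several times, as `'n'` is above, it can be compiled once with `re.compile()` and reused. A minimal sketch; the name `n_pattern` is my own.

```python
import re

source = 'Young Frankenstein'
n_pattern = re.compile('n')   # compile once, reuse below

found = n_pattern.findall(source)
pieces = n_pattern.split(source)
replaced = n_pattern.sub('?', source)

print(found)     # ['n', 'n', 'n', 'n']
print(pieces)    # ['You', 'g Fra', 'ke', 'stei', '']
print(replaced)  # You?g Fra?ke?stei?
```

The compiled pattern object offers the same `match()`, `search()`, `findall()`, `split()`, and `sub()` methods as the module-level functions.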

Now, let's test out more advanced pattern matching. We'll import `string`, which has predefined strings.

In [30]:
import string
from pprint import pprint

In [20]:
printable = string.printable

In [21]:
printable[0:50]

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN'

In [22]:
printable[50:]

'OPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

In [23]:
# let's find the digits
re.findall(r'\d', printable)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [33]:
# which are digits, letters, or underscores
m = re.findall(r'\w', printable)
pprint(m)

['0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'T',
 'U',
 'V',
 'W',
 'X',
 'Y',
 'Z',
 '_']


In [36]:
# what about spaces
re.findall(r'\s', printable)

[' ', '\t', '\n', '\r', '\x0b', '\x0c']

In [39]:
# matching is not confined to ASCII: \w matches any Unicode word character
# to test, we'll add non-ASCII characters
x = 'abc' + '-/*' + '\u00ea' + '\u0115'

In [40]:
re.findall(r'\w', x)

['a', 'b', 'c', 'ê', 'ĕ']

More examples using specifiers.

In [41]:
source = '''I wish I may, I wish I might
Have a dish of fish tonight.'''

In [42]:
# find wish anywhere
re.findall('wish', source)

['wish', 'wish']

In [43]:
# find fish anywhere
re.findall('fish', source)

['fish']

In [44]:
# find wish or fish
re.findall('wish|fish', source)

['wish', 'wish', 'fish']

In [45]:
# find I wish at the beginning
re.findall('^I wish', source)

['I wish']

In [48]:
# find fish tonight. at the end
re.findall('fish tonight.$', source) # escaping the period \. would be more clear

['fish tonight.']

In [49]:
# find w or f followed by ish
re.findall('[fw]ish', source)

['wish', 'wish', 'fish']

In [50]:
# w, s, or h
re.findall('[wsh]', source)

['w', 's', 'h', 'w', 's', 'h', 'h', 's', 'h', 's', 'h', 'h']

In [52]:
# same as above, but use +
re.findall('[wsh]+', source)

['w', 'sh', 'w', 'sh', 'h', 'sh', 'sh', 'h']

In [53]:
# find ght followed by a non-alphanumeric
re.findall(r'ght\W', source)

['ght\n', 'ght.']

In [54]:
# find I followed by wish
re.findall('I (?=wish)', source)

['I ', 'I ']

In [60]:
# find wish preceded by I
re.findall('(?<=I )wish', source)

['wish', 'wish']

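Sometimes regular expression pattern rules conflict with Python's string rules. For example, `\b` means a word boundary in a regular expression, but a backspace character in a normal Python string.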

In [61]:
# should match any word that begins with fish
re.findall('\bfish', source)

[]

In [62]:
# to avoid this, convert to a Python raw string using r
re.findall(r'\bfish', source)

['fish']
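To see why the first attempt returned nothing, compare the two string literals directly. A small sketch; the variable names are my own.

```python
plain = '\bfish'    # '\b' is interpreted as a backspace character (\x08)
raw = r'\bfish'     # backslash and 'b' are kept as two separate characters

print(len(plain), len(raw))   # 5 6
print(plain[0] == '\x08')     # True
print(raw[:2])                # \b
```

Only the raw string passes the two characters `\b` through to `re`, where they mean a word boundary.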

The book covers other useful features for specifying match output on page 163.
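I did not copy that page into these notes, so the following is only a sketch of the kind of feature involved, assuming it means capturing groups: parentheses capture parts of a match, and `(?P<name>...)` gives a group a name. The group names `DISH` and `FISH` are illustrative.

```python
import re

source = '''I wish I may, I wish I might
Have a dish of fish tonight.'''

# plain groups: group() is the whole match, groups() the captured parts
m = re.search(r'(. dish\b).*(\bfish)', source)
whole = m.group()
parts = m.groups()
print(whole)   # a dish of fish
print(parts)   # ('a dish', 'fish')

# named groups can be looked up by name
m = re.search(r'(?P<DISH>. dish\b).*(?P<FISH>\bfish)', source)
dish, fish = m.group('DISH'), m.group('FISH')
print(dish, '|', fish)   # a dish | fish
```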

## Binary Data

Python 3 introduced two types for sequences of eight-bit integers, with possible values from 0 to 255:

* *bytes* is immutable, like a tuple of bytes
* *bytearray* is mutable, like a list of bytes

In [63]:
blist = [1, 2, 3, 255]

In [64]:
the_bytes = bytes(blist)

In [65]:
the_bytes

b'\x01\x02\x03\xff'

In [66]:
the_byte_array = bytearray(blist)

In [67]:
the_byte_array

bytearray(b'\x01\x02\x03\xff')

In [69]:
# remember that bytes are immutable
the_bytes[1] = 1

TypeError: 'bytes' object does not support item assignment

In [70]:
# but the byte array is not
print(the_byte_array)
the_byte_array[1] = 127
print(the_byte_array)

bytearray(b'\x01\x02\x03\xff')
bytearray(b'\x01\x7f\x03\xff')
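A few more properties follow from the description above. This sketch rebuilds `the_bytes` so that it runs on its own; the other names are my own.

```python
blist = [1, 2, 3, 255]
the_bytes = bytes(blist)

last = the_bytes[3]     # indexing yields an int
head = the_bytes[:2]    # slicing yields a bytes object

# values outside 0..255 are rejected
try:
    bytes([1, 2, 256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

print(last)                    # 255
print(head)                    # b'\x01\x02'
print(out_of_range_rejected)   # True
```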


### Discussion of `struct`
`struct` is a standard library module for converting between binary data (bytes) and Python data structures.
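A minimal sketch of a round trip with `struct`, assuming two unsigned 32-bit integers in big-endian order. The values and the names `width` and `height` are made up for illustration.

```python
import struct

# '>' = big-endian, 'L' = unsigned 32-bit integer (4 bytes each)
packed = struct.pack('>LL', 154, 141)
print(packed)                    # b'\x00\x00\x00\x9a\x00\x00\x00\x8d'
print(struct.calcsize('>LL'))    # 8

# unpack() always returns a tuple
width, height = struct.unpack('>LL', packed)
print(width, height)             # 154 141
```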